Density Estimation for a Gaussian mixture

Plot the density estimation of a mixture of two Gaussians. Data is generated from two Gaussians with different centers and covariance matrices.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300

# generate random sample, two components
np.random.seed(0)

# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# generate zero centered stretched Gaussian data
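
The listing above breaks off before the two samples are combined and the mixture is fitted. A minimal, self-contained sketch of the full idea follows; the stretching matrix, the grid limits, and the use of mixture.GaussianMixture are assumptions on my part, not part of the excerpt above.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300
np.random.seed(0)

# spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# zero-centered, stretched Gaussian data (assumed stretching matrix)
C = np.array([[0.0, -0.7], [3.5, 0.7]])
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)

# concatenate the two samples into one training set
X_train = np.vstack([shifted_gaussian, stretched_gaussian])

# fit a two-component Gaussian mixture with full covariance matrices
clf = mixture.GaussianMixture(n_components=2, covariance_type="full")
clf.fit(X_train)

# plot the negative log-likelihood of the model as contours over a grid
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX).reshape(X.shape)

plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0),
            levels=np.logspace(0, 3, 10))
plt.scatter(X_train[:, 0], X_train[:, 1], 0.8)
plt.title("Negative log-likelihood of a two-component GMM")
plt.show()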

sklearn.covariance.ledoit_wolf()

sklearn.covariance.ledoit_wolf(X, assume_centered=False, block_size=1000)

Estimates the shrunk Ledoit-Wolf covariance matrix. Read more in the User Guide.

Parameters:

X : array-like, shape (n_samples, n_features)
    Data from which to compute the covariance estimate.

assume_centered : boolean, default=False
    If True, data are not centered before computation. Useful for data whose mean is close to, but not exactly, zero. If False, data are centered before computation.
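
A short usage sketch (data and shapes are illustrative): ledoit_wolf returns the shrunk covariance estimate together with the shrinkage coefficient that was applied.

import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
X = rng.randn(50, 5)          # 50 samples, 5 features

shrunk_cov, shrinkage = ledoit_wolf(X)
print(shrunk_cov.shape)       # (5, 5)
print(shrinkage)              # shrinkage coefficient in [0, 1]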

sklearn.datasets.make_classification()

sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)

Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of a 2 * class_sep-sided hypercube, and assigns an equal number of clusters to each class.
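
A small illustrative call (the parameter values are arbitrary): the informative features carry the class signal, the redundant ones are linear combinations of them, and the returned arrays are the feature matrix and the integer class labels.

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=2, n_classes=3, n_clusters_per_class=1,
                           class_sep=1.5, random_state=0)
print(X.shape)   # (200, 10)
print(y.shape)   # (200,)
print(set(y))    # {0, 1, 2}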

Prediction Intervals for Gradient Boosting Regression

This example shows how quantile regression can be used to create prediction intervals.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor

np.random.seed(1)

def f(x):
    """The function to predict."""
    return x * np.sin(x)

# ----------------------------------------------------------------------
# First the noiseless case
X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
X = X.astype(np.float32)

# Observations
y = f(X).ravel()
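
The excerpt stops before any model is fitted. A condensed sketch of the interval construction follows: fit one GradientBoostingRegressor per quantile with loss="quantile", and use the low and high quantile predictions as the interval bounds. The hyperparameter values here are illustrative, not those of the original example.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(0, 10.0, size=(200, 1)), axis=0)
y = X[:, 0] * np.sin(X[:, 0]) + rng.normal(scale=1.0, size=200)

params = dict(n_estimators=200, max_depth=3, learning_rate=0.05,
              min_samples_leaf=9)

# lower bound (5th percentile), median, and upper bound (95th percentile)
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, **params).fit(X, y)
median = GradientBoostingRegressor(loss="quantile", alpha=0.50, **params).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, **params).fit(X, y)

X_new = np.linspace(0, 10, 50).reshape(-1, 1)
y_lo, y_med, y_hi = lower.predict(X_new), median.predict(X_new), upper.predict(X_new)
# [y_lo, y_hi] is an approximate 90% prediction interval around y_med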

sklearn.utils.resample()

sklearn.utils.resample(*arrays, **options)

Resample arrays or sparse matrices in a consistent way. The default strategy implements one step of the bootstrapping procedure.

Parameters:

*arrays : sequence of indexable data-structures
    Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.

replace : boolean, True by default
    Implements resampling with replacement. If False, this will implement (sliced) random permutations.
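
A brief bootstrapping sketch (the arrays are toy data): passing several arrays keeps their rows aligned, so features and labels stay paired in the resampled output.

import numpy as np
from sklearn.utils import resample

X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])

# one bootstrap replicate: draw 5 rows with replacement, X and y stay aligned
X_boot, y_boot = resample(X, y, replace=True, n_samples=5, random_state=0)
print(X_boot.shape, y_boot.shape)   # (5, 2) (5,)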

sklearn.metrics.consensus_score()

sklearn.metrics.consensus_score(a, b, similarity='jaccard')

The similarity of two sets of biclusters. Similarity between individual biclusters is computed. Then the best matching between sets is found using the Hungarian algorithm. The final score is the sum of similarities divided by the size of the larger set. Read more in the User Guide.

Parameters:

a : (rows, columns)
    Tuple of row and column indicators for a set of biclusters.

b : (rows, columns)
    Another set of biclusters like a.
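
A sketch of how the score is typically used: recover planted biclusters with a co-clustering estimator and compare them with the ground truth (1.0 means a perfect match). The data size and cluster count are arbitrary choices, and in older scikit-learn releases SpectralCoclustering may live under sklearn.cluster.bicluster instead.

from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering
from sklearn.metrics import consensus_score

# data with 4 planted biclusters, plus the true row/column indicators
data, rows, columns = make_biclusters(shape=(100, 100), n_clusters=4,
                                      noise=5, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0)
model.fit(data)

# model.biclusters_ is a (rows, columns) tuple of indicator arrays
score = consensus_score(model.biclusters_, (rows, columns))
print(score)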

sklearn.metrics.pairwise.sigmoid_kernel()

sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, gamma=None, coef0=1)

Compute the sigmoid kernel between X and Y:

    K(X, Y) = tanh(gamma <X, Y> + coef0)

Read more in the User Guide.

Parameters:

X : ndarray of shape (n_samples_1, n_features)

Y : ndarray of shape (n_samples_2, n_features)

gamma : float, default None
    If None, defaults to 1.0 / n_features.

coef0 : int, default 1

Returns:

Gram matrix : array of shape (n_samples_1, n_samples_2)
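
A small check of the formula (toy matrices): the returned Gram matrix matches tanh(gamma * X @ Y.T + coef0) entry by entry.

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[0.0, 1.0], [1.0, 0.5]])
Y = np.array([[1.0, 1.0]])

K = sigmoid_kernel(X, Y, gamma=0.5, coef0=1.0)
print(K.shape)                                       # (2, 1)
print(np.allclose(K, np.tanh(0.5 * X @ Y.T + 1.0)))  # True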

The Iris Dataset

This data set consists of 3 different types of irises' (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray. The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width. The plot below uses the first two features. See here for more information on this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
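
A minimal way to load the data and reproduce a first-two-features scatter plot (axis labels taken from the bundled feature names):

import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]        # sepal length and sepal width
y = iris.target             # 0 = Setosa, 1 = Versicolour, 2 = Virginica

plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title("Iris samples in the sepal plane")
plt.show()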

scikit-learn Tutorials

An introduction to machine learning with scikit-learn
    Machine learning: the problem setting
    Loading an example dataset
    Learning and predicting
    Model persistence
    Conventions

A tutorial on statistical-learning for scientific data processing
    Statistical learning: the setting and the estimator object in scikit-learn
    Supervised learning: predicting an output variable from high-dimensional observations
    Model selection: choosing estimators and their parameters
    Unsupervised learning: seeking representations of the data

Demonstration of k-means assumptions

This example is meant to illustrate situations where k-means will produce unintuitive and possibly unexpected clusters. In the first three plots, the input data does not conform to some implicit assumption that k-means makes, and undesirable clusters are produced as a result. In the last plot, k-means returns intuitive clusters despite unevenly sized blobs.

print(__doc__)

# Author: Phil Roth <mr.phil.roth@gmail.com>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
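
A compact sketch of one of the failure cases described above: k-means assumes roughly spherical, equally sized clusters, so applying a linear (anisotropic) transformation to well-separated blobs is enough to make it misassign points. The transformation matrix below is an illustrative choice.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# well-separated isotropic blobs: an easy case for k-means
X, y_true = make_blobs(n_samples=1500, centers=3, random_state=170)
labels_easy = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# the same data after a linear transformation: clusters become elongated
# and k-means' spherical-cluster assumption no longer holds
transformation = np.array([[0.6, -0.6], [-0.4, 0.8]])
X_aniso = X @ transformation
labels_hard = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)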