Density Estimation for a Gaussian mixture

Plot the density estimation of a mixture of two Gaussians. Data is generated from two Gaussians with different centers and covariance matrices.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300

# generate random sample, two components
np.random.seed(0)

# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# generate zero centered stretched Gaussian data
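The snippet breaks off here. A minimal sketch of a plausible continuation (the stretching matrix C, the grid bounds, and the contour levels are assumptions): generate the stretched component, fit a two-component GaussianMixture, and plot the negative log-likelihood as contours.

# continuing the truncated snippet above
C = np.array([[0.0, -0.7], [3.5, 0.7]])  # assumed stretching matrix
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)

# concatenate the two datasets into the final training set
X_train = np.vstack([shifted_gaussian, stretched_gaussian])

# fit a Gaussian mixture model with two components
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X_train)

# display the negative log-likelihood predicted by the model as contours
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX).reshape(X.shape)

CS = plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0), levels=np.logspace(0, 3, 10))
plt.colorbar(CS, shrink=0.8, extend='both')
plt.scatter(X_train[:, 0], X_train[:, 1], 0.8)
plt.title('Negative log-likelihood predicted by a GMM')
plt.axis('tight')
plt.show()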

sklearn.covariance.ledoit_wolf()

sklearn.covariance.ledoit_wolf(X, assume_centered=False, block_size=1000)

Estimates the shrunk Ledoit-Wolf covariance matrix. Read more in the User Guide.

Parameters:

X : array-like, shape (n_samples, n_features)
    Data from which to compute the covariance estimate.

assume_centered : boolean, default=False
    If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False, data are centered before computation.

block_size : int, default=1000
    Size of the blocks into which the covariance matrix will be split.
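A short usage sketch (the random data is illustrative); ledoit_wolf returns the shrunk covariance matrix together with the shrinkage coefficient it used:

import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
X = rng.randn(100, 5)  # 100 samples, 5 features

cov, shrinkage = ledoit_wolf(X)
print(cov.shape)  # (5, 5)
print(0.0 <= shrinkage <= 1.0)  # True: coefficient of the convex combination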

sklearn.preprocessing.normalize()

sklearn.preprocessing.normalize(X, norm='l2', axis=1, copy=True, return_norm=False)

Scale input vectors individually to unit norm (vector length). Read more in the User Guide.

Parameters:

X : {array-like, sparse matrix}, shape [n_samples, n_features]
    The data to normalize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy.

norm : 'l1', 'l2', or 'max', optional ('l2' by default)
    The norm to use to normalize each non-zero sample (or each non-zero feature if axis is 0).
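A quick sketch of row-wise normalization (the matrix is illustrative):

import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[4.0, 1.0, 2.0, 2.0],
              [1.0, 3.0, 9.0, 3.0],
              [5.0, 7.0, 5.0, 1.0]])

# with the defaults, each row (sample) is scaled to unit L2 norm
X_l2 = normalize(X)
print(np.linalg.norm(X_l2, axis=1))  # [1. 1. 1.]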

model_selection.GroupShuffleSplit()

class sklearn.model_selection.GroupShuffleSplit(n_splits=5, test_size=0.2, train_size=None, random_state=None)

Shuffle-Group(s)-Out cross-validation iterator.

Provides randomized train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain-specific stratifications of the samples as integers. For instance, the groups could be the year of collection of the samples, allowing for cross-validation against time-based splits.
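A minimal sketch (the group labels are illustrative); every group lands entirely on either the train or the test side of each split:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])  # e.g. collection year per sample

gss = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
for train_idx, test_idx in gss.split(X, y, groups=groups):
    # a group never appears on both sides of a split
    assert not set(groups[train_idx]) & set(groups[test_idx])
    print(sorted(set(groups[test_idx])))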

sklearn.metrics.pairwise.kernel_metrics()

sklearn.metrics.pairwise.kernel_metrics()

Valid metrics for pairwise_kernels.

This function simply returns the valid pairwise kernel metrics. It exists, however, to allow for a verbose description of the mapping for each of the valid strings. The valid kernel metrics, and the functions they map to, are:

metric            Function
'additive_chi2'   sklearn.pairwise.additive_chi2_kernel
'chi2'            sklearn.pairwise.chi2_kernel
'linear'          sklearn.pairwise.linear_kernel
'poly'            sklearn.pairwise.polynomial_kernel
'polynomial'      sklearn.pairwise.polynomial_kernel
'rbf'             sklearn.pairwise.rbf_kernel
'laplacian'       sklearn.pairwise.laplacian_kernel
'sigmoid'         sklearn.pairwise.sigmoid_kernel
'cosine'          sklearn.pairwise.cosine_similarity
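A quick sketch showing the mapping in use; the string name passed to pairwise_kernels resolves to the corresponding kernel function:

import numpy as np
from sklearn.metrics.pairwise import kernel_metrics, pairwise_kernels, rbf_kernel

print(sorted(kernel_metrics()))  # the valid metric names

X = np.random.RandomState(0).randn(4, 3)
K = pairwise_kernels(X, metric='rbf', gamma=0.5)
print(np.allclose(K, rbf_kernel(X, gamma=0.5)))  # True: 'rbf' maps to rbf_kernel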

sklearn.preprocessing.maxabs_scale()

sklearn.preprocessing.maxabs_scale(X, axis=0, copy=True)

Scale each feature to the [-1, 1] range without breaking the sparsity. This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. This scaler can also be applied to sparse CSR or CSC matrices.

Parameters:

axis : int (0 by default)
    Axis used to scale along. If 0, independently scale each feature; otherwise (if 1) scale each sample.

copy : boolean, optional (default=True)
    Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).
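A short sketch (the matrix is illustrative); each column is divided by its maximum absolute value:

import numpy as np
from sklearn.preprocessing import maxabs_scale

X = np.array([[1.0, -2.0],
              [2.0, 4.0],
              [-4.0, 1.0]])

X_scaled = maxabs_scale(X, axis=0)
print(np.abs(X_scaled).max(axis=0))  # [1. 1.]: per-feature max abs is now 1.0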

A demo of the Spectral Biclustering algorithm

This example demonstrates how to generate a checkerboard dataset and bicluster it using the Spectral Biclustering algorithm. The data is generated with the make_checkerboard function, then shuffled and passed to the Spectral Biclustering algorithm. The rows and columns of the shuffled matrix are rearranged to show the biclusters found by the algorithm. The outer product of the row and column label vectors shows a representation of the checkerboard structure.

Out:

consensus score: 1.0
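A condensed sketch of the demo (the shapes, noise level, and random seeds are assumptions): generate the checkerboard, shuffle it, fit SpectralBiclustering, and score the recovered biclusters against the ground truth.

import numpy as np
from sklearn.datasets import make_checkerboard
from sklearn.cluster import SpectralBiclustering
from sklearn.metrics import consensus_score

# checkerboard data with a known (4 x 3) bicluster structure
data, rows, columns = make_checkerboard(
    shape=(300, 300), n_clusters=(4, 3), noise=10, random_state=0)

# shuffle rows and columns to hide the structure
rng = np.random.RandomState(0)
row_idx = rng.permutation(data.shape[0])
col_idx = rng.permutation(data.shape[1])
data = data[row_idx][:, col_idx]

model = SpectralBiclustering(n_clusters=(4, 3), method='log', random_state=0)
model.fit(data)

# compare the found biclusters against the (shuffled) ground truth
score = consensus_score(model.biclusters_, (rows[:, row_idx], columns[:, col_idx]))
print('consensus score: {:.1f}'.format(score))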

sklearn.metrics.precision_score()

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)

Compute the precision.

The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0. Read more in the User Guide.

Parameters:

y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) target values.
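A tiny worked example; with two true positives and one false positive, precision = 2 / (2 + 1):

from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# tp = 2 (indices 1, 2), fp = 1 (index 3) -> 2 / 3
print(precision_score(y_true, y_pred))  # 0.666...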

linear_model.PassiveAggressiveClassifier()

class sklearn.linear_model.PassiveAggressiveClassifier(C=1.0, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, loss='hinge', n_jobs=1, random_state=None, warm_start=False, class_weight=None)

Passive Aggressive Classifier. Read more in the User Guide.

Parameters:

C : float
    Maximum step size (regularization). Defaults to 1.0.

fit_intercept : bool, default=True
    Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

n_iter : int, optional
    The number of passes over the training data (aka epochs). Defaults to 5.
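A minimal usage sketch on synthetic data (defaults are left in place, since the name of the iteration parameter varies across scikit-learn versions):

from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = PassiveAggressiveClassifier(C=1.0, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # mean accuracy on the training data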

3.5. Validation curves

Every estimator has its advantages and drawbacks. Its generalization error can be decomposed in terms of bias, variance and noise. The bias of an estimator is its average error for different training sets. The variance of an estimator indicates how sensitive it is to varying training sets. Noise is a property of the data. In the following plot, we see a function and some noisy samples from that function. We use three different estimators to fit the function: linear regression with polynomial features of degree 1, 4 and 15.
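A sketch of the degree-1/4/15 comparison the text describes (the true function and noise level are assumptions); cross-validated error shows underfitting at degree 1 and overfitting at degree 15:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30))[:, np.newaxis]
y = np.cos(1.5 * np.pi * X.ravel()) + rng.normal(0, 0.1, 30)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=5).mean()
    print(degree, mse)  # the middle degree should achieve the lowest error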