Concentration Prior Type Analysis of Variational Bayesian Gaussian Mixture

This example plots the ellipsoids obtained from a toy dataset (mixture of three Gaussians) fitted by BayesianGaussianMixture models with a Dirichlet distribution prior (weight_concentration_prior_type='dirichlet_distribution') and a Dirichlet process prior (weight_concentration_prior_type='dirichlet_process'). On each figure, we plot the results for three different values of the weight concentration prior. The BayesianGaussianMixture class can adapt its number of mixture components automatically: the weight_concentration_prior parameter has a direct link with the resulting number of components with non-zero weights.
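A minimal sketch of the idea (synthetic toy data and a small weight_concentration_prior chosen for illustration, not the example's exact settings):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # Toy mixture of three well-separated Gaussians.
    X = np.vstack([rng.randn(100, 2) + [0, 0],
                   rng.randn(100, 2) + [5, 5],
                   rng.randn(100, 2) + [0, 5]])

    for prior_type in ('dirichlet_distribution', 'dirichlet_process'):
        bgmm = BayesianGaussianMixture(
            n_components=5,  # an upper bound; surplus components get near-zero weight
            weight_concentration_prior_type=prior_type,
            weight_concentration_prior=0.01,
            random_state=0).fit(X)
        print(prior_type, np.round(bgmm.weights_, 2))

With a small concentration prior, both prior types concentrate the weight mass on roughly three components, matching the data-generating mixture.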

preprocessing.Binarizer()

class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True) [source] Binarize data (set feature values to 0 or 1) according to a threshold. Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1. Binarization is a common operation on text count data where the analyst can decide to only consider the presence or absence of a feature rather than a quantified number of occurrences, for instance.
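A short usage sketch with made-up data:

    from sklearn.preprocessing import Binarizer

    X = [[1., -1., 2.],
         [2., 0., 0.],
         [0., 1., -1.]]
    # With threshold=0.0, strictly positive values map to 1, the rest to 0.
    binarizer = Binarizer(threshold=0.0).fit(X)  # fit is a no-op; Binarizer is stateless
    print(binarizer.transform(X))
    # [[1. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 1. 0.]]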

Shrinkage covariance estimation

When working with covariance estimation, the usual approach is to use a maximum likelihood estimator, such as the sklearn.covariance.EmpiricalCovariance. It is unbiased, i.e. it converges to the true (population) covariance when given many observations. However, it can also be beneficial to regularize it, in order to reduce its variance; this, in turn, introduces some bias. This example illustrates the simple regularization used in Shrunk Covariance estimators. In particular, it focuses on how to set the amount of regularization, i.e. how to choose the bias-variance trade-off.
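A brief sketch contrasting a fixed shrinkage coefficient with the analytically chosen Ledoit-Wolf one (toy data assumed):

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, ShrunkCovariance, LedoitWolf

    rng = np.random.RandomState(0)
    # Few samples relative to dimension: empirical covariance is noisy here.
    X = rng.multivariate_normal(mean=np.zeros(5), cov=np.eye(5), size=30)

    emp = EmpiricalCovariance().fit(X)                 # maximum likelihood estimate
    shrunk = ShrunkCovariance(shrinkage=0.1).fit(X)    # fixed, user-chosen shrinkage
    lw = LedoitWolf().fit(X)                           # shrinkage set by closed formula
    print(lw.shrinkage_)                               # data-driven regularization amount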

sklearn.metrics.pairwise.additive_chi2_kernel()

sklearn.metrics.pairwise.additive_chi2_kernel(X, Y=None) [source] Computes the additive chi-squared kernel between observations in X and Y. The chi-squared kernel is computed between each pair of rows in X and Y. X and Y have to be non-negative. This kernel is most commonly applied to histograms. The chi-squared kernel is given by: k(x, y) = -Sum [(x - y)^2 / (x + y)] It can be interpreted as a weighted difference per entry. Read more in the User Guide. Parameters: X : array-like of shape (n_samples_X, n_features)
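A quick illustration on two made-up histograms:

    import numpy as np
    from sklearn.metrics.pairwise import additive_chi2_kernel

    # Each row is a non-negative "histogram".
    X = np.array([[0.2, 0.8], [0.5, 0.5]])
    Y = np.array([[0.1, 0.9]])
    K = additive_chi2_kernel(X, Y)  # shape (2, 1)
    print(K)
    # Entries are <= 0, and equal 0 only when the two rows are identical.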

decomposition.IncrementalPCA()

class sklearn.decomposition.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None) [source] Incremental principal components analysis (IPCA). Linear dimensionality reduction using Singular Value Decomposition of centered data, keeping only the most significant singular vectors to project the data to a lower dimensional space. Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA. This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory.
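A minimal sketch (random data; batch_size chosen arbitrarily) showing both the batched fit and the explicit partial_fit streaming form:

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 20)

    # fit() iterates over the data internally in chunks of batch_size.
    ipca = IncrementalPCA(n_components=5, batch_size=100).fit(X)

    # Equivalent streaming form, useful when batches arrive one at a time
    # or the full dataset does not fit in memory.
    ipca2 = IncrementalPCA(n_components=5)
    for batch in np.array_split(X, 10):
        ipca2.partial_fit(batch)
    X_reduced = ipca2.transform(X)  # shape (1000, 5)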

1.3. Kernel ridge regression

Kernel ridge regression (KRR) [M2012] combines Ridge Regression (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the space induced by the respective kernel and the data. For non-linear kernels, this corresponds to a non-linear function in the original space. The form of the model learned by KernelRidge is identical to support vector regression (SVR). However, different loss functions are used: KRR uses squared error loss while support vector regression uses epsilon-insensitive loss, both combined with l2 regularization.
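A short sketch fitting KernelRidge to noisy synthetic data (the kernel and hyperparameters are illustrative, not tuned):

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, size=(50, 1))
    y = np.sin(X).ravel() + 0.1 * rng.randn(50)  # noisy non-linear target

    # An RBF kernel makes the learned function non-linear in the original space.
    krr = KernelRidge(kernel='rbf', alpha=1.0, gamma=0.5).fit(X, y)
    print(krr.predict(X[:3]))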

sklearn.preprocessing.scale()

sklearn.preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True) [source] Standardize a dataset along any axis. Center to the mean and component-wise scale to unit variance. Read more in the User Guide. Parameters: X : {array-like, sparse matrix} The data to center and scale. axis : int (0 by default) axis used to compute the means and standard deviations along. If 0, independently standardize each feature, otherwise (if 1) standardize each sample. with_mean : boolean, True by default If True, center the data before scaling.
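A quick usage sketch:

    import numpy as np
    from sklearn.preprocessing import scale

    X = np.array([[1., -1., 2.],
                  [2., 0., 0.],
                  [0., 1., -1.]])
    X_scaled = scale(X)           # axis=0: each feature gets zero mean, unit variance
    print(X_scaled.mean(axis=0))  # ~[0. 0. 0.]
    print(X_scaled.std(axis=0))   # [1. 1. 1.]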

sklearn.model_selection.learning_curve()

sklearn.model_selection.learning_curve(estimator, X, y, groups=None, train_sizes=array([ 0.1, 0.33, 0.55, 0.78, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', verbose=0) [source] Learning curve. Determines cross-validated training and test scores for different training set sizes. A cross-validation generator splits the whole dataset k times in training and test data. Subsets of the training set with varying sizes will be used to train the estimator and a score for each training subset size and the test set will be computed. Afterwards, the scores will be averaged over all k runs for each training subset size.
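A brief sketch using the iris data (the estimator and cv choice are illustrative):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import learning_curve
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    train_sizes, train_scores, test_scores = learning_curve(
        SVC(kernel='linear'), X, y, cv=5,
        train_sizes=np.linspace(0.1, 1.0, 5))  # fractions of the training set
    print(train_sizes)                  # absolute numbers of training samples used
    print(test_scores.mean(axis=1))     # mean cross-validated test score per size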

isotonic.IsotonicRegression()

class sklearn.isotonic.IsotonicRegression(y_min=None, y_max=None, increasing=True, out_of_bounds='nan') [source] Isotonic regression model. The isotonic regression optimization problem is defined by:

    min sum w_i (y[i] - y_[i]) ** 2

    subject to y_[i] <= y_[j] whenever X[i] <= X[j]
    and min(y_) = y_min, max(y_) = y_max

where: y[i] are inputs (real numbers), y_[i] are fitted, X specifies the order (if X is non-decreasing then y_ is non-decreasing), and w[i] are optional strictly positive weights (default to 1.0).
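A small sketch on made-up data with a noisy increasing trend:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    X = np.arange(10)
    y = np.array([1, 2, 1, 3, 4, 3, 6, 5, 8, 9], dtype=float)

    iso = IsotonicRegression(increasing=True)
    # Fits the non-decreasing sequence y_ closest to y in weighted least squares.
    y_ = iso.fit_transform(X, y)
    print(y_)  # monotone: each value >= the previous one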

naive_bayes.BernoulliNB()

class sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None) [source] Naive Bayes classifier for multivariate Bernoulli models. Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features. Read more in the User Guide. Parameters: alpha : float, optional (default=1.0) Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
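A minimal sketch on random binary features (the kind of data BernoulliNB expects):

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    rng = np.random.RandomState(1)
    X = rng.randint(2, size=(6, 10))   # binary/boolean features
    y = np.array([1, 2, 3, 4, 4, 5])

    clf = BernoulliNB(alpha=1.0)       # Laplace smoothing
    clf.fit(X, y)
    print(clf.predict(X[2:3]))         # predict the class of the third sample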