Gaussian Mixture Model Selection

This example shows that model selection can be performed with Gaussian Mixture Models using information-theoretic criteria (BIC). Model selection concerns both the covariance type and the number of components in the model. In that case, AIC also provides the right result (not shown to save time), but BIC is better suited if the problem is to identify the right model. Unlike Bayesian procedures, such inferences are prior-free. Here, the model with 2 components and full covariance (which corresponds to the true generative model) is selected.
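The selection loop described above can be sketched as follows. This is a minimal illustration, not the full example from the docs: the toy two-cluster data and the grid of candidate models are assumptions made here for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative data: two well-separated Gaussian clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + np.array([5.0, 5.0])])

# Grid over covariance type and number of components; keep the lowest BIC.
best_bic, best_gmm = np.inf, None
for cov_type in ["spherical", "tied", "diag", "full"]:
    for n in range(1, 5):
        gmm = GaussianMixture(n_components=n, covariance_type=cov_type,
                              random_state=0).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_bic, best_gmm = bic, gmm

print(best_gmm.n_components, best_gmm.covariance_type)
```

On data like this, BIC should recover the true number of components (2); which covariance type wins depends on the penalty-fit trade-off for the particular sample.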

Gaussian Mixture Model Ellipsoids

Plot the confidence ellipsoids of a mixture of two Gaussians obtained with Expectation Maximisation (the GaussianMixture class) and Variational Inference (the BayesianGaussianMixture class with a Dirichlet process prior). Both models have access to five components with which to fit the data. Note that the Expectation Maximisation model will necessarily use all five components, while the Variational Inference model will effectively only use as many as are needed for a good fit. Here we can see that the Expectation Maximisation model splits some components arbitrarily, while the Dirichlet process model adapts its number of effective components automatically.
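A minimal sketch of the comparison, with the plotting omitted; the toy data and the weight cutoff of 1e-2 for counting "effective" components are assumptions made here, not part of the original example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# Illustrative data drawn from just two Gaussians.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(300, 2), rng.randn(300, 2) + np.array([6.0, 0.0])])

# EM spreads the data across all five components it is given...
em = GaussianMixture(n_components=5, random_state=1).fit(X)
# ...while the Dirichlet-process variational model shrinks unused weights toward 0.
dp = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_process",
    random_state=1).fit(X)

effective = int(np.sum(dp.weights_ > 1e-2))  # components with non-negligible weight
print(em.weights_.round(3), effective)
```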

feature_selection.VarianceThreshold()

class sklearn.feature_selection.VarianceThreshold(threshold=0.0) [source] Feature selector that removes all low-variance features. This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning. Read more in the User Guide. Parameters: threshold : float, optional Features with a training-set variance lower than this threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.
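A quick sketch of the default behaviour on a toy matrix whose middle column is constant:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy matrix: the middle column has the same value in every sample.
X = np.array([[0, 2, 0],
              [1, 2, 4],
              [0, 2, 1],
              [1, 2, 3]])

selector = VarianceThreshold()       # threshold=0.0 keeps any non-constant feature
X_reduced = selector.fit_transform(X)
print(selector.get_support())        # [ True False  True]
print(X_reduced.shape)               # (4, 2)
```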

feature_selection.SelectPercentile()

class sklearn.feature_selection.SelectPercentile(score_func=&lt;function f_classif&gt;, percentile=10) [source] Select features according to a percentile of the highest scores. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below "See also"). The default function only works with classification tasks. percentile : int, optional, default=10 Percent of features to keep.
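A minimal usage sketch on the iris dataset (chosen here for illustration), keeping the top half of the features by ANOVA F-score:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep the top 50% of features as scored by the ANOVA F-test (f_classif).
selector = SelectPercentile(score_func=f_classif, percentile=50)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (150, 2): 2 of iris's 4 features survive
```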

feature_selection.SelectKBest()

class sklearn.feature_selection.SelectKBest(score_func=&lt;function f_classif&gt;, k=10) [source] Select features according to the k highest scores. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below "See also"). The default function only works with classification tasks. k : int or "all", optional, default=10 Number of top features to select. The "all" option bypasses selection, for use in a parameter search.
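A short sketch on iris (an illustrative choice): with `k=2` and the default F-test, the two petal measurements win, since they separate the classes far better than the sepal ones.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
print(X_new.shape)            # (150, 2)
print(selector.get_support()) # petal length and petal width are selected
```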

feature_selection.SelectFwe()

class sklearn.feature_selection.SelectFwe(score_func=&lt;function f_classif&gt;, alpha=0.05) [source] Filter: Select the p-values corresponding to the family-wise error rate. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below "See also"). The default function only works with classification tasks. alpha : float, optional The highest uncorrected p-value for features to keep. Attributes: scores_ : array-like, shape=(n_features,) Scores of features.
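A brief sketch on the breast-cancer dataset (an illustrative choice); the family-wise correction makes this the most conservative of the univariate p-value filters, so it typically keeps fewer features than an uncorrected threshold would.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFwe, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Keep only features that survive the family-wise error control at alpha.
selector = SelectFwe(score_func=f_classif, alpha=0.01)
X_new = selector.fit_transform(X, y)
print(X.shape[1], "->", X_new.shape[1])
```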

feature_selection.SelectFromModel()

class sklearn.feature_selection.SelectFromModel(estimator, threshold=None, prefit=False) [source] Meta-transformer for selecting features based on importance weights. New in version 0.17. Parameters: estimator : object The base estimator from which the transformer is built. This can be both a fitted (if prefit is set to True) or a non-fitted estimator. threshold : string or float, optional, default None The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded.
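A minimal sketch using a random forest's `feature_importances_` as the importance weights; the choice of forest, dataset, and `threshold="mean"` are illustrative, not prescribed by the API.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# threshold="mean" keeps features whose importance is at or above the mean.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
selector = SelectFromModel(clf, threshold="mean").fit(X, y)
X_new = selector.transform(X)
print(selector.get_support(), X_new.shape)
```

Any estimator exposing `coef_` or `feature_importances_` after fitting can play the same role, e.g. an L1-penalised linear model.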

feature_selection.SelectFpr()

class sklearn.feature_selection.SelectFpr(score_func=&lt;function f_classif&gt;, alpha=0.05) [source] Filter: Select the p-values below alpha based on a FPR test. FPR test stands for False Positive Rate test. It controls the total amount of false detections. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below "See also"). The default function only works with classification tasks. alpha : float, optional The highest p-value for features to be kept.
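A short sketch on the breast-cancer dataset (an illustrative choice); unlike SelectFwe, the p-value threshold here is applied per-feature, with no multiple-testing correction.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Keep features whose raw (uncorrected) p-value falls below alpha.
selector = SelectFpr(score_func=f_classif, alpha=0.01)
X_new = selector.fit_transform(X, y)
print(X.shape[1], "->", X_new.shape[1])
```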

feature_selection.SelectFdr()

class sklearn.feature_selection.SelectFdr(score_func=&lt;function f_classif&gt;, alpha=0.05) [source] Filter: Select the p-values for an estimated false discovery rate. This uses the Benjamini-Hochberg procedure. alpha is an upper bound on the expected false discovery rate. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below "See also"). The default function only works with classification tasks.
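A sketch on synthetic data where the informative/noise split is known by construction; all dataset parameters here are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFdr, f_classif

# 5 informative features padded with 15 pure-noise features
# (shuffle=False puts the informative ones first).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, class_sep=2.0, shuffle=False,
                           random_state=0)

# Benjamini-Hochberg: alpha bounds the expected false discovery rate.
selector = SelectFdr(score_func=f_classif, alpha=0.05).fit(X, y)
print(selector.get_support().sum(), "features kept out of", X.shape[1])
```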

feature_selection.RFECV()

class sklearn.feature_selection.RFECV(estimator, step=1, cv=None, scoring=None, verbose=0, n_jobs=1) [source] Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. Read more in the User Guide. Parameters: estimator : object A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array. For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
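A minimal sketch pairing RFECV with a linear model whose `coef_` supplies the per-feature ranking; the synthetic dataset and the cross-validation/scoring choices are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=2, random_state=0)

# RFE prunes the lowest-|coef_| feature each step; cross-validation then
# picks the number of surviving features with the best accuracy.
rfecv = RFECV(LogisticRegression(max_iter=1000),
              step=1, cv=StratifiedKFold(5), scoring="accuracy")
rfecv.fit(X, y)
print(rfecv.n_features_, rfecv.support_)
```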