model_selection.GroupKFold()

class sklearn.model_selection.GroupKFold(n_splits=3) [source]

K-fold iterator variant with non-overlapping groups. The same group will not appear in two different folds (the number of distinct groups has to be at least equal to the number of folds). The folds are approximately balanced in the sense that the number of distinct groups is approximately the same in each fold.

Parameters:
n_splits : int, default=3
    Number of folds. Must be at least 2.

See also:
LeaveOneGroupOut : For splitting the data according to explicit domain-specific stratification of the dataset.
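A minimal usage sketch (the data and group labels below are illustrative, not from the docs):

import numpy as np
from sklearn.model_selection import GroupKFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
groups = np.array([0, 0, 2, 2])  # two distinct groups, so at most 2 folds

gkf = GroupKFold(n_splits=2)
for train_idx, test_idx in gkf.split(X, y, groups):
    # No group ever appears on both sides of a split.
    print("TRAIN:", train_idx, "TEST:", test_idx)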

model_selection.GridSearchCV()

class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True) [source]

Exhaustive search over specified parameter values for an estimator. Important members are fit and predict. GridSearchCV implements a 'fit' and a 'score' method. It also implements 'predict', 'predict_proba', 'decision_function', 'transform' and 'inverse_transform' if they are implemented in the estimator used.
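A minimal usage sketch (the estimator, grid values, and dataset are illustrative):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Fits one model per parameter combination and per CV fold, then refits
# the best combination on the full data (refit=True by default).
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)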

Model selection with Probabilistic PCA and Factor Analysis

Probabilistic PCA and Factor Analysis are probabilistic models. The consequence is that the likelihood of new data can be used for model selection and covariance estimation. Here we compare PCA and FA with cross-validation on low rank data corrupted with homoscedastic noise (noise variance is the same for each feature) or heteroscedastic noise (noise variance is different for each feature). In a second step we compare the model likelihood to the likelihoods obtained from shrinkage covariance estimators.
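A minimal sketch of the idea (synthetic low-rank data with homoscedastic noise; the ranks tried are illustrative): both estimators expose score(), the average log-likelihood of held-out data, so cross_val_score can compare them directly.

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n_samples, n_features, rank = 500, 25, 5
W = rng.randn(n_features, rank)
X = rng.randn(n_samples, rank).dot(W.T) + 0.5 * rng.randn(n_samples, n_features)

for n in (2, 5, 10):
    pca_ll = cross_val_score(PCA(n_components=n), X).mean()
    fa_ll = cross_val_score(FactorAnalysis(n_components=n), X).mean()
    print(n, pca_ll, fa_ll)  # cross-validated log-likelihoods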

Model selection

Score, and cross-validated scores

As we have seen, every estimator exposes a score method that can judge the quality of the fit (or the prediction) on new data. Bigger is better.

>>> from sklearn import datasets, svm
>>> digits = datasets.load_digits()
>>> X_digits = digits.data
>>> y_digits = digits.target
>>> svc = svm.SVC(C=1, kernel='linear')
>>> svc.fit(X_digits[:-100], y_digits[:-100]).score(X_digits[-100:], y_digits[-100:])
0.979999...
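A cross-validated version of the same check, sketched with cross_val_score (the 3-fold choice is illustrative):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
svc = svm.SVC(C=1, kernel='linear')
# One accuracy per fold; average them for a single cross-validated score.
print(cross_val_score(svc, digits.data, digits.target, cv=3).mean())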

Model Complexity Influence

Demonstrate how model complexity influences both prediction accuracy and computational performance. The dataset is the Boston Housing dataset (resp. 20 Newsgroups) for regression (resp. classification). For each class of models we make the model complexity vary through the choice of relevant model parameters and measure the influence on both computational performance (latency) and predictive power (MSE or Hamming loss).

print(__doc__)

# Author: Eustache Diemert <eustache@diemert.fr>
# License: BSD 3 clause
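A minimal sketch of the measurement loop, under assumptions: synthetic regression data and a GradientBoostingRegressor whose n_estimators serves as the complexity knob (the example itself uses different models and datasets):

import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n in (10, 50, 100):
    model = GradientBoostingRegressor(n_estimators=n).fit(X_tr, y_tr)
    t0 = time.time()
    pred = model.predict(X_te)
    # Report complexity, prediction latency, and predictive power (MSE).
    print(n, time.time() - t0, mean_squared_error(y_te, pred))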

mixture.VBGMM()

Warning: DEPRECATED

class sklearn.mixture.VBGMM(*args, **kwargs) [source]

Variational Inference for the Gaussian Mixture Model.

Deprecated since version 0.18: This class will be removed in 0.20. Use sklearn.mixture.BayesianGaussianMixture with parameter weight_concentration_prior_type='dirichlet_distribution' instead.

Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture distribution.
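A minimal migration sketch (toy data; the component count is illustrative):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])

# Documented replacement for VBGMM: a Dirichlet-distribution prior
# over the mixture weights.
model = BayesianGaussianMixture(
    n_components=3,
    weight_concentration_prior_type='dirichlet_distribution').fit(X)
print(model.weights_)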

mixture.GMM()

Warning: DEPRECATED

class sklearn.mixture.GMM(*args, **kwargs) [source]

Legacy Gaussian Mixture Model.

Deprecated since version 0.18: This class will be removed in 0.20. Use sklearn.mixture.GaussianMixture instead.

Methods:
aic(X) : Akaike information criterion for the current model fit and the proposed data.
bic(X) : Bayesian information criterion for the current model fit and the proposed data.
fit(X[, y]) : Estimate model parameters with the EM algorithm.
fit_predict(X[, y]) : Fit and then predict labels for data.
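The documented replacement, GaussianMixture, keeps the aic/bic criteria listed above; a minimal migration sketch (toy data, illustrative component counts):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])

for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, gm.aic(X), gm.bic(X))  # lower is better for both criteria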

mixture.GaussianMixture()

class sklearn.mixture.GaussianMixture(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10) [source]

Gaussian Mixture. Representation of a Gaussian mixture model probability distribution. This class allows estimation of the parameters of a Gaussian mixture distribution.

New in version 0.18.
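A minimal fit/predict sketch (synthetic two-cluster data, illustrative):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 4])

gm = GaussianMixture(n_components=2, covariance_type='full').fit(X)
print(gm.means_)          # estimated component means
print(gm.predict(X[:5]))  # most likely component per sample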

mixture.DPGMM()

Warning: DEPRECATED

class sklearn.mixture.DPGMM(*args, **kwargs) [source]

Dirichlet Process Gaussian Mixture Models.

Deprecated since version 0.18: This class will be removed in 0.20. Use sklearn.mixture.BayesianGaussianMixture with parameter weight_concentration_prior_type='dirichlet_process' instead.

Methods:
aic(X) : Akaike information criterion for the current model fit and the proposed data.
bic(X) : Bayesian information criterion for the current model fit and the proposed data.
fit(X[, y]) : Estimate model parameters with the EM algorithm.
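The documented replacement, sketched (usage otherwise mirrors the VBGMM replacement above):

from sklearn.mixture import BayesianGaussianMixture

# Documented replacement for DPGMM: a Dirichlet-process prior over
# the mixture weights.
model = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process')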

mixture.BayesianGaussianMixture()

class sklearn.mixture.BayesianGaussianMixture(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=None, mean_precision_prior=None, mean_prior=None, degrees_of_freedom_prior=None, covariance_prior=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10) [source]

Variational Bayesian estimation of a Gaussian mixture. This class allows inference of an approximate posterior distribution over the parameters of a Gaussian mixture distribution.
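A minimal sketch of the variational behaviour: with the default Dirichlet-process prior, the weights of superfluous components shrink toward zero, so n_components can be treated as an upper bound (synthetic data; the bound of 6 is illustrative):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(150, 2), rng.randn(150, 2) + 6])

bgm = BayesianGaussianMixture(n_components=6, max_iter=500,
                              random_state=0).fit(X)
print(np.round(bgm.weights_, 2))  # bulk of the mass on ~2 components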