decomposition.DictionaryLearning()

class sklearn.decomposition.DictionaryLearning(n_components=None, alpha=1, max_iter=1000, tol=1e-08, fit_algorithm='lars', transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, n_jobs=1, code_init=None, dict_init=None, verbose=False, split_sign=False, random_state=None) [source] Dictionary learning. Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code. Solves the optimization problem:

(U^*, V^*) = argmin_{U, V} 0.5 * ||Y - U V||_2^2 + alpha * ||U||_1
             subject to ||V_k||_2 = 1 for all 0 <= k < n_components
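A minimal usage sketch (the toy data and parameter values below are illustrative, not from the documentation):

import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.RandomState(0).randn(20, 8)           # toy data: 20 samples, 8 features
dico = DictionaryLearning(n_components=5, alpha=1, transform_algorithm='omp',
                          transform_n_nonzero_coefs=3, random_state=0)
code = dico.fit(X).transform(X)                      # sparse code U, shape (20, 5)
atoms = dico.components_                             # learned dictionary V, shape (5, 8)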

sklearn.metrics.adjusted_rand_score()

sklearn.metrics.adjusted_rand_score(labels_true, labels_pred) [source] Rand index adjusted for chance. The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings. The raw RI score is then "adjusted for chance" into the ARI score using the following scheme: ARI = (RI - Expected_RI) / (max(RI) - Expected_RI). The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling (independently of the number of clusters and samples) and exactly 1.0 when the clusterings are identical (up to a permutation).
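A small illustration of these properties (label values are arbitrary; only the induced partition matters):

from sklearn.metrics import adjusted_rand_score

adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 1])   # 1.0, identical clusterings
adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])   # 1.0, same partition under permuted labels
adjusted_rand_score([0, 0, 1, 1], [0, 0, 0, 0])   # 0.0, a single cluster carries no information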

Illustration of prior and posterior Gaussian process for different kernels

This example illustrates the prior and posterior of a GPR with different kernels. Mean, standard deviation, and 10 samples are shown for both prior and posterior.

print(__doc__)
# Authors: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
# License: BSD 3 clause
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
                                              ExpSineSquared, DotProduct, ConstantKernel)
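A condensed sketch of the same idea for a single RBF kernel (the training data here are illustrative, not the example's): the unfitted GaussianProcessRegressor represents the prior, and after fit it represents the posterior.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_train = np.array([[1.0], [3.0], [5.0], [6.0]])
y_train = np.sin(X_train).ravel()
X_plot = np.linspace(0, 8, 100)[:, np.newaxis]

gp = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0))
prior_samples = gp.sample_y(X_plot, n_samples=10)       # draws from the prior (before fitting)
gp.fit(X_train, y_train)
y_mean, y_std = gp.predict(X_plot, return_std=True)     # posterior mean and standard deviation
posterior_samples = gp.sample_y(X_plot, n_samples=10)   # draws from the posterior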

3.3. Model evaluation

There are 3 different approaches to evaluate the quality of predictions of a model:

- Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they are designed to solve. This is not discussed on this page, but in each estimator's documentation.
- Scoring parameter: Model-evaluation tools using cross-validation (such as model_selection.cross_val_score and model_selection.GridSearchCV) rely on an internal scoring strategy. This is discussed in the section on the scoring parameter.
- Metric functions: The metrics module implements functions assessing prediction error for specific purposes.
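The three approaches side by side, in a minimal sketch (the dataset and estimator are chosen only for illustration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

clf.score(X, y)                                 # 1) estimator score method (accuracy for classifiers)
cross_val_score(clf, X, y, scoring='accuracy')  # 2) scoring parameter used by cross-validation tools
accuracy_score(y, clf.predict(X))               # 3) metric function from sklearn.metrics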

ensemble.RandomForestClassifier()

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None) [source] A random forest classifier. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
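A minimal fit/predict sketch (synthetic data and parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
clf.fit(X, y)
clf.predict(X[:5])            # predicted class labels for the first five samples
clf.feature_importances_      # impurity-based importance of each feature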

Gaussian Mixture Model Selection

This example shows that model selection can be performed with Gaussian Mixture Models using information-theoretic criteria (BIC). Model selection concerns both the covariance type and the number of components in the model. In this case, AIC also provides the right result (not shown to save time), but BIC is better suited if the problem is to identify the right model. Unlike Bayesian procedures, such inferences are prior-free. In that case, the model with 2 components and full covariance (which corresponds to the true generative model) is selected.
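A stripped-down sketch of BIC-based selection over covariance type and number of components (the two-blob data is illustrative):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + [4, 4]])   # two well-separated blobs

best_gmm, best_bic = None, np.inf
for cov_type in ['spherical', 'tied', 'diag', 'full']:
    for n_components in range(1, 6):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type=cov_type, random_state=0).fit(X)
        bic = gmm.bic(X)                 # lower BIC indicates a better model
        if bic < best_bic:
            best_bic, best_gmm = bic, gmm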

1.16. Probability calibration

When performing classification you often want not only to predict the class label, but also to obtain a probability for that label. This probability gives you some measure of confidence in the prediction. Some models give poor estimates of the class probabilities, and some do not support probability prediction at all. The calibration module allows you to better calibrate the probabilities of a given model, or to add support for probability prediction. Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level.
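A minimal sketch of adding calibrated probabilities to a model that has no predict_proba of its own (the estimator and parameter choices are illustrative):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
svc = LinearSVC()                                    # exposes decision_function but not predict_proba
calibrated = CalibratedClassifierCV(svc, method='sigmoid', cv=3)
calibrated.fit(X, y)
calibrated.predict_proba(X[:5])                      # calibrated class probabilities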

preprocessing.OneHotEncoder()

class sklearn.preprocessing.OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error') [source] Encode categorical integer features using a one-hot aka one-of-K scheme. The input to this transformer should be a matrix of integers, denoting the values taken on by categorical (discrete) features. The output will be a sparse matrix where each column corresponds to one possible value of one feature. It is assumed that input features take on values in the range [0, n_values).
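A small sketch with two integer-coded categorical features (the data is illustrative; feature 0 takes values {0, 1, 2}, feature 1 takes values {0, 1}):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[0, 1], [1, 0], [2, 1]])
enc = OneHotEncoder()
X_onehot = enc.fit_transform(X)    # sparse matrix, one column per (feature, value) pair
X_onehot.toarray()
# array([[1., 0., 0., 0., 1.],
#        [0., 1., 0., 1., 0.],
#        [0., 0., 1., 0., 1.]])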

gaussian_process.GaussianProcessClassifier()

class sklearn.gaussian_process.GaussianProcessClassifier(kernel=None, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, max_iter_predict=100, warm_start=False, copy_X_train=True, random_state=None, multi_class='one_vs_rest', n_jobs=1) [source] Gaussian process classification (GPC) based on Laplace approximation. The implementation is based on Algorithm 3.1, 3.2, and 5.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams. Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.
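A minimal usage sketch with an RBF kernel (the synthetic data and kernel scale are illustrative):

from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=0)
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), random_state=0)
gpc.fit(X, y)
gpc.predict_proba(X[:3])   # posterior class probabilities from the Laplace approximation
gpc.kernel_                # kernel with hyperparameters optimized during fitting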

ensemble.BaggingRegressor()

class sklearn.ensemble.BaggingRegressor(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=None, verbose=0) [source] A Bagging regressor. A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
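A minimal sketch bagging decision trees on synthetic regression data (all parameter values are illustrative):

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                       max_samples=0.8, oob_score=True, random_state=0)
reg.fit(X, y)
reg.predict(X[:5])    # average of the individual trees' predictions
reg.oob_score_        # R^2 estimated on out-of-bag samples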