calibration.CalibratedClassifierCV()

class sklearn.calibration.CalibratedClassifierCV(base_estimator=None, method='sigmoid', cv=3) [source] Probability calibration with isotonic regression or sigmoid. With this class, the base_estimator is fit on the train set of the cross-validation generator and the test set is used for calibration. The probabilities for each of the folds are then averaged for prediction. If cv='prefit' is passed to __init__, it is assumed that base_estimator has already been fitted and all data is used for calibration.
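
A minimal usage sketch; the GaussianNB base estimator and the synthetic data below are illustrative assumptions, not part of the docstring:

>>> from sklearn.calibration import CalibratedClassifierCV
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=0)
>>> clf = CalibratedClassifierCV(base_estimator=GaussianNB(), method='sigmoid', cv=3)
>>> model = clf.fit(X, y)           # base estimator fit on train folds, calibrated on held-out folds
>>> proba = clf.predict_proba(X)    # calibrated probabilities, averaged across the folds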

Gaussian Mixture Model Sine Curve

This example demonstrates the behavior of Gaussian mixture models fit on data that was not sampled from a mixture of Gaussian random variables. The dataset is formed by 100 points loosely spaced along a noisy sine curve, so there is no ground-truth value for the number of Gaussian components. The first model is a classical Gaussian Mixture Model with 10 components fit with the Expectation-Maximization algorithm. The second model is a Bayesian Gaussian Mixture Model with a Dirichlet process prior.
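
A rough sketch of the setup described above; the exact noise level and sampling of the sine curve here are assumptions, not the example's actual code:

>>> import numpy as np
>>> from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>>> rng = np.random.RandomState(0)
>>> t = np.sort(rng.uniform(0, 2 * np.pi, 100))
>>> X = np.column_stack([t, np.sin(t) + 0.1 * rng.randn(100)])      # 100 points along a noisy sine curve
>>> em = GaussianMixture(n_components=10).fit(X)                    # classical EM fit
>>> dp = BayesianGaussianMixture(n_components=10,
...     weight_concentration_prior_type='dirichlet_process').fit(X) # Dirichlet process prior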

sklearn.model_selection.train_test_split()

sklearn.model_selection.train_test_split(*arrays, **options) [source] Split arrays or matrices into random train and test subsets. A quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a one-liner. Read more in the User Guide. Parameters: *arrays : sequence of indexables with same length / shape[0]. Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
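
A short sketch on assumed toy data (the shapes and test_size are illustrative choices):

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(20).reshape((10, 2)), np.arange(10)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.25, random_state=42)
>>> X_train.shape, X_test.shape        # roughly 75% / 25% of the 10 samples
((7, 2), (3, 2))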

neighbors.RadiusNeighborsClassifier()

class sklearn.neighbors.RadiusNeighborsClassifier(radius=1.0, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', outlier_label=None, metric_params=None, **kwargs) [source] Classifier implementing a vote among neighbors within a given radius. Read more in the User Guide. Parameters: radius : float, optional (default = 1.0). Range of parameter space to use by default for radius_neighbors queries. weights : str or callable. Weight function used in prediction. Possible values: 'uniform', 'distance', or a callable.
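
A minimal sketch with made-up 1-D data to illustrate the radius vote (the data and radius are assumptions):

>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> X = [[0.0], [0.5], [1.0], [3.0]]
>>> y = [0, 0, 0, 1]
>>> clf = RadiusNeighborsClassifier(radius=1.0, weights='uniform')
>>> model = clf.fit(X, y)
>>> clf.predict([[0.75]])        # votes among the three training points within radius 1.0
array([0])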

Color Quantization using K-Means

Performs a pixel-wise Vector Quantization (VQ) of an image of the Summer Palace (China), reducing the number of colors required to show the image from 96,615 unique colors to 64 while preserving the overall appearance quality. In this example, pixels are represented in a 3D space and K-means is used to find 64 color clusters. In the image-processing literature, the codebook obtained from K-means (the cluster centers) is called the color palette. Using a single byte, up to 256 colors can be addressed.
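
A condensed sketch of the same idea on a random stand-in image; the real example loads the Summer Palace photo via sklearn.datasets and subsamples pixels before fitting:

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> rng = np.random.RandomState(0)
>>> image = rng.rand(40, 40, 3)                       # stand-in for the photo, values in [0, 1]
>>> pixels = image.reshape(-1, 3)                     # one (R, G, B) row per pixel
>>> kmeans = KMeans(n_clusters=64, random_state=0).fit(pixels)
>>> labels = kmeans.predict(pixels)
>>> quantized = kmeans.cluster_centers_[labels].reshape(image.shape)   # image rebuilt from 64 palette colors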

preprocessing.Normalizer()

class sklearn.preprocessing.Normalizer(norm='l2', copy=True) [source] Normalize samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of the other samples so that its norm (l1 or l2) equals one. This transformer works with both dense numpy arrays and scipy.sparse matrices (use CSR format if you want to avoid the burden of a copy / conversion). Scaling inputs to unit norms is a common operation for text classification or clustering, for instance.
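
A small sketch showing the row-wise rescaling (the toy matrix is an assumption):

>>> import numpy as np
>>> from sklearn.preprocessing import Normalizer
>>> X = np.array([[3.0, 4.0], [1.0, 0.0]])
>>> X_unit = Normalizer(norm='l2').fit_transform(X)   # fit is a no-op; each row is scaled independently
>>> norms = np.linalg.norm(X_unit, axis=1)            # both rows now have L2 norm 1.0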

2.1. Gaussian mixture models

sklearn.mixture is a package which enables one to learn Gaussian Mixture Models (diagonal, spherical, tied and full covariance matrices supported), sample them, and estimate them from data. Facilities to help determine the appropriate number of components are also provided. (Figure: a two-component Gaussian mixture model, showing the data points and the equi-probability surfaces of the model.) A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
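
A brief sketch, assuming two synthetic Gaussian blobs, of the fit / sample round trip the package provides:

>>> import numpy as np
>>> from sklearn.mixture import GaussianMixture
>>> rng = np.random.RandomState(0)
>>> X = np.vstack([rng.normal(0, 1, (100, 2)),
...                rng.normal(5, 1, (100, 2))])        # two well-separated clusters
>>> gmm = GaussianMixture(n_components=2, covariance_type='full').fit(X)
>>> means = gmm.means_                                 # estimated component means
>>> X_new, labels = gmm.sample(10)                     # draw new points from the fitted mixture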

1.9. Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the 'naive' assumption of independence between every pair of features. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship: $P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$. Using the naive independence assumption that $P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$ for all $i$, this relationship is simplified to $P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$. Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule: $\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$, and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$.
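
A minimal sketch of the MAP rule in practice using GaussianNB; the iris data is an illustrative choice, not part of the guide text above:

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = GaussianNB().fit(X, y)        # estimates P(y) and per-feature P(x_i | y)
>>> clf.predict(X[:2])                  # argmax over classes of P(y) * prod_i P(x_i | y)
array([0, 0])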

ensemble.AdaBoostRegressor()

class sklearn.ensemble.AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None) [source] An AdaBoost regressor. An AdaBoost [1] regressor is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, but where the weights of instances are adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases.
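
A small sketch on assumed synthetic data (a noisy sine target), not taken from the docstring:

>>> import numpy as np
>>> from sklearn.ensemble import AdaBoostRegressor
>>> rng = np.random.RandomState(0)
>>> X = np.sort(rng.uniform(0, 6, (100, 1)), axis=0)
>>> y = np.sin(X).ravel() + 0.1 * rng.randn(100)
>>> reg = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, loss='linear', random_state=0)
>>> model = reg.fit(X, y)               # later copies reweight toward the hardest samples
>>> y_pred = reg.predict([[3.0]])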

decomposition.SparseCoder()

class sklearn.decomposition.SparseCoder(dictionary, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, split_sign=False, n_jobs=1) [source] Sparse coding. Finds a sparse representation of data against a fixed, precomputed dictionary. Each row of the result is the solution to a sparse coding problem; the goal is to find a sparse array code such that X ~= code * dictionary. Read more in the User Guide. Parameters: dictionary : array, [n_components, n_features]. The dictionary atoms used for sparse coding.
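
A minimal sketch with an assumed identity dictionary (4 unit-norm atoms), just to show the transform call:

>>> import numpy as np
>>> from sklearn.decomposition import SparseCoder
>>> dictionary = np.eye(4)                             # each row is one unit-norm atom
>>> coder = SparseCoder(dictionary, transform_algorithm='omp',
...                     transform_n_nonzero_coefs=2)
>>> X = np.array([[1.0, 0.0, 0.5, 0.0]])
>>> code = coder.transform(X)                          # sparse code with X ~= code * dictionary
>>> code.shape
(1, 4)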