sklearn.cross_validation.permutation_test_score()

Warning: DEPRECATED. sklearn.cross_validation.permutation_test_score(estimator, X, y, cv=None, n_permutations=100, n_jobs=1, labels=None, random_state=0, verbose=0, scoring=None) [source] Evaluate the significance of a cross-validated score with permutations. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.permutation_test_score instead. Read more in the User Guide. Parameters: estimator : estimator object implementing 'fit' The object to use to fit the data.
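
Since this function is deprecated, a minimal sketch of the recommended replacement is shown below; the iris dataset and the linear SVC are illustrative choices, not part of the original snippet:

from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Score on the true labels, plus scores on 100 label permutations;
# a small p-value means the observed score is unlikely under shuffled labels.
score, permutation_scores, pvalue = permutation_test_score(
    SVC(kernel='linear'), X, y, cv=5, n_permutations=100, random_state=0)
print(score, pvalue)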

neighbors.DistanceMetric

class sklearn.neighbors.DistanceMetric DistanceMetric class. This class provides a uniform interface to fast distance metric functions. The various metrics can be accessed via the get_metric class method and the metric string identifier (see below). For example, to use the Euclidean distance: >>> dist = DistanceMetric.get_metric('euclidean') >>> X = [[0, 1, 2], [3, 4, 5]] >>> dist.pairwise(X) array([[ 0. , 5.19615242], [ 5.19615242, 0. ]])
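
As a further sketch of the same interface, metric parameters can be passed as keyword arguments to get_metric; the metric names and data below are illustrative:

import numpy as np
from sklearn.neighbors import DistanceMetric

X = np.array([[0, 1, 2], [3, 4, 5]])

# L1 (manhattan) distances between all pairs of rows
print(DistanceMetric.get_metric('manhattan').pairwise(X))

# Parametrized metric: Minkowski with p=3
print(DistanceMetric.get_metric('minkowski', p=3).pairwise(X))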

sklearn.model_selection.train_test_split()

sklearn.model_selection.train_test_split(*arrays, **options) [source] Split arrays or matrices into random train and test subsets. Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a one-liner. Read more in the User Guide. Parameters: *arrays : sequence of indexables with same length / shape[0] Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
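
A minimal sketch of typical usage; the array shapes and test_size value are illustrative:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))   # 5 samples, 2 features
y = np.array([0, 1, 2, 3, 4])

# 40% of the samples go to the test split; random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)
print(X_train.shape, X_test.shape)  # (3, 2) (2, 2)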

neighbors.RadiusNeighborsClassifier()

class sklearn.neighbors.RadiusNeighborsClassifier(radius=1.0, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', outlier_label=None, metric_params=None, **kwargs) [source] Classifier implementing a vote among neighbors within a given radius. Read more in the User Guide. Parameters: radius : float, optional (default = 1.0) Range of parameter space to use by default for radius_neighbors queries. weights : str or callable Weight function used in prediction. Possible values: 'uniform' (all points in each neighborhood are weighted equally) and 'distance' (points are weighted by the inverse of their distance).
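
A minimal sketch on toy data (the data values are illustrative): every training point within the given radius of a query point casts a vote for its class.

from sklearn.neighbors import RadiusNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

clf = RadiusNeighborsClassifier(radius=1.0)
clf.fit(X, y)
print(clf.predict([[1.5]]))  # training points within radius 1.0 vote on the label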

Color Quantization using K-Means

Performs a pixel-wise Vector Quantization (VQ) of an image of the summer palace (China), reducing the number of colors required to show the image from 96,615 unique colors to 64, while preserving the overall appearance quality. In this example, pixels are represented in a 3D space and K-means is used to find 64 color clusters. In the image processing literature, the codebook obtained from K-means (the cluster centers) is called the color palette. Using a single byte, up to 256 colors can be addressed, whereas an RGB encoding requires 3 bytes per pixel.
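
A condensed sketch of the idea, not the full example script; the bundled china.jpg sample image and the subsample size are illustrative:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image

n_colors = 64
china = load_sample_image('china.jpg')        # (h, w, 3) uint8 image
pixels = np.reshape(china, (-1, 3)) / 255.0   # one RGB row per pixel

# Learn the 64-color palette on a random subsample of pixels for speed.
rng = np.random.RandomState(0)
sample = pixels[rng.choice(pixels.shape[0], 1000, replace=False)]
kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(sample)

# Rebuild the image by replacing each pixel with its nearest palette color.
labels = kmeans.predict(pixels)
quantized = kmeans.cluster_centers_[labels].reshape(china.shape)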

preprocessing.Normalizer()

class sklearn.preprocessing.Normalizer(norm='l2', copy=True) [source] Normalize samples individually to unit norm. Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one. This transformer is able to work both with dense numpy arrays and scipy.sparse matrices (use CSR format if you want to avoid the burden of a copy / conversion). Scaling inputs to unit norms is a common operation for text classification or clustering, for instance.
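
A minimal sketch (the data values are illustrative): each row is rescaled independently so that its L2 norm equals one.

import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[4.0, 1.0, 2.0, 2.0],
              [1.0, 3.0, 9.0, 3.0],
              [5.0, 7.0, 5.0, 1.0]])

X_unit = Normalizer(norm='l2').fit_transform(X)  # fit is a no-op for this transformer
print(np.linalg.norm(X_unit, axis=1))            # -> [ 1.  1.  1.]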

2.1. Gaussian mixture models

sklearn.mixture is a package which enables one to learn Gaussian Mixture Models (diagonal, spherical, tied and full covariance matrices supported), sample them, and estimate them from data. Facilities to help determine the appropriate number of components are also provided. (Figure: two-component Gaussian mixture model — data points, and equi-probability surfaces of the model.) A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
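
A minimal sketch of fitting and sampling a two-component model; the synthetic blobs are illustrative:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])  # two blobs

# Full covariance matrices; 'spherical', 'diag' and 'tied' are also supported.
gmm = GaussianMixture(n_components=2, covariance_type='full').fit(X)
print(gmm.means_)               # estimated component centers
X_new, labels = gmm.sample(10)  # draw new points from the fitted model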

1.9. Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Using the naive independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$.
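
A minimal sketch of this rule in practice with GaussianNB, which models each $P(x_i \mid y)$ as a Gaussian; the iris dataset is an illustrative choice:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)

# predict() returns argmax_y P(y) * prod_i P(x_i | y) for each sample.
print(clf.predict(X[:5]))
print(clf.predict_proba(X[:5]).round(3))  # posterior class probabilities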

ensemble.AdaBoostRegressor()

class sklearn.ensemble.AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None) [source] An AdaBoost regressor. An AdaBoost [1] regressor is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases.
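
A minimal sketch on synthetic 1-D data (the data generation is illustrative); the default base estimator is a shallow decision tree:

import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# Each boosting round reweights samples toward those with the largest
# current prediction error, so later regressors focus on hard cases.
reg = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, random_state=0)
reg.fit(X, y)
print(reg.predict([[2.5]]))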

decomposition.SparseCoder()

class sklearn.decomposition.SparseCoder(dictionary, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, split_sign=False, n_jobs=1) [source] Sparse coding. Finds a sparse representation of data against a fixed, precomputed dictionary. Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that: X ~= code * dictionary Read more in the User Guide. Parameters: dictionary : array, [n_components, n_features] The dictionary atoms used for sparse coding. Lines are assumed to be normalized to unit norm.
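
A minimal sketch with a random, row-normalized dictionary (the dictionary and shapes are illustrative):

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)
dictionary = rng.randn(15, 10)  # n_components x n_features
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

X = rng.randn(4, 10)            # n_samples x n_features
coder = SparseCoder(dictionary=dictionary, transform_algorithm='omp',
                    transform_n_nonzero_coefs=3)
code = coder.transform(X)  # each row solves X[i] ~= code[i] @ dictionary
print(code.shape)          # (4, 15), at most 3 non-zeros per row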