Compare cross decomposition methods

Simple usage of various cross decomposition algorithms:
- PLSCanonical
- PLSRegression, with multivariate response, a.k.a. PLS2
- PLSRegression, with univariate response, a.k.a. PLS1
- CCA

Given 2 multivariate covarying two-dimensional datasets, X and Y, PLS extracts the 'directions of covariance', i.e. the components of each dataset that explain the most shared variance between both datasets. This is apparent on the scatterplot matrix display: components 1 in dataset X and dataset Y are maximally correlated.
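
A minimal sketch of the idea, using PLSCanonical on two synthetic covarying datasets (the data generation and shapes here are illustrative, not taken from the example):

    import numpy as np
    from sklearn.cross_decomposition import PLSCanonical

    rng = np.random.RandomState(0)
    n = 500
    # two latent variables shared by X and Y
    l1, l2 = rng.normal(size=n), rng.normal(size=n)
    latents = np.array([l1, l1, l2, l2]).T
    X = latents + rng.normal(size=(n, 4))
    Y = latents + rng.normal(size=(n, 4))

    plsca = PLSCanonical(n_components=2)
    plsca.fit(X, Y)
    X_scores, Y_scores = plsca.transform(X, Y)
    # component 1 of X and component 1 of Y should be strongly correlated
    print(np.corrcoef(X_scores[:, 0], Y_scores[:, 0])[0, 1])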

sklearn.datasets.fetch_20newsgroups()

sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True) [source] Load the filenames and data from the 20 newsgroups dataset. Read more in the User Guide. Parameters: subset : 'train' or 'test', 'all', optional Select the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both, with shuffled ordering. data_home : optional, default: None Specify a download and cache folder for the datasets.
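
A minimal usage sketch (the two categories named here are illustrative; the first call downloads the data to data_home):

    from sklearn.datasets import fetch_20newsgroups

    newsgroups_train = fetch_20newsgroups(
        subset='train',
        categories=['sci.space', 'rec.autos'],
        remove=('headers', 'footers', 'quotes'),
    )
    print(len(newsgroups_train.data))     # number of documents loaded
    print(newsgroups_train.target_names)  # the selected category names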

Confusion matrix

Example of confusion matrix usage to evaluate the quality of the output of a classifier on the iris data set. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix, the better, indicating many correct predictions. The figures show the confusion matrix with and without normalization by class support size (number of elements in each class).
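
A minimal sketch of computing the confusion matrix on iris (the linear SVC classifier is an illustrative choice):

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import confusion_matrix

    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    y_pred = SVC(kernel='linear').fit(X_train, y_train).predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    # normalize each row by its class support size to get per-class rates
    print(cm.astype(float) / cm.sum(axis=1)[:, None])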

neural_network.BernoulliRBM()

class sklearn.neural_network.BernoulliRBM(n_components=256, learning_rate=0.1, batch_size=10, n_iter=10, verbose=0, random_state=None) [source] Bernoulli Restricted Boltzmann Machine (RBM). A Restricted Boltzmann Machine with binary visible units and binary hidden units. Parameters are estimated using Stochastic Maximum Likelihood (SML), also known as Persistent Contrastive Divergence (PCD) [2]. The time complexity of this implementation is O(d ** 2) assuming d ~ n_features ~ n_components.
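
A minimal sketch of fitting the RBM on binary data and reading back the hidden-unit representation (the toy data and hyperparameters are illustrative):

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
    rbm = BernoulliRBM(n_components=2, learning_rate=0.05, n_iter=20,
                       random_state=0)
    rbm.fit(X)
    # transform() returns the hidden-unit activation probabilities
    print(rbm.transform(X))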

preprocessing.PolynomialFeatures()

class sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True) [source] Generate polynomial and interaction features. Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2]. Parameters: degree : integer The degree of the polynomial features. Default = 2.
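
A minimal sketch of the degree-2 expansion of a two-feature input [a, b] into [1, a, b, a^2, ab, b^2]:

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.arange(6).reshape(3, 2)   # three samples, two features
    poly = PolynomialFeatures(degree=2)
    print(poly.fit_transform(X))
    # interaction_only=True keeps ab but drops the pure powers a^2 and b^2
    print(PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X))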

gaussian_process.kernels.WhiteKernel()

class sklearn.gaussian_process.kernels.WhiteKernel(noise_level=1.0, noise_level_bounds=(1e-05, 100000.0)) [source] White kernel. The main use-case of this kernel is as part of a sum-kernel where it explains the noise-component of the signal. Tuning its parameter corresponds to estimating the noise-level. k(x_1, x_2) = noise_level if x_1 == x_2 else 0 New in version 0.18. Parameters: noise_level : float, default: 1.0 Parameter controlling the noise level. noise_level_bounds : pair of floats >= 0, default: (1e-05, 100000.0) The lower and upper bound on noise_level.
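
A minimal sketch of the intended use as part of a sum-kernel, where WhiteKernel absorbs the i.i.d. noise and an RBF kernel models the signal (the data and noise level are illustrative):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, 20)[:, None]
    y = np.sin(X).ravel() + rng.normal(0, 0.1, X.shape[0])

    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
    # the optimized noise_level in kernel_ estimates the noise variance
    print(gpr.kernel_)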

sklearn.metrics.v_measure_score()

sklearn.metrics.v_measure_score(labels_true, labels_pred) [source] V-measure cluster labeling given a ground truth. This score is identical to normalized_mutual_info_score. The V-measure is the harmonic mean between homogeneity and completeness: v = 2 * (homogeneity * completeness) / (homogeneity + completeness) This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way. This metric is furthermore symmetric: switching label_true with label_pred will return the same score value.
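
A minimal sketch of the permutation invariance (the toy labelings are illustrative):

    from sklearn.metrics import v_measure_score

    labels_true = [0, 0, 1, 1]
    print(v_measure_score(labels_true, [0, 0, 1, 1]))  # 1.0, perfect agreement
    print(v_measure_score(labels_true, [1, 1, 0, 0]))  # 1.0, same partition, permuted labels
    print(v_measure_score(labels_true, [0, 1, 2, 3]))  # < 1.0, over-split clustering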

feature_selection.RFECV()

class sklearn.feature_selection.RFECV(estimator, step=1, cv=None, scoring=None, verbose=0, n_jobs=1) [source] Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. Read more in the User Guide. Parameters: estimator : object A supervised learning estimator with a fit method that updates a coef_ attribute that holds the fitted parameters. Important features must correspond to high absolute values in the coef_ array. For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
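
A minimal sketch of cross-validated recursive feature elimination with a linear SVC, which exposes a coef_ attribute (the synthetic data and scoring choice are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               random_state=0)
    selector = RFECV(SVC(kernel='linear'), step=1, cv=5, scoring='accuracy')
    selector.fit(X, y)
    print(selector.n_features_)  # number of features selected by cross-validation
    print(selector.support_)     # boolean mask of the retained features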

Scalability of Approximate Nearest Neighbors

This example studies the scalability profile of approximate 10-neighbors queries using the LSHForest with n_estimators=20 and n_candidates=200 when varying the number of samples in the dataset. The first plot demonstrates the relationship between query time and index size of LSHForest. Query time is compared with the brute force method in exact nearest neighbor search for the same index sizes. The brute force queries have a very predictable linear scalability with the index (full scan). LSHForest indexes have a sub-linear scalability profile but can be slower for small datasets.
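
A minimal sketch of an approximate 10-neighbors query with those settings (LSHForest has since been deprecated and removed from scikit-learn, so this only runs on older versions; the data is illustrative):

    import numpy as np
    from sklearn.neighbors import LSHForest

    rng = np.random.RandomState(42)
    X_index = rng.randn(10000, 10)   # indexed samples
    X_query = rng.randn(5, 10)       # query samples

    lshf = LSHForest(n_estimators=20, n_candidates=200, n_neighbors=10,
                     random_state=42)
    lshf.fit(X_index)
    distances, indices = lshf.kneighbors(X_query, n_neighbors=10)
    print(indices.shape)  # (5, 10): 10 approximate neighbors per query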

tree.DecisionTreeClassifier()

class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_split=1e-07, class_weight=None, presort=False) [source] A decision tree classifier. Read more in the User Guide. Parameters: criterion : string, optional (default='gini') The function to measure the quality of a split. Supported criteria are 'gini' for the Gini impurity and 'entropy' for the information gain.
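
A minimal sketch of fitting the classifier on iris and comparing the two split criteria (the depth limit is illustrative):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    for criterion in ('gini', 'entropy'):
        clf = DecisionTreeClassifier(criterion=criterion, max_depth=3,
                                     random_state=0)
        clf.fit(iris.data, iris.target)
        print(criterion, clf.score(iris.data, iris.target))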