sklearn.metrics.v_measure_score()

sklearn.metrics.v_measure_score(labels_true, labels_pred) [source] V-measure cluster labeling given a ground truth. This score is identical to normalized_mutual_info_score. The V-measure is the harmonic mean between homogeneity and completeness: v = 2 * (homogeneity * completeness) / (homogeneity + completeness) This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way. This metric is furthermore symmetric: switching labels_true with labels_pred will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.
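
A minimal usage sketch (the labels below are made up for illustration) showing the permutation invariance:

from sklearn.metrics import v_measure_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]   # same partition, different label values

print(v_measure_score(labels_true, labels_pred))           # 1.0: identical partitions
print(v_measure_score(labels_true, [0, 0, 0, 1, 1, 1]))    # below 1.0: partial agreement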

decomposition.ProjectedGradientNMF()

class sklearn.decomposition.ProjectedGradientNMF(*args, **kwargs) [source] Non-Negative Matrix Factorization (NMF). Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction. The objective function is: 0.5 * ||X - WH||_Fro^2 + alpha * l1_ratio * ||vec(W)||_1 + alpha * l1_ratio * ||vec(H)||_1 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2 + 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2, where ||A||_Fro^2 is the squared Frobenius norm (sum of squared entries) and ||vec(A)||_1 is the elementwise L1 norm (sum of absolute entries).
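
A minimal sketch of the factorization, written against sklearn.decomposition.NMF, which optimizes the same objective (ProjectedGradientNMF became a deprecated alias of it); the data and parameter values are illustrative and assume the 0.18-era alpha/l1_ratio signature:

import numpy as np
from sklearn.decomposition import NMF

X = np.random.RandomState(0).rand(6, 4)                  # non-negative input matrix
model = NMF(n_components=2, alpha=0.1, l1_ratio=0.5, random_state=0)
W = model.fit_transform(X)                               # shape (6, 2)
H = model.components_                                    # shape (2, 4)
print(np.linalg.norm(X - W.dot(H)))                      # Frobenius reconstruction error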

feature_selection.RFECV()

class sklearn.feature_selection.RFECV(estimator, step=1, cv=None, scoring=None, verbose=0, n_jobs=1) [source] Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. Read more in the User Guide. Parameters: estimator : object A supervised learning estimator with a fit method that updates a coef_ attribute holding the fitted parameters. Important features must correspond to high absolute values in the coef_ array. For instance, this is the case for most supervised learning algorithms such as Support Vector Classifiers and Generalized Linear Models from the svm and linear_model modules.
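
A minimal usage sketch on synthetic data (the dataset and estimator choices here are illustrative), using a linear SVM whose coef_ provides the feature weights RFECV needs:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)
selector = RFECV(SVC(kernel="linear"), step=1, cv=5, scoring="accuracy")
selector.fit(X, y)
print(selector.n_features_)   # cross-validated best number of features
print(selector.support_)      # boolean mask of the selected features
print(selector.ranking_)      # feature ranking (1 = selected)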

gaussian_process.kernels.CompoundKernel()

class sklearn.gaussian_process.kernels.CompoundKernel(kernels) [source] Kernel which is composed of a set of other kernels. New in version 0.18.
Methods:
clone_with_theta(theta): Returns a clone of self with given hyperparameters theta.
diag(X): Returns the diagonal of the kernel k(X, X).
get_params([deep]): Get parameters of this kernel.
is_stationary(): Returns whether the kernel is stationary.
set_params(**params): Set the parameters of this kernel.
__init__(kernels) [source]
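
A minimal sketch combining two standard kernels (the sub-kernels and data are illustrative); a CompoundKernel evaluates each sub-kernel separately and stacks the results along a trailing axis:

import numpy as np
from sklearn.gaussian_process.kernels import CompoundKernel, RBF, WhiteKernel

kernel = CompoundKernel([RBF(length_scale=1.0), WhiteKernel(noise_level=0.1)])
X = np.array([[0.0], [1.0], [2.0]])

print(kernel.is_stationary())   # True: both sub-kernels are stationary
print(kernel.diag(X).shape)     # (3, 2): one diagonal per sub-kernel
print(kernel(X).shape)          # (3, 3, 2): one Gram matrix per sub-kernel
print(kernel.theta)             # concatenated log-hyperparameters of the sub-kernels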

sklearn.neighbors.radius_neighbors_graph()

sklearn.neighbors.radius_neighbors_graph(X, radius, mode='connectivity', metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=1) [source] Computes the (weighted) graph of neighbors for points in X. Neighborhoods are restricted to points at a distance lower than radius. Read more in the User Guide. Parameters: X : array-like or BallTree, shape = [n_samples, n_features] Sample data, in the form of a numpy array or a precomputed BallTree. radius : float Radius of neighborhoods.
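
A minimal usage sketch (the sample points and radius are made up); the result is a sparse adjacency matrix:

import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.array([[0, 0], [1, 0], [4, 4]])
A = radius_neighbors_graph(X, radius=1.5, mode='connectivity', include_self=True)
print(A.toarray())   # points 0 and 1 are mutual neighbors; point 2 only reaches itself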

model_selection.TimeSeriesSplit()

class sklearn.model_selection.TimeSeriesSplit(n_splits=3) [source] Time series cross-validator. Provides train/test indices to split time series data samples that are observed at fixed time intervals, into train/test sets. In each split, the test indices must be higher than before; shuffling in this cross-validator is therefore inappropriate. This cross-validation object is a variation of KFold. In the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set. Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
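
A minimal usage sketch with six ordered samples (the sizes are chosen only for illustration); training indices always precede test indices, and the training set grows with each split:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(6, 2)
tscv = TimeSeriesSplit(n_splits=3)
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)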

cross_decomposition.PLSCanonical()

class sklearn.cross_decomposition.PLSCanonical(n_components=2, scale=True, algorithm='nipals', max_iter=500, tol=1e-06, copy=True) [source] PLSCanonical implements the 2-block canonical PLS of the original Wold algorithm [Tenenhaus 1998] p. 204, referred to as PLS-C2A in [Wegelin 2000]. This class inherits from PLS with mode='A', deflation_mode='canonical', norm_y_weights=True and algorithm='nipals', but svd should provide similar results up to numerical errors. Read more in the User Guide.
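
A minimal usage sketch on two small, related blocks X and Y (the data are illustrative):

import numpy as np
from sklearn.cross_decomposition import PLSCanonical

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
plsca = PLSCanonical(n_components=2)
plsca.fit(X, Y)
X_c, Y_c = plsca.transform(X, Y)   # project both blocks onto the shared latent space
print(X_c.shape, Y_c.shape)        # (4, 2) (4, 2)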

Adjustment for chance in clustering performance evaluation

The following plots demonstrate the impact of the number of clusters and number of samples on various clustering performance evaluation metrics. Non-adjusted measures such as the V-measure show a dependency between the number of clusters and the number of samples: the mean V-measure of random labeling increases significantly as the number of clusters approaches the total number of samples used to compute the measure. Adjusted-for-chance measures such as ARI display only small random variations centered around a mean score of 0.0 for any number of samples and clusters.
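
A minimal sketch of this effect on two independent random labelings (the sizes and seed are arbitrary, so exact values will vary):

import numpy as np
from sklearn.metrics import adjusted_rand_score, v_measure_score

rng = np.random.RandomState(0)
labels_a = rng.randint(0, 10, size=100)   # 100 samples, 10 random cluster labels
labels_b = rng.randint(0, 10, size=100)

print("ARI:", adjusted_rand_score(labels_a, labels_b))    # close to 0.0 for random labelings
print("V-measure:", v_measure_score(labels_a, labels_b))  # noticeably above 0.0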

linear_model.RidgeClassifier()

class sklearn.linear_model.RidgeClassifier(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, class_weight=None, solver='auto', random_state=None) [source] Classifier using Ridge regression. Read more in the User Guide. Parameters: alpha : float Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC.
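
A minimal usage sketch on a built-in dataset (the dataset choice and alpha value are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = RidgeClassifier(alpha=1.0)   # larger alpha means stronger regularization
clf.fit(X, y)
print(clf.score(X, y))             # mean accuracy on the training data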

Hierarchical clustering

This example builds a swiss roll dataset and runs hierarchical clustering on the points' positions. For more information, see Hierarchical clustering. In a first step, the hierarchical clustering is performed without connectivity constraints on the structure and is based solely on distance, whereas in a second step the clustering is restricted to the k-Nearest Neighbors graph: it is a hierarchical clustering with a structure prior. Some of the clusters learned without connectivity constraints do not respect the structure of the swiss roll and extend across different folds of the manifold.
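
A minimal sketch of the two steps (the sample size, number of clusters and number of neighbors are illustrative), using AgglomerativeClustering with and without a k-nearest-neighbors connectivity matrix:

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Step 1: Ward clustering based purely on distance, no structure prior.
ward = AgglomerativeClustering(n_clusters=6, linkage='ward').fit(X)

# Step 2: restrict merges to the k-nearest-neighbors graph (structure prior).
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
ward_connected = AgglomerativeClustering(n_clusters=6, linkage='ward',
                                         connectivity=connectivity).fit(X)

print(np.bincount(ward.labels_))             # cluster sizes without connectivity
print(np.bincount(ward_connected.labels_))   # cluster sizes with connectivity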