Adjustment for chance in clustering performance evaluation

The following plots demonstrate the impact of the number of clusters and number of samples on various clustering performance evaluation metrics. Non-adjusted measures such as the V-Measure show a dependency between the number of clusters and the number of samples: the mean V-Measure of random labeling increases significantly as the number of clusters is closer to the total number of samples used to compute the measure. Adjusted for chance measure such as ARI display some random variations cent

neighbors.KDTree

class sklearn.neighbors.KDTree KDTree for fast generalized N-point problems KDTree(X, leaf_size=40, metric=?minkowski?, **kwargs) Parameters: X : array-like, shape = [n_samples, n_features] n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made. leaf_size : positive integer (default = 40) Number of points at which to swi

feature_selection.RFE()

class sklearn.feature_selection.RFE(estimator, n_features_to_select=None, step=1, verbose=0) [source] Feature ranking with recursive feature elimination. Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and weights are assigned to each one o

linear_model.LogisticRegressionCV()

class sklearn.linear_model.LogisticRegressionCV(Cs=10, fit_intercept=True, cv=None, dual=False, penalty='l2', scoring=None, solver='lbfgs', tol=0.0001, max_iter=100, class_weight=None, n_jobs=1, verbose=0, refit=True, intercept_scaling=1.0, multi_class='ovr', random_state=None) [source] Logistic Regression CV (aka logit, MaxEnt) classifier. This class implements logistic regression using liblinear, newton-cg, sag of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regul

preprocessing.MaxAbsScaler()

class sklearn.preprocessing.MaxAbsScaler(copy=True) [source] Scale each feature by its maximum absolute value. This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity. This scaler can also be applied to sparse CSR or CSC matrices. New in version 0.17. Parameters: copy : boolean, optional, default is True Set to False to pe

feature_extraction.image.PatchExtractor()

class sklearn.feature_extraction.image.PatchExtractor(patch_size=None, max_patches=None, random_state=None) [source] Extracts patches from a collection of images Read more in the User Guide. Parameters: patch_size : tuple of ints (patch_height, patch_width) the dimensions of one patch max_patches : integer or float, optional default is None The maximum number of patches per image to extract. If max_patches is a float in (0, 1), it is taken to mean a proportion of the total number of pat

1.2. Linear and Quadratic Discriminant Analysis

Linear Discriminant Analysis (discriminant_analysis.LinearDiscriminantAnalysis) and Quadratic Discriminant Analysis (discriminant_analysis.QuadraticDiscriminantAnalysis) are two classic classifiers, with, as their names suggest, a linear and a quadratic decision surface, respectively. These classifiers are attractive because they have closed-form solutions that can be easily computed, are inherently multiclass, have proven to work well in practice and have no hyperparameters to tune. The plo

ensemble.ExtraTreesClassifier()

class sklearn.ensemble.ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None) [source] An extra-trees classifier. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on var

sklearn.metrics.log_loss()

sklearn.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None, labels=None) [source] Log loss, aka logistic loss or cross-entropy loss. This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier?s predictions. The log loss is only defined for two or more labels. For a single sample with true label yt in {0,1} and estimated

1.12. Multiclass and multilabel algorithms

Warning All classifiers in scikit-learn do multiclass classification out-of-the-box. You don?t need to use the sklearn.multiclass module unless you want to experiment with different multiclass strategies. The sklearn.multiclass module implements meta-estimators to solve multiclass and multilabel classification problems by decomposing such problems into binary classification problems. Multitarget regression is also supported. Multiclass classification means a classification task with more t