Test with permutations the significance of a classification score

In order to test whether a classification score is significant, one technique is to repeat the classification procedure after randomizing (permuting) the labels. The p-value is then given by the fraction of runs for which the score obtained is greater than the score obtained with the original labels.

# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD 3 clause

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC
# (the excerpt is truncated here)
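The import list above is cut off mid-line. A minimal runnable sketch of the full procedure, assuming the permutation_test_score helper from sklearn.model_selection and the iris dataset as a stand-in:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

# Load a small labeled dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Score the classifier, then re-score it on 100 label permutations
svm = SVC(kernel='linear')
cv = StratifiedKFold(n_splits=2)
score, permutation_scores, pvalue = permutation_test_score(
    svm, X, y, scoring="accuracy", cv=cv, n_permutations=100)

print("Classification score %s (p-value: %s)" % (score, pvalue))

A score far above the permutation scores, with a small p-value, indicates a real dependency between features and labels.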

sklearn.svm.libsvm.fit()

sklearn.svm.libsvm.fit()
Train the model using libsvm (low-level method).
Parameters:
X : array-like, dtype=float64, size=[n_samples, n_features]
Y : array, dtype=float64, size=[n_samples]
    Target vector.
svm_type : {0, 1, 2, 3, 4}, optional
    Type of SVM: C_SVC, NuSVC, OneClassSVM, EpsilonSVR or NuSVR respectively. 0 by default.
kernel : {'linear', 'rbf', 'poly', 'sigmoid', 'precomputed'}, optional
    Kernel to use in the model: linear, polynomial, RBF, sigmoid or precomputed. 'rbf' by default.
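Calling this low-level routine directly is rarely necessary; its return values are internal model arrays. As a sketch, the equivalent high-level path goes through the public SVC estimator, which delegates to libsvm.fit internally (the data here is made up for illustration):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
y = np.array([0., 0., 1., 1.])

# kernel='rbf' mirrors the low-level default
clf = SVC(kernel='rbf')
clf.fit(X, y)
print(clf.predict([[1.5, 1.5]]))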

sklearn.cluster.mean_shift()

sklearn.cluster.mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, max_iter=300, n_jobs=1)
Perform mean shift clustering of data using a flat kernel. Read more in the User Guide.
Parameters:
X : array-like, shape=[n_samples, n_features]
    Input data.
bandwidth : float, optional
    Kernel bandwidth. If bandwidth is not given, it is determined using a heuristic based on the median of all pairwise distances. This will take quadratic time in the number of samples.
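A minimal sketch of a direct call on two well-separated blobs (toy data, explicit bandwidth); the function returns the discovered cluster centers and a label per sample:

import numpy as np
from sklearn.cluster import mean_shift

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])

cluster_centers, labels = mean_shift(X, bandwidth=1.0)
print(cluster_centers)  # one center per discovered mode
print(labels)           # cluster index assigned to each sample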

sklearn.metrics.median_absolute_error()

sklearn.metrics.median_absolute_error(y_true, y_pred)
Median absolute error regression loss. Read more in the User Guide.
Parameters:
y_true : array-like of shape = (n_samples)
    Ground truth (correct) target values.
y_pred : array-like of shape = (n_samples)
    Estimated target values.
Returns:
loss : float
    A positive floating point value (the best value is 0.0).
Examples
>>> from sklearn.metrics import median_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> median_absolute_error(y_true, y_pred)
0.5
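The loss is simply the median of the absolute residuals, which is easy to verify by hand with numpy:

import numpy as np

residuals = np.abs(np.array([3, -0.5, 2, 7]) - np.array([2.5, 0.0, 2, 8]))
print(residuals)             # [0.5 0.5 0.  1. ]
print(np.median(residuals))  # 0.5, matching median_absolute_error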

sklearn.metrics.label_ranking_loss()

sklearn.metrics.label_ranking_loss(y_true, y_score, sample_weight=None)
Compute ranking loss measure.
Compute the average number of label pairs that are incorrectly ordered given y_score, weighted by the size of the label set and the number of labels not in the label set. This is similar to the error set size, but weighted by the number of relevant and irrelevant labels. The best performance is achieved with a ranking loss of zero. Read more in the User Guide.
New in version 0.17: A function label_ranking_loss.
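A short sketch on a two-sample multilabel problem; the loss averages, per sample, the fraction of (relevant, irrelevant) label pairs where the irrelevant label outscores the relevant one:

import numpy as np
from sklearn.metrics import label_ranking_loss

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])

# Sample 1: one of two (relevant, irrelevant) pairs is mis-ordered -> 0.5
# Sample 2: both pairs are mis-ordered -> 1.0
print(label_ranking_loss(y_true, y_score))  # (0.5 + 1.0) / 2 = 0.75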

sklearn.feature_selection.chi2()

sklearn.feature_selection.chi2(X, y)
Compute chi-squared stats between each non-negative feature and class.
This score can be used to select the n_features features with the highest values for the test chi-squared statistic from X, which must contain only non-negative features such as booleans or frequencies (e.g., term counts in document classification), relative to the classes. Recall that the chi-square test measures dependence between stochastic variables, so using this function "weeds out" the features that are the most likely to be independent of class and therefore irrelevant for classification.
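A minimal sketch on a small made-up count matrix; chi2 returns the per-feature statistics and their p-values:

import numpy as np
from sklearn.feature_selection import chi2

# Term counts for four documents (non-negative features) and their classes
X = np.array([[1, 0, 3],
              [0, 2, 1],
              [2, 0, 4],
              [0, 3, 0]])
y = np.array([0, 1, 0, 1])

chi2_stats, p_values = chi2(X, y)
print(chi2_stats)  # higher = stronger dependence on the class
print(p_values)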

linear_model.MultiTaskLasso()

class sklearn.linear_model.MultiTaskLasso(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, random_state=None, selection='cyclic')
Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21
where
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of norms of each row. Read more in the User Guide.
Parameters:
alpha : float, optional
    Constant that multiplies the L1/L2 term. Defaults to 1.0.
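A sketch on a tiny multi-output regression problem (toy data); the mixed-norm penalty zeroes whole coefficient rows of W, so the same features are selected or dropped jointly across tasks:

import numpy as np
from sklearn.linear_model import MultiTaskLasso

# Two tasks (output columns) sharing the same support
X = np.array([[0., 0.], [1., 1.], [2., 2.]])
Y = np.array([[0., 0.], [1., 1.], [2., 2.]])

clf = MultiTaskLasso(alpha=0.1)
clf.fit(X, Y)
print(clf.coef_.shape)  # (n_tasks, n_features) = (2, 2)
print(clf.coef_)        # a dropped feature is zero across all tasks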

cross_validation.LeavePOut()

Warning: DEPRECATED
class sklearn.cross_validation.LeavePOut(n, p)
Leave-P-Out cross validation iterator.
Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.LeavePOut instead.
Provides train/test indices to split data in train/test sets. This results in testing on all distinct samples of size p, while the remaining n - p samples form the training set in each iteration.
Note: LeavePOut(n, p) is NOT equivalent to KFold(n, n_folds=n // p), which creates non-overlapping test sets.
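Since the cross_validation version is deprecated, a sketch with the sklearn.model_selection replacement, which takes only p (the number of samples is inferred from the data passed to split):

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
lpo = LeavePOut(p=2)

# Every distinct pair of samples serves as a test set once: C(4, 2) = 6 splits
for train_index, test_index in lpo.split(X):
    print("train:", train_index, "test:", test_index)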

sklearn.covariance.shrunk_covariance()

sklearn.covariance.shrunk_covariance(emp_cov, shrinkage=0.1)
Calculates a covariance matrix shrunk on the diagonal. Read more in the User Guide.
Parameters:
emp_cov : array-like, shape (n_features, n_features)
    Covariance matrix to be shrunk.
shrinkage : float, 0 <= shrinkage <= 1
    Coefficient in the convex combination used for the computation of the shrunk estimate.
Returns:
shrunk_cov : array-like
    Shrunk covariance.
Notes
The regularized (shrunk) covariance is given by:
(1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)
where mu = trace(cov) / n_features.
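A short sketch verifying the convex combination by hand on a made-up 2x2 covariance matrix:

import numpy as np
from sklearn.covariance import shrunk_covariance

emp_cov = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
shrinkage = 0.1

shrunk = shrunk_covariance(emp_cov, shrinkage=shrinkage)

# Same convex combination computed directly from the Notes formula
mu = np.trace(emp_cov) / emp_cov.shape[0]
expected = (1 - shrinkage) * emp_cov + shrinkage * mu * np.identity(2)
print(np.allclose(shrunk, expected))  # True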

sklearn.metrics.make_scorer()

sklearn.metrics.make_scorer(score_func, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)
Make a scorer from a performance metric or loss function.
This factory function wraps scoring functions for use in GridSearchCV and cross_val_score. It takes a score function, such as accuracy_score, mean_squared_error, adjusted_rand_index or average_precision, and returns a callable that scores an estimator's output. Read more in the User Guide.
Parameters:
score_func : callable
    Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
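A minimal sketch wrapping fbeta_score so that the extra beta keyword is fixed up front, then passing the resulting scorer to GridSearchCV:

from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Fix beta=2; the scorer then has the (estimator, X, y) interface grid search expects
ftwo_scorer = make_scorer(fbeta_score, beta=2)

grid = GridSearchCV(LinearSVC(),
                    param_grid={'C': [1, 10]},
                    scoring=ftwo_scorer)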