sklearn.preprocessing.robust_scale()

sklearn.preprocessing.robust_scale(X, axis=0, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True) [source] Standardize a dataset along any axis. Center to the median and component-wise scale according to the interquartile range. Read more in the User Guide. Parameters: X : array-like The data to center and scale. axis : int (0 by default) The axis used to compute the medians and IQR along. If 0, independently scale each feature, otherwise (if 1) scale each sample.
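A minimal usage sketch with made-up data (the outlier value is illustrative): each column is centered on its median and scaled by its interquartile range, so a single extreme value barely affects the scaling of the other rows.

import numpy as np
from sklearn.preprocessing import robust_scale

X = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 215.0],
              [4.0, 10000.0]])   # the second feature contains a large outlier

# Column-wise (axis=0) centering on the median, scaling by the IQR
X_scaled = robust_scale(X, axis=0, with_centering=True, with_scaling=True)
print(X_scaled)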

sklearn.metrics.roc_auc_score()

sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None) [source] Compute Area Under the Curve (AUC) from prediction scores. Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format. Read more in the User Guide. Parameters: y_true : array, shape = [n_samples] or [n_samples, n_classes] True binary labels in binary label indicators. y_score : array, shape = [n_samples] or [n_samples, n_classes] Target scores (probability estimates of the positive class, confidence values, or non-thresholded decision values).
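A minimal sketch on toy binary labels and scores (the values below are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])   # probabilities or decision scores

# One positive/negative pair is ranked incorrectly, so AUC = 0.75 here
print(roc_auc_score(y_true, y_score))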

1.11. Ensemble methods

The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. Two families of ensemble methods are usually distinguished: In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any of the single base estimators because its variance is reduced.
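A minimal sketch of an averaging ensemble (a random forest) evaluated with cross-validation; the dataset and parameter values are illustrative, not taken from the text above.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Averaging method: many independently built trees, predictions averaged
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())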

Scaling the regularization parameter for SVCs

The following example illustrates the effect of scaling the regularization parameter when using Support Vector Machines for classification. For SVC classification, we are interested in a risk minimization for the equation: $C \sum_{i=1}^{n} \mathcal{L}(f(x_i), y_i) + \Omega(w)$, where $C$ is used to set the amount of regularization, $\mathcal{L}$ is a loss function of our samples and our model parameters, and $\Omega$ is a penalty function of our model parameters. If we consider the loss function to be the individual error per sample, then the data-fit term, or the sum of the error for each sample, grows as we add more samples.
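A minimal sketch of the idea (not the full example code): fit an L1-penalized LinearSVC for a few values of C and watch how many coefficients stay non-zero; the data and the grid of C values are illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

for C in [0.01, 0.1, 1.0]:
    # Larger C means less regularization, so fewer coefficients are driven to zero
    clf = LinearSVC(C=C, penalty='l1', dual=False).fit(X, y)
    print(C, np.count_nonzero(clf.coef_))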

1.7. Gaussian Processes

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: The prediction interpolates the observations (at least for regular kernels). The prediction is probabilistic (Gaussian), so that one can compute empirical confidence intervals and decide, based on those, whether one should refit (online fitting, adaptive fitting) the prediction in some region of interest. Versatile: different kernels can be specified.
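A minimal regression sketch, assuming the GaussianProcessRegressor API with an RBF kernel; the training points are made up for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[1.0], [3.0], [5.0], [6.0], [8.0]])
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

# Probabilistic prediction: posterior mean and standard deviation at new points
y_mean, y_std = gp.predict(np.array([[2.0], [4.0]]), return_std=True)
print(y_mean, y_std)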

sklearn.svm.libsvm.cross_validation()

sklearn.svm.libsvm.cross_validation() Binding of the cross-validation routine (low-level routine). Parameters: X : array-like, dtype=float, size=[n_samples, n_features] Y : array, dtype=float, size=[n_samples] Target vector. svm_type : {0, 1, 2, 3, 4} Type of SVM: C SVC, nu SVC, one class, epsilon SVR, nu SVR. kernel : {'linear', 'rbf', 'poly', 'sigmoid', 'precomputed'} Kernel to use in the model: linear, polynomial, RBF, sigmoid or precomputed. degree : int Degree of the polynomial kernel.
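This low-level binding is rarely called directly; a minimal sketch of the equivalent high-level workflow with SVC and cross_val_score (dataset and parameters are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of an RBF-kernel SVC
clf = SVC(kernel='rbf', C=1.0)
print(cross_val_score(clf, X, y, cv=5))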

PCA example with Iris Data-set

Principal Component Analysis applied to the Iris dataset. See here for more information on this dataset. print(__doc__) # Code source: Gaël Varoquaux # License: BSD 3 clause import numpy as np import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from sklearn import decomposition from sklearn import datasets np.random.seed(5) centers = [[1, 1], [-1, -1], [1, -1]] iris = datasets.load_iris() X = iris.data y = iris.target fig = plt.figure(1, figsize=(4, 3)) plt.clf()
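A minimal, non-plotting sketch of the same idea: fit PCA with three components on Iris and inspect the projection (the 3-D scatter plot from the example is omitted).

from sklearn import datasets, decomposition

iris = datasets.load_iris()
X, y = iris.data, iris.target

pca = decomposition.PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (150, 3)
print(pca.explained_variance_ratio_)   # variance captured by each component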

decomposition.MiniBatchDictionaryLearning()

class sklearn.decomposition.MiniBatchDictionaryLearning(n_components=None, alpha=1, n_iter=1000, fit_algorithm='lars', n_jobs=1, batch_size=3, shuffle=True, dict_init=None, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, verbose=False, split_sign=False, random_state=None) [source] Mini-batch dictionary learning. Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code. Solves the optimization problem: (U^*, V^*) = argmin_{U,V} 0.5 ||X - U V||_2^2 + alpha * ||U||_1, with ||V_k||_2 = 1 for all 0 <= k < n_components.
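A minimal sketch on random data; the values of n_components, alpha and n_iter below are illustrative.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(100, 20)

dico = MiniBatchDictionaryLearning(n_components=10, alpha=1, n_iter=200,
                                   batch_size=3, random_state=0)
code = dico.fit_transform(X)        # sparse codes U

print(code.shape)                   # (100, 10)
print(dico.components_.shape)       # (10, 20) -- the learned dictionary V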

cross_validation.LeaveOneLabelOut()

Warning DEPRECATED class sklearn.cross_validation.LeaveOneLabelOut(labels) [source] Leave-One-Label-Out cross-validation iterator. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.LeaveOneGroupOut instead. Provides train/test indices to split data according to a third-party provided label. This label information can be used to encode arbitrary domain-specific stratifications of the samples as integers. For instance, the labels could be the year of collection of the samples.
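A minimal sketch of the recommended replacement, LeaveOneGroupOut, with made-up data and groups:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
groups = np.array([2000, 2000, 2001, 2001])   # e.g. year of collection

# Each iteration holds out all samples belonging to one group
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    print("train:", train_idx, "test:", test_idx)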

Comparing various online solvers

An example showing how different online solvers perform on the hand-written digits dataset. Out: training SGD, training ASGD, training Perceptron, training Passive-Aggressive I, training Passive-Aggressive II, training SAG. # Author: Rob Zinkov <rob at zinkov dot com> # License: BSD 3 clause import numpy as np import matplotlib.pyplot as plt from sklearn import datasets from sklearn.model_selection import train_test_split from sklearn.linear_model import SGDClassifier, Perceptron fro
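A minimal sketch comparing two of the listed online solvers on the digits data; the split fraction and hyperparameters below are illustrative, not those of the full example.

from sklearn import datasets
from sklearn.linear_model import SGDClassifier, Perceptron
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

for name, clf in [("SGD", SGDClassifier(max_iter=50, tol=1e-3)),
                  ("Perceptron", Perceptron(max_iter=50, tol=1e-3))]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))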