linear_model.HuberRegressor()

class sklearn.linear_model.HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001, warm_start=False, fit_intercept=True, tol=1e-05) [source] Linear regression model that is robust to outliers. The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon, where w and sigma are parameters to be optimized. The parameter sigma makes sure that if y is scaled up or down by a certain factor, one does not need to rescale epsilon to achieve the same robustness.
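
A minimal sketch of fitting this estimator; the synthetic data and outlier injection below are illustrative assumptions, not part of the reference entry:

    import numpy as np
    from sklearn.linear_model import HuberRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=100)
    y[:5] += 30  # inject a few outliers

    # epsilon controls how many samples get the absolute (robust) loss;
    # smaller values make the fit more robust to outliers.
    huber = HuberRegressor(epsilon=1.35, alpha=0.0001).fit(X, y)
    print(huber.coef_, huber.intercept_)
    print(huber.outliers_.sum(), "samples treated as outliers")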

Plot the decision surface of a decision tree on the iris dataset

Plot the decision surface of a decision tree trained on pairs of features of the iris dataset. See decision tree for more information on the estimator. For each pair of iris features, the decision tree learns decision boundaries made of combinations of simple thresholding rules inferred from the training samples. The example begins:

    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    # Parameters
    n_classes = 3
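
A condensed sketch of the rest of the plot for a single feature pair (the pair choice and grid step are assumptions for brevity; the full example loops over all pairs):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X = iris.data[:, [0, 2]]  # sepal length and petal length only
    y = iris.target

    clf = DecisionTreeClassifier().fit(X, y)

    # Evaluate the classifier on a grid covering the feature space.
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[2])
    plt.show()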

Agglomerative clustering with and without structure

This example shows the effect of imposing a connectivity graph to capture local structure in the data. The graph is simply the graph of 20 nearest neighbors. Two consequences of imposing a connectivity can be seen. First, clustering with a connectivity matrix is much faster. Second, when using a connectivity matrix, average and complete linkage are unstable and tend to create a few clusters that grow very quickly. Indeed, average and complete linkage fight this percolation behavior by considering all the distances between two clusters when merging them.
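
A minimal sketch of the comparison (the toy data, cluster count, and linkage list are assumptions for illustration):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.neighbors import kneighbors_graph

    rng = np.random.RandomState(0)
    t = np.linspace(0, 10, 500)
    X = np.column_stack([t, np.sin(t) + 0.1 * rng.randn(500)])

    # Connectivity graph of the 20 nearest neighbors, as in the example.
    connectivity = kneighbors_graph(X, n_neighbors=20, include_self=False)

    for linkage in ("average", "complete"):
        unstructured = AgglomerativeClustering(n_clusters=4, linkage=linkage).fit(X)
        structured = AgglomerativeClustering(n_clusters=4, linkage=linkage,
                                             connectivity=connectivity).fit(X)
        print(linkage, "without:", np.bincount(unstructured.labels_),
              "with:", np.bincount(structured.labels_))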

OOB Errors for Random Forests

The RandomForestClassifier is trained using bootstrap aggregation, where each new tree is fit from a bootstrap sample of the training observations z_i = (x_i, y_i). The out-of-bag (OOB) error is the average error for each z_i calculated using predictions from the trees that do not contain z_i in their respective bootstrap sample. This allows the RandomForestClassifier to be fit and validated whilst being trained [1]. The example below demonstrates how the OOB error can be measured at the addition of each new tree during training.
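
A minimal sketch of reading the OOB error from fitted forests (the dataset and the n_estimators values are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=25, random_state=42)

    # oob_score=True scores each sample only with the trees whose bootstrap
    # sample did not contain it.
    for n_estimators in (15, 50, 100):
        clf = RandomForestClassifier(n_estimators=n_estimators, oob_score=True,
                                     random_state=42)
        clf.fit(X, y)
        print(n_estimators, "trees -> OOB error:", 1 - clf.oob_score_)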

Comparison of F-test and mutual information

This example illustrates the differences between univariate F-test statistics and mutual information. We consider 3 features x_1, x_2, x_3 distributed uniformly over [0, 1]; the target depends on them as follows: y = x_1 + sin(6 * pi * x_2) + 0.1 * N(0, 1), that is, the third feature is completely irrelevant. The code below plots the dependency of y against individual x_i and the normalized values of univariate F-test statistics and mutual information. As the F-test captures only linear dependency, it rates x_1 as the most discriminative feature.
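
A condensed sketch of the computation (plotting omitted; the normalization by the maximum follows the example's convention):

    import numpy as np
    from sklearn.feature_selection import f_regression, mutual_info_regression

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 3)
    y = X[:, 0] + np.sin(6 * np.pi * X[:, 1]) + 0.1 * rng.randn(1000)

    f_test, _ = f_regression(X, y)
    f_test /= np.max(f_test)          # normalize so the scores are comparable

    mi = mutual_info_regression(X, y)
    mi /= np.max(mi)

    print("F-test:            ", f_test)  # largest for x_1 (linear dependency)
    print("Mutual information:", mi)      # largest for x_2 (nonlinear dependency)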

sklearn.grid_search.fit_grid_point()

Warning: DEPRECATED

sklearn.grid_search.fit_grid_point(X, y, estimator, parameters, train, test, scorer, verbose, error_score='raise', **fit_params) [source] Run fit on one set of parameters. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.fit_grid_point instead.

Parameters:
X : array-like, sparse matrix or list. Input data.
y : array-like or None. Targets for input data.
estimator : estimator object. An object of that type is instantiated for each grid point.
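
A minimal sketch of evaluating a single grid point through the model_selection replacement named above, for scikit-learn versions that still ship this helper; the split, scorer, and parameter setting are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.metrics import check_scoring
    from sklearn.model_selection import fit_grid_point
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    estimator = SVC()
    scorer = check_scoring(estimator, scoring="accuracy")

    # One train/test split and one parameter setting (a single "grid point").
    indices = np.random.RandomState(0).permutation(len(X))
    train, test = indices[:100], indices[100:]
    score, parameters, n_test = fit_grid_point(
        X, y, estimator, {"C": 1.0}, train, test, scorer, verbose=0)
    print(score, parameters, n_test)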

Plot classification probability

Plot the classification probability for different classifiers. We use a 3 class dataset, and we classify it with a Support Vector classifier, L1 and L2 penalized logistic regression with either a One-Vs-Rest or multinomial setting, and Gaussian process classification. The logistic regression is not a multiclass classifier out of the box. As a result, it can identify only the first class.

Out:
classif_rate for GPC : 82.666667
classif_rate for L2 logistic (OvR) : 76.666667
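
A minimal sketch of how such classification rates can be computed from predict_proba for two of the classifiers (hyperparameters are illustrative assumptions, and the numbers will not match the output above):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    X = X[:, :2]  # first two features, as in the example's 2D plots

    classifiers = {
        "L2 logistic": LogisticRegression(C=1.0),
        "GPC": GaussianProcessClassifier(kernel=1.0 * RBF(1.0)),
    }

    for name, clf in classifiers.items():
        clf.fit(X, y)
        proba = clf.predict_proba(X)                # per-class probabilities
        classif_rate = np.mean(proba.argmax(axis=1) == y) * 100
        print("classif_rate for %s : %f" % (name, classif_rate))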

Plot the decision surfaces of ensembles of trees on the iris dataset

Plot the decision surfaces of forests of randomized trees trained on pairs of features of the iris dataset. This plot compares the decision surfaces learned by a decision tree classifier (first column), by a random forest classifier (second column), by an extra-trees classifier (third column) and by an AdaBoost classifier (fourth column). In the first row, the classifiers are built using the sepal width and the sepal length features only, on the second row using the petal length and sepal length only, and on the third row using the petal width and the petal length only.
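
A minimal sketch of the four estimators on a single feature pair, reporting training accuracy instead of reproducing the plots (the hyperparameters are illustrative assumptions):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                                  RandomForestClassifier)
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X = iris.data[:, [1, 0]]  # sepal width and sepal length, as in the first row
    y = iris.target

    models = [
        DecisionTreeClassifier(),
        RandomForestClassifier(n_estimators=30),
        ExtraTreesClassifier(n_estimators=30),
        AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=30),
    ]
    for model in models:
        score = model.fit(X, y).score(X, y)
        print(type(model).__name__, "training accuracy:", round(score, 3))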

preprocessing.KernelCenterer

class sklearn.preprocessing.KernelCenterer [source] Center a kernel matrix. Let K(x, z) be a kernel defined by phi(x)^T phi(z), where phi is a function mapping x to a Hilbert space. KernelCenterer centers (i.e., normalizes to have zero mean) the data without explicitly computing phi(x). It is equivalent to centering phi(x) with sklearn.preprocessing.StandardScaler(with_std=False). Read more in the User Guide.

Methods:
fit(K[, y]) : Fit KernelCenterer.
fit_transform(X[, y]) : Fit to data, then transform it.
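
A minimal sketch of the stated equivalence for a linear kernel, where phi(x) = x (the toy data is an illustrative assumption):

    import numpy as np
    from sklearn.metrics.pairwise import linear_kernel
    from sklearn.preprocessing import KernelCenterer, StandardScaler

    rng = np.random.RandomState(0)
    X = rng.rand(5, 3)

    # Center the kernel matrix directly ...
    K = linear_kernel(X)                      # K[i, j] = x_i . x_j, so phi(x) = x
    K_centered = KernelCenterer().fit_transform(K)

    # ... and compare with computing the kernel on mean-centered features.
    X_centered = StandardScaler(with_std=False).fit_transform(X)
    K_from_centered = linear_kernel(X_centered)

    print(np.allclose(K_centered, K_from_centered))  # True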

preprocessing.LabelBinarizer()

class sklearn.preprocessing.LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False) [source] Binarize labels in a one-vs-all fashion. Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme. At learning time, this simply consists of learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class); LabelBinarizer makes this process easy with the transform method.
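
A minimal sketch of the transform (the label values are chosen for illustration):

    from sklearn.preprocessing import LabelBinarizer

    lb = LabelBinarizer()
    Y = lb.fit_transform(["spam", "ham", "eggs", "spam"])
    print(lb.classes_)              # ['eggs' 'ham' 'spam']
    print(Y)                        # one indicator column per class:
    # [[0 0 1]
    #  [0 1 0]
    #  [1 0 0]
    #  [0 0 1]]
    print(lb.inverse_transform(Y))  # recovers the original labels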