sklearn.metrics.pairwise.paired_distances()

sklearn.metrics.pairwise.paired_distances(X, Y, metric='euclidean', **kwds) [source]
Computes the paired distances between X and Y, i.e. the distances between (X[0], Y[0]), (X[1], Y[1]), etc. Read more in the User Guide.
Parameters:
X : ndarray (n_samples, n_features)
    Array 1 for distance computation.
Y : ndarray (n_samples, n_features)
    Array 2 for distance computation.
metric : string or callable
    The metric to use when calculating distance between instances in a feature array.
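For illustration, a minimal doctest-style sketch; the arrays are made up:

>>> import numpy as np
>>> from sklearn.metrics.pairwise import paired_distances
>>> X = np.array([[0., 0.], [1., 1.]])
>>> Y = np.array([[0., 1.], [2., 1.]])
>>> paired_distances(X, Y)  # d(X[0], Y[0]) and d(X[1], Y[1])
array([ 1.,  1.])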

gaussian_process.kernels.Product()

class sklearn.gaussian_process.kernels.Product(k1, k2) [source]
Product-kernel k1 * k2 of two kernels k1 and k2. The resulting kernel is defined as k_prod(X, Y) = k1(X, Y) * k2(X, Y). New in version 0.18.
Parameters:
k1 : Kernel object
    The first base-kernel of the product-kernel.
k2 : Kernel object
    The second base-kernel of the product-kernel.
Methods
clone_with_theta(theta)
    Returns a clone of self with given hyperparameters theta.
diag(X)
    Returns the diagonal of the kernel k(X, X).
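As a sketch, the product kernel can be built explicitly or with the * operator; the ConstantKernel and RBF components here are only an example, not part of the class itself:

>>> import numpy as np
>>> from sklearn.gaussian_process.kernels import Product, ConstantKernel, RBF
>>> kernel = Product(ConstantKernel(2.0), RBF(length_scale=1.0))  # same as ConstantKernel(2.0) * RBF(1.0)
>>> X = np.array([[0.], [1.]])
>>> kernel(X).shape   # full kernel matrix k(X, X)
(2, 2)
>>> kernel.diag(X)    # diagonal of k(X, X); here 2.0 * 1.0 per sample
array([ 2.,  2.])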

Feature importances with forests of trees

This example shows the use of forests of trees to evaluate the importance of features on an artificial classification task. The red bars are the feature importances of the forest, along with their inter-trees variability. As expected, the plot suggests that 3 features are informative, while the remaining ones are not.
Out:
Feature ranking:
1. feature 1 (0.295902)
2. feature 2 (0.208351)
3. feature 0 (0.177632)
4. feature 3 (0.047121)
5. feature 6 (0.046303)
6. feature 8 (0.046013)
7. feature
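A condensed sketch of the example's core; the dataset parameters and forest size follow the example but should be treated as illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Artificial task with 3 informative features among 10
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, random_state=0,
                           shuffle=False)

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_
# Inter-trees variability: spread of the importances across individual trees
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

for rank, idx in enumerate(np.argsort(importances)[::-1], start=1):
    print("%d. feature %d (%f)" % (rank, idx, importances[idx]))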

Nearest Neighbors Classification

Sample usage of Nearest Neighbors classification. It will plot the decision boundaries for each class.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target
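A minimal continuation sketch of the fitting step; the decision-boundary plotting from the full example is omitted, and the score printed here is only for illustration:

from sklearn import neighbors, datasets

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Fit one classifier per weighting scheme, as the full example does
for weights in ('uniform', 'distance'):
    clf = neighbors.KNeighborsClassifier(n_neighbors=15, weights=weights)
    clf.fit(X, y)
    print(weights, clf.score(X, y))  # training accuracy, for illustration only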

4.8. Transforming the prediction target

4.8.1. Label binarization
LabelBinarizer is a utility class to help create a label indicator matrix from a list of multi-class labels:

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])

For multiple labels per instance, use MultiLabelBinarizer.
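A short sketch of the multi-label case in the same doctest style; the label tuples are made up:

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])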

Statistical learning

Datasets
Scikit-learn deals with learning information from one or more datasets that are represented as 2D arrays. They can be understood as a list of multi-dimensional observations. We say that the first axis of these arrays is the samples axis, while the second is the features axis.

A simple example shipped with the scikit: iris dataset

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> data = iris.data
>>> data.shape
(150, 4)

It is made of 150 observations of irises, each described by 4 features.
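Continuing the same session, a small sketch of how the two axes read:

>>> data[0]            # one observation: the 4 feature values of the first iris
array([ 5.1,  3.5,  1.4,  0.2])
>>> iris.target.shape  # one class label per observation
(150,)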

sklearn.pipeline.make_pipeline()

sklearn.pipeline.make_pipeline(*steps) [source]
Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.
Returns:
p : Pipeline
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB())
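A short doctest-style sketch of the automatic naming described above; the two estimators are just examples:

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.naive_bayes import GaussianNB
>>> pipe = make_pipeline(StandardScaler(), GaussianNB())
>>> [name for name, estimator in pipe.steps]
['standardscaler', 'gaussiannb']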

linear_model.RandomizedLasso()

class sklearn.linear_model.RandomizedLasso(alpha='aic', scaling=0.5, sample_fraction=0.75, n_resampling=200, selection_threshold=0.25, fit_intercept=True, verbose=False, normalize=True, precompute='auto', max_iter=500, eps=2.2204460492503131e-16, random_state=None, n_jobs=1, pre_dispatch='3*n_jobs', memory=Memory(cachedir=None)) [source]
Randomized Lasso. Randomized Lasso works by subsampling the training data and computing a Lasso estimate where the penalty of a random subset of coefficients has been scaled.
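A brief usage sketch for feature selection; the alpha value and the load_boston dataset are illustrative, and note that RandomizedLasso was deprecated and removed in later scikit-learn releases:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import RandomizedLasso

boston = load_boston()
X, y = boston.data, boston.target

# Repeatedly subsample and rescale penalties, then keep often-selected features
rlasso = RandomizedLasso(alpha=0.025, n_resampling=200,
                         selection_threshold=0.25, random_state=0)
rlasso.fit(X, y)
print(rlasso.scores_)                     # selection frequency per feature
print(np.where(rlasso.get_support())[0])  # features above selection_threshold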

dummy.DummyRegressor()

class sklearn.dummy.DummyRegressor(strategy='mean', constant=None, quantile=None) [source]
DummyRegressor is a regressor that makes predictions using simple rules. This regressor is useful as a simple baseline to compare with other (real) regressors. Do not use it for real problems. Read more in the User Guide.
Parameters:
strategy : str
    Strategy to use to generate predictions.
    'mean': always predicts the mean of the training set
    'median': always predicts the median of the training set
    'quantile': always predicts a specified quantile of the training set
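A minimal sketch with made-up numbers:

import numpy as np
from sklearn.dummy import DummyRegressor

X = np.array([[1.], [2.], [3.], [4.]])
y = np.array([2.0, 4.0, 6.0, 8.0])

dummy = DummyRegressor(strategy='mean')
dummy.fit(X, y)
print(dummy.predict([[10.]]))  # always the training mean (5.0), whatever the input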

Faces recognition example using eigenfaces and SVMs

The dataset used in this example is a preprocessed excerpt of the "Labeled Faces in the Wild", aka LFW: http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

Expected results for the top 5 most represented people in the dataset (precision, recall, f1-score, support):

                   precision  recall  f1-score  support
Ariel Sharon            0.67    0.92      0.77       13
Colin Powell            0.75    0.78      0.76       60
Donald Rumsfeld         0.78    0.67      0.72       27
George W Bush           0.86    0.86      0.86      146
Gerhard Schroeder       0.76    0.76      0.76       25
Hugo Chavez             0.67    0.67      0.67       15
Tony Blair              0.81    0.69      0.75       36
avg / total             0.80    0.80      0.80
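A condensed sketch of the pipeline behind those numbers; the hyper-parameter grid is trimmed and the exact values used by the full example may differ:

from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Keep only people with enough face images, as the full example does
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X_train, X_test, y_train, y_test = train_test_split(
    lfw.data, lfw.target, random_state=42)

# Eigenfaces: unsupervised dimensionality reduction with whitened PCA
pca = PCA(n_components=150, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# RBF SVM with a small, illustrative hyper-parameter grid
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'),
                   {'C': [1e3, 1e4], 'gamma': [1e-3, 1e-2]})
clf.fit(X_train_pca, y_train)

print(classification_report(y_test, clf.predict(X_test_pca),
                            target_names=lfw.target_names))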