Putting it all together

Pipelining

We have seen that some estimators can transform data and that some estimators can predict variables. We can also create combined estimators:

    from sklearn import linear_model, decomposition, datasets
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    logistic = linear_model.LogisticRegression()
    pca = decomposition.PCA()
    pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

    digits = datasets.load_digits()
    X_digits = digits.data
    y_digits = digits.target
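
As an illustrative continuation (a sketch, not necessarily the tutorial's exact code), the pipeline's hyperparameters can be tuned with GridSearchCV using the step-name__parameter naming convention; the grids below are only examples:

    import numpy as np

    # Search over the PCA dimensionality and the logistic-regression
    # regularization strength; both grids are illustrative.
    n_components = [20, 40, 64]
    Cs = np.logspace(-4, 4, 3)

    estimator = GridSearchCV(pipe,
                             dict(pca__n_components=n_components,
                                  logistic__C=Cs))
    estimator.fit(X_digits, y_digits)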

sklearn.linear_model.orthogonal_mp()

sklearn.linear_model.orthogonal_mp(X, y, n_nonzero_coefs=None, tol=None, precompute=False, copy_X=True, return_path=False, return_n_iter=False) [source]

Orthogonal Matching Pursuit (OMP). Solves n_targets Orthogonal Matching Pursuit problems. An instance of the problem has the form:

When parametrized by the number of non-zero coefficients using n_nonzero_coefs:

    argmin ||y - X gamma||^2   subject to   ||gamma||_0 <= n_nonzero_coefs

When parametrized by error using the parameter tol:

    argmin ||gamma||_0   subject to   ||y - X gamma||^2 <= tol
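
As a brief illustration (not taken from the reference page itself), the function can be called directly on a design matrix and a target; the array sizes and the choice of n_nonzero_coefs below are only for the sketch:

    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    rng = np.random.RandomState(0)
    X = rng.randn(50, 100)                   # design matrix (n_samples, n_features)
    w = np.zeros(100)
    w[[3, 17, 42]] = [1.5, -2.0, 0.7]        # a sparse ground-truth coefficient vector
    y = X @ w                                # targets

    # Recover at most 3 non-zero coefficients.
    coef = orthogonal_mp(X, y, n_nonzero_coefs=3)
    print(np.flatnonzero(coef))              # indices of the recovered non-zero entries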

Sparse coding with a precomputed dictionary

Transform a signal as a sparse combination of Ricker wavelets. This example visually compares different sparse coding methods using the sklearn.decomposition.SparseCoder estimator. The Ricker wavelet (also known as the Mexican hat, or the second derivative of a Gaussian) is not a particularly good kernel to represent piecewise constant signals like this one. It can therefore be seen how much adding atoms of different widths matters, which motivates learning the dictionary to best fit your type of signals.
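
A minimal sketch of using SparseCoder with a hand-built dictionary of Ricker atoms; the wavelet construction and parameter values below are indicative only, not the code of the example itself:

    import numpy as np
    from sklearn.decomposition import SparseCoder

    def ricker(resolution, center, width):
        # Discrete Ricker (Mexican hat) wavelet centred at `center`.
        x = np.linspace(0, resolution - 1, resolution)
        a = width
        return ((2 / (np.sqrt(3 * a) * np.pi ** 0.25))
                * (1 - ((x - center) / a) ** 2)
                * np.exp(-((x - center) ** 2) / (2 * a ** 2)))

    resolution = 1024
    width = 100
    # Dictionary of shifted Ricker atoms, one row per atom, normalised.
    D = np.array([ricker(resolution, c, width)
                  for c in np.linspace(0, resolution, 30)])
    D /= np.sqrt(np.sum(D ** 2, axis=1))[:, np.newaxis]

    # A piecewise-constant test signal, encoded as a sparse combination of atoms.
    signal = np.sign(np.sin(np.linspace(0, 10, resolution)))
    coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                        transform_n_nonzero_coefs=10)
    code = coder.transform(signal.reshape(1, -1))
    reconstruction = code @ D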

1.10. Decision Trees

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model. Some advantages of decision trees are that they are simple to understand and interpret and that they require little data preparation.
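
A hedged sketch of the sine-curve idea described above, using DecisionTreeRegressor; the noise model and the depths chosen are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(1)
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel()
    y[::5] += 3 * (0.5 - rng.rand(16))        # add noise to every 5th sample

    # A shallow and a deeper tree: deeper trees learn more complex rules.
    shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
    deep = DecisionTreeRegressor(max_depth=5).fit(X, y)

    X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
    y_shallow = shallow.predict(X_test)
    y_deep = deep.predict(X_test)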

Supervised learning

The problem solved in supervised learning. Supervised learning consists in learning the link between two datasets: the observed data X and an external variable y that we are trying to predict, usually called "target" or "labels". Most often, y is a 1D array of length n_samples. All supervised estimators in scikit-learn implement a fit(X, y) method to fit the model and a predict(X) method that, given unlabeled observations X, returns the predicted labels y. Vocabulary: classification and regression.
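
A minimal sketch of the fit/predict contract on the Iris data; the classifier choice (nearest neighbours) and the train/test split are purely illustrative:

    from sklearn import datasets
    from sklearn.neighbors import KNeighborsClassifier

    iris = datasets.load_iris()
    X, y = iris.data, iris.target            # observations and target labels

    knn = KNeighborsClassifier()
    knn.fit(X[:-10], y[:-10])                # fit(X, y) trains the estimator
    print(knn.predict(X[-10:]))              # predict(X) returns predicted labels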

Comparison of LDA and PCA 2D projection of Iris dataset

The Iris dataset represents 3 kinds of Iris flowers (Setosa, Versicolour and Virginica) with 4 attributes: sepal length, sepal width, petal length and petal width. Principal Component Analysis (PCA) applied to this data identifies the combination of attributes (principal components, or directions in the feature space) that account for the most variance in the data. Here we plot the different samples on the first 2 principal components. Linear Discriminant Analysis (LDA) tries to identify attributes that account for the most variance between classes.
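
A condensed sketch of the two projections, assuming only that both estimators reduce the Iris data to two components (the plotting code is omitted):

    from sklearn import datasets
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Unsupervised: directions of maximal variance in the data.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Supervised: directions of maximal variance between the classes.
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y).transform(X)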

decomposition.PCA()

class sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None) [source]

Principal component analysis (PCA). Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.
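
A short usage sketch of the class; the toy data and the number of components are chosen only for illustration:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    # Anisotropic toy data: variance concentrated in the first few directions.
    X = rng.randn(200, 5) @ np.diag([5.0, 2.0, 1.0, 0.1, 0.1])

    pca = PCA(n_components=2, svd_solver='auto')
    X_reduced = pca.fit_transform(X)          # project onto the top 2 components
    print(pca.explained_variance_ratio_)      # fraction of variance kept per component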

One-class SVM with non-linear kernel

An example using a one-class SVM for novelty detection. One-class SVM is an unsupervised algorithm that learns a decision function for novelty detection: classifying new data as similar or different to the training set.

    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.font_manager
    from sklearn import svm

    xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))

    # Generate train data
    X = 0.3 * np.random.randn(100, 2)
    X_train = np.r_[X + 2, X - 2]
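
Continuing the snippet above, a hedged sketch of how the novelty detector itself might be fit and queried (the nu and gamma values are illustrative):

    # Generate some regular novel observations and some abnormal ones.
    X = 0.3 * np.random.randn(20, 2)
    X_test = np.r_[X + 2, X - 2]
    X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))

    # Fit the one-class SVM and predict +1 (inlier) / -1 (outlier).
    clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
    clf.fit(X_train)
    y_pred_train = clf.predict(X_train)
    y_pred_test = clf.predict(X_test)
    y_pred_outliers = clf.predict(X_outliers)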

K-means Clustering

The plots first display what the K-means algorithm would yield using three clusters. They then show the effect of a bad initialization on the classification process: by setting n_init to only 1 (the default is 10), the number of times the algorithm is run with different centroid seeds is reduced. The next plot displays what using eight clusters would deliver, and finally the ground truth.

    print(__doc__)

    # Code source: Gaël Varoquaux
    # Modified for documentation by Jaques Grobler
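
A minimal sketch of the settings mentioned above on the Iris data (the figure code is omitted; the random seeds are illustrative):

    from sklearn import datasets
    from sklearn.cluster import KMeans

    X = datasets.load_iris().data

    # Three clusters with the default number of centroid seeds.
    good_init = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # A single, possibly unlucky, random initialization.
    bad_init = KMeans(n_clusters=3, n_init=1, init='random', random_state=3).fit(X)

    # Eight clusters, as in the later plot.
    eight = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

    print(good_init.inertia_, bad_init.inertia_, eight.inertia_)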

Multilabel classification

This example simulates a multi-label document classification problem. The dataset is generated randomly based on the following process:

- pick the number of labels: n ~ Poisson(n_labels)
- n times, choose a class c: c ~ Multinomial(theta)
- pick the document length: k ~ Poisson(length)
- k times, choose a word: w ~ Multinomial(theta_c)

In the above process, rejection sampling is used to make sure that n is more than 2, and that the document length is never zero. Likewise, we reject classes which have already been chosen.
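
A hedged sketch of generating comparable data with scikit-learn's built-in helper rather than hand-rolling the sampling process, then fitting one binary classifier per label; all parameter values are illustrative:

    from sklearn.datasets import make_multilabel_classification
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    # n_labels controls the expected (Poisson) number of labels per document.
    X, Y = make_multilabel_classification(n_samples=100, n_features=20,
                                          n_classes=5, n_labels=2,
                                          allow_unlabeled=False,
                                          random_state=1)

    # One linear SVM per label (one-vs-rest) for the multilabel problem.
    clf = OneVsRestClassifier(SVC(kernel='linear'))
    clf.fit(X, Y)
    print(clf.predict(X[:3]))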