Statistical learning

Datasets

Scikit-learn deals with learning information from one or more datasets that are represented as 2D arrays. They can be understood as lists of multi-dimensional observations. We say that the first axis of these arrays is the samples axis, while the second is the features axis.

A simple example shipped with the scikit: the iris dataset

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> data = iris.data
>>> data.shape
(150, 4)

It is made of 150 observations of irises, each described by 4 features: their sepal and petal length and width.
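When the data is not initially in the (n_samples, n_features) shape, it needs to be reshaped before scikit-learn can use it. A small sketch using the digits dataset, whose 8x8 images are flattened into 64-element feature vectors:

>>> from sklearn import datasets
>>> digits = datasets.load_digits()
>>> digits.images.shape
(1797, 8, 8)
>>> data = digits.images.reshape((digits.images.shape[0], -1))
>>> data.shape
(1797, 64)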

sklearn.pipeline.make_pipeline()

sklearn.pipeline.make_pipeline(*steps) [source]

Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set automatically to the lowercase of their types.

Returns:
p : Pipeline

Examples

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB())  # doctest: +ELLIPSIS
Pipeline(steps=[('standardscaler', StandardScaler(...)), ('gaussiannb', GaussianNB(...))])
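The automatic naming can be inspected through the pipeline's named_steps mapping; a minimal sketch, where the step names follow from the lowercased class names:

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> p = make_pipeline(StandardScaler(), GaussianNB())
>>> sorted(p.named_steps.keys())
['gaussiannb', 'standardscaler']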

linear_model.RandomizedLasso()

class sklearn.linear_model.RandomizedLasso(alpha='aic', scaling=0.5, sample_fraction=0.75, n_resampling=200, selection_threshold=0.25, fit_intercept=True, verbose=False, normalize=True, precompute='auto', max_iter=500, eps=2.2204460492503131e-16, random_state=None, n_jobs=1, pre_dispatch='3*n_jobs', memory=Memory(cachedir=None)) [source]

Randomized Lasso.

Randomized Lasso works by subsampling the training data and computing a Lasso estimate where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection: features selected more often are considered good features.
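A minimal feature-selection sketch, assuming the 0.18-era API in which fitting exposes per-feature stability scores through a scores_ attribute (RandomizedLasso has since been removed from scikit-learn, so this only runs on old versions):

>>> import numpy as np
>>> from sklearn.linear_model import RandomizedLasso
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(60, 5)
>>> y = 3 * X[:, 0] + 0.1 * rng.randn(60)  # only feature 0 is informative
>>> rlasso = RandomizedLasso(alpha=0.025, random_state=0).fit(X, y)  # doctest: +SKIP
>>> rlasso.scores_  # doctest: +SKIP
# selection frequency per feature; feature 0 should score highest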

dummy.DummyRegressor()

class sklearn.dummy.DummyRegressor(strategy='mean', constant=None, quantile=None) [source]

DummyRegressor is a regressor that makes predictions using simple rules. This regressor is useful as a simple baseline to compare with other (real) regressors. Do not use it for real problems. Read more in the User Guide.

Parameters:
strategy : str
Strategy to use to generate predictions.
"mean": always predicts the mean of the training set
"median": always predicts the median of the training set
"quantile": always predicts a specified quantile of the training set, provided with the quantile parameter
"constant": always predicts a constant value that is provided by the user
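A quick baseline sketch with the default "mean" strategy:

>>> import numpy as np
>>> from sklearn.dummy import DummyRegressor
>>> X = np.array([[1.0], [2.0], [3.0], [4.0]])
>>> y = np.array([2.0, 3.0, 5.0, 10.0])
>>> dummy = DummyRegressor(strategy='mean').fit(X, y)
>>> dummy.predict(X)  # always the training mean, (2 + 3 + 5 + 10) / 4 = 5.0
array([ 5.,  5.,  5.,  5.])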

sklearn.datasets.load_lfw_people()

Warning: DEPRECATED

sklearn.datasets.load_lfw_people(*args, **kwargs) [source]

DEPRECATED: Function 'load_lfw_people' has been deprecated in 0.17 and will be removed in 0.19. Use fetch_lfw_people(download_if_missing=False) instead.

Alias for fetch_lfw_people(download_if_missing=False).

Deprecated since version 0.17: This function will be removed in 0.19. Use sklearn.datasets.fetch_lfw_people with parameter download_if_missing=False instead. Check fetch_lfw_people.__doc__ for the documentation and parameter list.
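A minimal sketch of the recommended replacement call; with download_if_missing=False the LFW files must already be present in scikit-learn's data home:

>>> from sklearn.datasets import fetch_lfw_people
>>> lfw = fetch_lfw_people(download_if_missing=False)  # doctest: +SKIP
>>> lfw.images.shape  # doctest: +SKIP
# (n_samples, height, width)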

sklearn.covariance.ledoit_wolf()

sklearn.covariance.ledoit_wolf(X, assume_centered=False, block_size=1000) [source]

Estimates the shrunk Ledoit-Wolf covariance matrix. Read more in the User Guide.

Parameters:
X : array-like, shape (n_samples, n_features)
Data from which to compute the covariance estimate.
assume_centered : boolean, default=False
If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False, data are centered before computation.
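The function returns the shrunk covariance matrix together with the shrinkage coefficient that was applied; a small sketch:

>>> import numpy as np
>>> from sklearn.covariance import ledoit_wolf
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(40, 5)
>>> cov, shrinkage = ledoit_wolf(X)
>>> cov.shape
(5, 5)
>>> 0.0 <= shrinkage <= 1.0
True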

sklearn.datasets.load_mlcomp()

sklearn.datasets.load_mlcomp(name_or_id, set_='raw', mlcomp_root=None, **kwargs) [source]

Load a dataset as downloaded from http://mlcomp.org

Parameters:
name_or_id : the integer id or the string name metadata of the MLComp dataset to load
set_ : select the portion to load: 'train', 'test' or 'raw'
mlcomp_root : the filesystem path to the root folder where MLComp datasets are stored; if mlcomp_root is None, the MLCOMP_DATASETS_HOME environment variable is looked up instead.
**kwargs : domain-specific kwargs to be passed to the dataset loader.
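A minimal sketch, assuming the MLComp '20news-18828' dataset (the name used in scikit-learn's own MLComp example) has already been downloaded under MLCOMP_DATASETS_HOME:

>>> from sklearn.datasets import load_mlcomp
>>> news_train = load_mlcomp('20news-18828', 'train')  # doctest: +SKIP
# returns a Bunch holding the raw documents and their targets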

Faces recognition example using eigenfaces and SVMs

The dataset used in this example is a preprocessed excerpt of the "Labeled Faces in the Wild", aka LFW: http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233 MB)

Expected results for the most represented people in the dataset (precision, recall, F1-score, support):

                   precision  recall  f1-score  support
Ariel Sharon            0.67    0.92      0.77       13
Colin Powell            0.75    0.78      0.76       60
Donald Rumsfeld         0.78    0.67      0.72       27
George W Bush           0.86    0.86      0.86      146
Gerhard Schroeder       0.76    0.76      0.76       25
Hugo Chavez             0.67    0.67      0.67       15
Tony Blair              0.81    0.69      0.75       36
avg / total             0.80    0.80      0.80      322
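A condensed sketch of the pipeline this example builds, an "eigenfaces" PCA followed by an RBF SVM; the parameter values below are illustrative rather than the example's grid-searched ones, and a 0.18+ scikit-learn is assumed for sklearn.model_selection:

from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Keep only people with at least 70 images, as in the example
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X_train, X_test, y_train, y_test = train_test_split(
    lfw.data, lfw.target, test_size=0.25, random_state=42)

# Project the images onto the top "eigenfaces" before classifying
pca = PCA(n_components=150, whiten=True).fit(X_train)
clf = SVC(kernel='rbf', class_weight='balanced', C=1000, gamma=0.005)
clf.fit(pca.transform(X_train), y_train)

print(classification_report(y_test, clf.predict(pca.transform(X_test)),
                            target_names=lfw.target_names))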

Swiss Roll reduction with LLE

An illustration of Swiss Roll reduction with locally linear embedding.

Out:
Computing LLE embedding
Done. Reconstruction error: 9.45487e-08

# Author: Fabian Pedregosa -- <fabian.pedregosa@inria.fr>
# License: BSD 3 clause (C) INRIA 2011

print(__doc__)

import matplotlib.pyplot as plt

# This import is needed to modify the way figure behaves
from mpl_toolkits.mplot3d import Axes3D
Axes3D

#----------------------------------------------------------------------
# Locally linear embedding of the swiss roll

from sklearn import manifold, datasets
X, color = datasets.samples_generator.make_swiss_roll(n_samples=1500)

print("Computing LLE embedding")
X_r, err = manifold.locally_linear_embedding(X, n_neighbors=12,
                                             n_components=2)
print("Done. Reconstruction error: %g" % err)

sklearn.datasets.make_friedman2()

sklearn.datasets.make_friedman2(n_samples=100, noise=0.0, random_state=None) [source]

Generate the "Friedman #2" regression problem.

This dataset is described in Friedman [1] and Breiman [2]. Inputs X are 4 independent features uniformly distributed on the intervals:

0 <= X[:, 0] <= 100,
40 * pi <= X[:, 1] <= 560 * pi,
0 <= X[:, 2] <= 1,
1 <= X[:, 3] <= 11.

The output y is created according to the formula:

y(X) = (X[:, 0] ** 2 + (X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) ** 2) ** 0.5 + noise * N(0, 1).
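A quick shape check of the generated data:

>>> from sklearn.datasets import make_friedman2
>>> X, y = make_friedman2(n_samples=100, random_state=0)
>>> X.shape, y.shape
((100, 4), (100,))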