scikit-learn Tutorials

An introduction to machine learning with scikit-learn
    Machine learning: the problem setting
    Loading an example dataset
    Learning and predicting
    Model persistence
    Conventions

A tutorial on statistical-learning for scientific data processing
    Statistical learning: the setting and the estimator object in scikit-learn
    Supervised learning: predicting an output variable from high-dimensional observations
    Model selection: choosing estimators and their parameters
    Unsupervised learning: seeking representations of the data

sklearn.datasets.fetch_species_distributions()

sklearn.datasets.fetch_species_distributions(data_home=None, download_if_missing=True) [source]

Loader for the species distribution dataset from Phillips et al. (2006). Read more in the User Guide.

Parameters:
    data_home : optional, default: None
        Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
    download_if_missing : optional, True by default
        If False, raise an IOError if the data is not locally available instead of trying to download it from the source site.
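
A minimal usage sketch. The coverages and train attributes referenced below are not shown in the truncated docstring above, so treat them as assumptions about the returned Bunch:

    from sklearn.datasets import fetch_species_distributions

    # Downloads on first call (cached under ~/scikit_learn_data by default)
    data = fetch_species_distributions()

    # Assumed Bunch fields: environmental raster grids and observation records
    print(data.coverages.shape)       # one grid per environmental feature
    print(data.train["species"][:3])  # species names of the first training records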

Face completion with multi-output estimators

This example shows the use of multi-output estimators to complete images. The goal is to predict the lower half of a face given its upper half. The first column of images shows true faces. The next columns illustrate how extremely randomized trees, k-nearest neighbors, linear regression and ridge regression complete the lower half of those faces.

    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.utils.validation import check_random_state
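
The core of the setup can be sketched as follows: a simplified, self-contained variant using a single extra-trees estimator rather than the four models compared in the full example:

    import numpy as np
    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.ensemble import ExtraTreesRegressor

    faces = fetch_olivetti_faces().data        # shape (400, 4096): 64x64 images
    n_pixels = faces.shape[1]
    X = faces[:, : n_pixels // 2]              # upper half of each face
    y = faces[:, n_pixels // 2 :]              # lower half = multi-output target

    est = ExtraTreesRegressor(n_estimators=10, max_features=32, random_state=0)
    est.fit(X[:300], y[:300])                  # fit on the first 300 faces
    lower_halves = est.predict(X[300:])        # predicts 2048 pixels per test face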

sklearn.metrics.pairwise.pairwise_distances()

sklearn.metrics.pairwise.pairwise_distances(X, Y=None, metric='euclidean', n_jobs=1, **kwds) [source]

Compute the distance matrix from a vector array X and optional Y. This method takes either a vector array or a distance matrix, and returns a distance matrix. If the input is a vector array, the distances are computed. If the input is a distance matrix, it is returned instead. This method provides a safe way to take a distance matrix as input, while preserving compatibility with many other algorithms that take a vector array.
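
A short illustrative sketch of both call patterns described above (a single vector array, and X together with Y):

    import numpy as np
    from sklearn.metrics.pairwise import pairwise_distances

    X = np.array([[0.0, 0.0], [3.0, 4.0]])
    print(pairwise_distances(X))                         # 2x2; off-diagonal 5.0

    Y = np.array([[6.0, 8.0]])
    print(pairwise_distances(X, Y, metric="manhattan"))  # distances from X rows to Y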

sklearn.metrics.pairwise.cosine_similarity()

sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True) [source]

Compute cosine similarity between samples in X and Y. Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:

    K(X, Y) = <X, Y> / (||X|| * ||Y||)

On L2-normalized data, this function is equivalent to linear_kernel. Read more in the User Guide.

Parameters:
    X : ndarray or sparse array, shape: (n_samples_X, n_features)
        Input data.
    Y : ndarray or sparse array, shape: (n_samples_Y, n_features)
        Input data. If None, the output will be the pairwise similarities between all samples in X.
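
A quick check of the definition; diagonal entries are 1 because each sample is perfectly similar to itself:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
    print(cosine_similarity(A))  # [[1.0, 0.707...], [0.707..., 1.0]]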

covariance.LedoitWolf()

class sklearn.covariance.LedoitWolf(store_precision=True, assume_centered=False, block_size=1000) [source]

LedoitWolf Estimator. Ledoit-Wolf is a particular form of shrinkage, where the shrinkage coefficient is computed using O. Ledoit and M. Wolf's formula as described in "A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices", Ledoit and Wolf, Journal of Multivariate Analysis, Volume 88, Issue 2, February 2004, pages 365-411. Read more in the User Guide.

Parameters:
    store_precision : bool, default=True
        Specify if the estimated precision is stored.
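
A minimal fitting sketch on synthetic data, using the estimator's standard fitted attributes:

    import numpy as np
    from sklearn.covariance import LedoitWolf

    rng = np.random.RandomState(0)
    X = rng.randn(50, 5)          # 50 samples, 5 features

    lw = LedoitWolf().fit(X)
    print(lw.covariance_.shape)   # (5, 5) shrunk covariance estimate
    print(lw.shrinkage_)          # coefficient chosen by the Ledoit-Wolf formula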

ensemble.GradientBoostingRegressor()

class sklearn.ensemble.GradientBoostingRegressor(loss='ls', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_split=1e-07, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto') [source]

Gradient Boosting for regression. GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
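
A minimal train-and-score sketch on synthetic regression data (hyperparameter values here are illustrative, not recommendations):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

    gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                    max_depth=3, random_state=0)
    gbr.fit(X[:150], y[:150])
    print(gbr.score(X[150:], y[150:]))   # R^2 on the 50 held-out samples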

Linear Regression Example

This example uses only the first feature of the diabetes dataset, in order to illustrate a two-dimensional plot of this regression technique. The straight line in the plot shows how linear regression attempts to draw the line that best minimizes the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. The coefficients, the residual sum of squares and the variance score are also calculated.
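
A condensed, self-contained version of the fit described above, with plotting omitted; the single-feature slicing follows the description:

    import numpy as np
    from sklearn import datasets, linear_model
    from sklearn.metrics import mean_squared_error, r2_score

    diabetes = datasets.load_diabetes()
    X = diabetes.data[:, :1]             # keep a single feature for a 2-D plot
    y = diabetes.target

    X_train, X_test = X[:-20], X[-20:]   # hold out the last 20 samples
    y_train, y_test = y[:-20], y[-20:]

    reg = linear_model.LinearRegression().fit(X_train, y_train)
    y_pred = reg.predict(X_test)

    print(reg.coef_)                           # fitted slope
    print(mean_squared_error(y_test, y_pred))  # mean residual sum of squares
    print(r2_score(y_test, y_pred))            # variance (R^2) score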

sklearn.covariance.shrunk_covariance()

sklearn.covariance.shrunk_covariance(emp_cov, shrinkage=0.1) [source]

Calculates a covariance matrix shrunk on the diagonal. Read more in the User Guide.

Parameters:
    emp_cov : array-like, shape (n_features, n_features)
        Covariance matrix to be shrunk.
    shrinkage : float, 0 <= shrinkage <= 1
        Coefficient in the convex combination used for the computation of the shrunk estimate.

Returns:
    shrunk_cov : array-like
        Shrunk covariance.

Notes
The regularized (shrunk) covariance is given by:

    (1 - shrinkage) * cov + shrinkage * mu * np.identity(n_features)

where mu = trace(cov) / n_features.
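
The convex combination can be verified directly; empirical_covariance is used here only to build the input matrix:

    import numpy as np
    from sklearn.covariance import empirical_covariance, shrunk_covariance

    rng = np.random.RandomState(0)
    X = rng.randn(30, 4)

    emp_cov = empirical_covariance(X)
    shrunk = shrunk_covariance(emp_cov, shrinkage=0.1)

    # Manual check of the formula from the Notes above
    mu = np.trace(emp_cov) / emp_cov.shape[0]
    manual = 0.9 * emp_cov + 0.1 * mu * np.identity(emp_cov.shape[0])
    print(np.allclose(shrunk, manual))   # True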

covariance.EmpiricalCovariance()

class sklearn.covariance.EmpiricalCovariance(store_precision=True, assume_centered=False) [source]

Maximum likelihood covariance estimator. Read more in the User Guide.

Parameters:
    store_precision : bool
        Specifies if the estimated precision is stored.
    assume_centered : bool
        If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False (default), data are centered before computation.

Attributes:
    covariance_ : 2D ndarray, shape (n_features, n_features)
        Estimated covariance matrix.
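
A minimal fit showing the maximum-likelihood estimate agreeing with numpy's biased (1/n) covariance:

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance

    rng = np.random.RandomState(42)
    X = rng.randn(100, 3)

    cov = EmpiricalCovariance().fit(X)
    # Maximum likelihood uses the 1/n normalization (numpy's bias=True)
    print(np.allclose(cov.covariance_, np.cov(X, rowvar=False, bias=True)))  # True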