sklearn.linear_model.lasso_stability_path()

sklearn.linear_model.lasso_stability_path(X, y, scaling=0.5, random_state=None, n_resampling=200, n_grid=100, sample_fraction=0.75, eps=8.8817841970012523e-16, n_jobs=1, verbose=False)

Stability path based on randomized Lasso estimates. Read more in the User Guide.

Parameters:
X : array-like, shape = [n_samples, n_features]
    Training data.
y : array-like, shape = [n_samples]
    Target values.
scaling : float, optional, default=0.5
    The alpha parameter in the stability selection article used to randomly scale the features. Should be between 0 and 1.
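A minimal usage sketch. Hedged: this function was deprecated and removed in later scikit-learn releases, and the return values shown (an alpha grid plus per-feature selection scores of shape [n_features, n_grid]) follow the documented API of that era; the data and the 0.5 threshold are illustrative.

# Sketch: stability-based feature selection with the old randomized-Lasso path.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_stability_path

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       random_state=42)
# alpha_grid: grid of regularization values; scores_path[i, j]: fraction of
# resamplings in which feature i was selected at alpha_grid[j]
alpha_grid, scores_path = lasso_stability_path(X, y, random_state=42,
                                               n_resampling=200)
stable = scores_path.max(axis=1) > 0.5  # features selected often enough
print(np.where(stable)[0])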

sklearn.datasets.load_boston()

sklearn.datasets.load_boston(return_X_y=False)

Load and return the Boston house-prices dataset (regression).

Samples total: 506
Dimensionality: 13
Features: real, positive
Targets: real 5. - 50.

Parameters:
return_X_y : boolean, default=False.
    If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object. New in version 0.18.

Returns:
data : Bunch
    Dictionary-like object, the interesting attributes are: 'data', the data to learn, and 'target', the regression targets.
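A minimal sketch of both calling conventions:

# Sketch: loading the Boston dataset as a Bunch or as (data, target) arrays.
from sklearn.datasets import load_boston

boston = load_boston()
print(boston.data.shape)    # (506, 13)
print(boston.target.shape)  # (506,)

# With return_X_y=True (scikit-learn >= 0.18) the Bunch is skipped:
X, y = load_boston(return_X_y=True)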

gaussian_process.kernels.PairwiseKernel()

class sklearn.gaussian_process.kernels.PairwiseKernel(gamma=1.0, gamma_bounds=(1e-05, 100000.0), metric='linear', pairwise_kernels_kwargs=None)

Wrapper for kernels in sklearn.metrics.pairwise. A thin wrapper around the functionality of the kernels in sklearn.metrics.pairwise.

Note: Evaluation of eval_gradient is not analytic but numeric and all kernels support only isotropic distances. The parameter gamma is considered to be a hyperparameter and may be optimized. The other kernel parameters are set directly at initialization and are kept fixed.
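A minimal sketch, assuming the 'rbf' metric (one of those accepted by sklearn.metrics.pairwise) and letting GaussianProcessRegressor optimize gamma within gamma_bounds; the toy data is illustrative.

# Sketch: using a pairwise metric as a Gaussian-process kernel.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import PairwiseKernel

X = np.random.RandomState(0).rand(30, 2)
y = np.sin(X[:, 0]) + X[:, 1]

kernel = PairwiseKernel(metric='rbf', gamma=1.0)  # gamma is the only tunable hyperparameter
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gpr.kernel_)  # gamma after optimization during fitting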

feature_extraction.text.TfidfVectorizer()

class sklearn.feature_extraction.text.TfidfVectorizer(input=u'content', encoding=u'utf-8', decode_error=u'strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer=u'word', stop_words=None, token_pattern=u'(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<type 'numpy.int64'>, norm=u'l2', use_idf=True, smooth_idf=True, sublinear_tf=False)

Convert a collection of raw documents to a matrix of TF-IDF features. Equivalent to CountVectorizer followed by TfidfTransformer. Read more in the User Guide.
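A minimal sketch on a toy corpus (the documents and settings are illustrative):

# Sketch: turning a small corpus into a sparse TF-IDF matrix.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat",
          "the dog ate my homework",
          "the cat ate the dog food"]
vectorizer = TfidfVectorizer(ngram_range=(1, 1), min_df=1)
X = vectorizer.fit_transform(corpus)   # sparse matrix, shape (3, n_terms)
print(vectorizer.get_feature_names())  # learned vocabulary (newer releases use get_feature_names_out)
print(X.shape)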

sklearn.metrics.classification_report()

sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)

Build a text report showing the main classification metrics. Read more in the User Guide.

Parameters:
y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) target values.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Estimated targets as returned by a classifier.
labels : array, shape = [n_labels]
    Optional list of label indices to include in the report.
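A minimal sketch (the labels and target_names are illustrative):

# Sketch: a text report of per-class precision, recall, and F1.
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
print(classification_report(y_true, y_pred,
                            target_names=['class 0', 'class 1', 'class 2'],
                            digits=2))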

Train error vs Test error

Illustration of how the performance of an estimator on unseen data (test data) is not the same as its performance on training data. As the regularization increases, the performance on the training set decreases, while the performance on the test set is optimal within a range of values of the regularization parameter. The example uses an Elastic-Net regression model, and performance is measured using the explained variance, a.k.a. R^2.

print(__doc__)
# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
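A minimal sketch of the idea behind the example, not the example's exact code: sweep the regularization strength alpha of an ElasticNet and compare train versus test scores (the default score of regressors is R^2). The data, alpha grid, and l1_ratio below are illustrative.

# Sketch: train score decreases monotonically with alpha; test score peaks inside the grid.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, noise=4.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = np.logspace(-5, 1, 30)
train_scores, test_scores = [], []
for alpha in alphas:
    enet = ElasticNet(alpha=alpha, l1_ratio=0.5).fit(X_train, y_train)
    train_scores.append(enet.score(X_train, y_train))
    test_scores.append(enet.score(X_test, y_test))

best_alpha = alphas[int(np.argmax(test_scores))]
print("alpha maximizing test R^2:", best_alpha)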

sklearn.datasets.fetch_olivetti_faces()

sklearn.datasets.fetch_olivetti_faces(data_home=None, shuffle=False, random_state=0, download_if_missing=True)

Loader for the Olivetti faces data-set from AT&T. Read more in the User Guide.

Parameters:
data_home : optional, default: None
    Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
shuffle : boolean, optional
    If True the order of the dataset is shuffled to avoid having images of the same person grouped together.
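A minimal sketch (the dataset is downloaded on first use unless already cached under data_home):

# Sketch: fetching the Olivetti faces and inspecting the returned Bunch.
from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces(shuffle=True, random_state=0)
print(faces.images.shape)  # (400, 64, 64): 10 images each of 40 subjects
print(faces.data.shape)    # (400, 4096): the same images, flattened
print(faces.target.shape)  # (400,): subject id in 0..39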

4.5. Random Projection

The sklearn.random_projection module implements a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes. This module implements two types of unstructured random matrix: Gaussian random matrix and sparse random matrix. The dimensions and distribution of random projection matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.
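A minimal sketch, assuming eps=0.1 and letting n_components be derived automatically from the Johnson-Lindenstrauss bound; the data shape is illustrative.

# Sketch: Gaussian random projection with an automatically chosen target dimension.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.random_projection import johnson_lindenstrauss_min_dim

X = np.random.RandomState(0).rand(100, 10000)
# Minimal dimensionality preserving pairwise distances within a factor (1 +/- eps)
print(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.1))

transformer = GaussianRandomProjection(eps=0.1, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)  # n_components chosen from n_samples and eps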

Robust covariance estimation and Mahalanobis distances relevance

An example to show covariance estimation with the Mahalanobis distances on Gaussian distributed data. For Gaussian distributed data, the distance of an observation x_i to the mode of the distribution can be computed using its Mahalanobis distance:

d_{(\mu, \Sigma)}(x_i)^2 = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)

where \mu and \Sigma are the location and the covariance of the underlying Gaussian distribution. In practice, \mu and \Sigma are replaced by some estimates. The usual covariance maximum likelihood estimate is very sensitive to the presence of outliers in the data set, and therefore the downstream Mahalanobis distances also are.
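A minimal sketch of the contrast the example draws, using MinCovDet (robust) versus EmpiricalCovariance (maximum likelihood); the contaminated toy data is illustrative.

# Sketch: robust vs. classical estimates of mu and Sigma, and the resulting
# squared Mahalanobis distances of the observations.
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=100)
X[:10] += 8  # contaminate with a cluster of outliers

robust = MinCovDet().fit(X)          # robust estimates of mu and Sigma
mle = EmpiricalCovariance().fit(X)   # classical maximum-likelihood estimates

print(robust.mahalanobis(X)[:5])  # squared distances under the robust fit
print(mle.mahalanobis(X)[:5])     # the MLE fit is pulled toward the outliers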

Outlier detection on a real data set

This example illustrates the need for robust covariance estimation on a real data set. It is useful both for outlier detection and for a better understanding of the data structure. We selected two sets of two variables from the Boston housing data set as an illustration of what kind of analysis can be done with several outlier detection tools. For the purpose of visualization, we are working with two-dimensional examples, but one should be aware that things are not so trivial in high dimension.
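A minimal sketch of one such tool, EllipticEnvelope (robust covariance), on two Boston features; the chosen columns and contamination level are illustrative, not necessarily those used in the example.

# Sketch: flagging the most extreme observations in a 2D slice of the data.
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import load_boston

X = load_boston().data[:, [8, 10]]  # two of the thirteen features, for 2D plots
detector = EllipticEnvelope(contamination=0.05).fit(X)
labels = detector.predict(X)        # +1 for inliers, -1 for outliers
print((labels == -1).sum(), "points flagged as outliers")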