sklearn.cross_validation.cross_val_score()

Warning: DEPRECATED

sklearn.cross_validation.cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')

Evaluate a score by cross-validation.

Deprecated since version 0.18: this module will be removed in 0.20. Use sklearn.model_selection.cross_val_score instead.

Read more in the User Guide.

Parameters:

estimator : estimator object implementing 'fit'
    The object to use to fit the data.

X : array-like
    The data to fit.
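
A minimal sketch of the recommended replacement, sklearn.model_selection.cross_val_score, on a toy dataset (the iris data and the logistic regression estimator are illustrative choices, not part of the reference entry):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000)

    # 5-fold cross-validation; returns one accuracy score per fold.
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean(), scores.std())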

Classification of text documents using sparse features

This is an example showing how scikit-learn can be used to classify documents by topic using a bag-of-words approach. This example uses a scipy.sparse matrix to store the features and demonstrates various classifiers that can efficiently handle sparse matrices. The dataset used in this example is the 20 newsgroups dataset. It will be automatically downloaded, then cached. The bar plot indicates the accuracy, training time (normalized) and test time (normalized) of each classifier.
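
A minimal sketch of the approach the example describes, using sparse tf-idf features; the two-category subset and the choice of SGDClassifier are illustrative assumptions here, not the example's exact pipeline:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import accuracy_score

    categories = ['sci.space', 'rec.autos']  # illustrative subset
    train = fetch_20newsgroups(subset='train', categories=categories)
    test = fetch_20newsgroups(subset='test', categories=categories)

    # TfidfVectorizer returns a scipy.sparse matrix of token weights.
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)

    # SGDClassifier handles sparse input efficiently.
    clf = SGDClassifier().fit(X_train, train.target)
    print(accuracy_score(test.target, clf.predict(X_test)))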

Lasso and Elastic Net for Sparse Signals

Estimates Lasso and Elastic-Net regression models on a manually generated sparse signal corrupted with additive noise. The estimated coefficients are compared with the ground truth.

    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.metrics import r2_score

    # Generate some sparse data to play with
    np.random.seed(42)
    n_samples, n_features = 50, 200
    X = np.random.randn(n_samples, n_features)
    coef = 3 * np.random.randn(n_features)
    inds = np.arange(n_features)
    np.random.shuffle(inds)
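
The snippet above breaks off mid-setup; a hedged continuation in the same spirit follows. The zeroed-coefficient count, noise level, train/test split, and alpha and l1_ratio values below are illustrative assumptions:

    # Sparsify the signal: keep only 10 non-zero coefficients
    coef[inds[10:]] = 0
    y = np.dot(X, coef) + 0.01 * np.random.normal(size=n_samples)

    # Split into train and test halves
    n_train = n_samples // 2
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]

    from sklearn.linear_model import ElasticNet, Lasso

    lasso = Lasso(alpha=0.1).fit(X_train, y_train)
    print("Lasso r^2 on test data:", r2_score(y_test, lasso.predict(X_test)))

    enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X_train, y_train)
    print("ElasticNet r^2 on test data:", r2_score(y_test, enet.predict(X_test)))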

linear_model.Perceptron()

class sklearn.linear_model.Perceptron(penalty=None, alpha=0.0001, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, eta0=1.0, n_jobs=1, random_state=0, class_weight=None, warm_start=False)

Read more in the User Guide.

Parameters:

penalty : None, 'l2' or 'l1' or 'elasticnet'
    The penalty (aka regularization term) to be used. Defaults to None.

alpha : float
    Constant that multiplies the regularization term if regularization is used. Defaults to 0.0001.

fit_intercept : bool
    Whether the intercept should be estimated or not.
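
A minimal usage sketch; the digits dataset and parameter values are illustrative, and note that the n_iter argument in the signature above belongs to older releases (it was later replaced by max_iter):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import Perceptron

    X, y = load_digits(return_X_y=True)

    # Fit an L2-regularized perceptron; values here are illustrative.
    clf = Perceptron(penalty='l2', alpha=1e-4, shuffle=True, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))  # mean accuracy on the training data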

covariance.ShrunkCovariance()

class sklearn.covariance.ShrunkCovariance(store_precision=True, assume_centered=False, shrinkage=0.1)

Covariance estimator with shrinkage.

Read more in the User Guide.

Parameters:

store_precision : boolean, default True
    Specify if the estimated precision is stored.

shrinkage : float, 0 <= shrinkage <= 1, default 0.1
    Coefficient in the convex combination used for the computation of the shrunk estimate.

assume_centered : boolean, default False
    If True, data are not centered before computation.
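
A minimal sketch of fitting the estimator; the random data below are an illustrative stand-in:

    import numpy as np
    from sklearn.covariance import ShrunkCovariance

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)  # illustrative, roughly zero-mean samples

    cov = ShrunkCovariance(shrinkage=0.1).fit(X)
    print(cov.covariance_.shape)  # the (5, 5) shrunk covariance estimate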

Bayesian Ridge Regression

Computes a Bayesian Ridge Regression on a synthetic dataset. See Bayesian Ridge Regression for more information on the regressor. Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zero, which stabilises them. As the prior on the weights is a Gaussian prior, the histogram of the estimated weights is Gaussian. The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.
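
A minimal sketch of the comparison described above, on illustrative synthetic data rather than the example's exact setup:

    import numpy as np
    from sklearn.linear_model import BayesianRidge, LinearRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    w = rng.randn(10)
    y = np.dot(X, w) + 0.5 * rng.randn(100)

    bayes = BayesianRidge().fit(X, y)
    ols = LinearRegression().fit(X, y)

    # The Bayesian weights are shrunk slightly toward zero relative to OLS.
    print(np.abs(bayes.coef_).sum(), np.abs(ols.coef_).sum())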

sklearn.neighbors.kneighbors_graph()

sklearn.neighbors.kneighbors_graph(X, n_neighbors, mode='connectivity', metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=1)

Computes the (weighted) graph of k-Neighbors for points in X.

Read more in the User Guide.

Parameters:

X : array-like or BallTree, shape = [n_samples, n_features]
    Sample data, in the form of a numpy array or a precomputed BallTree.

n_neighbors : int
    Number of neighbors for each sample.

mode : {'connectivity', 'distance'}, optional
    Type of returned matrix: 'connectivity' will return the connectivity matrix with ones and zeros, and 'distance' will return the distances between neighbors according to the given metric.
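
A minimal usage sketch; the four points on a line are an illustrative input:

    from sklearn.neighbors import kneighbors_graph

    X = [[0], [1], [2], [3]]  # four 1-D points (illustrative)
    A = kneighbors_graph(X, n_neighbors=2, mode='connectivity',
                         include_self=False)
    print(A.toarray())  # sparse adjacency matrix, densified for display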

Robust Scaling on Toy Data

Making sure that each feature has approximately the same scale can be a crucial preprocessing step. However, when data contains outliers, StandardScaler can often be misled. In such cases, it is better to use a scaler that is robust against outliers. Here, we demonstrate this on a toy dataset, where one single datapoint is a large outlier.

Out:
    Testset accuracy using standard scaler: 0.545
    Testset accuracy using robust scaler: 0.705

    from __future__ import print_function
    print(__doc__)
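
A minimal sketch of the effect described above; the toy data and outlier value here are illustrative, not the example's exact dataset:

    import numpy as np
    from sklearn.preprocessing import RobustScaler, StandardScaler

    rng = np.random.RandomState(0)
    X = rng.randn(20, 2)
    X[0] = [100.0, 100.0]  # a single large outlier

    # StandardScaler's mean and std are pulled by the outlier;
    # RobustScaler centers on the median and scales by the IQR instead.
    print(StandardScaler().fit_transform(X)[1:4])
    print(RobustScaler().fit_transform(X)[1:4])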

preprocessing.Binarizer()

class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)

Binarize data (set feature values to 0 or 1) according to a threshold.

Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1. Binarization is a common operation on text count data where the analyst can decide to only consider the presence or absence of a feature rather than a quantified number of occurrences, for instance.
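
A minimal usage sketch with the default threshold of 0; the input matrix is illustrative:

    from sklearn.preprocessing import Binarizer

    X = [[1.0, -1.0, 2.0],
         [2.0, 0.0, 0.0],
         [0.0, 1.0, -1.0]]

    binarizer = Binarizer()  # threshold=0.0: only positive values map to 1
    print(binarizer.fit_transform(X))
    # [[1. 0. 1.]
    #  [1. 0. 0.]
    #  [0. 1. 0.]]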

Concentration Prior Type Analysis of Variation Bayesian Gaussian Mixture

This example plots the ellipsoids obtained from a toy dataset (a mixture of three Gaussians) fitted by the BayesianGaussianMixture class models with a Dirichlet distribution prior (weight_concentration_prior_type='dirichlet_distribution') and a Dirichlet process prior (weight_concentration_prior_type='dirichlet_process'). On each figure, we plot the results for three different values of the weight concentration prior. The BayesianGaussianMixture class can adapt its number of mixture components automatically.
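
A minimal sketch of fitting both concentration prior types on a toy three-Gaussian mixture; the data, n_components=6, and other settings are illustrative assumptions, not the example's exact configuration:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # Toy data drawn from three well-separated 2-D Gaussians.
    X = np.vstack([rng.randn(100, 2) + offset
                   for offset in ([0, 0], [6, 0], [0, 6])])

    for prior_type in ('dirichlet_distribution', 'dirichlet_process'):
        bgm = BayesianGaussianMixture(
            n_components=6,  # deliberately more than the true 3
            weight_concentration_prior_type=prior_type,
            random_state=0).fit(X)
        # Surplus components tend to receive near-zero weight, which is
        # how the model adapts its effective number of components.
        print(prior_type, np.round(bgm.weights_, 2))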