linear_model.LassoCV()

class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1, positive=False, random_state=None, selection='cyclic') [source]

Lasso linear model with iterative fitting along a regularization path. The best model is selected by cross-validation. The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Read more in the User Guide.
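
A minimal usage sketch (the synthetic data from make_regression and the parameter values are illustrative assumptions, not part of the documentation above):

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Illustrative regression problem with 20 features.
X, y = make_regression(n_samples=200, n_features=20, noise=1.0, random_state=0)

# Fit the regularization path and select alpha by 5-fold cross-validation.
model = LassoCV(n_alphas=100, cv=5, random_state=0)
model.fit(X, y)

print(model.alpha_)  # alpha chosen by cross-validation
print(model.coef_)   # coefficient vector (typically sparse)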

sklearn.feature_selection.f_classif()

sklearn.feature_selection.f_classif(X, y) [source]

Compute the ANOVA F-value for the provided sample. Read more in the User Guide.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    The set of regressors that will be tested sequentially.
y : array of shape (n_samples,)
    The target vector (class labels).

Returns:
F : array, shape = [n_features,]
    The set of F values.
pval : array, shape = [n_features,]
    The set of p-values.

See also: chi2 (Chi-squared stats of non-negative features for classification tasks).
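
A short sketch of calling f_classif on a labeled dataset (the iris data is an assumed example input):

from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()
X, y = iris.data, iris.target

# One F statistic and one p-value per feature.
F, pval = f_classif(X, y)
print(F.shape)   # (4,) for the 4 iris features
print(pval)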

sklearn.metrics.pairwise_distances_argmin()

sklearn.metrics.pairwise_distances_argmin(X, Y, axis=1, metric='euclidean', batch_size=500, metric_kwargs=None) [source]

Compute minimum distances between one point and a set of points. This function computes, for each row in X, the index of the row of Y which is closest (according to the specified distance). This is mostly equivalent to calling:

pairwise_distances(X, Y=Y, metric=metric).argmin(axis=axis)

but uses much less memory, and is faster for large arrays. This function works with dense 2D arrays only.
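
A small sketch of the behaviour described above, using assumed toy arrays:

import numpy as np
from sklearn.metrics import pairwise_distances_argmin

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])

# For each row of X, the index of the nearest row of Y.
idx = pairwise_distances_argmin(X, Y, metric='euclidean')
print(idx)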

The Iris Dataset

This data set consists of 3 different types of irises' (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray. The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width. The plot below uses the first two features. See here for more information on this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
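
A condensed sketch of the plot described above (axis labels and colormap are illustrative choices):

import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # first two features: sepal length, sepal width
y = iris.target

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()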

Concatenating multiple feature extraction methods

In many real-world examples, there are many ways to extract features from a dataset. Often it is beneficial to combine several methods to obtain good performance. This example shows how to use FeatureUnion to combine features obtained by PCA and univariate selection. Combining features using this transformer has the benefit that it allows cross validation and grid searches over the whole process. The combination used in this example is not particularly helpful on this dataset and is only used to illustrate the usage of FeatureUnion.
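
A condensed sketch of the combination described above (the SVC classifier and the grid values are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# PCA components and univariately selected features, side by side.
combined_features = FeatureUnion([("pca", PCA(n_components=2)),
                                  ("univ_select", SelectKBest(k=1))])

pipeline = Pipeline([("features", combined_features),
                     ("svm", SVC(kernel="linear"))])

# Grid search over the feature-extraction settings and the classifier together.
param_grid = {"features__pca__n_components": [1, 2, 3],
              "features__univ_select__k": [1, 2],
              "svm__C": [0.1, 1, 10]}
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X, y)
print(grid_search.best_params_)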

Blind source separation using FastICA

An example of estimating sources from noisy data. Independent component analysis (ICA) is used to estimate sources given noisy measurements. Imagine 3 instruments playing simultaneously and 3 microphones recording the mixed signals. ICA is used to recover the sources, i.e. what is played by each instrument. Importantly, PCA fails at recovering our instruments since the related signals reflect non-Gaussian processes.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
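
A condensed sketch of the setup described above, assuming three synthetic source signals and a hand-picked mixing matrix:

import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                    # instrument 1: sinusoid
s2 = np.sign(np.sin(3 * t))           # instrument 2: square wave
s3 = signal.sawtooth(2 * np.pi * t)   # instrument 3: saw tooth

S = np.c_[s1, s2, s3]
S += 0.2 * rng.normal(size=S.shape)   # add noise
S /= S.std(axis=0)                    # standardize

A = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.0, 2.0]])  # mixing matrix
X = np.dot(S, A.T)                    # mixed signals ("microphone" recordings)

ica = FastICA(n_components=3, random_state=0)
S_estimated = ica.fit_transform(X)    # recovered sources, up to order and scale
print(S_estimated.shape)              # (2000, 3)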

scikit-learn Tutorials

An introduction to machine learning with scikit-learn
- Machine learning: the problem setting
- Loading an example dataset
- Learning and predicting
- Model persistence
- Conventions

A tutorial on statistical-learning for scientific data processing
- Statistical learning: the setting and the estimator object in scikit-learn
- Supervised learning: predicting an output variable from high-dimensional observations
- Model selection: choosing estimators and their parameters
- Unsupervised learning: seeking representations of the data

sklearn.metrics.pairwise.paired_cosine_distances()

sklearn.metrics.pairwise.paired_cosine_distances(X, Y) [source]

Computes the paired cosine distances between X and Y. Read more in the User Guide.

Parameters:
X : array-like, shape (n_samples, n_features)
Y : array-like, shape (n_samples, n_features)

Returns:
distances : ndarray, shape (n_samples,)

Notes
The cosine distance is equivalent to half the squared euclidean distance if each sample is normalized to unit norm.
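
A small sketch illustrating the note above (the random inputs are assumptions): for unit-normalized rows, the paired cosine distance equals half the squared euclidean distance.

import numpy as np
from sklearn.metrics.pairwise import paired_cosine_distances, paired_euclidean_distances
from sklearn.preprocessing import normalize

rng = np.random.RandomState(0)
X = normalize(rng.rand(4, 3))  # rows scaled to unit norm
Y = normalize(rng.rand(4, 3))

cos_d = paired_cosine_distances(X, Y)
eucl_d = paired_euclidean_distances(X, Y)
print(np.allclose(cos_d, 0.5 * eucl_d ** 2))  # True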

Demonstration of k-means assumptions

This example is meant to illustrate situations where k-means will produce unintuitive and possibly unexpected clusters. In the first three plots, the input data does not conform to some implicit assumption that k-means makes, and undesirable clusters are produced as a result. In the last plot, k-means returns intuitive clusters despite unevenly sized blobs.

print(__doc__)

# Author: Phil Roth <mr.phil.roth@gmail.com>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
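
A minimal sketch of one of the situations described above, using assumed blob parameters: asking k-means for the wrong number of clusters yields an unintuitive partition.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1500, random_state=170)  # three Gaussian blobs

# Incorrect number of clusters: k-means still returns two clusters,
# splitting or merging the true blobs.
labels = KMeans(n_clusters=2, random_state=170).fit_predict(X)
print(np.bincount(labels))  # sizes of the two resulting clusters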

sklearn.metrics.pairwise.rbf_kernel()

sklearn.metrics.pairwise.rbf_kernel(X, Y=None, gamma=None) [source]

Compute the rbf (gaussian) kernel between X and Y:

K(x, y) = exp(-gamma * ||x - y||^2)

for each pair of rows x in X and y in Y. Read more in the User Guide.

Parameters:
X : array of shape (n_samples_X, n_features)
Y : array of shape (n_samples_Y, n_features)
gamma : float, default None
    If None, defaults to 1.0 / n_features.

Returns:
kernel_matrix : array of shape (n_samples_X, n_samples_Y)
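
A small sketch of computing the kernel matrix (the inputs and the gamma value are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [2.0, 2.0]])

K = rbf_kernel(X, Y, gamma=0.5)  # K[i, j] = exp(-0.5 * ||X[i] - Y[j]||^2)
print(K.shape)                   # (2, 2)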