sklearn.linear_model.orthogonal_mp_gram()

sklearn.linear_model.orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None, norms_squared=None, copy_Gram=True, copy_Xy=True, return_path=False, return_n_iter=False) [source]

Gram Orthogonal Matching Pursuit (OMP).

Solves n_targets Orthogonal Matching Pursuit problems using only the Gram matrix X.T * X and the product X.T * y. Read more in the User Guide.

Parameters:
Gram : array, shape (n_features, n_features)
    Gram matrix of the input data: X.T * X
Xy : array, shape (n_features,) o
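A minimal sketch of calling the Gram variant, precomputing Gram and Xy from a design matrix X (the toy data and sparsity level are illustrative, not part of the reference above):

    import numpy as np
    from sklearn.linear_model import orthogonal_mp_gram

    # Toy data: 4 samples, 6 features, target depending on two features only
    rng = np.random.RandomState(0)
    X = rng.randn(4, 6)
    y = X[:, 1] + X[:, 4]

    # Precompute the quantities the Gram solver expects
    Gram = X.T @ X          # shape (n_features, n_features)
    Xy = X.T @ y            # shape (n_features,)

    coef = orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=2)
    print(coef)             # at most 2 non-zero coefficients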

The Johnson-Lindenstrauss bound for embedding with random projections

The Johnson-Lindenstrauss lemma states that any high-dimensional dataset can be randomly projected into a lower-dimensional Euclidean space while controlling the distortion in the pairwise distances.

Theoretical bounds

The distortion introduced by a random projection p is asserted by the fact that p defines an eps-embedding with good probability, as defined by:

    (1 - eps) ||u - v||^2 < ||p(u) - p(v)||^2 < (1 + eps) ||u - v||^2

where u and v are any rows taken from a dataset of shape [n_samples, n_features] and p is a projection by a random Gaussian N(0
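The minimum safe number of components for a given eps can be queried directly with johnson_lindenstrauss_min_dim; a small sketch (the sample counts and eps value are illustrative):

    from sklearn.random_projection import johnson_lindenstrauss_min_dim

    # Minimum number of random projection components needed to guarantee an
    # eps-embedding with good probability, for different dataset sizes
    print(johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1))
    print(johnson_lindenstrauss_min_dim(n_samples=[1000, 10000, 100000], eps=0.1))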

2.7. Novelty and Outlier Detection

Many applications require being able to decide whether a new observation belongs to the same distribution as existing observations (it is an inlier), or should be considered as different (it is an outlier). Often, this ability is used to clean real data sets. Two important distinctions must be made:

novelty detection: The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.

outlier detection: The training data contains outliers, and we n
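As an illustration of the novelty-detection setting described above, a minimal sketch with a one-class SVM; the data and hyperparameters are made up for illustration:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(42)
    X_train = 0.3 * rng.randn(100, 2)                 # clean training data, no outliers
    X_new = np.r_[0.3 * rng.randn(10, 2),             # new observations: some inliers...
                  rng.uniform(-4, 4, size=(10, 2))]   # ...and some anomalies

    clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X_train)
    print(clf.predict(X_new))                         # +1 for inliers, -1 for outliers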

sklearn.datasets.load_digits()

sklearn.datasets.load_digits(n_class=10, return_X_y=False) [source]

Load and return the digits dataset (classification). Each datapoint is an 8x8 image of a digit.

Classes: 10
Samples per class: ~180
Samples total: 1797
Dimensionality: 64
Features: integers 0-16

Read more in the User Guide.

Parameters:
n_class : integer, between 0 and 10, optional (default=10)
    The number of classes to return.
return_X_y : boolean, default=False
    If True, returns (data, target) instead of a Bunch object. See b
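A short usage sketch of the loader and the two return styles:

    from sklearn.datasets import load_digits

    digits = load_digits()
    print(digits.data.shape)     # (1797, 64): flattened 8x8 images
    print(digits.images.shape)   # (1797, 8, 8): the original image grids
    print(digits.target[:10])    # class labels 0-9

    # return_X_y=True skips the Bunch object and returns the arrays directly
    X, y = load_digits(return_X_y=True)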

Manifold Learning methods on a severed sphere

An application of the different Manifold learning techniques on a spherical dataset. Here one can see the use of dimensionality reduction in order to gain some intuition regarding the manifold learning methods. Regarding the dataset, the poles are cut from the sphere, as well as a thin slice down its side. This enables the manifold learning techniques to "spread it open" whilst projecting it onto two dimensions. For a similar example, where the methods are applied to the S-curve dataset, see
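A rough sketch of how such a severed-sphere dataset might be built and embedded; the cut thresholds and the choice of LocallyLinearEmbedding with these parameters are illustrative assumptions, not the original example:

    import numpy as np
    from sklearn import manifold

    # Sample random points on the unit sphere
    rng = np.random.RandomState(0)
    t = rng.uniform(0, np.pi, 1000)        # polar angle
    p = rng.uniform(0, 2 * np.pi, 1000)    # azimuthal angle

    # Sever the poles and a thin slice down the side (thresholds are illustrative)
    keep = (t > 0.3) & (t < np.pi - 0.3) & (p > 0.2)
    sphere = np.column_stack([np.sin(t[keep]) * np.cos(p[keep]),
                              np.sin(t[keep]) * np.sin(p[keep]),
                              np.cos(t[keep])])

    # Project the severed sphere onto two dimensions with one manifold method
    embedding = manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2)
    sphere_2d = embedding.fit_transform(sphere)
    print(sphere_2d.shape)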

covariance.ShrunkCovariance()

class sklearn.covariance.ShrunkCovariance(store_precision=True, assume_centered=False, shrinkage=0.1) [source]

Covariance estimator with shrinkage. Read more in the User Guide.

Parameters:
store_precision : boolean, default True
    Specify if the estimated precision is stored.
shrinkage : float, 0 <= shrinkage <= 1, default 0.1
    Coefficient in the convex combination used for the computation of the shrunk estimate.
assume_centered : boolean, default False
    If True, data are not centered
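A minimal fit sketch; the toy two-dimensional Gaussian data is made up for illustration:

    import numpy as np
    from sklearn.covariance import ShrunkCovariance

    # Toy Gaussian sample; real use would pass your own observations
    rng = np.random.RandomState(0)
    X = rng.multivariate_normal([0, 0], [[0.8, 0.3], [0.3, 0.4]], size=500)

    cov = ShrunkCovariance(shrinkage=0.1).fit(X)
    print(cov.covariance_)   # the shrunk covariance estimate
    print(cov.precision_)    # stored because store_precision=True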

Lasso and Elastic Net for Sparse Signals

Estimates Lasso and Elastic-Net regression models on a manually generated sparse signal corrupted with an additive noise. Estimated coefficients are compared with the ground-truth.

    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn.metrics import r2_score

    # Generate some sparse data to play with
    np.random.seed(42)

    n_samples, n_features = 50, 200
    X = np.random.randn(n_samples, n_features)
    coef = 3 * np.random.randn(n_features)
    inds = np.arange(n_features)
    np.random
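The listing above is cut off. As a hedged sketch of where such an example typically goes next, continuing from the variables defined above (X, coef, n_samples, r2_score), one way the fits and the ground-truth comparison could look; the sparsification, noise level, split, and alpha values are assumptions, not the original script:

    from sklearn.linear_model import Lasso, ElasticNet

    # Keep only a handful of non-zero coefficients to make the signal sparse,
    # then add a small Gaussian noise to the targets (values are illustrative)
    coef[10:] = 0
    y = np.dot(X, coef) + 0.01 * np.random.normal(size=n_samples)

    # Split the data into train and test halves
    X_train, y_train = X[: n_samples // 2], y[: n_samples // 2]
    X_test, y_test = X[n_samples // 2 :], y[n_samples // 2 :]

    lasso = Lasso(alpha=0.1).fit(X_train, y_train)
    enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X_train, y_train)

    # Compare predictions on held-out data with the ground-truth targets
    print("Lasso R^2:", r2_score(y_test, lasso.predict(X_test)))
    print("ElasticNet R^2:", r2_score(y_test, enet.predict(X_test)))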

sklearn.datasets.load_files()

sklearn.datasets.load_files(container_path, description=None, categories=None, load_content=True, shuffle=True, encoding=None, decode_error='strict', random_state=0) [source]

Load text files with categories as subfolder names.

Individual samples are assumed to be files stored in a two-level folder structure such as the following:

    container_folder/
        category_1_folder/
            file_1.txt
            file_2.txt
            ...
            file_42.txt
        category_2_folder/
            file_43.txt
            file_44.txt
            ...

The folder names are used as supervised
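A minimal usage sketch, assuming a directory laid out as in the tree above; the path is hypothetical:

    from sklearn.datasets import load_files

    # "container_folder" is a hypothetical path structured as shown above
    dataset = load_files("container_folder", encoding="utf-8")

    print(dataset.target_names)   # the category folder names
    print(len(dataset.data))      # raw file contents as strings
    print(dataset.target[:5])     # integer labels aligned with target_names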

Shrinkage covariance estimation

When working with covariance estimation, the usual approach is to use a maximum likelihood estimator, such as the sklearn.covariance.EmpiricalCovariance. It is asymptotically unbiased, i.e. it converges to the true (population) covariance when given many observations. However, it can also be beneficial to regularize it, in order to reduce its variance; this, in turn, introduces some bias. This example illustrates the simple regularization used in Shrunk Covariance estimators. In particular, it focuses on how
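A hedged sketch of the regularization being discussed, comparing an empirical estimate with a shrunk one against a known population covariance; the dimensions, sample size, and shrinkage value are illustrative:

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, ShrunkCovariance

    # Known population covariance and a small sample drawn from it
    rng = np.random.RandomState(0)
    true_cov = np.diag([2.0, 1.0, 0.5, 0.25])
    X = rng.multivariate_normal(np.zeros(4), true_cov, size=30)

    emp = EmpiricalCovariance().fit(X)
    shrunk = ShrunkCovariance(shrinkage=0.2).fit(X)

    # Frobenius-norm error of each estimate with respect to the true covariance
    print(np.linalg.norm(emp.covariance_ - true_cov))
    print(np.linalg.norm(shrunk.covariance_ - true_cov))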

sklearn.metrics.pairwise.additive_chi2_kernel()

sklearn.metrics.pairwise.additive_chi2_kernel(X, Y=None) [source]

Computes the additive chi-squared kernel between observations in X and Y.

The chi-squared kernel is computed between each pair of rows in X and Y. X and Y have to be non-negative. This kernel is most commonly applied to histograms. The chi-squared kernel is given by:

    k(x, y) = -Sum [(x - y)^2 / (x + y)]

It can be interpreted as a weighted difference per entry. Read more in the User Guide.

Parameters:
X : array-like of shape
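A short usage sketch on two tiny non-negative "histograms"; the values are illustrative:

    import numpy as np
    from sklearn.metrics.pairwise import additive_chi2_kernel

    X = np.array([[0.2, 0.8], [0.5, 0.5]])
    Y = np.array([[0.3, 0.7]])

    K = additive_chi2_kernel(X, Y)
    print(K.shape)   # (2, 1): one kernel value per (row of X, row of Y) pair
    print(K)         # entries are <= 0 because of the leading minus sign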