model_selection.StratifiedKFold()

class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None) [source]

Stratified K-Folds cross-validator. Provides train/test indices to split data into train/test sets. This cross-validation object is a variation of KFold that returns stratified folds: the folds are made by preserving the percentage of samples for each class. Read more in the User Guide.

Parameters:
    n_splits : int, default=3
        Number of folds. Must be at least 2.
    shuffle : boolean, optional
        Whether to shuffle each class's samples before splitting into batches.
    random_state : None, int or RandomState
        When shuffle=True, pseudo-random number generator state used for shuffling.
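A minimal usage sketch (the toy data here is made up for illustration):

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    # each test fold keeps the 50/50 class balance of y
    print("TRAIN:", train_index, "TEST:", test_index)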

sklearn.metrics.pairwise.paired_manhattan_distances()

sklearn.metrics.pairwise.paired_manhattan_distances(X, Y) [source]

Compute the L1 distances between the vectors in X and Y. Read more in the User Guide.

Parameters:
    X : array-like, shape (n_samples, n_features)
    Y : array-like, shape (n_samples, n_features)

Returns:
    distances : ndarray, shape (n_samples,)
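A quick sketch: these are paired, row-wise distances, not the full pairwise distance matrix.

import numpy as np
from sklearn.metrics.pairwise import paired_manhattan_distances

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[1.0, 0.0], [0.0, 4.0]])

# |1-1| + |2-0| = 2 and |3-0| + |4-4| = 3
print(paired_manhattan_distances(X, Y))  # [ 2.  3.]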

gaussian_process.kernels.RBF()

class sklearn.gaussian_process.kernels.RBF(length_scale=1.0, length_scale_bounds=(1e-05, 100000.0)) [source]

Radial-basis function kernel (aka squared-exponential kernel). The RBF kernel is a stationary kernel. It is parameterized by a length-scale parameter length_scale > 0, which can either be a scalar (isotropic variant of the kernel) or a vector with the same number of dimensions as the inputs X (anisotropic variant of the kernel).
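For a scalar length scale l, the kernel is k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 l^2)). A minimal sketch of the kernel inside a Gaussian process regressor (the sine data and the constant-kernel prefactor are illustrative choices):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 20).reshape(-1, 1)
y = np.sin(X).ravel()

kernel = 1.0 * RBF(length_scale=1.0)  # scalar length_scale: isotropic variant
gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
print(gpr.kernel_)  # length_scale is optimized during fitting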

Feature agglomeration vs. univariate selection

This example compares two dimensionality reduction strategies:

- univariate feature selection with ANOVA
- feature agglomeration with Ward hierarchical clustering

Both methods are compared in a regression problem using BayesianRidge as the supervised estimator. A condensed sketch of the comparison follows the excerpt.

# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD 3 clause
print(__doc__)

import shutil
import tempfile
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg, ndimage
from sklearn.feature_extraction.image import grid_to_graph
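A condensed, hedged sketch of the same comparison on synthetic data (the data sizes, percentile, and cluster count are assumptions, not the example's values):

from sklearn.datasets import make_regression
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=10, random_state=0)

for name, reducer in [("anova", SelectPercentile(f_regression, percentile=10)),
                      ("ward", FeatureAgglomeration(n_clusters=10))]:
    pipe = Pipeline([("reduce", reducer), ("ridge", BayesianRidge())])
    print(name, cross_val_score(pipe, X, y, cv=5).mean())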

Recursive feature elimination with cross-validation

A recursive feature elimination example with automatic tuning of the number of features selected with cross-validation.

Out:
    Optimal number of features : 3

print(__doc__)

import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_classification

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)
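Continuing the excerpt, a hedged sketch of the elimination step itself (the cv and scoring choices are assumptions consistent with the reported output):

rfecv = RFECV(estimator=SVC(kernel="linear"), step=1,
              cv=StratifiedKFold(2), scoring='accuracy')
rfecv.fit(X, y)
print("Optimal number of features : %d" % rfecv.n_features_)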

Robust linear model estimation using RANSAC

In this example we see how to robustly fit a linear model to faulty data using the RANSAC algorithm.

Out:
    Estimated coefficients (true, normal, RANSAC):
    82.1903908408 [ 54.17236387] [ 82.08533159]

import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model, datasets

n_samples = 1000
n_outliers = 50

X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,
                                      n_informative=1, noise=10,
                                      coef=True, random_state=0)
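Continuing the excerpt, a hedged sketch of the corruption and the robust fit (the outlier noise values are assumptions; the RANSACRegressor calls follow the scikit-learn API):

np.random.seed(0)
X[:n_outliers] = 3 + 0.5 * np.random.normal(size=(n_outliers, 1))
y[:n_outliers] = -3 + 10 * np.random.normal(size=n_outliers)

lr = linear_model.LinearRegression().fit(X, y)      # pulled off course by outliers
ransac = linear_model.RANSACRegressor().fit(X, y)   # fits on inliers only
print("Estimated coefficients (true, normal, RANSAC):")
print(coef, lr.coef_, ransac.estimator_.coef_)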

exceptions.DataDimensionalityWarning

class sklearn.exceptions.DataDimensionalityWarning [source]

Custom warning to notify potential issues with data dimensionality. For example, in random projection, this warning is raised when the number of components, which quantifies the dimensionality of the target projection space, is higher than the number of features, which quantifies the dimensionality of the original source space, to imply that the dimensionality of the problem will not be reduced.

Changed in version 0.18: Moved from sklearn.utils.
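A hedged sketch of one situation that triggers this warning: a random projection into more dimensions than the input has (the toy sizes are illustrative):

import warnings
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

X = np.random.RandomState(0).rand(10, 5)  # only 5 original features

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    GaussianRandomProjection(n_components=8).fit(X)  # 8 components > 5 features
    print([w.category.__name__ for w in caught])  # ['DataDimensionalityWarning']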

FeatureHasher and DictVectorizer Comparison

Compares FeatureHasher and DictVectorizer by using both to vectorize text documents. The example demonstrates syntax and speed only; it doesn't actually do anything useful with the extracted vectors. See the example scripts {document_classification_20newsgroups,clustering}.py for actual learning on text documents. A discrepancy between the number of terms reported for DictVectorizer and for FeatureHasher is to be expected due to hash collisions.

# Author: Lars Buitinck
# License: BSD 3 clause
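A minimal sketch contrasting the two vectorizers on token-count dicts (the toy documents are made up):

from sklearn.feature_extraction import DictVectorizer, FeatureHasher

docs = [{"dog": 1, "cat": 2, "elephant": 4},
        {"dog": 2, "run": 5}]

dv = DictVectorizer()
X_dv = dv.fit_transform(docs)     # learns a vocabulary: one column per distinct term
print(X_dv.shape)

fh = FeatureHasher(n_features=8)  # fixed-width output; collisions possible
X_fh = fh.transform(docs)         # stateless: no fit needed
print(X_fh.shape)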

mixture.VBGMM()

Warning: DEPRECATED

class sklearn.mixture.VBGMM(*args, **kwargs) [source]

Variational Inference for the Gaussian Mixture Model.

Deprecated since version 0.18: This class will be removed in 0.20. Use sklearn.mixture.BayesianGaussianMixture with parameter weight_concentration_prior_type='dirichlet_distribution' instead.

Variational inference for a Gaussian mixture model probability distribution. This class allows for easy and efficient inference of an approximate posterior distribution over the parameters of a Gaussian mixture model.
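A hedged sketch of the recommended replacement named above (the toy data and n_components are illustrative):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

bgm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_distribution').fit(X)
print(bgm.weights_.round(2))  # surplus components get weights near zero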

sklearn.datasets.load_lfw_people()

Warning: DEPRECATED

sklearn.datasets.load_lfw_people(*args, **kwargs) [source]

DEPRECATED: Function 'load_lfw_people' has been deprecated in 0.17 and will be removed in 0.19. Use fetch_lfw_people(download_if_missing=False) instead.

Alias for fetch_lfw_people(download_if_missing=False).

Deprecated since version 0.17: This function will be removed in 0.19. Use sklearn.datasets.fetch_lfw_people with parameter download_if_missing=False instead. Check fetch_lfw_people.__doc__ for the documentation of the underlying function.
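A hedged sketch of the recommended replacement (fetch_lfw_people downloads the dataset on first use unless download_if_missing=False; the filter arguments are illustrative):

from sklearn.datasets import fetch_lfw_people

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
print(lfw_people.images.shape)  # (n_samples, height, width)
print(lfw_people.target_names)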