Biclustering documents with the Spectral Co-clustering algorithm

This example demonstrates the Spectral Co-clustering algorithm on the twenty newsgroups dataset. The 'comp.os.ms-windows.misc' category is excluded because it contains many posts containing nothing but data. The TF-IDF vectorized posts form a word frequency matrix, which is then biclustered using Dhillon's Spectral Co-Clustering algorithm. The resulting document-word biclusters indicate subsets of words used more often in those subsets of documents. For a few of the best biclusters, their most common document categories and words are printed.
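
A minimal sketch of this pipeline, assuming the sklearn.cluster.SpectralCoclustering API; the number of clusters and the min_df cutoff are illustrative choices, not the parameters of the original example:

    from sklearn.cluster import SpectralCoclustering
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Exclude 'comp.os.ms-windows.misc' as described above.
    categories = [c for c in fetch_20newsgroups().target_names
                  if c != 'comp.os.ms-windows.misc']
    newsgroups = fetch_20newsgroups(categories=categories)

    # TF-IDF vectorize the posts into a document-word matrix.
    vectorizer = TfidfVectorizer(stop_words='english', min_df=5)
    X = vectorizer.fit_transform(newsgroups.data)

    model = SpectralCoclustering(n_clusters=20, random_state=0)
    model.fit(X)

    # Documents and words assigned to bicluster 0.
    doc_idx, word_idx = model.get_indices(0)
    print(len(doc_idx), 'documents,', len(word_idx), 'words')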

neighbors.KDTree

class sklearn.neighbors.KDTree(X, leaf_size=40, metric='minkowski', **kwargs)

KDTree for fast generalized N-point problems.

Parameters:

X : array-like, shape = [n_samples, n_features]
    n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made.

leaf_size : positive integer (default = 40)
    Number of points at which to switch to brute-force.
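
A short usage sketch on random data, following the constructor signature above:

    import numpy as np
    from sklearn.neighbors import KDTree

    rng = np.random.RandomState(0)
    X = rng.random_sample((10, 3))  # 10 points in 3 dimensions

    tree = KDTree(X, leaf_size=40, metric='minkowski')
    # Query the 3 nearest neighbors of the first point (itself included).
    dist, ind = tree.query(X[:1], k=3)
    print(ind)   # indices of the 3 nearest neighbors
    print(dist)  # their distances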

sklearn.linear_model.orthogonal_mp_gram()

sklearn.linear_model.orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None, norms_squared=None, copy_Gram=True, copy_Xy=True, return_path=False, return_n_iter=False) [source]

Gram Orthogonal Matching Pursuit (OMP). Solves n_targets Orthogonal Matching Pursuit problems using only the Gram matrix X.T * X and the product X.T * y. Read more in the User Guide.

Parameters:

Gram : array, shape (n_features, n_features)
    Gram matrix of the input data: X.T * X

Xy : array, shape (n_features,) or (n_features, n_targets)
    Input targets.
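
A small sketch of calling the Gram variant, with an illustrative synthetic 3-sparse target:

    import numpy as np
    from sklearn.linear_model import orthogonal_mp_gram

    rng = np.random.RandomState(0)
    X = rng.randn(50, 20)
    # Build a noiseless target from 3 true nonzero coefficients.
    true_coef = np.zeros(20)
    true_coef[[2, 5, 11]] = [1.0, -2.0, 0.5]
    y = X @ true_coef

    # Precompute the Gram matrix X.T * X and the product X.T * y.
    Gram = X.T @ X
    Xy = X.T @ y

    coef = orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=3)
    print(np.nonzero(coef)[0])  # expected: the true support [2 5 11]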

sklearn.datasets.load_digits()

sklearn.datasets.load_digits(n_class=10, return_X_y=False) [source]

Load and return the digits dataset (classification). Each datapoint is an 8x8 image of a digit.

Classes: 10
Samples per class: ~180
Samples total: 1797
Dimensionality: 64
Features: integers 0-16

Read more in the User Guide.

Parameters:

n_class : integer, between 0 and 10, optional (default=10)
    The number of classes to return.

return_X_y : boolean, default=False
    If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
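
For example:

    from sklearn.datasets import load_digits

    # Bunch interface
    digits = load_digits()
    print(digits.data.shape)    # (1797, 64) -- flattened 8x8 images
    print(digits.images.shape)  # (1797, 8, 8)

    # Tuple interface
    X, y = load_digits(return_X_y=True)
    print(X.shape, y.shape)     # (1797, 64) (1797,)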

Manifold Learning methods on a severed sphere

An application of the different Manifold learning techniques on a spherical data-set. Here one can see the use of dimensionality reduction in order to gain some intuition regarding the manifold learning methods. Regarding the dataset, the poles are cut from the sphere, as well as a thin slice down its side. This enables the manifold learning techniques to 'spread it open' whilst projecting it onto two dimensions. For a similar example, where the methods are applied to the S-curve dataset, see Comparison of Manifold Learning methods.
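
A minimal sketch of the setup, assuming standard spherical coordinates; the cut angles and the choice of Isomap are illustrative, not the exact parameters of the original example:

    import numpy as np
    from sklearn import manifold

    rng = np.random.RandomState(0)
    n_samples = 1000
    # Leave a thin slice open by not covering the full 2*pi in azimuth.
    p = rng.rand(n_samples) * (2 * np.pi - 0.55)  # azimuthal angle
    t = rng.rand(n_samples) * np.pi               # polar angle

    # Sever the poles: keep only points away from t = 0 and t = pi.
    keep = (t > np.pi / 8) & (t < np.pi - np.pi / 8)
    x = np.sin(t[keep]) * np.cos(p[keep])
    y = np.sin(t[keep]) * np.sin(p[keep])
    z = np.cos(t[keep])
    sphere = np.column_stack([x, y, z])

    # 'Spread the sphere open' by projecting onto two dimensions.
    embedding = manifold.Isomap(n_neighbors=10, n_components=2)
    sphere_2d = embedding.fit_transform(sphere)
    print(sphere_2d.shape)  # (n_kept_points, 2)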

sklearn.metrics.log_loss()

sklearn.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None, labels=None) [source]

Log loss, aka logistic loss or cross-entropy loss. This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions. The log loss is only defined for two or more labels. For a single sample with true label yt in {0, 1} and estimated probability yp that yt = 1, the log loss is

    -log P(yt | yp) = -(yt log(yp) + (1 - yt) log(1 - yp))
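
For example, with predicted probability columns ordered alphabetically by label:

    from sklearn.metrics import log_loss

    y_true = ['spam', 'ham', 'ham', 'spam']
    # Columns: [P(ham), P(spam)]
    y_pred = [[0.1, 0.9], [0.9, 0.1], [0.8, 0.2], [0.35, 0.65]]
    print(log_loss(y_true, y_pred))  # approx. 0.216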

1.12. Multiclass and multilabel algorithms

Warning: All classifiers in scikit-learn do multiclass classification out-of-the-box. You don't need to use the sklearn.multiclass module unless you want to experiment with different multiclass strategies. The sklearn.multiclass module implements meta-estimators to solve multiclass and multilabel classification problems by decomposing such problems into binary classification problems. Multitarget regression is also supported. Multiclass classification means a classification task with more than two classes, e.g., classifying a set of images of fruit which may be oranges, apples, or pears.
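
A minimal one-vs-rest sketch in the spirit of the module, using LinearSVC as the illustrative binary base estimator:

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    # Fits one binary LinearSVC per class under the hood.
    clf = OneVsRestClassifier(LinearSVC(random_state=0))
    print(clf.fit(X, y).predict(X[:5]))  # predictions for 5 samples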

covariance.ShrunkCovariance()

class sklearn.covariance.ShrunkCovariance(store_precision=True, assume_centered=False, shrinkage=0.1) [source]

Covariance estimator with shrinkage. Read more in the User Guide.

Parameters:

store_precision : boolean, default True
    Specify if the estimated precision is stored.

shrinkage : float, 0 <= shrinkage <= 1, default 0.1
    Coefficient in the convex combination used for the computation of the shrunk estimate.

assume_centered : boolean, default False
    If True, data are not centered before computation.
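
A short usage sketch with an illustrative true covariance:

    import numpy as np
    from sklearn.covariance import ShrunkCovariance

    rng = np.random.RandomState(0)
    real_cov = np.array([[0.8, 0.3],
                         [0.3, 0.4]])
    X = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=500)

    cov = ShrunkCovariance(shrinkage=0.1).fit(X)
    print(cov.covariance_)  # shrunk covariance estimate
    print(cov.location_)    # estimated mean (assume_centered=False)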

Classification of text documents using sparse features

This is an example showing how scikit-learn can be used to classify documents by topics using a bag-of-words approach. This example uses a scipy.sparse matrix to store the features and demonstrates various classifiers that can efficiently handle sparse matrices. The dataset used in this example is the 20 newsgroups dataset. It will be automatically downloaded, then cached. The bar plot indicates the accuracy, training time (normalized) and test time (normalized) of each classifier.
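
A condensed sketch of the pipeline, with two illustrative categories and a single classifier standing in for the full benchmark:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.svm import LinearSVC

    categories = ['alt.atheism', 'sci.space']
    train = fetch_20newsgroups(subset='train', categories=categories)
    test = fetch_20newsgroups(subset='test', categories=categories)

    # TF-IDF features stored as a scipy.sparse matrix.
    vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words='english')
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)

    clf = LinearSVC().fit(X_train, train.target)
    print(accuracy_score(test.target, clf.predict(X_test)))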

Single estimator versus bagging

This example illustrates and compares the bias-variance decomposition of the expected mean squared error of a single estimator against a bagging ensemble. In regression, the expected mean squared error of an estimator can be decomposed in terms of bias, variance and noise. On average over datasets of the regression problem, the bias term measures the average amount by which the predictions of the estimator differ from the predictions of the best possible estimator for the problem (i.e., the Bayes model).
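
A small sketch contrasting the two on a synthetic 1-D regression problem; the target function, noise level and ensemble size are illustrative:

    import numpy as np
    from sklearn.ensemble import BaggingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)

    def make_data(n):
        X = rng.uniform(-5, 5, size=(n, 1))
        f = np.exp(-X[:, 0] ** 2) + 1.5 * np.exp(-(X[:, 0] - 2) ** 2)
        return X, f + rng.normal(0, 0.1, size=n)  # noisy observations

    X_train, y_train = make_data(300)
    X_test, y_test = make_data(1000)

    tree = DecisionTreeRegressor().fit(X_train, y_train)
    bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                           random_state=0).fit(X_train, y_train)

    # Averaging over bootstrap replicates reduces the variance term.
    print('single tree MSE:', mean_squared_error(y_test, tree.predict(X_test)))
    print('bagging MSE:   ', mean_squared_error(y_test, bag.predict(X_test)))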