Multi-dimensional scaling

An illustration of metric and non-metric MDS on generated noisy data. The reconstructed points from the metric MDS and the non-metric MDS are slightly shifted to avoid overlapping.

    # Author: Nelle Varoquaux <nelle.varoquaux@gmail.com>
    # License: BSD

    print(__doc__)

    import numpy as np

    from matplotlib import pyplot as plt
    from matplotlib.collections import LineCollection

    from sklearn import manifold
    from sklearn.metrics import euclidean_distances
    from sklearn.decomposition import PCA
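A minimal sketch of the same idea; the point count, noise scale, and variable names are illustrative assumptions, not taken from the original script:

    import numpy as np
    from sklearn import manifold
    from sklearn.metrics import euclidean_distances

    # Noisy 2D points and their pairwise distances
    rng = np.random.RandomState(0)
    X_true = rng.randint(0, 20, (20, 2)).astype(float)
    noise = rng.normal(scale=0.5, size=X_true.shape)
    similarities = euclidean_distances(X_true + noise)

    # Metric MDS preserves the distance values themselves
    mds = manifold.MDS(n_components=2, dissimilarity='precomputed',
                       random_state=0)
    pos = mds.fit_transform(similarities)

    # Non-metric MDS preserves only their rank order; seeding it with the
    # metric solution keeps the two embeddings comparable
    nmds = manifold.MDS(n_components=2, metric=False,
                        dissimilarity='precomputed', random_state=0, n_init=1)
    npos = nmds.fit_transform(similarities, init=pos)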

Segmenting the picture of a raccoon face in regions

This example uses spectral clustering on a graph created from voxel-to-voxel differences on an image to break the image into multiple partly homogeneous regions. This procedure (spectral clustering on an image) is an efficient approximate solution for finding normalized graph cuts. There are two options to assign labels: with 'kmeans', spectral clustering clusters the samples in the embedding space using a k-means algorithm, whereas 'discretize' iteratively searches for the closest partition space to the embedding space.
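A condensed sketch of the procedure, using a small synthetic image in place of the raccoon face (whose loader varies across scipy versions); the blob positions and cluster count are illustrative:

    import numpy as np
    from sklearn.feature_extraction import image
    from sklearn.cluster import spectral_clustering

    # Small synthetic image: two bright blobs on a dark background
    x, y = np.indices((50, 50))
    img = (np.exp(-((x - 16) ** 2 + (y - 16) ** 2) / 60.0)
           + np.exp(-((x - 34) ** 2 + (y - 34) ** 2) / 60.0))

    # Graph whose edge weights come from pixel-to-pixel gradients
    graph = image.img_to_graph(img)
    # Turn gradients into decreasing affinities
    graph.data = np.exp(-graph.data / graph.data.std())

    # The two label-assignment strategies discussed above
    labels_km = spectral_clustering(graph, n_clusters=3,
                                    assign_labels='kmeans', random_state=0)
    labels_dc = spectral_clustering(graph, n_clusters=3,
                                    assign_labels='discretize', random_state=0)

    # Reshape back to the image grid to visualize the regions
    regions = labels_km.reshape(img.shape)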

preprocessing.FunctionTransformer()

class sklearn.preprocessing.FunctionTransformer(func=None, inverse_func=None, validate=True, accept_sparse=False, pass_y=False, kw_args=None, inv_kw_args=None) [source]

Constructs a transformer from an arbitrary callable. A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.
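A short usage sketch of the stateless log-transform case mentioned above:

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer

    # log1p is stateless, so no fitting state is required
    transformer = FunctionTransformer(np.log1p)
    X = np.array([[0., 1.], [2., 3.]])
    print(transformer.transform(X))  # elementwise log(1 + x)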

Model Complexity Influence

Demonstrate how model complexity influences both prediction accuracy and computational performance. The dataset is the Boston Housing dataset (resp. 20 Newsgroups) for regression (resp. classification). For each class of models we vary the model complexity through the choice of the relevant model parameters and measure the influence on both computational performance (latency) and predictive power (MSE or Hamming loss).

    print(__doc__)

    # Author: Eustache Diemert <eustache@diemert.fr>
    # License: BSD
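The measurement loop can be sketched roughly as follows; the estimator, dataset, and parameter grid here are stand-ins chosen for illustration, not the ones used in the original benchmark:

    import time
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Vary complexity via n_estimators; record prediction latency and MSE
    for n in [10, 50, 100, 200]:
        model = GradientBoostingRegressor(n_estimators=n, random_state=0)
        model.fit(X_train, y_train)
        tic = time.time()
        pred = model.predict(X_test)
        latency = time.time() - tic
        print(n, latency, mean_squared_error(y_test, pred))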

Comparing different clustering algorithms on toy datasets

This example aims at showing characteristics of different clustering algorithms on datasets that are 'interesting' but still in 2D. The last dataset is an example of a 'null' situation for clustering: the data is homogeneous, and there is no good clustering. While these examples give some intuition about the algorithms, this intuition might not apply to very high-dimensional data. The results could be improved by tweaking the parameters for each clustering strategy, for instance setting the number of clusters for the methods that need this parameter specified.
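A minimal sketch of the setup, with two toy datasets and two algorithms standing in for the full comparison (parameter values are illustrative):

    from sklearn import cluster, datasets

    # Two 2D toy datasets with very different structure
    noisy_moons, _ = datasets.make_moons(n_samples=500, noise=0.05,
                                         random_state=0)
    blobs, _ = datasets.make_blobs(n_samples=500, random_state=8)

    algorithms = [('KMeans', cluster.KMeans(n_clusters=2, random_state=0)),
                  ('DBSCAN', cluster.DBSCAN(eps=0.3))]

    # Each algorithm is refit on each dataset; compare the labelings
    for X in (noisy_moons, blobs):
        for name, algo in algorithms:
            labels = algo.fit_predict(X)
            print(name, len(set(labels)))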

sklearn.utils.shuffle()

sklearn.utils.shuffle(*arrays, **options) [source]

Shuffle arrays or sparse matrices in a consistent way. This is a convenience alias to resample(*arrays, replace=False) to do random permutations of the collections.

Parameters:

*arrays : sequence of indexable data-structures
    Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.
random_state : int or RandomState instance
    Control the shuffling for reproducible behavior.
n_samples : int, None by default
    Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays.
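For example, features and labels can be shuffled in lockstep:

    import numpy as np
    from sklearn.utils import shuffle

    X = np.array([[1., 0.], [2., 1.], [0., 0.]])
    y = np.array([0, 1, 2])

    # Both arrays are permuted with the same random indices
    X, y = shuffle(X, y, random_state=0)
    print(X, y)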

decomposition.KernelPCA()

class sklearn.decomposition.KernelPCA(n_components=None, kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, alpha=1.0, fit_inverse_transform=False, eigen_solver='auto', tol=0, max_iter=None, remove_zero_eig=False, random_state=None, copy_X=True, n_jobs=1) [source]

Kernel Principal component analysis (KPCA): non-linear dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels). Read more in the User Guide.

Parameters:

n_components : int
    Number of components. If None, all non-zero components are kept.
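A brief sketch on the classic concentric-circles dataset, where an RBF kernel (the gamma value is chosen arbitrarily here) makes the classes separable in the embedded space:

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    # Concentric circles are not linearly separable in the input space
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # An RBF kernel unfolds them; fit_inverse_transform enables pre-images
    kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10,
                     fit_inverse_transform=True)
    X_kpca = kpca.fit_transform(X)
    X_back = kpca.inverse_transform(X_kpca)  # approximate reconstruction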

Comparison of Manifold Learning methods

An illustration of dimensionality reduction on the S-curve dataset with various manifold learning methods. For a discussion and comparison of these algorithms, see the manifold module page. For a similar example, where the methods are applied to a sphere dataset, see Manifold Learning methods on a severed sphere. Note that the purpose of MDS is to find a low-dimensional representation of the data (here 2D) in which the distances respect well the distances in the original high-dimensional space.
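A reduced sketch of the comparison, with three representative methods (the neighbor counts and sample size are illustrative):

    from sklearn import manifold, datasets

    # The 3D S-curve dataset used in the comparison
    X, color = datasets.make_s_curve(n_samples=1000, random_state=0)

    # Three representative methods, each mapping the curve to 2D
    methods = {
        'LLE': manifold.LocallyLinearEmbedding(n_neighbors=10, n_components=2),
        'Isomap': manifold.Isomap(n_neighbors=10, n_components=2),
        'MDS': manifold.MDS(n_components=2, random_state=0),
    }
    embeddings = {name: m.fit_transform(X) for name, m in methods.items()}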

sklearn.metrics.classification_report()

sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2) [source]

Build a text report showing the main classification metrics. Read more in the User Guide.

Parameters:

y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) target values.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Estimated targets as returned by a classifier.
labels : array, shape = [n_labels]
    Optional list of label indices to include in the report.
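For example:

    from sklearn.metrics import classification_report

    y_true = [0, 1, 2, 2, 2]
    y_pred = [0, 0, 2, 2, 1]
    target_names = ['class 0', 'class 1', 'class 2']

    # Prints per-class precision, recall, f1-score, and support
    print(classification_report(y_true, y_pred, target_names=target_names))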

sklearn.metrics.zero_one_loss()

sklearn.metrics.zero_one_loss(y_true, y_pred, normalize=True, sample_weight=None) [source]

Zero-one classification loss. If normalize is True, return the fraction of misclassifications (float); otherwise, return the number of misclassifications (int). The best performance is 0. Read more in the User Guide.

Parameters:

y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) labels.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Predicted labels, as returned by a classifier.
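For example:

    from sklearn.metrics import zero_one_loss

    y_true = [1, 2, 3, 4]
    y_pred = [2, 2, 3, 4]

    print(zero_one_loss(y_true, y_pred))                   # 0.25: fraction misclassified
    print(zero_one_loss(y_true, y_pred, normalize=False))  # 1: count of misclassifications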