Density Estimation for a Gaussian mixture

Plot the density estimation of a mixture of two Gaussians. Data is generated from two Gaussians with different centers and covariance matrices.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300

# generate random sample, two components
np.random.seed(0)

# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# generate zero centered stretched Gaussian data
...
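The excerpt cuts off before the mixture is fit. Below is a minimal, self-contained sketch of the same idea, assuming the sklearn.mixture.GaussianMixture API; the stretch matrix, grid ranges and contour levels are illustrative choices, not taken from the excerpt.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300
np.random.seed(0)

# spherical blob centered on (20, 20) plus a zero-centered, stretched blob
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])
C = np.array([[0.0, -0.7], [3.5, 0.7]])                 # illustrative stretch matrix
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)
X_train = np.vstack([shifted_gaussian, stretched_gaussian])

# fit a two-component Gaussian mixture with full covariance matrices
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X_train)

# evaluate the negative log-likelihood on a grid and draw log-spaced contours
x = np.linspace(-20.0, 30.0)
y = np.linspace(-20.0, 40.0)
X, Y = np.meshgrid(x, y)
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -clf.score_samples(XX).reshape(X.shape)

plt.contour(X, Y, Z, norm=LogNorm(vmin=1.0, vmax=1000.0),
            levels=np.logspace(0, 3, 10))
plt.scatter(X_train[:, 0], X_train[:, 1], 0.8)
plt.title('Negative log-likelihood predicted by a GMM')
plt.show()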

Demonstration of k-means assumptions

This example is meant to illustrate situations where k-means will produce unintuitive and possibly unexpected clusters. In the first three plots, the input data does not conform to some implicit assumption that k-means makes, and undesirable clusters are produced as a result. In the last plot, k-means returns intuitive clusters despite unevenly sized blobs. A shortened sketch of two of the failure cases follows the excerpt below.

print(__doc__)

# Author: Phil Roth <mr.phil.roth@gmail.com>
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
...
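A minimal, self-contained sketch of the anisotropic and unequal-variance cases; the transformation matrix and cluster standard deviations are illustrative values, not the ones used in the original script.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

n_samples = 1500
random_state = 170
X, y = make_blobs(n_samples=n_samples, random_state=random_state)

# Anisotropically distributed blobs: k-means' spherical-cluster assumption breaks
transformation = [[0.6, -0.6], [-0.4, 0.8]]
X_aniso = np.dot(X, transformation)
y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_aniso)

plt.scatter(X_aniso[:, 0], X_aniso[:, 1], c=y_pred)
plt.title("Anisotropically Distributed Blobs")
plt.show()

# Unequal variance: clusters with very different spreads also mislead k-means
X_varied, y_varied = make_blobs(n_samples=n_samples,
                                cluster_std=[1.0, 2.5, 0.5],
                                random_state=random_state)
y_pred = KMeans(n_clusters=3, random_state=random_state).fit_predict(X_varied)

plt.scatter(X_varied[:, 0], X_varied[:, 1], c=y_pred)
plt.title("Unequal Variance")
plt.show()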

Demo of DBSCAN clustering algorithm

Finds core samples of high density and expands clusters from them.

print(__doc__)

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.preprocessing import StandardScaler

Generate sample data

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
                            random_state=0)
X = StandardScaler().fit_transform(X)

Compute DBSCAN
...
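The excerpt stops where the clustering starts. A minimal sketch of the rest, using eps and min_samples values that are reasonable for this synthetic data (tune them for your own):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)
X = StandardScaler().fit_transform(X)

# core samples lie in neighborhoods of radius eps containing >= min_samples points
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

# the label -1 marks noise points, so it does not count as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print("Estimated number of clusters: %d" % n_clusters)
print("Estimated number of noise points: %d" % n_noise)
print("Adjusted Rand Index: %0.3f"
      % metrics.adjusted_rand_score(labels_true, labels))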

Demo of affinity propagation clustering algorithm

Reference: Brendan J. Frey and Delbert Dueck, "Clustering by Passing Messages Between Data Points", Science, Feb. 2007.

print(__doc__)

from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs

Generate sample data

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

Compute Affinity Propagation

af = AffinityPropagation(...)
...
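The excerpt stops at the AffinityPropagation call. A minimal sketch of the rest; preference=-50 is an illustrative value that yields a handful of clusters on this data:

from sklearn.cluster import AffinityPropagation
from sklearn import metrics
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers,
                            cluster_std=0.5, random_state=0)

# lower preference -> fewer exemplars (clusters) are chosen
af = AffinityPropagation(preference=-50).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

n_clusters_ = len(cluster_centers_indices)
print("Estimated number of clusters: %d" % n_clusters_)
print("Adjusted Rand Index: %0.3f"
      % metrics.adjusted_rand_score(labels_true, labels))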

decomposition.TruncatedSVD()

class sklearn.decomposition.TruncatedSVD(n_components=2, algorithm='randomized', n_iter=5, random_state=None, tol=0.0) [source]

Dimensionality reduction using truncated SVD (aka LSA). This transformer performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Contrary to PCA, this estimator does not center the data before computing the singular value decomposition. This means it can work with scipy.sparse matrices efficiently. In particular, truncated ...
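A minimal sketch of TruncatedSVD on a scipy.sparse input; the matrix here is random and purely illustrative:

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# a mostly-zero 100 x 100 sparse matrix; TruncatedSVD works on it without densifying
X = sparse_random(100, 100, density=0.01, random_state=42)

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
X_reduced = svd.fit_transform(X)

print(X_reduced.shape)                       # (100, 5)
print(svd.explained_variance_ratio_.sum())   # fraction of variance retained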

decomposition.SparsePCA()

class sklearn.decomposition.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None) [source]

Sparse Principal Components Analysis (SparsePCA). Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha. Read more in the User Guide.

Parameters: n_components : ...
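A minimal sketch of the interface, using random data only to show the calls; alpha and n_components are illustrative values:

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(200, 30)   # toy data; real data would carry structure to recover

# larger alpha -> stronger L1 penalty -> sparser components
spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
X_transformed = spca.fit_transform(X)

print(X_transformed.shape)            # (200, 5)
# with a strong enough L1 penalty, many loadings are driven to exactly zero
print(np.mean(spca.components_ == 0))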

decomposition.SparseCoder()

class sklearn.decomposition.SparseCoder(dictionary, transform_algorithm='omp', transform_n_nonzero_coefs=None, transform_alpha=None, split_sign=False, n_jobs=1) [source]

Sparse coding. Finds a sparse representation of data against a fixed, precomputed dictionary. Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:

X ~= code * dictionary

Read more in the User Guide.

Parameters: dictionary : array, [n_components, n_features] ...
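A minimal sketch with a random, purely illustrative dictionary; its rows are normalized to unit norm, as the coder expects:

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)

# fixed dictionary: 15 atoms of dimension 64, rows normalized to unit norm
dictionary = rng.randn(15, 64)
dictionary /= np.sqrt((dictionary ** 2).sum(axis=1))[:, np.newaxis]

X = rng.randn(10, 64)

# orthogonal matching pursuit, at most 3 nonzero coefficients per sample
coder = SparseCoder(dictionary=dictionary, transform_algorithm='omp',
                    transform_n_nonzero_coefs=3)
code = coder.transform(X)

print(code.shape)                  # (10, 15): one row of codes per sample
print((code != 0).sum(axis=1))     # at most 3 nonzeros in each row
# reconstruction against the fixed dictionary: X is approximated by code * dictionary
X_approx = np.dot(code, dictionary)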

decomposition.RandomizedPCA()

Warning: DEPRECATED

class sklearn.decomposition.RandomizedPCA(*args, **kwargs) [source]

Principal component analysis (PCA) using randomized SVD. Deprecated since version 0.18: This class will be removed in 0.20. Use PCA with parameter svd_solver 'randomized' instead. The new implementation DOES NOT store whiten components_. Apply transform to get them. Linear dimensionality reduction using approximated Singular Value Decomposition of the data and keeping only the most significant singular ...
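Since this class is deprecated, here is a minimal sketch of the suggested replacement, PCA with svd_solver='randomized'; the data and sizes are illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(500, 50)

# the randomized SVD solver is the drop-in replacement for RandomizedPCA
pca = PCA(n_components=10, svd_solver='randomized', random_state=0)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                     # (500, 10)
print(pca.explained_variance_ratio_[:3])   # variance captured by the leading components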

decomposition.ProjectedGradientNMF()

class sklearn.decomposition.ProjectedGradientNMF(*args, **kwargs) [source]

Non-Negative Matrix Factorization (NMF). Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used for example for dimensionality reduction, source separation or topic extraction. The objective function is:

0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2
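A minimal sketch of the same factorization, shown here with sklearn.decomposition.NMF (which exposes this objective, with alpha and l1_ratio controlling the regularization terms); the input must be non-negative, and the sizes and n_components are illustrative:

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
X = np.abs(rng.randn(100, 20))    # NMF requires a non-negative input matrix

# factor X ~= W H with 5 non-negative components
model = NMF(n_components=5, random_state=0)
W = model.fit_transform(X)
H = model.components_

print(W.shape, H.shape)           # (100, 5) (5, 20)
print(model.reconstruction_err_)  # Frobenius-norm reconstruction error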

decomposition.PCA()

class sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None) [source]

Principal component analysis (PCA). Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.
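A minimal sketch of basic PCA usage; the random data and sizes are illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(200, 10)

# svd_solver='auto' picks full LAPACK SVD or randomized SVD from the data shape
pca = PCA(n_components=3, svd_solver='auto')
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (200, 3)
print(pca.explained_variance_ratio_)   # variance captured by each component
print(pca.components_.shape)           # (3, 10): principal axes in feature space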