cluster.KMeans()

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

K-Means clustering. Read more in the User Guide.

Parameters:

n_clusters : int, optional, default: 8
    The number of clusters to form as well as the number of centroids to generate.

max_iter : int, default: 300
    Maximum number of iterations of the k-means algorithm for a single run.

n_init : int, default: 10
    Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
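A minimal usage sketch (the toy data and the printed attributes are illustrative, not taken from the page above):

    import numpy as np
    from sklearn.cluster import KMeans

    # Two well-separated blobs in 2-D.
    X = np.array([[1, 2], [1, 4], [1, 0],
                  [10, 2], [10, 4], [10, 0]])

    km = KMeans(n_clusters=2, random_state=0).fit(X)
    print(km.labels_)           # cluster index assigned to each sample
    print(km.cluster_centers_)  # coordinates of the learned centroids
    print(km.predict([[0, 0], [12, 3]]))  # assign new points to the nearest centroid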

cluster.FeatureAgglomeration()

class sklearn.cluster.FeatureAgglomeration(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean>)

Agglomerate features. Similar to AgglomerativeClustering, but recursively merges features instead of samples. Read more in the User Guide.

Parameters:

n_clusters : int, default 2
    The number of clusters to find.

connectivity : array-like or callable, optional
    Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data.
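A minimal sketch using FeatureAgglomeration as a dimensionality-reduction transformer (the dataset and n_clusters are illustrative choices):

    from sklearn import datasets
    from sklearn.cluster import FeatureAgglomeration

    X = datasets.load_digits().data     # shape (n_samples, 64)

    # Merge the 64 pixel features into 8 cluster-features.
    agglo = FeatureAgglomeration(n_clusters=8)
    X_reduced = agglo.fit_transform(X)  # shape (n_samples, 8)

    # Map the pooled values back to the original feature space:
    # each original feature receives its cluster's pooled value.
    X_approx = agglo.inverse_transform(X_reduced)  # shape (n_samples, 64)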

cluster.DBSCAN()

class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', algorithm='auto', leaf_size=30, p=None, n_jobs=1)

Perform DBSCAN clustering from a vector array or distance matrix. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds core samples of high density and expands clusters from them. It works well for data containing clusters of similar density. Read more in the User Guide.

Parameters:

eps : float, optional
    The maximum distance between two samples for them to be considered as in the same neighborhood.
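A minimal sketch (eps and min_samples are illustrative; the distant point at [25, 80] should come out labeled -1, DBSCAN's noise marker):

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.array([[1, 2], [2, 2], [2, 3],
                  [8, 7], [8, 8], [25, 80]])

    db = DBSCAN(eps=3, min_samples=2).fit(X)
    print(db.labels_)               # noise points get the label -1
    print(db.core_sample_indices_)  # indices of the core samples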

cluster.Birch()

class sklearn.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True)

Implements the Birch clustering algorithm. Every new sample is inserted into the root of the Clustering Feature Tree. It is then merged with the subcluster whose centroid is closest to the new sample. This is done recursively until it reaches the leaf subcluster of the tree with the closest centroid. Read more in the User Guide.

Parameters:

threshold : float, default 0.5
    The radius of the subcluster obtained by merging a new sample and the closest subcluster should be smaller than the threshold; otherwise a new subcluster is started.
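A minimal sketch (the data and threshold are illustrative; when n_clusters is set, the leaf subclusters are further grouped by a global clustering step):

    from sklearn.cluster import Birch

    X = [[0, 1], [0.3, 1], [-0.3, 1],
         [0, -1], [0.3, -1], [-0.3, -1]]

    brc = Birch(threshold=0.5, branching_factor=50, n_clusters=2)
    brc.fit(X)
    print(brc.predict(X))  # one label per sample; two groups expected here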

cluster.bicluster.SpectralCoclustering()

class sklearn.cluster.bicluster.SpectralCoclustering(n_clusters=3, svd_method='randomized', n_svd_vecs=None, mini_batch=False, init='k-means++', n_init=10, n_jobs=1, random_state=None)

Spectral Co-Clustering algorithm (Dhillon, 2001). Clusters rows and columns of an array X to solve the relaxed normalized cut of the bipartite graph created from X as follows: the edge between row vertex i and column vertex j has weight X[i, j]. The resulting bicluster structure is block-diagonal, since each row and each column belongs to exactly one bicluster. Read more in the User Guide.
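A minimal sketch on a matrix with an obvious block structure (the data is illustrative):

    import numpy as np
    from sklearn.cluster.bicluster import SpectralCoclustering

    X = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 3, 3],
                  [0, 0, 3, 3]])

    model = SpectralCoclustering(n_clusters=2, random_state=0)
    model.fit(X)
    print(model.row_labels_)     # co-cluster index of each row
    print(model.column_labels_)  # co-cluster index of each column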

cluster.bicluster.SpectralBiclustering()

class sklearn.cluster.bicluster.SpectralBiclustering(n_clusters=3, method='bistochastic', n_components=6, n_best=3, svd_method='randomized', n_svd_vecs=None, mini_batch=False, init='k-means++', n_init=10, n_jobs=1, random_state=None)

Spectral biclustering (Kluger, 2003). Partitions rows and columns under the assumption that the data has an underlying checkerboard structure. For instance, if there are two row partitions and three column partitions, each row will belong to three biclusters and each column will belong to two biclusters. Read more in the User Guide.
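A minimal sketch on synthetic checkerboard data (make_checkerboard and the cluster counts are illustrative choices):

    from sklearn.datasets import make_checkerboard
    from sklearn.cluster.bicluster import SpectralBiclustering

    # A 30x30 matrix whose rows and columns hide a shuffled 2x3 grid of blocks.
    X, rows, cols = make_checkerboard(shape=(30, 30), n_clusters=(2, 3),
                                      noise=0.5, random_state=0)

    model = SpectralBiclustering(n_clusters=(2, 3), random_state=0)
    model.fit(X)
    print(model.row_labels_)     # row partition index per row
    print(model.column_labels_)  # column partition index per column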

cluster.AgglomerativeClustering()

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean>)

Agglomerative Clustering. Recursively merges the pair of clusters that minimally increases a given linkage distance. Read more in the User Guide.

Parameters:

n_clusters : int, default=2
    The number of clusters to find.

connectivity : array-like or callable, optional
    Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data.
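A minimal sketch (the data is illustrative; ward linkage with the default euclidean affinity):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    X = np.array([[1, 2], [1, 4], [1, 0],
                  [4, 2], [4, 4], [4, 0]])

    model = AgglomerativeClustering(n_clusters=2, linkage='ward')
    labels = model.fit_predict(X)
    print(labels)  # one cluster label per sample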

cluster.AffinityPropagation()

class sklearn.cluster.AffinityPropagation(damping=0.5, max_iter=200, convergence_iter=15, copy=True, preference=None, affinity='euclidean', verbose=False)

Perform Affinity Propagation clustering of data. Read more in the User Guide.

Parameters:

damping : float, optional, default: 0.5
    Damping factor between 0.5 and 1.

convergence_iter : int, optional, default: 15
    Number of iterations with no change in the number of estimated clusters that stops the convergence.

max_iter : int, optional, default: 200
    Maximum number of iterations.
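A minimal sketch (the data is illustrative; affinity propagation chooses the number of clusters itself, guided by the preference parameter):

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    X = np.array([[1, 2], [1, 4], [1, 0],
                  [4, 2], [4, 4], [4, 0]])

    ap = AffinityPropagation(damping=0.5).fit(X)
    print(ap.labels_)                   # cluster label per sample
    print(ap.cluster_centers_indices_)  # indices of the exemplar samples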

Classifier comparison

A comparison of several classifiers in scikit-learn on synthetic datasets. The point of this example is to illustrate the nature of decision boundaries of different classifiers. This should be taken with a grain of salt, as the intuition conveyed by these examples does not necessarily carry over to real datasets. Particularly in high-dimensional spaces, data can more easily be separated linearly, and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.
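A condensed sketch of the comparison (the dataset, classifier choices, and hyperparameters are illustrative; the full example also plots each decision boundary):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    X, y = make_moons(noise=0.3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Fit each classifier on the same split and compare test accuracy.
    for clf in (GaussianNB(), SVC(kernel='linear', C=0.025), SVC(gamma=2, C=1)):
        clf.fit(X_train, y_train)
        print(type(clf).__name__, clf.score(X_test, y_test))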

Classification of text documents using sparse features

This is an example showing how scikit-learn can be used to classify documents by topic using a bag-of-words approach. The example uses a scipy.sparse matrix to store the features and demonstrates various classifiers that can efficiently handle sparse matrices. The dataset used is the 20 newsgroups dataset; it is downloaded automatically and then cached. The bar plot indicates the accuracy, training time (normalized), and test time (normalized) of each classifier.
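A condensed sketch of the pipeline (the category subset and the choice of MultinomialNB are illustrative; the full example benchmarks many classifiers and plots the results):

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn import metrics

    categories = ['alt.atheism', 'sci.space']
    train = fetch_20newsgroups(subset='train', categories=categories)
    test = fetch_20newsgroups(subset='test', categories=categories)

    # TF-IDF yields a scipy.sparse matrix; the classifier consumes it directly.
    vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english')
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)

    clf = MultinomialNB(alpha=0.01).fit(X_train, train.target)
    pred = clf.predict(X_test)
    print(metrics.accuracy_score(test.target, pred))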