sklearn.cluster.spectral_clustering()

sklearn.cluster.spectral_clustering(affinity, n_clusters=8, n_components=None, eigen_solver=None, random_state=None, n_init=10, eigen_tol=0.0, assign_labels='kmeans')

Apply clustering to a projection of the normalized Laplacian. In practice, spectral clustering is very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance, when clusters are nested circles on the 2D plane.
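
A minimal usage sketch (not part of the original entry), assuming an RBF affinity matrix built with sklearn.metrics.pairwise.rbf_kernel; the gamma value and the nested-circles toy data are arbitrary illustrations:

    from sklearn.cluster import spectral_clustering
    from sklearn.datasets import make_circles
    from sklearn.metrics.pairwise import rbf_kernel

    # Two nested circles: non-convex clusters where a centroid-based method struggles
    X, _ = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)
    affinity = rbf_kernel(X, gamma=10.0)  # dense (n_samples, n_samples) similarity matrix
    labels = spectral_clustering(affinity, n_clusters=2, random_state=0)
    print(labels[:10])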

sklearn.cluster.mean_shift()

sklearn.cluster.mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, max_iter=300, n_jobs=1)

Perform mean shift clustering of data using a flat kernel. Read more in the User Guide.

Parameters:
    X : array-like, shape=[n_samples, n_features]
        Input data.
    bandwidth : float, optional
        Kernel bandwidth. If bandwidth is not given, it is determined using a heuristic based on the median of all pairwise distances. This will take quadratic time in the number of samples.
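
A minimal sketch of calling mean_shift on toy blob data; the quantile and sample counts are arbitrary choices, and estimate_bandwidth (documented below) supplies the bandwidth:

    import numpy as np
    from sklearn.cluster import estimate_bandwidth, mean_shift
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    bw = estimate_bandwidth(X, quantile=0.3, n_samples=200, random_state=0)
    # mean_shift returns the cluster centers and one label per sample
    centers, labels = mean_shift(X, bandwidth=bw)
    print(centers.shape, np.unique(labels))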

sklearn.cluster.k_means()

sklearn.cluster.k_means(X, n_clusters, init='k-means++', precompute_distances='auto', n_init=10, max_iter=300, verbose=False, tol=0.0001, random_state=None, copy_x=True, n_jobs=1, algorithm='auto', return_n_iter=False)

K-means clustering algorithm. Read more in the User Guide.

Parameters:
    X : array-like or sparse matrix, shape (n_samples, n_features)
        The observations to cluster.
    n_clusters : int
        The number of clusters to form as well as the number of centroids to generate.
    max_iter : int, optional, default 300
        Maximum number of iterations of the k-means algorithm to run.
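
A minimal sketch: the function form (as opposed to the KMeans estimator) returns the centroids, the label of each observation and the final inertia; the toy data below is an arbitrary illustration:

    from sklearn.cluster import k_means
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
    # centroids: (n_clusters, n_features); labels: (n_samples,); inertia: sum of squared distances
    centroids, labels, inertia = k_means(X, n_clusters=4, random_state=0)
    print(centroids.shape, inertia)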

sklearn.cluster.estimate_bandwidth()

sklearn.cluster.estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0, n_jobs=1)

Estimate the bandwidth to use with the mean-shift algorithm. Note that this function takes time at least quadratic in n_samples. For large datasets, it is wise to set the n_samples parameter to a small value.

Parameters:
    X : array-like, shape=[n_samples, n_features]
        Input points.
    quantile : float, default 0.3
        Should be between [0, 1]; 0.5 means that the median of all pairwise distances is used.
    n_samples : int, optional
        The number of samples to use. If not given, all samples are used.
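
A minimal sketch showing the n_samples parameter mentioned above, which subsamples the data so the pairwise-distance computation stays tractable; the sizes are arbitrary:

    from sklearn.cluster import estimate_bandwidth
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=10000, centers=5, random_state=0)
    # Estimate on a 500-point subsample rather than all 10000 points
    bw = estimate_bandwidth(X, quantile=0.3, n_samples=500, random_state=0)
    print(bw)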

sklearn.cluster.affinity_propagation()

sklearn.cluster.affinity_propagation(S, preference=None, convergence_iter=15, max_iter=200, damping=0.5, copy=True, verbose=False, return_n_iter=False)

Perform Affinity Propagation clustering of data. Read more in the User Guide.

Parameters:
    S : array-like, shape (n_samples, n_samples)
        Matrix of similarities between points.
    preference : array-like, shape (n_samples,) or float, optional
        Preferences for each point; points with larger values of preferences are more likely to be chosen as exemplars.
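
A minimal sketch, assuming the common choice of negative squared Euclidean distances as the similarity matrix S; the toy data is an arbitrary illustration:

    import numpy as np
    from sklearn.cluster import affinity_propagation
    from sklearn.datasets import make_blobs
    from sklearn.metrics.pairwise import euclidean_distances

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
    S = -euclidean_distances(X, squared=True)  # larger values mean more similar
    # Returns the indices of the exemplar points and one label per sample
    cluster_center_indices, labels = affinity_propagation(S)
    print(len(cluster_center_indices), np.unique(labels))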

sklearn.calibration.calibration_curve()

sklearn.calibration.calibration_curve(y_true, y_prob, normalize=False, n_bins=5)

Compute true and predicted probabilities for a calibration curve. Read more in the User Guide.

Parameters:
    y_true : array, shape (n_samples,)
        True targets.
    y_prob : array, shape (n_samples,)
        Probabilities of the positive class.
    normalize : bool, optional, default=False
        Whether y_prob needs to be normalized into the interval [0, 1], i.e. is not a proper probability. If True, the smallest value in y_prob is mapped onto 0 and the largest one onto 1.
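
A minimal sketch, assuming a GaussianNB classifier on synthetic data purely for illustration; calibration_curve bins the predicted probabilities and returns, per bin, the observed fraction of positives and the mean predicted probability:

    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    y_prob = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_prob, n_bins=5)
    print(fraction_of_positives)
    print(mean_predicted_value)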

sklearn.base.clone()

sklearn.base.clone(estimator, safe=True)

Constructs a new estimator with the same parameters. Clone does a deep copy of the model in an estimator without actually copying attached data. It yields a new estimator with the same parameters that has not been fit on any data.

Parameters:
    estimator : estimator object, or list, tuple or set of objects
        The estimator or group of estimators to be cloned.
    safe : boolean, optional
        If safe is False, clone will fall back to a deep copy on objects that are not estimators.
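
A minimal sketch: the clone carries the same hyper-parameters as the original estimator but none of its fitted state; LogisticRegression is just an arbitrary example estimator:

    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import LogisticRegression

    est = LogisticRegression(C=0.5).fit(np.array([[0.0], [1.0]]), [0, 1])
    new_est = clone(est)
    print(new_est.get_params()['C'])   # 0.5, hyper-parameters are copied
    print(hasattr(new_est, 'coef_'))   # False, fitted attributes are not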

Single estimator versus bagging

This example illustrates and compares the bias-variance decomposition of the expected mean squared error of a single estimator against a bagging ensemble. In regression, the expected mean squared error of an estimator can be decomposed in terms of bias, variance and noise. On average over datasets of the regression problem, the bias term measures the average amount by which the predictions of the estimator differ from the predictions of the best possible estimator for the problem (i.e., the Bayes model).
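
A minimal sketch of the comparison (not the full example code), assuming an arbitrary noisy 1D regression task: bagging many trees typically lowers the variance part of the error relative to a single tree:

    import numpy as np
    from sklearn.ensemble import BaggingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.exp(-X[:, 0] ** 2) + 1.5 * np.exp(-(X[:, 0] - 2) ** 2) + rng.normal(0, 0.1, 300)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
    bagging = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                               random_state=0).fit(X_train, y_train)
    print(mean_squared_error(y_test, tree.predict(X_test)))
    print(mean_squared_error(y_test, bagging.predict(X_test)))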

Simple 1D Kernel Density Estimation

This example uses the sklearn.neighbors.KernelDensity class to demonstrate the principles of Kernel Density Estimation in one dimension. The first plot shows one of the problems with using histograms to visualize the density of points in 1D. Intuitively, a histogram can be thought of as a scheme in which a unit "block" is stacked above each point on a regular grid. As the top two panels show, however, the choice of gridding for these blocks can lead to wildly divergent ideas about the underlying shape of the density distribution.
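
A minimal sketch of the idea (not the full example code), assuming a Gaussian kernel and an arbitrary bandwidth: a KernelDensity model is fitted to 1D samples and the density is evaluated on a grid instead of being binned into a histogram:

    import numpy as np
    from sklearn.neighbors import KernelDensity

    rng = np.random.RandomState(0)
    X = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])[:, np.newaxis]
    X_plot = np.linspace(-5, 10, 200)[:, np.newaxis]

    kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)
    density = np.exp(kde.score_samples(X_plot))  # score_samples returns the log-density
    print(density[:5])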

Shrinkage covariance estimation

When working with covariance estimation, the usual approach is to use a maximum likelihood estimator, such as sklearn.covariance.EmpiricalCovariance. It is unbiased, i.e. it converges to the true (population) covariance when given many observations. However, it can also be beneficial to regularize it in order to reduce its variance; this, in turn, introduces some bias. This example illustrates the simple regularization used in shrunk covariance estimators. In particular, it focuses on how to set the amount of regularization, i.e. how to choose the bias-variance trade-off.
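
A minimal sketch (not the full example code), assuming an arbitrary small Gaussian sample and an arbitrary shrinkage value: the shrunk estimate is typically better conditioned than the maximum likelihood estimate when observations are scarce:

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, ShrunkCovariance

    rng = np.random.RandomState(0)
    X = rng.multivariate_normal(mean=np.zeros(5), cov=np.eye(5), size=30)

    emp = EmpiricalCovariance().fit(X)
    shrunk = ShrunkCovariance(shrinkage=0.2).fit(X)
    # Condition numbers of the two covariance estimates
    print(np.linalg.cond(emp.covariance_), np.linalg.cond(shrunk.covariance_))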