sklearn.datasets.make_biclusters()

sklearn.datasets.make_biclusters(shape, n_clusters, noise=0.0, minval=10, maxval=100, shuffle=True, random_state=None) [source] Generate an array with constant block diagonal structure for biclustering. Read more in the User Guide. Parameters: shape : iterable (n_rows, n_cols) The shape of the result. n_clusters : integer The number of biclusters. noise : float, optional (default=0.0) The standard deviation of the gaussian noise. minval : int, optional (default=10) Minimum value of

Ledoit-Wolf vs OAS estimation

The usual covariance maximum likelihood estimate can be regularized using shrinkage. Ledoit and Wolf proposed a close formula to compute the asymptotically optimal shrinkage parameter (minimizing a MSE criterion), yielding the Ledoit-Wolf covariance estimate. Chen et al. proposed an improvement of the Ledoit-Wolf shrinkage parameter, the OAS coefficient, whose convergence is significantly better under the assumption that the data are Gaussian. This example, inspired from Chen?s publication [1]

feature_selection.SelectFpr()

class sklearn.feature_selection.SelectFpr(score_func=, alpha=0.05) [source] Filter: Select the pvalues below alpha based on a FPR test. FPR test stands for False Positive Rate test. It controls the total amount of false detections. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below ?See also?). The default function only works with classification tasks. alpha :

cross_validation.LeavePLabelOut()

Warning DEPRECATED class sklearn.cross_validation.LeavePLabelOut(labels, p) [source] Leave-P-Label_Out cross-validation iterator Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.LeavePGroupsOut instead. Provides train/test indices to split data according to a third-party provided label. This label information can be used to encode arbitrary domain specific stratifications of the samples as integers. For instance the labels could be the year

Image denoising using dictionary learning

An example comparing the effect of reconstructing noisy fragments of a raccoon face image using firstly online Dictionary Learning and various transform methods. The dictionary is fitted on the distorted left half of the image, and subsequently used to reconstruct the right half. Note that even better performance could be achieved by fitting to an undistorted (i.e. noiseless) image, but here we start from the assumption that it is not available. A common practice for evaluating the results of

gaussian_process.kernels.Sum()

class sklearn.gaussian_process.kernels.Sum(k1, k2) [source] Sum-kernel k1 + k2 of two kernels k1 and k2. The resulting kernel is defined as k_sum(X, Y) = k1(X, Y) + k2(X, Y) New in version 0.18. Parameters: k1 : Kernel object The first base-kernel of the sum-kernel k2 : Kernel object The second base-kernel of the sum-kernel Methods clone_with_theta(theta) Returns a clone of self with given hyperparameters theta. diag(X) Returns the diagonal of the kernel k(X, X). get_params([deep

manifold.TSNE()

class sklearn.manifold.TSNE(n_components=2, perplexity=30.0, early_exaggeration=4.0, learning_rate=1000.0, n_iter=1000, n_iter_without_progress=30, min_grad_norm=1e-07, metric='euclidean', init='random', verbose=0, random_state=None, method='barnes_hut', angle=0.5) [source] t-distributed Stochastic Neighbor Embedding. t-SNE [1] is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergen

Selecting the number of clusters with silhouette analysis on KMeans clustering

Silhouette analysis can be used to study the separation distance between the resulting clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the neighboring clusters and thus provides a way to assess parameters like number of clusters visually. This measure has a range of [-1, 1]. Silhouette coefficients (as these values are referred to as) near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that t

RBF SVM parameters

This example illustrates the effect of the parameters gamma and C of the Radial Basis Function (RBF) kernel SVM. Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ?far? and high values meaning ?close?. The gamma parameters can be seen as the inverse of the radius of influence of samples selected by the model as support vectors. The C parameter trades off misclassification of training examples against simplicity of the d

sklearn.datasets.make_hastie_10_2()

sklearn.datasets.make_hastie_10_2(n_samples=12000, random_state=None) [source] Generates data for binary classification used in Hastie et al. 2009, Example 10.2. The ten features are standard independent Gaussian and the target y is defined by: y[i] = 1 if np.sum(X[i] ** 2) > 9.34 else -1 Read more in the User Guide. Parameters: n_samples : int, optional (default=12000) The number of samples. random_state : int, RandomState instance or None, optional (default=None) If int, random_st