sklearn.feature_selection.f_classif()

sklearn.feature_selection.f_classif(X, y) [source]
Compute the ANOVA F-value for the provided sample. Read more in the User Guide.
Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
The set of regressors that will be tested sequentially.
y : array of shape (n_samples,)
The target vector (class labels for each sample).
Returns:
F : array, shape = [n_features,]
The set of F values.
pval : array, shape = [n_features,]
The set of p-values.
See also:
chi2 : Chi-squared stats of non-negative features.
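A minimal usage sketch on synthetic data (the make_classification call below is an assumed toy setup, not part of the entry above):

# Score each feature of a synthetic binary problem with the ANOVA F-test.
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=100, n_features=5, n_informative=2, random_state=0)
F, pval = f_classif(X, y)
print(F.shape, pval.shape)  # both (5,): one F value and one p-value per feature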

sklearn.metrics.pairwise.sigmoid_kernel()

sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, gamma=None, coef0=1) [source]
Compute the sigmoid kernel between X and Y:
K(X, Y) = tanh(gamma <X, Y> + coef0)
Read more in the User Guide.
Parameters:
X : ndarray of shape (n_samples_1, n_features)
Y : ndarray of shape (n_samples_2, n_features)
gamma : float, default None
If None, defaults to 1.0 / n_features.
coef0 : int, default 1
Returns:
Gram matrix : array of shape (n_samples_1, n_samples_2)
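A short sketch, assuming small random matrices, that also checks the kernel against the formula above:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.RandomState(0)
X = rng.randn(4, 3)  # 4 samples, 3 features
Y = rng.randn(5, 3)  # 5 samples, 3 features
K = sigmoid_kernel(X, Y, gamma=0.5, coef0=1)
print(K.shape)  # (4, 5) Gram matrix

# Same values computed by hand: tanh(gamma * <X, Y> + coef0)
K_manual = np.tanh(0.5 * X.dot(Y.T) + 1)
print(np.allclose(K, K_manual))  # True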

svm.SVC()

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None) [source]
C-Support Vector Classification. The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples, which makes it hard to scale to datasets with more than a couple of 10000 samples. The multiclass support is handled according to a one-vs-one scheme.
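A minimal sketch, assuming the bundled iris dataset and a simple train/test split (neither is part of the entry above):

# Fit an RBF-kernel SVC and report held-out accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(C=1.0, kernel='rbf', gamma='auto')
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the test split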

sklearn.metrics.consensus_score()

sklearn.metrics.consensus_score(a, b, similarity='jaccard') [source]
The similarity of two sets of biclusters. Similarity between individual biclusters is computed. Then the best matching between sets is found using the Hungarian algorithm. The final score is the sum of similarities divided by the size of the larger set. Read more in the User Guide.
Parameters:
a : (rows, columns)
Tuple of row and column indicators for a set of biclusters.
b : (rows, columns)
Another set of biclusters like a.
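A sketch of comparing found biclusters with ground truth; it assumes make_biclusters and SpectralCoclustering, which are not described in the entry above:

from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering
from sklearn.metrics import consensus_score

# Block-structured data with known row/column bicluster indicators.
data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)
model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)

# Compare the model's biclusters against the ground truth.
score = consensus_score(model.biclusters_, (rows, columns))
print(score)  # close to 1.0 when the biclusters are recovered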

linear_model.LassoCV()

class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1, positive=False, random_state=None, selection='cyclic') [source]
Lasso linear model with iterative fitting along a regularization path. The best model is selected by cross-validation. The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Read more in the User Guide.
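A brief sketch on synthetic regression data (the make_regression setup is an assumption for illustration):

# Let LassoCV pick alpha by 5-fold cross-validation along its own path.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0)
reg = LassoCV(cv=5, random_state=0)
reg.fit(X, y)
print(reg.alpha_)       # regularization strength chosen by cross-validation
print(reg.coef_.shape)  # (20,) fitted coefficients, many driven to zero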

sklearn.utils.resample()

sklearn.utils.resample(*arrays, **options) [source]
Resample arrays or sparse matrices in a consistent way. The default strategy implements one step of the bootstrapping procedure.
Parameters:
*arrays : sequence of indexable data-structures
Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.
replace : boolean, True by default
Implements resampling with replacement. If False, this will implement (sliced) random permutations.
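A minimal bootstrap-step sketch; the tiny arrays below are assumed toy inputs:

import numpy as np
from sklearn.utils import resample

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])

# Sample with replacement; X and y are resampled consistently (rows stay aligned).
X_boot, y_boot = resample(X, y, replace=True, n_samples=3, random_state=0)
print(X_boot)
print(y_boot)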

sklearn.datasets.get_data_home()

sklearn.datasets.get_data_home(data_home=None) [source]
Return the path of the scikit-learn data dir. This folder is used by some large dataset loaders to avoid downloading the data several times. By default the data dir is set to a folder named 'scikit_learn_data' in the user home folder. Alternatively, it can be set by the 'SCIKIT_LEARN_DATA' environment variable or programmatically by giving an explicit folder path. The '~' symbol is expanded to the user home folder. If the folder does not already exist, it is automatically created.
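A quick sketch of inspecting and overriding the data directory (the explicit path below is only an example):

from sklearn.datasets import get_data_home

# Default location: ~/scikit_learn_data, unless SCIKIT_LEARN_DATA is set.
print(get_data_home())

# Explicit folder; it is created if it does not already exist.
print(get_data_home(data_home='/tmp/sklearn_data'))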

sklearn.datasets.make_classification()

sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None) [source]
Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of a 2 * class_sep-sided hypercube, and assigns an equal number of clusters to each class.
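A minimal sketch using the defaults shown in the signature above:

# Draw a small synthetic binary classification problem.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=2, n_classes=2, class_sep=1.0,
                           random_state=0)
print(X.shape)         # (100, 20)
print(np.bincount(y))  # roughly balanced class counts by default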

Density Estimation for a Gaussian mixture

Plot the density estimation of a mixture of two Gaussians. Data is generated from two Gaussians with different centers and covariance matrices.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from sklearn import mixture

n_samples = 300

# generate random sample, two components
np.random.seed(0)

# generate spherical data centered on (20, 20)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

# generate zero centered stretched Gaussian data
# (the transform matrix below is an illustrative choice)
C = np.array([[0., -0.7], [3.5, .7]])
stretched_gaussian = np.dot(np.random.randn(n_samples, 2), C)
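A hedged sketch of how the example plausibly continues from here, using the sklearn.mixture API (these lines are an assumption, not part of the excerpt above):

# concatenate the two datasets into the final training set
X_train = np.vstack([shifted_gaussian, stretched_gaussian])

# fit a Gaussian mixture model with two components and full covariances
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X_train)

# score_samples returns the per-sample log-likelihood of the learned density,
# which the original example turns into a contour plot with LogNorm scaling
print(clf.score_samples(X_train[:5]))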

sklearn.covariance.ledoit_wolf()

sklearn.covariance.ledoit_wolf(X, assume_centered=False, block_size=1000) [source]
Estimates the shrunk Ledoit-Wolf covariance matrix. Read more in the User Guide.
Parameters:
X : array-like, shape (n_samples, n_features)
Data from which to compute the covariance estimate.
assume_centered : boolean, default=False
If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly, zero. If False, data are centered before computation.
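A short sketch on random data (the data shape is an assumption for illustration):

# Compute the Ledoit-Wolf shrunk covariance and the estimated shrinkage.
import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
X = rng.randn(50, 10)  # 50 samples, 10 features
shrunk_cov, shrinkage = ledoit_wolf(X)
print(shrunk_cov.shape)  # (10, 10) shrunk covariance matrix
print(shrinkage)         # shrinkage coefficient in [0, 1]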