mixture.GaussianMixture()

class sklearn.mixture.GaussianMixture(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10) [source] Gaussian Mixture. Representation of a Gaussian mixture model probability distribution. This class allows estimation of the parameters of a Gaussian mixture distribution. New in version 0.18.
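
A minimal usage sketch: fit a two-component mixture and read back the estimated means (the toy data below is illustrative, not from the documentation).

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two illustrative 2-D clusters, offset from each other
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 5])

gm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gm.fit(X)
print(gm.means_)          # estimated component means
print(gm.predict(X[:5]))  # hard cluster assignments for a few samples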

sklearn.metrics.pairwise_distances_argmin()

sklearn.metrics.pairwise_distances_argmin(X, Y, axis=1, metric='euclidean', batch_size=500, metric_kwargs=None) [source] Compute minimum distances between one point and a set of points. This function computes, for each row in X, the index of the row of Y that is closest (according to the specified distance). This is mostly equivalent to calling: pairwise_distances(X, Y=Y, metric=metric).argmin(axis=axis) but uses much less memory, and is faster for large arrays. This function works with dense 2D arrays only.
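
A short sketch with made-up coordinates: for each row of X, find the index of the nearest row of Y.

import numpy as np
from sklearn.metrics import pairwise_distances_argmin

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.1, 0.0], [2.0, 2.0], [0.9, 1.1]])

# Index of the closest row of Y for each row of X
print(pairwise_distances_argmin(X, Y))  # [0 2]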

linear_model.Lars()

class sklearn.linear_model.Lars(fit_intercept=True, verbose=False, normalize=True, precompute='auto', n_nonzero_coefs=500, eps=2.2204460492503131e-16, copy_X=True, fit_path=True, positive=False) [source] Least Angle Regression model, a.k.a. LAR. Read more in the User Guide. Parameters: n_nonzero_coefs : int, optional Target number of non-zero coefficients. Use np.inf for no limit. fit_intercept : boolean Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).
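
A minimal sketch, using a tiny illustrative dataset and capping the model at one non-zero coefficient:

import numpy as np
from sklearn.linear_model import Lars

X = np.array([[-1.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
y = np.array([-1.1, 0.0, 1.1])

# Restrict the fit to a single non-zero coefficient
reg = Lars(n_nonzero_coefs=1)
reg.fit(X, y)
print(reg.coef_)  # only one coefficient is allowed to be non-zero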

1.5. Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions such as (linear) Support Vector Machines and Logistic Regression. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. SGD has been successfully applied to large-scale and sparse machine learning problems often encountered in text classification and natural language processing.
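
A minimal sketch of training a linear SVM with SGD via SGDClassifier (the two-point dataset is purely illustrative):

from sklearn.linear_model import SGDClassifier

X = [[0.0, 0.0], [1.0, 1.0]]
y = [0, 1]

# hinge loss + L2 penalty corresponds to a linear SVM trained with SGD
clf = SGDClassifier(loss='hinge', penalty='l2')
clf.fit(X, y)
print(clf.predict([[2.0, 2.0]]))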

svm.SVC()

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None) [source] C-Support Vector Classification. The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples, which makes it hard to scale to datasets with more than a couple of 10000 samples. The multiclass support is handled according to a one-vs-one scheme.
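
A minimal usage sketch with an illustrative two-sample dataset:

from sklearn.svm import SVC

X = [[0.0, 0.0], [1.0, 1.0]]
y = [0, 1]

# RBF kernel with the default regularization strength
clf = SVC(C=1.0, kernel='rbf', gamma='auto')
clf.fit(X, y)
print(clf.predict([[2.0, 2.0]]))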

feature_selection.SelectFwe()

class sklearn.feature_selection.SelectFwe(score_func=f_classif, alpha=0.05) [source] Filter: Select the p-values corresponding to the family-wise error rate. Read more in the User Guide. Parameters: score_func : callable Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see "See also" below). The default function only works with classification tasks. alpha : float, optional The highest uncorrected p-value for features to keep. Attributes: scores_ : array-like, shape=(n_features,) Scores of features.
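
A minimal sketch on the iris dataset: keep only features whose FWE-corrected p-value falls below alpha.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFwe, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectFwe(score_func=f_classif, alpha=0.05)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # number of columns = features surviving the FWE threshold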

linear_model.RidgeCV()

class sklearn.linear_model.RidgeCV(alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, store_cv_values=False) [source] Ridge regression with built-in cross-validation. By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Read more in the User Guide. Parameters: alphas : numpy array of shape [n_alphas] Array of alpha values to try. Regularization strength; must be a positive float.
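
A minimal sketch on the diabetes dataset: fit with the default alpha grid and read back the selected regularization strength.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

X, y = load_diabetes(return_X_y=True)

# Generalized (leave-one-out) cross-validation over the alpha grid
reg = RidgeCV(alphas=(0.1, 1.0, 10.0))
reg.fit(X, y)
print(reg.alpha_)  # alpha selected by cross-validation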

sklearn.utils.resample()

sklearn.utils.resample(*arrays, **options) [source] Resample arrays or sparse matrices in a consistent way. The default strategy implements one step of the bootstrapping procedure. Parameters: *arrays : sequence of indexable data-structures Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension. replace : boolean, True by default Implements resampling with replacement. If False, this will implement (sliced) random permutations.
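
A minimal sketch of one bootstrap step: X and y are resampled with replacement using the same row indices, so they stay aligned.

import numpy as np
from sklearn.utils import resample

X = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]])
y = np.array([0, 1, 2])

X_boot, y_boot = resample(X, y, random_state=0)
print(X_boot)
print(y_boot)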

Pixel importances with a parallel forest of trees

This example shows the use of forests of trees to evaluate the importance of the pixels in an image classification task (faces). The hotter the pixel, the more important it is. The code also illustrates how the construction of the forest and the computation of the predictions can be parallelized across multiple jobs. Out: Fitting ExtraTreesClassifier on faces data with 1 cores... done in 3.341s
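
A condensed sketch of the example's core steps, assuming the Olivetti faces dataset (fetch_olivetti_faces) used by the full example; the plotting code is omitted here.

from sklearn.datasets import fetch_olivetti_faces
from sklearn.ensemble import ExtraTreesClassifier

# Load the faces data (downloads on first use)
data = fetch_olivetti_faces()
X, y = data.data, data.target

# Fit a forest in parallel; feature_importances_ yields one score per pixel
forest = ExtraTreesClassifier(n_estimators=1000, max_features=128,
                              n_jobs=1, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_.reshape(data.images[0].shape)
print(importances.shape)  # (64, 64): an importance map over the face image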

sklearn.preprocessing.add_dummy_feature()

sklearn.preprocessing.add_dummy_feature(X, value=1.0) [source] Augment dataset with an additional dummy feature. This is useful for fitting an intercept term with implementations which cannot otherwise fit it directly. Parameters: X : {array-like, sparse matrix}, shape [n_samples, n_features] Data. value : float Value to use for the dummy feature. Returns: X : {array, sparse matrix}, shape [n_samples, n_features + 1] Same data with the dummy feature added as the first column.
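
A minimal sketch: each row gains a leading constant column equal to value (1.0 by default).

from sklearn.preprocessing import add_dummy_feature

print(add_dummy_feature([[0.0, 1.0], [1.0, 0.0]]))
# [[1. 0. 1.]
#  [1. 1. 0.]]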