class sklearn.mixture.BayesianGaussianMixture(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=None, mean_precision_prior=None, mean_prior=None, degrees_of_freedom_prior=None, covariance_prior=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10) [source] Variational Bayesian estimation of a Gaussian mixture. This class allows

Sample pipeline for text feature extraction and evaluation

The dataset used in this example is the 20 newsgroups dataset which will be automatically downloaded and then cached and reused for the document classification example. You can adjust the number of categories by giving their names to the dataset loader or setting them to None to get the 20 of them. Here is a sample output of a run on a quad-core machine: Loading 20 newsgroups dataset for categories: ['alt.atheism', 'talk.religion.misc'] 1427 documents 2 categories Performing grid search... pi

2.1. Gaussian mixture models

sklearn.mixture is a package which enables one to learn Gaussian Mixture Models (diagonal, spherical, tied and full covariance matrices supported), sample them, and estimate them from data. Facilities to help determine the appropriate number of components are also provided. Two-component Gaussian mixture model: data points, and equi-probability surfaces of the model. A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite

neighbors.RadiusNeighborsClassifier()

class sklearn.neighbors.RadiusNeighborsClassifier(radius=1.0, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', outlier_label=None, metric_params=None, **kwargs) [source] Classifier implementing a vote among neighbors within a given radius Read more in the User Guide. Parameters: radius : float, optional (default = 1.0) Range of parameter space to use by default for :meth`radius_neighbors` queries. weights : str or callable weight function used in prediction. P

FastICA on 2D point clouds

This example illustrates visually in the feature space a comparison by results using two different component analysis techniques. Independent component analysis (ICA) vs Principal component analysis (PCA). Representing ICA in the feature space gives the view of ?geometric ICA?: ICA is an algorithm that finds directions in the feature space corresponding to projections with high non-Gaussianity. These directions need not be orthogonal in the original feature space, but they are orthogonal in th

svm.SVR()

class sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1) [source] Epsilon-Support Vector Regression. The free parameters in the model are C and epsilon. The implementation is based on libsvm. Read more in the User Guide. Parameters: C : float, optional (default=1.0) Penalty parameter C of the error term. epsilon : float, optional (default=0.1) Epsilon in the epsilon-SVR model. It spe

3.4. Model persistence

After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain. The following section gives you an example of how to persist a model with pickle. We?ll also review a few security and maintainability issues when working with pickle serialization. 3.4.1. Persistence example It is possible to save a model in the scikit by using Python?s built-in persistence model, namely pickle: >>> from sklearn import svm >>> fr

2.2. Manifold learning

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high. 2.2.1. Introduction High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension

4.3. Preprocessing data

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. 4.3.1. Standardization, or mean removal and variance scaling Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn; they might behave badly if the individual features do not more or less look like standard normally distributed da

qda.QDA()

Warning DEPRECATED class sklearn.qda.QDA(priors=None, reg_param=0.0, store_covariances=False, tol=0.0001) [source] Alias for sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis. Deprecated since version 0.17: This class will be removed in 0.19. Use sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis instead. Methods decision_function(X) Apply decision function to an array of samples. fit(X, y[, store_covariances, tol]) Fit the model according to the given training data