Gaussian Processes regression

A simple one-dimensional regression example computed in two different ways: a noise-free case, and a noisy case with a known noise level per datapoint. In both cases, the kernel's parameters are estimated using the maximum likelihood principle. The figures illustrate the interpolating property of the Gaussian Process model as well as its probabilistic nature in the form of a pointwise 95% confidence interval. Note that the parameter alpha is applied as a Tikhonov regularization of the assumed covari
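A minimal sketch of both cases with GaussianProcessRegressor follows. The toy target x·sin(x), the six training inputs, the kernel bounds and the noise levels are assumptions chosen for illustration. Kernel hyperparameters are fitted by maximizing the log-marginal-likelihood, and in the noisy case the known per-point noise variance is passed as alpha, which is added to the diagonal of the training covariance.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def f(x):
    """Toy target function, assumed only for illustration."""
    return x * np.sin(x)


rng = np.random.RandomState(0)
X = np.atleast_2d([1.0, 3.0, 5.0, 6.0, 7.0, 8.0]).T  # 6 training inputs, shape (6, 1)
y = f(X).ravel()
x_grid = np.atleast_2d(np.linspace(0, 10, 200)).T   # prediction inputs

# Amplitude * RBF; both hyperparameters are tuned during fit by maximizing
# the log-marginal-likelihood (restarts guard against poor local optima).
kernel = ConstantKernel(1.0, (1e-3, 1e3)) * RBF(10.0, (1e-2, 1e2))

# Noise-free case: alpha keeps its tiny default, so the posterior mean interpolates.
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, random_state=0)
gp.fit(X, y)
y_pred, sigma = gp.predict(x_grid, return_std=True)
lower, upper = y_pred - 1.96 * sigma, y_pred + 1.96 * sigma  # pointwise 95% interval

# Noisy case: the per-point noise std dy is assumed known; its variance is
# passed as alpha and added to the diagonal of the training covariance.
dy = 0.5 + 1.0 * rng.random_sample(y.shape)
y_noisy = y + rng.normal(0, dy)
gp_noisy = GaussianProcessRegressor(kernel=kernel, alpha=dy ** 2,
                                    n_restarts_optimizer=9, random_state=0)
gp_noisy.fit(X, y_noisy)
y_pred_noisy, sigma_noisy = gp_noisy.predict(x_grid, return_std=True)

print("noise-free kernel:", gp.kernel_)
print("noisy kernel:     ", gp_noisy.kernel_)
```

In the noise-free fit the predictive standard deviation collapses near the training points, which is the interpolating behaviour the figures show; in the noisy fit it does not.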

Probabilistic predictions with Gaussian process classification

This example illustrates the predicted probability of GPC for an RBF kernel with different choices of the hyperparameters. The first figure shows the predicted probability of GPC with arbitrarily chosen hyperparameters and with the hyperparameters corresponding to the maximum log-marginal-likelihood (LML). While the hyperparameters chosen by optimizing LML have a considerably larger LML, they perform slightly worse according to the log-loss on test data. The figure shows that this is because t
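The comparison can be sketched as follows, assuming a synthetic one-dimensional dataset with a class boundary at 2.5 and a 50/50 train/test split (both our choices). Passing optimizer=None keeps the initial hyperparameters fixed, while the default optimizer maximizes the LML starting from the same values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import log_loss

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 100)[:, np.newaxis]
y = (X[:, 0] > 2.5).astype(int)          # assumed synthetic labels
train, test = slice(0, 50), slice(50, 100)

# Arbitrarily chosen hyperparameters, kept fixed (no optimization).
gp_fix = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), optimizer=None)
gp_fix.fit(X[train], y[train])

# Hyperparameters chosen by maximizing the log-marginal-likelihood.
gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp_opt.fit(X[train], y[train])

# LML of each fitted kernel (theta holds log-transformed hyperparameters).
lml_fix = gp_fix.log_marginal_likelihood(gp_fix.kernel_.theta)
lml_opt = gp_opt.log_marginal_likelihood(gp_opt.kernel_.theta)

# Log-loss on held-out data, using the predicted probability of class 1.
loss_fix = log_loss(y[test], gp_fix.predict_proba(X[test])[:, 1])
loss_opt = log_loss(y[test], gp_opt.predict_proba(X[test])[:, 1])

print(f"LML      fixed={lml_fix:.3f}  optimized={lml_opt:.3f}")
print(f"log-loss fixed={loss_fix:.3f}  optimized={loss_opt:.3f}")
```

A higher LML does not guarantee a lower test log-loss, which is the effect the text describes.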

Automatic Relevance Determination Regression

Fit a regression model with Automatic Relevance Determination (ARD). See Bayesian Ridge Regression for more information on the related regressor. Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zero, which stabilises them. The histogram of the estimated weights is very peaked, as a sparsity-inducing prior is implied on the weights. The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.
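A small sketch comparing ARDRegression with ordinary least squares is shown below. The data are an assumption: 20 Gaussian features of which only 5 are relevant, plus Gaussian noise. The sparsity-inducing prior typically drives the irrelevant coefficients to (or very near) zero.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, LinearRegression

rng = np.random.RandomState(0)
n_samples, n_features = 200, 20
X = rng.randn(n_samples, n_features)

# Assumed sparse ground truth: only 5 of the 20 features influence y.
w = np.zeros(n_features)
relevant = rng.choice(n_features, 5, replace=False)
w[relevant] = rng.uniform(1.0, 4.0, size=5)
y = X @ w + 0.5 * rng.randn(n_samples)

ard = ARDRegression().fit(X, y)   # iteratively maximizes the marginal log-likelihood
ols = LinearRegression().fit(X, y)

irrelevant = np.setdiff1d(np.arange(n_features), relevant)
print("max |coef| on irrelevant features, ARD:", np.abs(ard.coef_[irrelevant]).max())
print("max |coef| on irrelevant features, OLS:", np.abs(ols.coef_[irrelevant]).max())
```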

sklearn.cross_validation.train_test_split()

Warning: DEPRECATED

sklearn.cross_validation.train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets.

Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.train_test_split instead.

Quick utility that wraps input validation and next(iter(ShuffleSplit(n_samples))) and application to input data into a single call for splitting (and optionally subsampling) data in a one-liner. Read more in the User Gui
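A minimal usage sketch of the recommended replacement, sklearn.model_selection.train_test_split, on a tiny made-up array. Rows of X and y stay aligned after the shuffle, and random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # row i is [2*i, 2*i + 1]
y = np.arange(10)                 # label i belongs to row i

# Hold out 30% of the samples; arrays are split consistently with each other.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)
```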

Gaussian Mixture Model Ellipsoids

Plot the confidence ellipsoids of a mixture of two Gaussians obtained with Expectation Maximisation (GaussianMixture class) and Variational Inference (BayesianGaussianMixture class models with a Dirichlet process prior). Both models have access to five components with which to fit the data. Note that the Expectation Maximisation model will necessarily use all five components while the Variational Inference model will effectively only use as many as are needed for a good fit. Here we can see th
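The two models can be sketched on assumed synthetic data (two well-separated 2-D Gaussian blobs), each given five components. GaussianMixture (EM) spreads weight over all five, while BayesianGaussianMixture with a Dirichlet process prior typically shrinks the weights of components it does not need; the exact weights depend on the data and initialization.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

rng = np.random.RandomState(0)
# Assumed data: two Gaussian blobs centred at (0, 0) and (5, 5).
X = np.vstack([0.5 * rng.randn(250, 2), 0.5 * rng.randn(250, 2) + [5.0, 5.0]])

# Expectation Maximisation: all five components are used.
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0).fit(X)

# Variational inference with a Dirichlet process prior on the weights.
dpgmm = BayesianGaussianMixture(
    n_components=5,
    covariance_type="full",
    weight_concentration_prior_type="dirichlet_process",
    max_iter=1000,
    random_state=0,
).fit(X)

print("EM weights:", np.round(gmm.weights_, 3))
print("DP weights:", np.round(dpgmm.weights_, 3))
```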

1.1. Generalized Linear Models

The following is a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if $\hat{y}$ is the predicted value,

$$\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p$$

Across the module, we designate the vector $w = (w_1, \dots, w_p)$ as coef_ and $w_0$ as intercept_.

To perform classification with generalized linear models, see Logistic regression.

1.1.1. Ordinary Least Squares

LinearRegression fits a linear model with coefficients $w = (w_1, \dots, w_p)$ to minimize the residual sum of squares between the
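A minimal sketch mapping the notation onto LinearRegression attributes. The data are made up so that y = 1 + 2·x1 + 3·x2 holds exactly; the fit should therefore recover w_0 = 1 in intercept_ and (w_1, w_2) = (2, 3) in coef_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed noise-free data generated by y = 1 + 2*x1 + 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

reg = LinearRegression().fit(X, y)

# Predictions follow y_hat = w_0 + w_1*x_1 + ... + w_p*x_p.
y_hat = reg.intercept_ + X @ reg.coef_
print("coef_ (w_1..w_p):", reg.coef_)
print("intercept_ (w_0):", reg.intercept_)
```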

cluster.FeatureAgglomeration()

class sklearn.cluster.FeatureAgglomeration(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean>)

Agglomerate features.

Similar to AgglomerativeClustering, but recursively merges features instead of samples.

Read more in the User Guide.

Parameters:
n_clusters : int, default 2
    The number of clusters to find.
connectivity : array-like or callable, optional
    Connectivity matrix. Defines for each
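A short sketch of feature agglomeration on assumed data: six features that are near-copies of two underlying signals. With the default Ward linkage and mean pooling, the three copies of each signal should be merged into one cluster, and transform returns one pooled column per cluster. The keyword affinity is deliberately not passed, since its name differs across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)
base = rng.randn(50, 2)  # two assumed underlying signals, 50 samples each

# Six features: three noisy copies of signal 0, then three of signal 1.
X = np.hstack([
    base[:, [0]] + 0.01 * rng.randn(50, 3),
    base[:, [1]] + 0.01 * rng.randn(50, 3),
])

agglo = FeatureAgglomeration(n_clusters=2)  # merges features, not samples
X_reduced = agglo.fit_transform(X)          # one pooled (averaged) column per cluster

print("feature labels:", agglo.labels_)
print("reduced shape:", X_reduced.shape)
```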

sklearn.model_selection.permutation_test_score()

sklearn.model_selection.permutation_test_score(estimator, X, y, groups=None, cv=None, n_permutations=100, n_jobs=1, random_state=0, verbose=0, scoring=None)

Evaluate the significance of a cross-validated score with permutations.

Read more in the User Guide.

Parameters:
estimator : estimator object implementing 'fit'
    The object to use to fit the data.
X : array-like of shape at least 2D
    The data to fit.
y : array-like
    The target variable to try to predict in the case of supervi
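A usage sketch, assuming the bundled iris dataset, a linear SVC and 2-fold stratified cross-validation (all our choices). The function returns the true cross-validated score, the scores obtained after permuting y, and an empirical p-value computed as (C + 1) / (n_permutations + 1), where C counts permutation scores at least as good as the true one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")
cv = StratifiedKFold(n_splits=2)

score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=cv, n_permutations=100, random_state=0)

print(f"CV accuracy: {score:.3f}")
print(f"mean accuracy with permuted labels: {perm_scores.mean():.3f}")
print(f"p-value: {pvalue:.4f}")
```

With 100 permutations the smallest attainable p-value is 1/101, so more permutations are needed to resolve smaller values.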

API Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the raw class and function specifications may not be enough to give full guidelines on their use.

sklearn.base: Base classes and utility functions

Base classes for all estimators.

Base classes
base.BaseEstimator: Base class for all estimators in scikit-learn.
base.ClassifierMixin: Mixin class for all classifiers in scikit-learn.
base.ClusterMixin: Mixin class for all cluster est
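To show how base.BaseEstimator and base.ClassifierMixin are typically combined, here is a sketch of a hypothetical toy classifier (NearestCentroidSketch and its shrink parameter are invented for illustration; this is not scikit-learn's own NearestCentroid). BaseEstimator supplies get_params/set_params from the __init__ signature, and ClassifierMixin supplies score as mean accuracy.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class NearestCentroidSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predicts the class with the closest (shrunk) mean."""

    def __init__(self, shrink=0.0):
        # Store constructor arguments unmodified so get_params/set_params work.
        self.shrink = shrink

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        centroids = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Optionally shrink each class mean toward the overall mean.
        self.centroids_ = (1 - self.shrink) * centroids + self.shrink * X.mean(axis=0)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
clf = NearestCentroidSketch().fit(X, y)
print(clf.get_params(), clf.score(X, y))
```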

Species distribution modeling

Modeling species' geographic distributions is an important problem in conservation biology. In this example we model the geographic distribution of two South American mammals given past observations and 14 environmental variables. Since we have only positive examples (there are no unsuccessful observations), we cast this problem as a density estimation problem and use the OneClassSVM provided by the package sklearn.svm as our modeling tool. The dataset is provided by Phillips et al. (2006). I
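The presence-only idea can be sketched with OneClassSVM on assumed synthetic 2-D "observations" clustered around one location (not the species dataset or its 14 environmental variables). The fitted decision function is positive in the region the observations support and negative far from it; nu bounds the fraction of training points treated as outliers.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# Assumed presence-only observations: 200 points around (2, 2).
X_obs = 0.3 * rng.randn(200, 2) + [2.0, 2.0]

model = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5).fit(X_obs)

center = np.array([[2.0, 2.0]])   # inside the observed region
far = np.array([[-3.0, -3.0]])    # far from any observation

print("decision at center:", model.decision_function(center))
print("decision far away: ", model.decision_function(far))
print("fraction of training points flagged:", np.mean(model.predict(X_obs) == -1))
```

In practice the features would be standardized first, since the RBF kernel is sensitive to feature scale.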