Automatic Relevance Determination Regression

Fit a regression model with Automatic Relevance Determination (ARD). See Bayesian Ridge Regression for more information on the regressor. Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zeros, which stabilises them. The histogram of the estimated weights is very peaked, as a sparsity-inducing prior is implied on the weights. The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.
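
A minimal sketch of the comparison described above, assuming synthetic data in which only a few coefficients are truly non-zero (the dataset shape and noise level are illustrative choices, not taken from the example):

import numpy as np
from sklearn.linear_model import ARDRegression, LinearRegression

rng = np.random.RandomState(0)
n_samples, n_features = 100, 50
X = rng.randn(n_samples, n_features)
w = np.zeros(n_features)
w[:5] = rng.randn(5)                      # only a few informative weights
y = np.dot(X, w) + 0.1 * rng.randn(n_samples)

ard = ARDRegression().fit(X, y)           # sparsity-inducing prior on weights
ols = LinearRegression().fit(X, y)        # OLS baseline for comparison
print(ard.coef_[:10])                     # most ARD weights shrink toward zero
print(ols.coef_[:10])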

Gaussian Mixture Model Ellipsoids

Plot the confidence ellipsoids of a mixture of two Gaussians obtained with Expectation Maximisation (GaussianMixture class) and Variational Inference (BayesianGaussianMixture class models with a Dirichlet process prior). Both models have access to five components with which to fit the data. Note that the Expectation Maximisation model will necessarily use all five components while the Variational Inference model will effectively only use as many as are needed for a good fit. Here we can see that the Expectation Maximisation model splits some components arbitrarily, while the Variational Inference model adapts its effective number of components to the data automatically.
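
A minimal sketch of fitting both models, assuming toy two-blob data of our own making; with five available components, the VI model's weights_ show which components it effectively switched off:

import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [4, 4]])  # two blobs

em = GaussianMixture(n_components=5, covariance_type='full').fit(X)
vi = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process').fit(X)

print(em.weights_)  # EM spreads mass over all five components
print(vi.weights_)  # VI drives unneeded components' weights toward zero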

gaussian_process.GaussianProcessRegressor()

class sklearn.gaussian_process.GaussianProcessRegressor(kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None) [source] Gaussian process regression (GPR). The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams. In addition to the standard scikit-learn estimator API, GaussianProcessRegressor allows prediction without prior fitting (based on the GP prior).
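
A minimal sketch of basic usage, assuming an RBF kernel and toy sine data; note the prediction drawn before fitting, which comes from the GP prior:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 20).reshape(-1, 1)
y = np.sin(X).ravel()

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-10)
y_prior, std_prior = gpr.predict(X, return_std=True)  # from the GP prior, no fit needed
gpr.fit(X, y)                                         # tunes hyperparameters on the log-marginal-likelihood
y_post, std_post = gpr.predict(X, return_std=True)    # posterior mean and pointwise std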

Gaussian process regression on Mauna Loa CO2 data.

This example is based on Section 5.4.3 of "Gaussian Processes for Machine Learning" [RW2006]. It illustrates an example of complex kernel engineering and hyperparameter optimization using gradient ascent on the log-marginal-likelihood. The data consists of the monthly average atmospheric CO2 concentrations (in parts per million by volume, ppmv) collected at the Mauna Loa Observatory in Hawaii between 1958 and 1997. The objective is to model the CO2 concentration as a function of time t.
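
A sketch of the kind of composite kernel such an example engineers: a long-term trend term, a seasonal periodic term, and a noise term. The hyperparameter values below are placeholders rather than the tuned ones, and X_years / y_ppmv are hypothetical names for the data arrays:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared,
                                              WhiteKernel, ConstantKernel)

k_trend = ConstantKernel(50.0) * RBF(length_scale=50.0)    # smooth long-term rise
k_seasonal = RBF(length_scale=100.0) * ExpSineSquared(
    length_scale=1.0, periodicity=1.0)                     # yearly cycle
k_noise = WhiteKernel(noise_level=0.1)                     # measurement noise

kernel = k_trend + k_seasonal + k_noise
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
# gpr.fit(X_years, y_ppmv) would optimize all kernel hyperparameters by
# gradient ascent on the log-marginal-likelihood (L-BFGS by default).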

cluster.AgglomerativeClustering()

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, affinity='euclidean', memory=Memory(cachedir=None), connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func=<function mean>) [source] Agglomerative Clustering. Recursively merges the pair of clusters that minimally increases a given linkage distance. Read more in the User Guide. Parameters: n_clusters : int, default=2 The number of clusters to find. connectivity : array-like or callable, optional Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data.
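
A minimal sketch of usage, assuming toy two-blob data:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [5, 5]])  # two blobs

model = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = model.fit_predict(X)  # recursive merges until two clusters remain
print(labels[:10])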

neural_network.MLPClassifier()

class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08) [source] Multi-layer Perceptron classifier. This model optimizes the log-loss function using LBFGS or stochastic gradient descent.
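
A minimal sketch of fitting the classifier, assuming a synthetic dataset; the hidden-layer size and max_iter are illustrative choices:

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(100,), activation='relu',
                    solver='adam', max_iter=500, random_state=0)
clf.fit(X, y)            # minimizes log-loss with the Adam solver
print(clf.score(X, y))   # mean accuracy on the training data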

sklearn.cluster.ward_tree()

sklearn.cluster.ward_tree(X, connectivity=None, n_clusters=None, return_distance=False) [source] Ward clustering based on a feature matrix. Recursively merges the pair of clusters that minimally increases within-cluster variance. The inertia matrix uses a Heapq-based representation. This is the structured version, which takes into account some topological structure between samples. Read more in the User Guide. Parameters: X : array, shape (n_samples, n_features) Feature matrix representing n_samples samples to be clustered.
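
A minimal sketch of calling ward_tree directly (in practice most users go through AgglomerativeClustering, which wraps it); the random feature matrix is an illustrative assumption:

import numpy as np
from sklearn.cluster import ward_tree

rng = np.random.RandomState(0)
X = rng.randn(20, 3)  # 20 samples, 3 features

children, n_components, n_leaves, parents = ward_tree(X)
print(children[:5])   # each row: the pair of clusters merged at that step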

sklearn.feature_selection.f_regression()

sklearn.feature_selection.f_regression(X, y, center=True) [source] Univariate linear regression tests. Quick linear model for testing the effect of a single regressor, sequentially for many regressors. This is done in 2 steps: 1. The cross correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) * std(y)). 2. It is converted to an F score, then to a p-value. Read more in the User Guide. Parameters: X : {array-like, sparse matrix} of shape (n_samples, n_features) The set of regressors that will be tested sequentially.
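
A minimal sketch, assuming a target that depends on only the first of three regressors, so its F score dominates:

import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = 2.0 * X[:, 0] + 0.1 * rng.randn(100)  # y depends on feature 0 only

F, pvalues = f_regression(X, y, center=True)
print(F)        # F score is largest for feature 0
print(pvalues)  # and its p-value correspondingly smallest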

Compare cross decomposition methods

Simple usage of various cross decomposition algorithms:
- PLSCanonical
- PLSRegression, with multivariate response, a.k.a. PLS2
- PLSRegression, with univariate response, a.k.a. PLS1
- CCA
Given two multivariate covarying two-dimensional datasets, X and Y, PLS extracts the "directions of covariance", i.e. the components of each dataset that explain the most shared variance between both datasets. This is apparent on the scatterplot matrix display: components 1 in dataset X and dataset Y are maximally correlated.
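
A minimal sketch of PLSCanonical on two datasets built from a shared latent signal (the data generation is an illustrative assumption mirroring the setup described above):

import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.RandomState(0)
latent = rng.randn(500, 2)                  # shared latent signal
X = np.hstack([latent, rng.randn(500, 2)])  # view 1: latent + noise dims
Y = np.hstack([latent, rng.randn(500, 2)])  # view 2: latent + noise dims

pls = PLSCanonical(n_components=2).fit(X, Y)
X_scores, Y_scores = pls.transform(X, Y)
# Corresponding score columns are maximally correlated across the datasets.
print(np.corrcoef(X_scores[:, 0], Y_scores[:, 0])[0, 1])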

sklearn.datasets.fetch_20newsgroups()

sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True) [source] Load the filenames and data from the 20 newsgroups dataset. Read more in the User Guide. Parameters: subset : 'train' or 'test', 'all', optional Select the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both, with shuffled ordering. data_home : optional, default: None Specify a download and cache folder for the datasets.
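
A minimal sketch of loading a two-category subset (sci.space and rec.autos are real newsgroup labels, chosen here just for illustration); the first call downloads the data when it is missing:

from sklearn.datasets import fetch_20newsgroups

train = fetch_20newsgroups(subset='train',
                           categories=['sci.space', 'rec.autos'],
                           remove=('headers', 'footers', 'quotes'))
print(len(train.data))      # number of documents loaded
print(train.target_names)   # category names in label order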