4.4. Unsupervised dimensionality reduction

If your number of features is high, it may be useful to reduce it with an unsupervised step prior to supervised steps. Many of the unsupervised learning methods implement a transform method that can be used to reduce the dimensionality. Below we discuss two specific examples of this pattern that are heavily used.

Pipelining: the unsupervised data reduction and the supervised estimator can be chained in one step, as shown in the sketch below. See Pipeline: chaining estimators.

4.4.1. PCA: principal component analysis

decomposition.PCA looks for a combination of features that capture well the variance of the original features.
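A minimal sketch of the pipelining pattern described above, using decomposition.PCA as the unsupervised reduction step; the component count, the SVC classifier, and the digits dataset are illustrative assumptions, not prescribed by this section:

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import PCA
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> X, y = load_digits(return_X_y=True)
>>> # Chain the unsupervised reduction and the supervised estimator in one step
>>> pipe = Pipeline([('reduce_dim', PCA(n_components=20)), ('clf', SVC())])
>>> model = pipe.fit(X, y)  # PCA.transform feeds the reduced features to SVC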

HuberRegressor vs Ridge on dataset with strong outliers

Fit Ridge and HuberRegressor on a dataset with outliers. The example shows that the predictions of Ridge are strongly influenced by the outliers present in the dataset, while the Huber regressor is less influenced, since it uses a linear loss for these samples. As the parameter epsilon is increased for the Huber regressor, its decision function approaches that of Ridge.

# Authors: Manoj Kumar mks542@nyu.edu
# License: BSD 3 clause

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
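A condensed sketch of the comparison, assuming a synthetic one-dimensional regression problem with a few corrupted targets standing in for the example's dataset (the data generation below is an illustrative assumption, not the exact example code):

>>> import numpy as np
>>> from sklearn.linear_model import HuberRegressor, Ridge
>>> rng = np.random.RandomState(0)
>>> X = rng.normal(size=(20, 1))
>>> y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=20)
>>> y[:4] += 30.0  # inject strong outliers
>>> huber = HuberRegressor(epsilon=1.35).fit(X, y)  # linear loss on outliers
>>> ridge = Ridge(alpha=1.0).fit(X, y)              # pulled toward the outliers
>>> # huber.coef_ stays close to the true slope; ridge.coef_ is biased upward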

neighbors.KNeighborsRegressor()

class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs) [source]

Regression based on k-nearest neighbors. The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Read more in the User Guide.

Parameters:

n_neighbors : int, optional (default = 5)
    Number of neighbors to use by default for kneighbors queries.

weights : str or callable
    Weight function used in prediction. Possible values: 'uniform', 'distance', or a user-defined callable.
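A short usage sketch on a toy one-dimensional dataset: with n_neighbors=2, the prediction at 1.5 is the average of the targets of its two nearest training points:

>>> from sklearn.neighbors import KNeighborsRegressor
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> neigh = KNeighborsRegressor(n_neighbors=2).fit(X, y)
>>> neigh.predict([[1.5]])  # mean of the targets at X=1 and X=2
array([ 0.5])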

discriminant_analysis.LinearDiscriminantAnalysis()

class sklearn.discriminant_analysis.LinearDiscriminantAnalysis(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001) [source]

Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule. The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix. The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions.
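A brief sketch showing both the classifier and the transform-based dimensionality reduction on a toy two-class dataset (the data values are illustrative):

>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis().fit(X, y)
>>> clf.predict([[-0.8, -1]])
array([1])
>>> # With two classes, at most one discriminative direction is available
>>> LinearDiscriminantAnalysis(n_components=1).fit(X, y).transform(X).shape
(6, 1)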

sklearn.metrics.jaccard_similarity_score()

sklearn.metrics.jaccard_similarity_score(y_true, y_pred, normalize=True, sample_weight=None) [source]

Jaccard similarity coefficient score.

The Jaccard index [1], or Jaccard similarity coefficient, defined as the size of the intersection divided by the size of the union of two label sets, is used to compare the set of predicted labels for a sample to the corresponding set of labels in y_true. Read more in the User Guide.

Parameters:

y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) labels.
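A short sketch with 1-d multiclass labels, where this score coincides with accuracy; normalize=False returns the raw count of correctly classified samples instead of the fraction:

>>> from sklearn.metrics import jaccard_similarity_score
>>> y_true = [0, 1, 2, 3]
>>> y_pred = [0, 2, 1, 3]
>>> jaccard_similarity_score(y_true, y_pred)
0.5
>>> jaccard_similarity_score(y_true, y_pred, normalize=False)
2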

covariance.OAS()

class sklearn.covariance.OAS(store_precision=True, assume_centered=False) [source]

Oracle Approximating Shrinkage Estimator. Read more in the User Guide.

OAS is a particular form of shrinkage described in "Shrinkage Algorithms for MMSE Covariance Estimation", Chen et al., IEEE Trans. on Sign. Proc., Volume 58, Issue 10, October 2010. The formula used here does not correspond to the one given in the article. It has been taken from the Matlab program available from the authors' webpage (http://
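A minimal fitting sketch, assuming Gaussian toy data (the mean and covariance used to generate the samples are illustrative):

>>> import numpy as np
>>> from sklearn.covariance import OAS
>>> rng = np.random.RandomState(0)
>>> X = rng.multivariate_normal(mean=[0., 0.], cov=[[.8, .3], [.3, .4]], size=500)
>>> oas = OAS().fit(X)
>>> oas.covariance_.shape  # shrunk covariance estimate
(2, 2)
>>> 0.0 <= oas.shrinkage_ <= 1.0  # coefficient chosen by the OAS formula
True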

sklearn.learning_curve.learning_curve()

Warning: DEPRECATED

sklearn.learning_curve.learning_curve(estimator, X, y, train_sizes=array([ 0.1, 0.33, 0.55, 0.78, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', verbose=0, error_score='raise') [source]

Learning curve.

Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.learning_curve instead.

Determines cross-validated training and test scores for different training set sizes. A cross-validation generator splits the whole dataset k times in training and test data.
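Since this module is deprecated, here is a sketch of the recommended replacement in sklearn.model_selection; the estimator, dataset, and cv choices are illustrative assumptions:

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import learning_curve
>>> from sklearn.svm import SVC
>>> X, y = load_iris(return_X_y=True)
>>> train_sizes, train_scores, test_scores = learning_curve(
...     SVC(kernel='linear'), X, y, cv=5,
...     train_sizes=np.linspace(0.1, 1.0, 5))
>>> train_sizes.shape, train_scores.shape, test_scores.shape
((5,), (5, 5), (5, 5))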

multioutput.MultiOutputRegressor()

class sklearn.multioutput.MultiOutputRegressor(estimator, n_jobs=1) [source]

Multi target regression.

This strategy consists of fitting one regressor per target. This is a simple strategy for extending regressors that do not natively support multi-target regression.

Parameters:

estimator : estimator object
    An estimator object implementing fit and predict.

n_jobs : int, optional, default=1
    The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.
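A short sketch wrapping a single-target regressor to handle three targets at once (Ridge and the synthetic data are illustrative choices):

>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> from sklearn.multioutput import MultiOutputRegressor
>>> X, y = make_regression(n_samples=100, n_features=10, n_targets=3, random_state=0)
>>> regr = MultiOutputRegressor(Ridge()).fit(X, y)  # fits one Ridge per target
>>> regr.predict(X[:2]).shape
(2, 3)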

gaussian_process.kernels.RationalQuadratic()

class sklearn.gaussian_process.kernels.RationalQuadratic(length_scale=1.0, alpha=1.0, length_scale_bounds=(1e-05, 100000.0), alpha_bounds=(1e-05, 100000.0)) [source]

Rational Quadratic kernel.

The RationalQuadratic kernel can be seen as a scale mixture (an infinite sum) of RBF kernels with different characteristic length-scales. It is parameterized by a length-scale parameter length_scale > 0 and a scale mixture parameter alpha > 0. Only the isotropic variant where length_scale is a scalar is supported at the moment.
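A small sketch plugging the kernel into a Gaussian process regressor; the 1-d sine data and the parameter values are illustrative assumptions:

>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import RationalQuadratic
>>> X = np.linspace(0, 10, 20).reshape(-1, 1)
>>> y = np.sin(X).ravel()
>>> kernel = RationalQuadratic(length_scale=1.0, alpha=0.5)
>>> gpr = GaussianProcessRegressor(kernel=kernel).fit(X, y)
>>> y_mean, y_std = gpr.predict(X[:5], return_std=True)  # posterior mean and std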

manifold.LocallyLinearEmbedding()

class sklearn.manifold.LocallyLinearEmbedding(n_neighbors=5, n_components=2, reg=0.001, eigen_solver='auto', tol=1e-06, max_iter=100, method='standard', hessian_tol=0.0001, modified_tol=1e-12, neighbors_algorithm='auto', random_state=None, n_jobs=1) [source]

Locally Linear Embedding. Read more in the User Guide.

Parameters:

n_neighbors : integer
    Number of neighbors to consider for each point.

n_components : integer
    Number of coordinates for the manifold.

reg : float
    Regularization constant, multiplies the trace of the local covariance matrix of the distances.
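A compact sketch embedding a subset of the digits dataset into two dimensions; the dataset, sample count, and neighbor count are illustrative:

>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> X, _ = load_digits(return_X_y=True)
>>> embedding = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])  # 2-d coordinates
>>> X_transformed.shape
(100, 2)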