1.15. Isotonic regression

The class IsotonicRegression fits a non-decreasing function to data. It solves the following problem: minimize $\sum_i w_i (y_i - \hat{y}_i)^2$ subject to $\hat{y}_i \le \hat{y}_j$ whenever $X_i \le X_j$, where each $w_i$ is strictly positive and each $y_i$ is an arbitrary real number. It yields the vector $\hat{y}$ composed of non-decreasing elements that is closest to $y$ in terms of mean squared error. In practice this list of elements forms a function that is piecewise linear.
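A minimal sketch of fitting IsotonicRegression on a small one-dimensional example; the toy data below is purely illustrative:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    # Toy data: noisy but roughly increasing observations.
    x = np.arange(10)
    y = np.array([1.0, 2.1, 1.8, 3.5, 3.2, 4.0, 5.5, 5.1, 6.8, 7.0])

    ir = IsotonicRegression()
    # fit_transform returns the non-decreasing fit at the training points.
    y_hat = ir.fit_transform(x, y)

    # Predictions at new points are interpolated linearly between thresholds.
    print(ir.predict([2.5, 7.5]))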

1.13. Feature selection

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators' accuracy scores or to boost their performance on very high-dimensional datasets.

1.13.1. Removing features with low variance

VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.
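As a rough sketch, VarianceThreshold can drop near-constant boolean features; the threshold below assumes we want to remove features that take the same value in more than 80% of samples (the variance of a Bernoulli variable is p(1-p)):

    from sklearn.feature_selection import VarianceThreshold

    # Boolean features; the first column is 0 in five of the six samples.
    X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

    # Remove features with variance below p * (1 - p) for p = 0.8.
    sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
    X_reduced = sel.fit_transform(X)
    print(X_reduced.shape)  # the low-variance first column has been dropped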

1.12. Multiclass and multilabel algorithms

Warning: all classifiers in scikit-learn do multiclass classification out-of-the-box. You don't need to use the sklearn.multiclass module unless you want to experiment with different multiclass strategies. The sklearn.multiclass module implements meta-estimators to solve multiclass and multilabel classification problems by decomposing such problems into binary classification problems. Multitarget regression is also supported. Multiclass classification means a classification task with more than two classes.
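A hedged sketch of the one-vs-the-rest decomposition from sklearn.multiclass, wrapping a linear SVM on the iris dataset; the choice of LinearSVC as the base estimator is just for illustration:

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)

    # One binary LinearSVC is fitted per class; prediction picks the
    # class whose binary classifier gives the highest decision score.
    clf = OneVsRestClassifier(LinearSVC(random_state=0))
    print(clf.fit(X, y).predict(X[:5]))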

1.11. Ensemble methods

The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. Two families of ensemble methods are usually distinguished. In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any of the single base estimators because its variance is reduced.
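For example, a random forest is an averaging method: many randomized decision trees are built independently and their predictions are averaged. A minimal sketch on the iris data; the hyperparameters here are arbitrary:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    # 100 trees, each trained on a bootstrap sample of the data;
    # the ensemble averages their predicted class probabilities.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean())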

1.10. Decision Trees

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model. Some advantages of decision trees are that they are simple to understand and to interpret, and that trees can be visualised.
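A rough sketch of the sine-curve regression mentioned above, using DecisionTreeRegressor; the noise level and max_depth are arbitrary choices:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)

    # Noisy samples of a sine curve on [0, 5].
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(80)

    # A deeper tree yields more (and finer) if-then-else rules.
    regr = DecisionTreeRegressor(max_depth=4)
    regr.fit(X, y)

    X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
    y_pred = regr.predict(X_test)  # piecewise-constant approximation of sin(x)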

1.9. Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features. Given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$, Bayes' theorem states the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Using the naive independence assumption that

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship is simplified to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),$$

and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$.
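As an illustration, GaussianNB applies this rule with Gaussian class-conditional likelihoods $P(x_i \mid y)$; a minimal sketch on the iris data:

    from sklearn.datasets import load_iris
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)

    # Fits P(y) as class frequencies and each P(x_i | y) as a Gaussian
    # with per-class mean and variance, then predicts with the MAP rule.
    gnb = GaussianNB()
    y_pred = gnb.fit(X, y).predict(X)
    print((y != y_pred).sum(), "mislabeled points out of", len(y))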

1.8. Cross decomposition

The cross decomposition module contains two main families of algorithms: partial least squares (PLS) and canonical correlation analysis (CCA). These families of algorithms are useful to find linear relations between two multivariate datasets: the X and Y arguments of the fit method are 2D arrays. Cross decomposition algorithms find the fundamental relations between two matrices (X and Y). They are latent variable approaches to modeling the covariance structures in these two spaces.
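A rough sketch of fitting PLSRegression on two small multivariate arrays; the toy data with a shared latent structure is illustrative only:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.RandomState(0)

    # Two multivariate datasets: Y depends linearly on the first two
    # columns of X, plus noise.
    X = rng.randn(50, 4)
    Y = X[:, :2] @ rng.randn(2, 3) + 0.1 * rng.randn(50, 3)

    # Two latent components relating the X and Y spaces.
    pls = PLSRegression(n_components=2)
    pls.fit(X, Y)
    Y_pred = pls.predict(X)  # shape (50, 3)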

1.7. Gaussian Processes

Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems. The advantages of Gaussian processes are: the prediction interpolates the observations (at least for regular kernels); the prediction is probabilistic (Gaussian), so that one can compute empirical confidence intervals and decide based on those whether one should refit (online fitting, adaptive fitting) the prediction in some region of interest; and they are versatile: different kernels can be specified.
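A minimal sketch of Gaussian process regression with an RBF kernel, showing the probabilistic prediction (posterior mean and standard deviation); the data and kernel choice are illustrative:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    # A handful of noise-free training observations of sin(x).
    X_train = np.array([[1.0], [3.0], [5.0], [6.0], [8.0]])
    y_train = np.sin(X_train).ravel()

    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), random_state=0)
    gpr.fit(X_train, y_train)

    # The posterior mean interpolates the observations; the standard
    # deviation gives pointwise confidence intervals.
    X_test = np.linspace(0, 10, 50).reshape(-1, 1)
    mean, std = gpr.predict(X_test, return_std=True)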

1.6. Nearest Neighbors

sklearn.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these.
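A rough sketch of supervised neighbors-based classification with KNeighborsClassifier; the number of neighbors and the iris data are just for illustration:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Predict each query's label from the labels of its 5 nearest
    # training samples (majority vote).
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X, y)
    print(knn.predict(X[:3]))
    print(knn.kneighbors(X[:1]))  # distances and indices of the 5 neighbors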

1.5. Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions such as (linear) Support Vector Machines and Logistic Regression. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention just recently in the context of large-scale learning. SGD has been successfully applied to large-scale and sparse machine learning problems often encountered in text classification and natural language processing.
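A minimal sketch of SGDClassifier fitting a linear SVM via stochastic gradient descent with the hinge loss; the tiny dataset and hyperparameters are illustrative:

    from sklearn.linear_model import SGDClassifier

    # Two training points in 2-D with binary labels.
    X = [[0.0, 0.0], [1.0, 1.0]]
    y = [0, 1]

    # loss="hinge" gives a linear SVM; other convex losses yield other
    # linear models trained with the same SGD procedure.
    clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=1000, tol=1e-3)
    clf.fit(X, y)
    print(clf.predict([[2.0, 2.0]]))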