preprocessing.MinMaxScaler()

class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True) [source]

Transform features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it lies in the given range on the training set, e.g. between zero and one. The transformation is given by:

    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    X_scaled = X_std * (max - min) + min

where min, max = feature_range. This transformation is often used as an alternative to zero-mean, unit-variance scaling.
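A minimal usage sketch of the formula above (the data values and variable names are illustrative, not from the original docs):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0, -1.0], [2.0, 0.0], [3.0, 1.0]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)               # learns the per-feature min and max
print(scaler.transform(X_train))  # each column now spans [0, 1]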

preprocessing.MaxAbsScaler()

class sklearn.preprocessing.MaxAbsScaler(copy=True) [source]

Scale each feature by its maximum absolute value. This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity. This scaler can also be applied to sparse CSR or CSC matrices.

New in version 0.17.

Parameters:
    copy : boolean, optional, default is True
        Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).
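A minimal sketch of the behavior (the array values are illustrative):

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[1.0, -2.0], [2.0, 4.0], [-4.0, 1.0]])
scaler = MaxAbsScaler()
print(scaler.fit_transform(X))  # each column divided by its max absolute value
# zeros are left untouched, so sparse inputs keep their sparsity pattern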

preprocessing.LabelEncoder()

class sklearn.preprocessing.LabelEncoder [source]

Encode labels with value between 0 and n_classes-1. Read more in the User Guide.

Attributes:
    classes_ : array of shape (n_class,)
        Holds the label for each class.

See also: sklearn.preprocessing.OneHotEncoder, which encodes categorical integer features using a one-hot (aka one-of-K) scheme.

Examples

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
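The same doctest style also covers non-numerical labels; a sketch with illustrative string labels:

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> list(le.transform(["tokyo", "tokyo", "paris"]))
[2, 2, 1]
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']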

preprocessing.LabelBinarizer()

class sklearn.preprocessing.LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False) [source]

Binarize labels in a one-vs-all fashion. Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme. At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.
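A minimal sketch of the transform (the label values are illustrative):

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
print(lb.fit_transform([1, 2, 6, 4, 2]))  # one column per class, a single 1 per row
print(lb.classes_)                        # [1 2 4 6]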

preprocessing.KernelCenterer()

class sklearn.preprocessing.KernelCenterer [source]

Center a kernel matrix. Let K(x, z) be a kernel defined by phi(x)^T phi(z), where phi is a function mapping x to a Hilbert space. KernelCenterer centers (i.e., normalizes to have zero mean) the data without explicitly computing phi(x). It is equivalent to centering phi(x) with sklearn.preprocessing.StandardScaler(with_std=False). Read more in the User Guide.

Methods:
    fit(K[, y])            Fit KernelCenterer.
    fit_transform(X[, y])  Fit to data, then transform it.
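A minimal sketch, assuming a linear kernel built with sklearn.metrics.pairwise.pairwise_kernels (the random data is illustrative):

import numpy as np
from sklearn.preprocessing import KernelCenterer
from sklearn.metrics.pairwise import pairwise_kernels

X = np.random.RandomState(0).rand(5, 3)
K = pairwise_kernels(X, metric='linear')  # K[i, j] = phi(x_i)^T phi(x_j)
K_centered = KernelCenterer().fit_transform(K)
print(K_centered.mean(axis=0))  # row and column means are now (numerically) zero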

preprocessing.Imputer()

class sklearn.preprocessing.Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True) [source]

Imputation transformer for completing missing values. Read more in the User Guide.

Parameters:
    missing_values : integer or 'NaN', optional (default='NaN')
        The placeholder for the missing values. All occurrences of missing_values will be imputed. For missing values encoded as np.nan, use the string value 'NaN'.
    strategy : string, optional (default='mean')
        The imputation strategy: replace missing values using the mean ('mean'), the median ('median'), or the most frequent value ('most_frequent') of each feature along the given axis.
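A minimal sketch matching the signature above (the array values are illustrative; note that this class ships with older scikit-learn releases):

import numpy as np
from sklearn.preprocessing import Imputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
print(imp.fit_transform(X))  # NaN in column 0 becomes the column mean (1 + 7) / 2 = 4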

preprocessing.FunctionTransformer()

class sklearn.preprocessing.FunctionTransformer(func=None, inverse_func=None, validate=True, accept_sparse=False, pass_y=False, kw_args=None, inv_kw_args=None) [source]

Constructs a transformer from an arbitrary callable. A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc. A FunctionTransformer will not do any checks on its function's output.
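A minimal sketch of a stateless transformation, using np.log1p as the callable (the input array is illustrative):

import numpy as np
from sklearn.preprocessing import FunctionTransformer

transformer = FunctionTransformer(np.log1p)
X = np.array([[0, 1], [2, 3]])
print(transformer.transform(X))  # elementwise log(1 + x); no state is learned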

preprocessing.Binarizer()

class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True) [source]

Binarize data (set feature values to 0 or 1) according to a threshold. Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1. Binarization is a common operation on text count data, where the analyst can decide to consider only the presence or absence of a feature rather than, for instance, a quantified number of occurrences.
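A minimal sketch of thresholding at the default of 0 (the array values are illustrative):

from sklearn.preprocessing import Binarizer

X = [[1.0, -1.0, 2.0], [2.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
binarizer = Binarizer(threshold=0.0)
print(binarizer.fit_transform(X))  # values > 0 map to 1, values <= 0 map to 0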

Prediction Latency

This is an example showing the prediction latency of various scikit-learn estimators. The goal is to measure the latency one can expect when doing predictions either in bulk or atomic (i.e. one by one) mode. The plots represent the distribution of the prediction latency as a boxplot.

# Authors: Eustache Diemert <eustache@diemert.fr>
# License: BSD 3 clause
from __future__ import print_function
from collections import defaultdict
import time
import gc
import numpy as np
import matplotlib.pyplot as plt
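As a minimal sketch of the bulk vs. atomic distinction the script measures (the Ridge estimator and random data are illustrative, not the script's own benchmark setup):

import time
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.rand(1000, 50), rng.rand(1000)
est = Ridge().fit(X, y)

# atomic mode: predict one instance at a time
start = time.time()
for i in range(100):
    est.predict(X[i:i + 1])
atomic = (time.time() - start) / 100

# bulk mode: predict the same instances in a single call
start = time.time()
est.predict(X[:100])
bulk = (time.time() - start) / 100

print("atomic: %.2e s/pred, bulk: %.2e s/pred" % (atomic, bulk))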

Prediction Intervals for Gradient Boosting Regression

This example shows how quantile regression can be used to create prediction intervals.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import GradientBoostingRegressor

np.random.seed(1)

def f(x):
    """The function to predict."""
    return x * np.sin(x)

# ----------------------------------------------------------------------
# First the noiseless case
X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
X = X.astype(np.float32)

# Observations
y = f(X).ravel()
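The script then fits the same model three times with the quantile loss to obtain a lower bound, an upper bound, and a central estimate. A condensed sketch of that step, assuming the X and y defined above (the hyperparameter values follow the usual choices for this example but are not guaranteed to match the full script):

from sklearn.ensemble import GradientBoostingRegressor

# one model per quantile: 5th percentile, 95th percentile, and the median
params = dict(n_estimators=250, max_depth=3, learning_rate=0.1,
              min_samples_leaf=9)
lower = GradientBoostingRegressor(loss='quantile', alpha=0.05, **params)
upper = GradientBoostingRegressor(loss='quantile', alpha=0.95, **params)
median = GradientBoostingRegressor(loss='quantile', alpha=0.5, **params)
for model in (lower, median, upper):
    model.fit(X, y)

# the interval for new inputs xx is [lower.predict(xx), upper.predict(xx)]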