sklearn.random_projection.johnson_lindenstrauss_min_dim()

sklearn.random_projection.johnson_lindenstrauss_min_dim(n_samples, eps=0.1) [source] Find a 'safe' number of components to randomly project to. The distortion introduced by a random projection p only changes the distance between two points by a factor (1 ± eps) in a Euclidean space with good probability. The projection p is an eps-embedding as defined by: (1 - eps) ||u - v||^2 < ||p(u) - p(v)||^2 < (1 + eps) ||u - v||^2 Where u and v are any rows taken from a dataset of shape [n_samples, n_features].
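A minimal usage sketch; the returned dimensions follow from the Johnson-Lindenstrauss lower bound implemented by this function (exact array formatting may differ across numpy versions):
>>> from sklearn.random_projection import johnson_lindenstrauss_min_dim
>>> johnson_lindenstrauss_min_dim(n_samples=1e6, eps=0.5)
663
>>> johnson_lindenstrauss_min_dim(n_samples=1e6, eps=[0.5, 0.1, 0.01])
array([    663,   11841, 1112658])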

sklearn.preprocessing.robust_scale()

sklearn.preprocessing.robust_scale(X, axis=0, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True) [source] Standardize a dataset along any axis. Center to the median and component-wise scale according to the interquartile range. Read more in the User Guide. Parameters: X : array-like The data to center and scale. axis : int (0 by default) Axis used to compute the medians and IQR along. If 0, independently scale each feature, otherwise (if 1) scale each sample.
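An illustrative sketch with made-up data, showing that an outlier does not affect the median/IQR-based scaling (output formatting may vary with the numpy version):
>>> import numpy as np
>>> from sklearn.preprocessing import robust_scale
>>> X = np.array([[1.], [2.], [3.], [4.], [100.]])   # 100. is an outlier
>>> robust_scale(X).ravel()                          # median 3 and IQR 2 set the scale
array([-1. , -0.5,  0. ,  0.5, 48.5])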

sklearn.preprocessing.normalize()

sklearn.preprocessing.normalize(X, norm='l2', axis=1, copy=True, return_norm=False) [source] Scale input vectors individually to unit norm (vector length). Read more in the User Guide. Parameters: X : {array-like, sparse matrix}, shape [n_samples, n_features] The data to normalize, element by element. scipy.sparse matrices should be in CSR format to avoid an unnecessary copy. norm : 'l1', 'l2', or 'max', optional ('l2' by default) The norm to use to normalize each non-zero sample (or each non-zero feature if axis is 0).
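A minimal sketch with made-up rows; by default each sample (row) is divided by its own l2 norm:
>>> from sklearn.preprocessing import normalize
>>> normalize([[3., 4.], [1., 0.]])   # norm='l2', axis=1: each row scaled to unit length
array([[0.6, 0.8],
       [1. , 0. ]])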

sklearn.preprocessing.minmax_scale()

sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True) [source] Transforms features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, i.e. between zero and one. The transformation is given by: X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) X_scaled = X_std * (max - min) + min where min, max = feature_range. This transformation is often used as an alternative to zero mean, unit variance scaling.
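A minimal sketch with made-up data; each column is mapped onto [0, 1] independently:
>>> from sklearn.preprocessing import minmax_scale
>>> minmax_scale([[1., 10.], [2., 20.], [3., 30.]])   # per-column min -> 0, max -> 1
array([[0. , 0. ],
       [0.5, 0.5],
       [1. , 1. ]])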

sklearn.preprocessing.label_binarize()

sklearn.preprocessing.label_binarize(y, classes, neg_label=0, pos_label=1, sparse_output=False) [source] Binarize labels in a one-vs-all fashion. Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme. This function makes it possible to compute this transformation for a fixed set of class labels known ahead of time. Parameters: y : array-like Sequence of integer labels or multilabel data to encode.
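A minimal sketch; each label is mapped to an indicator column, with the column order taken from the classes argument:
>>> from sklearn.preprocessing import label_binarize
>>> label_binarize([1, 6, 4, 2], classes=[1, 2, 4, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0]])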

sklearn.preprocessing.maxabs_scale()

sklearn.preprocessing.maxabs_scale(X, axis=0, copy=True) [source] Scale each feature to the [-1, 1] range without breaking the sparsity. This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. This scaler can also be applied to sparse CSR or CSC matrices. Parameters: axis : int (0 by default) Axis used to scale along. If 0, independently scale each feature, otherwise (if 1) scale each sample. copy : boolean, optional (True by default) Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).
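A minimal sketch with made-up data; each column is divided by its maximum absolute value, so signs and zeros (and hence sparsity) are preserved:
>>> from sklearn.preprocessing import maxabs_scale
>>> maxabs_scale([[ 1., -2.],
...               [ 2.,  0.],
...               [-4.,  1.]])   # column 0 divided by 4, column 1 divided by 2
array([[ 0.25, -1.  ],
       [ 0.5 ,  0.  ],
       [-1.  ,  0.5 ]])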

sklearn.preprocessing.add_dummy_feature()

sklearn.preprocessing.add_dummy_feature(X, value=1.0) [source] Augment dataset with an additional dummy feature. This is useful for fitting an intercept term with implementations which cannot otherwise fit it directly. Parameters: X : {array-like, sparse matrix}, shape [n_samples, n_features] Data. value : float Value to use for the dummy feature. Returns: X : {array, sparse matrix}, shape [n_samples, n_features + 1] Same data with dummy feature added as first column.
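A minimal usage sketch; the dummy column (value=1.0 by default) is prepended to every row:
>>> from sklearn.preprocessing import add_dummy_feature
>>> add_dummy_feature([[0, 1], [1, 0]])
array([[1., 0., 1.],
       [1., 1., 0.]])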

sklearn.preprocessing.binarize()

sklearn.preprocessing.binarize(X, threshold=0.0, copy=True) [source] Boolean thresholding of array-like or scipy.sparse matrix. Read more in the User Guide. Parameters: X : {array-like, sparse matrix}, shape [n_samples, n_features] The data to binarize, element by element. scipy.sparse matrices should be in CSR or CSC format to avoid an unnecessary copy. threshold : float, optional (0.0 by default) Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices.
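A minimal sketch with made-up data; values strictly greater than the threshold become 1, all others 0:
>>> from sklearn.preprocessing import binarize
>>> binarize([[0.4, -1.2, 2.5],
...           [0.0,  0.6, 3.0]], threshold=0.5)
array([[0., 0., 1.],
       [0., 1., 1.]])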

sklearn.neighbors.radius_neighbors_graph()

sklearn.neighbors.radius_neighbors_graph(X, radius, mode='connectivity', metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=1) [source] Computes the (weighted) graph of neighbors for points in X. Neighborhoods are restricted to points at a distance lower than the radius. Read more in the User Guide. Parameters: X : array-like or BallTree, shape = [n_samples, n_features] Sample data, in the form of a numpy array or a precomputed BallTree. radius : float Radius of neighborhoods.
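A minimal sketch with three made-up one-dimensional points; with mode='connectivity' the returned sparse matrix holds 0/1 adjacency values:
>>> from sklearn.neighbors import radius_neighbors_graph
>>> X = [[0], [3], [1]]
>>> A = radius_neighbors_graph(X, radius=1.5, mode='connectivity', include_self=True)
>>> A.toarray()   # only points within distance 1.5 of each other are connected
array([[1., 0., 1.],
       [0., 1., 0.],
       [1., 0., 1.]])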

sklearn.pipeline.make_union()

sklearn.pipeline.make_union(*transformers) [source] Construct a FeatureUnion from the given transformers. This is a shorthand for the FeatureUnion constructor; it does not require, and does not permit, naming the transformers. Instead, they will be given names automatically based on their types. It also does not allow weighting. Returns: f : FeatureUnion Examples
>>> from sklearn.decomposition import PCA, TruncatedSVD
>>> make_union(PCA(), TruncatedSVD())
FeatureUnion
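A fuller sketch under assumed, illustrative components (PCA and TruncatedSVD with one component each on made-up data); the resulting FeatureUnion concatenates the transformed outputs column-wise:
>>> from sklearn.pipeline import make_union
>>> from sklearn.decomposition import PCA, TruncatedSVD
>>> union = make_union(PCA(n_components=1), TruncatedSVD(n_components=1))
>>> X = [[0., 1., 3.], [2., 2., 5.]]
>>> union.fit_transform(X).shape   # one column from each transformer
(2, 2)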