class sklearn.naive_bayes.GaussianNB(priors=None) Gaussian Naive Bayes (GaussianNB). Can perform online updates to model parameters via the partial_fit method. For details on the algorithm used to update feature means and variances online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque: http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf Read more in the User Guide. Parameters: priors : array-like, shape (n_classes,) Prior probabilities of the clas
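The online update mentioned above can be illustrated with a minimal sketch. The toy data and the variable names (`incremental`, `batch`) are assumptions made for this example; the point is only that feeding the data in chunks through `partial_fit` should give the same per-class means as a single `fit` call.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (an assumption for illustration): two well-separated classes.
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-3, -2], [3, 2]], dtype=float)
y = np.array([1, 1, 2, 2, 1, 2])

incremental = GaussianNB()
# The first partial_fit call must list every class that can ever appear.
incremental.partial_fit(X[:4], y[:4], classes=np.unique(y))
# Later calls update the running means and variances with new samples.
incremental.partial_fit(X[4:], y[4:])

# Reference model fitted on all the data at once.
batch = GaussianNB().fit(X, y)

prediction = incremental.predict([[-0.8, -1.0]])
print(prediction)
print(np.allclose(incremental.theta_, batch.theta_))
```

Each chunk here contains samples from both classes, which keeps the sketch simple; real streams may need more care.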

random_projection.SparseRandomProjection()

class sklearn.random_projection.SparseRandomProjection(n_components='auto', density='auto', eps=0.1, dense_output=False, random_state=None) Reduce dimensionality through sparse random projection. A sparse random matrix is an alternative to a dense random projection matrix that guarantees similar embedding quality while being much more memory efficient and allowing faster computation of the projected data. If we note s = 1 / density, the components of the random matrix are drawn from: -s
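A short usage sketch follows. The random input and the choice of `n_components=50` are assumptions for illustration; an explicit `n_components` is used because the `'auto'` setting can ask for more components than a small input has features.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# Random dense input (an assumption for illustration): 100 samples, 1000 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 1000)

# Project down to 50 dimensions with a sparse random matrix.
transformer = SparseRandomProjection(n_components=50, random_state=0)
X_new = transformer.fit_transform(X)

print(X_new.shape)
# With density='auto', the fitted density is 1 / sqrt(n_features).
print(transformer.density_)
```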

lda.LDA()

Warning: DEPRECATED. class sklearn.lda.LDA(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001) Alias for sklearn.discriminant_analysis.LinearDiscriminantAnalysis. Deprecated since version 0.17: This class will be removed in 0.19. Use sklearn.discriminant_analysis.LinearDiscriminantAnalysis instead. Methods: decision_function(X) Predict confidence scores for samples. fit(X, y[, store_covariance, tol]) Fit LinearDiscriminantAnalysis mo
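Since the alias is deprecated, new code would import the replacement class directly. The sketch below uses toy data chosen for this example and shows only `fit`, `predict`, and `decision_function`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy two-class data (an assumption for illustration).
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

# Import from sklearn.discriminant_analysis rather than the deprecated sklearn.lda.
clf = LinearDiscriminantAnalysis(solver="svd")
clf.fit(X, y)

prediction = clf.predict([[-0.8, -1.0]])
# For a binary problem, a negative score favours the first class in clf.classes_.
scores = clf.decision_function([[-0.8, -1.0]])
print(prediction, scores)
```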

naive_bayes.BernoulliNB()

class sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None) Naive Bayes classifier for multivariate Bernoulli models. Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features. Read more in the User Guide. Parameters: alpha : float, optional (default=1.0) Additive (Laplace/Lidstone) smoothing parameter (0 for no
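As a small sketch of the binary-feature setting, the example below fits the classifier on a hand-made presence/absence matrix (the data is an assumption for illustration). It also passes a count vector at prediction time to show that `binarize=0.0` turns any value above zero into 1.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hand-made presence/absence features (an assumption for illustration).
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
])
y = np.array([0, 0, 0, 1, 1, 1])

clf = BernoulliNB(alpha=1.0, binarize=0.0)
clf.fit(X, y)

binary_pred = clf.predict([[1, 1, 0, 0], [0, 0, 1, 1]])
# Counts are thresholded at binarize=0.0, so [3, 2, 0, 0] is treated as [1, 1, 0, 0].
count_pred = clf.predict([[3, 2, 0, 0]])
print(binary_pred, count_pred)
```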

preprocessing.MinMaxScaler()

class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True) Transforms features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one. The transformation is given by:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min, max = feature_range. This transformation is often used as an
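The formula above can be checked numerically. In this sketch the input matrix and the target range (-1, 1) are assumptions for illustration; `lo` and `hi` stand in for the `min` and `max` of `feature_range`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small input matrix (an assumption for illustration).
X = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
lo, hi = -1.0, 1.0  # stand-ins for min, max = feature_range

scaler = MinMaxScaler(feature_range=(lo, hi))
X_sklearn = scaler.fit_transform(X)

# The same transformation written out by hand, column by column.
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_manual = X_std * (hi - lo) + lo

print(np.allclose(X_sklearn, X_manual))
```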

isotonic.IsotonicRegression()

class sklearn.isotonic.IsotonicRegression(y_min=None, y_max=None, increasing=True, out_of_bounds='nan') Isotonic regression model. The isotonic regression optimization problem is defined by:
min sum w[i] (y[i] - y_[i]) ** 2
subject to y_[i] <= y_[j] whenever X[i] <= X[j] and min(y_) = y_min, max(y_) = y_max
where: y[i] are inputs (real numbers). y_[i] are fitted values. X specifies the order. If X is non-decreasing then y_ is non-decreasing. w[i] are optional strictly positive w
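A tiny worked case may make the constraint concrete. The five-point input is an assumption for illustration, and `out_of_bounds="clip"` is chosen here instead of the default `'nan'` so that the out-of-range prediction returns a number.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Five points with one order violation: 3.0 is followed by 2.0.
x = np.arange(5, dtype=float)
y = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
y_fit = iso.fit_transform(x, y)

# The violating pair is pooled to its mean, giving a non-decreasing fit.
print(y_fit)

# With out_of_bounds="clip", inputs beyond the training range map to the boundary value.
clipped = iso.predict([10.0])
print(clipped)
```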

pipeline.FeatureUnion()

class sklearn.pipeline.FeatureUnion(transformer_list, n_jobs=1, transformer_weights=None) Concatenates results of multiple transformer objects. This estimator applies a list of transformer objects in parallel to the input data, then concatenates the results. This is useful to combine several feature extraction mechanisms into a single transformer. Parameters of the transformers may be set using its name and the parameter name separated by a '__'. A transformer may be replaced entir
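The sketch below combines two transformers and then changes a nested parameter through the double-underscore syntax. The step names `"pca"` and `"kbest"` and the use of the iris data are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Two feature extractors run side by side; their outputs are concatenated column-wise.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(k=1)),
])
X_combined = union.fit(X, y).transform(X)
print(X_combined.shape)  # 2 PCA columns + 1 selected column

# Nested parameters are addressed as <step name>__<parameter name>.
union.set_params(pca__n_components=1)
X_smaller = union.fit(X, y).transform(X)
print(X_smaller.shape)
```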

ensemble.IsolationForest()

class sklearn.ensemble.IsolationForest(n_estimators=100, max_samples='auto', contamination=0.1, max_features=1.0, bootstrap=False, n_jobs=1, random_state=None, verbose=0) Isolation Forest Algorithm. Return the anomaly score of each sample using the IsolationForest algorithm. The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. Since recursive partitioning c
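As a rough illustration, the sketch below fits the forest on a Gaussian cloud and scores one central point and one far-away point. The data and the two query points are assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Training data (an assumption for illustration): a 2-D standard normal cloud.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))

forest = IsolationForest(n_estimators=100, contamination=0.1, random_state=0)
forest.fit(X_train)

# One point at the centre of the cloud and one far outside it.
X_query = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = forest.predict(X_query)            # +1 for inliers, -1 for outliers
scores = forest.decision_function(X_query)  # lower means more anomalous
print(labels, scores)
```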

linear_model.MultiTaskElasticNet()

class sklearn.linear_model.MultiTaskElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, random_state=None, selection='cyclic') Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer. The optimization objective for MultiTaskElasticNet is: (1 / (2 * n_samples)) * ||Y - XW||_Fro^2 + alpha * l1_ratio * ||W||_21 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2 Where: ||W||_21 = \sum_i \sqrt{\sum
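The objective can be written out in NumPy and compared against a fitted model. This is a sketch under stated assumptions: the helper `objective` is hypothetical, the synthetic data is made up for the example, `||W||_21` is taken as the sum of the Euclidean norms of the rows of W, and W is taken to have shape (n_features, n_tasks), i.e. the transpose of `coef_`.

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

def objective(X, Y, W, alpha, l1_ratio):
    """Hypothetical helper: evaluate the objective for W of shape (n_features, n_tasks)."""
    n_samples = X.shape[0]
    data_fit = np.linalg.norm(Y - X @ W, "fro") ** 2 / (2 * n_samples)
    l21 = np.sqrt((W ** 2).sum(axis=1)).sum()  # sum of row-wise Euclidean norms
    ridge = np.linalg.norm(W, "fro") ** 2
    return data_fit + alpha * l1_ratio * l21 + 0.5 * alpha * (1 - l1_ratio) * ridge

# Synthetic data (an assumption): only the first two features matter, for all three tasks.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))
W_true = np.zeros((5, 3))
W_true[:2] = rng.normal(size=(2, 3))
Y = X @ W_true + 0.01 * rng.normal(size=(50, 3))

alpha, l1_ratio = 0.1, 0.5
model = MultiTaskElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
model.fit(X, Y)

W_hat = model.coef_.T  # coef_ has shape (n_tasks, n_features)
fitted_value = objective(X, Y, W_hat, alpha, l1_ratio)
zero_value = objective(X, Y, np.zeros_like(W_hat), alpha, l1_ratio)
print(fitted_value, zero_value)
```

The fitted coefficients should give a lower objective value than the all-zero matrix, which is a cheap sanity check rather than a proof of optimality.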

Feature transformations with ensembles of trees

Transform your features into a higher dimensional, sparse space. Then train a linear model on these features. First fit an ensemble of trees (totally random trees, a random forest, or gradient boosted trees) on the training set. Then each leaf of each tree in the ensemble is assigned a fixed arbitrary feature index in a new feature space. These leaf indices are then encoded in a one-hot fashion. Each sample goes through the decisions of each tree of the ensemble and ends up in one leaf per tre
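The steps above can be sketched as follows, using a random forest as the ensemble. The synthetic data, the three-way split, and the hyperparameters are assumptions for this example; the forest and the linear model are fitted on disjoint subsets here as one cautious way to limit overfitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic data (an assumption), split into forest, linear-model, and test parts.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_forest, X_rest, y_forest, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0)
X_linear, X_test, y_linear, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Step 1: fit the tree ensemble.
n_trees = 10
forest = RandomForestClassifier(n_estimators=n_trees, max_depth=3, random_state=0)
forest.fit(X_forest, y_forest)

# Step 2: apply() returns, for each sample, the index of the leaf reached in each tree.
leaves_forest = forest.apply(X_forest)

# Step 3: one-hot encode the leaf indices into a sparse, higher-dimensional space.
encoder = OneHotEncoder(handle_unknown="ignore").fit(leaves_forest)
encoded_forest = encoder.transform(leaves_forest)

# Step 4: train a linear model on the encoded leaves.
linear = LogisticRegression(max_iter=1000)
linear.fit(encoder.transform(forest.apply(X_linear)), y_linear)
accuracy = linear.score(encoder.transform(forest.apply(X_test)), y_test)
print(leaves_forest.shape, encoded_forest.shape, accuracy)
```

Each encoded training row contains exactly `n_trees` ones, one per tree, which reflects the "one leaf per tree" encoding.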