kernel_approximation.SkewedChi2Sampler()

class sklearn.kernel_approximation.SkewedChi2Sampler(skewedness=1.0, n_components=100, random_state=None) [source]

Approximates the feature map of the "skewed chi-squared" kernel by Monte Carlo approximation of its Fourier transform. Read more in the User Guide.

Parameters:
skewedness : float
    "Skewedness" parameter of the kernel. Needs to be cross-validated.
n_components : int
    Number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.
random_state : int, RandomState instance or None, optional (default=None)
    If int, the seed used by the random number generator; if a RandomState instance, the generator itself.
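A minimal usage sketch: approximate the kernel map on non-negative toy data, then train a linear classifier on the transformed features (the data and parameter values below are illustrative, not tuned):

    import numpy as np
    from sklearn.kernel_approximation import SkewedChi2Sampler
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    X = np.abs(rng.randn(100, 4))          # the kernel requires X > -skewedness
    y = rng.randint(0, 2, 100)

    chi2_sampler = SkewedChi2Sampler(skewedness=1.0, n_components=100,
                                     random_state=0)
    X_features = chi2_sampler.fit_transform(X)   # shape (100, 100)

    clf = SGDClassifier(random_state=0)
    clf.fit(X_features, y)     # linear model in the approximate feature space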

svm.OneClassSVM()

class sklearn.svm.OneClassSVM(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, random_state=None) [source]

Unsupervised outlier detection: estimate the support of a high-dimensional distribution. The implementation is based on libsvm. Read more in the User Guide.

Parameters:
kernel : string, optional (default='rbf')
    Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'rbf' will be used.
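A minimal sketch of one-class training and prediction: fit on "normal" toy data only, then flag new points as inliers (+1) or outliers (-1); the nu and gamma values are illustrative:

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2)                    # inliers near the origin
    X_test = rng.uniform(low=-4, high=4, size=(20, 2))   # mixed points

    clf = OneClassSVM(kernel='rbf', nu=0.1, gamma=0.1)
    clf.fit(X_train)                        # unsupervised: no labels

    pred = clf.predict(X_test)              # +1 for inliers, -1 for outliers
    scores = clf.decision_function(X_test)  # signed distance to the boundary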

sklearn.preprocessing.scale()

sklearn.preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True) [source]

Standardize a dataset along any axis: center to the mean and scale component-wise to unit variance. Read more in the User Guide.

Parameters:
X : {array-like, sparse matrix}
    The data to center and scale.
axis : int (0 by default)
    Axis used to compute the means and standard deviations along. If 0, independently standardize each feature; otherwise (if 1) standardize each sample.
with_mean : boolean, True by default
    If True, center the data before scaling.
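For example, with axis=0 (the default) each column of the result has zero mean and unit variance:

    import numpy as np
    from sklearn.preprocessing import scale

    X = np.array([[1., -1., 2.],
                  [2., 0., 0.],
                  [0., 1., -1.]])

    X_scaled = scale(X)            # standardize each feature (column)
    print(X_scaled.mean(axis=0))   # approximately [0. 0. 0.]
    print(X_scaled.std(axis=0))    # [1. 1. 1.]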

Nested versus non-nested cross-validation

This example compares non-nested and nested cross-validation strategies on a classifier of the iris data set. Nested cross-validation (CV) is often used to train a model in which hyperparameters also need to be optimized. Nested CV estimates the generalization error of the underlying model together with its (hyper)parameter search. Choosing the parameters that maximize non-nested CV biases the model to the dataset, yielding an overly optimistic score. Model selection without nested CV uses the same data to tune model parameters and evaluate model performance.
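A condensed sketch of the comparison (the SVC parameter grid and fold counts are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    iris = load_iris()
    param_grid = {'C': [1, 10, 100], 'gamma': [0.01, 0.1]}

    inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

    clf = GridSearchCV(SVC(), param_grid=param_grid, cv=inner_cv)

    # Non-nested: the same folds both tune the hyperparameters and report
    # the score, so best_score_ is optimistically biased.
    clf.fit(iris.data, iris.target)
    non_nested_score = clf.best_score_

    # Nested: the outer loop scores the whole search procedure on data the
    # inner search never saw.
    nested_score = cross_val_score(clf, iris.data, iris.target,
                                   cv=outer_cv).mean()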

Feature transformations with ensembles of trees

Transform your features into a higher-dimensional, sparse space, then train a linear model on these features. First fit an ensemble of trees (totally random trees, a random forest, or gradient boosted trees) on the training set. Then each leaf of each tree in the ensemble is assigned a fixed arbitrary feature index in a new feature space, and these leaf indices are encoded in a one-hot fashion. Each sample goes through the decisions of each tree of the ensemble and ends up in one leaf per tree; the sample is encoded by setting the feature values for these leaves to 1 and all other feature values to 0.
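A minimal sketch of the pipeline with a random forest. Separate halves of the training data fit the trees and the linear model so the encoding does not overfit, and OneHotEncoder is set to ignore leaves unseen during fitting:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import OneHotEncoder

    X, y = make_classification(n_samples=1000, random_state=0)
    X_tree, X_lin, y_tree, y_lin = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

    rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
    rf.fit(X_tree, y_tree)

    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit(rf.apply(X_tree))      # leaf indices: one column per tree

    lm = LogisticRegression()
    lm.fit(enc.transform(rf.apply(X_lin)), y_lin)  # linear model on one-hot leaves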

linear_model.MultiTaskElasticNet()

class sklearn.linear_model.MultiTaskElasticNet(alpha=1.0, l1_ratio=0.5, fit_intercept=True, normalize=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, random_state=None, selection='cyclic') [source]

Multi-task ElasticNet model trained with a mixed L1/L2-norm as regularizer. The optimization objective for MultiTaskElasticNet is:

    (1 / (2 * n_samples)) * ||Y - XW||_Fro^2
    + alpha * l1_ratio * ||W||_21
    + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2

where

    ||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}

i.e. the sum of the l2 norms of the rows of W.
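A minimal fit on toy multi-output data, showing that rows of coef_ are selected or zeroed jointly across tasks (the effect of the ||W||_21 term); alpha is illustrative:

    import numpy as np
    from sklearn.linear_model import MultiTaskElasticNet

    rng = np.random.RandomState(0)
    X = rng.randn(50, 10)
    W = np.zeros((10, 3))
    W[:3] = rng.randn(3, 3)       # only the first 3 features are informative
    Y = X.dot(W) + 0.01 * rng.randn(50, 3)

    clf = MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5)
    clf.fit(X, Y)
    print(clf.coef_.shape)        # (3, 10): one coefficient row per task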

ensemble.IsolationForest()

class sklearn.ensemble.IsolationForest(n_estimators=100, max_samples='auto', contamination=0.1, max_features=1.0, bootstrap=False, n_jobs=1, random_state=None, verbose=0) [source]

Isolation Forest algorithm. Returns the anomaly score of each sample using the IsolationForest algorithm. The IsolationForest "isolates" observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.
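A minimal outlier-detection sketch on toy data (the default contamination is used; values are illustrative):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(42)
    X_train = 0.3 * rng.randn(100, 2)                       # inliers
    X_outliers = rng.uniform(low=-4, high=4, size=(10, 2))  # scattered points

    clf = IsolationForest(n_estimators=100, random_state=rng)
    clf.fit(X_train)

    print(clf.predict(X_train[:5]))   # mostly +1 (inliers)
    print(clf.predict(X_outliers))    # mostly -1 (outliers)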

lda.LDA()

Warning: DEPRECATED

class sklearn.lda.LDA(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001) [source]

Alias for sklearn.discriminant_analysis.LinearDiscriminantAnalysis.

Deprecated since version 0.17: This class will be removed in 0.19. Use sklearn.discriminant_analysis.LinearDiscriminantAnalysis instead.

Methods:
decision_function(X)
    Predict confidence scores for samples.
fit(X, y[, store_covariance, tol])
    Fit the LinearDiscriminantAnalysis model according to the given training data and parameters.
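Migrating is a one-line import change; a minimal before/after sketch on toy data:

    # Old (removed in 0.19):
    #   from sklearn.lda import LDA
    #   clf = LDA(solver='svd')

    # Replacement:
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    y = np.array([0, 0, 1, 1])

    clf = LinearDiscriminantAnalysis(solver='svd')
    clf.fit(X, y)
    print(clf.predict([[-0.8, -1]]))   # [0]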

naive_bayes.GaussianNB()

class sklearn.naive_bayes.GaussianNB(priors=None) [source]

Gaussian Naive Bayes (GaussianNB). Can perform online updates to model parameters via the partial_fit method. For details on the algorithm used to update feature means and variances online, see the Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque: http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf. Read more in the User Guide.

Parameters:
priors : array-like, shape (n_classes,)
    Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.
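A minimal sketch of batch fitting and the online partial_fit path on toy data:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    y = np.array([1, 1, 1, 2, 2, 2])

    clf = GaussianNB()
    clf.fit(X, y)
    print(clf.predict([[-0.8, -1]]))      # [1]

    # Online learning: the first partial_fit call must list all classes.
    clf_pf = GaussianNB()
    clf_pf.partial_fit(X, y, classes=np.unique(y))
    print(clf_pf.predict([[-0.8, -1]]))   # [1]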

dummy.DummyClassifier()

class sklearn.dummy.DummyClassifier(strategy='stratified', random_state=None, constant=None) [source]

DummyClassifier is a classifier that makes predictions using simple rules. This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems. Read more in the User Guide.

Parameters:
strategy : str, default='stratified'
    Strategy to use to generate predictions.
    'stratified': generates predictions by respecting the training set's class distribution.
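A minimal baseline sketch: a 'most_frequent' dummy on iris gives the chance-level accuracy that a real classifier should beat:

    from sklearn.datasets import load_iris
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import train_test_split

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    baseline = DummyClassifier(strategy='most_frequent')
    baseline.fit(X_train, y_train)         # ignores the features entirely
    print(baseline.score(X_test, y_test))  # accuracy of always predicting the majority class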