Plot multi-class SGD on the iris dataset

Plot the decision surface of multi-class SGD on the iris dataset. The hyperplanes corresponding to the three one-versus-all (OVA) classifiers are represented by the dashed lines.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import SGDClassifier

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target
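The full example goes on to fit a single SGDClassifier, which handles the three classes via OVA internally; each row of coef_, paired with the matching intercept_ entry, defines one dashed hyperplane w.x + b = 0. A minimal sketch of that step (the hyperparameters here are illustrative, not necessarily the example's):

clf = SGDClassifier(alpha=0.001, n_iter=100).fit(X, y)
# one weight vector and intercept per class -> three OVA hyperplanes
print(clf.coef_.shape)       # (3, 2)
print(clf.intercept_.shape)  # (3,)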

sklearn.metrics.average_precision_score()

sklearn.metrics.average_precision_score(y_true, y_score, average='macro', sample_weight=None) [source]

Compute average precision (AP) from prediction scores. This score corresponds to the area under the precision-recall curve.

Note: this implementation is restricted to the binary classification task or multilabel classification task.

Read more in the User Guide.

Parameters:
y_true : array, shape = [n_samples] or [n_samples, n_classes]
    True binary labels in binary label indicators.
y_score : array, shape = [n_samples] or [n_samples, n_classes]
    Target scores, which can be probability estimates of the positive class, confidence values, or non-thresholded decision values (as returned by decision_function on some classifiers).
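A short doctest-style usage sketch (the scores below are made up for illustration):

>>> import numpy as np
>>> from sklearn.metrics import average_precision_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> ap = average_precision_score(y_true, y_scores)  # area under the precision-recall curve
>>> 0.0 <= ap <= 1.0
True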

cross_decomposition.PLSSVD()

class sklearn.cross_decomposition.PLSSVD(n_components=2, scale=True, copy=True) [source]

Partial Least Square SVD. Simply performs an SVD on the cross-covariance matrix X'Y. There is no iterative deflation here.

Read more in the User Guide.

Parameters:
n_components : int, default 2
    Number of components to keep.
scale : boolean, default True
    Whether to scale X and Y.
copy : boolean, default True
    Whether to copy X and Y, or perform in-place computations.

Attributes:
x_weights_ : array, [p, n_components]
    X block weights vectors.
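A minimal fit/transform sketch (the toy data here is made up for illustration):

>>> import numpy as np
>>> from sklearn.cross_decomposition import PLSSVD
>>> X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
>>> Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
>>> pls = PLSSVD(n_components=2).fit(X, Y)
>>> X_scores, Y_scores = pls.transform(X, Y)  # project both blocks onto the SVD directions
>>> X_scores.shape
(4, 2)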

2.3. Clustering

Clustering of unlabeled data can be performed with the module sklearn.cluster.

Each clustering algorithm comes in two variants: a class that implements the fit method to learn the clusters on train data, and a function that, given train data, returns an array of integer labels corresponding to the different clusters. For the class, the labels over the training data can be found in the labels_ attribute. A sketch of both variants follows below.

Input data
One important thing to note is that the algorithms implemented in this module can take different kinds of matrix as input.
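The class/function duality looks like this in practice (a small sketch; the toy data is made up):

>>> import numpy as np
>>> from sklearn.cluster import KMeans, k_means
>>> X = np.array([[1., 1.], [1., 2.], [10., 10.], [10., 11.]])
>>> km = KMeans(n_clusters=2, random_state=0).fit(X)  # class variant
>>> km.labels_.shape  # one integer cluster label per training sample
(4,)
>>> centers, labels, inertia = k_means(X, n_clusters=2, random_state=0)  # function variant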

decomposition.LatentDirichletAllocation()

class sklearn.decomposition.LatentDirichletAllocation(n_topics=10, doc_topic_prior=None, topic_word_prior=None, learning_method=None, learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None) [source]

Latent Dirichlet Allocation with online variational Bayes algorithm.

New in version 0.17.

Read more in the User Guide.

Parameters:
n_topics : int, optional (default=10)
    Number of topics.
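A small end-to-end sketch on a toy corpus (corpus and settings chosen for illustration):

>>> from sklearn.decomposition import LatentDirichletAllocation
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ["apples and oranges", "dogs and cats", "oranges or apples", "cats chase dogs"]
>>> tf = CountVectorizer().fit_transform(docs)  # LDA expects raw term counts, not tf-idf
>>> lda = LatentDirichletAllocation(n_topics=2, learning_method='online', random_state=0)
>>> doc_topic = lda.fit_transform(tf)  # per-document topic distribution
>>> doc_topic.shape
(4, 2)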

linear_model.SGDClassifier()

class sklearn.linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='optimal', eta0=0.0, power_t=0.5, class_weight=None, warm_start=False, average=False) [source]

Linear classifiers (SVM, logistic regression, a.o.) with SGD training. This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
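A doctest-style usage sketch on toy data (the predicted label is what one would expect for this data, but SGD results can vary with shuffling):

>>> import numpy as np
>>> from sklearn.linear_model import SGDClassifier
>>> X = np.array([[-1., -1.], [-2., -1.], [1., 1.], [2., 1.]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = SGDClassifier(loss="hinge", penalty="l2").fit(X, y)
>>> clf.predict([[-0.8, -1.]])
array([1])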

Manifold learning on handwritten digits

An illustration of various embeddings on the digits dataset. The RandomTreesEmbedding, from the sklearn.ensemble module, is not technically a manifold embedding method, as it learns a high-dimensional representation to which we apply a dimensionality reduction method. However, it is often useful to cast a dataset into a representation in which the classes are linearly separable. t-SNE will be initialized with the embedding that is generated by PCA in this example, which is not the default setting. It ensures global stability of the embedding, i.e., the embedding does not depend on random initialization.
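The PCA initialization mentioned above is a one-flag change; a minimal sketch (dataset choice mirrors the example, the rest is illustrative):

>>> from sklearn import datasets, manifold
>>> digits = datasets.load_digits(n_class=6)
>>> tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)  # 'pca' instead of the default 'random'
>>> X_tsne = tsne.fit_transform(digits.data)
>>> X_tsne.shape[1]
2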

sklearn.metrics.zero_one_loss()

sklearn.metrics.zero_one_loss(y_true, y_pred, normalize=True, sample_weight=None) [source]

Zero-one classification loss. If normalize is True, return the fraction of misclassifications (float); otherwise, return the number of misclassifications (int). The best performance is 0.

Read more in the User Guide.

Parameters:
y_true : 1d array-like, or label indicator array / sparse matrix
    Ground truth (correct) labels.
y_pred : 1d array-like, or label indicator array / sparse matrix
    Predicted labels, as returned by a classifier.
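A quick sketch showing both the normalized and the raw-count forms:

>>> from sklearn.metrics import zero_one_loss
>>> y_true = [2, 2, 3, 4]
>>> y_pred = [1, 2, 3, 4]
>>> zero_one_loss(y_true, y_pred)  # fraction misclassified
0.25
>>> zero_one_loss(y_true, y_pred, normalize=False)  # count misclassified
1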

Comparing random forests and the multi-output meta estimator

An example to compare multi-output regression with random forest and the multioutput.MultiOutputRegressor meta-estimator. This example illustrates the use of the multioutput.MultiOutputRegressor meta-estimator to perform multi-output regression. A random forest regressor is used, which supports multi-output regression natively, so the results can be compared. The random forest regressor will only ever predict values within the range of observations, or closer to zero, for each of the targets.
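A minimal comparison sketch (synthetic data and hyperparameters are illustrative):

>>> import numpy as np
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.multioutput import MultiOutputRegressor
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(100, 4)
>>> Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1])])  # two regression targets
>>> native = RandomForestRegressor(random_state=0).fit(X, Y)  # handles multi-output natively
>>> wrapped = MultiOutputRegressor(RandomForestRegressor(random_state=0)).fit(X, Y)  # one forest per target
>>> native.predict(X[:2]).shape, wrapped.predict(X[:2]).shape
((2, 2), (2, 2))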

model_selection.RandomizedSearchCV()

class sklearn.model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score=True) [source]

Randomized search on hyper parameters. RandomizedSearchCV implements a "fit" and a "score" method. It also implements "predict", "predict_proba", "decision_function", "transform" and "inverse_transform" if they are implemented in the estimator used.
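A short sketch of sampling hyperparameters from distributions (the estimator and search space are chosen for illustration):

>>> from scipy.stats import expon
>>> from sklearn import datasets
>>> from sklearn.svm import SVC
>>> from sklearn.model_selection import RandomizedSearchCV
>>> iris = datasets.load_iris()
>>> param_dist = {'C': expon(scale=100), 'kernel': ['linear', 'rbf']}  # distributions or lists
>>> search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, random_state=0)
>>> search = search.fit(iris.data, iris.target)  # draws n_iter parameter settings
>>> sorted(search.best_params_)
['C', 'kernel']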