sklearn.datasets.mldata_filename()

sklearn.datasets.mldata_filename(dataname) [source] Convert a raw name for a dataset into an mldata.org filename.
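A minimal usage sketch; the expected output in the comment assumes the usual lowercasing-and-dash normalization of mldata.org dataset names and is illustrative, not authoritative:

from sklearn.datasets import mldata_filename

# raw dataset names are normalized to lowercase, dash-separated filenames
print(mldata_filename('MNIST Original'))  # expected: 'mnist-original'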

Plot class probabilities calculated by the VotingClassifier

Plot the class probabilities of the first sample in a toy dataset predicted by three different classifiers and averaged by the VotingClassifier. First, three exemplary classifiers are initialized (LogisticRegression, GaussianNB, and RandomForestClassifier) and used to initialize a soft-voting VotingClassifier with weights [1, 1, 5], which means that the predicted probabilities of the RandomForestClassifier count 5 times as much as those of the other classifiers when the averaged probability is calculated.
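A condensed sketch of the setup described above; the toy data values here are illustrative stand-ins, not the example's exact ones:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

# toy training data: two classes in two dimensions
X = np.array([[-1.0, -1.0], [-1.2, -1.4], [-3.4, -2.2], [1.1, 1.2]])
y = np.array([1, 1, 2, 2])

clf1 = LogisticRegression(random_state=123)
clf2 = GaussianNB()
clf3 = RandomForestClassifier(random_state=123)

# soft voting averages predicted probabilities; weight 5 makes the
# random forest count 5 times as much as each of the other two
eclf = VotingClassifier(estimators=[('lr', clf1), ('gnb', clf2), ('rf', clf3)],
                        voting='soft', weights=[1, 1, 5])
eclf.fit(X, y)
print(eclf.predict_proba(X[:1]))  # class probabilities for the first sample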

Comparison of Manifold Learning methods

An illustration of dimensionality reduction on the S-curve dataset with various manifold learning methods. For a discussion and comparison of these algorithms, see the manifold module page. For a similar example, where the methods are applied to a sphere dataset, see Manifold Learning methods on a severed sphere. Note that the purpose of MDS is to find a low-dimensional representation of the data (here 2D) in which the distances respect well the distances in the original high-dimensional space.
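A sketch comparing two of the methods on the S-curve; the neighbor count and iteration limits here are illustrative choices, not the example's exact settings:

from sklearn import datasets, manifold

# 3D S-curve dataset; `color` encodes position along the curve
X, color = datasets.make_s_curve(n_samples=1000, random_state=0)

# Isomap: geodesic distances over a k-nearest-neighbors graph
Y_iso = manifold.Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# MDS: embeds the data so pairwise distances are preserved
Y_mds = manifold.MDS(n_components=2, n_init=1, max_iter=100,
                     random_state=0).fit_transform(X)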

semi_supervised.LabelSpreading()

class sklearn.semi_supervised.LabelSpreading(kernel='rbf', gamma=20, n_neighbors=7, alpha=0.2, max_iter=30, tol=0.001, n_jobs=1) [source] LabelSpreading model for semi-supervised learning. This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels. Read more in the User Guide. Parameters: kernel : {'knn', 'rbf'} String identifier for kernel function to use. Only 'rbf' and 'knn' kernels are currently supported.
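Unlabeled samples are marked with -1 in the target vector. A short sketch on the iris data; the 70% masking fraction is an arbitrary choice for illustration:

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

iris = datasets.load_iris()
rng = np.random.RandomState(42)

# mark roughly 70% of the samples as unlabeled (-1)
labels = np.copy(iris.target)
labels[rng.rand(len(labels)) < 0.7] = -1

model = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2)
model.fit(iris.data, labels)
print(model.transduction_[:10])  # labels inferred for every sample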

cross_validation.StratifiedShuffleSplit()

Warning: DEPRECATED. class sklearn.cross_validation.StratifiedShuffleSplit(y, n_iter=10, test_size=0.1, train_size=None, random_state=None) [source] Stratified ShuffleSplit cross-validation iterator. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.StratifiedShuffleSplit instead. Provides train/test indices to split data into train/test sets. This cross-validation object is a merge of StratifiedKFold and ShuffleSplit, which returns stratified randomized folds. The folds are made by preserving the percentage of samples for each class.
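A migration sketch using the sklearn.model_selection replacement; note the new class takes n_splits and receives y in split() rather than in the constructor:

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 1, 1])

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
for train_index, test_index in sss.split(X, y):
    # each split preserves the class proportions of y
    print("TRAIN:", train_index, "TEST:", test_index)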

A demo of K-Means clustering on the handwritten digits data

In this example we compare the various initialization strategies for K-means in terms of runtime and quality of the results. As the ground truth is known here, we also apply different cluster quality metrics to judge the goodness of fit of the cluster labels to the ground truth.

Cluster quality metrics evaluated (see Clustering performance evaluation for definitions and discussions of the metrics):

Shorthand    Full name
homo         homogeneity score
compl        completeness score
v-meas       V-measure
ARI          adjusted Rand index
AMI          adjusted mutual information
silhouette   silhouette coefficient
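A condensed sketch of the comparison, scoring the first four metrics from the table above; the random_state and n_init values are illustrative:

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)                # standardize features
n_digits = len(set(digits.target))       # 10 clusters for 10 digit classes

for init in ('k-means++', 'random'):
    km = KMeans(init=init, n_clusters=n_digits, n_init=10, random_state=42)
    km.fit(data)
    print('%-10s homo=%.3f compl=%.3f v-meas=%.3f ARI=%.3f' % (
        init,
        metrics.homogeneity_score(digits.target, km.labels_),
        metrics.completeness_score(digits.target, km.labels_),
        metrics.v_measure_score(digits.target, km.labels_),
        metrics.adjusted_rand_score(digits.target, km.labels_)))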

Linear and Quadratic Discriminant Analysis with confidence ellipsoid

Plot the confidence ellipsoids of each class and decision boundary.

print(__doc__)

from scipy import linalg
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import colors

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# colormap
cmap = colors.LinearSegmentedColormap(
    'red_blue_classes',
    {'red': [(0, 1, 1), (1, 0.7, 0.7)],
     'green': [(0, 0.7, 0.7), (1, 0.7, 0.7)],
     'blue': [(0, 0.7, 0.7), (1, 1, 1)]})
plt.cm.register_cmap(cmap=cmap)
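The rest of the example is elided above; a minimal continuation that fits both estimators might look like the following, with two Gaussian blobs as stand-in data (the real example generates its own datasets):

# stand-in data: two Gaussian blobs, one class each
rng = np.random.RandomState(0)
X = np.concatenate([rng.randn(50, 2) - [2, 2], rng.randn(50, 2) + [2, 2]])
y = np.array([0] * 50 + [1] * 50)

# LDA: linear decision boundary, shared covariance across classes
lda = LinearDiscriminantAnalysis(solver='svd', store_covariance=True)
y_pred_lda = lda.fit(X, y).predict(X)

# QDA: quadratic boundary, per-class covariance
qda = QuadraticDiscriminantAnalysis()
y_pred_qda = qda.fit(X, y).predict(X)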

decomposition.SparsePCA()

class sklearn.decomposition.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None) [source] Sparse Principal Components Analysis (SparsePCA). Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha. Read more in the User Guide. Parameters: n_components : int, Number of sparse atoms to extract.
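A quick sketch; the random data and alpha value are arbitrary choices for illustration:

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(100, 10)  # 100 samples, 10 features

spca = SparsePCA(n_components=3, alpha=1, random_state=0)
X_reduced = spca.fit_transform(X)

print(X_reduced.shape)                 # (100, 3)
print((spca.components_ == 0).mean())  # fraction of exactly-zero loadings

Larger alpha drives more loadings in components_ to exactly zero, which is the sparseness control the description refers to.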

feature_extraction.DictVectorizer()

class sklearn.feature_extraction.DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sparse=True, sort=True) [source] Transforms lists of feature-value mappings to vectors. This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators. When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on.
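The canonical usage pattern, with numeric values passed through and absent features filled with zeros:

from sklearn.feature_extraction import DictVectorizer

v = DictVectorizer(sparse=False)
D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
X = v.fit_transform(D)

print(X)
# [[ 2.  0.  1.]
#  [ 0.  1.  3.]]
print(v.get_feature_names())  # ['bar', 'baz', 'foo']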

svm.LinearSVC()

class sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000) [source] Linear Support Vector Classification. Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
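A minimal sketch on synthetic data; the parameters spelled out are simply the defaults discussed above:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = LinearSVC(penalty='l2', loss='squared_hinge', C=1.0, random_state=0)
clf.fit(X, y)

print(clf.coef_.shape)   # (1, 4): one weight vector for a binary problem
print(clf.predict(X[:5]))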