feature_extraction.text.TfidfTransformer()

class sklearn.feature_extraction.text.TfidfTransformer(norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False) [source] Transform a count matrix to a normalized tf or tf-idf representation. Tf means term-frequency while tf-idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval that has also found good use in document classification. The goal of using tf-idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
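
A minimal usage sketch: build raw term counts with CountVectorizer, then re-weight them with TfidfTransformer. The toy corpus below is made up for illustration.

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

    # Toy corpus; the documents are illustrative only
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
    ]

    # Raw term counts, then tf-idf re-weighting with L2-normalized rows
    counts = CountVectorizer().fit_transform(corpus)
    tfidf = TfidfTransformer(norm='l2', use_idf=True).fit_transform(counts)
    print(tfidf.shape)  # sparse matrix: (3 documents, n_terms) of tf-idf weights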

cross_validation.StratifiedKFold()

Warning: DEPRECATED. class sklearn.cross_validation.StratifiedKFold(y, n_folds=3, shuffle=False, random_state=None) [source] Stratified K-Folds cross-validation iterator. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.StratifiedKFold instead. Provides train/test indices to split data in train/test sets. This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.
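
A short sketch using the recommended replacement, sklearn.model_selection.StratifiedKFold. The tiny dataset is invented; note that the newer class takes the data at split() time rather than in the constructor.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # Toy data: 6 samples, two classes with equal counts
    X = np.arange(12).reshape(6, 2)
    y = np.array([0, 0, 0, 1, 1, 1])

    skf = StratifiedKFold(n_splits=3, shuffle=False)
    for train_idx, test_idx in skf.split(X, y):
        # Each test fold preserves the 1:1 class ratio of y
        print(train_idx, test_idx)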

sklearn.metrics.pairwise.paired_distances()

sklearn.metrics.pairwise.paired_distances(X, Y, metric='euclidean', **kwds) [source] Computes the paired distances between X and Y, i.e. the distances between (X[0], Y[0]), (X[1], Y[1]), etc. Read more in the User Guide.
Parameters:
X : ndarray (n_samples, n_features)
    Array 1 for distance computation.
Y : ndarray (n_samples, n_features)
    Array 2 for distance computation.
metric : string or callable
    The metric to use when calculating distance between instances in a feature array.
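
A quick sketch with made-up arrays: the function returns one distance per row pair, not a full distance matrix.

    import numpy as np
    from sklearn.metrics.pairwise import paired_distances

    # X and Y must have the same shape; distances are computed row by row
    X = np.array([[0.0, 1.0], [1.0, 1.0]])
    Y = np.array([[0.0, 0.0], [2.0, 2.0]])

    # Euclidean distance between (X[0], Y[0]) and (X[1], Y[1])
    print(paired_distances(X, Y, metric='euclidean'))  # [1.0, 1.414...]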

sklearn.datasets.make_gaussian_quantiles()

sklearn.datasets.make_gaussian_quantiles(mean=None, cov=1.0, n_samples=100, n_features=2, n_classes=3, shuffle=True, random_state=None) [source] Generate isotropic Gaussian samples and label them by quantile. This classification dataset is constructed by taking a multi-dimensional standard normal distribution and defining classes separated by nested concentric multi-dimensional spheres such that roughly equal numbers of samples are in each class (quantiles of the distribution). Read more in the User Guide.
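
A minimal sketch with the default settings, just to show the shapes returned; the sizes are illustrative.

    from sklearn.datasets import make_gaussian_quantiles

    # 100 two-dimensional samples split into 3 quantile-based classes
    X, y = make_gaussian_quantiles(n_samples=100, n_features=2,
                                   n_classes=3, random_state=0)
    print(X.shape, y.shape)   # (100, 2) (100,)
    print(sorted(set(y)))     # [0, 1, 2], roughly equal class sizes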

Parameter estimation using grid search with cross-validation

This example shows how a classifier is optimized by cross-validation, which is done using the sklearn.model_selection.GridSearchCV object on a development set that comprises only half of the available labeled data. The performance of the selected hyper-parameters and trained model is then measured on a dedicated evaluation set that was not used during the model selection step. More details on the tools available for model selection can be found in the sections on Cross-validation: evaluating estimator performance and Tuning the hyper-parameters of an estimator.
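
A condensed sketch of the same workflow; the classifier (SVC) and the grid values below are assumptions chosen for illustration, not necessarily those of the original example.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Hold out half of the labeled data for the final evaluation
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.5, random_state=0)

    # Tune a small SVC grid by cross-validation on the development half only
    param_grid = {'C': [1, 10, 100], 'gamma': [0.001, 0.0001]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)

    print(search.best_params_)
    print(search.score(X_test, y_test))  # score on the untouched evaluation set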

neural_network.MLPRegressor()

class sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08) [source] Multi-layer Perceptron regressor. This model optimizes the squared-loss using LBFGS or stochastic gradient descent.
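
A small sketch on a synthetic regression target; the network size, solver choice, and data below are illustrative assumptions.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Toy linear target; a small network fits it easily
    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = X[:, 0] - 2 * X[:, 1]

    reg = MLPRegressor(hidden_layer_sizes=(50,), solver='lbfgs',
                       alpha=1e-4, max_iter=500, random_state=0)
    reg.fit(X, y)
    print(reg.score(X, y))  # R^2 on the training data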

Multi-class AdaBoosted Decision Trees

This example reproduces Figure 1 of Zhu et al. [1] and shows how boosting can improve prediction accuracy on a multi-class problem. The classification dataset is constructed by taking a ten-dimensional standard normal distribution and defining three classes separated by nested concentric ten-dimensional spheres such that roughly equal numbers of samples are in each class (quantiles of the distribution). The performance of the SAMME and SAMME.R [1] algorithms is compared. SAMME.R uses the probability estimates to update the additive model, while SAMME uses the classifications only.
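
A stripped-down sketch of that comparison; the sample size, tree depth, and number of estimators are illustrative assumptions rather than the figures used in the original example.

    from sklearn.datasets import make_gaussian_quantiles
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Ten-dimensional, three-class dataset built from nested Gaussian quantiles
    X, y = make_gaussian_quantiles(n_samples=1000, n_features=10,
                                   n_classes=3, random_state=1)

    # SAMME.R boosts on class probability estimates; SAMME on discrete labels
    for algorithm in ('SAMME.R', 'SAMME'):
        clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                                 n_estimators=100, algorithm=algorithm)
        clf.fit(X, y)
        print(algorithm, clf.score(X, y))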

kernel_approximation.Nystroem()

class sklearn.kernel_approximation.Nystroem(kernel='rbf', gamma=None, coef0=1, degree=3, kernel_params=None, n_components=100, random_state=None) [source] Approximate a kernel map using a subset of the training data. Constructs an approximate feature map for an arbitrary kernel using a subset of the data as basis. Read more in the User Guide.
Parameters:
kernel : string or callable, default='rbf'
    Kernel map to be approximated. A callable should accept two arguments and the keyword arguments passed to this object as kernel_params, and should return a floating point number.
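
A brief sketch of the transformer on its own; the gamma value and the digits dataset are illustrative assumptions, and the transformed features would normally be fed into a linear model.

    from sklearn.datasets import load_digits
    from sklearn.kernel_approximation import Nystroem

    # Approximate an RBF kernel feature map with 300 landmark points
    digits = load_digits()
    feature_map = Nystroem(kernel='rbf', gamma=0.2, n_components=300,
                           random_state=0)
    X_transformed = feature_map.fit_transform(digits.data)
    print(X_transformed.shape)  # (1797, 300): one column per sampled component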

ensemble.ExtraTreesRegressor()

class sklearn.ensemble.ExtraTreesRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False) [source] An extra-trees regressor. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
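
A minimal sketch on a synthetic regression problem; the dataset and its size are assumptions made for illustration.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import ExtraTreesRegressor

    # Synthetic regression problem with 10 features
    X, y = make_regression(n_samples=200, n_features=10, random_state=0)

    reg = ExtraTreesRegressor(n_estimators=10, random_state=0).fit(X, y)
    print(reg.score(X, y))            # R^2 on the training data
    print(reg.feature_importances_)   # impurity-based importance per feature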

sklearn.metrics.normalized_mutual_info_score()

sklearn.metrics.normalized_mutual_info_score(labels_true, labels_pred) [source] Normalized Mutual Information between two clusterings. Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score to scale the results between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by sqrt(H(labels_true) * H(labels_pred)). This measure is not adjusted for chance; therefore adjusted_mutual_info_score might be preferred.
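
A quick sketch with hand-picked label vectors, showing that the score ignores how the cluster labels are named and only measures how well the two partitions agree.

    from sklearn.metrics import normalized_mutual_info_score

    # Identical partitions under a permutation of label values: perfect score
    print(normalized_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0

    # One cluster split in two: agreement drops below 1.0
    print(normalized_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 2]))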