sklearn.cross_validation.permutation_test_score()

Warning: DEPRECATED. sklearn.cross_validation.permutation_test_score(estimator, X, y, cv=None, n_permutations=100, n_jobs=1, labels=None, random_state=0, verbose=0, scoring=None) [source] Evaluate the significance of a cross-validated score with permutations. Deprecated since version 0.18: this module will be removed in 0.20; use sklearn.model_selection.permutation_test_score instead. Read more in the User Guide. Parameters: estimator : estimator object implementing 'fit'. The object to use to fit the data.
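A minimal sketch of the recommended replacement in sklearn.model_selection; the iris data and linear SVC here are illustrative assumptions, not part of the original snippet:

from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Compare the true CV score against scores obtained on label permutations
score, perm_scores, pvalue = permutation_test_score(
    SVC(kernel='linear'), X, y, cv=5, n_permutations=100, random_state=0)
print("Score: %.3f, p-value: %.3f" % (score, pvalue))

A small p-value indicates the score is unlikely under the null hypothesis that features and labels are independent.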

neighbors.DistanceMetric

class sklearn.neighbors.DistanceMetric DistanceMetric class. This class provides a uniform interface to fast distance metric functions. The various metrics can be accessed via the get_metric class method and the metric string identifier (see below). For example, to use the Euclidean distance: >>> dist = DistanceMetric.get_metric('euclidean') >>> X = [[0, 1, 2], [3, 4, 5]] >>> dist.pairwise(X) array([[ 0.        ,  5.19615242], [ 5.19615242,  0.        ]])
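As a complementary sketch, pairwise() also accepts a second array to compute cross-distances; the 'manhattan' identifier and the arrays below are assumptions for illustration:

from sklearn.neighbors import DistanceMetric

dist = DistanceMetric.get_metric('manhattan')
X = [[0, 1, 2], [3, 4, 5]]
Y = [[1, 1, 1]]
print(dist.pairwise(X, Y))  # distance from each row of X to the row of Y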

calibration.CalibratedClassifierCV()

class sklearn.calibration.CalibratedClassifierCV(base_estimator=None, method='sigmoid', cv=3) [source] Probability calibration with isotonic regression or sigmoid. With this class, the base_estimator is fit on the train set of the cross-validation generator and the test set is used for calibration. The probabilities for each of the folds are then averaged for prediction. In case cv='prefit' is passed to __init__, it is assumed that base_estimator has been fitted already and all data is used for calibration.
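A minimal usage sketch under assumed settings: a GaussianNB base estimator calibrated with the default 3-fold CV on synthetic data:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)
clf = CalibratedClassifierCV(GaussianNB(), method='sigmoid', cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])  # calibrated class probabilities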

Gaussian Mixture Model Sine Curve

This example demonstrates the behavior of Gaussian mixture models fit on data that was not sampled from a mixture of Gaussian random variables. The dataset is formed by 100 points loosely spaced following a noisy sine curve. There is therefore no ground truth value for the number of Gaussian components. The first model is a classical Gaussian Mixture Model with 10 components fit with the Expectation-Maximization algorithm. The second model is a Bayesian Gaussian Mixture Model with a Dirichlet process prior fit with variational inference.
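A sketch of the setup described above; the noise level and the range of the curve are assumptions, not taken from the original example:

import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.RandomState(0)
t = 4 * np.pi * rng.rand(100)
# 100 points along a noisy sine curve
X = np.column_stack([t, np.sin(t) + 0.1 * rng.randn(100)])

# Classical GMM fit with EM
gmm = GaussianMixture(n_components=10, covariance_type='full').fit(X)
# Bayesian GMM with a Dirichlet process prior, fit with variational inference
bgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type='dirichlet_process').fit(X)

The Dirichlet process prior lets the Bayesian model effectively switch off unneeded components instead of using all 10.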

Model Complexity Influence

Demonstrate how model complexity influences both prediction accuracy and computational performance. The dataset is the Boston Housing dataset for regression and the 20 Newsgroups dataset for classification. For each class of models we vary the model complexity through the choice of relevant model parameters and measure the influence on both computational performance (latency) and predictive power (MSE or Hamming loss). print(__doc__) # Author: Eustache Diemert <eustache@diemert.fr> # License: BSD 3 clause
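A much simplified sketch of the measurement loop, not the full example: n_estimators is taken here as the assumed complexity knob for a gradient boosting regressor, and prediction latency and test MSE are recorded at each setting:

import time
from sklearn.datasets import load_boston
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_boston(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for n in (10, 50, 100, 200):
    model = GradientBoostingRegressor(n_estimators=n).fit(X_tr, y_tr)
    t0 = time.time()
    pred = model.predict(X_te)
    # complexity, prediction latency (s), test MSE
    print(n, time.time() - t0, mean_squared_error(y_te, pred))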

linear_model.PassiveAggressiveRegressor()

class sklearn.linear_model.PassiveAggressiveRegressor(C=1.0, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, loss='epsilon_insensitive', epsilon=0.1, random_state=None, warm_start=False) [source] Passive Aggressive Regressor. Read more in the User Guide. Parameters: C : float Maximum step size (regularization). Defaults to 1.0. epsilon : float If the difference between the current prediction and the correct label is below this threshold, the model is not updated. fit_intercept : bool Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
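A minimal usage sketch on synthetic regression data; the make_regression setup is an assumption for illustration:

from sklearn.datasets import make_regression
from sklearn.linear_model import PassiveAggressiveRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
reg = PassiveAggressiveRegressor(C=1.0, epsilon=0.1, random_state=0)
reg.fit(X, y)
print(reg.predict(X[:3]))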

Comparing different clustering algorithms on toy datasets

This example aims at showing characteristics of different clustering algorithms on datasets that are 'interesting' but still in 2D. The last dataset is an example of a 'null' situation for clustering: the data is homogeneous, and there is no good clustering. While these examples give some intuition about the algorithms, this intuition might not apply to very high dimensional data. The results could be improved by tweaking the parameters for each clustering strategy, for instance setting the number of clusters for the methods that need this parameter specified.
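A pared-down sketch of the comparison, assuming two toy datasets (noisy moons and a homogeneous 'null' blob) and two of the algorithms; the full example covers more of each:

import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

moons, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
null_X = np.random.RandomState(0).rand(200, 2)  # uniform: no good clustering

for data in (moons, null_X):
    # KMeans needs the number of clusters specified; DBSCAN does not
    print(KMeans(n_clusters=2).fit_predict(data)[:10])
    print(DBSCAN(eps=0.3).fit_predict(data)[:10])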

2.2. Manifold learning

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

2.2.1. Introduction

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.
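A minimal sketch of this idea using Isomap, one of the manifold learners in sklearn.manifold; the digits dataset and the n_neighbors setting are assumptions for illustration:

from sklearn.datasets import load_digits
from sklearn.manifold import Isomap

X, y = load_digits(return_X_y=True)  # 64-dimensional inputs
# Embed into 2D for plotting
X_2d = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X_2d.shape)  # (1797, 2)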

Working With Text Data

The goal of this guide is to explore some of the main scikit-learn tools on a single practical task: analysing a collection of text documents (newsgroups posts) on twenty different topics. In this section we will see how to:

- load the file contents and the categories
- extract feature vectors suitable for machine learning
- train a linear model to perform categorization
- use a grid search strategy to find a good configuration of both the feature extraction components and the classifier

A condensed sketch of such a pipeline follows. Tutorial setup
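The sketch below compresses those steps into one pipeline; the two categories and the searched parameters are assumptions, not the tutorial's exact choices:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

train = fetch_20newsgroups(subset='train',
                           categories=['sci.med', 'sci.space'])
pipe = Pipeline([('vect', TfidfVectorizer()),
                 ('clf', SGDClassifier(random_state=0))])
# Grid search over feature extraction and classifier parameters jointly
grid = GridSearchCV(pipe, {'vect__ngram_range': [(1, 1), (1, 2)],
                           'clf__alpha': [1e-3, 1e-4]}, cv=3)
grid.fit(train.data, train.target)
print(grid.best_params_)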

feature_extraction.FeatureHasher()

class sklearn.feature_extraction.FeatureHasher(n_features=1048576, input_type='dict', dtype=<class 'numpy.float64'>, non_negative=False) [source] Implements feature hashing, aka the hashing trick. This class turns sequences of symbolic feature names (strings) into scipy.sparse matrices, using a hash function to compute the matrix column corresponding to a name. The hash function employed is the signed 32-bit version of Murmurhash3. Feature names of type byte string are used as-is. Unicode strings are converted to UTF-8 first, but no Unicode normalization is done.
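A minimal usage sketch hashing dict-style feature names into a sparse matrix; the small n_features is an assumption chosen for readability (real use keeps it large to limit collisions):

from sklearn.feature_extraction import FeatureHasher

h = FeatureHasher(n_features=8, input_type='dict')
X = h.transform([{'dog': 1, 'cat': 2}, {'dog': 2, 'run': 5}])
print(X.toarray())  # each feature name hashed to one of 8 columns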