sklearn.isotonic.check_increasing(x, y) [source] Determine whether y is monotonically correlated with x. y is found increasing or decreasing with respect to x based on a Spearman correlation test. Parameters: x : array-like, shape=(n_samples,) Training data. y : array-like, shape=(n_samples,) Training target. Returns: `increasing_bool` : boolean Whether the relationship is increasing or decreasing. Notes The Spearman correlation coefficient is estimated from the data, and the sign
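
The decision rule described above can be sketched in plain Python. This is an illustrative re-implementation, not the library's code: the helper names `average_ranks`, `spearman_rho` and `looks_increasing` are hypothetical, and treating a coefficient of exactly zero as "increasing" is an assumption of this sketch.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if x or y is constant (undefined correlation).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def looks_increasing(x, y):
    """Assumed rule for this sketch: non-negative rho means increasing."""
    return spearman_rho(x, y) >= 0
```

For example, `looks_increasing([1, 2, 3, 4], [10, 8, 5, 1])` is false because every step in x comes with a drop in y.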

We show that linear_model.Lasso provides the same results for dense and sparse data and that in the case of sparse data the speed is improved.

    print(__doc__)

    from time import time
    from scipy import sparse
    from scipy import linalg
    from sklearn.datasets.samples_generator import make_regression
    from sklearn.linear_model import Lasso

The two Lasso implementations on Dense data

    print("--- Dense matrices")
    X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
    X_sp = sparse.coo_m

base.RegressorMixin

class sklearn.base.RegressorMixin [source] Mixin class for all regression estimators in scikit-learn. Methods score(X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction. __init__() x.__init__(...) initializes x; see help(type(x)) for signature score(X, y, sample_weight=None) [source] Returns the coefficient of determination R^2 of the prediction. The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred)
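
The (1 - u/v) definition can be checked by hand with a small plain-Python sketch. It is not the library's implementation: the function name `r2_score_sketch` is hypothetical, sample weights are ignored, and v is taken to be the total sum of squares of `y_true` around its mean, which is the standard definition but is cut off in the text above.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination computed as 1 - u/v (unweighted)."""
    mean_true = sum(y_true) / len(y_true)
    # u: sum of squared differences between targets and predictions.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: sum of squared differences between targets and their mean
    # (assumed definition; raises ZeroDivisionError if y_true is constant).
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2_score_sketch(y_true, y_pred), 3))
```

A perfect prediction gives 1.0, always predicting the mean of `y_true` gives 0.0, and a model that does worse than the mean gives a negative value.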

Plot randomly generated classification dataset

Plot several randomly generated 2D classification datasets. This example illustrates the datasets.make_classification, datasets.make_blobs and datasets.make_gaussian_quantiles functions. For make_classification, three binary and two multi-class classification datasets are generated, with different numbers of informative features and clusters per class.

    print(__doc__)

    import matplotlib.pyplot as plt

    from sklearn.datasets import make_classification
    from sklearn.datasets import make_blobs
    fr

1.13. Feature selection

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators' accuracy scores or to boost their performance on very high-dimensional datasets. 1.13.1. Removing features with low variance VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have th
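
The idea behind variance-based filtering can be sketched without the library. The helpers `column_variances` and `drop_low_variance` below are hypothetical names, and the sketch assumes the population variance (dividing by the number of samples) and a strict "greater than threshold" rule for keeping a column.

```python
def column_variances(rows):
    """Population variance of each column of a list-of-rows matrix."""
    n = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return variances


def drop_low_variance(rows, threshold=0.0):
    """Keep only the columns whose variance is strictly above threshold.

    Returns the reduced matrix and the indices of the kept columns.
    """
    variances = column_variances(rows)
    keep = [j for j, var in enumerate(variances) if var > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, keep


X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]
X_reduced, kept = drop_low_variance(X)
print(kept)       # indices of the columns that vary across samples
print(X_reduced)
```

With the default threshold of zero, the first and last columns are dropped because they take the same value in every sample.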

sklearn.covariance.empirical_covariance()

sklearn.covariance.empirical_covariance(X, assume_centered=False) [source] Computes the Maximum likelihood covariance estimator Parameters: X : ndarray, shape (n_samples, n_features) Data from which to compute the covariance estimate assume_centered : Boolean If True, data are not centered before computation. Useful when working with data whose mean is almost, but not exactly zero. If False, data are centered before computation. Returns: covariance : 2D ndarray, shape (n_features, n_
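
The effect of `assume_centered` can be seen in a plain-Python sketch of the maximum likelihood estimate. It is an illustration only, not the library's code: the name `empirical_covariance_sketch` is hypothetical, and the estimate is assumed to divide by n_samples rather than n_samples - 1.

```python
def empirical_covariance_sketch(X, assume_centered=False):
    """Maximum likelihood covariance of a list-of-rows matrix X.

    If assume_centered is True the column means are taken to be zero
    and the data are used as-is; otherwise the means are subtracted.
    """
    n = len(X)
    p = len(X[0])
    if assume_centered:
        means = [0.0] * p
    else:
        means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    # Entry (a, b) is the average product of centered columns a and b.
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(p)]
            for a in range(p)]


X = [[1.0, 2.0], [3.0, 6.0]]
print(empirical_covariance_sketch(X))                        # means removed
print(empirical_covariance_sketch(X, assume_centered=True))  # raw second moments
```

When the data mean is far from zero, as here, the two settings give very different matrices, which is why `assume_centered=True` is only appropriate for data that is already (nearly) centered.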

sklearn.datasets.load_linnerud()

sklearn.datasets.load_linnerud(return_X_y=False) [source] Load and return the linnerud dataset (multivariate regression). Samples total: 20 Dimensionality: 3 for both data and targets Features: integer Targets: integer Parameters: return_X_y : boolean, default=False. If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object. New in version 0.18. Returns: data : Bunch Dictionary-like object, the interesting attributes a

Online learning of a dictionary of parts of faces

This example uses a large dataset of faces to learn a set of 20 x 20 image patches that constitute faces. From the programming standpoint, it is interesting because it shows how to use the online API of scikit-learn to process a very large dataset in chunks. We proceed by loading one image at a time and randomly extracting 50 patches from it. Once we have accumulated 500 of these patches (using 10 images), we run the partial_fit method of the online KMeans object, MiniBat
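
The chunking scheme described above (50 random patches per image, one update per 500 accumulated patches) can be sketched independently of any estimator. Everything named here is hypothetical: `extract_patches` and `stream_in_chunks` are illustrative helpers, images are plain nested lists, and `on_chunk` is a stand-in callback for whatever incremental update (such as a `partial_fit` call) consumes each chunk.

```python
import random


def extract_patches(image, patch_size, n_patches, rng):
    """Cut n_patches random square patches out of a 2D list, flattened."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(n_patches):
        top = rng.randrange(height - patch_size + 1)
        left = rng.randrange(width - patch_size + 1)
        patch = [value
                 for row in image[top:top + patch_size]
                 for value in row[left:left + patch_size]]
        patches.append(patch)
    return patches


def stream_in_chunks(images, on_chunk, patches_per_image=50,
                     chunk_size=500, patch_size=20, seed=0):
    """Feed patches to on_chunk one full chunk at a time.

    Returns the patches left over after the last full chunk.
    """
    rng = random.Random(seed)
    buffer = []
    for image in images:
        buffer.extend(extract_patches(image, patch_size,
                                      patches_per_image, rng))
        if len(buffer) >= chunk_size:
            on_chunk(buffer)   # e.g. an incremental model update
            buffer = []        # start accumulating the next chunk
    return buffer


# 25 blank 32 x 32 "images" stand in for the face dataset.
images = [[[0] * 32 for _ in range(32)] for _ in range(25)]
chunk_sizes = []
leftover = stream_in_chunks(images, lambda chunk: chunk_sizes.append(len(chunk)))
print(chunk_sizes, len(leftover))
```

Only one chunk of patches is held in memory at a time, which is the point of processing the dataset this way.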

sklearn.metrics.f1_score()

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None) [source] Compute the F1 score, also known as balanced F-score or F-measure. The F1 score can be interpreted as a harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The relative contributions of precision and recall to the F1 score are equal. The formula for the F1 score is: F1 = 2 * (precision * recall) / (precision + recall)
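
The formula can be worked through by hand for the binary case with a short plain-Python sketch. The function name `f1_binary_sketch` is hypothetical, sample weights and the multi-class `average` options are not handled, and returning 0.0 when there are no true positives is an assumption of this sketch.

```python
def f1_binary_sketch(y_true, y_pred, pos_label=1):
    """Binary F1 = 2 * (precision * recall) / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        # Precision and/or recall is zero or undefined; assumed to score 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
# tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
print(f1_binary_sketch(y_true, y_pred))
```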

Plotting Validation Curves

In this plot you can see the training scores and validation scores of an SVM for different values of the kernel parameter gamma. For very low values of gamma, you can see that both the training score and the validation score are low. This is called underfitting. Medium values of gamma will result in high values for both scores, i.e. the classifier is performing fairly well. If gamma is too high, the classifier will overfit, which means that the training score is good but the validation score i