Comparison of kernel ridge and Gaussian process regression

Both kernel ridge regression (KRR) and Gaussian process regression (GPR) learn a target function by employing the "kernel trick" internally. KRR learns a linear function in the space induced by the respective kernel, which corresponds to a non-linear function in the original space. The linear function in the kernel space is chosen based on the mean-squared error loss with ridge regularization. GPR uses the kernel to define the covariance of a prior distribution over the target functions and uses the observed training data to define a likelihood function; the resulting (Gaussian) posterior distribution over target functions is used for prediction, with its mean serving as the point estimate.
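As a rough illustration of this difference (not part of the original text), the sketch below fits both estimators on the same noisy synthetic data; the kernel and regularization settings are illustrative choices, not recommendations:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.RandomState(0)
    X = 15 * rng.rand(100, 1)
    y = np.sin(X).ravel() + 0.5 * rng.randn(100)      # noisy sine target

    # KRR: squared-error loss with a ridge penalty in the kernel-induced space
    krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X, y)

    # GPR: the kernel defines the covariance of the prior over functions;
    # the posterior mean is the prediction, and a standard deviation comes for free
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel()).fit(X, y)

    X_plot = np.linspace(0, 15, 200)[:, np.newaxis]
    y_krr = krr.predict(X_plot)
    y_gpr, y_std = gpr.predict(X_plot, return_std=True)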

kernel_approximation.RBFSampler()

class sklearn.kernel_approximation.RBFSampler(gamma=1.0, n_components=100, random_state=None)

Approximates the feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform. It implements a variant of Random Kitchen Sinks [1]. Read more in the User Guide.

Parameters:
gamma : float
    Parameter of the RBF kernel: exp(-gamma * x^2).
n_components : int
    Number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.
random_state : int or RandomState, optional
    Seed or generator used to draw the random weights and random offset.
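A short usage sketch, pairing the transformer with a linear model as is typical; the tiny dataset and the SGDClassifier settings are purely illustrative:

    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import SGDClassifier

    X = [[0, 0], [1, 1], [1, 0], [0, 1]]
    y = [0, 0, 1, 1]

    rbf_feature = RBFSampler(gamma=1.0, random_state=1)
    X_features = rbf_feature.fit_transform(X)   # shape (4, 100): random Fourier features
    clf = SGDClassifier(max_iter=5)             # linear model trained on the new features
    clf.fit(X_features, y)
    print(clf.score(X_features, y))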

tree.DecisionTreeRegressor()

class sklearn.tree.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_split=1e-07, presort=False)

A decision tree regressor. Read more in the User Guide.

Parameters:
criterion : string, optional (default='mse')
    The function to measure the quality of a split. Supported criteria are 'mse' for the mean squared error, which is equal to variance reduction as feature selection criterion.
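A minimal fit-and-predict sketch on synthetic data; the max_depth value is an arbitrary choice to keep the tree small:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = np.sort(5 * rng.rand(80, 1), axis=0)
    y = np.sin(X).ravel()
    y[::5] += 3 * (0.5 - rng.rand(16))        # add noise to every 5th target

    reg = DecisionTreeRegressor(max_depth=3)  # shallow tree to limit overfitting
    reg.fit(X, y)
    print(reg.predict([[2.5]]))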

sklearn.cluster.ward_tree()

sklearn.cluster.ward_tree(X, connectivity=None, n_clusters=None, return_distance=False)

Ward clustering based on a feature matrix. Recursively merges the pair of clusters that minimally increases within-cluster variance. The inertia matrix uses a heapq-based representation. This is the structured version, which takes into account some topological structure between samples. Read more in the User Guide.

Parameters:
X : array, shape (n_samples, n_features)
    Feature matrix representing n_samples samples to be clustered.
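A small sketch of the structured call, assuming the import path of the scikit-learn version quoted here; a k-nearest-neighbors graph is just one way to supply the connectivity structure:

    import numpy as np
    from sklearn.cluster import ward_tree
    from sklearn.neighbors import kneighbors_graph

    X = np.random.RandomState(0).rand(20, 3)
    # connectivity graph restricting which samples may be merged
    connectivity = kneighbors_graph(X, n_neighbors=5, include_self=False)

    children, n_components, n_leaves, parents = ward_tree(X, connectivity=connectivity)
    print(children[:5])   # each row gives the two nodes merged at that step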

sklearn.feature_selection.f_regression()

sklearn.feature_selection.f_regression(X, y, center=True)

Univariate linear regression tests. Quick linear model for testing the effect of a single regressor, sequentially for many regressors. This is done in 2 steps:

1. The cross correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) * std(y)).
2. It is converted to an F score and then to a p-value.

Read more in the User Guide.

Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
    The set of regressors that will be tested sequentially.
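A small sketch on synthetic data in which only the first column is related to the target; the shapes and noise level are arbitrary:

    import numpy as np
    from sklearn.feature_selection import f_regression

    rng = np.random.RandomState(0)
    X = rng.rand(100, 3)
    y = 2 * X[:, 0] + 0.1 * rng.randn(100)   # only feature 0 is informative

    F, pval = f_regression(X, y)
    print(F)      # feature 0 should receive by far the largest F score
    print(pval)   # and the smallest p-value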

Compare cross decomposition methods

Simple usage of various cross decomposition algorithms:

- PLSCanonical
- PLSRegression, with multivariate response, a.k.a. PLS2
- PLSRegression, with univariate response, a.k.a. PLS1
- CCA

Given two multivariate, covarying two-dimensional datasets, X and Y, PLS extracts the "directions of covariance", i.e. the components of each dataset that explain the most shared variance between both datasets. This is apparent on the scatterplot matrix display: components 1 in dataset X and dataset Y are maximally correlated.
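A condensed sketch loosely following the idea of that example: two views X and Y share latent variables, and PLSCanonical recovers correlated components (dimensions and noise levels are arbitrary):

    import numpy as np
    from sklearn.cross_decomposition import PLSCanonical

    rng = np.random.RandomState(0)
    n = 500
    latents = rng.normal(size=(n, 2))          # shared latent structure
    X = latents + rng.normal(size=(n, 2))      # view 1 = latents + noise
    Y = latents + rng.normal(size=(n, 2))      # view 2 = latents + noise

    plsca = PLSCanonical(n_components=2)
    plsca.fit(X, Y)
    X_scores, Y_scores = plsca.transform(X, Y)
    # component 1 of X_scores and Y_scores should be strongly correlated
    print(np.corrcoef(X_scores[:, 0], Y_scores[:, 0])[0, 1])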

sklearn.datasets.fetch_20newsgroups()

sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True)

Load the filenames and data from the 20 newsgroups dataset. Read more in the User Guide.

Parameters:
subset : 'train' or 'test', 'all', optional
    Select the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both, with shuffled ordering.
data_home : optional, default: None
    Specify a download and cache folder for the datasets. If None, all scikit-learn data is stored in the default '~/scikit_learn_data' subfolders.
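A minimal fetching sketch; the two category names are examples from the dataset, and the first call downloads the data to the cache folder:

    from sklearn.datasets import fetch_20newsgroups

    categories = ['sci.space', 'rec.autos']          # any subset of the 20 group names
    train = fetch_20newsgroups(subset='train', categories=categories,
                               remove=('headers', 'footers', 'quotes'))

    print(train.target_names)
    print(len(train.data), 'documents')
    print(train.data[0][:200])                       # raw text of the first post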

3.2. Tuning the hyper-parameters of an estimator

Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator classes. Typical examples include C, kernel and gamma for Support Vector Classifier, alpha for Lasso, etc. It is possible and recommended to search the hyper-parameter space for the best cross-validation score (see Cross-validation: evaluating estimator performance). Any parameter provided when constructing an estimator may be optimized in this manner. Specifically, to find the names and current values of all parameters for a given estimator, use estimator.get_params().
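As a concrete sketch of such a search (the grid values and the SVC/iris pairing are arbitrary choices, not part of the original text):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # any constructor argument of SVC could appear in this grid
    param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1], 'kernel': ['rbf']}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)

    print(search.best_params_)   # hyper-parameters with the best cross-validation score
    print(search.best_score_)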

neural_network.BernoulliRBM()

class sklearn.neural_network.BernoulliRBM(n_components=256, learning_rate=0.1, batch_size=10, n_iter=10, verbose=0, random_state=None)

Bernoulli Restricted Boltzmann Machine (RBM). A Restricted Boltzmann Machine with binary visible units and binary hidden units. Parameters are estimated using Stochastic Maximum Likelihood (SML), also known as Persistent Contrastive Divergence (PCD) [2]. The time complexity of this implementation is O(d ** 2), assuming d ~ n_features ~ n_components.
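A toy sketch of fitting the RBM and reading out hidden-unit activations; the binary data and the small n_components are illustrative only:

    import numpy as np
    from sklearn.neural_network import BernoulliRBM

    X = np.array([[0, 0, 0],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])

    model = BernoulliRBM(n_components=2, learning_rate=0.05, n_iter=20, random_state=0)
    hidden = model.fit_transform(X)   # P(h_j = 1 | v) for each sample
    print(hidden.shape)               # (4, 2)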

preprocessing.PolynomialFeatures()

class sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)

Generate polynomial and interaction features. Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].

Parameters:
degree : integer
    The degree of the polynomial features. Default = 2.
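A minimal transform sketch matching the [a, b] example above (the 3x2 input matrix is arbitrary):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.arange(6).reshape(3, 2)      # rows of the form [a, b]
    poly = PolynomialFeatures(degree=2)
    print(poly.fit_transform(X))        # columns: 1, a, b, a^2, a*b, b^2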