sklearn.neighbors.kneighbors_graph()

sklearn.neighbors.kneighbors_graph(X, n_neighbors, mode='connectivity', metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=1) [source]

Computes the (weighted) graph of k-Neighbors for points in X. Read more in the User Guide.

Parameters:

X : array-like or BallTree, shape = [n_samples, n_features]
    Sample data, in the form of a numpy array or a precomputed BallTree.
n_neighbors : int
    Number of neighbors for each sample.
mode : {'connectivity', 'distance'}, optional
    Type of returned matrix: 'connectivity' returns the connectivity matrix with ones and zeros, and 'distance' returns the distances between neighbors according to the given metric.
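As a quick illustration of the two modes, a minimal sketch on a made-up toy array (the data and the n_neighbors value are purely illustrative):

import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0, 0.0], [1.0, 0.1], [5.0, 5.0], [5.1, 4.9]])

# 'connectivity' mode: a sparse matrix with a 1 for each point's nearest neighbor
A = kneighbors_graph(X, n_neighbors=1, mode='connectivity', include_self=False)
print(A.toarray())

# 'distance' mode: the same sparsity pattern, but storing metric distances instead of ones
D = kneighbors_graph(X, n_neighbors=1, mode='distance')
print(D.toarray())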

Robust Scaling on Toy Data

Making sure that each feature has approximately the same scale can be a crucial preprocessing step. However, when data contains outliers, StandardScaler can often be misled. In such cases, it is better to use a scaler that is robust against outliers. Here, we demonstrate this on a toy dataset, where one single datapoint is a large outlier.

Out:

Testset accuracy using standard scaler: 0.545
Testset accuracy using robust scaler: 0.705

from __future__ import print_function
print(__doc__)
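The example's full code is cut off above; here is a minimal sketch of the core comparison, assuming a synthetic dataset with one injected outlier (not the example's exact data or pipeline):

import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
X[0] = [100.0, 100.0]  # a single large outlier

# StandardScaler centers with the mean and scales by the standard deviation,
# both of which are dragged towards the outlier.
X_std = StandardScaler().fit_transform(X)

# RobustScaler centers with the median and scales by the interquartile range,
# so the bulk of the data keeps a sensible scale.
X_rob = RobustScaler().fit_transform(X)

print("inlier spread after StandardScaler:", X_std[1:].std(axis=0))
print("inlier spread after RobustScaler: ", X_rob[1:].std(axis=0))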

Lasso and Elastic Net for Sparse Signals

Estimates Lasso and Elastic-Net regression models on a manually generated sparse signal corrupted with additive noise. Estimated coefficients are compared with the ground-truth.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import r2_score

# generate some sparse data to play with
np.random.seed(42)

n_samples, n_features = 50, 200
X = np.random.randn(n_samples, n_features)
coef = 3 * np.random.randn(n_features)
inds = np.arange(n_features)
np.random.shuffle(inds)
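The listing above is truncated; a minimal, self-contained sketch of how the comparison typically proceeds (the sparsification, train/test split and parameter values here are illustrative, not the example's exact choices):

import numpy as np
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.metrics import r2_score

np.random.seed(42)
n_samples, n_features = 50, 200
X = np.random.randn(n_samples, n_features)
coef = 3 * np.random.randn(n_features)
coef[10:] = 0  # sparsify: keep only 10 informative coefficients
y = np.dot(X, coef) + 0.01 * np.random.normal(size=n_samples)

# split the data in halves for training and testing
X_train, y_train = X[:n_samples // 2], y[:n_samples // 2]
X_test, y_test = X[n_samples // 2:], y[n_samples // 2:]

lasso = Lasso(alpha=0.1).fit(X_train, y_train)
enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X_train, y_train)

print("Lasso r^2 on test data:", r2_score(y_test, lasso.predict(X_test)))
print("ElasticNet r^2 on test data:", r2_score(y_test, enet.predict(X_test)))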

sklearn.cross_validation.cross_val_score()

Warning: DEPRECATED

sklearn.cross_validation.cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs') [source]

Evaluate a score by cross-validation.

Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.cross_val_score instead. Read more in the User Guide.

Parameters:

estimator : estimator object implementing 'fit'
    The object to use to fit the data.
X : array-like
    The data to fit.
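A minimal sketch of the recommended replacement; the deprecated function follows the same call pattern (the iris dataset and classifier are just for illustration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression()

# five-fold cross-validated accuracy scores and their mean
scores = cross_val_score(clf, X, y, cv=5)
print(scores, scores.mean())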

sklearn.ensemble.partial_dependence.partial_dependence()

sklearn.ensemble.partial_dependence.partial_dependence(gbrt, target_variables, grid=None, X=None, percentiles=(0.05, 0.95), grid_resolution=100) [source]

Partial dependence of target_variables.

Partial dependence plots show the dependence between the joint values of the target_variables and the function represented by the gbrt. Read more in the User Guide.

Parameters:

gbrt : BaseGradientBoosting
    A fitted gradient boosting model.
target_variables : array-like, dtype=int
    The target features for which the partial dependence should be computed.
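A minimal sketch of calling this function on a fitted gradient boosting model, assuming the module path shown in the signature above (the synthetic data is made up for illustration):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble.partial_dependence import partial_dependence

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1]  # the target depends on features 0 and 1 only

gbrt = GradientBoostingRegressor(n_estimators=50).fit(X, y)

# partial dependence of the model's prediction on feature 0, evaluated on a
# grid derived from the training data
pdp, axes = partial_dependence(gbrt, target_variables=[0], X=X, grid_resolution=20)
print(pdp.shape, axes[0].shape)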

Single estimator versus bagging

This example illustrates and compares the bias-variance decomposition of the expected mean squared error of a single estimator against a bagging ensemble. In regression, the expected mean squared error of an estimator can be decomposed in terms of bias, variance and noise. On average over datasets of the regression problem, the bias term measures the average amount by which the predictions of the estimator differ from the predictions of the best possible estimator for the problem (i.e., the Bayes model).
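The full example estimates each term of the decomposition explicitly; a minimal sketch of the headline comparison, using a cross-validated MSE as a stand-in (the data and estimator settings are illustrative):

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.2 * rng.randn(200)

single_tree = DecisionTreeRegressor()
bagged_trees = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50)

# averaging over bootstrap replicates mainly shrinks the variance term of the
# decomposition, which typically shows up as a lower expected MSE
for name, est in [("single tree", single_tree), ("bagging", bagged_trees)]:
    mse = -cross_val_score(est, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(name, mse)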

sklearn.cluster.k_means()

sklearn.cluster.k_means(X, n_clusters, init='k-means++', precompute_distances='auto', n_init=10, max_iter=300, verbose=False, tol=0.0001, random_state=None, copy_x=True, n_jobs=1, algorithm='auto', return_n_iter=False) [source]

K-means clustering algorithm. Read more in the User Guide.

Parameters:

X : array-like or sparse matrix, shape (n_samples, n_features)
    The observations to cluster.
n_clusters : int
    The number of clusters to form as well as the number of centroids to generate.
max_iter : int, optional, default 300
    Maximum number of iterations of the k-means algorithm to run.
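A minimal sketch of this functional interface (as opposed to the KMeans estimator class); the blobs are generated just for illustration:

from sklearn.cluster import k_means
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# returns the cluster centers, the label assigned to each observation,
# and the final inertia (sum of squared distances to the closest center)
centroids, labels, inertia = k_means(X, n_clusters=3, random_state=0)
print(centroids.shape, labels[:10], inertia)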

Illustration of Gaussian process classification on the XOR dataset

This example illustrates GPC on XOR data. Compared are a stationary, isotropic kernel (RBF) and a non-stationary kernel (DotProduct). On this particular dataset, the DotProduct kernel obtains considerably better results because the class-boundaries are linear and coincide with the coordinate axes. In general, stationary kernels often obtain better results.

print(__doc__)

# Authors: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
#
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
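The listing above is truncated; a minimal sketch of fitting GPC with the two kernels being compared (the XOR data construction mirrors the example, the rest is illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)  # XOR of the coordinate signs

for kernel in (1.0 * RBF(length_scale=1.0), 1.0 * DotProduct(sigma_0=1.0) ** 2):
    clf = GaussianProcessClassifier(kernel=kernel).fit(X, y)
    # the log-marginal-likelihood of the fitted kernel is one way to compare them
    print(kernel, clf.log_marginal_likelihood(clf.kernel_.theta))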

2.4. Biclustering

Biclustering can be performed with the module sklearn.cluster.bicluster. Biclustering algorithms simultaneously cluster rows and columns of a data matrix. These clusters of rows and columns are known as biclusters. Each determines a submatrix of the original data matrix with some desired properties.

For instance, given a matrix of shape (10, 10), one possible bicluster with three rows and two columns induces a submatrix of shape (3, 2):

>>> import numpy as np
>>> data = np.arange(100).reshape(10, 10)
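Continuing the snippet, a sketch of how such a submatrix can be selected with numpy indexing (the particular row and column indices are just an illustration):

>>> rows = np.array([0, 2, 3])[:, np.newaxis]  # three rows of the bicluster
>>> columns = np.array([1, 2])                 # two columns of the bicluster
>>> data[rows, columns]                        # the induced (3, 2) submatrix
array([[ 1,  2],
       [21, 22],
       [31, 32]])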

model_selection.KFold()

class sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None) [source]

K-Folds cross-validator.

Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation while the k - 1 remaining folds form the training set. Read more in the User Guide.

Parameters:

n_splits : int, default=3
    Number of folds. Must be at least 2.
shuffle : boolean, optional
    Whether to shuffle the data before splitting into batches.
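A minimal sketch of iterating over the splits; the tiny arrays are made up for illustration:

import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 0, 1, 1])

kf = KFold(n_splits=2)

# each fold is used once as the test set while the remaining folds train
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]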