sklearn.gaussian_process.kernels.CompoundKernel()

class sklearn.gaussian_process.kernels.CompoundKernel(kernels) [source]

Kernel which is composed of a set of other kernels. New in version 0.18.

Methods:
- clone_with_theta(theta): Returns a clone of self with given hyperparameters theta.
- diag(X): Returns the diagonal of the kernel k(X, X).
- get_params([deep]): Get parameters of this kernel.
- is_stationary(): Returns whether the kernel is stationary.
- set_params(**params): Set the parameters of this kernel.

__init__(kernels) [source]
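
A minimal sketch of building a CompoundKernel; the RBF and WhiteKernel components below are arbitrary illustrative choices:

```python
from sklearn.gaussian_process.kernels import CompoundKernel, RBF, WhiteKernel

# compose two kernels; any list of Kernel objects works
kernel = CompoundKernel([RBF(length_scale=1.0), WhiteKernel(noise_level=0.5)])
print(kernel.is_stationary())  # True: both component kernels are stationary
print(kernel.theta)            # log-hyperparameters of all components, concatenated
```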

sklearn.neighbors.radius_neighbors_graph()

sklearn.neighbors.radius_neighbors_graph(X, radius, mode='connectivity', metric='minkowski', p=2, metric_params=None, include_self=False, n_jobs=1) [source]

Computes the (weighted) graph of neighbors for points in X. Neighborhoods are restricted to points at a distance lower than radius. Read more in the User Guide.

Parameters:
- X : array-like or BallTree, shape = [n_samples, n_features]. Sample data, in the form of a numpy array or a precomputed BallTree.
- radius : float. Radius of neighborhoods.
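
A quick sketch with made-up 1-D points; the radius of 1.2 is chosen so that only the first two points are neighbors of each other:

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.array([[0.0], [1.0], [2.5]])
A = radius_neighbors_graph(X, radius=1.2, mode='connectivity', include_self=False)
print(A.toarray())
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 0.]]  only points 0 and 1 lie within the radius of each other
```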

sklearn.model_selection.TimeSeriesSplit()

class sklearn.model_selection.TimeSeriesSplit(n_splits=3) [source]

Time series cross-validator. Provides train/test indices to split time series data samples that are observed at fixed time intervals. In each split, the test indices must be higher than those of the previous split; shuffling is therefore inappropriate for this cross-validator. This cross-validation object is a variation of KFold: in the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set. Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
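
A short sketch on six ordered samples, showing the growing train sets:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(6).reshape(-1, 1)  # six observations in time order
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    print("train:", train_idx, "test:", test_idx)
# train: [0 1 2] test: [3]
# train: [0 1 2 3] test: [4]
# train: [0 1 2 3 4] test: [5]
```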

sklearn.linear_model.orthogonal_mp_gram()

sklearn.linear_model.orthogonal_mp_gram(Gram, Xy, n_nonzero_coefs=None, tol=None, norms_squared=None, copy_Gram=True, copy_Xy=True, return_path=False, return_n_iter=False) [source]

Gram Orthogonal Matching Pursuit (OMP). Solves n_targets Orthogonal Matching Pursuit problems using only the Gram matrix X.T * X and the product X.T * y. Read more in the User Guide.

Parameters:
- Gram : array, shape (n_features, n_features). Gram matrix of the input data: X.T * X.
- Xy : array, shape (n_features,) or (n_features, n_targets). Input targets multiplied by X: X.T * y.
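
A sketch with synthetic data, assuming a noiseless target built from two nonzero coefficients, so OMP should recover their indices exactly:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp_gram

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
true_coef = np.zeros(10)
true_coef[[2, 7]] = [1.5, -2.0]   # two nonzero coefficients
y = X @ true_coef

gram = X.T @ X                    # precomputed Gram matrix
Xy = X.T @ y                      # precomputed X.T * y
coef = orthogonal_mp_gram(gram, Xy, n_nonzero_coefs=2)
print(np.nonzero(coef)[0])        # expected: [2 7]
```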

The Johnson-Lindenstrauss bound for embedding with random projections

The Johnson-Lindenstrauss lemma states that any high-dimensional dataset can be randomly projected into a lower-dimensional Euclidean space while controlling the distortion in the pairwise distances.

Theoretical bounds

The distortion introduced by a random projection p is asserted by the fact that p defines an eps-embedding with good probability:

(1 - eps) ||u - v||^2 < ||p(u) - p(v)||^2 < (1 + eps) ||u - v||^2

where u and v are any rows taken from a dataset of shape [n_samples, n_features] and p is a projection by a random Gaussian N(0, 1) matrix of shape [n_components, n_features] (or a sparse Achlioptas matrix).
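
A quick sketch using johnson_lindenstrauss_min_dim, which inverts this bound to give the minimum number of components that guarantees the eps-embedding:

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# minimum safe n_components for 10000 samples at distortion eps = 0.1
print(johnson_lindenstrauss_min_dim(n_samples=10000, eps=0.1))  # 7894
```

Note that the bound depends only on n_samples and eps, not on the original n_features.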

sklearn.datasets.load_digits()

sklearn.datasets.load_digits(n_class=10, return_X_y=False) [source]

Load and return the digits dataset (classification). Each datapoint is an 8x8 image of a digit.

- Classes: 10
- Samples per class: ~180
- Samples total: 1797
- Dimensionality: 64
- Features: integers 0-16

Read more in the User Guide.

Parameters:
- n_class : integer, between 0 and 10, optional (default=10). The number of classes to return.
- return_X_y : boolean, default=False. If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target object.
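
A minimal sketch of both return forms:

```python
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)    # (1797, 64): flattened 8x8 images
print(digits.target.shape)  # (1797,)

X, y = load_digits(return_X_y=True)  # tuple form instead of a Bunch
```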

sklearn.metrics.log_loss()

sklearn.metrics.log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None, labels=None) [source]

Log loss, aka logistic loss or cross-entropy loss. This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions. The log loss is only defined for two or more labels. For a single sample with true label yt in {0, 1} and estimated probability yp that yt = 1, the log loss is:

-log P(yt | yp) = -(yt log(yp) + (1 - yt) log(1 - yp))
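
A hand-checked sketch with made-up probabilities, applying the formula above to each sample and averaging:

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1]
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]  # P(class 0), P(class 1) per sample
print(log_loss(y_true, y_pred))
# -(log(0.9) + log(0.8) + log(0.7)) / 3 ≈ 0.2284
```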

1.12. Multiclass and multilabel algorithms

Warning: All classifiers in scikit-learn do multiclass classification out-of-the-box. You don't need to use the sklearn.multiclass module unless you want to experiment with different multiclass strategies.

The sklearn.multiclass module implements meta-estimators that solve multiclass and multilabel classification problems by decomposing them into binary classification problems. Multitarget regression is also supported. Multiclass classification means a classification task with more than two classes, e.g. classifying a set of images of fruits which may be oranges, apples, or pears.
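
A minimal one-vs-rest sketch, using iris as a convenient multiclass dataset:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)
print(clf.predict(X[:3]))  # fits one binary classifier per class internally
```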

sklearn.datasets.load_files()

sklearn.datasets.load_files(container_path, description=None, categories=None, load_content=True, shuffle=True, encoding=None, decode_error='strict', random_state=0) [source]

Load text files with categories as subfolder names. Individual samples are assumed to be files stored in a two-level folder structure such as the following:

container_folder/
    category_1_folder/
        file_1.txt file_2.txt ... file_42.txt
    category_2_folder/
        file_43.txt file_44.txt ...

The folder names are used as supervised signal label names.
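
A sketch assuming a hypothetical txt_sentiment/ folder laid out as above:

```python
from sklearn.datasets import load_files

# 'txt_sentiment/' is a made-up container_path for illustration
dataset = load_files('txt_sentiment/', encoding='utf-8')
print(dataset.target_names)  # the category subfolder names
print(len(dataset.data))     # one decoded string per file
```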

sklearn.cluster.k_means()

sklearn.cluster.k_means(X, n_clusters, init='k-means++', precompute_distances='auto', n_init=10, max_iter=300, verbose=False, tol=0.0001, random_state=None, copy_x=True, n_jobs=1, algorithm='auto', return_n_iter=False) [source]

K-means clustering algorithm. Read more in the User Guide.

Parameters:
- X : array-like or sparse matrix, shape (n_samples, n_features). The observations to cluster.
- n_clusters : int. The number of clusters to form as well as the number of centroids to generate.
- max_iter : int, optional (default=300). Maximum number of iterations of the k-means algorithm to run.
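
A minimal sketch on made-up 2-D points with two obvious clusters; the function returns centroids, labels, and the final inertia:

```python
import numpy as np
from sklearn.cluster import k_means

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
centroids, labels, inertia = k_means(X, n_clusters=2, random_state=0)
print(labels)     # e.g. [0 0 1 1] (cluster ids may be swapped)
print(centroids)  # approximately [[0, 0.5], [10, 10.5]]
```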