feature_extraction.text.TfidfVectorizer()

class sklearn.feature_extraction.text.TfidfVectorizer(input=u'content', encoding=u'utf-8', decode_error=u'strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer=u'word', stop_words=None, token_pattern=u'(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm=u'l2', use_idf=True, smooth_idf=True, sublinear_tf=False)
Convert a collection of raw documents to a matrix of TF-IDF features.
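A minimal usage sketch (the corpus below is illustrative, not from the documentation):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are common pets",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 1), min_df=1, norm='l2')
X = vectorizer.fit_transform(corpus)       # sparse matrix, shape (n_documents, n_features)
print(vectorizer.get_feature_names())      # vocabulary terms, one per column
print(X.shape)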

gaussian_process.kernels.PairwiseKernel()

class sklearn.gaussian_process.kernels.PairwiseKernel(gamma=1.0, gamma_bounds=(1e-05, 100000.0), metric='linear', pairwise_kernels_kwargs=None)
Wrapper for kernels in sklearn.metrics.pairwise: a thin wrapper around the functionality of the kernels in sklearn.metrics.pairwise. Note: Evaluation of eval_gradient is not analytic but numeric, and all kernels support only isotropic distances. The parameter gamma is considered to be a hyperparameter and may be optimized. The other kernel parameters are set directly at initialization and are kept fixed.
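A hedged sketch of wrapping a pairwise kernel inside a GaussianProcessRegressor (the toy data and metric choice are illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import PairwiseKernel

X = np.linspace(0, 5, 20)[:, np.newaxis]
y = np.sin(X).ravel()

kernel = PairwiseKernel(metric='rbf', gamma=1.0)     # gamma is the only tunable hyperparameter
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)
print(gpr.kernel_)                                   # gamma after (numeric-gradient) optimization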

Manifold learning on handwritten digits

An illustration of various embeddings on the digits dataset. The RandomTreesEmbedding, from the sklearn.ensemble module, is not technically a manifold embedding method, as it learns a high-dimensional representation on which we apply a dimensionality reduction method. However, it is often useful to cast a dataset into a representation in which the classes are linearly separable. t-SNE will be initialized with the embedding that is generated by PCA in this example, which is not the default setting.
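A minimal sketch of the PCA-initialized t-SNE embedding mentioned above (loading only a few digit classes, as the example does):

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits(n_class=6)
X, y = digits.data, digits.target

tsne = TSNE(n_components=2, init='pca', random_state=0)
X_tsne = tsne.fit_transform(X)     # shape (n_samples, 2), ready to scatter-plot coloured by y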

cross_decomposition.PLSSVD()

class sklearn.cross_decomposition.PLSSVD(n_components=2, scale=True, copy=True)
Partial Least Squares SVD. Simply performs an SVD on the cross-covariance matrix X'Y; there is no iterative deflation here. Read more in the User Guide.
Parameters:
n_components : int, default 2. Number of components to keep.
scale : boolean, default True. Whether to scale X and Y.
copy : boolean, default True. Whether to copy X and Y, or perform in-place computations.
Attributes:
x_weights_ : array,
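A short sketch on toy data (arrays are illustrative), projecting X and Y onto the singular vectors of the cross-covariance matrix:

import numpy as np
from sklearn.cross_decomposition import PLSSVD

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])

pls = PLSSVD(n_components=2).fit(X, Y)
X_scores, Y_scores = pls.transform(X, Y)   # scores in the latent space
print(pls.x_weights_.shape)                # (n_features, n_components)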

Illustration of prior and posterior Gaussian process for different kernels

This example illustrates the prior and posterior of a GPR with different kernels. Mean, standard deviation, and 10 samples are shown for both prior and posterior.

print(__doc__)

# Authors: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
#
# License: BSD 3 clause

import numpy as np
from matplotlib import pyplot as plt

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
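A condensed sketch of the same idea for a single RBF kernel (not the full example): draw samples from the GP prior, then condition on a few illustrative observations and sample the posterior.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X_plot = np.linspace(0, 5, 100)[:, np.newaxis]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

prior_samples = gp.sample_y(X_plot, n_samples=10)       # samples from the GP prior

X_train = np.array([[1.0], [3.0], [4.5]])                # illustrative observations
y_train = np.sin(X_train).ravel()
gp.fit(X_train, y_train)
posterior_samples = gp.sample_y(X_plot, n_samples=10)    # samples from the GP posterior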

Multilabel classification

This example simulates a multi-label document classification problem. The dataset is generated randomly based on the following process:
pick the number of labels: n ~ Poisson(n_labels)
n times, choose a class c: c ~ Multinomial(theta)
pick the document length: k ~ Poisson(length)
k times, choose a word: w ~ Multinomial(theta_c)
In the above process, rejection sampling is used to make sure that n is more than 2, and that the document length is never zero. Likewise, we reject classes which have already been chosen.
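A dataset following this kind of process can be generated with make_multilabel_classification; a minimal sketch (parameter values are illustrative):

from sklearn.datasets import make_multilabel_classification

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, n_labels=2,
                                      allow_unlabeled=False, random_state=1)
print(X.shape, Y.shape)   # (100, n_features) data matrix and (100, 3) binary label indicator matrix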

One-class SVM with non-linear kernel

An example using a one-class SVM for novelty detection. One-class SVM is an unsupervised algorithm that learns a decision function for novelty detection: classifying new data as similar or different to the training set.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
from sklearn import svm

xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
# Generate train data
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
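The fitting and prediction step that follows is roughly the sketch below (the nu and gamma values are illustrative):

clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)           # +1 for points deemed similar, -1 for novelties
n_error_train = (y_pred_train == -1).sum()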

Sparse coding with a precomputed dictionary

Transform a signal as a sparse combination of Ricker wavelets. This example visually compares different sparse coding methods using the sklearn.decomposition.SparseCoder estimator. The Ricker (also known as Mexican hat or the second derivative of a Gaussian) is not a particularly good kernel to represent piecewise constant signals like this one. It can therefore be seen how much adding atoms of different widths matters, which motivates learning the dictionary to best fit your type of signals.
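A minimal sketch of SparseCoder with a precomputed dictionary (the random dictionary here stands in for the Ricker-wavelet dictionary built in the example):

import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)
D = rng.randn(50, 100)                                  # 50 atoms, each of length 100
D /= np.sqrt(np.sum(D ** 2, axis=1))[:, np.newaxis]     # normalize each atom

signal = rng.randn(1, 100)
coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)
code = coder.transform(signal)                          # shape (1, 50): sparse combination weights
reconstruction = np.dot(code, D)                        # back in signal space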

cross_validation.LeaveOneOut()

Warning: DEPRECATED
class sklearn.cross_validation.LeaveOneOut(n)
Leave-One-Out cross-validation iterator. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.LeaveOneOut instead. Provides train/test indices to split data in train/test sets. Each sample is used once as a test set (singleton) while the remaining samples form the training set. Note: LeaveOneOut(n) is equivalent to KFold(n, n_folds=n) and LeavePOut(n, p=1). Due to the high number of test sets (which is the same as the number of samples), this cross-validation method can be very costly.
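A short sketch using the recommended replacement, sklearn.model_selection.LeaveOneOut, which no longer takes n in the constructor (the data are illustrative):

import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])

loo = LeaveOneOut()
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]   # each iteration holds out a single sample
    y_train, y_test = y[train_index], y[test_index]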

Hyper-parameters of Approximate Nearest Neighbors

This example demonstrates the behaviour of the accuracy of the nearest neighbor queries of Locality Sensitive Hashing Forest as the number of candidates and the number of estimators (trees) vary. In the first plot, accuracy is measured against the number of candidates. Here, the term "number of candidates" refers to the maximum bound for the number of distinct points retrieved from each tree to calculate the distances. Nearest neighbors are selected from this pool of candidates. Number of estimators
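A hedged sketch of the two hyper-parameters discussed above, n_candidates and n_estimators, on random data (LSHForest lives in sklearn.neighbors in the releases this page documents; it is deprecated in later versions):

import numpy as np
from sklearn.neighbors import LSHForest

rng = np.random.RandomState(42)
X_train = rng.rand(1000, 10)
X_query = rng.rand(5, 10)

lshf = LSHForest(n_estimators=10, n_candidates=50, random_state=42).fit(X_train)
distances, indices = lshf.kneighbors(X_query, n_neighbors=3)   # approximate nearest neighbors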