SVM Margins Example

The plots below illustrate the effect the parameter C has on the separation line. A large value of C tells the model that we do not have much faith in the data's distribution, and to consider only points close to the line of separation. A small value of C includes more (or all) of the observations, allowing the margins to be calculated using all the data in the area.

    print(__doc__)

    # Code source: Gaël Varoquaux
    # Modified for documentation by Jaques Grobler
    # License: BSD 3 clause
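A minimal sketch of the comparison the example draws (illustrative code, not the example's own): fit sklearn.svm.SVC with a linear kernel at a large and a small C on toy data and report the resulting margin width.

    # Illustrative sketch: smaller C -> stronger regularization -> smaller
    # ||w|| -> wider margin computed from more of the data.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = np.r_[rng.randn(20, 2) - [2, 2], rng.randn(20, 2) + [2, 2]]
    y = np.array([0] * 20 + [1] * 20)

    for C in (1.0, 0.05):
        clf = SVC(kernel='linear', C=C).fit(X, y)
        w = clf.coef_[0]
        # margin width for a linear SVM is 2 / ||w||
        print('C=%g  margin=%.3f  n_support=%d'
              % (C, 2 / np.linalg.norm(w), len(clf.support_)))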

decomposition.FastICA()

class sklearn.decomposition.FastICA(n_components=None, algorithm='parallel', whiten=True, fun='logcosh', fun_args=None, max_iter=200, tol=0.0001, w_init=None, random_state=None) [source]

FastICA: a fast algorithm for Independent Component Analysis. Read more in the User Guide.

Parameters:

n_components : int, optional
    Number of components to use. If None is passed, all are used.
algorithm : {'parallel', 'deflation'}
    Apply a parallel or deflational algorithm for FastICA.
whiten : boolean, optional
    If whiten is false, the data is already considered to be whitened, and no whitening is performed.
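A minimal usage sketch under the signature above (the synthetic sources are illustrative, not from the reference page): unmix two superimposed signals.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.RandomState(0)
    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # independent sources
    A = np.array([[1.0, 0.5], [0.5, 1.0]])            # mixing matrix
    X = S.dot(A.T)                                    # observed mixtures

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)  # estimated sources
    print(S_est.shape, ica.mixing_.shape)  # (2000, 2) (2, 2)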

linear_model.OrthogonalMatchingPursuitCV()

class sklearn.linear_model.OrthogonalMatchingPursuitCV(copy=True, fit_intercept=True, normalize=True, max_iter=None, cv=None, n_jobs=1, verbose=False) [source]

Cross-validated Orthogonal Matching Pursuit model (OMP).

Parameters:

copy : bool, optional
    Whether the design matrix X must be copied by the algorithm. A false value is only helpful if X is already Fortran-ordered; otherwise a copy is made anyway.
fit_intercept : boolean, optional
    Whether to calculate the intercept for this model.
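A minimal usage sketch (synthetic data for illustration): let cross-validation pick the number of non-zero coefficients on a sparse regression problem.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import OrthogonalMatchingPursuitCV

    X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                           noise=1.0, random_state=0)
    omp_cv = OrthogonalMatchingPursuitCV(cv=5).fit(X, y)
    print(omp_cv.n_nonzero_coefs_)  # number of non-zeros chosen by CV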

Out-of-core classification of text documents

This is an example showing how scikit-learn can be used for classification using an out-of-core approach: learning from data that doesn't fit into main memory. We make use of an online classifier, i.e., one that supports the partial_fit method, which will be fed with batches of examples. To guarantee that the feature space remains the same over time, we leverage a HashingVectorizer that will project each example into the same feature space. This is especially useful in the case of text classification, where new features (words) may appear in each batch.
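The pattern can be sketched as follows (a condensed illustration of the approach, not the example's full code; the two-batch stream is a stand-in for real data):

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vectorizer = HashingVectorizer(n_features=2 ** 18)  # stateless transform
    clf = SGDClassifier()
    all_classes = [0, 1]  # partial_fit needs the full label set up front

    batches = [(['good movie', 'awful plot'], [1, 0]),
               (['great acting', 'terrible pacing'], [1, 0])]
    for texts, labels in batches:
        X = vectorizer.transform(texts)  # same feature space every batch
        clf.partial_fit(X, labels, classes=all_classes)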

Cross-validation on diabetes Dataset Exercise

A tutorial exercise which uses cross-validation with linear models. This exercise is used in the Cross-validated estimators part of the Model selection: choosing estimators and their parameters section of A tutorial on statistical-learning for scientific data processing.

    from __future__ import print_function
    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt

    from sklearn import datasets
    from sklearn.linear_model import LassoCV
    from sklearn.linear_model import Lasso
    # the excerpt cuts off at "from sk"; the exercise's remaining imports
    # are presumably from sklearn.model_selection (KFold, cross_val_score)
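The core pattern the exercise teaches can be condensed as follows (an illustrative sketch, not the exercise's full code): score a Lasso model over a grid of alpha values with 3-fold cross-validation.

    import numpy as np
    from sklearn import datasets
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import cross_val_score

    diabetes = datasets.load_diabetes()
    X, y = diabetes.data[:150], diabetes.target[:150]

    alphas = np.logspace(-4, -0.5, 30)
    scores = [cross_val_score(Lasso(alpha=a), X, y, cv=3).mean()
              for a in alphas]
    print('best alpha: %g' % alphas[int(np.argmax(scores))])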

multiclass.OneVsOneClassifier()

class sklearn.multiclass.OneVsOneClassifier(estimator, n_jobs=1) [source]

One-vs-one multiclass strategy. This strategy consists in fitting one classifier per class pair. At prediction time, the class which received the most votes is selected. Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest, due to its O(n_classes^2) complexity. However, this method may be advantageous for algorithms such as kernel algorithms which don't scale well with n_samples, since each individual learning problem only involves a small subset of the data.
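A minimal usage sketch (iris is used for illustration): wrap a binary classifier and check the number of pairwise estimators.

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import LinearSVC

    iris = load_iris()
    clf = OneVsOneClassifier(LinearSVC(random_state=0))
    clf.fit(iris.data, iris.target)
    # 3 classes -> 3 * (3 - 1) / 2 = 3 pairwise classifiers
    print(len(clf.estimators_))  # 3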

decomposition.FactorAnalysis()

class sklearn.decomposition.FactorAnalysis(n_components=None, tol=0.01, copy=True, max_iter=1000, noise_variance_init=None, svd_method='randomized', iterated_power=3, random_state=0) [source]

Factor Analysis (FA). A simple linear generative model with Gaussian latent variables. The observations are assumed to be caused by a linear transformation of lower-dimensional latent factors and added Gaussian noise. Without loss of generality, the factors are distributed according to a Gaussian with zero mean and unit covariance.
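A minimal usage sketch (the generative setup is illustrative): recover a two-factor latent structure from noisy ten-dimensional observations.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.RandomState(0)
    latent = rng.randn(500, 2)                     # 2 latent factors
    W = rng.randn(2, 10)                           # loading matrix
    X = latent.dot(W) + 0.1 * rng.randn(500, 10)   # observations + noise

    fa = FactorAnalysis(n_components=2, random_state=0)
    Z = fa.fit_transform(X)          # estimated factors, shape (500, 2)
    print(Z.shape, fa.noise_variance_.shape)  # (500, 2) (10,)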

cross_validation.LabelShuffleSplit()

Warning: DEPRECATED

class sklearn.cross_validation.LabelShuffleSplit(labels, n_iter=5, test_size=0.2, train_size=None, random_state=None) [source]

Shuffle-Labels-Out cross-validation iterator.

Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.GroupShuffleSplit instead.

Provides randomized train/test indices to split data according to a third-party provided label. This label information can be used to encode arbitrary domain-specific stratifications of the samples as integers.
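Since the class is deprecated, a sketch of the recommended replacement, sklearn.model_selection.GroupShuffleSplit (the group ids are illustrative): samples sharing a group always land on the same side of a split.

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    X = np.arange(8).reshape(8, 1)
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    groups = [1, 1, 2, 2, 3, 3, 4, 4]  # e.g., subject or session ids

    gss = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
    for train_idx, test_idx in gss.split(X, y, groups=groups):
        print(train_idx, test_idx)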

sklearn.datasets.make_spd_matrix()

sklearn.datasets.make_spd_matrix(n_dim, random_state=None) [source]

Generate a random symmetric, positive-definite matrix. Read more in the User Guide.

Parameters:

n_dim : int
    The matrix dimension.
random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
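A minimal usage sketch: generate a 4x4 matrix and verify the two defining properties.

    import numpy as np
    from sklearn.datasets import make_spd_matrix

    M = make_spd_matrix(n_dim=4, random_state=0)
    print(np.allclose(M, M.T))                 # symmetric: True
    print(np.all(np.linalg.eigvalsh(M) > 0))   # positive-definite: True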

tree.ExtraTreeRegressor()

class sklearn.tree.ExtraTreeRegressor(criterion='mse', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', random_state=None, min_impurity_split=1e-07, max_leaf_nodes=None) [source]

An extremely randomized tree regressor. Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features, and the best split among those is chosen.
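A minimal usage sketch (synthetic data; in practice these trees are usually used inside ensembles such as ExtraTreesRegressor):

    from sklearn.datasets import make_regression
    from sklearn.tree import ExtraTreeRegressor

    X, y = make_regression(n_samples=200, n_features=10, noise=0.5,
                           random_state=0)
    tree = ExtraTreeRegressor(random_state=0).fit(X[:150], y[:150])
    print('held-out R^2: %.3f' % tree.score(X[150:], y[150:]))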