model_selection.GroupShuffleSplit()

class sklearn.model_selection.GroupShuffleSplit(n_splits=5, test_size=0.2, train_size=None, random_state=None) [source]

Shuffle-Group(s)-Out cross-validation iterator. Provides randomized train/test indices to split data according to a third-party provided group. This group information can be used to encode arbitrary domain-specific stratifications of the samples as integers. For instance, the groups could be the year of collection of the samples, allowing cross-validation against time-based splits.
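
A minimal sketch of the behaviour described above, using a toy dataset where each sample carries a "collection year" group; all samples sharing a group land entirely in either the train or the test split. The data values here are illustrative.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(16).reshape(8, 2)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])
groups = np.array([2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018])

gss = GroupShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
for train_idx, test_idx in gss.split(X, y, groups=groups):
    # Groups never straddle the train/test boundary.
    print("train groups:", sorted(set(groups[train_idx])),
          "test groups:", sorted(set(groups[test_idx])))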

Feature importances with forests of trees

This example shows the use of forests of trees to evaluate the importance of features on an artificial classification task. The red bars are the feature importances of the forest, along with their inter-tree variability. As expected, the plot suggests that 3 features are informative, while the remaining ones are not.

Out:
Feature ranking:
1. feature 1 (0.295902)
2. feature 2 (0.208351)
3. feature 0 (0.177632)
4. feature 3 (0.047121)
5. feature 6 (0.046303)
6. feature 8 (0.046013)
...
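
A condensed sketch of the same idea: fit a forest on a synthetic classification task with 3 informative features and rank the features by impurity-based importance. The make_classification parameters and the choice of ExtraTreesClassifier are illustrative, not necessarily the exact settings of the full example.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)

importances = forest.feature_importances_
# Per-tree spread gives the inter-tree variability shown as error bars in the plot.
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)
for rank, idx in enumerate(np.argsort(importances)[::-1], start=1):
    print(f"{rank}. feature {idx} ({importances[idx]:.6f})")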

sklearn.datasets.get_data_home()

sklearn.datasets.get_data_home(data_home=None) [source]

Return the path of the scikit-learn data dir. This folder is used by some large dataset loaders to avoid downloading the data several times. By default the data dir is set to a folder named 'scikit_learn_data' in the user home folder. Alternatively, it can be set by the 'SCIKIT_LEARN_DATA' environment variable or programmatically by giving an explicit folder path. The '~' symbol is expanded to the user home folder. If the folder does not already exist, it is automatically created.
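
A quick illustrative call showing where downloaded datasets are cached; the explicit path in the second call is hypothetical.

from sklearn.datasets import get_data_home

print(get_data_home())            # default: ~/scikit_learn_data, with ~ expanded
print(get_data_home("/tmp/sk"))   # hypothetical explicit path; created if missing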

linear_model.LassoCV()

class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1, positive=False, random_state=None, selection='cyclic') [source]

Lasso linear model with iterative fitting along a regularization path. The best model is selected by cross-validation. The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Read more in the User Guide.
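
A minimal sketch of the cross-validated fit: let LassoCV pick alpha on a toy regression problem. The dataset and cv value are illustrative.

from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=20, noise=1.0, random_state=0)
reg = LassoCV(cv=5).fit(X, y)
print("alpha chosen by cross-validation:", reg.alpha_)
print("R^2 on training data:", reg.score(X, y))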

sklearn.metrics.pairwise.sigmoid_kernel()

sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, gamma=None, coef0=1) [source]

Compute the sigmoid kernel between X and Y:

K(X, Y) = tanh(gamma <X, Y> + coef0)

Read more in the User Guide.

Parameters:
X : ndarray of shape (n_samples_1, n_features)
Y : ndarray of shape (n_samples_2, n_features)
gamma : float, default None. If None, defaults to 1.0 / n_features
coef0 : int, default 1

Returns:
Gram matrix : array of shape (n_samples_1, n_samples_2)
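
A small sketch computing the (n_samples_1, n_samples_2) Gram matrix on random data; the gamma and coef0 values are arbitrary.

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.RandomState(0)
X = rng.rand(4, 3)
Y = rng.rand(2, 3)

K = sigmoid_kernel(X, Y, gamma=0.5, coef0=1)   # K[i, j] = tanh(0.5 * <X[i], Y[j]> + 1)
print(K.shape)                                 # (4, 2)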

sklearn.pipeline.make_pipeline()

sklearn.pipeline.make_pipeline(*steps) [source]

Construct a Pipeline from the given estimators. This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

Returns:
p : Pipeline

Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.preprocessing import StandardScaler
>>> make_pipeline(StandardScaler(), GaussianNB())
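
A brief usage sketch, assuming the iris dataset, showing that each step is named after its lowercased class and that the resulting Pipeline can be fit and scored directly.

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), GaussianNB())
print([name for name, _ in pipe.steps])   # ['standardscaler', 'gaussiannb']
pipe.fit(X, y)
print("training accuracy:", pipe.score(X, y))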

Decision Tree Regression with AdaBoost

A decision tree is boosted using the AdaBoost.R2 [1] algorithm on a 1D sinusoidal dataset with a small amount of Gaussian noise. 299 boosts (300 decision trees) are compared with a single decision tree regressor. As the number of boosts is increased, the regressor can fit more detail.

[1] Drucker, "Improving Regressors using Boosting Techniques", 1997.

print(__doc__)
# Author: Noel Dawe <noel.dawe@gmail.com>
# License: BSD 3 clause
# importing necessary libraries
import numpy as np
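
A condensed sketch of the setup described above: a single decision tree versus an AdaBoost.R2 ensemble of 300 trees on a noisy 1D sine wave. Hyperparameters and data sizes are illustrative, not necessarily those of the full example.

import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)                  # 1D inputs
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)         # sinusoid + Gaussian noise

single_tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
boosted = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                            n_estimators=300, random_state=rng).fit(X, y)

# The boosted ensemble fits more of the fine structure than a single tree.
print("single tree R^2:", single_tree.score(X, y))
print("boosted ensemble R^2:", boosted.score(X, y))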

Concatenating multiple feature extraction methods

In many real-world examples, there are many ways to extract features from a dataset. Often it is beneficial to combine several methods to obtain good performance. This example shows how to use FeatureUnion to combine features obtained by PCA and univariate selection. Combining features using this transformer has the benefit that it allows cross-validation and grid searches over the whole process. The combination used in this example is not particularly helpful on this dataset and is only used to illustrate the usage of FeatureUnion.
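
A minimal sketch of the pattern, assuming the iris dataset: PCA components and univariate selection are concatenated with FeatureUnion inside a Pipeline, so a single grid search tunes the feature extraction and the classifier together. Parameter grids here are illustrative.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

combined = FeatureUnion([("pca", PCA(n_components=2)),
                         ("select", SelectKBest(k=1))])
pipe = Pipeline([("features", combined), ("svm", SVC(kernel="linear"))])

# Grid-search over both feature extractors and the classifier in one pass.
param_grid = {"features__pca__n_components": [1, 2, 3],
              "features__select__k": [1, 2],
              "svm__C": [0.1, 1, 10]}
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5).fit(X, y)
print(grid.best_params_)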

Blind source separation using FastICA

An example of estimating sources from noisy data. Independent component analysis (ICA) is used to estimate sources given noisy measurements. Imagine 3 instruments playing simultaneously and 3 microphones recording the mixed signals. ICA is used to recover the sources, i.e. what is played by each instrument. Importantly, PCA fails at recovering our instruments since the related signals reflect non-Gaussian processes.

print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
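
A condensed sketch of the recovery step: three synthetic sources are mixed, then unmixed with FastICA and, for contrast, PCA. The signal shapes, mixing matrix, and noise level are illustrative.

import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA, PCA

rng = np.random.RandomState(0)
time = np.linspace(0, 8, 2000)
s1 = np.sin(2 * time)                     # instrument 1: sinusoid
s2 = np.sign(np.sin(3 * time))            # instrument 2: square wave
s3 = signal.sawtooth(2 * np.pi * time)    # instrument 3: sawtooth
S = np.c_[s1, s2, s3] + 0.2 * rng.normal(size=(2000, 3))   # noisy sources
A = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.0, 2.0]])  # mixing matrix
X = S.dot(A.T)                            # observed microphone signals

S_ica = FastICA(n_components=3, random_state=0).fit_transform(X)  # estimated sources
S_pca = PCA(n_components=3).fit_transform(X)                      # orthogonal components only
print(S_ica.shape, S_pca.shape)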

sklearn.metrics.pairwise.paired_cosine_distances()

sklearn.metrics.pairwise.paired_cosine_distances(X, Y) [source]

Computes the paired cosine distances between X and Y. Read more in the User Guide.

Parameters:
X : array-like, shape (n_samples, n_features)
Y : array-like, shape (n_samples, n_features)

Returns:
distances : ndarray, shape (n_samples,)

Notes
The cosine distance is equivalent to half the squared Euclidean distance if each sample is normalized to unit norm.
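
A tiny sketch: distances are computed row by row, so X and Y must have the same shape and the result has one entry per sample pair. The arrays below are illustrative.

import numpy as np
from sklearn.metrics.pairwise import paired_cosine_distances

X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
print(paired_cosine_distances(X, Y))   # shape (3,): one distance per row pair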