model_selection.GroupKFold()

class sklearn.model_selection.GroupKFold(n_splits=3) [source]

K-fold iterator variant with non-overlapping groups. The same group will not appear in two different folds (the number of distinct groups has to be at least equal to the number of folds). The folds are approximately balanced in the sense that the number of distinct groups is approximately the same in each fold.

Parameters:
    n_splits : int, default=3
        Number of folds. Must be at least 2.

See also
    LeaveOneGroupOut
        For splitting the data according to explicit domain-specific stratification of the dataset.
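A minimal usage sketch (the toy arrays below are illustrative, not from the page above): four samples drawn from two groups are split so that no group spans both train and test.

import numpy as np
from sklearn.model_selection import GroupKFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
groups = np.array([0, 0, 2, 2])  # two distinct groups

gkf = GroupKFold(n_splits=2)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    # samples sharing a group label never appear in both sets
    print("TRAIN:", train_idx, "TEST:", test_idx)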

exceptions.DataConversionWarning

class sklearn.exceptions.DataConversionWarning [source]

Warning used to notify implicit data conversions happening in the code. This warning occurs when some input data needs to be converted or interpreted in a way that may not match the user's expectations. For example, this warning may occur when the user:

    - passes an integer array to a function which expects float input and will convert the input
    - requests a non-copying operation, but a copy is required to meet the implementation's data-type requirements
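One common trigger is passing a column-vector y where a 1d array is expected; a minimal sketch (the estimator choice here is arbitrary):

import warnings
import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[0], [0], [1], [1]])  # column vector; fit expects shape (n_samples,)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    RandomForestClassifier(n_estimators=5).fit(X, y)

# the caught warnings should include DataConversionWarning
print([w.category.__name__ for w in caught])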

Hashing feature transformation using Totally Random Trees

RandomTreesEmbedding provides a way to map data to a very high-dimensional, sparse representation, which might be beneficial for classification. The mapping is completely unsupervised and very efficient. This example visualizes the partitions given by several trees and shows how the transformation can also be used for non-linear dimensionality reduction or non-linear classification. Points that are neighboring often share the same leaf of a tree and therefore share large parts of their hashed representation.
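A minimal sketch of the transformation itself (the concentric-circles data stands in for the example's dataset):

from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

X, y = make_circles(factor=0.5, noise=0.05, random_state=0)

# each sample is encoded by the leaves it lands in across all trees
hasher = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)
X_transformed = hasher.fit_transform(X)
print(X_transformed.shape)  # sparse matrix, one column per leaf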

The Digit Dataset

This dataset is made up of 1797 8x8 images. Each image, like the one shown below, is of a hand-written digit. In order to utilize an 8x8 figure like this, we'd have to first transform it into a feature vector with length 64. See here for more information about this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

from sklearn import datasets
import matplotlib.pyplot as plt

# Load the digits dataset
digits = datasets.load_digits()

# Display the last digit as an 8x8 grayscale image
plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

sklearn.preprocessing.minmax_scale()

sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True) [source]

Transforms features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, i.e. between zero and one. The transformation is given by:

    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    X_scaled = X_std * (max - min) + min

where min, max = feature_range. This transformation is often used as an alternative to zero mean, unit variance scaling.
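Applied to a small array, the formula above maps each column's minimum to the range floor and its maximum to the ceiling; a quick sketch:

import numpy as np
from sklearn.preprocessing import minmax_scale

X = np.array([[1.0, -1.0],
              [2.0,  0.0],
              [3.0,  1.0]])

print(minmax_scale(X))                         # each column scaled to [0, 1]
print(minmax_scale(X, feature_range=(-1, 1)))  # same data scaled to [-1, 1]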

ensemble.BaggingClassifier()

class sklearn.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=1, random_state=None, verbose=0) [source]

A Bagging classifier. A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
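A minimal sketch bagging a k-nearest-neighbors base estimator on random halves of the samples and features (the dataset is synthetic, chosen only for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = BaggingClassifier(KNeighborsClassifier(),
                        max_samples=0.5, max_features=0.5,
                        random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy of the ensemble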

SGD: Maximum margin separating hyperplane

Plot the maximum margin separating hyperplane within a two-class separable dataset using a linear Support Vector Machines classifier trained using SGD.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_blobs  # samples_generator was removed in later releases

# we create 50 separable points
X, Y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# fit the model
clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=200, fit_intercept=True)
clf.fit(X, Y)

manifold.MDS()

class sklearn.manifold.MDS(n_components=2, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=1, random_state=None, dissimilarity='euclidean') [source]

Multidimensional scaling. Read more in the User Guide.

Parameters:
    metric : boolean, optional, default: True
        Compute metric or nonmetric SMACOF (Scaling by Majorizing a Complicated Function) algorithm.
    n_components : int, optional, default: 2
        Number of dimensions in which to immerse the dissimilarities; overridden if an initial array is provided.
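A minimal embedding sketch (the digits data is an arbitrary choice; only the first 100 samples are used to keep it fast):

from sklearn.datasets import load_digits
from sklearn.manifold import MDS

X, _ = load_digits(return_X_y=True)

# embed the 64-dimensional digit vectors into the plane
embedding = MDS(n_components=2, random_state=0)
X_2d = embedding.fit_transform(X[:100])
print(X_2d.shape)  # (100, 2)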

linear_model.ElasticNetCV()

class sklearn.linear_model.ElasticNetCV(l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=1, positive=False, random_state=None, selection='cyclic') [source]

Elastic Net model with iterative fitting along a regularization path. The best model is selected by cross-validation. Read more in the User Guide.

Parameters:
    l1_ratio : float or array of floats, optional
        Float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For l1_ratio = 0 the penalty is an L2 penalty; for l1_ratio = 1 it is an L1 penalty.
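A minimal sketch of the cross-validated fit (synthetic regression data, illustrative l1_ratio grid):

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=100, n_features=20, noise=1.0, random_state=0)

# try several l1/l2 mixes; the alpha grid is generated automatically
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)
model.fit(X, y)
print(model.alpha_, model.l1_ratio_)  # values selected by cross-validation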

sklearn.metrics.label_ranking_average_precision_score()

sklearn.metrics.label_ranking_average_precision_score(y_true, y_score) [source]

Compute ranking-based average precision. Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample of the ratio of true vs. total labels with lower score. This metric is used in multilabel ranking problems, where the goal is to give a better rank to the labels associated with each sample. The obtained score is always strictly greater than 0, and the best value is 1. Read more in the User Guide.
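A minimal sketch with two samples and three labels (the arrays are illustrative):

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0],
                    [1.0, 0.2, 0.1]])

# first sample: its true label ranks 2nd -> 1/2; second: ranks 3rd -> 1/3
print(label_ranking_average_precision_score(y_true, y_score))  # ~0.416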