Gradient Boosting regularization

Illustration of the effect of different regularization strategies for Gradient Boosting. The example is taken from Hastie et al. 2009. The loss function used is binomial deviance. Regularization via shrinkage (learning_rate < 1.0) improves performance considerably. In combination with shrinkage, stochastic gradient boosting (subsample < 1.0) can produce more accurate models by reducing the variance via bagging. Subsampling without shrinkage usually does poorly. Another strategy to reduce the variance is to subsample the features, analogous to the random splits in random forests (via the max_features parameter).
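A minimal sketch of how these settings interact, using make_hastie_10_2 and held-out accuracy; the parameter values below are illustrative, not the exact ones from the figure:

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_hastie_10_2(n_samples=4000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

settings = [
    {"learning_rate": 1.0, "subsample": 1.0},  # no regularization
    {"learning_rate": 0.1, "subsample": 1.0},  # shrinkage only
    {"learning_rate": 0.1, "subsample": 0.5},  # shrinkage + stochastic GB
    {"learning_rate": 1.0, "subsample": 0.5},  # subsampling only (usually poor)
]
for params in settings:
    clf = GradientBoostingClassifier(n_estimators=200, random_state=0, **params)
    clf.fit(X_train, y_train)
    print(params, "test accuracy: %.3f" % clf.score(X_test, y_test))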

sklearn.metrics.fbeta_score()

sklearn.metrics.fbeta_score(y_true, y_pred, beta, labels=None, pos_label=1, average='binary', sample_weight=None) [source] Compute the F-beta score. The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0. The beta parameter determines the balance between precision and recall in the combined score: beta < 1 lends more weight to precision, while beta > 1 favors recall (beta -> 0 considers only precision, beta -> +inf only recall).
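A small sketch with toy labels, showing how beta shifts the score between precision and recall:

from sklearn.metrics import fbeta_score, precision_score, recall_score

y_true = [0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 0]

print("precision:", precision_score(y_true, y_pred))   # 1.0 for these labels
print("recall:   ", recall_score(y_true, y_pred))      # 0.5 for these labels
print("F0.5:", fbeta_score(y_true, y_pred, beta=0.5))  # closer to precision
print("F2:  ", fbeta_score(y_true, y_pred, beta=2))    # closer to recall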

linear_model.MultiTaskLasso()

class sklearn.linear_model.MultiTaskLasso(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, random_state=None, selection='cyclic') [source] Multi-task Lasso model trained with an L1/L2 mixed-norm as regularizer. The optimization objective is: (1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21, where ||W||_21 = sum_i sqrt(sum_j w_ij^2), i.e. the sum of the Euclidean norms of the rows of W. Read more in the User Guide. Parameters: alpha : float, optional. Constant that multiplies the L1/L2 term. Defaults to 1.0.
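A minimal sketch of the shared-sparsity behavior this penalty induces: the same features are selected (or zeroed out) for every task, i.e. for every column of Y. The data below is synthetic and illustrative:

import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
W_true = np.zeros((10, 3))
W_true[:4] = rng.randn(4, 3)           # only the first 4 features are active
Y = X.dot(W_true) + 0.01 * rng.randn(50, 3)

clf = MultiTaskLasso(alpha=0.1).fit(X, Y)
print(clf.coef_.shape)                 # (3, 10): (n_tasks, n_features)
# The zero pattern of coef_ is shared across tasks: summing the absolute
# coefficients over tasks shows near-zero mass on the inactive features.
print(np.abs(clf.coef_).sum(axis=0))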

sklearn.preprocessing.binarize()

sklearn.preprocessing.binarize(X, threshold=0.0, copy=True) [source] Boolean thresholding of an array-like or scipy.sparse matrix. Read more in the User Guide. Parameters: X : {array-like, sparse matrix}, shape [n_samples, n_features]. The data to binarize, element by element. scipy.sparse matrices should be in CSR or CSC format to avoid an unnecessary copy. threshold : float, optional (0.0 by default). Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices.
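A small usage sketch of the element-wise thresholding:

import numpy as np
from sklearn.preprocessing import binarize

X = np.array([[1.0, -1.0, 2.0],
              [2.0,  0.0, 0.0],
              [0.0,  1.0, -1.0]])
print(binarize(X, threshold=0.0))
# Values <= 0.0 become 0, values > 0.0 become 1:
# [[1. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]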

sklearn.svm.libsvm.predict_proba()

sklearn.svm.libsvm.predict_proba() Predict probabilities. svm_model stores all parameters needed to predict a given value. For speed, all real work is done at the C level in function copy_predict (libsvm_helper.c). We have to reconstruct the model and parameters to make sure we stay in sync with the Python object. See sklearn.svm.predict for a complete list of parameters. Parameters: X : array-like, dtype=float; kernel : {'linear', 'rbf', 'poly', 'sigmoid', 'precomputed'}. Returns: dec_values : array. Predicted values.
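This low-level routine is not meant to be called directly. A sketch of the public, estimator-level path that ultimately calls into libsvm, using SVC with probability=True (note this is the wrapper API, not a direct call to this function):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel='rbf', probability=True, random_state=0).fit(X, y)
# One row per sample, one column per class; rows sum to 1.
print(clf.predict_proba(X[:3]))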

sklearn.metrics.pairwise.distance_metrics()

sklearn.metrics.pairwise.distance_metrics() [source] Valid metrics for pairwise_distances. This function simply returns the valid pairwise distance metrics. It exists to allow for a description of the mapping for each of the valid strings. The valid distance metrics, and the functions they map to, are:

metric         Function
'cityblock'    metrics.pairwise.manhattan_distances
'cosine'       metrics.pairwise.cosine_distances
'euclidean'    metrics.pairwise.euclidean_distances
'l1'           metrics.pairwise.manhattan_distances
'l2'           metrics.pairwise.euclidean_distances
'manhattan'    metrics.pairwise.manhattan_distances
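A sketch that inspects the mapping at runtime; distance_metrics() returns a plain dict of metric name to function (some entries, such as 'precomputed', may map to None):

from sklearn.metrics.pairwise import distance_metrics

for name, func in sorted(distance_metrics().items()):
    print("%-12s -> %s" % (name, func.__name__ if func is not None else None))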

Orthogonal Matching Pursuit

Using orthogonal matching pursuit for recovering a sparse signal from a noisy measurement encoded with a dictionary.

print(__doc__)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.linear_model import OrthogonalMatchingPursuitCV
from sklearn.datasets import make_sparse_coded_signal

n_components, n_features = 512, 100
n_nonzero_coefs = 17

# generate the data
# y = Xw
# |w|_0 = n_nonzero_coefs
y, X, w = make_sparse_coded_signal(n_samples=1,
                                   n_components=n_components,
                                   n_features=n_features,
                                   n_nonzero_coefs=n_nonzero_coefs,
                                   random_state=0)
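A possible continuation of the example, fitting OMP with the known sparsity level and checking that it recovers the support of w (variable names follow the block above):

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero_coefs)
omp.fit(X, y)
# Nonzero entries of coef_ are the atoms OMP selected from the dictionary.
print("recovered support:", omp.coef_.nonzero()[0])
print("true support:     ", w.nonzero()[0])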

sklearn.metrics.pairwise.rbf_kernel()

sklearn.metrics.pairwise.rbf_kernel(X, Y=None, gamma=None) [source] Compute the RBF (Gaussian) kernel between X and Y: K(x, y) = exp(-gamma * ||x - y||^2) for each pair of rows x in X and y in Y. Read more in the User Guide. Parameters: X : array of shape (n_samples_X, n_features). Y : array of shape (n_samples_Y, n_features). gamma : float, default None. If None, defaults to 1.0 / n_features. Returns: kernel_matrix : array of shape (n_samples_X, n_samples_Y).
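A small numeric sketch, checking one kernel entry by hand:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0]])
print(rbf_kernel(X, Y, gamma=0.5))
# [[1.        ]      identical points: exp(0) = 1
#  [0.60653066]]     distance 1: exp(-0.5 * 1.0)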

sklearn.metrics.calinski_harabaz_score()

sklearn.metrics.calinski_harabaz_score(X, labels) [source] Compute the Calinski and Harabaz score. The score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion; higher values indicate better-defined clusters. Read more in the User Guide. Parameters: X : array-like, shape (n_samples, n_features). List of n_features-dimensional data points. Each row corresponds to a single data point. labels : array-like, shape (n_samples,). Predicted labels for each sample. Returns: score : float. The resulting Calinski-Harabaz score.
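An illustrative sketch on synthetic blobs (the exact scores depend on the data): labels that match well-separated clusters score far higher than random labels:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabaz_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
good = KMeans(n_clusters=3, random_state=0).fit_predict(X)
bad = np.random.RandomState(0).randint(0, 3, size=len(X))

print("k-means labels:", calinski_harabaz_score(X, good))  # high
print("random labels: ", calinski_harabaz_score(X, bad))   # much lower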

cross_validation.LeaveOneLabelOut()

Warning: DEPRECATED. class sklearn.cross_validation.LeaveOneLabelOut(labels) [source] Leave-One-Label-Out cross-validation iterator. Deprecated since version 0.18: This module will be removed in 0.20. Use sklearn.model_selection.LeaveOneGroupOut instead. Provides train/test indices to split data according to a third-party provided label. This label information can be used to encode arbitrary domain-specific stratifications of the samples as integers. For instance, the labels could be the year of collection of the samples, allowing cross-validation against time-based splits.
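A sketch of the recommended replacement, sklearn.model_selection.LeaveOneGroupOut, grouping samples by (for instance) year; each fold holds out all samples from one group:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.arange(8).reshape(4, 2)
y = np.array([0, 1, 0, 1])
groups = np.array([2016, 2016, 2017, 2017])  # e.g. year of collection

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups):
    print("train:", train_idx, "test:", test_idx)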