base.BaseEstimator

class sklearn.base.BaseEstimator
Base class for all estimators in scikit-learn.
Notes: All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).
Methods:
    get_params([deep]): Get parameters for this estimator.
    set_params(**params): Set the parameters of this estimator.
    __init__(): x.__init__(...) initializes x; see help(type(x)) for signature.
get_params(deep=True): Get para
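
The note about explicit keyword arguments can be illustrated with a toy class. This is a minimal sketch under stated assumptions, not scikit-learn's implementation: `ToyEstimator`, `alpha`, and `fit_intercept` are hypothetical names, and the `deep` option is left out. It shows why an `__init__` with only named keyword arguments lets parameters be discovered by inspecting the signature.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in for illustration; not sklearn.base.BaseEstimator."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        # Each constructor argument is stored under the same attribute name.
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self):
        # Parameter names come from the explicit __init__ signature.
        # With *args or **kwargs there would be no names to discover.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter %r" % name)
            setattr(self, name, value)
        return self  # returning self allows chained calls


est = ToyEstimator(alpha=0.5)
initial_params = est.get_params()
updated_alpha = est.set_params(alpha=2.0).alpha
```

Returning `self` from `set_params` in this sketch is a design choice that keeps calls chainable.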

sklearn.metrics.homogeneity_score()

sklearn.metrics.homogeneity_score(labels_true, labels_pred)
Homogeneity metric of a cluster labeling given a ground truth. A clustering result satisfies homogeneity if all of its clusters contain only data points that are members of a single class. This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way. This metric is not symmetric: switching labels_true with labels_pred will return
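
The two properties described above (label permutation does not matter, argument order does) can be checked with a from-scratch sketch. It assumes the usual entropy-based definition, 1 - H(C|K) / H(C), where C is the set of true classes and K the set of predicted clusters. The helper name `homogeneity` is hypothetical and this is not the library's code.

```python
import math
from collections import Counter


def homogeneity(labels_true, labels_pred):
    """From-scratch sketch: 1 - H(C|K) / H(C), C = classes, K = clusters."""
    n = len(labels_true)
    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)
    joint_counts = Counter(zip(labels_true, labels_pred))

    # Entropy of the ground-truth classes, H(C).
    h_c = -sum((c / n) * math.log(c / n) for c in class_counts.values())
    if h_c == 0.0:
        return 1.0  # assumption: a single class counts as fully homogeneous

    # Conditional entropy of the classes given the clusters, H(C|K).
    h_c_given_k = -sum(
        (n_ck / n) * math.log(n_ck / cluster_counts[k])
        for (_, k), n_ck in joint_counts.items()
    )
    return 1.0 - h_c_given_k / h_c


permuted = homogeneity([0, 0, 1, 1], [1, 1, 0, 0])    # same grouping, labels renamed
over_split = homogeneity([0, 0, 1, 1], [0, 1, 2, 3])  # every cluster is still pure
swapped = homogeneity([0, 1, 2, 3], [0, 0, 1, 1])     # arguments switched
```

In this sketch the first two calls score 1.0 because every cluster holds a single class, while switching the arguments gives 0.5 because each cluster then mixes two classes.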

sklearn.metrics.pairwise.paired_manhattan_distances()

sklearn.metrics.pairwise.paired_manhattan_distances(X, Y)
Compute the L1 distances between the paired vectors in X and Y (row i of X against row i of Y). Read more in the User Guide.
Parameters:
    X : array-like, shape (n_samples, n_features)
    Y : array-like, shape (n_samples, n_features)
Returns:
    distances : ndarray, shape (n_samples,)
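
As a rough illustration of what "paired" means here, the sketch below computes one L1 distance per row pair in plain Python. The helper `paired_l1` is a hypothetical name, it returns a list rather than an ndarray, and it is not the library function itself.

```python
def paired_l1(X, Y):
    """Sketch: L1 distance between X[i] and Y[i] for every row i."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    # Sum of absolute coordinate differences, one value per row pair.
    return [sum(abs(a - b) for a, b in zip(x, y)) for x, y in zip(X, Y)]


X = [[1, 2, 3], [0, 0, 0]]
Y = [[1, 0, 5], [-1, 2, 0]]
distances = paired_l1(X, Y)  # one distance per sample, not a full pairwise matrix
```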

1.13. Feature selection

The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators' accuracy scores or to boost their performance on very high-dimensional datasets.

1.13.1. Removing features with low variance

VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have th
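
The idea of dropping low-variance features can be sketched without the library. The helper `drop_low_variance` is hypothetical; it assumes population variance and keeps a column only when its variance is strictly greater than the threshold, which with a threshold of 0 removes constant columns.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Sketch: keep only the columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept


# Columns 0 and 3 are constant, so they carry no information.
X = [[0, 2, 0, 3],
     [0, 1, 4, 3],
     [0, 1, 1, 3]]
X_reduced, kept_columns = drop_low_variance(X)
```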

Feature agglomeration

These images show how similar features are merged together using feature agglomeration.

    print(__doc__)
    # Code source: Gaël Varoquaux
    # Modified for documentation by Jaques Grobler
    # License: BSD 3 clause
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets, cluster
    from sklearn.feature_extraction.image import grid_to_graph
    digits = datasets.load_digits()
    images = digits.images
    X = np.reshape(images, (len(images), -1))
    connectivity = grid_to_graph(*images[0].shape)

SVM: Separating hyperplane for unbalanced classes

Find the optimal separating hyperplane using an SVC for classes that are unbalanced. We first find the separating plane with a plain SVC and then plot (dashed) the separating hyperplane with automatic correction for unbalanced classes. Note: This example will also work by replacing SVC(kernel="linear") with SGDClassifier(loss="hinge"). Setting the loss parameter of the SGDClassifier to hinge yields behaviour similar to that of an SVC with a linear kernel. For example try instead of t

SVM Exercise

A tutorial exercise for using different SVM kernels. This exercise is used in the Using kernels part of the Supervised learning: predicting an output variable from high-dimensional observations section of A tutorial on statistical-learning for scientific data processing.

    print(__doc__)
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import datasets, svm
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    X = X[y != 0, :2]
    y = y[y != 0]
    n_sample = len(X)

sklearn.metrics.f1_score()

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)
Compute the F1 score, also known as balanced F-score or F-measure. The F1 score can be interpreted as a harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 and its worst at 0. The relative contributions of precision and recall to the F1 score are equal. The formula for the F1 score is: F1 = 2 * (precision * recall) / (precision + recall)
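
The formula can be worked through by hand for the binary case. The sketch below is plain Python, not the library function; the helper name `precision_recall_f1` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made here to keep the example total.

```python
def precision_recall_f1(y_true, y_pred, pos_label=1):
    """Sketch of binary precision, recall and F1 from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)

    # Assumption: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]
# tp = 2, fp = 1, fn = 2, so precision = 2/3, recall = 1/2, F1 = 4/7.
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```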

sklearn.metrics.precision_score()

sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)
Compute the precision. The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0. Read more in the User Guide.
Parameters: y_true : 1d array-like, or label

Prediction Latency

This is an example showing the prediction latency of various scikit-learn estimators. The goal is to measure the latency one can expect when doing predictions either in bulk or atomic (i.e. one by one) mode. The plots represent the distribution of the prediction latency as a boxplot.

    # Authors: Eustache Diemert <eustache@diemert.fr>
    # License: BSD 3 clause
    from __future__ import print_function
    from collections import defaultdict
    import time
    import gc
    import numpy as np
    import matplotli
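
The difference between atomic and bulk mode can be sketched with the standard library alone. This is an illustrative sketch, not the example's own benchmarking code: `toy_predict` is a hypothetical stand-in for an estimator's predict method, and the helper names are assumptions.

```python
import time


def toy_predict(rows):
    """Hypothetical stand-in for an estimator's predict method."""
    return [sum(row) > 1.0 for row in rows]


def atomic_latencies(predict, rows):
    """Time one predict call per sample (atomic mode)."""
    latencies = []
    for row in rows:
        start = time.perf_counter()
        predict([row])
        latencies.append(time.perf_counter() - start)
    return latencies


def bulk_latency_per_row(predict, rows):
    """Time a single predict call on all samples, averaged per sample (bulk mode)."""
    start = time.perf_counter()
    predict(rows)
    return (time.perf_counter() - start) / len(rows)


rows = [[0.1 * i, 0.2 * i] for i in range(100)]
per_sample = atomic_latencies(toy_predict, rows)
bulk_average = bulk_latency_per_row(toy_predict, rows)
```

The list of per-sample timings is the kind of data a boxplot would summarize; actual values depend on the machine, so none are shown.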