pipeline.FeatureUnion()

class sklearn.pipeline.FeatureUnion(transformer_list, n_jobs=1, transformer_weights=None)

Concatenates results of multiple transformer objects. This estimator applies a list of transformer objects in parallel to the input data, then concatenates the results. This is useful to combine several feature extraction mechanisms into a single transformer. Parameters of the transformers may be set using the transformer's name and the parameter name separated by a '__'. A transformer may be replaced entirely by setting the parameter with its name to another transformer, or removed by setting it to None.
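A minimal sketch of how a FeatureUnion might be assembled and tuned; the transformer names, dataset, and parameter values below are illustrative, not part of the class reference.

from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# combine PCA components and univariately selected features into one matrix
union = FeatureUnion([("pca", PCA(n_components=2)),
                      ("kbest", SelectKBest(k=1))])
X_combined = union.fit_transform(X, y)   # shape: (n_samples, 2 + 1)

# parameters of nested transformers are addressed as <name>__<param>
union.set_params(pca__n_components=1)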

Pipeline Anova SVM

Simple usage of a Pipeline that successively runs univariate feature selection with ANOVA and then a C-SVM on the selected features.

print(__doc__)

from sklearn import svm
from sklearn.datasets import samples_generator
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline

# import some data to play with
X, y = samples_generator.make_classification(
    n_features=20, n_informative=3, n_redundant=0, n_classes=4,
    n_clusters_per_class=2)
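A hedged sketch of how the rest of the example might proceed, continuing from the snippet above and assuming SelectKBest with f_regression followed by a linear SVM, as the description suggests; the value of k is illustrative.

# ANOVA-style univariate selection followed by a linear C-SVM
anova_filter = SelectKBest(f_regression, k=3)
clf = svm.LinearSVC()
anova_svm = make_pipeline(anova_filter, clf)
anova_svm.fit(X, y)
print(anova_svm.predict(X[:5]))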

PCA example with Iris Data-set

Principal Component Analysis applied to the Iris dataset. See here for more information on this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data
y = iris.target

fig = plt.figure(1, figsize=(4, 3))
plt.clf()
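A hedged sketch of the PCA fit and 3-D rendering step that typically follows, continuing from the variables defined above; the axis labels and plotting details are illustrative.

ax = fig.add_subplot(111, projection='3d')

pca = decomposition.PCA(n_components=3)
X_reduced = pca.fit_transform(X)

# plot the samples in the space spanned by the first three principal components
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y, edgecolor='k')
ax.set_xlabel('PC 1')
ax.set_ylabel('PC 2')
ax.set_zlabel('PC 3')
plt.show()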

Path with L1-Logistic Regression

Computes the regularization path on the IRIS dataset.

print(__doc__)

# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD 3 clause

from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model
from sklearn import datasets
from sklearn.svm import l1_min_c

iris = datasets.load_iris()
X = iris.data
y = iris.target

X = X[y != 2]
y = y[y != 2]

X -= np.mean(X, 0)

# Demo path functions
cs = l1_min_c(X, y, loss='log') * np.logspace(0, 3)
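A hedged sketch of how the path itself might be traced, continuing from the cs values computed above; the solver choice and plotting details are illustrative assumptions.

clf = linear_model.LogisticRegression(penalty='l1', solver='liblinear', tol=1e-6)
coefs_ = []
for c in cs:
    clf.set_params(C=c)
    clf.fit(X, y)
    coefs_.append(clf.coef_.ravel().copy())
coefs_ = np.array(coefs_)

# each curve shows one coefficient as the regularization strength is relaxed
plt.plot(np.log10(cs), coefs_, marker='o')
plt.xlabel('log(C)')
plt.ylabel('Coefficients')
plt.title('Logistic Regression Path')
plt.show()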

Partial Dependence Plots

Partial dependence plots show the dependence between the target function [2] and a set of 'target' features, marginalizing over the values of all other features (the complement features). Due to the limits of human perception, the size of the target feature set must be small (usually one or two), so the target features are usually chosen among the most important features (see feature_importances_). This example shows how to obtain partial dependence plots from a GradientBoostingRegressor trained on the California housing dataset.
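A hedged sketch of obtaining such plots, assuming a recent scikit-learn where the plotting helper lives in sklearn.inspection; the feature indices and estimator settings are illustrative.

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

housing = fetch_california_housing()
X, y = housing.data, housing.target

est = GradientBoostingRegressor(n_estimators=100, max_depth=4,
                                learning_rate=0.1, loss='huber')
est.fit(X, y)

# one-way partial dependence for two features, plus a two-way interaction plot
features = [0, 5, (0, 5)]
PartialDependenceDisplay.from_estimator(est, X, features,
                                        feature_names=housing.feature_names)
plt.show()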

Parameter estimation using grid search with cross-validation

This example shows how a classifier is optimized by cross-validation, which is done using the sklearn.model_selection.GridSearchCV object on a development set that comprises only half of the available labeled data. The performance of the selected hyper-parameters and trained model is then measured on a dedicated evaluation set that was not used during the model selection step. More details on tools available for model selection can be found in the sections on Cross-validation: evaluating estimator performance and Tuning the hyper-parameters of an estimator.
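A hedged sketch of the development/evaluation split and the grid search itself; the dataset, parameter grid, and scorer below are illustrative choices.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# hold out half of the data as the dedicated evaluation set
X_dev, X_eval, y_dev, y_eval = train_test_split(X, y, test_size=0.5, random_state=0)

param_grid = [
    {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100]},
    {'kernel': ['linear'], 'C': [1, 10, 100]},
]

# the grid search cross-validates only on the development set
clf = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
clf.fit(X_dev, y_dev)

print("Best parameters found on development set:", clf.best_params_)
print("Score on the held-out evaluation set:", clf.score(X_eval, y_eval))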

Outlier detection with several methods.

When the amount of contamination is known, this example illustrates three different ways of performing Novelty and Outlier Detection:

- based on a robust estimator of covariance, which assumes that the data are Gaussian distributed and performs better than the One-Class SVM in that case;
- using the One-Class SVM and its ability to capture the shape of the data set, hence performing better when the data is strongly non-Gaussian, i.e. with two well-separated clusters;
- using the Isolation Forest algorithm, which is based on randomized trees and hence better suited to high-dimensional settings.
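A hedged sketch comparing the three detectors on synthetic data; the contamination level and data generation are illustrative assumptions.

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
outliers_fraction = 0.25

# inliers: a Gaussian blob; outliers: uniform noise
X_inliers = 0.3 * rng.randn(100, 2)
X_outliers = rng.uniform(low=-4, high=4, size=(33, 2))
X = np.r_[X_inliers, X_outliers]

classifiers = {
    "Robust covariance": EllipticEnvelope(contamination=outliers_fraction),
    "One-Class SVM": OneClassSVM(nu=outliers_fraction, kernel="rbf", gamma=0.1),
    "Isolation Forest": IsolationForest(contamination=outliers_fraction,
                                        random_state=rng),
}

for name, clf in classifiers.items():
    y_pred = clf.fit(X).predict(X)      # +1 for inliers, -1 for outliers
    print(name, "flagged", (y_pred == -1).sum(), "points as outliers")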

Outlier detection on a real data set

This example illustrates the need for robust covariance estimation on a real data set. It is useful both for outlier detection and for a better understanding of the data structure. We selected two sets of two variables from the Boston housing data set as an illustration of what kind of analysis can be done with several outlier detection tools. For the purpose of visualization, we are working with two-dimensional examples, but one should be aware that things are not so trivial in high dimensions.
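A hedged sketch of the kind of two-dimensional analysis the example performs; a synthetic two-column matrix stands in for the Boston housing variables, and the decision frontier is drawn from the detectors' decision_function on a mesh grid.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# illustrative stand-in for two correlated housing variables
X = rng.multivariate_normal(mean=[0, 0], cov=[[2.0, 0.8], [0.8, 1.0]], size=200)

detectors = {
    "Robust covariance": EllipticEnvelope(contamination=0.05),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma=0.5),
}

# evaluate each detector's decision function on a grid to draw its frontier
xx, yy = np.meshgrid(np.linspace(-6, 6, 200), np.linspace(-6, 6, 200))
plt.scatter(X[:, 0], X[:, 1], color='black', s=10)
for name, det in detectors.items():
    det.fit(X)
    Z = det.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contour(xx, yy, Z, levels=[0])   # level 0 separates in- from outliers
plt.title("Outlier detection frontiers (illustrative data)")
plt.show()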

Out-of-core classification of text documents

This is an example showing how scikit-learn can be used for classification using an out-of-core approach: learning from data that doesn't fit into main memory. We make use of an online classifier, i.e., one that supports the partial_fit method, which will be fed with batches of examples. To guarantee that the feature space remains the same over time, we leverage a HashingVectorizer that will project each example into the same feature space. This is especially useful in the case of text classification, where new features (words) may appear in each batch.
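A hedged sketch of the core loop, assuming an SGDClassifier as the online learner; the toy mini-batches below stand in for documents streamed from disk.

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# the hashing trick keeps the feature space fixed across batches
vectorizer = HashingVectorizer(n_features=2 ** 18)
clf = SGDClassifier()
all_classes = [0, 1]          # classes must be declared up front for partial_fit

def batches():
    # stand-in mini-batches; a real run would stream documents from disk
    yield (["spam spam spam", "ham and eggs"], [1, 0])
    yield (["more spam", "just ham"], [1, 0])

for docs, labels in batches():
    X_batch = vectorizer.transform(docs)
    clf.partial_fit(X_batch, labels, classes=all_classes)

print(clf.predict(vectorizer.transform(["spam?"])))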

Orthogonal Matching Pursuit

Using orthogonal matching pursuit for recovering a sparse signal from a noisy measurement encoded with a dictionary.

print(__doc__)

import matplotlib.pyplot as plt
import numpy as np

from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.linear_model import OrthogonalMatchingPursuitCV
from sklearn.datasets import make_sparse_coded_signal

n_components, n_features = 512, 100
n_nonzero_coefs = 17

# generate the data
# y = Xw
# |w|_0 = n_nonzero_coefs
y, X, w = make_sparse_coded_signal(n_samples=1,
                                   n_components=n_components,
                                   n_features=n_features,
                                   n_nonzero_coefs=n_nonzero_coefs,
                                   random_state=0)
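A hedged sketch of the recovery step, continuing from the signal generated above; the plotting details are illustrative, and the cross-validated variant is shown only to indicate how the sparsity level can be chosen automatically.

# recover the sparse code with a fixed number of non-zero coefficients
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero_coefs)
omp.fit(X, y)
coef = omp.coef_
idx_r, = coef.nonzero()

plt.figure(figsize=(7, 4))
plt.stem(idx_r, coef[idx_r])
plt.title("Recovered sparse coefficients")
plt.show()

# cross-validated variant chooses the number of non-zero coefficients itself
omp_cv = OrthogonalMatchingPursuitCV()
omp_cv.fit(X, y)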