If the number of features is high, it may be useful to reduce it with an unsupervised step before the supervised steps. Many of the unsupervised learning methods implement a transform method that can be used to reduce the dimensionality. Below we discuss several specific examples of this pattern that are heavily used.
Pipelining
The unsupervised data reduction and the supervised estimator can be chained in one step. See Pipeline: chaining estimators.
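As a minimal sketch of this chaining, the example below puts a PCA reduction step in front of a classifier inside a Pipeline. The synthetic dataset, the step names ("reduce_dim", "clf"), the number of components and the choice of LogisticRegression are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic data (assumption for illustration): 200 samples, 50 features.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

# The unsupervised reduction and the supervised estimator form one estimator.
pipe = Pipeline([
    ("reduce_dim", PCA(n_components=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# The fitted reduction step can still be used on its own.
reduced = pipe.named_steps["reduce_dim"].transform(X)
print(reduced.shape)
```

Because the reduction is fitted inside the pipeline, tools such as cross-validation refit it on each training split, which avoids leaking information from held-out data.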
4.4.1. PCA: principal component analysis
decomposition.PCA looks for a combination of features that captures the variance of the original features well. See Decomposing signals in components (matrix factorization problems).
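The following sketch shows PCA used as a standalone reduction step. The data is synthetic and built (as an assumption for illustration) so that ten observed features are noisy mixtures of two latent signals; in that setting two components should retain almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Two latent signals mixed into ten observed features, plus a little noise.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the total variance kept by the two components.
print(pca.explained_variance_ratio_.sum())
```

On real data the explained variance ratio is a useful guide for choosing n_components.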
4.4.2. Random projections
The random_projection module provides several tools for data reduction by random projections. See the relevant section of the documentation: Random Projection.
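As a rough illustration, the sketch below projects high-dimensional random data with GaussianRandomProjection, one of the transformers in this module. The data shape, n_components=500 and eps=0.5 are arbitrary assumptions chosen for the example.

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.RandomState(0)
X = rng.rand(50, 5000)  # 50 samples, 5000 features

# Project onto 500 random directions.
transformer = GaussianRandomProjection(n_components=500, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)

# Conservative lower bound on the number of components needed to preserve
# pairwise distances within a relative distortion of eps.
min_dim = johnson_lindenstrauss_min_dim(n_samples=50, eps=0.5)
print(min_dim)
```

Unlike PCA, the projection matrix does not depend on the values in the data, so fitting is cheap even for very wide inputs.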
4.4.3. Feature agglomeration
cluster.FeatureAgglomeration applies hierarchical clustering to group together features that behave similarly.
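A small sketch of this behaviour, assuming synthetic data in which twelve features are noisy copies of three underlying signals: the features are merged into three groups, and each group is pooled (by the mean, which is the default) into a single output feature.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)

# Three base signals, each repeated four times with small noise,
# giving 12 features in three groups of similar columns.
base = rng.normal(size=(100, 3))
X = np.repeat(base, 4, axis=1) + 0.05 * rng.normal(size=(100, 12))

agglo = FeatureAgglomeration(n_clusters=3)
X_reduced = agglo.fit_transform(X)

print(X_reduced.shape)
# Cluster index assigned to each of the 12 original features.
print(agglo.labels_)
```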
Feature scaling
Note that if features have very different scales or statistical properties, cluster.FeatureAgglomeration may not be able to capture the links between related features. Using a preprocessing.StandardScaler can be useful in these settings.
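One way to apply this advice is to standardize the features in the same pipeline as the agglomeration step. In the sketch below the data is synthetic (an assumption for illustration): each of two signals appears once at its original scale and once multiplied by 1000, so related features differ mainly in scale.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
base = rng.normal(size=(100, 2))

# Columns 0 and 1 carry the first signal, columns 2 and 3 the second,
# but columns 1 and 3 are on a much larger scale.
X = np.column_stack([
    base[:, 0],
    1000.0 * base[:, 0] + rng.normal(size=100),
    base[:, 1],
    1000.0 * base[:, 1] + rng.normal(size=100),
])

# Standardize first so that distances between features reflect
# their behaviour rather than their units.
scaled_agglo = make_pipeline(StandardScaler(), FeatureAgglomeration(n_clusters=2))
X_reduced = scaled_agglo.fit_transform(X)

labels = scaled_agglo.named_steps["featureagglomeration"].labels_
print(labels)
```

After scaling, the two columns derived from the same signal end up in the same cluster, which is the link between related features that the note refers to.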