SGD: Weighted samples

Plot the decision function of a weighted dataset, where the size of each point is proportional to its weight.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight = 100 * np.abs(np.random.randn(20))
# and assign a bigger weight to the first 10 samples
sample_weight[:10] *= 10

# plot the weighted data points
xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))
plt.scatter(X[:, 0], X[:, 1], c=y, s=sample_weight, alpha=0.9,
            cmap=plt.cm.bone)

# fit the model without sample weights and draw its decision boundary
clf = linear_model.SGDClassifier(alpha=0.01, n_iter=100)
clf.fit(X, y)
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[0], linestyles=['solid'])

# refit with sample weights and draw the weighted decision boundary
clf.fit(X, y, sample_weight=sample_weight)
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[0], linestyles=['dashed'])
plt.show()

SGD: Penalties

Plot the contours of the three penalties. All three penalties (L1, L2 and elastic-net) are supported by sklearn.linear_model.stochastic_gradient.

from __future__ import division
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt


def l1(xs):
    return np.array([np.sqrt((1 - np.sqrt(x ** 2.0)) ** 2.0) for x in xs])


def l2(xs):
    return np.array([np.sqrt(1.0 - x ** 2.0) for x in xs])


def el(xs, z):
    # contour of the elastic-net penalty at value 1, as a function of x
    return np.array([(2 - 2 * x - 2 * z + 4 * x * z -
                      (4 * z ** 2
                       - 8 * x * z ** 2
                       + 8 * x ** 2 * z ** 2) ** 0.5
                      - 2 * x * z ** 2) / (2 - 4 * z) for x in xs])


# first quadrant of each unit ball
xs = np.linspace(0, 1, 100)
alpha = 0.501  # just off 0.5 to avoid the division by zero in el()

plt.plot(xs, l1(xs), "r-", label="L1")
plt.plot(xs, l2(xs), "b-", label="L2")
plt.plot(xs, el(xs, alpha), "y-", label="Elastic Net")
plt.legend()
plt.show()

SGD: Maximum margin separating hyperplane

Plot the maximum margin separating hyperplane within a two-class separable dataset using a linear Support Vector Machine classifier trained using SGD.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier
from sklearn.datasets.samples_generator import make_blobs

# we create 50 separable points
X, Y = make_blobs(n_samples=50, centers=2, random_state=0, cluster_std=0.60)

# fit the model
clf = SGDClassifier(loss="hinge", alpha=0.01, n_iter=200, fit_intercept=True)
clf.fit(X, Y)

# plot the decision boundary (solid) and the margins (dashed)
xx, yy = np.meshgrid(np.linspace(-1, 5, 500), np.linspace(-1, 5, 500))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contour(xx, yy, Z, levels=[-1, 0, 1], colors='k',
            linestyles=['dashed', 'solid', 'dashed'])
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
plt.show()

SGD: convex loss functions

A plot that compares the various convex loss functions supported by sklearn.linear_model.SGDClassifier.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt


def modified_huber_loss(y_true, y_pred):
    z = y_pred * y_true
    loss = -4 * z
    loss[z >= -1] = (1 - z[z >= -1]) ** 2
    loss[z >= 1.] = 0
    return loss


xmin, xmax = -4, 4
xx = np.linspace(xmin, xmax, 100)
lw = 2
plt.plot([xmin, 0, 0, xmax], [1, 1, 0, 0], color='gold', lw=lw,
         label="Zero-one loss")
plt.plot(xx, np.where(xx < 1, 1 - xx, 0), color='teal', lw=lw,
         label="Hinge loss")
plt.plot(xx, np.log2(1 + np.exp(-xx)), color='cornflowerblue', lw=lw,
         label="Log loss")
plt.plot(xx, modified_huber_loss(xx, 1), color='darkorchid', lw=lw,
         label="Modified Huber loss")
plt.ylim((0, 8))
plt.legend(loc="upper right")
plt.xlabel(r"Decision function $f(x)$")
plt.ylabel("$L(y=1, f(x))$")
plt.show()

semi_supervised.LabelSpreading()

class sklearn.semi_supervised.LabelSpreading(kernel='rbf', gamma=20, n_neighbors=7, alpha=0.2, max_iter=30, tol=0.001, n_jobs=1)

LabelSpreading model for semi-supervised learning.

This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels. Read more in the User Guide.

Parameters:

kernel : {'knn', 'rbf'}
    String identifier for kernel function to use. Only 'rbf' and 'knn' kernels are currently supported.
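A minimal usage sketch (not part of the reference page; the iris data and the 30% masking rate are illustrative). Unlabeled samples are marked with -1 in the target vector, following scikit-learn's semi-supervised convention:

import numpy as np
from sklearn import datasets
from sklearn.semi_supervised import LabelSpreading

iris = datasets.load_iris()
rng = np.random.RandomState(42)

# hide the labels of roughly 30% of the samples
labels = np.copy(iris.target)
labels[rng.rand(len(labels)) < 0.3] = -1

model = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2)
model.fit(iris.data, labels)
print(model.transduction_[:10])  # labels inferred for every sample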

semi_supervised.LabelPropagation()

class sklearn.semi_supervised.LabelPropagation(kernel='rbf', gamma=20, n_neighbors=7, alpha=1, max_iter=30, tol=0.001, n_jobs=1)

Label Propagation classifier. Read more in the User Guide.

Parameters:

kernel : {'knn', 'rbf'}
    String identifier for kernel function to use. Only 'rbf' and 'knn' kernels are currently supported.
gamma : float
    Parameter for rbf kernel.
n_neighbors : integer > 0
    Parameter for knn kernel.
alpha : float
    Clamping factor.
max_iter : float
    Maximum number of iterations allowed.
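A minimal sketch of typical usage (the toy data below is illustrative, not from the reference page); again -1 marks unlabeled targets:

import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y = np.array([0, -1, -1, 1, -1, -1])  # one labeled point per cluster

clf = LabelPropagation(kernel='knn', n_neighbors=3)
clf.fit(X, y)
print(clf.transduction_)          # labels propagated to all six points
print(clf.predict([[1.1, 0.9]]))  # inductive prediction on a new point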

Selecting the number of clusters with silhouette analysis on KMeans clustering

Silhouette analysis can be used to study the separation distance between the resulting clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the neighboring clusters and thus provides a way to assess parameters like the number of clusters visually. This measure has a range of [-1, 1]. Silhouette coefficients (as these values are referred to) near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster.
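A minimal sketch of the selection loop (not the full example; the blob dataset and candidate range are illustrative): score several values of n_clusters with silhouette_score and prefer the highest average coefficient.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=1)

for n_clusters in [2, 3, 4, 5, 6]:
    labels = KMeans(n_clusters=n_clusters, random_state=10).fit_predict(X)
    score = silhouette_score(X, labels)  # mean coefficient over all samples
    print("n_clusters = %d, average silhouette = %.3f" % (n_clusters, score))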

Selecting dimensionality reduction with Pipeline and GridSearchCV

This example constructs a pipeline that does dimensionality reduction followed by prediction with a support vector classifier. It demonstrates the use of GridSearchCV and Pipeline to optimize over different classes of estimators in a single CV run: unsupervised PCA and NMF dimensionality reductions are compared to univariate feature selection during the grid search.

# Authors: Robert McGibbon, Joel Nothman

from __future__ import print_function, division

import numpy as np
import matplotlib.pyplot as plt
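A minimal sketch of the idea (the parameter values below are illustrative): the reduce_dim step of the pipeline is itself a grid parameter, so PCA, NMF and SelectKBest compete within a single GridSearchCV run.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([('reduce_dim', PCA()), ('classify', LinearSVC())])

# a list of grids: one per class of reduction estimator
param_grid = [
    {'reduce_dim': [PCA(), NMF()],
     'reduce_dim__n_components': [2, 4, 8],
     'classify__C': [1, 10]},
    {'reduce_dim': [SelectKBest(chi2)],
     'reduce_dim__k': [2, 4, 8],
     'classify__C': [1, 10]},
]

digits = load_digits()
grid = GridSearchCV(pipe, cv=3, param_grid=param_grid)
grid.fit(digits.data, digits.target)
print(grid.best_params_)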

Segmenting the picture of a raccoon face in regions

This example uses spectral clustering on a graph created from voxel-to-voxel difference on an image to break this image into multiple partly-homogeneous regions. This procedure (spectral clustering on an image) is an efficient approximate solution for finding normalized graph cuts. There are two options to assign labels: with 'kmeans' spectral clustering will cluster samples in the embedding space using a kmeans algorithm, whereas 'discrete' will iteratively search for the closest partition space to the embedding space.
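A compact sketch of the procedure (the downsampling factor, similarity scaling and n_clusters are illustrative; scipy.misc.face supplies the raccoon image in the scipy of this documentation's era):

import numpy as np
from scipy import misc
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction.image import img_to_graph

face = misc.face(gray=True)
face = face[::16, ::16] / 255.0  # downsample to keep the graph tractable

# graph edges are weighted by the gradient between neighboring pixels
graph = img_to_graph(face)
# turn gradients into similarities: small difference -> strong edge
graph.data = np.exp(-graph.data / graph.data.std())

labels = spectral_clustering(graph, n_clusters=11,
                             assign_labels='kmeans', random_state=1)
segments = labels.reshape(face.shape)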

scikit-learn Tutorials

An introduction to machine learning with scikit-learn
    Machine learning: the problem setting
    Loading an example dataset
    Learning and predicting
    Model persistence
    Conventions
A tutorial on statistical-learning for scientific data processing
    Statistical learning: the setting and the estimator object in scikit-learn
    Supervised learning: predicting an output variable from high-dimensional observations
    Model selection: choosing estimators and their parameters
    Unsupervised learning: seeking representations of the data