The Iris Dataset

This dataset consists of 3 different types of irises' (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray. The rows are the samples and the columns are: Sepal Length, Sepal Width, Petal Length and Petal Width. The plot below uses the first two features. See here for more information on this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
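As a minimal sketch (not the full original example), the data can be loaded and the first two features plotted like this, assuming only scikit-learn and matplotlib:

    from sklearn import datasets
    import matplotlib.pyplot as plt

    # Load the iris data; iris.data is the 150x4 array described above
    iris = datasets.load_iris()
    X = iris.data[:, :2]  # Sepal Length and Sepal Width only
    y = iris.target

    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.show()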

The Digit Dataset

This dataset is made up of 1797 8x8 images. Each image, like the one shown below, is of a hand-written digit. In order to utilize an 8x8 figure like this, we'd have to first transform it into a feature vector with length 64. See here for more information about this dataset.

print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

from sklearn import datasets
import matplotlib.pyplot as plt

# Load the digits dataset
digits = datasets.load_digits()
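A minimal sketch of the 8x8-to-64 transformation the text describes; load_digits already exposes both forms:

    from sklearn import datasets

    digits = datasets.load_digits()
    print(digits.images.shape)  # (1797, 8, 8) raw images
    print(digits.data.shape)    # (1797, 64) flattened feature vectors

    # The flattening is just a reshape of each 8x8 image
    flat = digits.images.reshape((len(digits.images), -1))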

Test with permutations the significance of a classification score

To test whether a classification score is significant, one technique is to repeat the classification procedure after randomizing (permuting) the labels. The p-value is then given by the percentage of runs for which the score obtained is greater than the classification score obtained in the first place.

# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
# License: BSD 3 clause

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import permutation_test_score
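A minimal sketch of the procedure using scikit-learn's permutation_test_score helper (the original example's exact setup may differ):

    from sklearn import datasets
    from sklearn.svm import SVC
    from sklearn.model_selection import permutation_test_score

    iris = datasets.load_iris()
    svc = SVC(kernel='linear')

    # Refit on 100 label permutations; pvalue is the fraction of permuted
    # runs that score at least as well as the unpermuted fit
    score, permutation_scores, pvalue = permutation_test_score(
        svc, iris.data, iris.target, scoring="accuracy", n_permutations=100)
    print("Classification score %s (pvalue : %s)" % (score, pvalue))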

Swiss Roll reduction with LLE

An illustration of Swiss Roll reduction with locally linear embedding.

Out:
    Computing LLE embedding
    Done. Reconstruction error: 9.45487e-08

# Author: Fabian Pedregosa -- <fabian.pedregosa@inria.fr>
# License: BSD 3 clause (C) INRIA 2011

print(__doc__)

import matplotlib.pyplot as plt

# This import is needed to modify the way figure behaves
from mpl_toolkits.mplot3d import Axes3D
Axes3D

#----------------------------------------------------------------------
# Locally linear embedding of the swiss roll
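A minimal sketch of the reduction step itself, using scikit-learn's make_swiss_roll and locally_linear_embedding:

    from sklearn import datasets, manifold

    # Sample a 3D swiss roll and unroll it to 2D with LLE
    X, color = datasets.make_swiss_roll(n_samples=1500)
    X_r, err = manifold.locally_linear_embedding(
        X, n_neighbors=12, n_components=2)
    print("Reconstruction error: %g" % err)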

SVM: Weighted samples

Plot the decision function of a weighted dataset, where the size of each point is proportional to its weight. The sample weighting rescales the C parameter, which means that the classifier puts more emphasis on getting these points right. The effect might often be subtle. To emphasize the effect here, we particularly weight outliers, making the deformation of the decision boundary very visible.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

def plot_decision_function(classifier, sample_weight, axis, title):
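A minimal sketch of fitting with per-sample weights (a simplified stand-in for the original plotting example; the data here is illustrative):

    import numpy as np
    from sklearn import svm

    # Two clusters, with the first few points heavily up-weighted
    np.random.seed(0)
    X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2) - [1, 1]]
    y = [1] * 10 + [-1] * 10

    sample_weight = np.ones(len(X))
    sample_weight[:3] *= 10  # these points now dominate the fit

    clf = svm.SVC()
    clf.fit(X, y, sample_weight=sample_weight)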

SVM: Separating hyperplane for unbalanced classes

Find the optimal separating hyperplane using an SVC for classes that are unbalanced. We first find the separating plane with a plain SVC and then plot (dashed) the separating hyperplane with automatic correction for unbalanced classes.

Note: this example will also work by replacing SVC(kernel="linear") with SGDClassifier(loss="hinge"). Setting the loss parameter of the SGDClassifier equal to hinge will yield behaviour such as that of an SVC with a linear kernel; for example, try it in place of the SVC used here.
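A minimal sketch of the class_weight correction (the data generation is illustrative, not the original example's):

    import numpy as np
    from sklearn import svm

    # Unbalanced toy data: 1000 points in class 0, only 100 in class 1
    rng = np.random.RandomState(0)
    X = np.r_[1.5 * rng.randn(1000, 2), 0.5 * rng.randn(100, 2) + [2, 2]]
    y = np.array([0] * 1000 + [1] * 100)

    # Plain fit, then a fit whose class_weight boosts C for the rare class
    clf = svm.SVC(kernel='linear', C=1.0)
    clf.fit(X, y)

    wclf = svm.SVC(kernel='linear', class_weight={1: 10})
    wclf.fit(X, y)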

SVM: Maximum margin separating hyperplane

Plot the maximum margin separating hyperplane within a two-class separable dataset using a Support Vector Machine classifier with a linear kernel.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

# we create 40 separable points
np.random.seed(0)
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20

# fit the model
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# get the separating hyperplane
w = clf.coef_[0]
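Continuing from the fitted clf above, a sketch of recovering the separating line and the two margins from the coefficients and support vectors (consistent with how such examples typically proceed):

    a = -w[0] / w[1]
    xx = np.linspace(-5, 5)
    # the boundary w . x + b = 0, rewritten as a line in the plane
    yy = a * xx - clf.intercept_[0] / w[1]

    # margins are the parallel lines through the nearest support vectors
    b = clf.support_vectors_[0]
    yy_down = a * xx + (b[1] - a * b[0])
    b = clf.support_vectors_[-1]
    yy_up = a * xx + (b[1] - a * b[0])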

svm.SVR()

class sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1) [source]

Epsilon-Support Vector Regression.

The free parameters in the model are C and epsilon. The implementation is based on libsvm. Read more in the User Guide.

Parameters:

C : float, optional (default=1.0)
    Penalty parameter C of the error term.

epsilon : float, optional (default=0.1)
    Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
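A minimal usage sketch for SVR on toy data:

    import numpy as np
    from sklearn.svm import SVR

    # Noisy 1D regression target
    rng = np.random.RandomState(0)
    X = np.sort(5 * rng.rand(40, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(40)

    # Errors smaller than epsilon (inside the tube) carry no penalty
    svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
    svr.fit(X, y)
    y_pred = svr.predict(X)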

svm.SVC()

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None) [source]

C-Support Vector Classification.

The implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples, which makes it hard to scale to datasets with more than a couple of 10000 samples.

The multiclass support is handled according to a one-vs-one scheme.
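A minimal usage sketch for SVC:

    from sklearn import datasets
    from sklearn.svm import SVC

    # Fit a C-SVC with the default RBF kernel on the iris data
    iris = datasets.load_iris()
    clf = SVC(C=1.0, kernel='rbf', gamma='auto')
    clf.fit(iris.data, iris.target)
    print(clf.predict(iris.data[:2]))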

svm.OneClassSVM()

class sklearn.svm.OneClassSVM(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, random_state=None) [source]

Unsupervised Outlier Detection.

Estimate the support of a high-dimensional distribution. The implementation is based on libsvm. Read more in the User Guide.

Parameters:

kernel : string, optional (default='rbf')
    Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable.
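A minimal usage sketch for OneClassSVM as an outlier detector:

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Train on "normal" data only
    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2)

    clf = OneClassSVM(kernel='rbf', nu=0.1, gamma=0.1)
    clf.fit(X_train)

    # predict returns +1 for inliers and -1 for outliers
    print(clf.predict(np.array([[0.0, 0.0], [4.0, 4.0]])))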