sklearn.datasets.make_moons()

sklearn.datasets.make_moons(n_samples=100, shuffle=True, noise=None, random_state=None)

Make two interleaving half circles. A simple toy dataset to visualize clustering and classification algorithms. Read more in the User Guide.

Parameters:
n_samples : int, optional (default=100)
    The total number of points generated.
shuffle : bool, optional (default=True)
    Whether to shuffle the samples.
noise : double or None (default=None)
    Standard deviation of Gaussian noise added to the data.
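
A minimal usage sketch; the sample count and noise level below are illustrative choices, not defaults:

    from sklearn.datasets import make_moons

    # X is a (200, 2) array of points; y holds the 0/1 half-circle labels
    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)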

sklearn.datasets.make_low_rank_matrix()

sklearn.datasets.make_low_rank_matrix(n_samples=100, n_features=100, effective_rank=10, tail_strength=0.5, random_state=None)

Generate a mostly low rank matrix with bell-shaped singular values. Most of the variance can be explained by a bell-shaped curve of width effective_rank: the low rank part of the singular values profile is

    (1 - tail_strength) * exp(-1.0 * (i / effective_rank) ** 2)

and the remaining singular values' tail is fat, decreasing as

    tail_strength * exp(-0.1 * i / effective_rank)
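
A short sketch for inspecting the resulting singular-value profile; the parameter values are illustrative:

    import numpy as np
    from sklearn.datasets import make_low_rank_matrix

    # X is a (50, 30) matrix whose singular values follow the profile above
    X = make_low_rank_matrix(n_samples=50, n_features=30, effective_rank=5,
                             tail_strength=0.2, random_state=0)
    s = np.linalg.svd(X, compute_uv=False)  # bell-shaped decay, then a fat tail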

sklearn.datasets.make_hastie_10_2()

sklearn.datasets.make_hastie_10_2(n_samples=12000, random_state=None)

Generates data for binary classification used in Hastie et al. 2009, Example 10.2. The ten features are standard independent Gaussians and the target y is defined by

    y[i] = 1 if np.sum(X[i] ** 2) > 9.34 else -1

Read more in the User Guide.

Parameters:
n_samples : int, optional (default=12000)
    The number of samples.
random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
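
A quick sketch that generates the data and re-derives the labels from the threshold rule above; the reduced sample count is an arbitrary choice:

    import numpy as np
    from sklearn.datasets import make_hastie_10_2

    X, y = make_hastie_10_2(n_samples=1000, random_state=0)
    # the labels should reproduce y[i] = 1 if sum(X[i]**2) > 9.34 else -1
    assert np.array_equal(y, np.where((X ** 2).sum(axis=1) > 9.34, 1, -1))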

sklearn.datasets.make_gaussian_quantiles()

sklearn.datasets.make_gaussian_quantiles(mean=None, cov=1.0, n_samples=100, n_features=2, n_classes=3, shuffle=True, random_state=None)

Generate isotropic Gaussian samples and label them by quantile. This classification dataset is constructed by taking a multi-dimensional standard normal distribution and defining classes separated by nested concentric multi-dimensional spheres, such that roughly equal numbers of samples fall in each class (quantiles of the distribution). Read more in the User Guide.
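
A minimal sketch; by construction the class counts should come out roughly equal (parameter values illustrative):

    import numpy as np
    from sklearn.datasets import make_gaussian_quantiles

    X, y = make_gaussian_quantiles(n_samples=300, n_features=2, n_classes=3,
                                   random_state=0)
    print(np.bincount(y))  # roughly 100 samples in each of the 3 classes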

sklearn.datasets.make_friedman3()

sklearn.datasets.make_friedman3(n_samples=100, noise=0.0, random_state=None)

Generate the "Friedman #3" regression problem. This dataset is described in Friedman [1] and Breiman [2]. Inputs X are 4 independent features uniformly distributed on the intervals

    0 <= X[:, 0] <= 100
    40 * pi <= X[:, 1] <= 560 * pi
    0 <= X[:, 2] <= 1
    1 <= X[:, 3] <= 11

The output y is created according to the formula

    y(X) = arctan((X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) / X[:, 0]) + noise * N(0, 1)
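
A minimal sketch; the noise level is an illustrative choice:

    from sklearn.datasets import make_friedman3

    # X has shape (100, 4), drawn from the intervals above;
    # y is the arctan response plus Gaussian noise
    X, y = make_friedman3(n_samples=100, noise=0.1, random_state=0)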

sklearn.datasets.make_friedman2()

sklearn.datasets.make_friedman2(n_samples=100, noise=0.0, random_state=None)

Generate the "Friedman #2" regression problem. This dataset is described in Friedman [1] and Breiman [2]. Inputs X are 4 independent features uniformly distributed on the intervals

    0 <= X[:, 0] <= 100
    40 * pi <= X[:, 1] <= 560 * pi
    0 <= X[:, 2] <= 1
    1 <= X[:, 3] <= 11

The output y is created according to the formula

    y(X) = (X[:, 0] ** 2 + (X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) ** 2) ** 0.5 + noise * N(0, 1)
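
A minimal sketch, mirroring the friedman3 example above (noise level illustrative):

    from sklearn.datasets import make_friedman2

    # X has shape (100, 4), drawn from the same intervals as friedman3;
    # y follows the square-root formula above
    X, y = make_friedman2(n_samples=100, noise=0.1, random_state=0)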

sklearn.datasets.make_friedman1()

sklearn.datasets.make_friedman1(n_samples=100, n_features=10, noise=0.0, random_state=None)

Generate the "Friedman #1" regression problem. This dataset is described in Friedman [1] and Breiman [2]. Inputs X are independent features uniformly distributed on the interval [0, 1]. The output y is created according to the formula

    y(X) = 10 * sin(pi * X[:, 0] * X[:, 1]) + 20 * (X[:, 2] - 0.5) ** 2 + 10 * X[:, 3] + 5 * X[:, 4] + noise * N(0, 1)

Out of the n_features features, only 5 are actually used to compute y; the remaining features are independent of y.
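
A minimal sketch; note that only the first 5 columns of X actually drive y:

    from sklearn.datasets import make_friedman1

    # columns 0-4 of X enter the formula above; columns 5-9 are pure noise
    # features that carry no information about y
    X, y = make_friedman1(n_samples=100, n_features=10, noise=0.0,
                          random_state=0)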

sklearn.datasets.make_classification()

sklearn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)

Generate a random n-class classification problem. This initially creates clusters of points normally distributed (std=1) about vertices of a 2 * class_sep-sided hypercube, and assigns an equal number of clusters to each class.
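
A minimal sketch using only a subset of the parameters (values illustrative):

    from sklearn.datasets import make_classification

    # 2 informative features, 2 redundant linear combinations of them,
    # and the remaining 16 features are uninformative noise
    X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                               n_redundant=2, n_classes=2, random_state=0)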

sklearn.datasets.make_circles()

sklearn.datasets.make_circles(n_samples=100, shuffle=True, noise=None, random_state=None, factor=0.8)

Make a large circle containing a smaller circle in 2d. A simple toy dataset to visualize clustering and classification algorithms. Read more in the User Guide.

Parameters:
n_samples : int, optional (default=100)
    The total number of points generated.
shuffle : bool, optional (default=True)
    Whether to shuffle the samples.
noise : double or None (default=None)
    Standard deviation of Gaussian noise added to the data.
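
A minimal sketch; factor sets the radius of the inner circle relative to the outer one (values illustrative):

    from sklearn.datasets import make_circles

    # label 0 marks the outer circle and label 1 the inner one
    X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)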

sklearn.datasets.make_checkerboard()

sklearn.datasets.make_checkerboard(shape, n_clusters, noise=0.0, minval=10, maxval=100, shuffle=True, random_state=None)

Generate an array with block checkerboard structure for biclustering. Read more in the User Guide.

Parameters:
shape : iterable (n_rows, n_cols)
    The shape of the result.
n_clusters : integer or iterable (n_row_clusters, n_column_clusters)
    The number of row and column clusters.
noise : float, optional (default=0.0)
    The standard deviation of the gaussian noise.
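
A minimal sketch showing the three return values (shape and cluster counts illustrative):

    from sklearn.datasets import make_checkerboard

    # data is a (300, 300) array; rows and cols are boolean indicator arrays,
    # one row per bicluster, marking which rows/columns each bicluster spans
    data, rows, cols = make_checkerboard(shape=(300, 300), n_clusters=(4, 3),
                                         noise=10, random_state=0)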