sklearn.datasets.fetch_mldata()

sklearn.datasets.fetch_mldata(dataname, target_name='label', data_name='data', transpose_data=True, data_home=None)

Fetch an mldata.org data set. If the file does not exist yet, it is downloaded from mldata.org. mldata.org does not have an enforced convention for storing data or naming the columns in a data set. The default behavior of this function works well with the most common cases: data values are stored in the column 'data' and target values in the column 'label'; alternatively, the first column stores target values and the second stores data values.
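
A minimal usage sketch, assuming mldata.org is reachable; 'MNIST original' is the dataset name used in the scikit-learn examples:

    from sklearn.datasets import fetch_mldata

    # Download the dataset from mldata.org (or load it from the local cache).
    mnist = fetch_mldata('MNIST original')

    # The result is a Bunch whose 'data' and 'target' attributes hold the
    # feature matrix and labels parsed from the 'data' and 'label' columns.
    print(mnist.data.shape)    # (70000, 784)
    print(mnist.target.shape)  # (70000,)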

sklearn.datasets.fetch_lfw_people()

sklearn.datasets.fetch_lfw_people(data_home=None, funneled=True, resize=0.5, min_faces_per_person=0, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True)

Loader for the Labeled Faces in the Wild (LFW) people dataset. This dataset is a collection of JPEG pictures of famous people collected on the internet; all details are available on the official website: http://vis-www.cs.umass.edu/lfw/ Each picture is centered on a single face. Each pixel of each channel (color in RGB) is encoded by a float in the range 0.0 - 1.0.
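
A sketch of typical usage; the min_faces_per_person threshold and resize factor below are illustrative choices, not defaults:

    from sklearn.datasets import fetch_lfw_people

    # Keep only people with at least 70 pictures; shrink each image to 40%.
    lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

    print(lfw.images.shape)   # (n_samples, height, width) grey-level images
    print(lfw.data.shape)     # same pixels, flattened to one row per image
    print(lfw.target_names)   # names of the people retained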

sklearn.datasets.fetch_lfw_pairs()

sklearn.datasets.fetch_lfw_pairs(subset='train', data_home=None, funneled=True, resize=0.5, color=False, slice_=(slice(70, 195, None), slice(78, 172, None)), download_if_missing=True)

Loader for the Labeled Faces in the Wild (LFW) pairs dataset. This dataset is a collection of JPEG pictures of famous people collected on the internet; all details are available on the official website: http://vis-www.cs.umass.edu/lfw/ Each picture is centered on a single face. Each pixel of each channel (color in RGB) is encoded by a float in the range 0.0 - 1.0.
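
A minimal sketch loading the training split of pairs:

    from sklearn.datasets import fetch_lfw_pairs

    # 'train' selects the development training split of face pairs.
    pairs = fetch_lfw_pairs(subset='train')

    print(pairs.pairs.shape)  # (n_pairs, 2, height, width): two images per pair
    print(pairs.target[:5])   # 1 if both pictures show the same person, else 0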

sklearn.datasets.fetch_kddcup99()

sklearn.datasets.fetch_kddcup99(subset=None, shuffle=False, random_state=None, percent10=True, download_if_missing=True)

Load and return the kddcup 99 dataset (classification). The KDD Cup '99 dataset was created by processing the tcpdump portions of the 1998 DARPA Intrusion Detection System (IDS) Evaluation dataset, created by MIT Lincoln Lab [1]. The artificial data was generated using a closed network and hand-injected attacks to produce a large number of different types of attack, with normal activity in the background.
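
A hedged sketch; 'SA' is one of the documented subset values, and the first call downloads a sizeable archive:

    from sklearn.datasets import fetch_kddcup99

    # Load the 10% subset (the default) restricted to the 'SA' variant.
    kdd = fetch_kddcup99(subset='SA', percent10=True)

    print(kdd.data.shape)
    print(set(kdd.target[:100]))  # byte-string labels such as b'normal.'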

sklearn.datasets.fetch_covtype()

sklearn.datasets.fetch_covtype(data_home=None, download_if_missing=True, random_state=None, shuffle=False)

Load the covertype dataset, downloading it if necessary. Read more in the User Guide.

Parameters:

data_home : string, optional
    Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

download_if_missing : boolean, default=True
    If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
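
For example (the shapes in the comments reflect the full covertype data set):

    from sklearn.datasets import fetch_covtype

    # Download on first use, then load from the local cache.
    covtype = fetch_covtype(shuffle=True, random_state=0)

    print(covtype.data.shape)    # (581012, 54)
    print(covtype.target.shape)  # (581012,), cover types labelled 1..7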

sklearn.datasets.fetch_california_housing()

sklearn.datasets.fetch_california_housing(data_home=None, download_if_missing=True)

Loader for the California housing dataset from StatLib. Read more in the User Guide.

Parameters:

data_home : optional, default: None
    Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

download_if_missing : optional, default: True
    If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
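
A minimal sketch:

    from sklearn.datasets import fetch_california_housing

    housing = fetch_california_housing()

    print(housing.data.shape)     # (20640, 8)
    print(housing.feature_names)  # 'MedInc', 'HouseAge', ...
    print(housing.target[:3])     # median house values per district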

sklearn.datasets.fetch_20newsgroups_vectorized()

sklearn.datasets.fetch_20newsgroups_vectorized(subset='train', remove=(), data_home=None)

Load the 20 newsgroups dataset and transform it into tf-idf vectors. This is a convenience function; the tf-idf transformation is done using the default settings for sklearn.feature_extraction.text.Vectorizer. For more advanced usage (stopword filtering, n-gram extraction, etc.), combine fetch_20newsgroups with a custom Vectorizer or CountVectorizer. Read more in the User Guide.

Parameters:

subset : 'train' or 'test', 'all', optional
    Select the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both.
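
For example:

    from sklearn.datasets import fetch_20newsgroups_vectorized

    bunch = fetch_20newsgroups_vectorized(subset='train')

    # data is already a sparse tf-idf matrix; no further vectorization needed.
    print(bunch.data.shape)
    print(len(bunch.target_names))  # the 20 newsgroup categories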

sklearn.datasets.fetch_20newsgroups()

sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True)

Load the filenames and data from the 20 newsgroups dataset. Read more in the User Guide.

Parameters:

subset : 'train' or 'test', 'all', optional
    Select the dataset to load: 'train' for the training set, 'test' for the test set, 'all' for both, with shuffled ordering.

data_home : optional, default: None
    Specify a download and cache folder for the datasets. If None, all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
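
A sketch of a common pattern, pairing the raw-text loader with a vectorizer; the two categories chosen are arbitrary:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Load two categories and strip metadata that makes the task too easy.
    train = fetch_20newsgroups(subset='train',
                               categories=['sci.space', 'rec.autos'],
                               remove=('headers', 'footers', 'quotes'))

    # train.data is a list of raw posts; vectorize it before modeling.
    X = TfidfVectorizer().fit_transform(train.data)
    print(X.shape)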

sklearn.datasets.dump_svmlight_file()

sklearn.datasets.dump_svmlight_file(X, y, f, zero_based=True, comment=None, query_id=None, multilabel=False)

Dump the dataset in svmlight / libsvm file format. This is a text-based format, with one sample per line. It does not store zero-valued features, and hence is suitable for sparse datasets. The first element of each line can be used to store a target variable to predict.

Parameters:

X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.
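
A round-trip sketch; load_svmlight_file is the matching reader in sklearn.datasets:

    import numpy as np
    from sklearn.datasets import dump_svmlight_file, load_svmlight_file

    X = np.array([[0.0, 1.5], [2.0, 0.0]])
    y = np.array([0, 1])

    # Zero-valued features are omitted from the written file.
    dump_svmlight_file(X, y, 'data.svmlight', zero_based=True)

    # Read it back: X2 is a scipy sparse matrix, y2 the target array.
    X2, y2 = load_svmlight_file('data.svmlight')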

sklearn.datasets.clear_data_home()

sklearn.datasets.clear_data_home(data_home=None)

Delete all the content of the data home cache.
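
For example, to wipe the default cache:

    from sklearn.datasets import clear_data_home

    # Removes everything under '~/scikit_learn_data' (the default data_home).
    clear_data_home()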