sklearn.datasets.fetch_mldata(dataname, target_name='label', data_name='data', transpose_data=True, data_home=None)
Fetch an mldata.org data set
If the file does not exist yet, it is downloaded from mldata.org.
mldata.org does not have an enforced convention for storing data or naming the columns in a data set. The default behavior of this function works well with the most common cases:
- data values are stored in the column 'data', and target values in the column 'label'
- alternatively, the first column stores target values, and the second data values
- the data array is stored as n_features x n_samples, and thus needs to be transposed to match the sklearn standard
Keyword arguments allow adapting these defaults to specific data sets (see the parameters target_name, data_name, transpose_data, and the examples below).
mldata.org data sets may have multiple columns, which are stored in the Bunch object with their original names.
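The default conventions described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: `pick_columns` is a hypothetical helper, the columns are assumed to arrive as ordered (name, values) pairs, and the fallback to the first and second columns is a simplified reading of the rules listed above.

```python
def pick_columns(columns, target_name="label", data_name="data",
                 transpose_data=True):
    """Hypothetical helper: select target and data from (name, values) pairs."""
    names = [name for name, _ in columns]
    lookup = dict(columns)

    def resolve(key, fallback_index):
        # An integer is treated as a column index.
        if isinstance(key, int):
            return names[key]
        # A known name is used as-is.
        if key in lookup:
            return key
        # Otherwise fall back on column order (assumption of this sketch).
        return names[fallback_index]

    target = lookup[resolve(target_name, 0)]
    data = lookup[resolve(data_name, 1)]
    if transpose_data:
        # n_features x n_samples -> n_samples x n_features
        data = [list(row) for row in zip(*data)]
    return data, target


# Two features, three samples, stored feature-major.
named = [("label", [0, 1, 1]), ("data", [[1, 2, 3], [4, 5, 6]])]
data, target = pick_columns(named)

# Same layout with nonstandard column names, resolved by position.
unnamed = [("class", [0, 1]), ("double0", [[1, 2], [3, 4]])]
data2, target2 = pick_columns(unnamed)
```

After the call, `data` holds one row per sample, which is the layout scikit-learn estimators expect.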
Parameters: dataname : str
Name of the data set on mldata.org, e.g.: 'leukemia', 'Whistler Daily Snowfall', etc. The raw name is automatically converted to a mldata.org URL.
target_name : optional, default: 'label'
Name or index of the column containing the target values.
data_name : optional, default: 'data'
Name or index of the column containing the data.
transpose_data : optional, default: True
If True, transpose the downloaded data array.
data_home : optional, default: None
Specify another download and cache folder for the data sets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
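As a rough illustration of how dataname and data_home could combine into a cache location, the sketch below normalizes a raw name and joins it to a cache folder. Both helpers are hypothetical, and the normalization rules, the 'mldata' subfolder, and the '.mat' extension are assumptions of this sketch rather than documented behavior.

```python
import os
import re


def to_mldata_slug(dataname):
    # Assumed rule: lowercase, spaces to hyphens, drop "(", ")" and ".".
    slug = dataname.lower().replace(" ", "-")
    return re.sub(r"[().]", "", slug)


def cache_path(dataname, data_home=None):
    # Assumed layout: <data_home>/mldata/<slug>.mat
    if data_home is None:
        data_home = os.path.join("~", "scikit_learn_data")
    data_home = os.path.expanduser(data_home)
    return os.path.join(data_home, "mldata", to_mldata_slug(dataname) + ".mat")


slug = to_mldata_slug("Whistler Daily Snowfall")
path = cache_path("leukemia", data_home="/tmp/demo")
```

Passing a temporary directory as data_home, as the examples below do, keeps downloads out of the default cache folder.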
Returns: data : Bunch
Dictionary-like object, the interesting attributes are: 'data', the data to learn, 'target', the classification labels, 'DESCR', the full description of the dataset, and 'COL_NAMES', the original names of the dataset columns.
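A "dictionary-like object" here means the fields can be read either as keys or as attributes. The sketch below shows the idea with a hypothetical `MiniBunch` class; it is a stand-in for illustration, not scikit-learn's own Bunch, and the field values are made up.

```python
class MiniBunch(dict):
    """Hypothetical stand-in: a dict whose keys are also attributes."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value


bunch = MiniBunch(
    data=[[5.1, 3.5], [4.9, 3.0]],   # made-up values
    target=[0, 0],
    DESCR="toy description",
    COL_NAMES=["label", "data"],
)

# Both access styles return the same object.
same = bunch.data is bunch["data"]
```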
Examples
Load the 'iris' dataset from mldata.org:
>>> from sklearn.datasets.mldata import fetch_mldata
>>> import tempfile
>>> test_data_home = tempfile.mkdtemp()
>>> iris = fetch_mldata('iris', data_home=test_data_home)
>>> iris.target.shape
(150,)
>>> iris.data.shape
(150, 4)
Load the 'leukemia' dataset from mldata.org, which needs to be transposed to respect the scikit-learn axes convention:
>>> leuk = fetch_mldata('leukemia', transpose_data=True,
...                     data_home=test_data_home)
>>> leuk.data.shape
(72, 7129)
Load an alternative 'iris' dataset, which has different names for the columns:
>>> iris2 = fetch_mldata('datasets-UCI iris', target_name=1,
...                      data_name=0, data_home=test_data_home)
>>> iris3 = fetch_mldata('datasets-UCI iris',
...                      target_name='class', data_name='double0',
...                      data_home=test_data_home)
>>> import shutil
>>> shutil.rmtree(test_data_home)
2017-01-15 04:25:41