sklearn.datasets.fetch_mldata(dataname, target_name='label', data_name='data', transpose_data=True, data_home=None)
Fetch an mldata.org data set
If the file does not exist yet, it is downloaded from mldata.org.
mldata.org does not have an enforced convention for storing data or naming the columns in a data set. The default behavior of this function works well with the most common cases:
- data values are stored in the column 'data', and target values in the column 'label'
- alternatively, the first column stores target values, and the second data values
- the data array is stored as n_features x n_samples, and thus needs to be transposed to match the sklearn standard
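The defaults listed above can be illustrated with a small sketch. It is not the library's implementation: `split_columns` is a hypothetical helper, plain lists stand in for the downloaded arrays, and it assumes the data set has at least two columns.

```python
def split_columns(columns, target_name="label", data_name="data",
                  transpose_data=True):
    """Pick target and data out of an ordered list of (name, values) pairs.

    Hypothetical helper for illustration only.
    """
    by_name = dict(columns)

    def lookup(key, fallback_index):
        # An integer key is treated as a column index.
        if isinstance(key, int):
            return columns[key][1]
        # A string key is treated as a column name when it exists.
        if key in by_name:
            return by_name[key]
        # Otherwise fall back to the positional convention:
        # first column holds targets, second column holds data.
        return columns[fallback_index][1]

    target = lookup(target_name, 0)
    data = lookup(data_name, 1)
    if transpose_data:
        # n_features x n_samples -> n_samples x n_features
        data = [list(row) for row in zip(*data)]
    return data, target


# Two features stored row-wise for three samples.
columns = [
    ("label", [0, 1, 1]),
    ("data", [[5.1, 4.9, 6.3], [3.5, 3.0, 3.3]]),
]
data, target = split_columns(columns)
print(len(data), len(data[0]))  # 3 2
```

With a NumPy array the transposition step would simply be `array.T`.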
Keyword arguments allow these defaults to be adapted to specific data sets (see the parameters target_name, data_name, transpose_data, and the examples below).
mldata.org data sets may have multiple columns, which are stored in the Bunch object under their original names.
Parameters: dataname :
Name of the data set on mldata.org, e.g. 'leukemia', 'Whistler Daily Snowfall', etc. The raw name is automatically converted to an mldata.org URL.
target_name : optional, default: 'label'
Name or index of the column containing the target values.
data_name : optional, default: 'data'
Name or index of the column containing the data.
transpose_data : optional, default: True
If True, transpose the downloaded data array.
data_home : optional, default: None
Specify another download and cache folder for the data sets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
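As a rough sketch of how a raw data set name and data_home could map to a local cache file: both helpers below are hypothetical, and the normalization rule, the 'mldata' subfolder, and the '.mat' extension are assumptions made for illustration, not details stated in this reference. The exact URL pattern is not given here, so the sketch stops at the normalized name.

```python
import os
import re


def normalize_name(dataname):
    """Turn a raw data set name into a URL-friendly slug (assumed rule)."""
    name = dataname.lower().replace(" ", "-")
    # Drop parentheses and dots, which are awkward in URLs and file names.
    return re.sub(r"[().]", "", name)


def cache_path(dataname, data_home=None):
    """Build the path where a downloaded data set would be cached."""
    if data_home is None:
        # Documented default location for scikit-learn data.
        data_home = os.path.join("~", "scikit_learn_data")
    data_home = os.path.expanduser(data_home)
    # 'mldata' subfolder and '.mat' extension are illustrative assumptions.
    return os.path.join(data_home, "mldata", normalize_name(dataname) + ".mat")


print(normalize_name("Whistler Daily Snowfall"))  # whistler-daily-snowfall
```

Passing an explicit data_home, as the examples in this page do with a temporary directory, keeps downloads out of the user's home folder.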
Returns: data : Bunch
Dictionary-like object. The interesting attributes are: 'data', the data to learn; 'target', the classification labels; 'DESCR', the full description of the dataset; and 'COL_NAMES', the original names of the dataset columns.
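To make "dictionary-like object" concrete, here is a minimal stand-in. `MiniBunch` is a hypothetical class written for this sketch, not the Bunch shipped with scikit-learn, and the values are made up.

```python
class MiniBunch(dict):
    """Dictionary whose keys can also be read as attributes (sketch only)."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


dataset = MiniBunch(
    data=[[5.1, 3.5], [4.9, 3.0]],
    target=[0, 1],
    DESCR="hypothetical two-sample data set",
    COL_NAMES=["label", "data", "extra_column"],
    # Additional columns are kept under their original names.
    extra_column=["a", "b"],
)

print(dataset.target)           # [0, 1]
print(dataset["extra_column"])  # ['a', 'b']
```

Attribute access and key access return the same object, so `dataset.data` and `dataset["data"]` are interchangeable.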
Examples
Load the 'iris' dataset from mldata.org:
>>> from sklearn.datasets.mldata import fetch_mldata
>>> import tempfile
>>> test_data_home = tempfile.mkdtemp()
>>> iris = fetch_mldata('iris', data_home=test_data_home)
>>> iris.target.shape
(150,)
>>> iris.data.shape
(150, 4)
Load the 'leukemia' dataset from mldata.org, which needs to be transposed to respect the scikit-learn axes convention:
>>> leuk = fetch_mldata('leukemia', transpose_data=True,
...                     data_home=test_data_home)
>>> leuk.data.shape
(72, 7129)
Load an alternative 'iris' dataset, which has different names for the columns:
>>> iris2 = fetch_mldata('datasets-UCI iris', target_name=1,
...                      data_name=0, data_home=test_data_home)
>>> iris3 = fetch_mldata('datasets-UCI iris',
...                      target_name='class', data_name='double0',
...                      data_home=test_data_home)
>>> import shutil
>>> shutil.rmtree(test_data_home)
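The examples create a temporary cache folder and remove it by hand at the end. A possible alternative, sketched here with the standard library only, is a context manager that cleans up automatically. The placeholder file stands in for a downloaded data set; fetch_mldata itself is not called because it needs network access.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as test_data_home:
    # A real session would pass data_home=test_data_home to fetch_mldata.
    cached_file = os.path.join(test_data_home, "placeholder.mat")
    with open(cached_file, "wb") as handle:
        handle.write(b"")
    existed_inside = os.path.exists(cached_file)

# The directory and its contents are deleted when the block exits.
removed_after = not os.path.exists(test_data_home)
```

This avoids leaving stray cache folders behind if an exception interrupts the session before the manual cleanup runs.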
2025-01-10 15:47:30