statsmodels.tools.tools.categorical
statsmodels.tools.tools.categorical(data, col=None, dictnames=False, drop=False)
Returns a dummy matrix given an array of categorical variables.
Parameters:
data : array
A structured array, recarray, or array. This can be either a 1d vector of the categorical variable or a 2d array in which the column holding the categorical variable is specified by the col argument.
col : 'string', int, or None
If data is a structured array or a recarray, col can be a string that is the name of the column that contains the variable. For all arrays, col can be an int that is the (zero-based) column index number. col can only be None for a 1d array. The default is None.
dictnames : bool, optional
If True, a dictionary mapping the column number to the categorical name is returned. This is useful for keeping track of the categories when plain (unnamed) arrays are used.
drop : bool
Whether or not to drop the original categorical variable from the returned matrix. If False, the categorical variable is kept alongside the dummy variables.
Returns: dummy_matrix, [dictnames, optional] :
A matrix of dummy (indicator/binary) float variables for the categorical data. If dictnames is True, then the dictionary is returned as well.
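As a rough illustration of what a dummy matrix and a column-number-to-category dictionary look like, the sketch below builds both with plain NumPy. It is not the statsmodels implementation, and the helper name `dummy_sketch` is hypothetical.

```python
import numpy as np


def dummy_sketch(values):
    """Hypothetical helper (not part of statsmodels): indicator matrix for a 1d variable."""
    values = np.asarray(values)
    # categories: sorted distinct values; codes: index of each entry's category
    categories, codes = np.unique(values, return_inverse=True)
    # one float indicator column per distinct category
    dummies = np.eye(len(categories))[codes.ravel()]
    # map column number -> category name, in the spirit of dictnames
    names = {i: cat for i, cat in enumerate(categories.tolist())}
    return dummies, names


dummies, names = dummy_sketch(["yes", "no", "yes", "no", "yes"])
print(dummies.shape)  # (5, 2)
print(names)          # {0: 'no', 1: 'yes'}
```

Each row of `dummies` contains exactly one 1.0, in the column of that row's category.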
Notes
This returns a dummy variable for EVERY distinct value of the categorical variable. If a structured array or recarray is provided, the name of each new variable is the old variable name, an underscore, and the category name. So if a variable 'vote' had answers 'yes' or 'no', the returned array would have two new variables: 'vote_yes' and 'vote_no'. There is currently no name checking.
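The naming rule can be sketched in a few lines of plain NumPy. The structured array below is made-up illustration data, and the ordering of the names comes from `np.unique` in this sketch, not necessarily from the library.

```python
import numpy as np

# Hypothetical data: a structured array with one categorical field named 'vote'.
data = np.array([("yes",), ("no",), ("yes",)], dtype=[("vote", "U3")])

col = "vote"
categories = np.unique(data[col])
# New name = old variable name + underscore + category name.
new_names = [f"{col}_{cat}" for cat in categories]
print(new_names)  # ['vote_no', 'vote_yes']
```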
Examples
>>> import numpy as np
>>> import statsmodels.api as sm
Univariate examples
>>> import string
>>> # string.ascii_lowercase replaces string.lowercase, which no longer exists in Python 3
>>> string_var = [string.ascii_lowercase[0:5], string.ascii_lowercase[5:10],
...               string.ascii_lowercase[10:15], string.ascii_lowercase[15:20],
...               string.ascii_lowercase[20:25]]
>>> string_var *= 5
>>> string_var = np.asarray(sorted(string_var))
>>> design = sm.tools.categorical(string_var, drop=True)
Or for a numerical categorical variable
>>> instr = np.floor(np.arange(10, 60, step=2) / 10)
>>> design = sm.tools.categorical(instr, drop=True)
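To see what the categorical input looks like in the numerical example, the sketch below recomputes `instr` and counts its distinct values with plain NumPy. It only inspects the input and does not call statsmodels. With one dummy per distinct value, a 25 x 5 indicator matrix would be the expected result when the original column is dropped.

```python
import numpy as np

instr = np.floor(np.arange(10, 60, step=2) / 10)
# distinct category values and how often each one occurs
levels, counts = np.unique(instr, return_counts=True)
print(instr.shape)      # (25,)
print(levels.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(counts.tolist())  # [5, 5, 5, 5, 5]
```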
With a structured array
>>> num = np.random.randn(25, 2)
>>> struct_ar = np.zeros((25, 1), dtype=[('var1', 'f4'), ('var2', 'f4'),
...                                      ('instrument', 'f4'), ('str_instr', 'a5')])
>>> struct_ar['var1'] = num[:, 0][:, None]
>>> struct_ar['var2'] = num[:, 1][:, None]
>>> struct_ar['instrument'] = instr[:, None]
>>> struct_ar['str_instr'] = string_var[:, None]
>>> design = sm.tools.categorical(struct_ar, col='instrument', drop=True)
Or
>>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)