statsmodels.tools.tools.categorical

statsmodels.tools.tools.categorical(data, col=None, dictnames=False, drop=False)

Returns a dummy matrix given an array of categorical variables.

Parameters:

data : array

A structured array, recarray, or array. This can be either a 1d vector of the categorical variable or a 2d array in which the column holding the categorical variable is given by the col argument.

col : 'string', int, or None

If data is a structured array or a recarray, col can be a string that is the name of the column that contains the variable. For all arrays col can be an int that is the (zero-based) column index number. col can only be None for a 1d array. The default is None.

dictnames : bool, optional

If True, a dictionary mapping the column number to the categorical name is returned. This is useful for plain arrays, which carry no column names.

drop : bool

Whether or not to drop the original categorical variable from the returned matrix. The default is False.

Returns:

dummy_matrix, [dictnames, optional] :

A matrix of dummy (indicator/binary) float variables for the categorical data. If dictnames is True, then the dictionary is returned as well.
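To make the return value concrete, here is a minimal NumPy-only sketch of the idea for a 1d input. It is not the statsmodels implementation: the helper name `dummy_matrix_sketch` is hypothetical, and the column order (the sorted order produced by `np.unique`) is an assumption of this sketch.

```python
import numpy as np

def dummy_matrix_sketch(values):
    """Hypothetical helper: build a float indicator matrix for a 1d array."""
    values = np.asarray(values)
    # np.unique returns the distinct categories in sorted order
    categories = np.unique(values)
    # Compare every observation with every category: shape (n_obs, n_categories)
    dummies = (values[:, None] == categories).astype(float)
    # Column number -> category name, analogous to what dictnames describes
    names = {i: cat for i, cat in enumerate(categories.tolist())}
    return dummies, names

dummies, names = dummy_matrix_sketch(["a", "b", "a", "c"])
print(dummies)
print(names)
```

Each row of the result contains exactly one 1.0, in the column of that observation's category.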

Notes

This returns a dummy variable for EVERY distinct value of the categorical variable. If a structured array or recarray is provided, the name of each new variable is the old variable name, an underscore, and the category name. So if a variable 'vote' had answers 'yes' or 'no', the returned array would have two new variables: 'vote_yes' and 'vote_no'. There is currently no name checking.
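The naming rule can be illustrated without calling statsmodels at all. The snippet below is only a sketch of the convention described in this note; the data and the `field_name` variable are hypothetical, and the order of the names simply follows `np.unique`, which sorts the categories.

```python
import numpy as np

# Hypothetical data: a 'vote' field with answers 'yes' or 'no'
votes = np.array(["yes", "no", "yes", "yes"])
field_name = "vote"

# One new name per distinct category: old name, underscore, category name
new_names = [f"{field_name}_{cat}" for cat in np.unique(votes)]
print(new_names)  # ['vote_no', 'vote_yes']
```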

Examples

>>> import numpy as np
>>> import statsmodels.api as sm

Univariate examples

>>> import string
>>> string_var = [string.ascii_lowercase[0:5], string.ascii_lowercase[5:10],
...               string.ascii_lowercase[10:15], string.ascii_lowercase[15:20],
...               string.ascii_lowercase[20:25]]
>>> string_var *= 5
>>> string_var = np.asarray(sorted(string_var))
>>> design = sm.tools.categorical(string_var, drop=True)

Or for a numerical categorical variable

>>> instr = np.floor(np.arange(10,60, step=2)/10)
>>> design = sm.tools.categorical(instr, drop=True)
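For orientation, it can help to look at the numerical variable itself before reading the design matrix. The check below uses only NumPy (no statsmodels call) and shows that `instr` takes five distinct values with five observations each, so one indicator column per distinct value gives five columns.

```python
import numpy as np

# Same construction as in the example: 25 values, floored to 1.0 ... 5.0
instr = np.floor(np.arange(10, 60, step=2) / 10)

# Distinct levels and how often each one occurs
levels, counts = np.unique(instr, return_counts=True)
print(levels)  # [1. 2. 3. 4. 5.]
print(counts)  # [5 5 5 5 5]
```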

With a structured array

>>> num = np.random.randn(25,2)
>>> struct_ar = np.zeros((25,1), dtype=[('var1', 'f4'),('var2', 'f4'),
...                      ('instrument','f4'),('str_instr','S5')])
>>> struct_ar['var1'] = num[:,0][:,None]
>>> struct_ar['var2'] = num[:,1][:,None]
>>> struct_ar['instrument'] = instr[:,None]
>>> struct_ar['str_instr'] = string_var[:,None]
>>> design = sm.tools.categorical(struct_ar, col='instrument', drop=True)

>>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)

Links:

http://statsmodels.sourceforge.net/stable/generated/statsmodels.tools.tools.categorical.html

2025-01-10 15:47:30