pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)
Compute a simple cross-tabulation of two (or more) factors. By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
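A minimal sketch of the two modes described above. It uses a small hypothetical DataFrame; the column names `shop`, `item` and `qty` are made up for illustration. The first call counts co-occurrences. The second passes `values` together with `aggfunc="sum"`, so each cell holds a total instead of a count.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "shop": ["north", "north", "south", "south", "south"],
    "item": ["apple", "pear", "apple", "apple", "pear"],
    "qty": [3, 1, 4, 2, 5],
})

# Default mode: count how often each (shop, item) pair occurs
counts = pd.crosstab(df["shop"], df["item"])

# values + aggfunc: sum qty within each (shop, item) pair instead of counting
totals = pd.crosstab(df["shop"], df["item"], values=df["qty"], aggfunc="sum")

print(counts)
print(totals)
```

Passing `values` without `aggfunc` (or the reverse) is rejected, which matches the parameter descriptions that follow.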
Parameters: index : array-like, Series, or list of arrays/Series
Values to group by in the rows
columns : array-like, Series, or list of arrays/Series
Values to group by in the columns
values : array-like, optional
Array of values to aggregate according to the factors. Requires aggfunc to be specified.
aggfunc : function, optional
If specified, requires values to be specified as well.
rownames : sequence, default None
If passed, must match number of row arrays passed
colnames : sequence, default None
If passed, must match number of column arrays passed
margins : boolean, default False
Add row/column margins (subtotals)
dropna : boolean, default True
Do not include columns whose entries are all NaN
normalize : boolean, {'all', 'index', 'columns'}, or {0, 1}, default False
Normalize by dividing all values by the sum of values.
- If passed 'all' or True, will normalize over all values.
- If passed 'index', will normalize over each row.
- If passed 'columns', will normalize over each column.
- If margins is True, will also normalize margin values.
New in version 0.18.1.
Returns: crosstab : DataFrame
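The margins and normalize options can be sketched on a tiny hypothetical pair of factors (the Series `a` and `b` below are made up). The sketch shows where the "All" subtotals land and which axis each normalize mode divides by; it does not combine margins with normalize.

```python
import pandas as pd

# Hypothetical data: 5 observations of two factors
a = pd.Series(["x", "x", "y", "y", "y"], name="a")
b = pd.Series(["p", "q", "p", "p", "q"], name="b")

# margins=True appends an "All" row and an "All" column holding subtotals
with_totals = pd.crosstab(a, b, margins=True)

# normalize="index": each row is divided by its row total
row_share = pd.crosstab(a, b, normalize="index")

# normalize="columns": each column is divided by its column total
col_share = pd.crosstab(a, b, normalize="columns")

# normalize="all" (equivalent to True): every cell is divided by the grand total
overall = pd.crosstab(a, b, normalize="all")

print(with_totals)
print(row_share)
```

Here the raw counts are x/p = 1, x/q = 1, y/p = 2, y/q = 1, so the grand total is 5 and, for example, the y/p cell becomes 2/3 by row, 2/3 by column and 0.4 overall.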
Notes
Any Series passed will have its name attribute used, unless row or column names for the cross-tabulation are specified.
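A short sketch of how Series names interact with rownames and colnames. The Series and names below are hypothetical; the point is only which label ends up on each axis of the result.

```python
import pandas as pd

# Hypothetical named Series
sex = pd.Series(["f", "m", "f"], name="gender")
smoker = pd.Series(["yes", "no", "no"], name="smoker")

# No rownames/colnames given: the Series names label the axes
named = pd.crosstab(sex, smoker)

# Explicit rownames/colnames take precedence over the Series names
renamed = pd.crosstab(sex, smoker, rownames=["sex"], colnames=["smokes"])

print(named)
print(renamed)
```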
Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.
If the indexes of the Series passed do not overlap, an empty DataFrame is returned.
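A minimal sketch of the index alignment behind the last note, using hypothetical Series. Series inputs are matched on their index labels, so only shared labels are counted; with no shared labels nothing remains and the result is empty.

```python
import pandas as pd

# Hypothetical Series whose indexes share only the labels 1 and 2
left = pd.Series(["a", "b", "b"], index=[0, 1, 2])
right = pd.Series(["x", "y", "x"], index=[1, 2, 3])

# Rows are paired by index label: label 1 -> (b, x), label 2 -> (b, y)
partial = pd.crosstab(left, right)

# Indexes {1, 2, 3} and {4, 5, 6} are disjoint, so nothing is counted
disjoint = pd.crosstab(
    pd.Series([1, 2, 3], index=[1, 2, 3]),
    pd.Series([4, 5, 6], index=[4, 5, 6]),
)

print(partial)
print(disjoint.empty)
```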
Examples
>>> a
array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
>>> b
array(['one', 'one', 'one', 'two', 'one', 'one',
       'one', 'two', 'two', 'two', 'one'], dtype=object)
>>> c
array(['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
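The example above displays `a`, `b` and `c` without showing how they were built. A runnable sketch, assuming they are plain NumPy object arrays holding the values shown, could look like this; passing the list `[b, c]` as `columns` is what produces the two-level column header.

```python
import numpy as np
import pandas as pd

# Rebuild the three factor arrays displayed in the example above
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
              "shiny", "dull", "shiny", "shiny", "shiny"], dtype=object)

# A list of arrays for columns yields a MultiIndex on the columns (b, then c)
table = pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"])

print(table)
```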
>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> # 'c' and 'f' are not represented in the data,
>>> # but they will still be counted in the output
>>> pd.crosstab(foo, bar)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0
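A self-contained sketch of the Categorical example. It passes `dropna=False` explicitly; this is a hedge under the assumption that some later pandas releases drop unobserved categories when `dropna=True`, while `dropna=False` keeps every category as rows and columns.

```python
import pandas as pd

foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])

# Assumption: dropna=False keeps unobserved categories ('c' and 'f')
# across pandas versions, so they appear as all-zero rows/columns
table = pd.crosstab(foo, bar, dropna=False)

print(table)
```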

2025-01-10 15:47:30