DataFrameGroupBy.describe()

DataFrameGroupBy.describe(percentiles=None, include=None, exclude=None)

Generate various summary statistics, excluding NaN values.

Parameters:

percentiles : array-like, optional

The percentiles to include in the output. Should all be in the interval [0, 1]. By default percentiles is [.25, .5, .75], returning the 25th, 50th, and 75th percentiles.
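As a minimal sketch of the percentiles argument, the snippet below asks for the 10th and 90th percentiles instead of the default quartiles. The DataFrame, its column names (team, score), and the values are hypothetical sample data. The layout of the result depends on the pandas version: this page documents 0.19.2, while recent releases place the statistics in the columns.

```python
import pandas as pd

# Hypothetical sample data: two groups with three scores each.
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b", "b"],
    "score": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Request the 10th and 90th percentiles instead of the default quartiles.
# Values must lie in the interval [0, 1].
summary = df.groupby("team").describe(percentiles=[0.1, 0.9])
print(summary)
```

The percentile labels in the output are formatted as percentages, so 0.1 appears as 10% and 0.9 as 90%.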

include, exclude : list-like, 'all', or None (default)

Specify the form of the returned result. Either:

None for both (default). The result will include only numeric-typed columns or, if there are none, only categorical columns.
A list of dtypes or strings to be included/excluded. To select all numeric types, use numpy.number. To select categorical objects, use type object. See also the select_dtypes documentation, e.g. df.describe(include=['O']).
If include is the string 'all', the output column set will match the input one.
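The options above might be exercised as in the following sketch, which compares include=[numpy.number] with include='all' on a grouped frame. The data and the column names (team, score, label) are hypothetical, and the exact shape of the output varies between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one numeric and one string column.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, 2.0, 3.0, 4.0],
    "label": ["x", "y", "x", "x"],
})

grouped = df.groupby("team")

# Only numeric columns are summarized.
numeric_only = grouped.describe(include=[np.number])

# Every non-grouping column is summarized; statistics that do not
# apply to a column are filled with NaN.
everything = grouped.describe(include="all")

print(numeric_only)
print(everything)
```

With include='all', the summary for the string column carries entries such as unique, top, and freq, while the numeric column carries mean, std, and the percentiles.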

Returns:

summary: NDFrame of summary statistics

Notes

The output DataFrame index depends on the requested dtypes:

For numeric dtypes, it will include: count, mean, std, min, max, and the lower, 50th, and upper percentiles (by default the 25th, 50th, and 75th).

For object dtypes (e.g. timestamps or strings), the index will include count, unique, top (the most common value), and freq (the frequency of the most common value). Timestamps also include the first and last items.

For mixed dtypes, the index will be the union of the corresponding output types. Non-applicable entries will be filled with NaN. Note that mixed-dtype outputs can only be returned from mixed-dtype inputs and appropriate use of the include/exclude arguments.
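To illustrate how the index depends on the dtypes, here is a small sketch that uses a plain DataFrame rather than a grouped one, purely to keep the output compact. The column names (value, name) and the data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one numeric column and one string column.
df = pd.DataFrame({
    "value": [1.0, 2.0, 3.0, 4.0],
    "name": ["x", "y", "x", "z"],
})

# Default: only the numeric column is described.
numeric = df.describe()

# include="all": the index is the union of the numeric and object
# statistics, with NaN where a statistic does not apply.
mixed = df.describe(include="all")

print(list(numeric.index))
print(mixed)
```

In the mixed result, the mean row is NaN for the string column and the unique row is NaN for the numeric column.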

If multiple values are tied for the highest count, the reported most common value and its count will be chosen arbitrarily from among those tied values.

The include and exclude arguments are ignored for Series.

Links:

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.core.groupby.DataFrameGroupBy.describe.html

2025-01-10 15:47:30