Comparison with R / R libraries

Since pandas aims to provide a lot of the data manipulation and analysis functionality that people use R for, this page was started to provide a more detailed look at the R language and its many third-party libraries as they relate to pandas. In comparisons with R and CRAN libraries, we care about the following things: Functionality / flexibility: what can/cannot be done with each tool. Performance: how fast are operations; hard numbers/benchmarks are preferable. Ease-of-use: Is one tool eas

Caveats and Gotchas

Using If/Truth Statements with pandas: pandas follows the numpy convention of raising an error when you try to convert something to a bool. This happens in an if statement or when using the boolean operations and, or, and not. It is not clear what the result of >>> if pd.Series([False, True, False]): ... should be. Should it be True because it's not zero-length? False because there are False values? It is unclear, so instead, pandas raises a ValueError: >>> if pd.Series([False, Tr
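A minimal sketch of the behaviour described above, together with the usual ways of stating the intended question explicitly. The variable names are illustrative only; `.any()`, `.all()`, and `.empty` are existing pandas accessors.

```python
import pandas as pd

s = pd.Series([False, True, False])

# Truth-testing the Series itself is ambiguous, so pandas raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

# State the intended question explicitly instead.
has_any_true = bool(s.any())  # is at least one value True?
all_true = bool(s.all())      # is every value True?
is_empty = s.empty            # is the Series zero-length?
```

Which of the three checks is appropriate depends on what the original `if` was meant to ask.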

CategoricalIndex

class pandas.CategoricalIndex Immutable Index implementing an ordered, sliceable set. CategoricalIndex represents a sparsely populated Index with an underlying Categorical. New in version 0.16.1. Parameters: data : array-like or Categorical (1-dimensional). categories : array-like, optional. Categories for the CategoricalIndex. ordered : boolean. Designates whether the categories are ordered. copy : bool. Make a copy of the input ndarray. name : object. Name to be stored in the index
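As a rough illustration of the parameters listed above, the following sketch builds a small ordered CategoricalIndex. The labels, category order, and index name are made up for the example.

```python
import pandas as pd

# Hypothetical labels; "categories" fixes the allowed values and their order.
ci = pd.CategoricalIndex(
    ["low", "high", "low", "mid"],
    categories=["low", "mid", "high"],
    ordered=True,
    name="level",
)

categories = list(ci.categories)   # the declared categories, in order
codes = [int(c) for c in ci.codes] # integer position of each label in categories
```

Each entry is stored as an integer code pointing into `categories`, which is why the same label can repeat cheaply.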

CategoricalIndex.where()

CategoricalIndex.where(cond, other=None) New in version 0.19.0. Return an Index of the same shape as self whose entries are taken from self where cond is True and from other otherwise. Parameters: cond : boolean array-like with the same length as self. other : scalar or array-like
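A small sketch of `where` on a CategoricalIndex. It assumes the replacement value is one of the existing categories; how values outside the categories are handled may differ between pandas versions, so that case is not shown.

```python
import numpy as np
import pandas as pd

ci = pd.CategoricalIndex(["a", "b", "c"], categories=["a", "b", "c"])

# Keep entries where cond is True; elsewhere substitute "a",
# which is already a valid category.
cond = np.array([True, False, True])
replaced = ci.where(cond, "a")
```

The result has the same length as the original index; only the positions where `cond` is False change.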

CategoricalIndex.view()

CategoricalIndex.view(cls=None)

CategoricalIndex.value_counts()

CategoricalIndex.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True) Return an object containing counts of unique values. The resulting object is in descending order, so the first element is the most frequently occurring one. Excludes NA values by default. Parameters: normalize : boolean, default False. If True, the returned object contains the relative frequencies of the unique values. sort : boolean, default True. Sort by values
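A brief sketch of the default and normalized counts, using made-up labels:

```python
import pandas as pd

ci = pd.CategoricalIndex(["a", "b", "a", "c", "a", "b"])

counts = ci.value_counts()               # absolute counts, most frequent first
freqs = ci.value_counts(normalize=True)  # relative frequencies instead of counts
```

With the defaults, the first row of `counts` corresponds to the most frequent label.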

CategoricalIndex.values

CategoricalIndex.values Return the underlying data, which is a Categorical.

CategoricalIndex.unique()

CategoricalIndex.unique() Return Index of unique values in the object. Significantly faster than numpy.unique. Includes NA values. The order of the original is preserved. Returns: uniques : Index
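To illustrate the order-preserving behaviour noted above, here is a short sketch with illustrative labels:

```python
import pandas as pd

ci = pd.CategoricalIndex(["b", "a", "b", "c", "a"])

# Unique values are returned in order of first appearance, not sorted.
uniques = ci.unique()
unique_labels = list(uniques)
```

This differs from `numpy.unique`, which returns its result sorted.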

CategoricalIndex.union()

CategoricalIndex.union(other) Form the union of two Index objects, sorting the result if possible. Parameters: other : Index or array-like Returns: union : Index Examples
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

CategoricalIndex.transpose()

CategoricalIndex.transpose(*args, **kwargs) Return the transpose, which is by definition self.