Working with Text Data

Series and Index are equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically. These are accessed via the str attribute and generally have names matching the equivalent (scalar) built-in string methods:
    In [1]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
    In [2]: s.str.lower()
    Out[2]:
    0       a
    1       b
    2       c
    3    aaba
    4    ba
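The element-wise behaviour described above can be sketched as follows. This is a minimal, self-contained version of the excerpt's example; the variable name `lowered` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same data as in the excerpt; np.nan marks the missing entry.
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# Element-wise lower-casing through the .str accessor.
lowered = s.str.lower()

# The missing entry is passed through as missing instead of raising an error.
print(lowered.tolist())
```

The missing value stays missing in the result, so no explicit NA check is needed before calling the string method.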

Working with missing data

In this section, we will discuss missing (also referred to as NA) values in pandas.
Note: The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.
See the cookbook for some advanced strategies.
Missing data basics
When / why does
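As a small illustration of NaN acting as the missing-value marker, the sketch below builds a float Series with one missing entry. The data and variable names are made up for the example and do not come from the excerpt.

```python
import numpy as np
import pandas as pd

# A float Series with one missing entry, represented internally as NaN.
s = pd.Series([1.0, np.nan, 3.0])

# Boolean mask that is True where a value is missing.
mask = s.isna()

# Reductions skip missing values by default (skipna=True).
total = s.sum()

# Replace missing entries with an explicit value.
filled = s.fillna(0.0)

print(mask.tolist(), total, filled.tolist())
```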

Window.sum()

Window.sum(*args, **kwargs)
window sum
Parameters:
    how : string, default None (DEPRECATED)
        Method for down- or re-sampling
Returns:
    same type as input
See also: pandas.Series.window, pandas.DataFrame.window

Window.mean()

Window.mean(*args, **kwargs)
window mean
Parameters:
    how : string, default None (DEPRECATED)
        Method for down- or re-sampling
Returns:
    same type as input
See also: pandas.Series.window, pandas.DataFrame.window

Visualization

We use the standard convention for referencing the matplotlib API:
    In [1]: import matplotlib.pyplot as plt
The plots in this document are made using matplotlib's ggplot style (new in version 1.4):
    import matplotlib
    matplotlib.style.use('ggplot')
We provide the basics in pandas to easily create decent looking plots. See the ecosystem section for visualization libraries that go beyond the basics documented here.
Note: All calls to np.random are seeded with 123456.
Basic Plotting: plot
See t

Tutorials

This is a guide to many pandas tutorials, geared mainly for new users.
Internal Guides
pandas' own 10 Minutes to pandas
More complex recipes are in the Cookbook
pandas Cookbook
The goal of this cookbook (by Julia Evans) is to give you some concrete examples for getting started with pandas. These are examples with real-world data, and all the bugs and weirdness that that entails. Here are links to the v0.1 release. For an up-to-date table of contents, see the pandas-cookbook GitHub repository

TimedeltaIndex

class pandas.TimedeltaIndex
Immutable ndarray of timedelta64 data, represented internally as int64, and which can be boxed to timedelta objects
Parameters:
    data : array-like (1-dimensional), optional
        Optional timedelta-like data to construct index with
    unit : unit of the arg (D,h,m,s,ms,us,ns) denote the unit, optional
        which is an integer/float number
    freq : a frequency for the index, optional
    copy : bool
        Make a copy of input ndarray
    start : starting value, timedelta-like, opt
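The description above (immutable, int64-backed, elements boxed to timedelta objects) can be illustrated with a short sketch. The input strings and variable names are chosen for the example only; the resolution of the underlying integers may depend on the pandas version, so no specific integer values are shown.

```python
import pandas as pd

# Build an index from timedelta-like strings.
tdi = pd.TimedeltaIndex(['1 days', '2 days 12:00:00'])

# Individual elements are boxed to pandas Timedelta objects.
first = tdi[0]

# The underlying storage is exposed as an int64 array.
raw = tdi.asi8

# The index is immutable: item assignment raises TypeError.
try:
    tdi[0] = pd.Timedelta(0)
    mutated = True
except TypeError:
    mutated = False

print(first, raw.dtype, mutated)
```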

TimedeltaIndex.where()

TimedeltaIndex.where(cond, other=None)
New in version 0.19.0.
Return an Index of the same shape as self whose entries are taken from self where cond is True and from other elsewhere.
Parameters:
    cond : boolean array-like, same length as self
    other : scalar, or array-like
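A minimal sketch of `where` on a TimedeltaIndex, assuming a boolean NumPy array as `cond`. The data and the names `kept` and `replaced` are illustrative only.

```python
import numpy as np
import pandas as pd

tdi = pd.TimedeltaIndex(['1 days', '2 days', '3 days'])
cond = np.array([True, False, True])

# With no replacement given, entries where cond is False become NaT.
kept = tdi.where(cond)

# With a scalar replacement, entries where cond is False take that value.
replaced = tdi.where(cond, pd.Timedelta(0))

print(kept)
print(replaced)
```

The result has the same length as the original index in both cases; only the entries where `cond` is False change.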

TimedeltaIndex.view()

TimedeltaIndex.view(cls=None)

TimedeltaIndex.value_counts()

TimedeltaIndex.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
Returns object containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
Parameters:
    normalize : boolean, default False
        If True then the object returned will contain the relative frequencies of the unique values.
    sort : boolean, default True
        Sort by values a
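The counting behaviour described above can be sketched as follows, using a small index with one repeated value and one missing value. The data and variable names are invented for the example.

```python
import pandas as pd

# One duplicated value and one missing value (NaT).
tdi = pd.TimedeltaIndex(['1 days', '1 days', '2 days', pd.NaT])

# Default: counts in descending order, NaT excluded.
counts = tdi.value_counts()

# Relative frequencies instead of raw counts.
rel = tdi.value_counts(normalize=True)

# Keep the missing value as its own bucket.
with_na = tdi.value_counts(dropna=False)

print(counts)
print(rel)
print(with_na)
```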