Series.str.extract()

Series.str.extract(pat, flags=0, expand=None)

For each subject string in the Series, extract groups from the first match of regular expression pat.

New in version 0.13.0.

Parameters:

pat : string
    Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE.
    New in version 0.18.0.
expand : bool, default False
    If True, return DataFrame. If False, return Series/Index/DataFrame.

Returns:

DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
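
The return types described above can be checked with a short sketch. It assumes a recent pandas imported as pd; the name idx is illustrative and does not come from the original page.

```python
import pandas as pd

# An Index is the subject here instead of a Series.
idx = pd.Index(['a1', 'b2', 'c3'])

# One capture group with expand=False: the result keeps the subject's type.
as_index = idx.str.extract(r'[ab](\d)', expand=False)

# The same pattern with expand=True: the result is a one-column DataFrame.
as_frame = idx.str.extract(r'[ab](\d)', expand=True)

print(type(as_index).__name__)
print(type(as_frame).__name__)
```

Passing expand explicitly avoids depending on the default, which may differ between pandas versions.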

A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.

>>> from pandas import Series
>>> s = Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
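
Only the first match in each subject string is used, as the description at the top states. A minimal sketch of that behavior, assuming a recent pandas imported as pd (the name repeated is illustrative):

```python
import pandas as pd

# 'a1a2' contains two possible matches of the pattern: 'a1' and 'a2'.
repeated = pd.Series(['a1a2', 'b2', 'c3'])

# extract() reports groups from the first match only, so '2' in 'a1a2' is ignored.
first = repeated.str.extract(r'[ab](\d)', expand=False)

print(first)
```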

A pattern may contain optional groups.

>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3

Named groups will become column names in the result.

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN

A pattern with one group will return a DataFrame with one column if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
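
The flags parameter is not exercised by the examples above. The following sketch shows one way it might be used, assuming a recent pandas imported as pd; the sample data and the names mixed, strict and relaxed are illustrative.

```python
import re

import pandas as pd

# The second entry uses an uppercase letter.
mixed = pd.Series(['a1', 'B2', 'c3'])

# Without flags, [ab] is case-sensitive, so 'B2' does not match.
strict = mixed.str.extract(r'([ab])(\d)', expand=True)

# With re.IGNORECASE, 'B2' matches and the captured letter keeps its case.
relaxed = mixed.str.extract(r'([ab])(\d)', flags=re.IGNORECASE, expand=True)

print(strict)
print(relaxed)
```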

2025-01-10 15:47:30