Series.str.extract(pat, flags=0, expand=None)
For each subject string in the Series, extract groups from the first match of regular expression pat.
New in version 0.13.0.
Parameters:
pat : string
    Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE.
expand : bool, default False
    New in version 0.18.0.
    - If True, return DataFrame.
    - If False, return Series/Index/DataFrame.
Returns: DataFrame with one row for each subject string, and one column for
each group. Any capture group names in regular expression pat will
be used for column names; otherwise capture group numbers will be
used. The dtype of each result column is always object, even when
no match is found. If expand=False and pat has only one capture group,
then return a Series (if subject is a Series) or Index (if subject
is an Index).
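The return-type rules above can be checked with a short sketch. The variable names (`from_series`, `from_index`, `as_frame`, `two_groups`) are illustrative only, and `expand` is passed explicitly because its default may differ between pandas versions.

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])
idx = pd.Index(['a1', 'b2', 'c3'])

# One capture group with expand=False: the result mirrors the subject's type.
from_series = s.str.extract(r'[ab](\d)', expand=False)
from_index = idx.str.extract(r'[ab](\d)', expand=False)

# One capture group with expand=True: a one-column DataFrame.
as_frame = s.str.extract(r'[ab](\d)', expand=True)

# Two capture groups on a Series: a DataFrame even with expand=False.
two_groups = s.str.extract(r'([ab])(\d)', expand=False)

for result in (from_series, from_index, as_frame, two_groups):
    print(type(result).__name__)
```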
See also
extractall : returns all matches (not just the first match)
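As a rough illustration of the difference, the sketch below runs both methods on a made-up Series in which one string contains two matches. The sample data and variable names are assumptions for demonstration.

```python
import pandas as pd

# 'a1a2' contains two matches of the pattern, 'b1' one, 'c1' none.
s = pd.Series(['a1a2', 'b1', 'c1'])

# extract keeps one row per subject string and uses only the first match.
first_only = s.str.extract(r'[ab](\d)', expand=True)

# extractall returns one row per match, indexed by the original label
# and a second index level named 'match'; non-matching strings get no row.
all_matches = s.str.extractall(r'[ab](\d)')

print(first_only)
print(all_matches)
```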
Examples
A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.
>>> from pandas import Series
>>> s = Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
A pattern may contain optional groups.
>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
Named groups will become column names in the result.
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN
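Named columns can then be selected like any other DataFrame column. The following is a minimal sketch; the conversion with `pd.to_numeric` is an addition for illustration, motivated by the fact that extracted groups are returned as strings rather than numbers.

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])
parts = s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)', expand=True)

# Select a group by the name given in the pattern.
letters = parts['letter']

# Extracted values are strings; convert explicitly when numbers are needed.
# The non-matching row stays missing after conversion.
digits = pd.to_numeric(parts['digit'])

print(list(parts.columns))
print(digits)
```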
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
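The flags parameter accepts re module flags. The sketch below, using a made-up Series with one uppercase entry, compares a case-sensitive extraction with one that passes re.IGNORECASE. The variable names are illustrative.

```python
import re

import pandas as pd

s = pd.Series(['A1', 'b2', 'c3'])

# Without flags, the uppercase 'A' does not match [ab].
case_sensitive = s.str.extract(r'([ab])(\d)', expand=True)

# With re.IGNORECASE, 'A1' matches as well; 'c3' still does not.
ignore_case = s.str.extract(r'([ab])(\d)', flags=re.IGNORECASE, expand=True)

print(case_sensitive)
print(ignore_case)
```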
2025-01-10 15:47:30