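【Posted】: 2020-06-17 15:48:27
【Question】:
I am trying to extract names from Twitter text using a regular expression. However, even though the pattern is present in the text, the returned value is None, which should not be the case. I cannot see where my code goes wrong. I am using JupyterLab.
The sample text is a pd.Series named full_text: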
0 RT @SeamusHughes: The Taliban Stamp of approva...
1 RT @WFaqiri: Taliban and Afghan groups find co...
2 RT @DavidCornDC: Imagine what Fox News would h...
3 RT @DavidCornDC: Imagine what Fox News would h...
4 RT @billroggio: Even if you are inclined to tr...
5 RT @billroggio: I am sure we will hear the arg...
6 RT @KFILE: This did happen and it went exactly...
Name: full_text, dtype: object
My function is defined as follows:
import re

def extract_user(text):
    m = re.search(r"RT\s@\w+:", text)
    return m  # a re.Match object, or None when nothing matches
I then apply the function as follows:
full_text.apply(extract_user)
But the result I get is:
0 None
1 None
2 None
3 None
4 None
...
21299 None
21300 None
21301 None
21302 None
21303 None
Name: full_text, Length: 21304, dtype: object
【Comments】:
-
re.search returns a match object, not the matched string.
-
The syntax is almost the same with pandas: full_text.str.match(r"RT\s@\w+:")
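The two comments above can be combined into a minimal sketch. It is not a confirmed diagnosis of the None results in the question; it only shows how to get the handle as a string rather than a match object. The sample tweets are hypothetical stand-ins for the truncated rows in the question, and the capture group `(\w+)` is an addition that is not in the original pattern. Note that `Series.str.match` returns booleans, so `Series.str.extract` is used here to return the captured text.

```python
import re
import pandas as pd

# Hypothetical sample data modeled on the truncated tweets in the question.
full_text = pd.Series(
    [
        "RT @SeamusHughes: The Taliban Stamp of approval",
        "RT @KFILE: This did happen",
        "A tweet that is not a retweet",
    ],
    name="full_text",
)

# Same pattern as in the question, plus a capture group around the handle (assumption).
RT_PATTERN = re.compile(r"RT\s@(\w+):")


def extract_user(text):
    """Return the retweeted handle as a string, or None when there is no match."""
    m = RT_PATTERN.search(text)
    return m.group(1) if m else None


# Option 1: plain Python function applied element by element.
users_apply = full_text.apply(extract_user)

# Option 2: vectorized pandas equivalent; expand=False yields a Series
# because the pattern has exactly one capture group.
users_extract = full_text.str.extract(r"RT\s@(\w+):", expand=False)

print(users_apply)
print(users_extract)
```

Rows without a match come back as None from `apply` and as NaN from `str.extract`, so either result can be filtered with `dropna()` if only retweets are of interest.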
Tags: python regex pandas series