[Posted]: 2019-12-21 05:58:30
[Question]:
Background
I have the following df
import pandas as pd
df = pd.DataFrame({'Text' : ['\n[SPORTS FAN]\nHere',
'Nothing here',
'\n[BASEBALL]\nTHIS SOUNDS right',
'\n[SPORTS FAN]\nLikes sports',
'Nothing is here',
'\n[NOT SPORTS]\nTHIS SOUNDS good',
'\n[SPORTS FAN]\nReally Big big fan',
'\n[BASEBALL]\nRARELY IS a fan'
],
'P_ID': [1,2,3,4,5,6,7,8],
'P_Name' : ['J J SMITH',
'J J SMITH',
'J J SMITH',
'J J SMITH',
'MARY HYDER',
'MARY HYDER',
'MARY HYDER',
'MARY HYDER']
})
Output
P_ID P_Name Text
0 1 J J SMITH \n[SPORTS FAN]\nHere
1 2 J J SMITH Nothing here
2 3 J J SMITH \n[BASEBALL]\nTHIS SOUNDS right
3 4 J J SMITH \n[SPORTS FAN]\nLikes sports
4 5 MARY HYDER Nothing is here
5 6 MARY HYDER \n[NOT SPORTS]\nTHIS SOUNDS good
6 7 MARY HYDER \n[SPORTS FAN]\nReally Big big fan
7 8 MARY HYDER \n[BASEBALL]\nRARELY IS a fan
Goal
Keep only the rows whose Text starts with \n[SPORTS FAN]\n or \n[BASEBALL]\n
Desired output
P_ID P_Name Text
0 1 J J SMITH \n[SPORTS FAN]\nHere
2 3 J J SMITH \n[BASEBALL]\nTHIS SOUNDS right
3 4 J J SMITH \n[SPORTS FAN]\nLikes sports
6 7 MARY HYDER \n[SPORTS FAN]\nReally Big big fan
7 8 MARY HYDER \n[BASEBALL]\nRARELY IS a fan
Question
How can I get my desired output?
[Discussion]:
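A minimal sketch, assuming the goal is a literal prefix match at the start of `Text`. `Series.str.match` applies the regex with `re.match`, so the pattern must occur at the very beginning of each string; rows tagged `\n[NOT SPORTS]\n` and rows with no tag are dropped. The pattern is the same one suggested in the discussion; only the method differs.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Text': ['\n[SPORTS FAN]\nHere',
             'Nothing here',
             '\n[BASEBALL]\nTHIS SOUNDS right',
             '\n[SPORTS FAN]\nLikes sports',
             'Nothing is here',
             '\n[NOT SPORTS]\nTHIS SOUNDS good',
             '\n[SPORTS FAN]\nReally Big big fan',
             '\n[BASEBALL]\nRARELY IS a fan'],
    'P_ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'P_Name': ['J J SMITH'] * 4 + ['MARY HYDER'] * 4,
})

# (?:...) is a non-capturing group; \[ and \] match literal brackets.
pattern = r'\n\[(?:SPORTS FAN|BASEBALL)\]\n'

# str.match is anchored at the start of each string (like re.match).
mask = df['Text'].str.match(pattern)

# Keep only the rows where the mask is True; original index labels are kept.
result = df.loc[mask]
print(result)
```

If a 0..n-1 index is preferred, `result.reset_index(drop=True)` renumbers the rows.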
-
df.loc[df['Text'].str.contains(r'\n\[(?:SPORTS FAN|BASEBALL)\]\n')] -
Perfect. This works.
-
I assume
df.loc[] iterates over the df rows and returns a new df? -
@sin I think it is used like this: pandas.pydata.org/pandas-docs/stable/reference/api/…
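On the question about `df.loc[]`, a short sketch of what happens, using a small hypothetical frame (the third row is invented to show how `str.contains` differs from `str.match`). `str.contains` first builds a boolean Series with the same index as `df`; `df.loc[mask]` then returns a new DataFrame holding only the rows labelled `True`, keeps their original index labels, and leaves `df` itself unchanged. Note that `str.contains` finds the pattern anywhere in the string, whereas the goal asks for rows that *start* with the tag.

```python
import pandas as pd

# Hypothetical data: the third row has the tag, but not at the start.
df = pd.DataFrame({
    'Text': ['\n[SPORTS FAN]\nHere',
             'Nothing here',
             'Intro\n[BASEBALL]\nnot at start'],
    'P_ID': [1, 2, 3],
})

pattern = r'\n\[(?:SPORTS FAN|BASEBALL)\]\n'

# Step 1: a boolean Series aligned with df.index (pattern found anywhere).
mask = df['Text'].str.contains(pattern)
print(mask.tolist())  # [True, False, True]

# Step 2: boolean indexing selects the rows labelled True into a new frame.
filtered = df.loc[mask]
print(filtered.index.tolist())  # [0, 2]

# The original DataFrame is not modified.
print(len(df))  # 3

# Anchored alternative: only rows that begin with the tag are kept.
anchored = df.loc[df['Text'].str.match(pattern)]
print(anchored.index.tolist())  # [0]
```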
Tags: python regex string pandas text