Mapping a keyword to a dataframe column using pandas in Python: answers

【Question title】: Mapping keyword with a dataframe column using pandas in python
【Posted】: 2018-03-15 12:34:42
【Question description】:

I have a dataframe,

DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list,

my_list=["one"]

I tried mask=df["Description"].str.contains('|'.join(my_list),na=False)

but it gives,

output_DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player

My desired output is,
desired_DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player

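It has to take the Stage column into account: I want all the rows that belong to a matched description, including the later stages that follow it.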
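For reference, a minimal sketch that rebuilds the example frame and reproduces the behaviour described in the question. The constructor call is an assumption about how the data is stored (in particular, the blank Name in the Stage 2 row is taken to be an empty string); the mask is the one from the question.

```python
import pandas as pd

# Assumed reconstruction of DF; the blank Name is modelled as an empty string.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})

my_list = ["one"]

# str.contains interprets the joined string as a regular expression,
# so "a|b" means "contains a or b".
mask = df["Description"].str.contains("|".join(my_list), na=False)
output_df = df[mask]
print(output_df)
```

Only the rows whose own Description contains the keyword survive, which is why the Stage 2 row is dropped.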

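【Question comments】:

What does print (df.index) show?
No, mask=df["Description"].str.contains('|'.join(my_list),na=False) works fine, but I also want to select the rows that follow, until the stage ends or the stage is 1 again.
My PC hung; I am restarting and will answer, give me some time.
Is it possible to use the Name column to match the data?
No, see Stage = 2: the Name column is empty there.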
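The requirement in the comments ("until the stage ends or the stage is 1 again") can be read as: every row with Stage 1 starts a new block, and a block is kept whole if any of its rows matches. Below is a minimal sketch of that reading. It assumes the continuation rows directly follow their Stage 1 row, and the names `block`, `hit` and `desired_df` are illustrative, not from the thread.

```python
import pandas as pd

# Assumed reconstruction of the example frame from the question.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})
my_list = ["one"]

mask = df["Description"].str.contains("|".join(my_list), na=False)

# A new block starts at every Stage == 1 row; cumsum turns the starts into block ids.
block = df["Stage"].eq(1).cumsum()

# Mark every row of a block as soon as one row in that block matches.
hit = mask.groupby(block).transform("any")

desired_df = df[hit]
print(desired_df)
```

This variant does not rely on the Name column at all, so it should also behave sensibly if two different blocks happen to share a name.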

Tags: python regex pandas dataframe data-analysis

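【Solution 1】:

It seems to find "one" in the Description field of the dataframe and return the descriptions that match.

If you want the third row as well, you have to add a list element for the second match,

e.g. 'Thanks', as in my_list=["one", "Thanks"]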
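A small sketch of this suggestion, assuming the example frame from the question. The `re.escape` call is an addition of mine, not part of the answer; it only guards against keywords that contain regex metacharacters.

```python
import re
import pandas as pd

# Assumed reconstruction of the example frame from the question.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})

my_list = ["one", "Thanks"]

# Build the alternation "one|Thanks"; escaping is an extra precaution (assumption).
pattern = "|".join(re.escape(word) for word in my_list)
mask = df["Description"].str.contains(pattern, na=False)
result = df[mask]
print(result)
```

Note that this matches "Thanks" in any row, whatever its Stage, so it only gives the desired output when the extra keyword is chosen by hand for this data.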

【Comments】:

【Solution 2】:

I think you need:

print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1              2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

#replace empty or whitespace-only names with the previous value
df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()
print (df)
     Name  Stage                                Description
0     Sri      1  Sri is one of the good singer in this two
1     Sri      2                         Thanks for reading
2     Ram      1      Ram is one of the good cricket player
3  ganesh      1                                good driver

#get all names by condition
my_list = ["one"]
names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']
print (names)
0    Sri
2    Ram
Name: Name, dtype: object

#select all rows whose Name is in names
df = df[df['Name'].isin(names)]
print (df)
  Name  Stage                                Description
0  Sri      1  Sri is one of the good singer in this two
1  Sri      2                         Thanks for reading
2  Ram      1      Ram is one of the good cricket player

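【Comments】:

If we have my_list = ["Thanks"], it gives me the "Thanks for reading" row. But I don't want to match when Stage is not 1. Is there a way?
I want to match my_list against df["Description"] only when Stage is 1. If we find a match, we take all the stages of that particular description.
Yes, I think df["Description"].str.contains("|".join(my_list),na=False) needs to become df["Description"].str.contains("|".join(my_list),na=False) & (df['Stage'] == 1)
Do you mean replacing this: names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']? If so, I get only the columns and no values.
Yes, Jezrael, it works fine. I double-checked. Thank you for your answer.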
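Putting the comments together, here is a sketch of Solution 2 with the match restricted to Stage 1 rows. The function name `select_by_stage_one_match` and the frame construction are assumptions for illustration; the steps inside follow the answer and the `& (df['Stage'] == 1)` condition from the comments.

```python
import pandas as pd

def select_by_stage_one_match(df, keywords):
    """Keep all rows of every Name whose Stage 1 description matches a keyword."""
    out = df.copy()
    # Blank or whitespace-only names take the previous non-blank name.
    out["Name"] = out["Name"].mask(out["Name"].str.strip() == "").ffill()
    # Only Stage 1 rows are allowed to trigger a match.
    matched = (
        out["Description"].str.contains("|".join(keywords), na=False)
        & (out["Stage"] == 1)
    )
    names = out.loc[matched, "Name"]
    return out[out["Name"].isin(names)]

# Assumed reconstruction of the example frame from the question.
df = pd.DataFrame({
    "Name": ["Sri", "", "Ram", "ganesh"],
    "Stage": [1, 2, 1, 1],
    "Description": [
        "Sri is one of the good singer in this two",
        "Thanks for reading",
        "Ram is one of the good cricket player",
        "good driver",
    ],
})

with_one = select_by_stage_one_match(df, ["one"])
with_thanks = select_by_stage_one_match(df, ["Thanks"])
print(with_one)
print(with_thanks)
```

With ["one"] all three rows for Sri and Ram come back; with ["Thanks"] the result is empty, because the only matching description sits in a Stage 2 row.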