【Question title】: Mapping keyword with a dataframe column using pandas in python
【Posted】: 2018-03-15 12:34:42
【Question】:

I have a dataframe,

DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player
ganesh  1       good driver

and a list,

my_list=["one"]

I tried mask=df["Description"].str.contains('|'.join(my_list), na=False)

but it gives,

output_DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
Ram     1       Ram is one of the good cricket player
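This is a quick check of why only two rows come back. It is a minimal sketch that rebuilds the sample data by hand, and it assumes the empty Name of the Stage 2 row is an empty string. `str.contains` tests each row's own Description independently. The "Thanks for reading" row does not contain "one", so its mask value is False, even though it belongs to Sri's entry.

```python
import pandas as pd

# Sample data from the question (empty Name assumed to be '')
df = pd.DataFrame({
    'Name': ['Sri', '', 'Ram', 'ganesh'],
    'Stage': [1, 2, 1, 1],
    'Description': ['Sri is one of the good singer in this two',
                    'Thanks for reading',
                    'Ram is one of the good cricket player',
                    'good driver'],
})

my_list = ["one"]
# The mask is evaluated row by row; no information flows between rows
mask = df["Description"].str.contains('|'.join(my_list), na=False)
print(mask.tolist())  # [True, False, True, False]
print(df[mask])
```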

My desired output is,
desired_DF,
Name    Stage   Description
Sri     1       Sri is one of the good singer in this two
        2       Thanks for reading
Ram     1       Ram is one of the good cricket player

It must take the Stage column into account: I want all the rows that belong to a matching description.

【Question discussion】:

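  • What does print (df.index) return?
  • No, mask=df["Description"].str.contains('|'.join(my_list), na=False) works fine, but I also want to select the other row(s), until the stages end or Stage is 1 again.
  • My computer froze; I'm restarting and will answer, give me some time.
  • Could the Name column be used to match the data?
  • No, see Stage = 2: the Name column is empty there.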
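The rule described in this discussion can be sketched without relying on the Name column at all. Under that rule, a group runs from a Stage 1 row until the next Stage 1 row. The technique shown here is a cumulative sum over `Stage == 1`, which gives each group an id. It is not taken from the answers below. It assumes the rows are already ordered so that each Stage 1 row is directly followed by its Stage 2, 3, … rows. It also tests keywords only on the Stage 1 row, as the asker requests later in the thread.

```python
import pandas as pd

# Sample data from the question (empty Name assumed to be '')
df = pd.DataFrame({
    'Name': ['Sri', '', 'Ram', 'ganesh'],
    'Stage': [1, 2, 1, 1],
    'Description': ['Sri is one of the good singer in this two',
                    'Thanks for reading',
                    'Ram is one of the good cricket player',
                    'good driver'],
})
my_list = ["one"]

# Every Stage 1 row starts a new group; assumes continuation rows follow their Stage 1 row
group_id = (df['Stage'] == 1).cumsum()

# Only Stage 1 rows are checked against the keywords
hit = (df['Description'].str.contains('|'.join(my_list), na=False)
       & (df['Stage'] == 1))

# Keep every row of a group whose Stage 1 row matched
keep = hit.groupby(group_id).transform('any')
result = df[keep]
print(result)
```

Because the group id comes from Stage alone, the empty Name cells do not matter. Two separate entries that share a name also stay separate.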

Tags: python regex pandas dataframe data-analysis


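【Solution 1】:

It seems to find "one" in the Description field of the dataframe and return the matching descriptions.

If you also want the "Thanks for reading" row, you have to add a second list element that matches it,

e.g. 'Thanks', as in my_list=["one", "Thanks"]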
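Here is a minimal sketch of this suggestion, rebuilding the sample data by hand. It assumes the empty Name is an empty string. The Stage 2 row is selected only because its own text contains "Thanks". The list still matches keywords row by row, so "Thanks" would also select any Stage 1 row that happens to contain it.

```python
import pandas as pd

# Sample data from the question (empty Name assumed to be '')
df = pd.DataFrame({
    'Name': ['Sri', '', 'Ram', 'ganesh'],
    'Stage': [1, 2, 1, 1],
    'Description': ['Sri is one of the good singer in this two',
                    'Thanks for reading',
                    'Ram is one of the good cricket player',
                    'good driver'],
})

# A second keyword makes the Stage 2 row match on its own text
my_list = ["one", "Thanks"]
mask = df["Description"].str.contains('|'.join(my_list), na=False)
print(df[mask])
```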

【Discussion】:

    【Solution 2】:

    I think you need:

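    import pandas as pd

    # sample data from the question (the Stage 2 row has an empty Name)
    df = pd.DataFrame({'Name': ['Sri', '', 'Ram', 'ganesh'],
                       'Stage': [1, 2, 1, 1],
                       'Description': ['Sri is one of the good singer in this two',
                                       'Thanks for reading',
                                       'Ram is one of the good cricket player',
                                       'good driver']})
    print (df)
    #      Name  Stage                                Description
    # 0     Sri      1  Sri is one of the good singer in this two
    # 1              2                         Thanks for reading
    # 2     Ram      1      Ram is one of the good cricket player
    # 3  ganesh      1                                good driver

    # replace empty or whitespace-only names with NaN, then forward-fill the previous name
    df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()
    print (df)
    #      Name  Stage                                Description
    # 0     Sri      1  Sri is one of the good singer in this two
    # 1     Sri      2                         Thanks for reading
    # 2     Ram      1      Ram is one of the good cricket player
    # 3  ganesh      1                                good driver

    # get the names of all rows whose Description matches a keyword
    my_list = ["one"]
    names = df.loc[df["Description"].str.contains("|".join(my_list), na=False), 'Name']
    print (names)
    # 0    Sri
    # 2    Ram
    # Name: Name, dtype: object

    # select all rows with those names (including their Stage 2+ rows)
    df = df[df['Name'].isin(names)]
    print (df)
    #   Name  Stage                                Description
    # 0  Sri      1  Sri is one of the good singer in this two
    # 1  Sri      2                         Thanks for reading
    # 2  Ram      1      Ram is one of the good cricket player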
    

    【Discussion】:

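    • If we have my_list = ["Thanks"], it gives me the "Thanks for reading" row. But I don't want to match when Stage is not 1. Is there a way?
    • I only want to match my_list against df["Description"] when Stage is 1. If we find a match, we get all the stages of that particular description.
    • Yes, I think df["Description"].str.contains("|".join(my_list),na=False) needs to become df["Description"].str.contains("|".join(my_list),na=False) & (df['Stage'] == 1)
    • You mean replacing it in names=df.loc[df["Description"].str.contains("|".join(my_list),na=False), 'Name']? If I do that, I only get the column with no values.
    • Yes, Jezrael, it works fine. I double-checked. Thanks for your answer.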
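This sketch combines the answer's forward-fill approach with the `& (df['Stage'] == 1)` condition from this discussion. `rows_for_keywords` is a hypothetical helper name used only for illustration, and the sample data is rebuilt by hand with the empty Name assumed to be ''. An empty result for ["Thanks"] is the intended behaviour: that keyword only appears in a Stage 2 row, which is exactly the case the asker wanted to exclude.

```python
import pandas as pd

# Sample data from the question (empty Name assumed to be '')
df = pd.DataFrame({
    'Name': ['Sri', '', 'Ram', 'ganesh'],
    'Stage': [1, 2, 1, 1],
    'Description': ['Sri is one of the good singer in this two',
                    'Thanks for reading',
                    'Ram is one of the good cricket player',
                    'good driver'],
})
# Forward-fill the empty Name of continuation rows, as in the answer
df['Name'] = df['Name'].mask(df['Name'].str.strip() == '').ffill()

def rows_for_keywords(df, keywords):
    """Hypothetical helper: return all rows whose Name has a Stage 1
    Description containing any of the keywords."""
    cond = (df["Description"].str.contains("|".join(keywords), na=False)
            & (df['Stage'] == 1))
    names = df.loc[cond, 'Name']
    return df[df['Name'].isin(names)]

print(rows_for_keywords(df, ["one"]))     # rows 0, 1 and 2
print(rows_for_keywords(df, ["Thanks"]))  # empty: "Thanks" only appears at Stage 2
```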