【Question title】: How Do I Create New Pandas Column Based On Word In A List
【Posted】: 2021-04-26 14:51:41
【Question】:

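So I have a list and a DataFrame. I want to take each word from the list and make it a column header. If the word appears in a row, it should be added to that row of the newly created column. If it is not in the row, leave the cell blank or NA. Should I use iloc?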

import pandas as pd
wordlist = [['this is sentence 1'],['this is sentence 2'],['this is not a sentence'],['ok who is this']]
query=['is','not']
df = pd.DataFrame(wordlist, columns = ['Name'])

for word in query:
    if word in df['Name']:
        df[word] = word
df


Desired output

Name                       is     not  <<column titles
0   this is sentence 1     is     NA
1   this is sentence 2     is     NA
2   this is not a sentence is     not
3   ok who is this         is     NA

【Comments】:

    Tags: python-3.x pandas dataframe if-statement


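    【Solution 1】:

    Create a search pattern, then use Series.str.extractall to get the words. Then turn each unique word into a dummy, aggregate back to the original row index, and concat the result onto the original DataFrame.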

    import pandas as pd
    
    pat = f'({"|".join(query)})'
    #(is|not)
    
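    # dtype=int keeps the 1/0 output; groupby(level=0).max() replaces the removed .max(level=0)
    df_dummies = pd.get_dummies(df['Name'].str.extractall(pat)[0], dtype=int).groupby(level=0).max()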
    
    df = pd.concat([df, df_dummies], axis=1)
    
    #                     Name  is  not
    #0      this is sentence 1   1    0
    #1      this is sentence 2   1    0
    #2  this is not a sentence   1    1
    #3          ok who is this   1    0
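
One caveat with this approach: extractall returns nothing for rows that contain no query word, so those rows are missing from the dummy frame and show up as NaN after the concat. A minimal sketch of one way to handle that, assuming you want 0 instead of NaN; the sample sentences and the names `matches`, `dummies`, and `out` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the middle row contains neither query word.
df = pd.DataFrame({'Name': ['this is sentence 1', 'hello world', 'why not']})
query = ['is', 'not']
pat = f'({"|".join(query)})'

# One row per match, indexed by (original row, match number).
matches = df['Name'].str.extractall(pat)[0]

dummies = (
    pd.get_dummies(matches, dtype=int)
    .groupby(level=0).max()            # collapse back to one row per original row
    .reindex(df.index, fill_value=0)   # restore rows that had no match at all
)

out = pd.concat([df, dummies], axis=1)
print(out)
```

Reindexing before the concat keeps the dummy columns as integers; filling after the concat would work too, but the columns would pass through float first.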
    

    If you really want the word itself repeated instead of a dummy, we can multiply the dummy DataFrame by its column names.

    import numpy as np

    df_dummies = pd.get_dummies(df['Name'].str.extractall(pat)[0], dtype=int).groupby(level=0).max()
    df_dummies = df_dummies.mul(df_dummies.columns).replace('', np.nan)
    df = pd.concat([df, df_dummies], axis=1)
    
    #                     Name  is  not
    #0      this is sentence 1  is  NaN
    #1      this is sentence 2  is  NaN
    #2  this is not a sentence  is  not
    #3          ok who is this  is  NaN
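
For comparison, here is a sketch of a per-word loop that stays closer to the code in the question. The original `if word in df['Name']` does not look inside the strings: `in` on a Series tests the index labels, so the condition is never true for these words. The version below swaps in an element-wise test with Series.str.contains; the variable name `mask` is only for illustration, and this is one possible fix rather than the approach used in the answer above.

```python
import pandas as pd

wordlist = [['this is sentence 1'], ['this is sentence 2'],
            ['this is not a sentence'], ['ok who is this']]
query = ['is', 'not']
df = pd.DataFrame(wordlist, columns=['Name'])

for word in query:
    # Element-wise substring test; regex=False treats the word literally.
    mask = df['Name'].str.contains(word, regex=False)
    # True becomes the word; False has no mapping, so it becomes NaN.
    df[word] = mask.map({True: word})

print(df)
```

Like the basic pattern above, this is a plain substring test, so 'is' also matches inside 'this'.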
    

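    One last note: the word 'this' itself contains a match for 'is', so the basic pattern above matches both the standalone word 'is' and the last two characters of 'this'. If you want to exclude matches that are part of a longer word, modify the search pattern to put word boundaries around each element of the query: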

    pat = '(\\b' + '\\b|\\b'.join(query) + '\\b)'
    #'(\\bis\\b|\\bnot\\b)'
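
To see the difference the word boundaries make, here is a small sketch comparing the two patterns. The sample Series `names` is made up, and wrapping each word in re.escape is an added precaution (not part of the answer above) in case a query word contains regex metacharacters.

```python
import re
import pandas as pd

query = ['is', 'not']
names = pd.Series(['this is sentence 1', 'ok who is this', 'this that'])

# Plain alternation: also matches 'is' inside 'this'.
plain = f'({"|".join(query)})'
# Word-bounded alternation, with each word escaped for safety.
bounded = r'\b(' + '|'.join(re.escape(w) for w in query) + r')\b'

plain_hits = names.str.findall(plain).tolist()
bounded_hits = names.str.findall(bounded).tolist()

print(plain_hits)    # every row has at least one hit, because of 'this'
print(bounded_hits)  # only standalone words count
```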
    

    【Comments】:

    • Thanks. The repeated words are so that I can melt, but the dummies let me do other things I hadn't thought of before.