Counting the occurrence of words in a dataframe column using a list of strings: answers

【Question title】: Counting the occurrence of words in a dataframe column using a list of strings
【Posted】: 2021-12-10 21:36:58
【Question description】:

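I have a list of strings and a dataframe with a text column. The text column contains several rows of text. I want to count how many times each word in the string list occurs in the text column. My goal is to add two columns to the dataframe: one containing the word and one containing the number of occurrences. I'm open to better solutions, and it would be great to learn different ways of doing this. Ideally, I'd end up with a single dataframe.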

string_list = ['had', 'it', 'the']

Current dataframe:

Dataframe in code:

import pandas as pd

df = pd.DataFrame({'title': {0: 'book1', 1: 'book2', 2: 'book3', 3: 'book4', 4: 'book5'},
 'text': {0: 'His voice had never sounded so cold',
  1: 'When she arrived home, she noticed that the curtains were closed.',
  2: 'He was terrified of small spaces and she knew',
  3: "It was time. She'd fought against it for so long",
  4: 'As he took in the view from the twentieth floor, the lights went out all over the city'},
 'had': {0: 1, 1: 5, 2: 5, 3: 2, 4: 5},
 'it': {0: 1, 1: 3, 2: 2, 3: 1, 4: 2},
 'the': {0: 1, 1: 4, 2: 5, 3: 3, 4: 3}})

Trying to get a dataframe like this:

【Question comments】:

Is the "dataframe in code" the DF you want to get?

Tags: python pandas dataframe text

【Solution 1】:

A function that counts the matches of a given pattern in a string:

import re

def find_match_count(word: str, pattern: str) -> int:
    return len(re.findall(pattern, word.lower()))  # lowercase the text, then count every match

Then iterate over each string in the list and apply this function to the 'text' column:

for col in string_list:
    df[col] = df['text'].apply(find_match_count, pattern=col)

Result when using the dataframe you provided (without the had, it, and the columns):

   title                                               text  had  it  the
0  book1                His voice had never sounded so cold    1   0    0
1  book2  When she arrived home, she noticed that the cu...    0   0    1
2  book3      He was terrified of small spaces and she knew    0   0    0
3  book4   It was time. She'd fought against it for so long    0   2    0
4  book5  As he took in the view from the twentieth floo...    0   1    4
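
Note that the pattern is matched anywhere in the text, so the 'it' counted for book5 comes from inside "city". If only whole words should count, one possible variant (a sketch, not part of the original answer) is to wrap each word in `\b` word boundaries and use `Series.str.count`. The use of `re.escape` and `re.IGNORECASE` here is an assumption about what the asker wants:

```python
import re

import pandas as pd

string_list = ['had', 'it', 'the']
df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': [
        'His voice had never sounded so cold',
        'When she arrived home, she noticed that the curtains were closed.',
        'He was terrified of small spaces and she knew',
        "It was time. She'd fought against it for so long",
        'As he took in the view from the twentieth floor, '
        'the lights went out all over the city',
    ],
})

for word in string_list:
    # \b anchors the match to word boundaries; re.escape guards against
    # regex metacharacters in the search word.
    pattern = rf'\b{re.escape(word)}\b'
    # Count whole-word matches per row, ignoring case.
    df[word] = df['text'].str.count(pattern, flags=re.IGNORECASE)

print(df[['title'] + string_list])
```

With this variant, "city" no longer contributes to the 'it' count for book5, while book4 still counts both "It" and "it".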

【Comments】:

【Solution 2】:

Build a custom regex with one named group per word, then use extractall, join, and melt:

regex = '|'.join(fr'(?P<{w}>\b{w}\b)' for w in string_list)

(df[['title', 'text']]
 .join(df['text'].str.extractall(regex).notna().groupby(level=0).sum())
 .fillna(0)
 .melt(id_vars=['title', 'text'], var_name='word', value_name='word count')
 )

Output:

    title                                               text word  word count
0   book1                His voice had never sounded so cold  had         1.0
1   book2  When she arrived home, she noticed that the cu...  had         0.0
2   book3      He was terrified of small spaces and she knew  had         0.0
3   book4   It was time. She'd fought against it for so long  had         0.0
4   book5  As he took in the view from the twentieth floo...  had         0.0
5   book1                His voice had never sounded so cold   it         0.0
6   book2  When she arrived home, she noticed that the cu...   it         0.0
7   book3      He was terrified of small spaces and she knew   it         0.0
8   book4   It was time. She'd fought against it for so long   it         1.0
9   book5  As he took in the view from the twentieth floo...   it         0.0
10  book1                His voice had never sounded so cold  the         0.0
11  book2  When she arrived home, she noticed that the cu...  the         1.0
12  book3      He was terrified of small spaces and she knew  the         0.0
13  book4   It was time. She'd fought against it for so long  the         0.0
14  book5  As he took in the view from the twentieth floo...  the         4.0
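
The counts above come out as floats because rows with no match at all (book3) are missing from the `extractall` result and are only filled in after the join. Note also that the regex is case-sensitive, which is why the leading "It" in book4 is not counted. A minimal sketch of the same pipeline broken into steps, assuming integer counts are preferred; the `reindex` and `astype(int)` steps and the names `matches`, `counts`, and `long_df` are additions for illustration, not part of the original answer:

```python
import pandas as pd

string_list = ['had', 'it', 'the']
df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': [
        'His voice had never sounded so cold',
        'When she arrived home, she noticed that the curtains were closed.',
        'He was terrified of small spaces and she knew',
        "It was time. She'd fought against it for so long",
        'As he took in the view from the twentieth floor, '
        'the lights went out all over the city',
    ],
})

# One named group per word, so each word gets its own output column.
regex = '|'.join(fr'(?P<{w}>\b{w}\b)' for w in string_list)

# One row per match, indexed by (original row, match number); only the
# column of the word that matched is non-null.
matches = df['text'].str.extractall(regex)

# Count matches per original row. Rows without any match are absent here,
# so reindex restores them with 0 before the join.
counts = (matches.notna()
          .groupby(level=0).sum()
          .reindex(df.index, fill_value=0)
          .astype(int))

# Reshape from one column per word to one row per (title, word) pair.
long_df = (df.join(counts)
           .melt(id_vars=['title', 'text'], var_name='word', value_name='word count'))

print(long_df[['title', 'word', 'word count']])
```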

【Comments】: