How do I iterate through a DataFrame column to count the number of occurrences of a substring within a string? (Answers)

【Question title】: How do I iterate through a DataFrame column to count the number of occurrences of a substring within a string?
【Posted】: 2021-10-17 12:17:08
【Question description】:

I have a pandas DataFrame of scraped tweet data. It looks something like this:

created_at	full_tweet
2020-20-22	" All square in Austria. \n\n???? #UEL "
2020-10-22	" We're back underway in the @EuropaLeague ????\n\n... "
2020-10-22	" We're back underway in the @EuropaLeague ????\n\n... "
2020-10-22	" DAVID LEVELS IT UP! \n\n???????? 1-1 ???? (70) \n\n???? # "

I also have a second DataFrame that pairs each emoji, stored as UTF-8 text, with its meaning, like this:

emoji	meaning
ðŸ˜„	A_smiley1
ðŸ˜ƒ	A_smiley2
ðŸ˜€	A_smiley3
ðŸ˜Š	A_smiley4
â˜ºï¸	A_blush

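I'm a relatively new Python user and I'm not sure how to go about this, but I want to scan every row of the "full_tweet" column in the tweet DataFrame and count the occurrences of each emoji's text, ending up with a final count column. This is what I have tried so far: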

for ind in emojis:
    count = str(clubs_df.full_tweet[ind]).count(emojis.emoji)
    clubs_df['emoji_count'] = clubs_df.emoji_count[ind] + count

This raises a KeyError that simply lists 'emoji'. Does anyone have suggestions for how I can work through the rows of this DataFrame?
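
For context, a likely cause of the KeyError: iterating over a DataFrame with `for ind in emojis` yields its column labels ('emoji', 'meaning') rather than row positions, so `clubs_df.full_tweet[ind]` looks up the label 'emoji'. Below is a minimal sketch of a row-wise alternative. The sample rows are made up, and the ASCII strings ":)" and "<3" are stand-ins for the real emoji; only the frame and column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; ":)" and "<3" stand in for real emoji.
clubs_df = pd.DataFrame(
    {"full_tweet": ["All square :) :)", "Back underway <3", "No emoji here"]}
)
emojis = pd.DataFrame(
    {"emoji": [":)", "<3"], "meaning": ["A_smiley1", "A_heart"]}
)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = list(emojis)

def count_emojis(text):
    """Sum the plain-string occurrences of every emoji in one tweet."""
    return sum(str(text).count(e) for e in emojis["emoji"])

# Apply the counter to each tweet and store the totals in a new column.
clubs_df["emoji_count"] = clubs_df["full_tweet"].apply(count_emojis)
```

Here `str.count` is the built-in Python string method, so each emoji is matched literally and no regular-expression escaping is needed.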

【Comments】:

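Welcome to Stack Overflow. Please see the intro tour, on topic, and how to ask. "Show me how to solve this coding problem" is off-topic for Stack Overflow. You must make an honest attempt at the solution, and then ask a specific question about your implementation.
Try using pandas.Series.str.contains or pandas.Series.str.findall
@s-ellingso, what is the expected output?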
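
The `pandas.Series.str.findall` suggestion above could look roughly like the sketch below, which also uses the related `pandas.Series.str.count`. The data is invented for illustration and ":)" and "<3" are ASCII stand-ins for the real emoji; the frame and column names are taken from the question.

```python
import re

import pandas as pd

# Hypothetical sample data; ":)" and "<3" stand in for real emoji.
clubs_df = pd.DataFrame({"full_tweet": ["Goal :) :) <3", "Half time"]})
emojis = pd.DataFrame(
    {"emoji": [":)", "<3"], "meaning": ["A_smiley1", "A_heart"]}
)

# Escape each emoji so characters such as ")" are matched literally,
# then join them into one alternation pattern.
pattern = "|".join(re.escape(e) for e in emojis["emoji"])

tweets = clubs_df["full_tweet"].astype(str)

# str.findall returns the list of matches for each row.
matches = tweets.str.findall(pattern)

# str.count returns the number of matches for each row.
clubs_df["emoji_count"] = tweets.str.count(pattern)
```

Both methods treat the pattern as a regular expression, which is why the emoji are passed through `re.escape` first.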

Tags: python pandas string dataframe count

【Solution 1】:

I'm not sure whether this is what you are looking for, but this is what I understood from your post...

DataFrame:

print(df)
   created_at                                         full_tweet
0  2020-20-22             " All square in Austria. \n\n? #UEL ".
1  2020-10-22  " We're back underway in the @EuropaLeague ?\n...
2  2020-10-22  " We're back undway in the @EuropaLeague ?\n\n...
3  2020-10-22  " DAVID LEVELS IT UP! \n\n?? 1-1 ? (70) \n\n? # "

Try the following:

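import pandas as pd
import emojis  # assumption: the third-party "emojis" package, not the asker's DataFrame of the same name

# Count the emoji in each tweet
emoji_count = df['full_tweet'].apply(lambda x: emojis.count(str(x)))
# Append the counts as a new column (axis passed by keyword)
pd.concat([df, emoji_count.apply(pd.Series)], axis=1)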
   created_at                                         full_tweet  0
0  2020-20-22             " All square in Austria. \n\n? #UEL ".  1
1  2020-10-22  " We're back underway in the @EuropaLeague ?\n...  1
2  2020-10-22  " We're back undway in the @EuropaLeague ?\n\n...  1
3  2020-10-22  " DAVID LEVELS IT UP! \n\n?? 1-1 ? (70) \n\n? # "  3

【Comments】:

This is exactly what I'm trying to do, but when I run this code I get a ValueError: "ValueError: No axis named All square in Austria. ? #UEL for object type DataFrame".
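
A possible explanation for this error, assuming `emojis` in the commenter's session is still the emoji lookup DataFrame from the question rather than a module: `emojis.count(...)` then resolves to `DataFrame.count`, whose first parameter is `axis`, so the tweet text is rejected as an axis name. The sketch below reproduces that with made-up data and then builds a per-emoji breakdown using only pandas, which avoids the name clash. As before, ":)" and "<3" are ASCII stand-ins for the real emoji.

```python
import pandas as pd

# Hypothetical sample data; ":)" and "<3" stand in for real emoji.
emojis = pd.DataFrame(
    {"emoji": [":)", "<3"], "meaning": ["A_smiley1", "A_heart"]}
)
df = pd.DataFrame({"full_tweet": ["Level :) <3 :)", "Kick-off"]})

# DataFrame.count interprets its first argument as an axis,
# so passing tweet text raises a ValueError.
try:
    emojis.count("Level :) <3 :)")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Build one count column per emoji, named after its meaning.
per_emoji = pd.DataFrame(
    {
        meaning: df["full_tweet"].map(lambda text, e=emoji: str(text).count(e))
        for emoji, meaning in zip(emojis["emoji"], emojis["meaning"])
    }
)

# Attach the per-emoji counts and a row total to the tweets.
result = pd.concat([df, per_emoji], axis=1)
result["emoji_count"] = per_emoji.sum(axis=1)
```

Renaming the lookup DataFrame (for example to `emoji_df`, a hypothetical name) would be another way to keep it from shadowing an imported module called `emojis`.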