【Question title】: How do I iterate through a DataFrame column to count the number of occurrences of a substring within a string?
【Posted】: 2021-10-17 12:17:08
【Question】:

I have a pandas DataFrame of scraped tweet information. It looks something like this:

created_at full_tweet
2020-20-22 " All square in Austria. \n\n???? #UEL "
2020-10-22 " We're back underway in the @EuropaLeague ????\n\n... "
2020-10-22 " We're back underway in the @EuropaLeague ????\n\n... "
2020-10-22 " DAVID LEVELS IT UP! \n\n???????? 1-1 ???? (70) \n\n???? # "

I also have a second DataFrame that holds the UTF-8 text of each emoji, like this:

emoji meaning
😄 A_smiley1
😃 A_smiley2
😀 A_smiley3
😊 A_smiley4
☺️ A_blush

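I'm a relatively new Python user and I'm not sure how to approach this, but I'd like to scan every row of the "full_tweet" column in the tweet DataFrame, count the occurrences of each emoji's text, and end up with a final count column. Here is what I've tried so far: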

for ind in emojis:
    count = str(clubs_df.full_tweet[ind]).count(emojis.emoji)
    clubs_df['emoji_count'] = clubs_df.emoji_count[ind] + count

This raises a KeyError that simply says 'emoji'. Does anyone have suggestions for how I can iterate through the rows of this DataFrame?
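The KeyError most likely comes from the loop header: iterating directly over a DataFrame (`for ind in emojis`) yields its column labels ('emoji', 'meaning'), not its rows, so the first pass evaluates `clubs_df.full_tweet['emoji']`, and there is no row labelled 'emoji'. Below is a minimal loop-based sketch, assuming the DataFrames are named `clubs_df` and `emojis` with the columns shown above. The sample tweets are made up, and `count_emojis` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up sample data mirroring the two tables in the question.
clubs_df = pd.DataFrame({
    "created_at": ["2020-10-22", "2020-10-22"],
    "full_tweet": [
        "All square in Austria \U0001F604 #UEL",            # one U+1F604
        "LEVELS IT UP \U0001F604\U0001F604 1-1 \U0001F60A",  # two U+1F604, one U+1F60A
    ],
})
emojis = pd.DataFrame({
    "emoji": ["\U0001F604", "\U0001F603", "\U0001F600", "\U0001F60A"],
    "meaning": ["A_smiley1", "A_smiley2", "A_smiley3", "A_smiley4"],
})


def count_emojis(tweet, emoji_list):
    """Hypothetical helper: total occurrences of every emoji string in one tweet."""
    # str.count counts non-overlapping occurrences of one substring.
    return sum(str(tweet).count(e) for e in emoji_list)


# Iterate over the *values* of the emoji column, not over the DataFrame itself.
emoji_list = emojis["emoji"].tolist()
clubs_df["emoji_count"] = clubs_df["full_tweet"].apply(count_emojis, args=(emoji_list,))
print(clubs_df["emoji_count"].tolist())  # [1, 3]
```

`str.count` matches exact code-point sequences, so an entry such as "☺️" (which carries a trailing variation selector) will not match a bare "☺" in a tweet, and vice versa.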

【Question discussion】:

Tags: python pandas string dataframe count


【Answer 1】:

I'm not sure whether this is what you're looking for, but here is what I understood from your post...

DataFrame:

print(df)
   created_at                                         full_tweet
0  2020-20-22             " All square in Austria. \n\n? #UEL ".
1  2020-10-22  " We're back underway in the @EuropaLeague ?\n...
2  2020-10-22  " We're back underway in the @EuropaLeague ?\n...
3  2020-10-22  " DAVID LEVELS IT UP! \n\n?? 1-1 ? (70) \n\n? # "

Try the following:

emoji_count = df['full_tweet'].apply(lambda x : emojis.count(str(x)))
pd.concat([df, emoji_count.apply(pd.Series)], axis=1)
   created_at                                         full_tweet  0
0  2020-20-22             " All square in Austria. \n\n? #UEL ".  1
1  2020-10-22  " We're back underway in the @EuropaLeague ?\n...  1
2  2020-10-22  " We're back underway in the @EuropaLeague ?\n...  1
3  2020-10-22  " DAVID LEVELS IT UP! \n\n?? 1-1 ? (70) \n\n? # "  3

【Discussion】:

  • This is definitely what I'm trying to do, but when I run this code I get a ValueError: "ValueError: No axis named All square in Austria. ? #UEL for object type DataFrame".
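The ValueError reported in this comment arises because `emojis` is a DataFrame, and `DataFrame.count()` takes `axis` as its first parameter, so `emojis.count(str(x))` tries to use the tweet text as an axis name. (If `emojis` were a plain list, `list.count(tweet)` would instead count list items equal to the entire tweet, which is not the goal either.) A vectorized alternative is to join the escaped emoji strings into one regex alternation and let `Series.str.count` count the matches in each row. This is a sketch under the assumption that `emojis` has an `emoji` column as in the question; the sample tweets are made up.

```python
import re

import pandas as pd

# Made-up sample data; in practice df is the scraped tweet DataFrame.
df = pd.DataFrame({
    "created_at": ["2020-10-22", "2020-10-22", "2020-10-22"],
    "full_tweet": [
        "All square in Austria \U0001F604 #UEL",                # one U+1F604
        "LEVELS IT UP \U0001F600\U0001F600 1-1 \u263A\uFE0F",  # two U+1F600, one U+263A+FE0F
        "No emoji here",
    ],
})
emojis = pd.DataFrame({
    "emoji": ["\U0001F604", "\U0001F603", "\U0001F600", "\U0001F60A", "\u263A\uFE0F"],
    "meaning": ["A_smiley1", "A_smiley2", "A_smiley3", "A_smiley4", "A_blush"],
})

# Longest strings first, so a multi-code-point emoji is matched whole
# before any shorter entry that happens to be its prefix.
ordered = sorted(emojis["emoji"], key=len, reverse=True)
# re.escape guards against emoji entries containing regex metacharacters.
pattern = "|".join(re.escape(e) for e in ordered)

# Series.str.count counts non-overlapping regex matches in each row.
df["emoji_count"] = df["full_tweet"].astype(str).str.count(pattern)
print(df["emoji_count"].tolist())  # [1, 3, 0]
```

Building a single pattern keeps the work inside pandas' string methods instead of a Python-level loop over every (tweet, emoji) pair.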