【Question title】: Iterating through Dataframe and counting the words for specific values
【Posted】: 2022-01-23 14:16:42
【Question description】:

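I have a dataframe with 2 columns: [Phrase] [Category], so each phrase has a specific category. What I am trying to do is iterate through the dataframe and count all the words for a specific category. For example, say the category is news. I want to find all phrases with the news category and count the total number of words used.

I hope someone can help me. I am using Python and Pandas.

Thanks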

【Comments】:

  • Please provide a minimal reproducible example. The question is unclear.

Tags: python pandas dataframe loops count


【Solution 1】:

You can do it like this:

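import pandas as pd

df = pd.DataFrame({
    "Phrases": ["Hello, how are you!", "I am Good!", "Do you want to come over?"],
    "Category": ["Question", "Answer", "Question"]
})

# Group the phrases by category: {category: [phrase, ...]}
phrases_by_category = {}
for phrase, category in zip(df["Phrases"], df["Category"]):
    try:
        phrases_by_category[category].append(phrase)
    except KeyError:
        # First time this category is seen
        phrases_by_category[category] = [phrase]
print(phrases_by_category)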

Output:

{'Question': ['Hello, how are you!', 'Do you want to come over?'], 'Answer': ['I am Good!']}
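
The dictionary above groups the phrases by category but does not yet count words. A minimal sketch of the counting step, assuming a "word" is any whitespace-separated token (so punctuation stays attached to its word) and reusing the same sample data; the name `word_counts` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Phrases": ["Hello, how are you!", "I am Good!", "Do you want to come over?"],
    "Category": ["Question", "Answer", "Question"],
})

# Assumption: words are whitespace-separated tokens.
word_counts = {}
for phrase, category in zip(df["Phrases"], df["Category"]):
    # Add this phrase's token count to the running total for its category
    word_counts[category] = word_counts.get(category, 0) + len(phrase.split())

print(word_counts)  # {'Question': 10, 'Answer': 3}
```

Looking up a single category is then just `word_counts["Question"]`.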

【Discussion】:

    【Solution 2】:

    I believe you can use the groupby function. For example:

    out = df.groupby('category').count()
    

    A complete example:

    import pandas as pd
    df = pd.DataFrame({'phrase': ["basketball", "football", "tennis", "bread", "honey", "nbc", "cnn", "fox", "bloomberg"],
                      'category': ["sports", "sports", "sports", "food", "food", "news", "news", "news", "news"]})
    
    
    out = df.groupby('category').count()
    
    print(out)
    

    Output:

              phrase
    category        
    food           2
    news           4
    sports         3
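
Note that `count()` counts rows (phrases) per category; in the example above every phrase is a single word, so the two numbers happen to coincide. For multi-word phrases, one possible sketch is to count the words in each phrase first and then sum per category. The sample data and the `n_words` column name below are made up for illustration, and words are assumed to be whitespace-separated:

```python
import pandas as pd

# Hypothetical sample data with multi-word phrases
df = pd.DataFrame({
    "phrase": ["breaking news tonight", "markets fall again",
               "fresh bread", "home team wins"],
    "category": ["news", "news", "food", "sports"],
})

# Number of whitespace-separated tokens in each phrase
df["n_words"] = df["phrase"].str.split().str.len()

# Total number of words per category
words_per_category = df.groupby("category")["n_words"].sum()
print(words_per_category)

# Total for one specific category, e.g. "news" (3 + 3 = 6 here)
news_total = words_per_category["news"]
print(news_total)
```

If only one category is needed, filtering first with `df[df["category"] == "news"]` and summing the word counts of that subset should give the same total.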
    

    【Discussion】:
