【Question title】:How to get the top "n" most frequently used words from a list?
【Posted】:2021-01-17 14:56:18
【Question】:

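I have two lists, and each list contains words. Some words appear in both lists and some don't. I only want to output the 20 most frequent common words, but my code prints every common word. I want to limit the output to 20. I'm not allowed to use Counter.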

def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


# finallist1 and finallist2 are the two word lists (built earlier, not shown).
count1 = countwords(finallist1)
count2 = countwords(finallist2)

words1 = set(count1.keys())
words2 = set(count2.keys())

common_words = words1.intersection(words2)
# Prints every common word, in arbitrary (set) order: nothing sorts or limits the output.
for i, w in enumerate(common_words, 1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")

Expected output:

    common   f1  f2  sum
1   program   5  10   15
2   python    2   4    6
...
(up to row 20)

【Question discussion】:

    Tags: python python-3.x list


    【Solution 1】:

    You can use the .most_common() method of collections.Counter:

    >>> from collections import Counter
    >>> word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]
    
    >>> Counter(word_list).most_common(2)
    [('four', 4), ('three', 3)]
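    Applied to the question's two lists, a minimal sketch could look like this. The sample lists are hypothetical stand-ins for finallist1 and finallist2. The sketch keeps only words present in both lists, ranks them by combined count, and lets most_common(20) do the limiting. Iterating over sorted(common) is an added choice so that ties come out alphabetically instead of in arbitrary set order.

```python
from collections import Counter

# Hypothetical sample lists standing in for the question's finallist1 / finallist2.
finallist1 = ["program", "python", "program", "code", "program"]
finallist2 = ["program", "python", "data", "program", "python"]

count1 = Counter(finallist1)
count2 = Counter(finallist2)

# Words present in both lists (dict key views support set operations).
common = count1.keys() & count2.keys()

# Combined frequency for the common words only; inserting in sorted order
# means equal totals keep alphabetical order in most_common().
combined = Counter({w: count1[w] + count2[w] for w in sorted(common)})
top = combined.most_common(20)

for i, (w, total) in enumerate(top, 1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{total}")
```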
    

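    From the Counter.most_common() documentation:

    Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered.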
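    The tie-ordering rule in that last sentence can be checked with a small, made-up word list (illustrative only):

```python
from collections import Counter

# "b" and "a" both occur twice; "b" is encountered first, so it is listed first.
words = ["b", "a", "b", "a", "c"]
print(Counter(words).most_common())   # [('b', 2), ('a', 2), ('c', 1)]
print(Counter(words).most_common(1))  # [('b', 2)]
```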


    Here is an alternative that does the same thing without importing any module:

    # Step 1: Create Counter dictionary holding frequency. 
    #         Similar to: `collections.Counter()` 
    my_counter = {}
    for word in word_list:
        my_counter[word] = my_counter.get(word, 0) + 1
    
    # where `my_counter` will hold:
    # {'one': 1, 'two': 2, 'three': 3, 'four': 4}   (insertion order, Python 3.7+)
    #-------------
    
    # Step 2: Get sorted list holding word & frequency in descending order.
    #         Similar to: `Counter.most_common()`
    sorted_frequency = sorted(my_counter.items(), key=lambda x: x[1], reverse=True)
    
    # where `sorted_frequency` will hold:
    # [('four', 4), ('three', 3), ('two', 2), ('one', 1)]
    #-------------
    
    # Step 3: Get top two words by slicing the ordered list from Step 2.
    #         Similar to: `.most_common(2)`
    top_two = sorted_frequency[:2]
    
    # where `top_two` will hold:
    # [('four', 4), ('three', 3)]
    

    See the comments in the code snippet above for a step-by-step explanation.
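    Putting the pieces together for the original two-list question, here is one possible no-import sketch. It reuses the question's countwords() helper; top_common() and the sample lists are hypothetical names and data, chosen so that the result matches the expected output in the question. Sorting by the key (-total, word) instead of using reverse=True is an added design choice: the highest totals still come first, but ties are broken alphabetically, so the top 20 is deterministic even though the common words come from a set.

```python
def countwords(lst):
    """Return a dict mapping each word in lst to its frequency (no imports)."""
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


def top_common(list1, list2, n=20):
    """Return up to n (word, f1, f2, total) tuples for words found in both
    lists, ordered by total count descending; ties broken alphabetically."""
    count1 = countwords(list1)
    count2 = countwords(list2)
    common = count1.keys() & count2.keys()
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in common]
    # Highest total first, then word A-Z for equal totals.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows[:n]  # slicing limits the output to the top n


# Hypothetical sample data standing in for the question's finallist1 / finallist2.
finallist1 = ["program"] * 5 + ["python"] * 2 + ["java"]
finallist2 = ["program"] * 10 + ["python"] * 4 + ["rust"]

print("\tcommon\tf1\tf2\tsum")
for i, (w, f1, f2, total) in enumerate(top_common(finallist1, finallist2), 1):
    print(f"{i}\t{w}\t{f1}\t{f2}\t{total}")
```

    With the sample data this prints program (5, 10, 15) on row 1 and python (2, 4, 6) on row 2; java and rust are dropped because each appears in only one list.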

    【Discussion】:

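    • I'm not allowed to use Counter. Sorry.
    • @elomelo Check the edit: I added another approach that doesn't import any module.
    • @elomelo If this answer helped you, you can upvote it by clicking the up arrow on its left; that would earn me 10 more reputation points. You can also mark it as accepted (best answer) by clicking the check mark to the left of the answer.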