【Question title】: Plotting from Counter while maintaining order
【Posted】: 2017-03-04 05:55:39
【Question】:

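I am trying to plot the word frequencies of the top 50 words in an article I copied from Wikipedia. I looked at "How to plot the number of times each element is in a list", "Python: Frequency of occurrences", and "Using Counter() in Python to build histogram?". The last one seemed promising until I realized that its solution does not preserve the order of Counter(). Is there a way to keep the descending order of Counter() when plotting?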

The code I am using to process the data:

# Standard Library
import collections
from collections import Counter
import itertools 
import re

# Third Party Library
import matplotlib.pyplot as plt
import nltk
import numpy as np

file = '...\\NLP\\Word_Embedding\\Basketball.txt'
text = open(file, 'r').read()
text = re.sub(r'([\"\'.])([\)\[,.;])', r'\1 \2', text)

vocab = text.split()
vocab = [words.lower() for words in vocab]
print('There are a total of {} words in the corpus'.format(len(vocab)))
tokens = list(set(vocab))
print('There are {} unique words in the corpus'.format(len(tokens)))

vocab_labels, vocab_values = zip(*Counter(vocab).items())
vocab_freq = Counter(vocab)

indexes = np.arange(len(vocab_labels[:10]))
width = 1

# plt.bar(indexes, vocab_values[:10], width) # Random 10 items from list
# plt.xticks(indexes + width * 0.5, vocab_labels[:10])
# plt.show()

Link to the Basketball.txt file

【Discussion】:

    Tags: python matplotlib plot


    【Solution 1】:

    You can sort by the counts in vocab_values (the same counts held in vocab_freq) and reverse the result with [::-1] to get descending order:

    import collections
    from collections import Counter
    import itertools
    import re
    
    # Third Party Library
    import matplotlib.pyplot as plt
    import nltk
    import numpy as np
    
    file = r'.\Basketball.txt'  # raw string so the backslash is not read as an escape
    text = open(file, 'r').read()
    text = re.sub(r'([\"\'.])([\)\[,.;])', r'\1 \2', text)
    
    vocab = text.split()
    vocab = [words.lower() for words in vocab]
    print('There are a total of {} words in the corpus'.format(len(vocab)))
    tokens = list(set(vocab))
    print('There are {} unique words in the corpus'.format(len(tokens)))
    
    vocab_labels, vocab_values = zip(*Counter(vocab).items())
    vocab_freq = Counter(vocab)
    
    sorted_values = sorted(vocab_values)[::-1]
    sorted_labels = [x for (y,x) in sorted(zip(vocab_values,vocab_labels))][::-1]
    indexes = np.arange(len(sorted_labels[:10]))
    width = 1
    
    plt.bar(indexes, sorted_values[:10], align='center')  # Top 10 words by frequency
    plt.xticks(indexes, sorted_labels[:10])  # ticks sit under the centered bars
    plt.show()
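
    As an alternative to sorting and reversing by hand, Counter.most_common(n) already returns the n most frequent items in descending order of count. The following is a minimal sketch, not part of the original answer; the helper names top_n_words and plot_top_words and the sample sentence are hypothetical and only stand in for the tokenized article.

```python
from collections import Counter


def top_n_words(words, n=10):
    """Return (labels, counts) for the n most frequent words, highest count first."""
    # most_common(n) returns (word, count) pairs sorted by descending count.
    pairs = Counter(words).most_common(n)
    if not pairs:
        return (), ()
    labels, counts = zip(*pairs)
    return labels, counts


def plot_top_words(words, n=10):
    """Draw a bar chart of the n most frequent words in descending order."""
    # Imported here so top_n_words can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    labels, counts = top_n_words(words, n)
    positions = list(range(len(labels)))
    plt.bar(positions, counts, align='center')
    plt.xticks(positions, labels, rotation=90)
    plt.show()


# Hypothetical sample data standing in for the `vocab` list from the question.
sample = "the ball the hoop the ball".split()
labels, counts = top_n_words(sample, n=3)
print(labels, counts)
```

    With the question's data, calling plot_top_words(vocab, n=50) should plot the top 50 words in descending order, assuming vocab is the lowercased token list built above.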
    

    Result: (image not preserved)

    【Discussion】:
