【Question title】: How to create a wordcloud according to frequencies in a pandas dataframe
【Posted】: 2019-09-06 17:02:16
【Question】:

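I have to plot a wordcloud. 'tweets.csv' is loaded into a pandas DataFrame that has a column named 'text'. The plotted cloud is not based on the most common words, though. How can the word sizes be tied to their frequency in the DataFrame?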

text = df_final.text.values
wordcloud = WordCloud(
    #mask = logomask,
    max_words = 1000,
    width = 600,
    height = 400,
    #max_font_size = 1000,
    #min_font_size = 100,
    normalize_plurals = True,
    #scale = 5,
    #relative_scaling = 0,
    background_color = 'black',
    stopwords = STOPWORDS.union(stopwords)
).generate(str(text))
fig = plt.figure(
    figsize = (50,40),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()

My dataframe looks like this:

0   RT @Pontifex_pt: Temos que descobrir as riquezezas ...
1   RT @Pontifex_pt: Todos estamos em viagem rumo ...
2   RT @Pontifex_pt: Unamos as forças, em todos ...
3   RT @GeneralMourao: #Segurançapública, preocupa ...
4   RT @FIFAcom: The Brasileirao U-17 final provided ...

【Question comments】:

    Tags: python pandas dataframe frequency word-cloud


    【Solution 1】:

    Set up a sample dataframe:

    import pandas as pd
    
    df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                       'count': [7, 10, 4, 1, 20, 100]}) 
    
            word  count
    0        how      7
    1        are     10
    2        you      4
    3      doing      1
    4       this     20
    5  afternoon    100
    

    Convert the word & count columns to a dict

    • WordCloud().generate_from_frequencies() requires a dict that maps each word to its count
    • Use one of the following methods
    # method 1: convert to dict 
    data = dict(zip(df['word'].tolist(), df['count'].tolist()))
    
    # method 2: convert to dict
    data = df.set_index('word').to_dict()['count']
    
    print(data)
    
    [out]: {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}                                                                          
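
The sample dataframe above already holds one row per word. The question starts from raw tweet text instead, so the counts have to be built first. A minimal sketch of that step, assuming a simple regex tokenizer and a hypothetical helper named `word_frequencies` (neither is part of the original answer):

```python
import re
from collections import Counter

def word_frequencies(texts, stopwords=frozenset()):
    """Count how often each word occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of word characters (letters, digits, underscore)
        for word in re.findall(r"\w+", str(text).lower()):
            if word not in stopwords:
                counts[word] += 1
    return dict(counts)

# Hypothetical stand-in for the 'text' column; with pandas this could be
# word_frequencies(df_final['text'].dropna(), stopwords=...)
sample = ["RT @user: Todos estamos em viagem", "Todos juntos em viagem"]
data = word_frequencies(sample, stopwords={"rt", "em"})
print(data)
```

The resulting dict has the same shape as `data` above, so it can be passed straight to generate_from_frequencies(). Counting explicitly also avoids `str(text)` on the whole array, which hands WordCloud the array's printed representation rather than the tweets themselves.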
    

    Word cloud:

    from wordcloud import WordCloud
    
    wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
    

    Plot

    import matplotlib.pyplot as plt
    
    plt.figure(figsize=(10, 10))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()
    

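    Using an image mask:

    import numpy as np
    from PIL import Image

    # When a mask is given, the cloud takes the mask's shape; white areas stay empty
    twitter_mask = np.array(Image.open('twitter.png'))
    wc = WordCloud(background_color='white', width=800, height=400, max_words=200, mask=twitter_mask).generate_from_frequencies(data)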
    
    plt.figure(figsize=(10, 10))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis("off")
    plt.figure()
    plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
    plt.axis("off")
    plt.show()
    

    【Comments】:
