【Question title】: How to plot a histogram of a column from a CSV file
【Posted】: 2020-10-24 02:20:32
【Question description】:

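The sample file looks like the one shown in the image. The x-axis should contain the letters in the range a-z and A-Z, and the y-axis should plot their respective frequencies from the content column.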

import pandas as pd
import numpy as np
import string
from matplotlib import pyplot as plt
plt.style.use('fivethirtyeight')

col_list = ["tweet_id","sentiment","author","content"]
df = pd.read_csv("sample.csv",usecols=col_list)
freq = (df["content"])

frequencies = {}

for sentence in freq:
    for char in sentence:
        if char in frequencies:
            frequencies[char] += 1
        else:
            frequencies[char] = 1

frequency = str(frequencies)

bins = [chr(i + ord('a')) for i in range(26)].__add__([chr(j + ord('A')) for j in range(26)])


plt.title('data')
plt.xlabel('letters')
plt.ylabel('frequencies')
plt.hist(bins,frequency,edgecolor ='black')
plt.tight_layout()

plt.show()

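【Comments】:

  • Could you provide some sample data? What do bins and frequency look like?
  • Based on your description, you need a bar chart, not a histogram.
  • Added an image of the data.
  • @Darina Do we have to use a function other than plt.hist() to plot a bar chart?
  • @MridulSetia Yes, there is a matplotlib function for it and a pandas wrapper. Just google "python barplot".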

Tags: python pandas numpy matplotlib histogram


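【Solution 1】:

Your code is already well structured, but I would still suggest using plt.bar with the letters on the xticks instead of plt.hist, since putting chars on the x-axis seems easier that way. I commented out the else so that nothing other than the desired letters (a-zA-Z) is added. I also included a sorted command to give the option of ordering the bars alphabetically or by frequency count.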

Input used in sample.csv:

    tweet_id  sentiment  author                                            content
0        NaN        NaN     NaN  @tiffanylue i know i was listenin to bad habit...
1        NaN        NaN     NaN  Layin n bed with a headache ughhhh...waitin on...
2        NaN        NaN     NaN                Funeral ceremony...gloomy friday...
3        NaN        NaN     NaN               wants to hang out with friends SOON!
4        NaN        NaN     NaN  @dannycastillo We want to trade with someone w...
5        NaN        NaN     NaN  Re-pinging @ghostridahl4: why didn't you go to...
6        NaN        NaN     NaN  I should be sleep, but im not! thinking about ...
...
...
# populate dictionary a-zA-Z with zeros
frequencies = {}
for i in range(26):
    frequencies[chr(i + ord('a'))] = 0
    frequencies[chr(i + ord('A'))] = 0

# iterate over each row of "content"
for row in df.loc[:,"content"]:
    for char in row:
        if char in frequencies:
            frequencies[char] += 1
        # uncomment to include numbers and symbols (!@#$...)
        # else:
        #     frequencies[char] = 1

# sort items from highest count to lowest
char_freq = sorted(frequencies.items(), key=lambda x: x[1], reverse=True)
# char_freq = sorted(frequencies.items(), key=lambda x: x, reverse=False)

plt.title('data')
plt.xlabel('letters')
plt.ylabel('frequencies')

plt.bar(range(len(char_freq)), [i[1] for i in char_freq], align='center')
plt.xticks(range(len(char_freq)), [i[0] for i in char_freq])

plt.tight_layout()

plt.show()
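
The counting step above can also be sketched with collections.Counter. This is only an illustrative variant, not part of the original answer: the helper names letter_counts and plot_letter_counts and the inline sample list are hypothetical, and skipping non-string rows is an added assumption to guard against missing values in the content column.

```python
from collections import Counter
import string


def letter_counts(texts):
    """Count ASCII letters (a-z, A-Z) across an iterable of strings."""
    # Start every letter at zero so letters that never occur still get a bar.
    counts = Counter({letter: 0 for letter in string.ascii_letters})
    for text in texts:
        if not isinstance(text, str):  # assumption: skip missing values such as NaN
            continue
        counts.update(ch for ch in text if ch in string.ascii_letters)
    return counts


def plot_letter_counts(counts, by_frequency=True):
    """Draw one bar per letter; matplotlib is imported only when plotting."""
    from matplotlib import pyplot as plt

    if by_frequency:
        # highest count first, as in the answer's sorted(...) call
        items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    else:
        # alphabetical order (uppercase letters sort before lowercase)
        items = sorted(counts.items())
    positions = range(len(items))
    plt.bar(positions, [value for _, value in items], align='center')
    plt.xticks(positions, [letter for letter, _ in items])
    plt.xlabel('letters')
    plt.ylabel('frequencies')
    plt.tight_layout()
    plt.show()


# Hypothetical stand-in for df["content"], including one missing value.
sample = ["wants to hang out with friends SOON!", float("nan"), "Funeral ceremony"]
counts = letter_counts(sample)
```

With the DataFrame from the question, plot_letter_counts(letter_counts(df["content"])) should draw the same kind of bar chart, and by_frequency=False switches to alphabetical order.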
