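【Posted】:2016-03-11 17:46:31
【Question】:
I'm having trouble generating a basic distribution histogram from an imported csv file. The code works for a set of data from another csv, but not for the data I'm interested in, which is essentially the same. Here is the code I tried: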
import pandas as pd
import numpy as np
import matplotlib as plt
data = pd.read_csv("idcases.csv")
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
data2 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Sonoma")]
fig = plt.pyplot.figure()
ax = fig.add_subplot(111)
ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.pyplot.xlabel('Population')
plt.pyplot.ylabel('Count of Population')
plt.pyplot.show()
This yields:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-35-63303aa9d8a5> in <module>()
1 fig = plt.pyplot.figure()
2 ax = fig.add_subplot(111)
----> 3 ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
4 plt.pyplot.xlabel('Count')
5 plt.pyplot.ylabel('Count of Population')
C:\Program Files (x86)\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5602 # Massage 'x' for processing.
5603 # NOTE: Be sure any changes here is also done below to 'weights'
-> 5604 if isinstance(x, np.ndarray) or not iterable(x[0]):
5605 # TODO: support masked arrays;
5606 x = np.asarray(x)
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
549 def __getitem__(self, key):
550 try:
--> 551 result = self.index.get_value(self, key)
552
553 if not np.isscalar(result):
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1721
1722 try:
-> 1723 return self._engine.get_value(s, k)
1724 except KeyError as e1:
1725 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0L
What am I doing wrong? Here is a portion of the data I'm reading in. The code doesn't work for any of the fields, including "Count" or "Rate".
Disease County Year Sex Count Population Rate CI.lower \
882 Amebiasis Marin 2001 Total 14 247731 5.651 3.090
883 Amebiasis Marin 2001 Female 0 125414 0.000 0.000
884 Amebiasis Marin 2001 Male 0 122317 0.000 0.000
885 Amebiasis Marin 2002 Total 7 247382 2.830 1.138
886 Amebiasis Marin 2002 Female 0 125308 0.000 0.000
887 Amebiasis Marin 2002 Male 0 122074 0.000 0.000
888 Amebiasis Marin 2003 Total 9 247280 3.640 1.664
889 Amebiasis Marin 2003 Female 0 125259 0.000 0.000
890 Amebiasis Marin 2003 Male 0 122021 0.000 0.000
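A likely reading of the traceback: in the matplotlib version shown, `hist` evaluates `x[0]`, and on a pandas Series with an integer index that is a label lookup, not a positional one. After filtering, the rows keep their original labels (882, 883, ...), so there is no label 0 and the lookup raises `KeyError`. The sketch below reproduces just that lookup with three rows copied from the excerpt above; the small DataFrame is a stand-in built for illustration, not the real `idcases.csv`.

```python
import pandas as pd

# Stand-in for the filtered frame: values copied from the excerpt,
# index labels 882..884 mimic what boolean filtering leaves behind.
data1 = pd.DataFrame(
    {"Year": [2001, 2001, 2001], "Population": [247731, 125414, 122317]},
    index=[882, 883, 884],
)

population = data1["Population"]

# With an integer index, population[0] looks for the LABEL 0, which does not exist.
try:
    population[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Two ways to hand over positional data instead:
as_array = population.to_numpy()                 # plain NumPy array, no index
renumbered = population.reset_index(drop=True)   # Series relabelled 0, 1, 2, ...
```

Under this assumption, passing `as_array` (or `renumbered`) to `ax.hist(...)` in place of `data1['Population']` should avoid the `x[0]` lookup failure. It would also explain why the other csv worked: if its filtered rows happened to include label 0, the lookup would succeed.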
【Comments】:
-
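This is a pandas problem. Please show the contents of data1.
-
The data you pasted appears to be tab-delimited (or you formatted it after pasting). Make sure all of your csv files use the same delimiter, and pass it as an argument to the read_csv function.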
-
@MikeMüller, the contents of data1 are at the end.
-
@hitzg, I formatted it after pasting so it would be easier to read. How can I tell which delimiter was used? Aren't all csv files comma-delimited?
Tags: python csv pandas matplotlib histogram
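On the delimiter question: files named `.csv` are not always comma-separated; tabs and semicolons are common too. One way to check is to let the standard library's `csv.Sniffer` guess the delimiter from a sample and then pass the result to `read_csv` via `sep`. This is only a sketch: the tab-separated sample text is made up for illustration and stands in for the first lines of the real file.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the first few lines of the file.
sample = (
    "Disease\tCounty\tYear\tSex\tCount\tPopulation\n"
    "Amebiasis\tMarin\t2001\tTotal\t14\t247731\n"
    "Amebiasis\tMarin\t2001\tFemale\t0\t125414\n"
)

# Sniffer guesses the dialect; restricting the candidate delimiters
# makes the guess more reliable on small samples.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
delimiter = dialect.delimiter

# Pass the detected delimiter explicitly to read_csv.
data = pd.read_csv(io.StringIO(sample), sep=delimiter)
```

With a real file, the sample could be read with something like `open("idcases.csv", newline="").read(4096)`. Alternatively, `pd.read_csv(path, sep=None, engine="python")` asks pandas to detect the separator itself.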