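【Question title】: KeyError when trying to plot or histogram pandas data in matplotlib
【Posted】: 2016-03-11 17:46:31
【Question】:

I'm having trouble generating a basic distribution histogram from an imported csv file. The code works for a set of data from another csv, but not for the data I'm interested in, which is essentially the same. Here is the code I tried: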

import pandas as pd
import numpy as np
import matplotlib as plt
data = pd.read_csv("idcases.csv")
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
data2 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Sonoma")]

fig = plt.pyplot.figure()
ax = fig.add_subplot(111)
ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.pyplot.xlabel('Population')
plt.pyplot.ylabel('Count of Population')
plt.pyplot.show()

This yields:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-35-63303aa9d8a5> in <module>()
      1 fig = plt.pyplot.figure()
      2 ax = fig.add_subplot(111)
----> 3 ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
      4 plt.pyplot.xlabel('Count')
      5 plt.pyplot.ylabel('Count of Population')

C:\Program Files (x86)\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5602         # Massage 'x' for processing.
   5603         # NOTE: Be sure any changes here is also done below to 'weights'
-> 5604         if isinstance(x, np.ndarray) or not iterable(x[0]):
   5605             # TODO: support masked arrays;
   5606             x = np.asarray(x)

C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    549     def __getitem__(self, key):
    550         try:
--> 551             result = self.index.get_value(self, key)
    552 
    553             if not np.isscalar(result):

C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
   1721 
   1722         try:
-> 1723             return self._engine.get_value(s, k)
   1724         except KeyError as e1:
   1725             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()

KeyError: 0L

What am I doing wrong? Here is part of the data I am reading in. The code doesn't work for any of the fields, including 'Count' or 'Rate'.

       Disease County  Year     Sex  Count  Population   Rate  CI.lower  \
882  Amebiasis  Marin  2001   Total     14      247731  5.651     3.090   
883  Amebiasis  Marin  2001  Female      0      125414  0.000     0.000   
884  Amebiasis  Marin  2001    Male      0      122317  0.000     0.000   
885  Amebiasis  Marin  2002   Total      7      247382  2.830     1.138   
886  Amebiasis  Marin  2002  Female      0      125308  0.000     0.000   
887  Amebiasis  Marin  2002    Male      0      122074  0.000     0.000   
888  Amebiasis  Marin  2003   Total      9      247280  3.640     1.664   
889  Amebiasis  Marin  2003  Female      0      125259  0.000     0.000   
890  Amebiasis  Marin  2003    Male      0      122021  0.000     0.000   

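【Comments】:

  • This is a pandas problem. Please show the contents of data1.
  • The data you pasted appears to be tab-separated (or you formatted it after pasting). Make sure all csv files use the same delimiter, and pass it as an argument to the read_csv function.
  • @MikeMüller, the contents of data1 are at the end.
  • @hitzg, I formatted it after pasting for better viewing. How can I tell which delimiter was used? Aren't csv files all comma-delimited?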
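
On the delimiter question raised in the comments: one possible way to check is to let the standard library's csv.Sniffer guess from a few lines of the file. This is only a sketch. The helper name guess_delimiter, the candidate list, and the sample strings are illustrative assumptions, not something the commenters proposed.

```python
import csv


def guess_delimiter(sample, candidates=",;\t|"):
    """Guess the field delimiter from a small text sample (hypothetical helper)."""
    # csv.Sniffer inspects how consistently each candidate character
    # appears per line and returns a Dialect describing its best guess.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Illustrative samples built from a few columns of the question's data.
comma_sample = (
    "Disease,County,Year,Count\n"
    "Amebiasis,Marin,2001,14\n"
    "Amebiasis,Marin,2002,7\n"
)
tab_sample = comma_sample.replace(",", "\t")

comma_guess = guess_delimiter(comma_sample)
tab_guess = guess_delimiter(tab_sample)
```

For a real file, the sample could be the first few kilobytes read from disk, and the result can then be passed to read_csv as the sep argument.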

Tags: python csv pandas matplotlib histogram


【Answer 1】:

When upgrading from matplotlib-v1.4.3 to matplotlib-v1.5.0, I noticed that plotting a pandas.Series stopped working, for example:

ax.plot_date(df['date'], df['raw'], '.-', label='raw')

will result in a KeyError: 0 exception.

Quick solution:

You need to pass a numpy.ndarray rather than a pandas.Series to the plot_date function:

ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')


More details:

Let's look at the full traceback of the exception:

# ... PREVIOUS TRACEBACK MESSAGES OMITTED FOR BREVITY ...

C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\matplotlib\dates.py in default_units(x, axis)
   1562 
   1563         try:
-> 1564             x = x[0]
   1565         except (TypeError, IndexError):
   1566             pass

C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    555     def __getitem__(self, key):
    556         try:
--> 557             result = self.index.get_value(self, key)
    558 
    559             if not np.isscalar(result):

C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
   1788 
   1789         try:
-> 1790             return self._engine.get_value(s, k)
   1791         except KeyError as e1:
   1792             if len(self) > 0 and self.inferred_type in ['integer','boolean']:

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()

pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()

pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()

KeyError: 0

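Note that the error arises when matplotlib tries to execute x = x[0]. If your pandas Series is not indexed by integers starting at zero, this fails, because it looks up the item whose index label is 0 rather than the 0th element of the pandas.Series.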
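
The label-versus-position behaviour can be reproduced without matplotlib. The following is a minimal sketch that reuses three Population values and their row labels (882 to 884) from the question. The variable names are illustrative.

```python
import pandas as pd

# A Series whose integer index does not start at zero, like the
# filtered rows in the question (labels 882, 883, 884).
s = pd.Series([247731, 125414, 122317], index=[882, 883, 884], name="Population")

# With an integer index, s[0] is a label lookup; there is no label 0 here.
try:
    s[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access works regardless of the index labels.
first_by_position = s.iloc[0]

# The underlying numpy array has no labels, so [0] is positional.
first_from_array = s.values[0]
```

This is why the same plotting code can work on one csv and fail on another: an unfiltered frame usually has a 0..n-1 index, while a filtered one keeps the original row labels.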

To fix this, we need to get a numpy.ndarray from the data in the pandas.Series, and then use that for plotting:

ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
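
The same idea should carry over to the histogram in the question. The sketch below rebuilds the Marin rows from the question with their original labels and passes .values to ax.hist. The Agg backend is an assumption made only so the example runs without a display, and newer matplotlib releases may accept the Series directly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Population values from the question, keeping the filtered labels 882..890.
data1 = pd.DataFrame(
    {"Population": [247731, 125414, 122317, 247382, 125308,
                    122074, 247280, 125259, 122021]},
    index=range(882, 891),
)

# Hand matplotlib a plain numpy array so x[0] is positional.
population = data1["Population"].values

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(
    population, bins=10, range=(population.min(), population.max())
)
ax.set_xlabel("Population")
ax.set_ylabel("Count of Population")
```

Calling plt.show() afterwards (with an interactive backend) displays the figure as in the original code.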


    【Answer 2】:

    This gives me a plot:

    import io
    import pandas as pd  # needed for pd.read_csv below
    import matplotlib.pyplot as plt


    s = """       Disease County  Year     Sex  Count  Population   Rate  CI.lower
    Amebiasis  Marin  2001   Total     14      247731  5.651     3.090
    Amebiasis  Marin  2001  Female      0      125414  0.000     0.000
    Amebiasis  Marin  2001    Male      0      122317  0.000     0.000
    Amebiasis  Marin  2002   Total      7      247382  2.830     1.138
    Amebiasis  Marin  2002  Female      0      125308  0.000     0.000
    Amebiasis  Marin  2002    Male      0      122074  0.000     0.000
    Amebiasis  Marin  2003   Total      9      247280  3.640     1.664
    Amebiasis  Marin  2003  Female      0      125259  0.000     0.000
    Amebiasis  Marin  2003    Male      0      122021  0.000     0.000  """
    fobj = io.StringIO(s)
    # sep=r"\s+" splits on runs of whitespace (same effect as delim_whitespace=True,
    # which newer pandas releases deprecate)
    data1 = pd.read_csv(fobj, sep=r"\s+")
    plt.hist(data1['Population'], bins=10, range=(data1['Population'].min(), data1['Population'].max()))
    plt.xlabel('Population')
    plt.ylabel('Count of Population')
    plt.show()
    
