【Question title】: Error running Python script for data wrangling
【Posted】: 2021-02-20 07:23:10
【Question description】:

I am new to Python and am trying to write a script that performs some data wrangling. It takes a .csv file and returns some required outputs. The script looks like this:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%d/%m/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

I then ran the script with the command python3 hello.py, and the following error was displayed:

Traceback (most recent call last):
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Day'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hello.py", line 6, in <module>
    dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Day'

Please help; I would really appreciate it.

【Question discussion】:

Tags: python python-3.x pandas dataframe csv


【Solution 1】:
import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Month"],format='%b %Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

With your data, this code will work.

Your dataset contains a "Month" column rather than "Day"; looking up the missing "Day" column is what raised the KeyError.
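
One way to confirm this kind of mismatch before indexing is to look at the column names pandas actually parsed. The sketch below is only an illustration: it reads a small in-memory sample that mimics the layout implied by skiprows=4 (four preamble lines, then the header row). The sample text and the helper name require_column are assumptions, not part of the original script.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the file layout: 4 preamble lines, then the header.
SAMPLE_CSV = """preamble line 1
preamble line 2
preamble line 3
preamble line 4
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
"""


def require_column(df, name):
    """Return df[name], or raise a KeyError that lists the available columns."""
    if name not in df.columns:
        raise KeyError(f"{name!r} not found; available columns: {list(df.columns)}")
    return df[name]


dfg = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=4)
print(list(dfg.columns))  # shows which header names were really read

month = require_column(dfg, "Month")  # succeeds
# require_column(dfg, "Day") would raise a KeyError naming the real columns
```

Printing dfg.columns right after read_csv is usually enough; the helper just makes the failure message more informative.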

The dates in your dataset are formatted as "%b %Y" (for example, Jan 2021), so that part was changed as well.
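
To illustrate how the format string has to match the text exactly, here is a minimal sketch using three of the month labels from the dataset. It assumes an English locale for the abbreviated month names; the variable names are made up for the example.

```python
import pandas as pd

months = pd.Series(["Jan 2021", "Dec 2020", "Nov 2020"])

# %b matches the abbreviated month name, %Y the four-digit year.
# With no day in the text, each parsed date falls on the 1st of the month.
parsed = pd.to_datetime(months, format="%b %Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())

# A format that does not match the text (here with a slash instead of a space)
# raises ValueError instead of parsing.
try:
    pd.to_datetime(months, format="%b/%Y")
except ValueError as exc:
    print("format mismatch:", exc)
```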

Link to the dataset provided by the OP: https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm

Output: two CSV files are generated:

1) gas-details_day.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
...

2) gas-details_month.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
31/01/1997,3.45
28/02/1997,2.15
31/03/1997,1.89
30/04/1997,2.03
31/05/1997,2.25
30/06/1997,2.2
...
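
The month-end dates in the second file come from the resampling step: each value is parsed as the 1st of its month and then labelled with the last day of that month. The following is a rough sketch of that step on three values copied from the output above. It is not the answer's exact code: it passes pd.offsets.MonthEnd() rather than the 'M' alias (an assumption made so the example does not depend on how a given pandas version spells that alias), and it assigns the formatted index directly.

```python
import pandas as pd

COLUMN = "Henry Hub Natural Gas Spot Price Dollars per Million Btu"

# Three monthly values taken from the output shown above, indexed by parsed dates.
prices = pd.Series(
    [3.45, 2.15, 1.89],
    index=pd.to_datetime(["Jan 1997", "Feb 1997", "Mar 1997"], format="%b %Y"),
    name=COLUMN,
)

# Month-end bins move each label from the 1st to the last day of its month.
monthly = prices.resample(pd.offsets.MonthEnd()).sum()

# Format the index as text and name it, so the CSV header reads "Month,<column>".
monthly.index = monthly.index.strftime("%d/%m/%Y")
monthly.index.name = "Month"

print(monthly.to_csv())
```

Because the source data is already monthly, sum() here only relabels each value; on daily data the same call would add up all the days in each month.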

【Discussion】:

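  • Similar to the previous one. I will add it as an answer so you can see it.
  • Can you see it?
  • I corrected my answer. There was a typo. Check it now.
  • Wow, great. It works. What typo did you correct?
  • I removed the forward slash from "%b %Y". You can also see this in the edit history of my answer.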