Error running a Python script for data wrangling: answers

【Question title】: Error running Python script for data wrangling
【Posted】: 2021-02-20 07:23:10
【Question description】:

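I am new to Python and I am trying to write a script that performs some data wrangling. It will take a .csv file and return some required outputs. The script looks like this: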

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%d/%m/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

I then ran the script with the command python3 hello.py, and it failed with the following error:

Traceback (most recent call last):
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Day'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hello.py", line 6, in <module>
    dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/samuel/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'Day'

Please help, I would really appreciate it.

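【Comments】:

You must not have a column named "Day".
Recheck your dataframe, because there is no column named "Day"... maybe the column is named "day".
I checked the dataset here: github.com/Levantado/henry_hub_natural_gas_spot_price-/blob/…, is your dataset the same? Check line 5 of the dataset to see whether it has a column named Day.
@RishabhKumar The dataset is not the same; here is my dataset: eia.gov/dnav/ng/hist/rngwhhdm.htm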
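
The header check suggested in the comments can be done before pandas is involved at all. Here is a minimal sketch, assuming a file laid out like the one in the question (four preamble lines, then the header on line 5). The sample text and the helper `header_after_skipping` are made up for illustration and are not the actual EIA file:

```python
import csv
import io

# Hypothetical sample: four preamble lines, then the header row on line 5.
# The preamble text is invented; only the header and data rows mirror the thread.
SAMPLE = """\
preamble line 1
preamble line 2
preamble line 3
preamble line 4
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
"""


def header_after_skipping(text, skiprows):
    """Return the column names found on the first line after `skiprows` lines."""
    lines = io.StringIO(text).readlines()[skiprows:]
    return next(csv.reader(lines))


columns = header_after_skipping(SAMPLE, skiprows=4)
print(columns)
print("Has 'Day':", "Day" in columns, "| Has 'Month':", "Month" in columns)
```

For a real file, the same idea applies by reading the text with `open(path).read()` instead of using the inline sample.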

Tags: python python-3.x pandas dataframe csv

【Solution 1】:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Month"],format='%b %Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%d/%m/%Y"))
df.to_csv('gas-details_month.csv', index=True)

Given your data, this code will work.

Your dataset has a "Month" column rather than a "Day" column, and asking for the missing "Day" column is what raised the KeyError.
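
To see the failure in isolation, here is a small sketch that reproduces the KeyError on an inline sample and then picks the date column defensively. The two data rows are copied from the output listed in this answer. The `date_col` guard is a hypothetical addition, not part of the original script:

```python
import io

import pandas as pd

# Two-row sample using the column names from this thread.
SAMPLE = """\
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
"""

dfg = pd.read_csv(io.StringIO(SAMPLE))
print(list(dfg.columns))

# Indexing with a label that is not a column raises KeyError.
try:
    dfg["Day"]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Hypothetical guard: use whichever candidate date column actually exists.
date_col = next((c for c in ("Day", "Month") if c in dfg.columns), None)
print("Date column found:", date_col)
```

Printing `dfg.columns` right after `read_csv` is usually the quickest way to confirm what pandas actually parsed as the header.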

The dates in your dataset use the format "%b %Y" (for example, Jan 2021), so that part was changed as well.
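
As a quick illustration of that format string, the sketch below parses a few values taken from the "Month" column shown in this answer. It assumes English month abbreviations, since `%b` matching can depend on the locale:

```python
import pandas as pd

# Sample strings as they appear in the "Month" column of this thread.
months = pd.Series(["Jan 2021", "Dec 2020", "Nov 2020"], name="Month")

# %b = abbreviated month name, %Y = four-digit year.
# With no day in the input, each value is parsed as the first day of the month.
parsed = pd.to_datetime(months, format="%b %Y")
print(parsed.tolist())
```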

Dataset link provided by the OP: https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm

Output: two CSV files are generated:

1) gas-details_day.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
...

2) gas-details_month.csv

Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
31/01/1997,3.45
28/02/1997,2.15
31/03/1997,1.89
30/04/1997,2.03
31/05/1997,2.25
30/06/1997,2.2
...
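
The month-end dates in the second file (31/01/1997, 28/02/1997, ...) come from the monthly resample, which labels each bin with the last day of its month. Here is a minimal sketch of that step on made-up daily prices (not real EIA data). It passes a `pd.offsets.MonthEnd()` object as the rule instead of the `'M'` alias, on the assumption that the alias spelling varies between pandas versions, and it builds the frame with `to_frame()`:

```python
import pandas as pd

# Made-up daily prices spanning two months; values are illustrative only.
prices = pd.Series(
    [3.0, 3.5, 2.0, 2.5],
    index=pd.to_datetime(["1997-01-07", "1997-01-08", "1997-02-03", "1997-02-04"]),
    name="price",
)

# Group into calendar months and sum; each bin is labelled with its month end.
monthly = prices.resample(pd.offsets.MonthEnd()).sum()

# Convert to a DataFrame first, then relabel the index as day/month/year strings.
df = monthly.to_frame()
df.index = monthly.index.strftime("%d/%m/%Y")
print(df)
```

Converting with `to_frame()` and then assigning the string labels avoids relying on how the `DataFrame` constructor aligns a Series to a new index. If the input is already monthly, as in the OP's dataset, each monthly sum is simply that month's single value.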

【Discussion】:

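Similar to the previous one. I will add it as an answer so that you can see it.
Can you see it?
Fixed my answer. There was a typo. Check it now.
Wow, awesome. It works. What typo did you correct?
I removed the forward slash in "%b %Y". You can also see this in the edit history of my answer.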
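
A stray separator in a format string is enough to make parsing fail, which is easy to check with the standard library. In the sketch below, `%b/%Y` is a hypothetical stand-in for the typo, since the exact earlier revision is not shown here, and `parses` is an illustrative helper:

```python
from datetime import datetime


def parses(value, fmt):
    """Return True if `value` matches the strptime format `fmt`."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# The format must mirror the separators in the data: a space here, not a slash.
print("'%b %Y' matches:", parses("Jan 2021", "%b %Y"))
print("'%b/%Y' matches:", parses("Jan 2021", "%b/%Y"))
```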