KeyError when reading from Excel data into a dataframe

【Question title】: KeyError when reading from Excel data into dataframe
【Posted】: 2015-07-26 01:54:39
【Question】:

I have an Excel file with two sheets, and I am trying to read both of them into dataframes as shown in the code below. However, I get the error

KeyError: "['months_to_maturity' 'asset_id' 'orig_iss_dt' 'maturity_dt' 'pay_freq_cd'\n 'coupon' 'closing_price'] not in index"

on the line

return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

in the SecondExcelFileReader() function. However, both sheets have the headers

asset_id    orig_iss_dt maturity_dt  pay_freq_cd    coupon  closing_price   months_to_maturity

I return df as follows because this is the column order I want.

import pandas as pd

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[0])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]


def SecondExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[1])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

def mergingdataframes():
    df1 = ExcelFileReader()
    df2 = SecondExcelFileReader()
    return pd.concat([df1, df2])

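Edit: This Excel file was exported from Sybase Oracle SQL Developer, so the first sheet already came with headers. I just copied and pasted the second sheet with the same headers. Also, I only have a problem with the second sheet.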

Sheet 1: [screenshot]

Sheet 2: [screenshot]

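【Comments】:

You don't get the problem with the first sheet?
@AnandSKumar I have no problem with the first sheet. I should explain that this Excel file was exported from Sybase Oracle SQL Developer, so the first sheet already came with headers. I just copied and pasted the second sheet with the headers.
Can you show the second and first sheets (perhaps as screenshots)?
First look at the output df before selecting the subset of columns. If it looks fine, try selecting each column individually and see whether you get a KeyError. If so, it is probably something silly, like extra whitespace in one of the column names.
@user131983 Why are you reading the file this way? Why not use pandas.read_excel? That function has many parameters for controlling parsing, including a sheetname parameter.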
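
The whitespace check suggested in the comments can be sketched roughly as follows. This is only an illustration: the small dataframe is a hypothetical stand-in for the parsed second sheet (the real sheet contents are not shown in the question), and the stray spaces in its headers are an assumption about what might be wrong.

```python
import pandas as pd

# Hypothetical stand-in for a parsed sheet whose headers picked up stray whitespace.
df = pd.DataFrame({
    "asset_id ": [1, 2],
    " coupon": [2.5, 3.0],
    "closing_price": [99.1, 101.3],
})

wanted = ["asset_id", "coupon", "closing_price"]

# repr() makes leading and trailing whitespace visible in the header names.
print([repr(c) for c in df.columns])

# Report which requested names are missing before selecting anything.
missing = [c for c in wanted if c not in df.columns]
print("missing before cleanup:", missing)

# Normalise the headers, then select the columns in the desired order.
df.columns = df.columns.str.strip()
missing_after = [c for c in wanted if c not in df.columns]
result = df[wanted]
```

If the list of missing names is empty after stripping, the KeyError was most likely caused by header text that only looked identical in Excel.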

Tags: python pandas

【Solution 1】:

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    # Position of the first sheet (this always evaluates to 0)
    sheet_num = xls.sheet_names.index(xls.sheet_names[0])
    # The keyword is sheet_name in pandas 0.21 and later; older releases spelled it sheetname
    df = pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=sheet_num)
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt', 'pay_freq_cd', 'coupon', 'closing_price']]

In this case you can also use sheet_name=xls.sheet_names[0] instead of sheet_name=0.

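It looks like your problem is that your second sheet is named "Sheet1". According to the ExcelFile.parse documentation, "Sheet1" denotes the first sheet, but in your case it is the second sheet. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html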

A better implementation would be:

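def mergingdataframes():
    # Passing a list of sheets makes read_excel return a dict of dataframes keyed by sheet.
    # The keyword is sheet_name in pandas 0.21 and later; older releases spelled it sheetname.
    mergedf = pd.concat(pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=[0, 1]))
    # concat turns the dict keys into an outer index level; drop it
    mergedf.index = mergedf.index.droplevel(0)
    return mergedf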
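
To see what the droplevel step does without needing the Excel file, here is a minimal sketch. The dict of small dataframes is a hypothetical stand-in for what read_excel returns when given a list of sheets; the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets, keyed by sheet position
# the way read_excel keys its result when given a list of sheets.
sheets = {
    0: pd.DataFrame({"asset_id": [1, 2], "coupon": [2.5, 3.0]}),
    1: pd.DataFrame({"asset_id": [3], "coupon": [4.0]}),
}

# Concatenating a dict uses its keys as the outer level of a MultiIndex.
stacked = pd.concat(sheets)
outer_keys = list(stacked.index.get_level_values(0))

# Option 1: drop the outer level, keeping each sheet's own row labels.
dropped = stacked.copy()
dropped.index = dropped.index.droplevel(0)

# Option 2: discard the old labels entirely and renumber the rows.
renumbered = pd.concat(list(sheets.values()), ignore_index=True)
```

Option 1 mirrors the answer and leaves repeated row labels (each sheet restarts at 0). Option 2 may be preferable when a unique index is wanted.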

【Comments】: