Trouble importing Excel fields into Python via Pandas - index out of bounds error

【Question Title】: Trouble importing Excel fields into Python via Pandas - index out of bounds error
【Posted】: 2020-02-21 03:20:09
【Question】:

I'm not sure what happened: my code ran earlier today, but now it doesn't. I have an Excel spreadsheet of projects that I want to import column by column into lists. However, I get an "IndexError: index 8 is out of bounds for axis 0 with size 8" error, and Googling hasn't solved it for me. Any help is appreciated. My Excel sheet has the following fields: id, funding_end, keywords, pi, summaryurl, htmlabstract, abstract, project_num, title. I don't know what I'm missing...

import pandas as pd

dataset = pd.read_excel('new_ahrq_projects_current.xlsx',encoding="ISO-8859-1")
df = pd.DataFrame(dataset)
cols = [0,1,2,3,4,5,6,7,8]
df = df[df.columns[cols]]

tt = df['funding_end'] = df['funding_end'].astype(str)
tt = df.funding_end.tolist()
for t in tt:
   allenddates.append(t)

bb = df['keywords'] = df['keywords'].astype(str)
bb = df.keywords.tolist()
for b in bb:
   allkeywords.append(b)

uu = df['pi'] = df['pi'].astype(str)
uu = df.pi.tolist()
for u in uu:
   allpis.append(u)

vv = df['summaryurl'] = df['summaryurl'].astype(str)
vv = df.summaryurl.tolist()
for v in vv:
   allsummaryurls.append(v)

ww = df['htmlabstract'] = df['htmlabstract'].astype(str)
ww = df.htmlabstract.tolist()
for w in ww:
   allhtmlabstracts.append(w) 

xx = df['abstract'] = df['abstract'].astype(str)
xx = df.abstract.tolist()
for x in xx:
   allabstracts.append(x) 

yy = df['project_num'] = df['project_num'].astype(str)
yy = df.project_num.tolist()
for y in yy:
   allprojectnums.append(y)    

zz = df['title'] = df['title'].astype(str)
zz = df.title.tolist()

for z in zz:
   alltitles.append(z)

【Comments】:

Tags: python excel pandas numpy text

【Answer 1】:

"IndexError: index 8 is out of bounds for axis 0 with size 8"

cols = [0,1,2,3,4,5,6,7,8]

should be cols = [0,1,2,3,4,5,6,7].

I think you have 8 columns, but your cols list contains 9 column indices.
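
The mismatch can be reproduced without the spreadsheet. Below is a minimal sketch, assuming a small in-memory DataFrame with 8 made-up columns standing in for the Excel data (the names col0..col7 are hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: 8 columns, so valid positions are 0..7.
df = pd.DataFrame({f"col{i}": [1, 2] for i in range(8)})

cols = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # 9 positions, one more than the frame has

try:
    subset = df[df.columns[cols]]
    error_message = None
except IndexError as exc:
    # Position 8 does not exist in a column index of size 8.
    error_message = str(exc)

print(error_message)

# Dropping the out-of-range positions lets the same selection succeed.
valid_cols = [c for c in cols if c < len(df.columns)]
subset = df[df.columns[valid_cols]]
print(subset.shape)
```

If the frame really had 9 columns, the first selection would succeed, so the error itself suggests that pandas sees only 8.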

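【Comments】:

I just double-checked, and unfortunately I do have 9 columns.

【Answer 2】:

IndexError: index out of bounds means you are trying to insert or access something beyond its limit or range.

Whenever you load a file such as test.xls, test.csv, or test.xlsx with Pandas, for example: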

data_set = pd.read_excel('file_example_XLS_10.xls', encoding="ISO-8859-1")

it is a good idea to check how many columns the DataFrame has, which helps when you work with large datasets. For example:

import pandas as pd

data_set = pd.read_excel('file_example_XLS_10.xls', encoding="ISO-8859-1") 
data_frames = pd.DataFrame(data_set)

print("Length of Columns:", len(data_frames.columns))

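This gives you the exact number of columns pandas read from the Excel spreadsheet. You can then set the column indices accordingly:

Length of Columns: 8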

cols = [0, 1, 2, 3, 4, 5, 6, 7]
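
Rather than typing the positions by hand, the list can be derived from the frame itself. This is only a sketch: the helper name all_column_positions and the sample data are assumptions for illustration, and the encoding argument from the snippets above is left out because an in-memory frame is used instead of a file.

```python
import pandas as pd

def all_column_positions(df):
    """Return every valid column position for df, however many columns it has."""
    return list(range(len(df.columns)))

# Hypothetical data standing in for the spreadsheet.
df = pd.DataFrame({"id": [1, 2], "title": ["a", "b"], "pi": ["x", "y"]})

cols = all_column_positions(df)
df = df[df.columns[cols]]  # cannot go out of bounds, since cols was derived from df
print("Length of Columns:", len(df.columns))
print(cols)
```

Because cols is computed from len(df.columns), it stays correct if the spreadsheet later gains or loses a column.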

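【Comments】:

【Answer 3】:

I agree with @Bill CX: it sounds like you are trying to access a column that doesn't exist. I can't reproduce your error, but I have a few ideas that may help you move forward.

First, double-check the shape of the dataframe: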

import pandas as pd

dataset = pd.read_excel('new_ahrq_projects_current.xlsx',encoding="ISO-8859-1")
df = pd.DataFrame(dataset)
print(df.shape) # print shape of data read in to python

The output should be

(X, 9) # "X" is the number of rows

If the dataframe has 8 columns, df.shape will be (X, 8). That may be why you are getting the error.

Another check is to print the first few rows of the dataframe.

print(df.head())  # call head() so that only the first few rows are printed

This lets you double-check that the data was read in the form you expect. I'm not sure, but your .xlsx file may have 9 columns while pandas is reading only 8 of them.
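
One way to see which column went missing is to compare the names pandas actually read with the names listed in the question. The sketch below is illustrative only: the helpers missing_columns and column_as_strings are hypothetical, and the frame is built in memory with 'title' deliberately left out instead of being read from the real file.

```python
import pandas as pd

# Column names listed in the question.
EXPECTED = ["id", "funding_end", "keywords", "pi", "summaryurl",
            "htmlabstract", "abstract", "project_num", "title"]

def missing_columns(df, expected):
    """Return the expected column names that are absent from df."""
    return [name for name in expected if name not in df.columns]

def column_as_strings(df, name):
    """Return one column as a plain list of strings."""
    return df[name].astype(str).tolist()

# Hypothetical frame in which 'title' was not read in.
df = pd.DataFrame({name: ["v1", "v2"] for name in EXPECTED[:-1]})

print(df.shape)
print(missing_columns(df, EXPECTED))

# Selecting by name avoids positional indices entirely.
allpis = column_as_strings(df, "pi")
print(allpis)
```

Selecting columns by name also means a missing column fails with a KeyError that names it, which is usually easier to diagnose than a positional IndexError.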

【Comments】: