Why the "List index out of range" error?

【Question title】: Why the "List index out of range" error?
【Posted】: 2014-04-10 17:02:07
【Question】:

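I have a list of files that I want to read and print some information from. It keeps giving me the error list index out of range, and I'm not sure what's wrong. For line 2, if I add matches[:10] it works for the first 10 files, but I need it to work for all of the files. I checked some old posts but still can't get my code to work.

re.findall worked before, when I wrote this code in pieces. I'm not sure whether it has stopped working. Thanks.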

import re, os
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1' # Topdir has to be an object rather than a string, which means that there is no parenthesis.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.pdf')):
            matches.append(os.path.join(root, filename))

capturedorgs = []
capturedfiles = []
capturedabstracts = []
orgAwards={}
for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
        capturedorgs.append(matchOrg)

        # code to capture files
        matchFile=re.findall(r'File\s+\:\s+(\w\d{7})',mytext)[0]
        capturedfiles.append(matchFile)

        # code to capture abstracts
        matchAbs=re.findall(r'Abstract\s+\:\s+(\w.+)',mytext)[0]
        capturedabstracts.append(matchAbs)

        # total awarded money
        matchAmt=re.findall(r'Total\s+Amt\.\s+\:\s+\$(\d+)',mytext)[0]

        if matchOrg not in orgAwards:
            orgAwards[matchOrg]=[]
        orgAwards[matchOrg].append(int(matchAmt))

for each in capturedorgs:
    print(each,"\n")
for each in capturedfiles:
    print(each,"\n")
for each in capturedabstracts:
    print (each,"\n")

# add code to print what is in your other two lists
from collections import Counter
countOrg=Counter(capturedorgs)
print (countOrg)

for each in orgAwards:
    print(each,sum(orgAwards[each]))

Error message:

Traceback (most recent call last):
  File "C:\Python32\Assignment1.py", line 17, in <module>
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
IndexError: list index out of range

【Comments】:

for filepath in matches[]:?
I was trying different things and forgot to remove the []. I've updated my code.

Tags: python list python-2.7

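【Solution 1】:

If findall finds no match, it returns an empty list []. The error occurs when you try to take the first item from that empty list, which raises your exception: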

>>> import re
>>> i = 'hello'
>>> re.findall('abc', i)
[]
>>> re.findall('abc', i)[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

To make sure your code doesn't stop when no match is found, you need to catch the exception that is raised:

try:
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
    capturedorgs.append(matchOrg)
except IndexError:
    print('No organization match for {}'.format(filepath))

You have to do this for every re.findall statement.
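
Because the same try/except has to be repeated for every pattern, one option is to wrap it in a small helper. This is only a minimal sketch, not the asker's code: first_match and its default parameter are hypothetical names introduced here, and the sample text is made up for illustration.

```python
import re

def first_match(pattern, text, default=None):
    """Return the first re.findall result, or `default` if nothing matched.

    Hypothetical helper, not part of the original post.
    """
    try:
        return re.findall(pattern, text)[0]
    except IndexError:
        # findall returned an empty list, so there is no item 0
        return default

# Made-up sample text: it has an "NSF Org" line but no "Total Amt." line
sample = "NSF Org     : DMS\nFile        : a9000006\n"

org = first_match(r'NSF\s+Org\s+\:\s+(\w+)', sample)
amount = first_match(r'Total\s+Amt\.\s+\:\s+\$(\d+)', sample)

print(org)     # DMS
print(amount)  # None
```

The caller then decides what a missing value means, for example skipping the file or printing a warning as in the except branch above.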

【Comments】:

It does work for my first 10 and first 100 files. How should I fix the code?

【Solution 2】:

The problem is here:

matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]

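Evidently, at least one of your files doesn't contain this pattern at all, so when you index item [0] it doesn't exist.

You need to handle that case.

One way is simply not to include anything when no match is found: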

for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()

        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)
        if len(matchOrg) > 0:
            capturedorgs.append(matchOrg[0])

Also, if a file may contain more than one match and you want to capture all of them, you may want to use extend(matchOrg) instead.
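
To make the difference between the two approaches concrete, here is a small sketch. The text is made up for illustration and the list names first_only and all_matches are hypothetical.

```python
import re

# Made-up text containing two organization lines, for illustration only
text = "NSF Org     : DMS\nNSF Org     : CHE\n"

found = re.findall(r'NSF\s+Org\s+\:\s+(\w+)', text)

first_only = []
if found:                       # an empty list is falsy, same idea as len(found) > 0
    first_only.append(found[0])

all_matches = []
all_matches.extend(found)       # adds nothing when found is empty, so no check is needed

print(first_only)   # ['DMS']
print(all_matches)  # ['DMS', 'CHE']
```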

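【Comments】:

Sorry, I copied your typo into my reply: matches[]. It's fixed now.
If you are running the code in the OP... you still need to remove the [0] from the end of the matchOrg= line...
If that's not what you're running... it would help if you could post the traceback.
OK, what I posted was only part of my code. When I tried to make the changes you suggested, it messed up other things. I've now posted the whole thing without applying your suggestions; please let me know what to do. Thanks a lot!
It's the same problem: every time, you do... re.findall(matcher, text)[0]. You need to remove the trailing [0], check the length, and only append (or extend) after you've verified that there actually is a 0th element...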
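
Applied to all four patterns from the question, the length check described in the last comment might look roughly like this. It is a sketch under assumptions: the PATTERNS dict, its keys, and the extract_fields function are hypothetical names, and the sample texts are made up.

```python
import re

# Patterns copied from the question; the dict keys are hypothetical labels
PATTERNS = {
    'org': r'NSF\s+Org\s+\:\s+(\w+)',
    'file': r'File\s+\:\s+(\w\d{7})',
    'abstract': r'Abstract\s+\:\s+(\w.+)',
    'amount': r'Total\s+Amt\.\s+\:\s+\$(\d+)',
}

def extract_fields(text):
    """Return (fields, missing): first match per pattern, plus unmatched names."""
    fields, missing = {}, []
    for name, pattern in PATTERNS.items():
        found = re.findall(pattern, text)   # no trailing [0] here
        if len(found) > 0:
            fields[name] = found[0]         # safe: item 0 is known to exist
        else:
            missing.append(name)
    return fields, missing

# Made-up sample texts for illustration only
complete = ("File        : a9000006\n"
            "NSF Org     : DMS\n"
            "Total Amt.  : $20000\n"
            "Abstract    : Research on text mining.\n")
incomplete = "File        : a9000006\n"

print(extract_fields(complete)[0]['org'])  # DMS
print(extract_fields(incomplete)[1])       # ['org', 'abstract', 'amount']
```

With a helper like this, the main loop can append to capturedorgs, capturedfiles and the other lists only when missing is empty, and otherwise report which file lacked which field.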