【Posted】:2014-04-10 17:02:07
【Question】:
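I have a list of files that I want to read, printing out information from each one. It keeps giving me the error list index out of range, and I'm not sure what's wrong. In the line for filepath in matches:, if I use matches[:10] instead, it works for the first 10 files, but I need it to handle all of them. I checked some older posts but still can't get my code to work.
re.findall worked before I split this code into pieces, so I'm not sure whether it has stopped working. Thanks.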
import re, os
topdir = r'E:\Grad\LIS\LIS590 Text mining\Part1\Part1' # Raw string (r'...') so the backslashes in the Windows path are not treated as escape sequences.
matches = []
for root, dirnames, filenames in os.walk(topdir):
    for filename in filenames:
        if filename.endswith(('.txt','.pdf')):
            matches.append(os.path.join(root, filename))
capturedorgs = []
capturedfiles = []
capturedabstracts = []
orgAwards={}
for filepath in matches:
    with open (filepath,'rt') as mytext:
        mytext=mytext.read()
        matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
        capturedorgs.append(matchOrg)
        # code to capture files
        matchFile=re.findall(r'File\s+\:\s+(\w\d{7})',mytext)[0]
        capturedfiles.append(matchFile)
        # code to capture abstracts
        matchAbs=re.findall(r'Abstract\s+\:\s+(\w.+)',mytext)[0]
        capturedabstracts.append(matchAbs)
        # total awarded money
        matchAmt=re.findall(r'Total\s+Amt\.\s+\:\s+\$(\d+)',mytext)[0]
        if matchOrg not in orgAwards:
            orgAwards[matchOrg]=[]
        orgAwards[matchOrg].append(int(matchAmt))
for each in capturedorgs:
    print(each,"\n")
for each in capturedfiles:
    print(each,"\n")
for each in capturedabstracts:
    print (each,"\n")
# add code to print what is in your other two lists
from collections import Counter
countOrg=Counter(capturedorgs)
print (countOrg)
for each in orgAwards:
    print(each,sum(orgAwards[each]))
Error message:
Traceback (most recent call last):
  File "C:\Python32\Assignment1.py", line 17, in <module>
    matchOrg=re.findall(r'NSF\s+Org\s+\:\s+(\w+)',mytext)[0]
IndexError: list index out of range
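A minimal sketch of what the traceback points to: re.findall returns an empty list when the pattern matches nothing in a file, and indexing that empty list with [0] raises exactly this IndexError. The two sample strings below are made up for illustration, not taken from the asker's files, and files_missing_field is a hypothetical helper that lists the files in which the NSF Org pattern never matches. That would be consistent with matches[:10] working while the full list fails. One plausible culprit, though unconfirmed, is the .pdf files that the os.walk loop also collects, since a PDF opened in text mode is unlikely to contain the plain-text "NSF Org :" header.

```python
import re

# The NSF Org pattern from the question, compiled once.
ORG_PATTERN = re.compile(r'NSF\s+Org\s+\:\s+(\w+)')

# Made-up sample texts: one contains the field, one does not.
with_field = "NSF Org     : DMI\n"
without_field = "This file has no award header.\n"

print(ORG_PATTERN.findall(with_field))     # ['DMI']
print(ORG_PATTERN.findall(without_field))  # []

try:
    ORG_PATTERN.findall(without_field)[0]  # same indexing as the question's code
except IndexError as exc:
    print("IndexError:", exc)              # IndexError: list index out of range


def files_missing_field(paths, pattern=ORG_PATTERN):
    """Hypothetical helper: return the paths whose text has no match for pattern."""
    missing = []
    for path in paths:
        # errors='replace' keeps undecodable bytes (e.g. from a PDF) from raising.
        with open(path, 'rt', errors='replace') as f:
            if pattern.search(f.read()) is None:
                missing.append(path)
    return missing
```

Calling files_missing_field(matches) on the question's list should show which files break the loop.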
【Discussion】:
for filepath in matches[]:?
I was trying different things and forgot to remove the []. I've updated my code.
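One way to make the loop tolerate files that lack a field, sketched under the assumption that such files should simply be skipped: use re.search, which returns None instead of an empty list, and check for that before taking the captured group. The regular expressions are the question's own; the names extract_fields and summarize are hypothetical. collections.defaultdict replaces the manual "if matchOrg not in orgAwards" check, and Counter gives the per-org counts the question computes.

```python
import re
from collections import Counter, defaultdict

# Patterns copied from the question.
PATTERNS = {
    'org': r'NSF\s+Org\s+\:\s+(\w+)',
    'file': r'File\s+\:\s+(\w\d{7})',
    'abstract': r'Abstract\s+\:\s+(\w.+)',
    'amount': r'Total\s+Amt\.\s+\:\s+\$(\d+)',
}


def extract_fields(text):
    """Return the first capture for each field, or None if any field is missing."""
    fields = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        if m is None:  # no match: skip instead of indexing an empty list
            return None
        fields[name] = m.group(1)
    return fields


def summarize(texts):
    """Aggregate award amounts per NSF Org, counting texts that lack a field."""
    org_awards = defaultdict(list)
    skipped = 0
    for text in texts:
        fields = extract_fields(text)
        if fields is None:
            skipped += 1
            continue
        org_awards[fields['org']].append(int(fields['amount']))
    counts = Counter({org: len(amts) for org, amts in org_awards.items()})
    totals = {org: sum(amts) for org, amts in org_awards.items()}
    return counts, totals, skipped
```

To use it with the question's code, read each path in matches into a string and pass the list of strings to summarize; the skipped count says how many files were missing at least one field.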
Tags: python list python-2.7