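[Posted]: 2018-08-07 18:59:57
[Question]:
Hello Stack Overflow community! For years I have used this community for small one-off projects for work, school, and personal exploration; however, this is the first question I have posted... so go easy on me ;)
I am trying to read every file in a directory and all of its subdirectories, then accumulate the results into a single dictionary using Python. Right now the script (see below) reads all the files as intended, but the results are separate for each file. I am looking for help accumulating them into one.
Code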
import re
import os
import collections

def search(path):
    # Common words to leave out of the count.
    ignore = ['the','a','if','in','it','of','or','on','and','to']
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                words = re.findall(r'\w+', open(os.path.join(root, file)).read().lower())
                # A new Counter is built for every file, so each file prints its own result.
                counter = collections.Counter(x for x in words if x not in ignore)
                print(counter.most_common(10))
    else:
        words = re.findall(r'\w+', open(path).read().lower())
        counter = collections.Counter(x for x in words if x not in ignore)
        print(counter.most_common(10))

path = raw_input("Enter file and path")  # Python 2; use input() on Python 3
search(path)
Results
Enter file and path./dirTest
[('this', 1), ('test', 1), ('is', 1), ('just', 1)]
[('this', 1), ('test', 1), ('is', 1), ('just', 1)]
[('test', 2), ('is', 2), ('just', 2), ('this', 1), ('really', 1)]
[('test', 3), ('just', 2), ('this', 2), ('is', 2), ('power', 1),
('through', 1), ('really', 1)]
[('this', 2), ('another', 1), ('is', 1), ('read', 1), ('can', 1),
('file', 1), ('test', 1), ('you', 1)]
Desired result (example)
[('this', 5), ('another', 1), ('is', 5), ('read', 1), ('can', 1),
('file', 1), ('test', 5), ('you', 1), ('power', 1), ('through', 1),
('really', 2)]
Any guidance would be greatly appreciated!
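For reference, one possible way to get a single accumulated result is to create the `Counter` once, outside the loop over files, and call `update()` on it for every file. The following is only a minimal sketch under that assumption: the names `count_file`, `count_words`, and `IGNORE` are hypothetical and not part of the original script, and the files are assumed to be plain text readable with the default encoding.

```python
import collections
import os
import re

# Stop words copied from the ignore list in the question.
IGNORE = {'the', 'a', 'if', 'in', 'it', 'of', 'or', 'on', 'and', 'to'}


def count_file(file_path, counter):
    """Add the word counts of one file to an existing Counter (hypothetical helper)."""
    with open(file_path) as handle:
        words = re.findall(r'\w+', handle.read().lower())
    counter.update(word for word in words if word not in IGNORE)


def count_words(path):
    """Return one Counter covering a single file or a whole directory tree."""
    counter = collections.Counter()  # created once and shared by every file
    if os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for name in files:
                count_file(os.path.join(root, name), counter)
    else:
        count_file(path, counter)
    return counter
```

Calling `print(count_words('./dirTest').most_common(10))` would then print one combined list instead of one list per file. Because `Counter.update()` adds to the existing counts rather than replacing them, the totals grow as each file is read.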
[Comments]:
Tags: python word-count os.walk