Counting words in a dictionary (Python): answers

【Question title】: Counting words in a dictionary (Python)
【Posted】: 2013-05-24 03:02:38
【Question】:

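I have this code, which should open a specified file, count every while loop in it, and finally print the total number of while loops in that file. I decided to convert the input file to a dictionary and then write a for loop that adds 1 to WHILE_ every time it sees the word while followed by a space, before printing WHILE_ at the end.

However, this does not seem to work, and I am at a loss as to why. Any help fixing this would be much appreciated.

This is the code I have at the moment: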

WHILE_ = 0
INPUT_ = input("Enter file or directory: ")


OPEN_ = open(INPUT_)
READLINES_ = OPEN_.readlines()
STRING_ = (str(READLINES_))
STRIP_ = STRING_.strip()
input_str1 = STRIP_.lower()


dic = dict()
for w in input_str1.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1
DICT_ = (dic)


for LINE_ in DICT_:
    if  ("while\\n',") in LINE_:
        WHILE_ += 1
    elif ('while\\n",') in LINE_:
        WHILE_ += 1
    elif ('while ') in LINE_:
        WHILE_ += 1

print ("while_loops {0:>12}".format((WHILE_)))

This is the input file I am using:

'''A trivial test of metrics
Author: Angus McGurkinshaw
Date: May 7 2013
'''

def silly_function(blah):
    '''A silly docstring for a silly function'''
    def nested():
        pass
    print('Hello world', blah + 36 * 14)
    tot = 0  # This isn't a for statement
    for i in range(10):
        tot = tot + i
        if_im_done = false  # Nor is this an if
    print(tot)

blah = 3
while blah > 0:
    silly_function(blah)
    blah -= 1
    while True:
        if blah < 1000:
            break

The output should be 2, but my code currently prints 0.

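【Comments】:

Why do you give your variables such weird, ugly names?
They are just placeholders for now.
The standard library includes a module to parse Python code.
This is too ugly to even attempt to understand. Readability matters, both for yourself and for others, especially when you are asking for help.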

Tags: python dictionary counting

【Answer 1】:

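This is a very bizarre design. You call readlines to get a list of strings, then call str on that list, which turns the whole thing into one big string: the quoted repr of each line, joined by commas and surrounded by square brackets. Then you split the result on whitespace. I have no idea why you would ever do such a thing.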
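To make that concrete, here is a minimal sketch of what str does to a list of lines. The two hard-coded lines are a stand-in for what readlines would return; they are an assumption for illustration, not the asker's file handling.

```python
# Stand-in for OPEN_.readlines(): a list of lines, each ending in a newline.
lines = ["blah = 3\n", "while blah > 0:\n"]

# str() on a list returns the list's repr, so the brackets, quotes and
# commas of the list display become part of the text.
as_text = str(lines)
print(as_text)

# The newline is no longer a newline character: it is now the two
# characters backslash and n.
has_real_newline = "\n" in as_text
has_escaped_newline = "\\n" in as_text
print(has_real_newline, has_escaped_newline)
```

Everything downstream (strip, lower, split) then operates on this display form rather than on the file's text.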

Your bizarre variable names, useless extra lines of code such as DICT_ = (dic), and so on only obfuscate things further.

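But I can explain why it doesn't work. Try printing DICT_ after all that silliness, and you'll see that the only keys that include while are while and 'while. Since neither of these matches any of the patterns you're looking for, your count ends up as 0.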
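The following sketch reproduces this on three lines taken from the sample file. The lines list stands in for the readlines result, and the names keys, while_keys and patterns are introduced here only for illustration.

```python
# Three lines from the sample input, as readlines() would return them.
lines = ["blah = 3\n", "while blah > 0:\n", "    while True:\n"]

# Same steps as the question: str() the list, strip, lowercase, split.
keys = set(str(lines).strip().lower().split())

# Keys that mention "while" at all.
while_keys = sorted(k for k in keys if "while" in k)
print(while_keys)

# The three substrings the question's loop searches for in each key.
patterns = ["while\\n',", 'while\\n",', "while "]
any_match = any(p in k for p in patterns for k in keys)
print(any_match)
```

The third pattern can never match, because splitting on whitespace guarantees that no key contains a space.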

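It's also worth noting that you add only 1 to WHILE_ even if a pattern occurs many times, because you loop over the keys and ignore the values, so your whole dictionary of counts is useless.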

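This would be a lot easier if you didn't obfuscate your strings, try to recover them, and then try to match the incorrectly recovered versions. Just do it directly.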

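While I'm at it, I'll also fix some other problems so that your code is readable, simpler, doesn't leak files, and so on. Here's a complete implementation of the logic you were trying to hack up by hand: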

import collections

filename = input("Enter file: ")
counts = collections.Counter()
with open(filename) as f:
    for line in f:
        counts.update(line.strip().lower().split())
print('while_loops {0:>12}'.format(counts['while']))

When you run this on your sample input, you correctly get 2. Extending it to handle if and for is trivial and obvious.
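One way that extension might look is sketched below. The sample_lines list is a hypothetical stand-in for the opened file, and the label names for for and if are assumptions modelled on the while_loops label.

```python
import collections

# Hypothetical stand-in for the lines of the opened file.
sample_lines = [
    "for i in range(10):\n",
    "    if i > 5:\n",
    "        while True:\n",
    "            break\n",
]

# Count every whitespace-separated token, exactly as in the code above.
counts = collections.Counter()
for line in sample_lines:
    counts.update(line.strip().lower().split())

# Report one line per keyword of interest.
for keyword, label in [("while", "while_loops"),
                       ("for", "for_loops"),
                       ("if", "if_statements")]:
    print("{0:<14}{1:>10}".format(label, counts[keyword]))
```

Because the Counter already holds every token, adding another keyword only means adding another pair to the list.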

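However, note that there's a serious problem in your logic: anything that looks like a keyword but sits in the middle of a comment or string still gets picked up. Without writing some kind of code to strip out comments and strings, there's no way around that. On your sample this means you overcount if by 1 (the comment) and for by 2 (the comment and the docstring). The obvious way of stripping them (line.partition('#')[0], and similarly for quotes) won't work. First, it's perfectly valid to have a string before an if keyword, as in "foo" if x else "bar". Second, you can't handle multiline strings this way.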
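Both failure modes are easy to reproduce. In the sketch below, the first line comes from the sample file; the second line is a made-up example (an assumption, not from the question) with a # inside a string literal.

```python
import collections

# A comment that merely mentions a keyword is counted like real code.
comment_line = "tot = 0  # This isn't a for statement"
counts = collections.Counter(comment_line.lower().split())
print(counts["for"])

# Cutting the line at the first '#' removes real code when the '#'
# is inside a string literal rather than starting a comment.
line = 'label = "#1" if first else "#2"'
code_part = line.partition("#")[0]
print(repr(code_part))

# The conditional expression's "if" has been thrown away with the rest.
found_if = "if" in code_part.split()
print(found_if)
```

So the naive approach counts a keyword that is not there in the first case and loses one that is there in the second.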

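These problems, and others like them, are why you almost certainly want a real parser. If you're only trying to parse Python code, the ast module in the standard library is the obvious way to do it. If you want to write quick and dirty parsers for a variety of different languages, try pyparsing, which is very nice and comes with some great examples.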

Here's a simple example:

import ast

filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
while_loops = sum(1 for node in ast.walk(tree) if isinstance(node, ast.While))
print('while_loops {0:>12}'.format(while_loops))

Or, more flexibly:

import ast
import collections

filename = input("Enter file: ")
with open(filename) as f:
    tree = ast.parse(f.read())
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))    
print('while_loops {0:>12}'.format(counts['While']))
print('for_loops {0:>14}'.format(counts['For']))
print('if_statements {0:>10}'.format(counts['If']))
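To check that the AST approach ignores comments and strings, the same counting can be run on an inline snippet instead of a file. The source text below is a made-up example (an assumption for illustration) in which a comment and two string literals mention keywords.

```python
import ast
import collections

# Inline source standing in for the file contents. The comment and the
# string literals mention keywords but are not statements.
source = '''
tot = 0  # This isn't a for statement
label = "while" if tot else "if"
for i in range(3):
    while True:
        if i < 2:
            break
'''

tree = ast.parse(source)
counts = collections.Counter(type(node).__name__ for node in ast.walk(tree))

# Only real statements are counted; the conditional expression on the
# second line is a separate node type, IfExp, not an If statement.
print(counts["While"], counts["For"], counts["If"], counts["IfExp"])
```

Whether a conditional expression should count as an if is a design choice; with the AST the two are distinguishable, so either answer is easy to produce.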

【Comments】:

Great answer, and a nice example of using the ast module.
@JonClements: Well, all I do with the AST is walk and type(node), so it doesn't really show off the real fun you can have with it.