Calculating source frequency in Python: answers

【Question title】: Calculating source frequency in Python
【Posted】: 2021-01-25 15:11:24
【Question description】:

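I'm new to Python and I'm trying to calculate source frequency. I have a set of files (the sources, already split into tokens), and for every word I want to count how many sources it appears in. For example, if the word "beautiful" appears in 5 of the sources, the result for "beautiful" should be 5. I already have Python code that looks for a single word, but I need to do this for all the words in the files. How should I change the code?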

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()

            if ('beautiful' in text):
                f.write('The word excist in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word doen't excist in the file' + filename[:-4] + '\n')

Thanks for your help!

【Discussion】:

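In f.write('The word doen't excist in the file' + filename[:-4] + '\n') you need to escape the ' character, e.g. doen\'t instead of doen't. As written, the apostrophe ends the string early and the line is a syntax error.
If you are going to do this for a lot of very large files and performance becomes an issue, you may want to look at this answer for ideas on improving performance.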

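Tags: python pandas xcode file frequency

【Solution 1】:

As mentioned above, you need to escape the ' character. The way to escape it is to put a backslash (\) in front of it, like doen\'t.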
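To make the escaping point concrete, here is a small sketch of the two usual ways to put an apostrophe inside a string literal. The `filename` value is made up for illustration, and the message text is spelled out in full rather than copied from the question.

```python
filename = "example.txt"  # hypothetical file name, for illustration only

# Option 1: escape the apostrophe inside a single-quoted string.
escaped = 'The word doesn\'t exist in the file ' + filename[:-4] + '\n'

# Option 2: use double quotes, so the apostrophe needs no escaping.
double_quoted = "The word doesn't exist in the file " + filename[:-4] + "\n"

# Both literals produce exactly the same string.
print(escaped == double_quoted)
```

Either form works; double quotes are usually easier to read when the text itself contains apostrophes.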

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    # File names are listed from sources/books/ but opened from freqs/books/, as in the question.
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()
            text = text.strip().lower()
            text = text.replace(".", "").replace(",", "").replace("\"", "").replace("'", "")  # remove . , " and '
            words = text.split()  # split on any whitespace, including newlines
            # Count how many times each word occurs in this file.
            count_dict = {}
            for each_word in words:
                if each_word in count_dict:
                    count_dict[each_word] += 1
                else:
                    count_dict[each_word] = 1
            # Write one line per distinct word in this file.
            for k in count_dict:
                f.write('The word ' + k + ' exists in the file ' + filename[:-4] + ' ' + str(count_dict[k]) + ' times\n')

#             if ('beautiful' in text):
#                 f.write('The word excist in the file ' + filename[:-4] + '\n')
#             else:
#                 f.write('The word doen\'t excist in the file' + filename[:-4] + '\n')
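As a side note, the per-file counting loop can be written more compactly with `collections.Counter` from the standard library. This is only a sketch of the same idea: the helper name `count_words` and the sample sentence are made up for illustration and are not part of the answer above.

```python
from collections import Counter

def count_words(text):
    """Return a Counter mapping each word in `text` to its number of occurrences."""
    text = text.strip().lower()
    # Remove the same four punctuation characters as the answer: . , " '
    for ch in '.,"\'':
        text = text.replace(ch, "")
    # split() with no argument splits on any run of whitespace
    return Counter(text.split())

# Sample sentence, invented for the example.
counts = count_words('A beautiful day. "Beautiful," she said.')
print(counts["beautiful"])
```

A `Counter` returns 0 for words it has never seen, so no `if ... in ...` check is needed before looking a word up.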

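【Discussion】:

This only writes out the text as a result. What I need is one result covering all the words, showing how many sources each word appears in.
I think you should try it yourself first. But I have changed the code to accommodate that.
Thanks! Yes, it splits the words, but I already have that. What I need is just a comparison to see which files the words appear in. For example, comparing a file that contains all the words against the other files.
@elle Don't forget to accept the answer. Sounds good :)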
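What the comments describe (for each word, how many sources it appears in, and which ones) is often called document frequency. The following is a minimal sketch of one way to compute it, not the answerer's code. The function name `sources_by_word`, the UTF-8 encoding, and the three temporary demo files are all assumptions made for illustration; in practice the folder would be the asker's own directory of sources.

```python
from collections import defaultdict
from pathlib import Path
import tempfile

def sources_by_word(folder):
    """Map each word to the set of file names (without extension) that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8").lower()  # encoding is an assumption
        for ch in '.,"\'':
            text = text.replace(ch, "")
        # set() so that a word counts once per file, however often it repeats
        for word in set(text.split()):
            index[word].add(path.stem)
    return index

# Demo on three small temporary files (contents invented for the example).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("A beautiful day.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Beautiful, beautiful night.", encoding="utf-8")
    Path(tmp, "c.txt").write_text("Nothing here.", encoding="utf-8")
    index = sources_by_word(tmp)

# One line per word: the word, the number of sources, and the source names.
for word in sorted(index):
    print(word, len(index[word]), sorted(index[word]))
```

The number of sources for a word is `len(index[word])`, and the set itself answers the follow-up question of which files the word appears in. Repeating "beautiful" inside one file does not inflate the count, because each file contributes a word at most once.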