【Question title】: Calculating source frequency in Python
【Posted】: 2021-01-25 15:11:24
【Question】:

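I'm new to Python and I'm trying to calculate source frequency. I have files (the sources are in tags), and for every word I want to count how many of the sources it appears in. For example, for the word "beautiful" the result would be that "beautiful" appears in 5 sources. I already have Python code that checks for a single word, but I need to do this for all the words in the files. How should I change the code?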

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()

            if ('beautiful' in text):
                f.write('The word excist in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word doen't excist in the file' + filename[:-4] + '\n')

Thanks for your help!
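A minimal sketch of the computation being asked for: the number of sources each word appears in (its document frequency). It assumes each source is a plain-text file in a single directory and that a "word" is a run of ASCII letters, digits, or apostrophes; the function names `tokenize`, `source_frequency`, and `source_frequency_from_dir` are hypothetical. The key step is turning each file into a set of words first, so a word that occurs many times in one file is still counted only once for that file.

```python
import os
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (ASCII letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def source_frequency(texts):
    """Count, for each word, how many of the given texts contain it at least once."""
    counts = Counter()
    for text in texts:
        counts.update(set(tokenize(text)))  # set(): each word counts once per source
    return counts


def source_frequency_from_dir(directory):
    """Read every regular file in `directory` and compute source frequency over them."""
    texts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as fh:
                texts.append(fh.read())
    return source_frequency(texts)


if __name__ == "__main__":
    freq = source_frequency(["A beautiful day.", "Beautiful, beautiful!", "A grey day."])
    print(freq["beautiful"])  # 2
    print(freq["day"])        # 2
    print(freq["grey"])       # 1
```

To write a report like rez.txt, one could loop over `freq.most_common()`, which yields `(word, number_of_sources)` pairs from most to least widespread, and write one line per word.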

【Comments】:

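  • In f.write('The word doen't excist in the file' + filename[:-4] + '\n') you need to escape the ', e.g. doen\'t instead of doen't.
  • If you are going to do this for a lot of very large files and performance becomes a problem, you may want to look at this answer for ideas on improving performance.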
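For the syntax error pointed out in the first comment, the apostrophe can either be escaped with a backslash or the whole string can be delimited with double quotes; both produce exactly the same string. A short illustration (the value of `filename` is a made-up example):

```python
filename = "book1.txt"  # hypothetical example file name

# Option 1: escape the apostrophe inside a single-quoted string
line1 = 'The word doesn\'t exist in the file ' + filename[:-4] + '\n'

# Option 2: use double quotes so the apostrophe needs no escaping
line2 = "The word doesn't exist in the file " + filename[:-4] + "\n"

# Option 3: an f-string, which also avoids manual concatenation
line3 = f"The word doesn't exist in the file {filename[:-4]}\n"

assert line1 == line2 == line3
print(line1, end="")  # The word doesn't exist in the file book1
```

Note that `filename[:-4]` assumes a four-character extension such as `.txt`; `os.path.splitext(filename)[0]` strips an extension of any length.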

Tags: python pandas xcode file frequency


【Answer 1】:

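As mentioned above, you need to escape the ' character. You escape it by putting a backslash (\) in front of it, like doen\'t. The code below also splits each file into words and counts how many times each word occurs in that file: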

from os import listdir

with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()
            text = text.strip().lower()
            text = text.replace(".", "").replace(",", "").replace("\"", "").replace("'", "")  # remove . , " and '
            words = text.split()  # split on any whitespace (spaces, tabs, newlines)
            count_dict = {}  # word -> number of occurrences in this file
            for each_word in words:
                if each_word in count_dict:
                    count_dict[each_word] += 1
                else:
                    count_dict[each_word] = 1
            for k in count_dict:
                f.write('The word ' + k + ' exists in the file ' + filename[:-4] + ' ' + str(count_dict[k]) + ' times\n')

#             if ('beautiful' in text):
#                 f.write('The word excist in the file ' + filename[:-4] + '\n')
#             else:
#                 f.write('The word doen\'t excist in the file' + filename[:-4] + '\n')
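The counting loop in this answer builds a per-file dictionary by hand; `collections.Counter` does the same job in one call. A sketch that keeps the answer's punctuation stripping (the helper name `count_words` is hypothetical). Like the answer, this counts occurrences inside a single file, not the number of sources a word appears in.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each word in `text` to its number of occurrences."""
    text = text.strip().lower()
    # Remove the same punctuation the answer strips: . , " '
    for ch in '.,"\'':
        text = text.replace(ch, "")
    return Counter(text.split())  # split() splits on any run of whitespace


counts = count_words('It was a beautiful day. A "beautiful" day, indeed.')
print(counts["beautiful"])  # 2
print(counts["day"])        # 2
```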

【Comments】:

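  • This only produces text as the result. What I need is a result for all the words showing how many sources each one appears in.
  • I think you should try it yourself first. But I've changed the code to fit that.
  • Thanks! Yes, it splits the words, but I already have that. What I need is the comparison to see which files they appear in, for example comparing the file that contains all the words against the other files.
  • @elle Don't forget to accept the answer. Sounds good :)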
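The follow-up comments ask which files each word appears in. One way to get that, sketched here under the assumption that the sources have already been read into a dict of file name to text (`build_index` and the sample data are hypothetical), is an inverted index: map each word to the set of files that contain it. The source frequency of a word is then the size of its set, and comparing one file against the others becomes a set check.

```python
import re
from collections import defaultdict


def build_index(sources):
    """Map each lowercase word to the set of source names whose text contains it."""
    index = defaultdict(set)
    for name, text in sources.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


sources = {
    "book1": "A beautiful morning.",
    "book2": "Beautiful, simply beautiful.",
    "book3": "A dull morning.",
}
index = build_index(sources)

# Report every word with the number of sources and their names
for word in sorted(index):
    files = sorted(index[word])
    print(f"{word}: {len(files)} source(s) -> {', '.join(files)}")

# Compare one file against the others: words of book1 that also occur elsewhere
shared = {w for w, files in index.items() if "book1" in files and len(files) > 1}
print(sorted(shared))  # ['a', 'beautiful', 'morning']
```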