How to count occurrences of a word in a line from a file? Answer

[Question title]: How to count occurrences of a word in a line from file?
[Posted]: 2019-10-31 21:31:28
[Question description]:

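I have the following problem. My beginner Python course has a midterm coming up, and while I understand the other problems on the practice midterm, this one has me a little stumped. What I'm having trouble with is figuring out how to go through each word in a line and check whether it has already been seen. I find it hard to conceptualize. Here is the text of the problem: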

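Write a function named cloneLines that takes two parameters:
1. inFile, a string, the name of an input file that exists before cloneLines is called
2. outFile, a string, the name of an output file that cloneLines creates and writes to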

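The function cloneLines reads the contents of inFile line by line and writes to outFile any line that contains at least one word that occurs more than once on that line. You may assume that the input file contains only lowercase letters, spaces, and newline characters.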

For example, if the following is the content of the file william.txt:

double double toil and trouble
fire burn and caldron bubble
eye of newt and toe of frog
fillet of a fenny snake
in the caldron boil and bake
double double toil and trouble
fire burn and caldron bubble

The following function call:

inFile = 'william.txt'
outFile = 'clones.txt'
cloneLines(inFile, outFile)

should create the file clones.txt with the content:

double double toil and trouble
eye of newt and toe of frog
double double toil and trouble

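All I know how to do is open the files for reading and writing and start a for loop. Again, I'm having a hard time wrapping my head around this. Any suggestions for further reading would be very helpful. Should I split the lines I read from the file? I just need to be pointed in the right general direction.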

def cloneLines (inFile, outFile):
    inputfile = open(inFile)
    outputfile = open(outFile, 'w')

    for line in inputfile.read():
        ...
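On the question of whether to split: yes, splitting each line into words is the usual first step. A small illustration of str.split() on a line as it is read from the file (the sample line is taken from william.txt). Note that split() with no argument discards the trailing newline, while split(" ") leaves it attached to the last word.

```python
# A line as it comes out of `for line in inputfile:`, newline included.
line = "eye of newt and toe of frog\n"

print(line.split())     # ['eye', 'of', 'newt', 'and', 'toe', 'of', 'frog']
print(line.split(" "))  # ['eye', 'of', 'newt', 'and', 'toe', 'of', 'frog\n']
```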

[Question discussion]:

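Good work so far. Your next steps are: 1. test whether line contains a repeated word (look into splitting Python strings, and read up on the collections.Counter class), and 2. if the test is true, write the line to the output file.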
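Following the Counter hint, a minimal sketch of the repeated-word test. The helper name has_repeated_word is hypothetical, not part of the assignment; collections.Counter maps each word to how many times it occurs.

```python
from collections import Counter


def has_repeated_word(line: str) -> bool:
    """Return True if any word occurs more than once in `line` (hypothetical helper)."""
    # split() with no argument splits on runs of whitespace and drops the trailing '\n'.
    counts = Counter(line.split())  # e.g. {'eye': 1, 'of': 2, ...}
    return any(n > 1 for n in counts.values())


print(has_repeated_word("eye of newt and toe of frog\n"))  # True ("of" appears twice)
print(has_repeated_word("fillet of a fenny snake\n"))      # False
```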
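A simple way to check whether any word appears more than once in a line is to use a set: with words = line.split(), compare len(set(words)) with len(words). If they are equal, every word is distinct; if the set is smaller, at least one word is repeated. Note: I'm assuming from your example that the repeated word doesn't have to appear back to back, just more than once in the line.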
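The set-length comparison from this comment, sketched as a small function (has_repeated_word is a hypothetical name). A set keeps only distinct words, so it is shorter than the word list exactly when some word repeats.

```python
def has_repeated_word(line: str) -> bool:
    """Return True if some word appears more than once in `line` (hypothetical helper)."""
    words = line.split()
    # The set drops duplicates, so it is smaller only if a word was repeated.
    return len(set(words)) < len(words)


for text in ["double double toil and trouble", "fire burn and caldron bubble"]:
    print(text, "->", has_repeated_word(text))
# double double toil and trouble -> True
# fire burn and caldron bubble -> False
```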
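If you want to read the file line by line, use for line in inputfile: (iterating over inputfile.read() goes through the file one character at a time). Note that every line except possibly the last will end with a '\n' newline character, so you may want to remove it with line = line.rstrip() at the start of the loop body.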
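Putting these comments together, a minimal sketch of a complete cloneLines: it reads line by line, strips the newline, applies the set-length test, and writes matching lines back with a newline. The SAMPLE constant and the run_sample helper are illustrative additions (not part of the assignment) that rebuild william.txt in a temporary directory so the function can be tried without creating files by hand.

```python
import os
import tempfile

# The william.txt contents from the question.
SAMPLE = (
    "double double toil and trouble\n"
    "fire burn and caldron bubble\n"
    "eye of newt and toe of frog\n"
    "fillet of a fenny snake\n"
    "in the caldron boil and bake\n"
    "double double toil and trouble\n"
    "fire burn and caldron bubble\n"
)


def cloneLines(inFile: str, outFile: str) -> None:
    """Write to outFile every line of inFile in which some word occurs more than once."""
    with open(inFile) as inputfile, open(outFile, "w") as outputfile:
        for line in inputfile:                 # one line per iteration
            line = line.rstrip()               # drop the trailing '\n' (and trailing spaces)
            words = line.split()
            if len(set(words)) < len(words):   # the set is smaller iff a word repeats
                outputfile.write(line + "\n")  # put back the newline removed above


def run_sample() -> str:
    """Run cloneLines on SAMPLE inside a temporary directory and return the output text."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "william.txt")
        out_path = os.path.join(tmp, "clones.txt")
        with open(in_path, "w") as f:
            f.write(SAMPLE)
        cloneLines(in_path, out_path)
        with open(out_path) as f:
            return f.read()


print(run_sample(), end="")
```

On the sample input this prints the three lines the assignment expects in clones.txt.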

Tags: python file writing

[Solution 1]:

The following writes to the output file every line that contains the same word more than once.

import sys

class SplitStream:
    """
    This is just so you can see the contents
    of the output file
    without having to open the output file
    """
    def __init__(self, s1, s2):
        self.s1 = s1
        self.s2 = s2
    def write(self, arg):
        self.s1.write(arg)
        self.s2.write(arg)


def cloneLines(inFile:str, outFile:str):
    inFile  = str(inFile)
    outFile = str(outFile)
    with open(inFile , mode = "r") as i_f:
        with open(outFile, mode="w") as o_f:
            o_f = SplitStream(o_f, sys.stdout)
            # TODO remove `SplitStream`
            for line in i_f:
                if contains_a_word_more_than_once(line):
                    o_f.write(line)

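def contains_a_word_more_than_once(stryng):
    stryng = str(stryng)
    # split() with no argument splits on any run of whitespace and drops
    # the trailing '\n', so "trouble\n" and "trouble" compare equal and
    # double spaces do not produce empty "words"
    sep_words = stryng.split()

    # if `stryng` is...
    #     "fillet of a fenny snake\n"
    #
    # then `sep_words` is:
    #     ["fillet", "of", "a", "fenny", "snake"]

    seen = set()    # words already encountered on this line
    for word in sep_words:
        if word in seen:
            return True
        seen.add(word)
    return False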

[Discussion]: