【Question title】: Print out the character, word, and line amounts using Python
【Posted】: 2015-10-03 21:58:07
【Question】:

This is what I have so far:

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))

When executed, it returns the line count correctly but returns 0 for the word and character counts. Not sure why...

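【Comments】:

    Tags: python file character line word


    【Solution 1】:

    You can iterate over the file once and count the lines, words, and characters without seeking back to the start several times, which your approach would require because the file iterator is exhausted once the lines have been counted: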

    def stats(filename):
        ' prints the number of lines, words, and characters in file filename'
        lines = chars = 0
        words = []
        with open(filename) as infile:
            for line in infile:
                lines += 1
                words.extend(line.split())
                chars += len(line)
        print("line count:", lines)
        print("word count:", len(words))
        print("character counter:", chars)
        return len(words) > len(set(words))  # Returns True if duplicate words
    

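    Or rely on the side effect that the file position is at the end once the loop finishes (tell() reports a position in the underlying file, so it can differ from the character count for multi-byte encodings or translated line endings):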

    def stats(filename):
        ' prints the number of lines, words, and characters in file filename'
        words = []
        lines = 0  # stays 0 if the file is empty and the loop never runs
        with open(filename) as infile:
            for lines, line in enumerate(infile, 1):
                words.extend(line.split())
            chars = infile.tell()
        print("line count:", lines)
        print("word count:", len(words))
        print("character counter:", chars)
        return len(words) > len(set(words))  # Returns True if duplicate words
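
    The difference between a file position and a character count is easy to miss, so here is a minimal sketch that compares the two. It assumes a UTF-8 file and writes its own temporary sample file; the sample text and the variable names are made up for illustration and are not part of the answer above.

```python
import os
import tempfile

# Sample text with two non-ASCII letters (U+00EF and U+00E9), each of which
# takes two bytes in UTF-8.
text = "na\u00efve caf\u00e9\n"

# newline="" turns off newline translation so the result is the same on every OS.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

try:
    with open(path, encoding="utf-8", newline="") as infile:
        chars = sum(len(line) for line in infile)  # decoded characters
        offset = infile.tell()                     # position after the loop
finally:
    os.remove(path)

print("characters:", chars)
print("tell():", offset)
```

    If the two numbers differ for your data, summing len(line) as in the first version is the safer way to count characters.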
    

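    【Comments】:

      【Solution 2】:

      After readlines() the read position is at the end of the file, so the following read() calls return nothing. Call infile.seek(0) to reset the position to the beginning before reading again.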

      infile = open('data')
      lines = infile.readlines()
      infile.seek(0)
      print(lines)
      words = infile.read()
      infile.seek(0)
      
      chars = infile.read()
      infile.close()
      print("line count:", len(lines))
      print("word count:", len(words.split()))
      print("character counter:", len(chars))
      

      Output:

      line count: 2
      word count: 19
      character counter: 113
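
      To see why the original function printed 0 for words and characters, the short sketch below repeats the same sequence of calls on an in-memory file. io.StringIO is used here only as a stand-in for a real text file, and the sample text is made up.

```python
import io

# io.StringIO stands in for a real text file (an assumption for this demo).
infile = io.StringIO("one two\nthree\n")

lines = infile.readlines()  # consumes everything up to the end of the file
leftover = infile.read()    # nothing is left, so this is an empty string

infile.seek(0)              # rewind to the beginning
text = infile.read()        # the whole content is available again

print("lines:", len(lines))
print("read() right after readlines():", repr(leftover))
print("words after seek(0):", len(text.split()))
print("characters after seek(0):", len(text))
```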
      

      Another way:

      from collections import Counter
      from itertools import chain
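      # the with block closes the file once the lines have been read
      with open('data') as infile:
          lines = infile.readlines()
      cnt_lines = len(lines)

      # flatten the per-line word lists into a single list of words
      words = list(chain.from_iterable(x.split() for x in lines))
      cnt_words = len(words)

      # counts only the characters inside words (whitespace is excluded)
      cnt_chars = sum(len(word) for word in words)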
      
      # show words frequency
      print(Counter(words))
      

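      【Comments】:

      • After this, I also have to check whether the file has any duplicate words and return True or False accordingly. Do you know how to do that?
      • Why create four lists for no reason? The OP doesn't want the data, they want the counts.
      • @PadraicCunningham The OP wants to know more, not less.
      • @LetzerWille, know more about what exactly, how to write the least memory-efficient code possible? Have you heard of generators or the sum function?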
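
      As a rough illustration of the points raised in these comments, the sketch below counts words with sum and a generator expression, and checks for duplicate words with a set that stops at the first repeat. The function names are hypothetical, and both functions take any iterable of lines (an open file works) rather than a filename. The comparison is case-sensitive, which is an assumption since the thread does not say how case should be treated.

```python
def count_words(lines):
    """Count whitespace-separated words without keeping a whole-file word list."""
    return sum(len(line.split()) for line in lines)


def has_duplicate_words(lines):
    """Return True as soon as any word is seen a second time (case-sensitive)."""
    seen = set()
    for line in lines:
        for word in line.split():
            if word in seen:
                return True
            seen.add(word)
    return False


sample = ["the quick brown fox\n", "jumps over the lazy dog\n"]
print(count_words(sample))
print(has_duplicate_words(sample))
```

      With a real file, each function needs its own pass, for example `with open(filename) as f: print(count_words(f))`.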
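      【Solution 3】:

      After calling readlines you have exhausted the iterator. You could seek back to the start, but you really don't need to read the whole file into memory at all: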

      def stats(filename):
          chars, words, dupes = 0, 0, False
          seen = set()
          i = 0  # stays 0 if the file is empty
          with open(filename) as f:
              for i, line in enumerate(f, 1):
                  chars += len(line)
                  spl = line.split()
                  words += len(spl)
                  if not dupes:
                      # a repeat within this line, or a word seen on an earlier line
                      if len(set(spl)) < len(spl) or not seen.isdisjoint(spl):
                          dupes = True
                      else:
                          seen.update(spl)
          return i, chars, words, dupes
      

      Then assign the results by unpacking:

      no_lines, no_chars, no_words, has_dupes = stats("your_file")
      

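      If you don't want to include the line endings, you may want chars += len(line.rstrip()) instead; note that rstrip() with no argument strips all trailing whitespace, while rstrip("\n") removes only the newline. The code stores only the data it needs. Using readlines, read, dicts of the full data, and so on means that your code won't be very practical for large files.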
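
      The effect of these choices on the character count can be seen in a small sketch. The helper name count_chars and its keep_newlines parameter are hypothetical, and the sample lines are made up; it accepts any iterable of lines, such as an open file.

```python
def count_chars(lines, keep_newlines=True):
    """Sum line lengths, optionally ignoring the trailing newline of each line."""
    if keep_newlines:
        return sum(len(line) for line in lines)
    return sum(len(line.rstrip("\n")) for line in lines)


sample = ["ab  \n", "cd\n"]  # the first line ends with two spaces

with_newlines = count_chars(sample)
without_newlines = count_chars(sample, keep_newlines=False)
# rstrip() with no argument also drops the trailing spaces
stripped = sum(len(line.rstrip()) for line in sample)

print(with_newlines, without_newlines, stripped)
```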

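      【Comments】:

      • I originally took this approach, but the OP added a comment on another answer saying they need to return whether there are duplicate words, so I changed words to a list.
      • True, you can use a set and a flag to avoid storing unnecessary data.
      【Solution 4】: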
      File_Name = 'file.txt'
      
      line_count = 0
      word_count = 0
      char_count = 0
      
      with open(File_Name,'r') as fh:
          # This will produce a list of lines.
          # Each line of the file will be an element of the  list. 
          data = fh.readlines()
      
          # Count of  total number for list elements == total number of lines. 
          line_count = len(data)
      
          for line in data:
              word_count = word_count + len(line.split())
              char_count = char_count + len(line)
      
      print('Line Count : ' , line_count )
      print('Word Count : ', word_count)
      print('Char Count : ', char_count)
      

      【Comments】:
