Print out the character, word, and line counts using Python: answers

【Question title】: Print out the character, word, and line amounts using Python
【Posted】: 2015-10-03 21:58:07
【Question description】:

This is what I have so far:

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))

When I run it, it returns the correct line count, but 0 for both the word count and the character count. I'm not sure why...

【Question discussion】:

Tags: python file character line word

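【Solution 1】:

You can iterate over the file once and count the lines, words, and characters without seeking back to the beginning several times. Your approach would need those seeks, because readlines() consumes the whole file, so the later read() calls return an empty string: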

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words

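Alternatively, rely on the side effect that the file position is at the end once the loop finishes, and use it for the character count:

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    lines = 0  # stays 0 if the file is empty and the loop never runs
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()  # position at end of file, used as the character count
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words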
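
One caveat about the infile.tell() version: for a text-mode file, tell() is documented as returning an opaque position rather than a character count, and the size of a file on disk is measured in bytes. For plain ASCII text the numbers tend to agree, but they can diverge once the file contains non-ASCII characters. The sketch below only illustrates that gap between characters and bytes; the temporary file and its contents are made up for the demonstration, and UTF-8 is an assumed encoding.

```python
import os
import tempfile

# Hypothetical sample file; newline="" disables newline translation so the
# numbers are the same on every platform.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("naïve café\n")
    path = tmp.name

# Count decoded characters by summing the length of each line.
with open(path, encoding="utf-8", newline="") as infile:
    chars = sum(len(line) for line in infile)

# Size of the file on disk, in bytes.
size_in_bytes = os.path.getsize(path)
os.remove(path)

print(chars)          # 11 characters
print(size_in_bytes)  # 13 bytes, because "ï" and "é" take two bytes each in UTF-8
```

If an exact character count matters, summing len(line) as in the first version is the safer choice.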

【Discussion】:

【Solution 2】:

After readlines() the read position is at the end of the file. Call infile.seek(0) to reset it to the beginning so the file can be read again.

infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)

chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))

Output:

line count: 2
word count: 19
character counter: 113
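
To make the cause of the original 0 counts concrete, here is a minimal sketch of the behaviour. The temporary file and its contents are invented for illustration; only read, readlines, and seek are the point.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two three\nfour five\n")
    path = tmp.name

with open(path) as infile:
    lines = infile.readlines()  # consumes the whole file
    leftover = infile.read()    # position is already at the end, so this is ''
    infile.seek(0)              # rewind to the beginning
    text = infile.read()        # now the full contents come back

os.remove(path)

print(len(lines))                    # 2
print(repr(leftover))                # ''
print(len(text.split()), len(text))  # 5 24
```

The empty leftover string is what the question's code passed to len(), which is why it reported 0 words and 0 characters.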

Another way:

from collections import Counter
from itertools import chain

# the with block closes the file once the lines have been read
with open('data') as infile:
    lines = infile.readlines()
cnt_lines = len(lines)

words = list(chain.from_iterable(x.split() for x in lines))
cnt_words = len(words)

# counts only the characters inside words (whitespace is excluded)
cnt_chars = len([c for word in words for c in word])

# show word frequencies
print(Counter(words))
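
Since Counter already holds the word frequencies, the follow-up question of whether any word is repeated can be answered from it. The sketch below is one possible way, not part of the answer: has_duplicates and its ignore_case flag are hypothetical names, and using casefold() to ignore letter case is an assumption about what the check should do.

```python
from collections import Counter

def has_duplicates(words, ignore_case=False):
    """Return True if any word occurs more than once (hypothetical helper)."""
    if ignore_case:
        # Assumption: "The" and "the" should count as the same word.
        words = [w.casefold() for w in words]
    counts = Counter(words)
    return any(n > 1 for n in counts.values())

sample = ["The", "cat", "saw", "the", "dog"]
print(has_duplicates(sample))                    # False
print(has_duplicates(sample, ignore_case=True))  # True
```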

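【Discussion】:

Then, after this, I have to check whether the file has any duplicate words, returning True or False depending on the case. Do you know how to do that?
Why create four lists for no reason? The OP doesn't want the data, he wants the counts.
@PadraicCunningham The OP wants to know more, not less.
@LetzerWille, know more about what exactly, how to write the least memory-efficient code possible? Have you heard of generators or the sum function?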

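【Solution 3】:

After calling readlines you have exhausted the iterator. You could seek back to the start, but you really don't need to read the whole file into memory at all: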

def stats(filename):
    chars, words, dupes = 0, 0, False
    seen = set()
    i = 0  # line count; stays 0 for an empty file
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if not dupes:
                # check word by word so repeats within one line are caught too
                for word in spl:
                    if word in seen:
                        dupes = True
                        break
                    seen.add(word)
    return i, chars, words, dupes

Then assign the results by unpacking:

no_lines, no_chars, no_words, has_dupes = stats("your_file")

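You may want to use chars += len(line.rstrip()) if you do not want to include the line endings; note that a bare rstrip() also removes any other trailing whitespace, so rstrip("\n") is the stricter choice. The code stores only as much data as it needs. Using readlines, read, dicts of the full data and so on means that for large files your code will not be very practical.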
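
The line-ending choice can also be made explicit. The following is a sketch of the same streaming idea that returns only the three counts; count_stats and count_newlines are hypothetical names, and io.StringIO stands in for an open file so the example runs without touching disk.

```python
import io

def count_stats(lines, count_newlines=True):
    """Return (line_count, word_count, char_count) for an iterable of lines."""
    n_lines = n_words = n_chars = 0
    for line in lines:
        n_lines += 1
        n_words += len(line.split())
        # rstrip("\n") removes only the newline, unlike a bare rstrip(),
        # which would also drop trailing spaces and tabs.
        n_chars += len(line) if count_newlines else len(line.rstrip("\n"))
    return n_lines, n_words, n_chars

# Made-up sample text; note the trailing space before the first newline.
sample = "the quick brown fox \njumps over the lazy dog\n"
print(count_stats(io.StringIO(sample)))                        # (2, 9, 45)
print(count_stats(io.StringIO(sample), count_newlines=False))  # (2, 9, 43)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, and only one line is held in memory at a time.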

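【Discussion】:

I originally took this approach, but the OP added a comment on another answer saying they need to return whether there are duplicate words, so I changed words to a list.
True, you can use a set and a flag to avoid storing unnecessary data.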

【Solution 4】:

File_Name = 'file.txt'

line_count = 0
word_count = 0
char_count = 0

with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the  list. 
    data = fh.readlines()

    # Count of  total number for list elements == total number of lines. 
    line_count = len(data)

    for line in data:
        word_count = word_count + len(line.split())
        char_count = char_count + len(line)

print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)

【Discussion】: