Counting capitalized words in a text file with Python: answers

【Question title】: Searching for the amount of capital words in a text file Python
【Posted】: 2020-02-28 13:38:51
【Question description】:

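I need help processing a text file.

I have tried several variations of for loops. I also tried removing all whitespace and counting the letters in the file individually, as well as several variations of the strip function and different if statements.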

for character in file:
    if character.isupper():
        capital += 1
        file.readline().rstrip()
        break

print(capital)

I want the program to read every word or letter in the document and return the total number of capitalized words it contains.

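【Question discussion】:

When you write for character in file:, you are actually iterating over lines, not characters.
How do I iterate over the characters in a line?
You can use another loop to iterate over the characters.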
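
To illustrate the comments above, here is a minimal sketch. It uses io.StringIO as a stand-in for an open text file, and the sample text is made up for illustration; it is not the asker's file.

```python
import io

# io.StringIO stands in for an open text file; the content is a made-up sample.
sample = io.StringIO("Hello World\nfoo Bar\n")

# Iterating over a file object yields whole lines, not characters.
lines = list(sample)

# A nested loop walks the characters of each line.
capital_letters = 0
for line in lines:
    for character in line:
        if character.isupper():
            capital_letters += 1

print(lines)            # ['Hello World\n', 'foo Bar\n']
print(capital_letters)  # 3
```

Note that this counts capital letters, not capitalized words; counting words requires splitting each line first.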

Tags: python string file search capitalization

【Solution 1】:

If the goal is to count words that start with a capital letter, I would use the fact that booleans are a subtype of integers:

with open('my_textfile.txt', 'r') as text:
    # row.split() yields words; iterating over row directly would yield characters
    print(sum(word.istitle() for row in text for word in row.split()))
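
The summing trick works because True counts as 1 and False as 0. A small sketch on an in-memory list of lines (a hypothetical sample standing in for the file), with each row split into words:

```python
# Hypothetical sample lines standing in for the rows of a text file.
rows = ["The quick Brown fox\n", "jumps over the Lazy dog\n"]

# bool is a subclass of int, so True adds 1 and False adds 0 when summed.
is_subtype = issubclass(bool, int)

# row.split() breaks each row into whitespace-separated words.
title_words = sum(word.istitle() for row in rows for word in row.split())

print(is_subtype)   # True
print(title_words)  # 3
```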

【Discussion】:

【Solution 2】:

Suppose we have a sample file doc.txt with the following content:

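This is a Test File for identifying Capital words. I created it as an example because the question's requirements may vary. For example, should acronyms like SQL count as capital words? If no: this should result in eight capital words. If yes: this should be nine.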

If you want to count capitalized (also called title-case) words but exclude all-uppercase words such as acronyms, you can do the following:

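def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            # split() breaks the line into whitespace-separated words
            for word in line.split():
                # istitle() is False for all-uppercase words such as SQL
                if word.istitle():
                    print(word)  # show each match as it is found
                    count += 1
    return count


print(count_capital_words('doc.txt'))  # 8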

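If you also want to count all-uppercase words, you can modify the function to check only the first letter of each word. Note that filter(None, ...) ensures that word is never an empty string, which avoids the IndexError that word[0] would raise in that case. (With the no-argument line.split() used here, empty strings cannot occur, so the filter is purely defensive; it matters if you split on an explicit separator.)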

def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in filter(None, line.split()):
                if word[0].isupper():
                    count += 1
    return count


print(count_capital_words('doc.txt'))  # 9
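
As a rough illustration of when empty strings can appear and what filter(None, ...) does about them, here is a sketch on a made-up line with a double space and a trailing space:

```python
line = "Hello  World "  # made-up sample: double space and trailing space

no_sep = line.split()                    # whitespace runs collapse; no empty strings
with_sep = line.split(" ")               # explicit separator; empty strings appear
filtered = list(filter(None, with_sep))  # filter(None, ...) drops falsy items

print(no_sep)    # ['Hello', 'World']
print(with_sep)  # ['Hello', '', 'World', '']
print(filtered)  # ['Hello', 'World']
```

Indexing an empty string with [0] raises IndexError, so the filter is only needed for the explicit-separator form.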

If you have more complex requirements, you can get an iterable of words like this:

from itertools import chain


def get_words(filename):
    with open(filename, 'r') as fp:
        words = chain.from_iterable(line.split() for line in fp)
        yield from words
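
One possible way to use such a word generator is to pass in different predicates. The sketch below repeats get_words so it runs on its own; count_matching and the temporary sample file are hypothetical additions for illustration, not part of the original answer.

```python
import os
import tempfile
from itertools import chain


def get_words(filename):
    # Same generator as above: yields every whitespace-separated word in the file.
    with open(filename, 'r') as fp:
        yield from chain.from_iterable(line.split() for line in fp)


def count_matching(filename, predicate):
    # Hypothetical helper: counts the words for which predicate(word) is true.
    return sum(1 for word in get_words(filename) if predicate(word))


# Write a small made-up sample to a temporary file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Plain SQL and Python\nwith O'Reilly books\n")
    path = tmp.name

title_case = count_matching(path, str.istitle)                # Plain, Python, O'Reilly
first_upper = count_matching(path, lambda w: w[0].isupper())  # the above plus SQL
all_upper = count_matching(path, str.isupper)                 # SQL only
os.remove(path)

print(title_case, first_upper, all_upper)  # 3 4 1
```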

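【Discussion】:

【Solution 3】:

Two things:

Make sure you are iterating over characters rather than words or sentences. Add some print statements to check.
Remove the break statement inside the if block. It exits your for loop immediately, so you only ever count 1.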

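capital = 0  # initialize the counter before the loop

# file is the already-open file object from the question
for sentence in file:
    for char in sentence:
        if char.isupper():
            capital += 1

print(capital)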

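【Discussion】:

Shouldn't the iteration be over words? The OP defined the goal as: 'find the number of capital words'. For example, O'Reilly is one capitalized word that contains two capital letters. str.istitle() seems better suited to the goal.
A subtle but very important point. I agree. Iterating over words and using .istitle() would be the most appropriate approach.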
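
The difference raised in these comments can be checked with a short sketch. The extra words (SQL, McDonald) are added here only to show edge cases of str.istitle() and are not from the thread.

```python
word = "O'Reilly"

# Counting characters finds two capital letters in this single word.
upper_chars = sum(ch.isupper() for ch in word)

# istitle() treats the apostrophe as uncased, so the R after it may be uppercase.
is_title = word.istitle()

# Edge cases (illustrative): all-uppercase words and an uppercase letter
# directly after a cased letter are not title case.
acronym_is_title = "SQL".istitle()
mixed_is_title = "McDonald".istitle()

print(upper_chars)       # 2
print(is_title)          # True
print(acronym_is_title)  # False
print(mixed_is_title)    # False
```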