Python: Using .isalpha() to count specific words/characters in a word count

【Question title】: Python: Using .isalpha() to count specific words/characters in a word count
【Posted】: 2019-10-21 20:43:40
【Question】:

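I wrote a function that counts occurrences of a specific word or character in a text file.

Now I want to add a condition so that the function only counts a character when it is surrounded by letters. For example, given this text in the file:

'This test is an example, this text doesn't have any meaning. It is only an example.'

If I run this text through my function and count the apostrophe ('), it returns 3. I want it to return 1 instead: only an apostrophe between two alphabetic characters should count (e.g. isn't or won't), and every other apostrophe that is not surrounded by letters, such as a single quotation mark, should be ignored.

I tried using the .isalpha() method, but I'm having trouble with the syntax.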

Tags: python count word-count isalpha

【Solution 1】:

I think a regular expression would be a better fit, but if you have to use isalpha, something like this works:

s = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
sum(s[i-1].isalpha() and s[i]=="'" and s[i+1].isalpha() for i in range(1,len(s)-1))

This returns 1.
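The one-liner above can be wrapped in a small function so that it fits the "count a given character" use case from the question. This is only a sketch: the name `count_between_letters` and its `char` parameter are made up for illustration, and `char` is assumed to be a single character.

```python
def count_between_letters(text, char="'"):
    """Count occurrences of `char` that have a letter on both sides.

    Hypothetical helper; `char` is assumed to be exactly one character.
    """
    # Skip the first and last index so text[i - 1] and text[i + 1] always exist.
    return sum(
        text[i - 1].isalpha() and text[i] == char and text[i + 1].isalpha()
        for i in range(1, len(text) - 1)
    )


sample = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(count_between_letters(sample))         # 1
print(count_between_letters("rock'n'roll"))  # 2
```

Because every position is checked on its own, two apostrophes that share a neighbouring letter (as in rock'n'roll) are both counted.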


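【Solution 2】:

If you just want to ignore the quotes that enclose the string itself, the simplest way is probably to strip them from the string before counting.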

>>> text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
>>> text.strip("'").count("'")
1

Another approach is a regular expression such as \w'\w, i.e. a word character, then ', then another word character:

>>> import re
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
1

This also correctly ignores quotes that appear inside the string:

>>> text = "Text that has a 'quote' in it."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0

But it will also miss an apostrophe that is not followed by another letter, such as a plural possessive:

>>> text = "All the houses' windows were broken."
>>> sum(1 for _ in re.finditer(r"\w'\w", text))
0
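To make the trade-off concrete, here is a small side-by-side sketch of the two approaches from this answer on the three sample strings. The function names `count_after_strip` and `count_regex` are hypothetical and exist only for this comparison.

```python
import re


def count_after_strip(text):
    # Remove quotes at the very start/end of the string, then count what is left.
    return text.strip("'").count("'")


def count_regex(text):
    # Count apostrophes that have a word character on both sides.
    return len(re.findall(r"\w'\w", text))


samples = [
    "'This test is an example, this text doesn't have any meaning. It is only an example.'",
    "Text that has a 'quote' in it.",
    "All the houses' windows were broken.",
]

for s in samples:
    print(count_after_strip(s), count_regex(s), s)
# 1 1 'This test is an example, ...'
# 2 0 Text that has a 'quote' in it.
# 1 0 All the houses' windows were broken.
```

Stripping only helps when the quotes sit at the very ends of the string, while the regex ignores quotes anywhere but cannot tell a trailing possessive apostrophe from a closing quote.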


【Solution 3】:

As xnx already pointed out, the right way to do this is with a regular expression:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

print(len(re.findall(r"[a-zA-Z]'[a-zA-Z]", text)))  # prints 1

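Here the apostrophe in the pattern is surrounded by a set of English letters, but there are many predefined character sets as well; see the documentation of the re module for details.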
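As a rough illustration of how the choice of character set changes the result, the sketch below compares three patterns. The sample strings and the `count` helper are invented for this example, and `[^\W\d_]` is a common idiom that only approximates "any Unicode letter" (it is `\w` minus decimal digits and the underscore).

```python
import re

patterns = {
    "ascii letters": r"[a-zA-Z]'[a-zA-Z]",
    "word chars": r"\w'\w",                 # letters, digits and underscore
    "unicode letters": r"[^\W\d_]'[^\W\d_]",  # \w without digits and underscore
}

# "l'\u00e9t\u00e9" is the French "l'été", written with escapes to keep the source ASCII.
samples = ["doesn't", "5'10", "l'\u00e9t\u00e9", "a_'_b"]


def count(pattern, text):
    # Hypothetical helper: number of non-overlapping matches of pattern in text.
    return len(re.findall(pattern, text))


for name, pattern in patterns.items():
    print(name, [count(pattern, s) for s in samples])
# ascii letters [1, 0, 0, 0]
# word chars [1, 1, 1, 1]
# unicode letters [1, 0, 1, 0]
```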


【Solution 4】:

You should just use a regular expression:

import re

text = "'This test is an example, this text doesn't have any meaning. It is only an example.'"

# An apostrophe with a word character (letter, digit or underscore) on each side
word_wrapped_apos = re.compile(r"\w'\w")
found = word_wrapped_apos.findall(text)
print(found)       # ["n't"]
print(len(found))  # 1

If you want to make sure no digits are included (\w also matches digits and the underscore), replace "\w" with "[A-Za-z]".
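One caveat that may matter for some texts: a pattern like [A-Za-z]'[A-Za-z] consumes the letters it matches, so two apostrophes that share a letter yield only one match, whereas the isalpha expression in Solution 1 checks each position separately. A possible workaround, sketched here with made-up variable names, is to use lookbehind and lookahead assertions so that only the apostrophe itself is consumed.

```python
import re

# Consumes the letters on both sides of the apostrophe.
consuming = re.compile(r"[A-Za-z]'[A-Za-z]")
# Only asserts that letters are present; the match is the apostrophe alone.
lookaround = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")

text = "rock'n'roll"
print(len(consuming.findall(text)))   # 1, the shared "n" is used up by the first match
print(len(lookaround.findall(text)))  # 2

sample = "'This test is an example, this text doesn't have any meaning. It is only an example.'"
print(len(lookaround.findall(sample)))  # 1
```

For ordinary English contractions both patterns give the same count, so the lookaround form is only worth the extra syntax if such adjacent apostrophes can occur in the input.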
