Finding uppercase acronyms in a string: answers

【Question title】: Finding uppercase acronyms in a string
【Posted】: 2023-04-09 00:54:01
【Question description】:

I am trying to find uppercase acronyms in a string. For example, if the input is "I need to see you ASAP, because YOLO, you know." it should return ["ASAP", "YOLO"].

#!/usr/bin/env python3

import string


def acronyms(s):

    s.translate(string.punctuation)
    for i, x in enumerate(s):
        while x.upper():
            print(x)
            i += 1


def main():
    print(acronyms("""I need to see you ASAP, because YOLO, you know."""))


if __name__ == "__main__":
    main()

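I tried to strip the punctuation and then loop through the string, printing each letter while it is uppercase. This results in an infinite loop. I would like to solve this with string operations, so no RegEx.

Edit:

Changed how the punctuation is removed, for efficiency.

From:

exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)

To: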

s.translate(string.punctuation)

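【Question comments】:

Tags: python string loops uppercase

【Solution 1】:

I'd like to point out a couple of things. First, your program hangs because you effectively have a while True with no break (x.upper() returns a non-empty string, which is always truthy). Second, by doing n+=1 yourself you make enumerate somewhat pointless.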

for i, x in enumerate(s):
    n+=1

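All of this can be simplified quite easily, with no enumerate needed.

import string

def acronyms(s):
    # Drop every punctuation character, then split on whitespace.
    exclude = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in exclude)
    # Keep only the words that are entirely uppercase.
    acro = [x for x in s.split() if x.isupper()]
    return acro

Output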

['I', 'ASAP', 'YOLO']

Unfortunately we do get an extra I, which is not an acronym, so one workaround is to make sure x is never a single letter before it is added.

acro = [x for x in s.split() if x.isupper() and len(x) != 1]
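
The edit in the question swaps the join for s.translate(string.punctuation). In Python 3 that call does not delete punctuation: translate expects a translation table, and its return value is thrown away because strings are immutable. A minimal sketch of a translate-based variant, assuming Python 3 and using str.maketrans to build the table (the name acronyms_translate is only for illustration):

```python
import string

# Translation table whose third argument lists the characters to delete.
_DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def acronyms_translate(s):
    # translate() returns a new string, so the result has to be assigned.
    cleaned = s.translate(_DROP_PUNCTUATION)
    # Keep all-uppercase words longer than one letter (this drops "I").
    return [word for word in cleaned.split() if word.isupper() and len(word) > 1]


print(acronyms_translate("I need to see you ASAP, because YOLO, you know."))
# ['ASAP', 'YOLO']
```

Keep in mind that deleting punctuation also merges dotted tokens, so "U.S." would come out as "US".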

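【Comments】:

Nice job building on the OP's code, but I really don't think s.split() and the exclude object are the way to go. It gets the job done, but it is a long way from robust tokenization.
@Matt Not to be that guy, but Stack Overflow isn't the place to write people's whole code for them. Give people complete answers and they won't learn from them.
I tried implementing this and some of the answers produced a tuple instead of a string, so I'll try to find a way to fix that.

【Solution 2】:

Your while loop keeps looping on the first character and never moves on to the next one.

You will also want to filter out "I", since single letters are not usually classed as acronyms.

The str.isupper() method checks a whole string rather than a single character, so I suggest you use it. It would look like this: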

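import string

def acronyms(s):
    found = []
    for word in s.split():
        # Strip surrounding punctuation so "ASAP," is reported as "ASAP".
        word = word.strip(string.punctuation)
        # Keep all-uppercase words longer than one letter (this drops "I").
        if word.isupper() and len(word) > 1:
            found.append(word)
    return found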
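
If you would rather keep the character-by-character loop from the question, the hang goes away once the for loop itself is what advances through the string. A rough sketch, assuming an acronym is a run of two or more consecutive uppercase letters (acronyms_scan is a hypothetical name, not something from the question):

```python
def acronyms_scan(s):
    found = []
    run = ""  # uppercase letters collected since the last non-uppercase character
    # The trailing space is a sentinel that flushes a run ending at the end of the string.
    for ch in s + " ":
        if ch.isupper():
            run += ch
        else:
            # A run just ended; keep it only if it is longer than one letter.
            if len(run) > 1:
                found.append(run)
            run = ""
    return found


print(acronyms_scan("I need to see you ASAP, because YOLO, you know."))
# ['ASAP', 'YOLO']
```

Because it looks at runs rather than whole words, this version also reports uppercase runs inside mixed-case words: "HEllo" yields "HE".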

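【Comments】:

【Solution 3】:

I strongly recommend nltk's excellent tokenization package, which handles edge cases and punctuation very well.

For a simplified approach that defines an acronym as a token where:

All characters are alphabetic
All characters are uppercase

the following is sufficient: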

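from nltk.tokenize import word_tokenize

# word_tokenize needs NLTK's tokenizer data, downloaded beforehand with nltk.download().
def get_acronyms(text):
    return [
        token for token in word_tokenize(text)
        # isalpha() enforces the "all alphabetic" rule; isupper() alone would let "A1" through.
        if token.isalpha() and token.isupper()
    ]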

【Comments】:

【Solution 4】:

This should work:

def acronyms(x):
  ans = []
  y = x.split(" ")
  for i in y:
    if i.isupper():
      ans += [i]
  return ans

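isupper() returns True as long as the string contains at least one uppercase letter and no lowercase letters, even if it also contains punctuation. Tokens therefore keep their punctuation (for example 'ASAP,'), and the single letter 'I' is included as well.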
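
To make the behaviour of isupper() concrete, here is a small sketch. The helper looks_like_acronym is hypothetical, and it assumes an acronym is at least two letters, all alphabetic and all uppercase, once the surrounding punctuation has been stripped:

```python
import string

# isupper() ignores characters that have no case, such as punctuation and digits.
print("ASAP,".isupper())  # True
print("Asap".isupper())   # False: contains lowercase letters
print("123".isupper())    # False: no cased characters at all


def looks_like_acronym(token):
    # Strip leading and trailing punctuation, e.g. "ASAP," becomes "ASAP".
    word = token.strip(string.punctuation)
    return len(word) > 1 and word.isalpha() and word.isupper()


text = "I need to see you ASAP, because YOLO, you know."
print([t.strip(string.punctuation) for t in text.split(" ") if looks_like_acronym(t)])
# ['ASAP', 'YOLO']
```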

【Comments】: