How do I print a specific part of a YAML string

【Question title】: How do I print a specific part of a YAML string
【Posted】: 2019-09-24 16:13:55
【Question description】:

My YAML database:

left:
  - title: Active Indicative
    fill: "#cb202c"
    groups:
      - "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

My Python code:

import io
import yaml

with open("C:/Users/colin/Desktop/LBot/latin3_2.yaml", 'r', encoding="utf8") as f:
    doc = yaml.safe_load(f)
txt = doc["left"][1]["groups"][1]
print(txt)

【Question comments】:

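How about extracting the text inside the [ ] from txt?
How would I do that?
This isn't YAML-specific at all. After all, you have already extracted the string you want to split out of the larger structure; it is a single string in the YAML, so turning it into a single string in Python is where the YAML parser's job ends. If you took just the string 'Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]' as input and left the YAML part out entirely, you would still have exactly the same problem (but with fewer unrelated complicating factors mixed in).
...On that point, note the minimal reproducible example guidelines: we do ask that questions be minimal, simplified as far as possible while still allowing the problem at hand to be reproduced and tested.
...That said, are you sure you want [1]s rather than [0]s when indexing? Remember that [1] is the second element of a list; [0] is the first.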
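
The indexing point in the last comment can be checked with a small sketch. It assumes the file contains only the excerpt shown in the question, and it uses a hand-written dict (the name doc is kept from the question) as a stand-in for what yaml.safe_load would return, so it runs without PyYAML or the file on disk. If the real file has more entries, [1] may be valid there.

```python
# Assumed stand-in for the result of yaml.safe_load on the excerpt above:
# a dict whose "left" key holds a list of dicts.
doc = {
    "left": [
        {
            "title": "Active Indicative",
            "fill": "#cb202c",
            "groups": [
                "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
            ],
        }
    ]
}

# Each list holds a single item here, so 0 is the only valid index.
txt = doc["left"][0]["groups"][0]
print(txt)

# Asking for index 1 requests a second item that does not exist.
try:
    doc["left"][1]
except IndexError as exc:
    print("IndexError:", exc)
```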

Tags: python yaml pyyaml

【Solution 1】:

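I don't have a PyYAML solution, but if you already have the string from the YAML file, you can use Python's re (regular expression) module to extract the text inside the [ ].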

import re

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

parts = txt.split(" | ")
print(parts)  
# ['Present', 'dūc[ō]', 'dūc[is]', 'dūc[it]', 'dūc[imus]', 'dūc[itis]', 'dūc[unt]']

pattern = re.compile(r"\[(.*?)\]")
output = []
for part in parts:
    match = pattern.search(part)
    if match:
        # group(0) is the matched part, ex. [ō]
        # group(1) is the text inside the (.*?), ex. ō
        output.append(match.group(1))
    else:
        output.append(part)

print(" | ".join(output))
# Present | ō | is | it | imus | itis | unt

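The code first splits the text into separate parts, then loops over each part, searching for the pattern [x]. If the pattern is found, the code extracts the text inside the brackets from the match object and stores it in a list. If a part does not match the pattern (e.g. 'Present'), it is added as-is.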

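Finally, all the extracted strings are joined together to rebuild the string without the brackets (and without the stems in front of them).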
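
As a more compact variant of the split / search / join steps described above, the same result can probably be reached with a single re.sub call. This is only a sketch and not the answer author's method. It assumes a stem is any run of characters other than whitespace, "|" and brackets sitting directly in front of a [ ] pair.

```python
import re

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

# Match an optional stem (characters that are not whitespace, "|" or
# brackets) followed by [ending], and replace the whole match with
# just the ending captured by group 1.
stripped = re.sub(r"[^\s|\[\]]*\[(.*?)\]", r"\1", txt)
print(stripped)
# Present | ō | is | it | imus | itis | unt
```

Parts without brackets, such as 'Present', are left untouched because the pattern requires a [ ] pair to match.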

Edit, based on a comment:

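If you only need one of the strings inside [ ], you can use the same regex pattern, but call the findall method on the whole txt, which returns a list of the matching strings in the same order in which they were found.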

import re

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

pattern = re.compile(r"\[(.*?)\]")
matches = pattern.findall(txt)
print(matches) 
# ['ō', 'is', 'it', 'imus', 'itis', 'unt']

Then just use some variable to pick one item from the list:

selected_idx = 1  # 0-based indexing, so this selects the 2nd match
print(matches[selected_idx])
# is
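
The follow-up comments ask for a 1-based choice (1 gives ō, 2 gives is). A minimal sketch of that on top of findall is shown here. The helper name pick_ending and its choice parameter are hypothetical, introduced only for illustration, and the range check is an added assumption about how out-of-range choices should be handled.

```python
import re


def pick_ending(txt, choice):
    """Return the choice-th bracketed ending in txt, counting from 1.

    Hypothetical helper: the name and the 1-based convention are
    illustrative, not part of the original answer.
    """
    matches = re.findall(r"\[(.*?)\]", txt)
    if not 1 <= choice <= len(matches):
        raise ValueError(f"choice must be between 1 and {len(matches)}")
    # Convert the 1-based choice to Python's 0-based list index.
    return matches[choice - 1]


txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
print(pick_ending(txt, 1))  # ō
print(pick_ending(txt, 2))  # is
```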

【Comments】:

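Thanks for the answer! How would I implement this so that the output is only ō or is?
To clarify: how would I implement this so that I can choose whether I want the first option, ō, or the second option, is?
Still not clear. What is the expected string output? Just ō or is? If you use my answer, you can break out of the loop once the first matching pattern is found.
So if variable == 1, the output would be ō, but if variable == 2, the output would be is
@ColinMiller I edited my answer to show how to select one specific [x].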