Dividing a .yml file up: answers

【Question title】: Dividing a .yml file up
【Posted】: 2018-12-26 15:59:39
【Question description】:

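I need to break a .yml file down into 3 parts: a header, a work part (the part I need to edit), and a footer. The header is everything before the "Resource:" block, and the footer is everything after the resource block. I basically need code that creates 3 lists, dictionaries, strings, or whatever works, to hold these three sections of the YAML file, lets me run more code against the work part, and then joins the parts together at the end to produce a new document with the same indentation. No changes should be made to the header or the footer.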

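Note: I have looked up everything I could find about yaml parsing and the like, but I can't seem to implement the suggestions I found effectively. A solution that does not involve importing yaml would be preferred, but if you must use it, please explain what the import yaml code actually does, so I can understand what I am messing up.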

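【Question comments】:

Could you post a very simple example of the yaml you are looking at and a simple example of the output? That would make it easier for us to see what you are dealing with.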

Tags: python python-3.x parsing yaml edit

【Solution 1】:

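A file containing one or more YAML documents (in short: a YAML file, for which the recommended extension has been .yaml since September 2006) is a text file, and it can be concatenated from parts like the ones you describe. The only requirement is that you end up with a text file that is valid YAML.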
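As a minimal sketch of that idea, the snippet below joins three text fragments into one document. The helper name join_parts and the sample fragments are made up for illustration; the only thing it guards against is a fragment that lacks a trailing newline, which would otherwise fuse two lines together and could make the result invalid YAML.

```python
def join_parts(*parts):
    """Concatenate text fragments, making sure each one ends with a newline."""
    fixed = []
    for part in parts:
        if part and not part.endswith("\n"):
            part += "\n"  # avoid fusing the last line of one part with the next
        fixed.append(part)
    return "".join(fixed)


# hypothetical fragments; the header deliberately has no trailing newline
header = "# header\na: 1"
work = "Resource:\n- some text\n"
footer = '# footer\nd: "the end"\n'

document = join_parts(header, work, footer)
print(document, end="")
```

Nothing here checks that the joined text is valid YAML; that still depends on the indentation of the fragments being consistent with each other.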

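The easiest approach is of course to keep the header and footer in separate files, but since you are talking about multiple YAML files, that quickly becomes unwieldy. It is, however, always possible to do some basic parsing of the file contents.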

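Your work section starts with Resource:, and you mention three lists or dictionaries, so the root-level data structure of the YAML document has to be a collection. Either it is a mapping, in which case everything except the keys of that mapping must be indented (in theory the rest only needs to be indented more than the keys, but in practice this almost always means the keys are not indented at all), like (m.yaml):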

# header
a: 1
b:
  - 2
  - c: 3    # end of header
Resource:
# footer
c: 
d: "the end"   # really

or the root level needs to be a sequence (s.yaml):

# header
- a: 1
  b:
  - 2
  - c: 3
- 42         # end of header
- Resource:
# footer
- c: 
  d: "the end"  # really
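Which of the two layouts a file uses can be guessed from the text alone. The sketch below is only a heuristic with a made-up function name (root_kind): it assumes block-style YAML without directives or flow-style roots, skips comments, blank lines and a bare document start marker, and then looks at the first remaining line.

```python
def root_kind(text):
    """Guess whether a block-style YAML document has a mapping or a sequence root."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue  # skip blank lines, comments and a bare document start marker
        if line.startswith("- ") or line.rstrip() == "-":
            return "sequence"  # an unindented dash starts a root-level sequence item
        return "mapping"  # assumption: anything else is the first root-level key
    return "empty"


print(root_kind("# header\na: 1\nb:\n  - 2\n"))
print(root_kind("# header\n- a: 1\n- 42\n"))
```

A real parser is the only reliable way to decide this for arbitrary YAML; the heuristic is meant for files shaped like m.yaml and s.yaml.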

Both can easily be split without loading the YAML. Here is example code that does this for a file with a root-level mapping:

from pathlib import Path
from ruamel.yaml import YAML

inf = Path('m.yaml')
header = []  # list of lines
resource = [] 
footer = []
for line in inf.open():
    if not resource:
        if line.startswith('Resource:'):  # check if we are at end of the header
            resource.append(line)
            continue
        header.append(line)
        continue
    elif not footer:
        if not line.strip() or line[0] == ' ':   # blank or indented: still in the resource part
            resource.append(line)
            continue
    footer.append(line)

# you now have lists of lines for the header and the footer
# define the new data structure for the resource; this is a single key/value dict
upd_resource = dict(Resource=['some text', 'for the resource spec', {'a': 1, 'b': 2}])

# write the header lines, dump the resource lines, write the footer lines

outf = Path('out.yaml')

with outf.open('w') as out:
    out.write(''.join(header))
    yaml = YAML()
    yaml.indent(mapping=2, sequence=2, offset=0)  # the default values
    yaml.dump(upd_resource, out)
    out.write(''.join(footer))

print(outf.read_text())

This gives:

# header
a: 1
b:
  - 2
  - c: 3    # end of header
Resource:
- some text
- for the resource spec
- a: 1
  b: 2
# footer
c: 
d: "the end"   # really
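The same text-based split can be sketched for a file whose root is a sequence, such as s.yaml. This is an illustration under assumptions, not part of the original answer: the function name split_sequence_root is hypothetical, the work item is assumed to start with an unindented "- Resource:" line, and every following line that begins with a space is assumed to belong to it. It uses only the standard library and keeps all three parts as lists of lines.

```python
def split_sequence_root(lines, key="Resource"):
    """Split lines into (header, work, footer) around the item '- <key>:'."""
    header, work, footer = [], [], []
    marker = "- " + key + ":"
    state = "header"
    for line in lines:
        if state == "header":
            if line.startswith(marker):
                state = "work"
                work.append(line)
            else:
                header.append(line)
        elif state == "work" and line.startswith(" "):
            work.append(line)  # indented lines still belong to the work item
        else:
            state = "footer"  # first unindented line after the work item
            footer.append(line)
    return header, work, footer


sample = '''\
# header
- a: 1
  b:
  - 2
  - c: 3
- 42         # end of header
- Resource:
# footer
- c:
  d: "the end"  # really
'''

lines = sample.splitlines(keepends=True)
header, work, footer = split_sequence_root(lines)

# replace the work part with hand-written lines; header and footer stay untouched
new_work = ["- Resource:\n", "  - some text\n", "  - for the resource spec\n"]
result = "".join(header + new_work + footer)
print(result, end="")
```

Because the header and footer are never parsed, their comments and indentation survive byte for byte. The price is that an unindented comment or blank line inside the work item would be treated as the start of the footer.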

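Doing the same while parsing the YAML file is not difficult. The following handles both cases automatically (whether the root level is a mapping or a sequence):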

from pathlib import Path
from ruamel.yaml import YAML

inf = Path('s.yaml')
upd_resource_val = ['some text', 'for the resource spec', {'a': 1, 'b': 2}]
outf = Path('out.yaml')


yaml = YAML()  # YAML was imported directly, so the ruamel.yaml prefix is not available
yaml.indent(mapping=2, sequence=2, offset=0)
yaml.preserve_quotes = True
data = yaml.load(inf)
if isinstance(data, dict):
    data['Resource'] = upd_resource_val
else:  # assume a list, 
    for item in data:  # search for the item which has as value a dict with key Resource
        try:
            if 'Resource' in item:
                item['Resource'] = upd_resource_val
                break
        except TypeError:
            pass
yaml.dump(data, outf)

This creates the following out.yaml:

# header
- a: 1
  b:
  - 2
  - c: 3
- 42         # end of header
- Resource:
  - some text
  - for the resource spec
  - a: 1
    b: 2
# footer
- c:
  d: "the end"  # really

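If the m.yaml file is used as input, the output will be the same as that of the text-based "concatenation" example code, apart from any whitespace that the round trip normalizes (such as the indentation of nested sequences).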
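Since the requirement was that the header and footer stay unchanged, it can be worth comparing the input and the output after either approach. The following is a small sketch using difflib from the standard library; the helper name changed_lines and the two sample texts are made up for illustration.

```python
import difflib


def changed_lines(before, after):
    """Return the lines that were removed ('- ') or added ('+ ') between two texts."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# hypothetical input and output of an edit that only touches the work part
before = '# header\na: 1\nResource:\n# footer\nd: "the end"\n'
after = '# header\na: 1\nResource:\n- some text\n# footer\nd: "the end"\n'

changed = changed_lines(before, after)
print(changed)
```

If anything from the header or footer shows up in the result, the edit touched more than the work part.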

【Discussion】:

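Of course, if you know that all of your files have a mapping at the root, or that all of them have a sequence, you can simplify the last example by removing the root-level type check.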