【Question title】: Configuring ruamel.yaml to allow duplicate keys
【Posted】: 2019-08-27 16:05:06
【Question】:

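I am trying to use the ruamel.yaml library to process a YAML document that contains duplicate keys. In this case the duplicate key happens to be the merge key <<:

Here is the YAML file, dupe.yml: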

foo: &ref1
  a: 1

bar: &ref2
  b: 2

baz:
  <<: *ref1
  <<: *ref2
  c: 3

Here is my script:

import ruamel.yaml

yml = ruamel.yaml.YAML()
yml.allow_duplicate_keys = True
doc = yml.load(open('dupe.yml'))

assert doc['baz']['a'] == 1
assert doc['baz']['b'] == 2
assert doc['baz']['c'] == 3

When run, it raises this error:

Traceback (most recent call last):
  File "rua.py", line 5, in <module>
    yml.load(open('dupe.yml'))
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/main.py", line 331, in load
    return constructor.get_single_data()
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 111, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 121, in construct_document
    for _dummy in generator:
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 1543, in construct_yaml_map
    self.construct_mapping(node, data, deep=True)
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 1448, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 174, in construct_object
    for _dummy in generator:
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 1543, in construct_yaml_map
    self.construct_mapping(node, data, deep=True)
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 1399, in construct_mapping
    merge_map = self.flatten_mapping(node)
  File "/usr/local/lib/python3.7/site-packages/ruamel/yaml/constructor.py", line 1350, in flatten_mapping
    raise DuplicateKeyError(*args)
ruamel.yaml.constructor.DuplicateKeyError: while constructing a mapping
  in "dupe.yml", line 8, column 3
found duplicate key "<<"
  in "dupe.yml", line 9, column 3

To suppress this check see:
   http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys

Duplicate keys will become an error in future releases, and are errors
by default when using the new API.

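How can I get ruamel.yaml to read this file correctly? The documentation says that allow_duplicate_keys = True makes the loader tolerate duplicate keys, but it does not seem to work.

I am using Python 3.7 and ruamel.yaml 0.15.90.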

【Comments】:

  • When using Python 3, I recommend using pathlib.Path as the argument for load/dump: yml.load(Path('dupe.yaml')). Also note that the recommended extension for YAML files has been .yaml since September 2006.

Tags: python yaml ruamel.yaml


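【Solution 1】:

The setting

yaml.allow_duplicate_keys = True

only works for non-merge keys in versions before 0.15.91.

In 0.15.91+ it also works for merge keys, and the merge key takes the value of the first occurrence of the key (as with non-merge keys). That means the document is loaded as if you had written: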

baz:
  <<: *ref1
  c: 3

and not as if you had written:

baz:
  <<: [*ref1, *ref2]
  c: 3

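If you need the latter behavior, you have to monkey-patch the flatten routine that handles merge keys (this affects the loading of all subsequent YAML files with double merge keys):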

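import warnings
import ruamel.yaml
# names used unqualified in my_flatten_mapping below
from ruamel.yaml.constructor import (
    ConstructorError,
    DuplicateKeyError,
    DuplicateKeyFutureWarning,
)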

yaml_str = """\
foo: &ref1
  a: 1

bar: &ref2
  b: 2

baz:
  <<: *ref1
  <<: *ref2
  c: 3

"""

def my_flatten_mapping(self, node):

    def constructed(value_node):
        # type: (Any) -> Any
        # If the contents of a merge are defined within the
        # merge marker, then they won't have been constructed
        # yet. But if they were already constructed, we need to use
        # the existing object.
        if value_node in self.constructed_objects:
            value = self.constructed_objects[value_node]
        else:
            value = self.construct_object(value_node, deep=False)
        return value

    merge_map_list = []
    index = 0
    while index < len(node.value):
        key_node, value_node = node.value[index]
        if key_node.tag == u'tag:yaml.org,2002:merge':
            if merge_map_list and not self.allow_duplicate_keys:  # double << key
                args = [
                    'while constructing a mapping',
                    node.start_mark,
                    'found duplicate key "{}"'.format(key_node.value),
                    key_node.start_mark,
                    """
                    To suppress this check see:
                       http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys
                    """,
                    """\
                    Duplicate keys will become an error in future releases, and are errors
                    by default when using the new API.
                    """,
                ]
                if self.allow_duplicate_keys is None:
                    warnings.warn(DuplicateKeyFutureWarning(*args))
                else:
                    raise DuplicateKeyError(*args)
            del node.value[index]
            # if key/values from later merge keys have preference you need
            # to insert value_node(s) at the beginning of merge_map_list
            # instead of appending
            if isinstance(value_node, ruamel.yaml.nodes.MappingNode):
                merge_map_list.append((index, constructed(value_node)))
            elif isinstance(value_node, ruamel.yaml.nodes.SequenceNode):
                for subnode in value_node.value:
                    if not isinstance(subnode, ruamel.yaml.nodes.MappingNode):
                        raise ruamel.yaml.constructor.ConstructorError(
                            'while constructing a mapping',
                            node.start_mark,
                            'expected a mapping for merging, but found %s' % subnode.id,
                            subnode.start_mark,
                        )
                    merge_map_list.append((index, constructed(subnode)))
            else:
                raise ConstructorError(
                    'while constructing a mapping',
                    node.start_mark,
                    'expected a mapping or list of mappings for merging, '
                    'but found %s' % value_node.id,
                    value_node.start_mark,
                )
        elif key_node.tag == u'tag:yaml.org,2002:value':
            key_node.tag = u'tag:yaml.org,2002:str'
            index += 1
        else:
            index += 1
    return merge_map_list

ruamel.yaml.constructor.RoundTripConstructor.flatten_mapping = my_flatten_mapping

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
data = yaml.load(yaml_str)
for k in data['baz']:
    print(k, '>', data['baz'][k])

The above gives:

c > 3
a > 1
b > 2
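
The precedence question raised by the comment in the code above (append versus insert at the beginning of merge_map_list) can be illustrated without ruamel.yaml at all. The following is only a plain-dict sketch of the two orderings, not the library's implementation: the function merge_mappings, its later_wins flag, and the extra key "shared" are hypothetical names introduced here for illustration.

```python
def merge_mappings(explicit, merged, later_wins=False):
    """Combine the mappings pulled in by merge keys with a mapping's own keys.

    explicit: keys written directly in the mapping; these always win.
    merged:   mappings referenced by the merge key(s), in document order.
    """
    # Reversing the list is the dict-level equivalent of inserting at the
    # beginning of the list instead of appending.
    order = list(reversed(merged)) if later_wins else list(merged)
    result = {}
    for mapping in order:
        for key, value in mapping.items():
            # setdefault keeps the first value seen for a key
            result.setdefault(key, value)
    result.update(explicit)
    return result


# "shared" is a hypothetical key added so the two orderings differ
ref1 = {"a": 1, "shared": "from ref1"}
ref2 = {"b": 2, "shared": "from ref2"}

first_wins = merge_mappings({"c": 3}, [ref1, ref2])
last_wins = merge_mappings({"c": 3}, [ref1, ref2], later_wins=True)
print(first_wins["shared"])
print(last_wins["shared"])
```

With the mappings from the question (no keys in common), both orderings produce the same a, b and c values; the choice only matters once the anchored mappings overlap.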

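【Comments】:

  • Thanks. I realize the original question was ambiguous, so I have updated it to clarify that the behavior I am looking for is your second snippet: essentially combining the duplicate merge keys into one.
  • @mamacdon I really wonder what program created that kind of bad YAML in the first place. I can update my answer, but from your example it is not clear which value takes precedence ([*ref1, *ref2] differs from [*ref2, *ref1] when the anchored mappings have keys in common but with different values). Which version do you need (let me know if you don't understand the difference)?
  • The YAML is hand-written. It comes from the configuration files used in our team's Concourse CI pipelines. I don't know what version of YAML it is. Concourse CI appears to use the library gopkg.in/yaml.v2 as its parser, so we will keep developing against whatever input that library allows.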
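【Solution 2】:

After reading the library source code, I found a workaround. Setting the option to None prevents the error.

yml.allow_duplicate_keys = None

A warning is still printed to the console, but it is not fatal and the program continues to run.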
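
If the remaining warning is unwanted, Python's standard warnings module can filter it by category. The sketch below uses a locally defined stand-in class and a stand-in loader function so that it runs without ruamel.yaml; both names are hypothetical. With the real library the category would presumably be the DuplicateKeyFutureWarning class referenced in the code of Solution 1, which is an assumption you should verify against your installed version.

```python
import warnings


class StandInDuplicateKeyWarning(Warning):
    """Local stand-in for illustration, NOT the ruamel.yaml warning class."""


def load_with_duplicate():
    # Hypothetical stand-in for a loader that warns about a duplicate key
    # and then carries on, as described for allow_duplicate_keys = None.
    warnings.warn(StandInDuplicateKeyWarning('found duplicate key "<<"'))
    return {"a": 1}


# Without a filter the warning is emitted, but the call still returns data.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = load_with_duplicate()

# An "ignore" filter for that category silences it; the filter added last
# is checked first, so it takes precedence over "always".
with warnings.catch_warnings(record=True) as caught_after:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=StandInDuplicateKeyWarning)
    quiet_data = load_with_duplicate()

print(len(caught), len(caught_after))
```

Using catch_warnings as a context manager keeps the filter local, so other warnings in the program are unaffected once the block exits.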
