How to comment out a YAML section using ruamel.yaml? Answers

[Question Title]: How to comment out a YAML section using ruamel.yaml?
[Posted]: 2017-05-07 16:37:26
[Question]:

Recently I have been trying to use ruamel.yaml to manage my docker-compose service configuration (i.e. docker-compose.yml).

I need to comment out a service block when needed and uncomment it again later. Suppose I have the following file:

version: '2'
services:
    srv1:
        image: alpine
        container_name: srv1
        volumes:
            - some-volume:/some/path
    srv2:
        image: alpine
        container_name: srv2
        volumes_from:
            - some-volume
volumes:
    some-volume:

Is there some workaround to comment out the srv2 block, producing output like the following?

version: '2'
services:
    srv1:
        image: alpine
        container_name: srv1
        volumes:
            - some-volume:/some/path
    #srv2:
    #    image: alpine
    #    container_name: srv2
    #    volumes_from:
    #        - some-volume
volumes:
    some-volume:

Also, is there a way to uncomment this block? (Assume I still hold the original srv2 block; I just need a way to remove these commented-out lines.)

[Question Discussion]:

Tags: python pyyaml ruamel.yaml

[Answer 1]:

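If srv2 is a key that is unique across all mappings in your YAML, then the "easy" way is to loop over the lines, test whether the stripped version of a line starts with srv2:, note the number of leading spaces, and comment out that line and the following lines until you encounter a line with an equal or smaller number of leading spaces. Apart from being simple and fast, this has the advantage that it can handle irregular indentation (as in your example: 4 positions before srv1 and 6 before some-volume).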
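The line-based approach described above can be sketched in plain Python as follows. This is a minimal sketch, not code from the answer: it assumes the key occurs only once in the file and that blank lines inside the block should be left untouched. The function name comment_out_key is hypothetical and not part of ruamel.yaml.

```python
def comment_out_key(yaml_text, key):
    """Comment out the line 'key:' and every following line indented deeper.

    Assumes `key` is unique in the document. The '#' is inserted at the
    column of the key line, so nested lines keep their relative indentation.
    """
    out = []
    block_indent = None  # indentation of the key line while inside its block
    for line in yaml_text.splitlines(True):
        stripped = line.lstrip(' ')
        indent = len(line) - len(stripped)
        if block_indent is not None:
            if not stripped.strip():
                out.append(line)  # blank line: keep it, stay inside the block
                continue
            if indent > block_indent:
                out.append(line[:block_indent] + '#' + line[block_indent:])
                continue
            block_indent = None  # equal or smaller indent: the block has ended
        if stripped.startswith(key + ':'):
            block_indent = indent
            out.append(line[:indent] + '#' + stripped)
        else:
            out.append(line)
    return ''.join(out)
```

Applied to the file from the question with key 'srv2', this produces exactly the desired output shown there, including the irregular 4/6-space indentation.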

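You can do the same with ruamel.yaml, but it is less simple. You have to know that when round_trip_loading, ruamel.yaml normally attaches comments to the last structure (mapping/sequence) that was processed. As a consequence, in your example commenting out srv1 gives a completely different internal result than commenting out srv2: the first key-value pair, if commented out, is handled differently from all the other key-value pairs.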

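If you normalize your expected output to an indent of four positions, add comments before srv1 for analysis purposes and load that, you can find out where the comments end up: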

from ruamel.yaml.util import load_yaml_guess_indent

yaml_str = """\
version: '2'
services:
    #a
    #b
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume:/some/path
    #srv2:
    #    image: alpine
    #    container_name: srv2
    #    volumes_from:
    #      - some-volume
volumes:
    some-volume:
"""

data, indent, block_seq_indent = load_yaml_guess_indent(yaml_str)
print('indent', indent, block_seq_indent)

c0 = data['services'].ca
print('c0:', c0)
c0_0 = c0.comment[1][0]
print('c0_0:', repr(c0_0.value), c0_0.start_mark.column)

c1 = data['services']['srv1']['volumes'].ca
print('c1:', c1)
c1_0 = c1.end[0]
print('c1_0:', repr(c1_0.value), c1_0.start_mark.column)

which prints:

indent 4 2
c0: Comment(comment=[None, [CommentToken(), CommentToken()]],
  items={})
c0_0: '#a\n' 4
c1: Comment(comment=[None, None],
  items={},
  end=[CommentToken(), CommentToken(), CommentToken(), CommentToken(), CommentToken()])
c1_0: '#srv2:\n' 4

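So you "only" have to create a comment of the first type (c0) if you comment out the first key-value pair, and of the other type (c1) if you comment out any other key-value pair. The start_mark is a StreamMark() (from ruamel/yaml/error.py), and the only attribute of that instance that matters when creating comments is column.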

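Fortunately this is a bit easier than shown above: it is not necessary to attach the comments to the "end" of the volumes value, because attaching them to the end of the srv1 value has the same effect.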
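The placement rule above can be summarized in a small helper that does not touch ruamel.yaml at all; it only decides where the commented lines would be attached and at which column. The name comment_placement is hypothetical. It mirrors the previous_key search and the StreamMark column computation in the comment_block function below, assuming every nesting level adds exactly `indent` columns.

```python
def comment_placement(keys, path, indent):
    """Decide where the commented-out lines for path[-1] would be attached.

    keys:   ordered keys of the parent mapping (the mapping at path[:-1])
    path:   list of keys leading to the block to comment out
    indent: mapping indent of the document

    Returns (previous_key, column). previous_key None means the block is the
    first pair, so the lines go into the parent's own comment (the c0 type);
    otherwise they go at the end of the previous sibling's value (the c1 type).
    """
    keys = list(keys)
    idx = keys.index(path[-1])  # raises ValueError if the key is absent
    previous_key = keys[idx - 1] if idx > 0 else None
    column = indent * (len(path) - 1)  # the only StreamMark attribute used
    return previous_key, column
```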

The comment_block function below expects a list of keys that forms the path to the element to be commented out.

import sys
from copy import deepcopy
from ruamel.yaml import round_trip_dump
from ruamel.yaml.util import load_yaml_guess_indent
from ruamel.yaml.error import StreamMark
from ruamel.yaml.tokens import CommentToken


yaml_str = """\
version: '2'
services:
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume:/some/path
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
volumes:
    some-volume:
"""


def comment_block(d, key_index_list, ind, bsi):
    parent = d
    for ki in key_index_list[:-1]:
        parent = parent[ki]
    # don't just pop the value for key_index_list[-1] that way you lose comments
    # in the original YAML, instead deepcopy and delete what is not needed
    data = deepcopy(parent)
    keys = list(data.keys())
    found = False
    previous_key = None
    for key in keys:
        if key != key_index_list[-1]:
            if not found:
                previous_key = key
            del data[key]
        else:
            found = True
    # now delete the key and its value
    del parent[key_index_list[-1]]
    if previous_key is None:
        if parent.ca.comment is None:
            parent.ca.comment = [None, []]
        comment_list = parent.ca.comment[1]
    else:
        comment_list = parent[previous_key].ca.end = []
        parent[previous_key].ca.comment = [None, None]
    # startmark can be the same for all lines, only column attribute is used
    start_mark = StreamMark(None, None, None, ind * (len(key_index_list) - 1))
    for line in round_trip_dump(data, indent=ind, block_seq_indent=bsi).splitlines(True):
        comment_list.append(CommentToken('#' + line, start_mark, None))

for srv in ['srv1', 'srv2']:
    data, indent, block_seq_indent = load_yaml_guess_indent(yaml_str)
    comment_block(data, ['services', srv], ind=indent, bsi=block_seq_indent)
    round_trip_dump(data, sys.stdout,
                    indent=indent, block_seq_indent=block_seq_indent,
                    explicit_end=True,
    )

which prints:

version: '2'
services:
    #srv1:
    #    image: alpine
    #    container_name: srv1
    #    volumes:
    #      - some-volume:/some/path
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
volumes:
    some-volume:
...
version: '2'
services:
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume:/some/path
    #srv2:
    #    image: alpine
    #    container_name: srv2      # second container
    #    volumes_from:
    #      - some-volume
volumes:
    some-volume:
...

(explicit_end=True is not necessary; it is used here to automatically separate the two YAML dumps.)

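Removing the comments can be done in a similar way. Recursively search the comment attributes (.ca) for candidates that were commented out (perhaps with a hint about where to start). Strip the leading # from those comments, concatenate them and round_trip_load the result. Based on the column of the comments you can determine where to attach the uncommented key-value pair.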
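The strip-and-concatenate step described above can be sketched without ruamel.yaml. Here Tok is a hypothetical stand-in for ruamel.yaml's CommentToken that models only value and column (the real token keeps the column in start_mark.column), and extract_commented_block is a hypothetical name. The returned text is what you would pass to round_trip_load, and start/stop tell you which entries to delete from the real comment list.

```python
from collections import namedtuple

# Hypothetical stand-in for ruamel.yaml's CommentToken: only `value` and
# `column` are modelled (the real token stores the column in start_mark.column).
Tok = namedtuple('Tok', ['value', 'column'])


def extract_commented_block(tokens, key, column):
    """Find the commented-out 'key:' block in a list of comment tokens.

    The block starts at the first token at `column` whose text (after the
    leading '#') starts with 'key:', and ends at the first later token with a
    different column. Returns (yaml_text, start, stop), or None if not found.
    """
    start = None
    stop = len(tokens)
    for idx, tok in enumerate(tokens):
        if start is None:
            if tok.column == column and tok.value.replace('#', '', 1).startswith(key + ':'):
                start = idx
        elif tok.column != column:
            stop = idx
            break
    if start is None:
        return None
    # remove only the first '#', so nested comments such as '#    # note' survive
    text = ''.join(tok.value.replace('#', '', 1) for tok in tokens[start:stop])
    return text, start, stop
```

Matching on 'key:' rather than just the key prefix avoids mistaking, say, a commented-out srv10 block for srv1.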

[Discussion]:

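My sample output is strictly indented with 4 spaces; strange that it shows 6 spaces in the browser.
@cherrot It isn't: there is an indent of 6 before some-volume:, with the dash at an offset of 4 (i.e. the block sequence indent). That is of course your way of counting, but a sequence element like - a is counted as indent 2 with offset 0. That is, the s of some-volume is 6 columns further right than the v of volumes, and that counts as an indent of 6.
@cherrot I didn't come up with this; it is a consequence of PyYAML having only one "indent" control for both mappings and sequences, with the dash not counted. I once considered splitting it into two parameters for ruamel.yaml, but ran into multiple problems. Adding block-sequence-indent is the best I can do for now.
I see. Thanks for your explanation @anthon!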
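The counting explained in this exchange can be checked with a little column arithmetic. This hypothetical helper measures a block sequence item relative to its parent key, assuming exactly one space between the dash and the content; it is only an illustration of the counting, not how ruamel.yaml computes it internally.

```python
def indent_and_offset(parent_line, item_line):
    """Return (indent, offset) of a block sequence item relative to its parent key.

    indent: columns from the parent key to the item's content
    offset: columns from the parent key to the dash (block sequence indent)
    Assumes exactly one space between the dash and the content.
    """
    parent_col = len(parent_line) - len(parent_line.lstrip(' '))
    dash_col = len(item_line) - len(item_line.lstrip(' '))
    content_col = dash_col + 2  # skip '- '
    return content_col - parent_col, dash_col - parent_col
```

For the question's file, indent_and_offset('        volumes:', '            - some-volume') gives (6, 4), while the normalized file in Answer 1 gives (4, 2), matching the "indent 4 2" printed there.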

[Answer 2]:

Here is an uncomment_block function inspired by @Anthon's answer, plus some enhancements to comment_block:

from copy import deepcopy
from ruamel.yaml import round_trip_dump, round_trip_load
from ruamel.yaml.error import StreamMark
from ruamel.yaml.tokens import CommentToken


def comment_block(root, key_hierarchy_list, indent, seq_indent):
    found = False
    comment_key = key_hierarchy_list[-1]
    parent = root
    for ki in key_hierarchy_list[:-1]:
        parent = parent[ki]
    # don't just pop the value for key_hierarchy_list[-1] that way you lose comments
    # in the original YAML, instead deepcopy and delete what is not needed
    block_2b_commented = deepcopy(parent)
    previous_key = None
    for key in parent.keys():
        if key == comment_key:
            found = True
        else:
            if not found:
                previous_key = key
            del block_2b_commented[key]

    # now delete the key and its value, but preserve its preceding comments
    preceding_comments = parent.ca.items.get(comment_key, [None, None, None, None])[1]
    del parent[comment_key]

    if previous_key is None:
        if parent.ca.comment is None:
            parent.ca.comment = [None, []]
        comment_list = parent.ca.comment[1]
    else:
        comment_list = parent[previous_key].ca.end = []
        parent[previous_key].ca.comment = [None, None]

    if preceding_comments is not None:
        comment_list.extend(preceding_comments)

    # startmark can be the same for all lines, only column attribute is used
    start_mark = StreamMark(None, None, None, indent * (len(key_hierarchy_list) - 1))
    skip = True
    for line in round_trip_dump(block_2b_commented, indent=indent, block_seq_indent=seq_indent).splitlines(True):
        if skip:
            if not line.startswith(comment_key + ':'):
                continue
            skip = False
        comment_list.append(CommentToken('#' + line, start_mark, None))

    return True  # reaching here means the key existed and was commented out


def uncomment_block(root, key_hierarchy_list, indent, seq_indent):
    '''
    FIXME: the comments may be attached to a neighbour of the parent, as in
    the following document (the srv2 block is attached to volumes, not to
    services or srv1):

    version: '2'
    services:
        srv1: foobar
        #srv2:
        #    image: alpine
        #    container_name: srv2
        #    volumes_from:
        #        - some-volume
    volumes:
        some-volume:
    '''
    found = False
    parent = root
    commented_key = key_hierarchy_list[-1]
    comment_indent = indent * (len(key_hierarchy_list) - 1)
    for ki in key_hierarchy_list[:-1]:
        parent = parent[ki]

    if parent.ca.comment is not None:
        comment_list = parent.ca.comment[1]
        found, start, stop = _locate_comment_boundary(comment_list, commented_key, comment_indent)

    if not found:
        for key in parent.keys():
            bro = parent[key]
            while hasattr(bro, 'keys') and bro.keys():
                bro = bro[bro.keys()[-1]]

            if not hasattr(bro, 'ca'):
                continue

            comment_list = bro.ca.end
            found, start, stop = _locate_comment_boundary(comment_list, commented_key, comment_indent)

    if found:
        block_str = u''
        commented = comment_list[start:stop]
        for ctoken in commented:
            block_str += ctoken.value.replace('#', '', 1)
        del(comment_list[start:stop])

        block = round_trip_load(block_str)
        parent.update(block)
    return found


def _locate_comment_boundary(comment_list, commented_key, comment_indent):
    found = False
    start_idx = 0
    stop_idx = len(comment_list)
    for idx, ctoken in enumerate(comment_list):
        if not found:
            # match 'key:' so that e.g. srv1 does not also match a commented srv10
            if ctoken.start_mark.column == comment_indent\
                    and ctoken.value.replace('#', '', 1).startswith(commented_key + ':'):
                found = True
                start_idx = idx
        elif ctoken.start_mark.column != comment_indent:
            stop_idx = idx
            break
    return found, start_idx, stop_idx


if __name__ == "__main__":
    import sys
    from ruamel.yaml.util import load_yaml_guess_indent

    yaml_str = """\
version: '2'
services:
    # 1 indent after services
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume
        # some comments
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
        # 2 indent after srv2 volume
# 0 indent before volumes
volumes:
    some-volume:
"""

    for srv in ['srv1', 'srv2']:
        # Comment a service block
        yml, indent, block_seq_indent = load_yaml_guess_indent(yaml_str)
        comment_block(yml, ['services', srv], indent=indent, seq_indent=block_seq_indent)
        commented = round_trip_dump(
            yml, indent=indent, block_seq_indent=block_seq_indent, explicit_end=True,
        )
        print(commented)

        # Now uncomment it
        yml, indent, block_seq_indent = load_yaml_guess_indent(commented)
        uncomment_block(yml, ['services', srv], indent=indent, seq_indent=block_seq_indent)

        round_trip_dump(
            yml, sys.stdout, indent=indent, block_seq_indent=block_seq_indent, explicit_end=True,
        )

Output:

version: '2'
services:
    # 1 indent after services
    #srv1:
    #    image: alpine
    #    container_name: srv1
    #    volumes:
    #      - some-volume
    #        # some comments
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
        # 2 indent after srv2 volume
# 0 indent before volumes
volumes:
    some-volume:
...

version: '2'
services:
    # 1 indent after services
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
        # 2 indent after srv2 volume
# 0 indent before volumes
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume
        # some comments
volumes:
    some-volume:
...
version: '2'
services:
    # 1 indent after services
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume
        # some comments
    #srv2:
    #    image: alpine
    #    container_name: srv2      # second container
    #    volumes_from:
    #      - some-volume
    #        # 2 indent after srv2 volume
    ## 0 indent before volumes
volumes:
    some-volume:
...

version: '2'
services:
    # 1 indent after services
    srv1:
        image: alpine
        container_name: srv1
        volumes:
          - some-volume
        # some comments
    srv2:
        image: alpine
        container_name: srv2  # second container
        volumes_from:
          - some-volume
        # 2 indent after srv2 volume
# 0 indent before volumes
volumes:
    some-volume:
...
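In the second dump above, srv1 comes back after srv2 because parent.update() appends the restored pair at the end of the mapping. One way to keep the original order is to insert the restored pair right after the key that preceded it, which can be inferred from where the comments were found: in the parent's own comment means the block was first, at the end of a sibling's value means it came right after that sibling. The plain-dict sketch below uses the hypothetical name insert_after to show the idea. ruamel.yaml's CommentedMap also has an insert(pos, key, value) method that may allow doing this in place; check it against your ruamel.yaml version.

```python
def insert_after(mapping, previous_key, key, value):
    """Return a new dict with (key, value) placed right after previous_key.

    previous_key=None puts the pair first. Assumes key is not already present.
    """
    items = list(mapping.items())
    if previous_key is None:
        pos = 0
    else:
        pos = [k for k, _ in items].index(previous_key) + 1
    items.insert(pos, (key, value))
    return dict(items)
```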

[Discussion]: