【Question Title】: Merge YAML files with overriding values in list elements
【Posted】: 2019-10-22 08:53:20
【Question Description】:

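I want to merge two YAML files that contain list elements: (A) and (B) should be merged into a new file (C).

If a list entry from (A) is also defined in (B), I want (B)'s values to override its existing attribute values.

If an attribute is not defined in (A) but is defined in (B), I want to add it to the list entry.

I also want to add any list entries from (B) that don't exist in (A).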

YAML file A:

list:
  - id: 1
    name: "name-from-A"
  - id: 2
    name: "name-from-A"

YAML file B:

list:
  - id: 1
    name: "name-from-B"
  - id: 2
    title: "title-from-B"
  - id: 3
    name: "name-from-B"
    title: "title-from-B"

The merged YAML file (C) that I want to produce:

list:
  - id: 1
    name: "name-from-B"
  - id: 2
    name: "name-from-A"
    title: "title-from-B"
  - id: 3
    name: "name-from-B"
    title: "title-from-B"
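The three rules above, applied to files A and B, amount to a merge keyed on `id`. A minimal sketch of just the merge step, assuming the files have already been parsed into plain Python lists of dicts (the name `merge_by_id` is hypothetical and not part of any library):

```python
def merge_by_id(list_a, list_b, key="id"):
    """Merge list_b into a copy of list_a, pairing entries by `key`.

    - entry in both lists: B's values override A's, B's new attributes are added
    - entry only in B: appended at the end, in B's order
    """
    merged = [dict(item) for item in list_a]      # copy, so A is not modified
    index = {item[key]: item for item in merged}  # id -> entry of the result
    for item in list_b:
        if item[key] in index:
            index[item[key]].update(item)         # override / add attributes
        else:
            new_item = dict(item)
            merged.append(new_item)
            index[item[key]] = new_item           # later duplicates in B merge here
    return merged


a = [{"id": 1, "name": "name-from-A"}, {"id": 2, "name": "name-from-A"}]
b = [
    {"id": 1, "name": "name-from-B"},
    {"id": 2, "title": "title-from-B"},
    {"id": 3, "name": "name-from-B", "title": "title-from-B"},
]
print(merge_by_id(a, b))
```

Looking entries up through a dict avoids rescanning A's list for every entry of B; the result keeps A's order and appends B-only entries at the end, which is the layout of file C. Loading and dumping the files is left to whichever YAML library is used.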

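I need this functionality in a Bash script, but Python is available in the environment.

Is there a standalone YAML processor (like yq) that can do this?

How can I implement something similar in a Python script?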

【Question Discussion】:

  • What have you tried so far? Show us some code!

Tags: python bash merge yaml


【Solution 1】:

You can do it with the ruamel.yaml Python package.

If you already have Python installed, run the following command in a terminal:

pip install ruamel.yaml

Python code adapted from here (tested, works fine):

import ruamel.yaml
yaml = ruamel.yaml.YAML()

# Load the yaml files.
# Note: test1.yaml holds the overriding values (file B in the question) and
# test2.yaml is the base document (file A); the result is built on test2.yaml.
with open('/test1.yaml') as fp:
    data = yaml.load(fp)
with open('/test2.yaml') as fp:
    data1 = yaml.load(fp)
# ids of entries that exist in both files
merged = set()

# Merge each entry of test1.yaml's 'list' into the test2.yaml entry with the same 'id'
for i in data1['list']:
    for j in data['list']:
        # same 'id': test1.yaml's values override, its new attributes are added
        if i['id'] == j['id']:
            i.update(j)
            merged.add(i['id'])

# Append entries that only exist in test1.yaml
for j in data['list']:
    if j['id'] not in merged:
        data1['list'].append(j)

# Create a new file with the merged yaml
with open('/merged.yaml', 'w') as yaml_file:
    yaml.dump(data1, yaml_file)

【Discussion】:

    【Solution 2】:

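    You can merge the YAML files passed on the command line. Files are merged in the order given, so later files override earlier ones (pass A before B):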

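    import sys
    import yaml
    
    def merge_dict(m_list, s):
        # Merge entry s into the first entry of m_list with the same 'id':
        # s's values override existing ones and new attributes are added
        for m in m_list:
            if m['id'] == s['id']:
                m.update(s)
                return
        # No entry with that 'id' yet: append it
        m_list.append(s)
    
    # Usage (script name is just an example): python merge.py a.yaml b.yaml > c.yaml
    merged_list = []
    for f in sys.argv[1:]:
        with open(f) as s:
            for source in yaml.safe_load(s)['list']:
                merge_dict(merged_list, source)
    
    print(yaml.dump({'list': merged_list}), end='')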
    

    Result:

    list:
    - id: 1
      name: name-from-B
    - id: 2
      name: name-from-A
      title: title-from-B
    - id: 3
      name: name-from-B
      title: title-from-B
    

    【Discussion】:

      【Solution 3】:

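      Based on the answers (thanks, everyone), I created a solution that handles all the merge functionality I currently need in a fairly generic way (I need to use it on many different kinds of Kubernetes descriptors).

      It is based on ruamel.yaml.

      It handles multi-level lists, and instead of merging list elements by index it pairs them by an identifying attribute; elements of lists without a configured identifier are simply appended.

      It is more complex than I had hoped, because it has to walk the YAML tree.

      The script and core methods: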

      import ruamel.yaml
      from ruamel.yaml.comments import CommentedMap, CommentedSeq
      
      
      #
      # Merges a node from B with its pair in A
      #
      # If the node exists in both A and B, it will merge
      # all children in sync
      #
      # If the node only exists in A, it will do nothing.
      #
      # If the node only exists in B, it will add it to A and stop
      #
      # attrPath DOES NOT include attrName
      #
      def mergeAttribute(parentNodeA, nodeA, nodeB, attrName, attrPath):
      
          # If the node is missing (or null) in B, there is nothing to merge:
          # A is left untouched, as documented above
          if nodeB is None:
              return
      
          # If NodeA is None but NodeB has value, we simply set it in A
          if (nodeA is None) and (parentNodeA is not None):
              parentNodeA[attrName] = nodeB
              return
      
          if attrPath == '':
              attrPath = attrName
          else:
              attrPath = attrPath + '.' + attrName
      
          if isinstance(nodeB, CommentedSeq) and isinstance(nodeA, CommentedSeq):
      
              # The attribute is a list in both files, we need to merge specially
              mergeList(nodeA, nodeB, attrPath)
      
          elif isinstance(nodeB, CommentedMap) and isinstance(nodeA, CommentedMap):
      
              # A simple object to be merged
              mergeObject(nodeA, nodeB, attrPath)
      
          else:
              # Primitive type (or A and B have different types): B overwrites A
              parentNodeA[attrName] = nodeB
      
      
      #
      # Iterates over the attributes of nodeB and merges each into nodeA
      #
      def mergeObject(nodeA, nodeB, attrPath):
      
          for attrName in nodeB:
      
              # The attribute always exists in B but may be missing from A
              subNodeA = nodeA.get(attrName)
              subNodeB = nodeB[attrName]
      
              mergeAttribute(nodeA, subNodeA, subNodeB, attrName, attrPath)
      
      
      #
      # Merges two lists by properly identifying each item in both lists
      # (using the merge-directives).
      #
      # If an item of listB is identified in listA, it will be merged onto the item
      # of listA
      #
      def mergeList(listA, listB, attrPath):
      
          # Iterating the list from B
          for itemInB in listB:
      
              itemInA = findItemInList(listA, itemInB, attrPath)
      
              if itemInA is None:
                  listA.append(itemInB)
                  continue
      
              # Present in both, we need to merge them
              mergeObject(itemInA, itemInB, attrPath)
      
      
      #
      # Finds an item in the list by using the appropriate ID field defined for that
      # attribute-path.
      #
      # If there is no id attribute defined for the list, it returns None
      #
      def findItemInList(listA, itemB, attrPath):
      
          if attrPath not in listsWithId:
              # No id field defined for the list, only "dumb" merging is possible
              return None
      
          # Finding out the name of the id attribute in the list items
          idAttrName = listsWithId[attrPath]
      
          # Reading the id of the item from B; items without it cannot be paired
          # (and are therefore appended by mergeList)
          idB = itemB.get(idAttrName)
          if idB is None:
              return None
      
          # Looking for the item by its ID
          for itemA in listA:
      
              if itemA.get(idAttrName) == idB:
                  return itemA
      
          return None
      
      # ------------------------------------------------------------------------------
      
      
      yaml = ruamel.yaml.YAML()
      
      # Load the merge directives
      with open('merge-directives.yaml') as fp:
          mergeDirectives = yaml.load(fp)
      
      listsWithId = mergeDirectives['lists-with-id']
      
      # Load the yaml files
      with open('a.yaml') as fp:
          dataA = yaml.load(fp)
      
      with open('b.yaml') as fp:
          dataB = yaml.load(fp)
      
      mergeObject(dataA, dataB, '')
      
      # create a new file with the merged yaml
      with open('c.yaml', 'w') as fp:
          yaml.dump(dataA, fp)
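The core idea of this script (recurse through maps, pair list items through an id attribute configured per dotted path, append everything else, let B's scalars win) can also be illustrated without ruamel.yaml. The following is a simplified sketch on plain dicts and lists, not the author's code; `deep_merge` and its parameters are hypothetical, and plain dicts do not keep YAML comments the way ruamel.yaml's round-trip types do.

```python
def deep_merge(a, b, lists_with_id, path=""):
    """Recursively merge b into a (in place) and return the merged value.

    lists_with_id maps a dotted path (e.g. "list.sub-list") to the name of the
    attribute that identifies the items of the list at that path.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key, b_val in b.items():
            child_path = f"{path}.{key}" if path else str(key)
            if key in a:
                a[key] = deep_merge(a[key], b_val, lists_with_id, child_path)
            else:
                a[key] = b_val  # only in B: add it to A
        return a

    if isinstance(a, list) and isinstance(b, list):
        id_attr = lists_with_id.get(path)
        for b_item in b:
            match = None
            if id_attr is not None and isinstance(b_item, dict) and id_attr in b_item:
                match = next(
                    (x for x in a
                     if isinstance(x, dict) and x.get(id_attr) == b_item[id_attr]),
                    None,
                )
            if match is None:
                a.append(b_item)  # new or unidentifiable item: append
            else:
                # items of a list share the list's path, as in the script above
                deep_merge(match, b_item, lists_with_id, path)
        return a

    return b  # scalars, or A and B have different types: B wins
```

For example, `deep_merge(dataA, dataB, {"list": "id", "list.sub-list": "id"})` would merge the question's files once they are loaded into plain Python structures.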
      

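      The helper configuration file (merge-directives.yaml) specifies how to identify the elements of (even multi-level) lists.

      For the data structure in the original question only the 'list: "id"' entry is needed, but I added a few other keys to demonstrate usage.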

      #
      # Lists that contain identifiable elements.
      #
      # Each sub-key is a property path denoting the list element in the YAML 
      # data structure.
      #
      # The value is the name of the attribute in the list element that
      # identifies the list element so that pairing can be made.
      #
      lists-with-id:
          list: "id"
          list.sub-list: "id"
          a.listAttrShared: "name"
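The keys under `lists-with-id` have to match the paths the script builds while walking the tree: map keys are joined with `.`, and the items of a list share the list's own path. To find candidate keys for a new descriptor, a small helper (hypothetical, not part of the original answer) could print the dotted path of every list in an already-parsed document using the same rule:

```python
def list_paths(node, path=""):
    """Yield the dotted path of every list in a parsed YAML document."""
    if isinstance(node, dict):
        for key, value in node.items():
            # map keys extend the path, like attrPath in the script
            yield from list_paths(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        yield path
        for item in node:
            # list items add no path segment of their own
            yield from list_paths(item, path)


def candidate_list_paths(doc):
    """Unique list paths, in document order."""
    return list(dict.fromkeys(list_paths(doc)))
```

Each reported path whose items carry an identifying attribute can then be added under `lists-with-id` together with that attribute's name.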
      

      It hasn't been tested extensively, but here are two test files that are more complete than the ones in the original question.

      a.yaml:

      a:
          attrShared: value-from-a
          listAttrShared:
              - name: a1
              - name: a2
          attrOfAOnly: value-from-a
      list:
          - id: 1
            name: "name-from-A"
            sub-list:
                - id: s1
                  name: "name-from-A"
                  comments: "doesn't exist in B, so left untouched"
                - id: s2
                  name: "name-from-A"
            sub-list-with-no-identification:
                - "comment 1"
                - "comment 2"
          - id: 2
            name: "name-from-A"
      
      

      b.yaml:

      a:
          attrShared: value-from-b
          listAttrShared:
              - name: b1
              - name: b2
          attrOfBOnly: value-from-b
      list:
          - id: 1
            name: "name-from-B"
            sub-list:
                - id: s2
                  name: "name-from-B"
                  title: "title-from-B"
                  comments: "overwrites name in A with name in B + adds title from B"
                - id: s3
                  name: "name-from-B"
                  comments: "only exists in B so added to A's list"
            sub-list-with-no-identification:
                - "comment 3"
                - "comment 4"
          - id: 2
            title: "title-from-B"
          - id: 3
            name: "name-from-B"
            title: "title-from-B"
      

      【Discussion】:
