【Question title】: Flatten nested dictionary to key and joined string value
【Posted】: 2019-10-25 11:09:07
【Question】:

I need a function that flattens a nested dictionary of the following form:

dict_test = {
    "id" : "5d4c2c0fd89234260ec81",
    "Reference Number" : "JA-L800D-191",
    "entities_discovered" : {
        "OTHER_ID" : [ 
            "L800DFAG02191"
        ],
        "CODE_ID" : [ 
            "160472708",
            "276954773"
        ]
    },
    "label_field" : [ 
        "ELECTRONICS",
        "HDMI"
    ],
    "numeric_field" : [ 
        491, 
        492
    ],

}

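The function I am currently using flattens the dictionary to one dimension (key: value) as required, but it does not join the values that belong to the same key into a single string.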

def flatten(d):
    agg = {}
    def _flatten(d, prev_key=''):
        if isinstance(d, list):
            for i, item in enumerate(d):
                new_k = '%s.%s' % (prev_key, i) if prev_key else i
                _flatten(item, prev_key=new_k)
        elif isinstance(d, dict):
            for k, v in d.items():
                new_k = '%s.%s' % (prev_key, k) if prev_key else k
                _flatten(v, prev_key=new_k)
        else:
            agg[prev_key] = d

    _flatten(d)
    return agg

My current output is:

{
    "id" : "5d4c2c0fd89234260ec81",
    "Reference Number" : "JA-L800D-191",
    "entities_discovered.OTHER_ID.0" : "L800DFAG02191",
    "entities_discovered.CODE_ID.0" : "160472708",
    "entities_discovered.CODE_ID.1" : "276954773",
    "label_field.0" : "ELECTRONICS",
    "label_field.1" : "HDMI",
    "numeric_field.0" : 491, 
    "numeric_field.1" : 492
}

What I am actually looking for is something like this (values joined into a single string, separated by , or |):

{
    "id" : "5d4c2c0fd89234260ec81",
    "Reference Number" : "JA-L800D-191",
    "OTHER_ID" : "L800DFAG02191",
    "CODE_ID" : "160472708, 276954773",
    "label_field" : "ELECTRONICS, HDMI",
    "numeric_field" : "491, 492"
}

【Comments】:

  • If your list does not contain any other dict or list items, you can change the code inside the if isinstance(d, list) branch to: agg[prev_key] = ', '.join([str(i) for i in d])
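A minimal sketch of that suggestion applied to the function from the question, assuming every list holds only scalar values. The name `flatten_joined` and the `sep` parameter are illustrative and do not come from the original post. Keys keep their dotted path here (for example `nested.CODE_ID`), so on its own this does not produce the bare leaf keys shown in the desired output.

```python
def flatten_joined(d, sep=', '):
    """Flatten nested dicts; join each list into one string per key."""
    agg = {}

    def _flatten(node, prev_key=''):
        if isinstance(node, list):
            # Assumption: the list contains only scalars (no dicts or lists).
            agg[prev_key] = sep.join(str(item) for item in node)
        elif isinstance(node, dict):
            for k, v in node.items():
                new_k = '%s.%s' % (prev_key, k) if prev_key else k
                _flatten(v, prev_key=new_k)
        else:
            agg[prev_key] = node

    _flatten(d)
    return agg


# Small hypothetical input modelled on the dictionary in the question.
sample = {
    "id": "x1",
    "nested": {"CODE_ID": ["160472708", "276954773"]},
    "nums": [491, 492],
}
result = flatten_joined(sample)
print(result)
```

Dropping the `prev_key` prefix in the dict branch would give leaf-only keys, at the cost of possible collisions between equally named keys at different depths.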

Tags: python python-3.x dictionary


【Solution 1】:

You can use the built-in str.join() method to join the values together.

def do():
    dict_test = {
        "id": "5d4c2c0fd89234260ec81",
        "Reference Number": "JA-L800D-191",
        "entities_discovered": {
            "OTHER_ID": [
                "L800DFAG02191"
            ],
            "CODE_ID": [
                "160472708",
                "276954773"
            ]
        },
        "label_field": [
            "ELECTRONICS",
            "HDMI"
        ],
        "numeric_field": [
            491,
            492
        ],
    }

    new_dict = {}
    for key, value in dict_test.items():
        if isinstance(value, dict):
            for _key, _value in value.items():
                if isinstance(_value, list):
                    new_dict.update({_key: ', '.join([str(item) for item in _value])})

        elif isinstance(value, list):
            new_dict.update({key: ', '.join([str(item) for item in value])})

        else:
            new_dict.update({key: value})

    return new_dict


if __name__ == '__main__':
    print(do())

Output:

{
    'id': '5d4c2c0fd89234260ec81',
    'Reference Number': 'JA-L800D-191',
    'OTHER_ID': 'L800DFAG02191',
    'CODE_ID': '160472708, 276954773',
    'label_field': 'ELECTRONICS, HDMI',
    'numeric_field': '491, 492'
}

【Comments】:

  • I think this is too long!
  • Thanks for the feedback. I wrote it quickly and will edit it as soon as I can.
【Solution 2】:
def recursive_flatten_dict(tmp, dict_test):
    for i,v in dict_test.items():
        if isinstance(v, dict):
            recursive_flatten_dict(tmp,v)
        else:
            tmp[i] = v
    return tmp

recursive_flatten_dict({},dict_test)
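
Note that the function above lifts nested keys to the top level but leaves list values as lists, so it does not yet produce the joined strings the question asks for. A hedged variant that also joins lists might look like the sketch below. The names `flatten_and_join`, `sep`, and `out` are hypothetical, and the accumulator is created inside the function instead of being passed in by the caller.

```python
def flatten_and_join(data, sep=', ', out=None):
    """Recursively lift nested keys to the top level and join list values."""
    # Create the accumulator on the first call; reuse it in recursive calls.
    out = {} if out is None else out
    for key, value in data.items():
        if isinstance(value, dict):
            flatten_and_join(value, sep, out)
        elif isinstance(value, list):
            # str() is applied so numeric items can be joined too.
            out[key] = sep.join(map(str, value))
        else:
            out[key] = value
    return out


# Hypothetical input modelled on the dictionary in the question.
sample = {
    "Reference Number": "JA-L800D-191",
    "entities_discovered": {"OTHER_ID": ["L800DFAG02191"]},
    "numeric_field": [491, 492],
}
flat = flatten_and_join(sample)
print(flat)
```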

【Comments】:

【Solution 3】:

A simple recursion using a generator:

def flatten(d):
    for a, b in d.items():
        if isinstance(b, dict):
            yield from flatten(b)
        else:
            yield (a, b if not isinstance(b, list) else ', '.join(map(str, b)))


print(dict(flatten(dict_test)))
    

Output:

{
 'id': '5d4c2c0fd89234260ec81',
 'Reference Number': 'JA-L800D-191',
 'OTHER_ID': 'L800DFAG02191',
 'CODE_ID': '160472708, 276954773',
 'label_field': 'ELECTRONICS, HDMI',
 'numeric_field': '491, 492'
}
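
The question also mentions | as a possible separator, and flattening to leaf keys can silently overwrite a value when the same key name appears at two depths, because dict() keeps the last pair it sees. The sketch below, offered as one possible extension and not as part of the original answer, adds a `sep` parameter and raises on such collisions. The names `flatten_items` and `to_flat_dict` are hypothetical.

```python
def flatten_items(d, sep=', '):
    """Yield (leaf_key, value) pairs, joining list values with sep."""
    for key, value in d.items():
        if isinstance(value, dict):
            yield from flatten_items(value, sep)
        elif isinstance(value, list):
            yield key, sep.join(map(str, value))
        else:
            yield key, value


def to_flat_dict(d, sep=', '):
    """Build the flat dict, refusing to overwrite a key seen earlier."""
    flat = {}
    for key, value in flatten_items(d, sep):
        if key in flat:
            raise KeyError('duplicate key after flattening: %r' % key)
        flat[key] = value
    return flat


# Hypothetical input modelled on the dictionary in the question.
sample = {
    "id": "5d4c2c0fd89234260ec81",
    "entities_discovered": {"CODE_ID": ["160472708", "276954773"]},
    "numeric_field": [491, 492],
}
piped = to_flat_dict(sample, sep=' | ')
print(piped)
```

Whether a collision should raise, keep the first value, or merge both is a design choice that depends on the data; raising is simply the most conservative default.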
    

【Comments】:

【Solution 4】:
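def flatten(dict_test):
    # Work on a shallow copy so the caller's dictionary is not modified.
    flat = dict(dict_test)

    # Join the top-level list fields into comma-separated strings.
    for key in ['label_field', 'numeric_field']:
        flat[key] = ', '.join(str(c) for c in flat[key])

    # Lift each nested list to the top level and drop the wrapper key.
    for key, values in flat.pop('entities_discovered').items():
        flat[key] = ', '.join(str(c) for c in values)

    return flat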
      

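The function above does the job for the dictionary in the question, since the key names are hard-coded. I hope this is what you are looking for.

【Comments】: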
