【Question title】: Flattened dictionary into nested dictionary of dictionaries of lists
【Posted】: 2020-02-29 05:04:49
【Question】:

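I can't seem to figure out how to implement this efficiently. Given specific keys as input, I want to nest a list of flat dictionaries into a dictionary of dictionaries of lists. I'm trying hard to learn.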

Given that my data looks like this:

data= [
  {
    "player": "Kevin Durant",
    "team": "Thunder",
    "location": "Oklahoma City",
    "points": 15

  },
  {
    "player": "Jeremy Lin",
    "team": "Lakers",
    "location": "Los Angeles",
    "points": 22
  },
  {
    "player": "Kobe Bryant",
    "team": "Lakers",
    "location": "Los Angeles",
    "points": 51
  },
  {
    "player": "Blake Griffin",
    "team": "Clippers",
    "location": "Los Angeles",
    "points": 26
  }
]

If I call something like reorder(data, ['location', 'team', 'player']), for example, I would like it to return something like this:

result={
  "Los Angeles": {
    "Clippers": {
      "Blake Griffin": [
        {
          "points": 26
        }
      ]
    },
    "Lakers": {
      "Kobe Bryant": [
        {
          "points": 51
        }
      ],
      "Jeremy Lin": [
        {
          "points": 22
        }
      ]
    }
  },
  "Oklahoma City": {
    "Thunder": {
      "Kevin Durant": [
        {
          "points": 15
        }
      ]
    }
  }, 
}

【Question comments】:

    Tags: python json dictionary


    【Solution 1】:

    You can use the dict method setdefault to build the nesting levels automatically as you go through the data:

    data= [
      {
        "player": "Kevin Durant",
        "team": "Thunder",
        "location": "Oklahoma City",
        "points": 15
    
      },
      {
        "player": "Jeremy Lin",
        "team": "Lakers",
        "location": "Los Angeles",
        "points": 22
      },
      {
        "player": "Kobe Bryant",
        "team": "Lakers",
        "location": "Los Angeles",
        "points": 51
      },
      {
        "player": "Blake Griffin",
        "team": "Clippers",
        "location": "Los Angeles",
        "points": 26
      }
    ]
    
    nested = dict()
    for d in data:
        nested.setdefault(d["location"],dict()) \
              .setdefault(d["team"],    dict()) \
              .setdefault(d["player"],  list()) \
              .append({"points":d["points"]})
    

    Output:

    print(nested)
    
    {  'Oklahoma City': 
        {  
           'Thunder': 
               {  'Kevin Durant': [{'points': 15}] }
        }, 
        'Los Angeles': 
        { 
           'Lakers': 
               {  
                  'Jeremy Lin': [{'points': 22}], 
                  'Kobe Bryant': [{'points': 51}]
               }, 
           'Clippers': 
               {  'Blake Griffin': [{'points': 26}] }
         }
      }
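
The chained call works because `dict.setdefault(key, default)` returns the value already stored under `key`, or stores `default` and returns it when the key is missing. Here is a minimal sketch on a toy dictionary (the name `tree` and the sample keys are only for illustration):

```python
tree = {}

# First call: "Los Angeles" is missing, so an empty dict is stored and returned.
city = tree.setdefault("Los Angeles", {})

# A second call with the same key returns the very same inner dict,
# so anything added through `city` is visible through `tree`.
same_city = tree.setdefault("Los Angeles", {})
assert same_city is city

# Chaining walks (and creates) one level per call; the last level is a list.
tree.setdefault("Los Angeles", {}).setdefault("Lakers", []).append("Kobe Bryant")
tree.setdefault("Los Angeles", {}).setdefault("Lakers", []).append("Jeremy Lin")

print(tree)
# {'Los Angeles': {'Lakers': ['Kobe Bryant', 'Jeremy Lin']}}
```

Note that the default argument is evaluated on every call, even when the key already exists; for small containers like these that cost is usually negligible.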
    

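    [EDIT] Generalizing the approach

    If you have to do this kind of thing often, on different kinds of dictionaries or hierarchies, you can generalize it in a function: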

    def dictNesting(data,*levels):
        result = dict()
        for d in data:
            r = result
            for level in levels[:-1]:
                r = r.setdefault(d[level],dict())
            r = r.setdefault(d[levels[-1]],list())
            r.append({k:v for k,v in d.items() if k not in levels})
        return result
    

    You then give the function a list of dictionaries, followed by the names of the keys you want to nest by:

    byLocation = dictNesting(data,"location","team")
    
    {  'Oklahoma City':
           {  'Thunder': [
                  {'player': 'Kevin Durant', 'points': 15}]
           },
       'Los Angeles':
           {'Lakers': [
                  {'player': 'Jeremy Lin', 'points': 22},
                  {'player': 'Kobe Bryant', 'points': 51}],
            'Clippers': [
                  {'player': 'Blake Griffin', 'points': 26}]
           }
    }
    

    If you want to group the same data differently, you only need to change the field names and their order:

    byPlayer = dictNesting(data,"player","location","team")
    
    
    {  'Kevin Durant':
           {  'Oklahoma City':
                  {  'Thunder': [{'points': 15}] }
           },
       'Jeremy Lin':
           {  'Los Angeles':
                  {'Lakers': [{'points': 22}]}
           },
       'Kobe Bryant':
           {  'Los Angeles':
                  {'Lakers': [{'points': 51}]}
           },
       'Blake Griffin':
           {  'Los Angeles':
                  {'Clippers': [{'points': 26}]}
           }
    }
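
The call in the question passes the keys as a single list, `reorder(data, ['location', 'team', 'player'])`, rather than as separate arguments. With `dictNesting` that is a matter of unpacking, `dictNesting(data, *keys)`. As a self-contained sketch, a list-accepting variant could look like the following; `reorder` is a hypothetical name taken from the question, and `sample` is a shortened copy of the data:

```python
def reorder(data, keys):
    """Nest a list of flat dicts by `keys` (a list or tuple of field names).

    Hypothetical helper named after the call in the question. Fields not
    listed in `keys` end up in a list of dicts at the innermost level.
    """
    if not keys:
        raise ValueError("at least one key is required")
    *outer, last = keys
    result = {}
    for row in data:
        node = result
        # Walk (and create) one dict per outer key.
        for key in outer:
            node = node.setdefault(row[key], {})
        # Everything that is not a nesting key goes into the leaf list.
        rest = {k: v for k, v in row.items() if k not in keys}
        node.setdefault(row[last], []).append(rest)
    return result


sample = [
    {"player": "Jeremy Lin", "team": "Lakers", "location": "Los Angeles", "points": 22},
    {"player": "Kobe Bryant", "team": "Lakers", "location": "Los Angeles", "points": 51},
]

by_team = reorder(sample, ["location", "team"])
print(by_team)
# {'Los Angeles': {'Lakers': [{'player': 'Jeremy Lin', 'points': 22}, {'player': 'Kobe Bryant', 'points': 51}]}}
```

A row that lacks one of the requested keys raises `KeyError` here; whether to skip such rows or fail is a design choice the original answer does not address.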
    

    You can take dictNesting a step further and have it aggregate the data at the lowest nesting level:

    def dictNesting(data,*levels,aggregate=False):
        result = dict()
        for d in data:
            r = result
            # walk (and create) one dict per level, except the last one
            for level in levels[:-1]:
                r = r.setdefault(d[level],dict())
            # innermost container: a dict of lists when aggregating, else a list
            r = r.setdefault(d[levels[-1]],dict() if aggregate else list())
            content = ( (k,v) for k,v in d.items() if k not in levels)
            if aggregate:
                for k,v in content:
                    r.setdefault(k,list()).append(v)
            else:
                r.append(dict(content))
        return result
    

    Output:

    byCity = dictNesting(data,"location","team",aggregate=True)
    
    {  'Oklahoma City':
            {'Thunder':
                 {'player': ['Kevin Durant'], 'points': [15]}},
       'Los Angeles':
            {'Lakers':
                 {'player': ['Jeremy Lin', 'Kobe Bryant'], 'points': [22, 51]},
             'Clippers':
                 {'player': ['Blake Griffin'], 'points': [26]}
            }
    }
    
    lakersPlayers = byCity["Los Angeles"]["Lakers"]["player"] 
    # ['Jeremy Lin', 'Kobe Bryant']
    
    lakersPoints  = sum(byCity["Los Angeles"]["Lakers"]["points"]) 
    # 73
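
Because the aggregated leaves are plain lists, summaries over the whole structure can be written as comprehensions. The sketch below assumes the aggregated shape shown above, reproduced here as a literal so the snippet runs on its own; `by_city` and `team_totals` are illustrative names:

```python
# Aggregated structure, as produced with aggregate=True above.
by_city = {
    "Oklahoma City": {"Thunder": {"player": ["Kevin Durant"], "points": [15]}},
    "Los Angeles": {
        "Lakers": {"player": ["Jeremy Lin", "Kobe Bryant"], "points": [22, 51]},
        "Clippers": {"player": ["Blake Griffin"], "points": [26]},
    },
}

# Total points per (location, team) pair.
team_totals = {
    (location, team): sum(fields["points"])
    for location, teams in by_city.items()
    for team, fields in teams.items()
}

print(team_totals)
# {('Oklahoma City', 'Thunder'): 15, ('Los Angeles', 'Lakers'): 73, ('Los Angeles', 'Clippers'): 26}
```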
    

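    【Comments】:

    • Use {} and [] instead of dict() and list(). It's faster and simpler.
    • I find that dict and list convey the intent and nature of each level better than cryptic symbols that differ by only a few pixels. In this kind of data processing, the performance difference (if any) will be negligible.
    • Oh, this is interesting! So if I wanted to create a function that can take n nesting-level keys, how would I go about that? For example, if I only wanted two levels of nesting, like reorder(data, ['location', 'team']), each innermost dictionary would hold the player and points. This is a huge help, you've opened my eyes.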