Transform a hierarchical dataframe into a nested list of dictionaries in Python

【Question title】: Transform a Hierarchical Dataframe into a nested list of dictionaries Python
【Posted】: 2020-09-23 23:45:31
【Question】:

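My data can be downloaded here, and it looks like this.

My goal is to build a network with states, cities and counties as nodes, sized by population. This will be part of an application, so the choice of node levels will be dynamic: it can be any combination of state, city and county. Here is the visualization I want to achieve. The data needs to look like this: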

[{name: "state1",
  children: [{name: "county1",
              children: [{name: "city1",
                          population: "13000"
                         },
                         {name: "city2",
                          population: "10000"
                         }]
             },
             {name: "county2",
              children: [{name: "city1",
                          population: "1000"
                         },
                         {name: "city2",
                          population: "100000"
                         }]
             }]
 },
 {name: "state2",
  children: [{name: "county1",
              children: [{name: "city1",
                          population: "13000"
                         },
                         {name: "city2",
                          population: "10000"
                         }]
             },
             {name: "county2",
              children: [{name: "city1",
                          population: "1000"
                         },
                         {name: "city2",
                          population: "100000"
                         }]
             }]
 }]

Here is what I have tried so far.

import pandas as pd
from benedict import benedict

# read in the data
df = pd.read_csv("C:\\Users\\m316375\\Downloads\\uscities.csv")

# Using benedict to create a nested dict: one "state_county_city" keypath per row
node_id = ["state_name", "county_name", "city"]
# .copy() avoids pandas' SettingWithCopyWarning when adding a column to a column subset
df_benedict = df[["state_name", "city", "county_name", "population"]].copy()
df_benedict["dict_path"] = df_benedict[node_id].astype(str).apply("_".join, axis=1)

d = benedict()
d.keypath_separator = "_"

for _, row in df_benedict.iterrows():
    d[row["dict_path"]] = row["population"]

##### First Attempt ########
# looping through the nested dictionary with one hard-coded loop per level
# (state -> county -> city, the same order as node_id)
dict_list = []
for state, counties in d.items():
    state_children = []
    for county, cities in counties.items():
        county_children = [{"name": city, "value": population}
                           for city, population in cities.items()]
        state_children.append({"name": county, "children": county_children})
    dict_list.append({"name": state, "children": state_children})
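For the nesting step itself, benedict is not strictly needed. The sketch below is an alternative using plain dicts, assuming the column names from the CSV above; build_nested_dict is a hypothetical helper name. It takes the list of level columns as a parameter and uses the tuple keys produced by groupby instead of '_'-joined strings, so a name that happens to contain the separator cannot split a path.

```python
import pandas as pd


def build_nested_dict(frame, levels, value_col="population"):
    """Nest rows as {level0: {...: {last_level: value}}} for any list of level columns.

    build_nested_dict is a hypothetical helper, not part of pandas or benedict.
    """
    # Sum first so rows sharing a path (e.g. when a level is left out) become one leaf
    summed = frame.groupby(levels, sort=False)[value_col].sum()
    root = {}
    for keys, value in summed.items():
        # With a single level, groupby yields scalar keys instead of tuples
        keys = keys if isinstance(keys, tuple) else (keys,)
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate dicts on demand
        node[keys[-1]] = int(value)
    return root


# Hypothetical toy rows reusing the placeholder names and numbers from the question
rows = [
    ("state1", "county1", "city1", 13000),
    ("state1", "county1", "city2", 10000),
    ("state1", "county2", "city1", 1000),
    ("state1", "county2", "city2", 100000),
]
df = pd.DataFrame(rows, columns=["state_name", "county_name", "city", "population"])

by_county = build_nested_dict(df, ["state_name", "county_name", "city"])
by_city = build_nested_dict(df, ["state_name", "city"])  # county level left out
```

Leaving a level out sums the populations that now share a path, so in this toy data state1/city1 becomes 13000 + 1000.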

Problem: my approach is not dynamic. If I only want to show states and cities, I have to remove one of the for loops. Not ideal.
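One way to make the level choice dynamic is to recurse over the list of level columns: group by the first column, and either recurse into each group with the remaining columns or, at the last level, emit a leaf. This is a minimal sketch, assuming the column names from the question's CSV; to_children is a hypothetical helper. Leaf populations are summed and kept as integers (wrap them in str() if you need strings as in the target example).

```python
import pandas as pd


def to_children(frame, levels, value_col="population"):
    """Build [{"name": ..., "children": [...]}, ...] nested in the order of `levels`.

    Nodes at the last level carry `value_col` instead of "children".
    to_children is a hypothetical helper name.
    """
    level, rest = levels[0], levels[1:]
    nodes = []
    # Grouping by a single column name yields scalar group names
    for name, group in frame.groupby(level, sort=True):
        if rest:
            nodes.append({"name": name, "children": to_children(group, rest, value_col)})
        else:
            nodes.append({"name": name, value_col: int(group[value_col].sum())})
    return nodes


# Hypothetical toy data mirroring the placeholder structure in the question
rows = [
    (state, county, city, pop)
    for state in ("state1", "state2")
    for county, city, pop in [("county1", "city1", 13000), ("county1", "city2", 10000),
                              ("county2", "city1", 1000), ("county2", "city2", 100000)]
]
df = pd.DataFrame(rows, columns=["state_name", "county_name", "city", "population"])

tree = to_children(df, ["state_name", "county_name", "city"])
state_city = to_children(df, ["state_name", "city"])  # any subset/order of levels works
```

Because the levels are just a list, the application can pass whatever combination the user selects instead of editing loops.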

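【Comments】:

Please fix the data you shared (it gives an error: unexpected EOF while parsing) and post your expected output.
@sammywemmy, there is a link to download the data; see the first sentence of the question. The output is a nested list of dictionaries, as shown in the first code block. Hope that helps.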

Tags: python pandas list dictionary networkx

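【Solution 1】:

I think I've got what you need, although it's a bit clunky. Assuming the data from the link you provided is loaded into a dataframe df, the code is as follows:

First, group by state, county and city so that they become a MultiIndex, leaving population as the only column (a Series):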

df_gr = df.groupby(['state_name', 'county_name', 'city'])['population'].sum()
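To see what df_gr looks like, here is a small sketch on hypothetical toy rows (the real CSV is not reproduced here). Selecting population before .sum() aggregates only that column instead of every remaining column, and .xs() with a partial tuple key returns the cross-section for one state and county.

```python
import pandas as pd

# Hypothetical toy rows reusing the placeholder names and numbers from the question
rows = [
    ("state1", "county1", "city1", 13000),
    ("state1", "county1", "city2", 10000),
    ("state1", "county2", "city1", 1000),
    ("state1", "county2", "city2", 100000),
]
df = pd.DataFrame(rows, columns=["state_name", "county_name", "city", "population"])

# A Series with a three-level MultiIndex: (state_name, county_name, city)
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()

print(df_gr.index.names)                # the three grouping columns, in order
print(df_gr.xs(("state1", "county2")))  # cross-section: a Series indexed by city
```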

Then we can build the desired dictionary with a dict comprehension:

resulting_dict = {
    level0: {
        level1: {
            level2: df_gr.xs((level0, level1, level2))
            for level2 in df_gr.xs((level0, level1)).reset_index().groupby(['city']).sum().index
        }
        for level1 in df_gr.xs(level0).reset_index().groupby(['county_name', 'city']).sum().index.levels[0]
    }
    for level0 in df_gr.index.levels[0]
}

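Basically, we use .xs() to return a cross-section of the data at the desired level. We also make sure we don't iterate over level combinations that don't exist. .reset_index() followed by .groupby() is used to get the index of the cross-section rather than of the whole dataframe (using .index.levels after .xs() returns the levels of the entire dataframe, and I don't know a simpler way to make it return only the cross-section's index).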
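One possibly simpler alternative (a swapped-in technique, not the answer's code): index.get_level_values(level).unique() returns only the labels that actually occur in a cross-section, which avoids the reset_index()/groupby() round trip, and Series.items() yields the city/population pairs of the innermost level. A sketch on hypothetical toy data where state2 has only county1; present_labels is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy rows; state2 deliberately has only one county
rows = [
    ("state1", "county1", "city1", 13000),
    ("state1", "county1", "city2", 10000),
    ("state1", "county2", "city1", 1000),
    ("state1", "county2", "city2", 100000),
    ("state2", "county1", "city1", 13000),
    ("state2", "county1", "city2", 10000),
]
df = pd.DataFrame(rows, columns=["state_name", "county_name", "city", "population"])
df_gr = df.groupby(["state_name", "county_name", "city"])["population"].sum()


def present_labels(series, level):
    """Labels of `level` that occur in this series' index (unused levels are ignored)."""
    return series.index.get_level_values(level).unique()


resulting_dict = {
    state: {
        # Innermost level: the (state, county) cross-section is a Series indexed by city
        county: {city: int(pop) for city, pop in df_gr.xs((state, county)).items()}
        for county in present_labels(df_gr.xs(state), "county_name")
    }
    for state in present_labels(df_gr, "state_name")
}
```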

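You can adapt the dict comprehension to the desired output format.

【Comments】:

I already have a nested dictionary. I need it as a nested list of dictionaries, as shown in the first code block.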
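Turning an existing nested dict of any depth into the list-of-children format is a short recursion: each key becomes a {"name": ...} node, dict values become "children", and anything else becomes the leaf value. A minimal sketch; dict_to_children and the "population" leaf key are assumptions chosen to match the target example. Since benedict subclasses dict, the isinstance check should also accept the benedict from the question.

```python
def dict_to_children(nested, leaf_key="population"):
    """Recursively turn {name: subtree_or_value} into
    [{"name": name, "children": [...]}, ...]; non-dict values become leaves.

    dict_to_children is a hypothetical helper name.
    """
    nodes = []
    for name, sub in nested.items():
        if isinstance(sub, dict):
            nodes.append({"name": name, "children": dict_to_children(sub, leaf_key)})
        else:
            nodes.append({"name": name, leaf_key: sub})
    return nodes


# Hypothetical nested dict in the shape state -> county -> city -> population
nested = {
    "state1": {
        "county1": {"city1": 13000, "city2": 10000},
        "county2": {"city1": 1000, "city2": 100000},
    }
}
tree = dict_to_children(nested)
```

Because the recursion follows whatever depth the dict has, a state -> city dict works the same way without code changes.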