How to parse "nested" JSON files and import them into a Pandas DataFrame? Answers

【Question title】: How to parse 'nested' json files and import into a Pandas dataframe?
【Posted】: 2021-10-29 21:19:41
【Question description】:

I'm trying to use the json_normalize method. This is what I have so far:

import pandas as pd

# JSON's null must be written as None in a Python dict literal (the name json also shadows the json module)
json = {"vehicle_type":"Car","car_info":{"count":3,"infos":[{"car":"BMW","name":"5","description":"","production_time":"2017-07-16","tags":["car","BMW","sedan"],"references":[],"country":["germany"],"fuel_type":["electrical"],"color":"black","price":None,"technic":{"0-100":"-","horsepower":"165Nm"},"mileage":{"mile":14004}}]}}

df = pd.json_normalize(json)

This produces my pandas DataFrame.
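To see why extra dictionaries remain, it helps to check what pd.json_normalize does on its own: nested dicts are flattened into dot-separated column names, but a list (even a list of dicts) is stored unchanged in a single cell. A minimal check on a trimmed-down copy of the question's data (the variable name data is just for illustration):

```python
import pandas as pd

# Trimmed-down version of the question's record (most fields removed for brevity)
data = {
    "vehicle_type": "Car",
    "car_info": {
        "count": 3,
        "infos": [{"car": "BMW", "technic": {"horsepower": "165Nm"}}],
    },
}

df = pd.json_normalize(data)

# Nested dicts become dotted column names...
print(list(df.columns))  # ['vehicle_type', 'car_info.count', 'car_info.infos']
# ...but the list of dicts under "infos" is kept as one cell
print(type(df.loc[0, "car_info.infos"]))  # <class 'list'>
```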

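When I try to parse my nested JSON this way, some columns still contain more dictionaries (for example, car_info.infos holds a list of dicts). I flatten them with this code: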

import pandas as pd

df1 = pd.json_normalize(json)  # assumed: df1 is the json_normalize result from above

# Collect one dict per row instead of DataFrame.append (removed in pandas 2.0)
rows = []
for infos in df1["car_info.infos"]:
    if isinstance(infos, list) and infos and infos[0]:
        rows.append(infos[0])  # take the first record of the list
    else:
        rows.append({})        # [], [{}] or a missing value: keep an empty row so indexes line up
df_car = pd.DataFrame(rows)

# Attach the extracted columns and drop the original nested column
df2 = pd.concat([df1, df_car], axis=1)
df2 = df2.drop(columns=["car_info.infos"])
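A possibly simpler alternative to the loop above, assuming the goal is one output row per element of car_info.infos: DataFrame.explode gives each list element its own row, and pd.json_normalize then turns those dicts into columns. The sample records below are made up; note that, unlike the loop, this keeps every record in the list rather than only the first one.

```python
import pandas as pd

# Hypothetical sample: two top-level objects, the second with an empty infos list
records = [
    {"vehicle_type": "Car", "car_info": {"count": 1, "infos": [{"car": "BMW", "color": "black"}]}},
    {"vehicle_type": "Car", "car_info": {"count": 0, "infos": []}},
]
df1 = pd.json_normalize(records)

# One row per list element; an empty list becomes a single NaN row
exploded = df1.explode("car_info.infos").reset_index(drop=True)

# Turn each dict into columns; NaN cells are replaced by {} so they yield an empty row
infos = pd.json_normalize(
    [x if isinstance(x, dict) else {} for x in exploded["car_info.infos"]]
)

df2 = pd.concat([exploded.drop(columns=["car_info.infos"]), infos], axis=1)
print(df2)
```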

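I need code that parses the nested JSON without hard-coding column names, repeating until no dictionaries are left. I have multiple JSON files. How can I automate this?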
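For the "no column names" requirement, one approach is a loop that inspects the values rather than the names: explode any column holding lists of dicts, expand any column holding dicts, and repeat until neither kind is left. A minimal sketch; flatten_all is a hypothetical helper, not a pandas API, and lists of plain values (such as tags) are deliberately left alone:

```python
import pandas as pd


def _is_list_of_dicts(v) -> bool:
    return isinstance(v, list) and any(isinstance(x, dict) for x in v)


def flatten_all(df: pd.DataFrame, sep: str = ".") -> pd.DataFrame:
    """Explode list-of-dict columns and expand dict columns until none remain."""
    df = df.reset_index(drop=True)  # concat below relies on a 0..n-1 index
    while True:
        changed = False
        for col in list(df.columns):
            values = df[col]
            if values.map(_is_list_of_dicts).any():
                # One row per list element; the other columns are repeated
                df = df.explode(col, ignore_index=True)
                changed = True
                break
            if values.map(lambda v: isinstance(v, dict)).any():
                # Expand dicts into "<col><sep><key>" columns; non-dict cells become empty rows
                expanded = pd.json_normalize(
                    [v if isinstance(v, dict) else {} for v in values], sep=sep
                ).add_prefix(f"{col}{sep}")
                df = pd.concat([df.drop(columns=[col]), expanded], axis=1)
                changed = True
                break
        if not changed:
            return df


# Usage on a trimmed-down copy of the question's record
sample = {"vehicle_type": "Car", "car_info": {"count": 3, "infos": [
    {"car": "BMW", "tags": ["car", "BMW", "sedan"],
     "technic": {"0-100": "-", "horsepower": "165Nm"}}]}}
flat = flatten_all(pd.json_normalize(sample))
print(flat.columns.tolist())
```

Design note: exploding several independent list-of-dict columns multiplies the rows (one row per combination), so for files with more than one such list it may be better to normalize each list separately.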

【Question discussion】:

Tags: python json pandas dataframe parsing

【Solution 1】:

You can use pd.json_normalize() with the record_path= and meta= parameters, like this:

df = pd.json_normalize(json, record_path=['car_info', 'infos'], meta=['vehicle_type', ['car_info', 'count']])

Result:

print(df)

   car name description production_time               tags references    country     fuel_type  color  price technic.0-100 technic.horsepower  mileage.mile vehicle_type car_info.count
0  BMW    5                  2017-07-16  [car, BMW, sedan]         []  [germany]  [electrical]  black    NaN             -              165Nm         14004          Car              3
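How record_path and meta interact is easiest to see with more than one record: each element of the record_path list becomes a row, and each meta value is repeated on every row that came from the same parent object. The data below is hypothetical; errors="ignore" (an existing json_normalize option) fills missing meta keys with NaN instead of raising KeyError:

```python
import pandas as pd

# Hypothetical data: two records in the first object; the second object lacks car_info.count
data = [
    {"vehicle_type": "Car", "car_info": {"count": 2, "infos": [{"car": "BMW"}, {"car": "Audi"}]}},
    {"vehicle_type": "Car", "car_info": {"infos": [{"car": "VW"}]}},
]

df = pd.json_normalize(
    data,
    record_path=["car_info", "infos"],             # one row per element of this list
    meta=["vehicle_type", ["car_info", "count"]],  # copied onto every row of its parent
    errors="ignore",                               # missing meta keys become NaN
)
print(df)
```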

【Discussion】:

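Thank you very much for your answer, I'll use it. I have one more question: how can I apply these methods in an automated way?
@AtakanAydınbaş If your JSON files share the same structure, you can automate it by running the same call on every file. But if they differ and the record paths are different, I doubt there is any other way to automate it.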
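Following that comment, if all files share the same layout, automation can be as simple as running the same json_normalize call on every file and concatenating the results. A sketch under that assumption; load_folder and the glob pattern are hypothetical, and each file is assumed to hold either one object or a list of objects shaped like the question's JSON:

```python
import glob
import json

import pandas as pd


def load_folder(pattern: str) -> pd.DataFrame:
    """Apply the same json_normalize call to every file matching pattern."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        df = pd.json_normalize(
            data,
            record_path=["car_info", "infos"],
            meta=["vehicle_type", ["car_info", "count"]],
            errors="ignore",  # tolerate files where a meta key is missing
        )
        df["source_file"] = path  # remember which file each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Usage: df = load_folder("data/*.json"). If the record paths differ between files, the call cannot be fixed in advance, which is the limitation the reply above points out.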