How to parse a JSON column in a pandas DataFrame and concat the new DataFrame to the original one?

【Question title】: How to parse JSON column in pandas dataframe and concat the new dataframe to the original one?
【Posted】: 2021-05-02 15:45:27
【Question】:

I have the following sample df (shown as a dict):

{'id_user': {0: -8884522802746938515,
  1: -8884522802746938515,
  2: -8884522802746938515},
 'time': {0: '2021-01-01 11:10:34',
  1: '2021-01-01 11:11:48',
  2: '2021-01-01 11:12:38'},
 'data': {0: '{"fat": 4, "type": "FOOD_GENERAL", "unit": "1 mug (8 fl oz)", "title": "Cappuccino", "amount": 1.0, "protein": 4, "calories": 74, "foodType": 4, "recipeId": 7350, "servings": 1.0, "timestamp": "1609499434205", "ingredient": true, "carbohydrates": 6, "nutrientsData": {"iron": 0.19, "fiber": 0.2, "sugar": 6.41, "sodium": 50.0, "calcium": 144.0, "protein": 4.08, "fatTotal": 3.98, "vitaminA": 34.0, "potassium": 233.0, "cholesterol": 12.0, "fatSaturated": 2.273, "carbohydrates": 5.81, "energyConsumed": 74.0, "fatMonounsaturated": 1.007, "fatPolyunsaturated": 0.241}}',
  1: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "mealIndex": 2, "timestamp": "1609499508328", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}',
  2: '{"fat": 1, "type": "FOOD_BRANDED", "unit": "1/2 cup prepared", "title": "Stove Top Stuffing Mix For Turkey (Kraft)", "amount": 1.0, "protein": 3, "calories": 110, "foodType": 5, "recipeId": 4072396, "servings": 1.0, "timestamp": "1609499558606", "ingredient": true, "carbohydrates": 21, "nutrientsData": {"iron": 1.3, "fiber": 1.0, "sugar": 2.0, "sodium": 370.0, "protein": 3.0, "fatTotal": 1.0, "potassium": 100.0, "carbohydrates": 21.0, "energyConsumed": 110.0}}'}}

I am running the following on the data column:

pd.json_normalize(df.data.apply(json_loads))

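The result is what I need, but I want it attached to the original df. Should I just merge the two DataFrames on the index? Is there another way to do this in one line or in one go?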

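【Comments】:

What is json_loads? What do you get from pd.json_normalize?
@QuangHoang json_loads parses the JSON in the data column, and json_normalize creates one row per parsed JSON object, so the final result is a DataFrame. I want that DataFrame "glued" onto my original df. I could merge on the index, but maybe there is a simpler solution.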

Tags: json python-3.x pandas merge

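【Solution 1】:

The data column in df should first be converted from JSON strings to dicts.

Then use either of the following:

Method 1. Convert df to a list of record dicts and pass it to pd.json_normalize.
Method 2. Convert df['data'] to a DataFrame and merge it into the original df.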

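import json
import pandas as pd

# Parse each JSON string in the data column into a dict
df['data'] = df['data'].map(json.loads)

# Method 1: flatten whole records; nested keys become dotted columns such as data.fat
dfn = pd.json_normalize(df.to_dict(orient='records'))

# Method 2: expand only the top-level keys of data and merge on the index
obj = df['data']
dfn = df.merge(pd.DataFrame(obj.tolist(), index=obj.index),
               left_index=True,
               right_index=True)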
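
For the "one step" part of the question, a possible variant combines the asker's pd.json_normalize call with DataFrame.join. This is only a sketch, not the answerer's code: the small df below is a shortened stand-in for the sample data, and the names parsed and out are made up for illustration. Since json_normalize returns a fresh 0..n-1 index, the sketch copies df's index onto the result before joining, which matters when df does not have a default index.

```python
import json

import pandas as pd

# Shortened stand-in for the question's df (illustrative values, non-default index).
df = pd.DataFrame(
    {
        "id_user": [1, 1],
        "time": ["2021-01-01 11:10:34", "2021-01-01 11:11:48"],
        "data": [
            '{"fat": 4, "title": "Cappuccino", "nutrientsData": {"iron": 0.19, "fiber": 0.2}}',
            '{"fat": 1, "title": "Stuffing Mix", "mealIndex": 2, "nutrientsData": {"iron": 1.3}}',
        ],
    },
    index=[10, 20],
)

# Parse each JSON string and flatten nested keys into dotted column names.
parsed = pd.json_normalize(df["data"].map(json.loads).tolist())

# json_normalize numbers rows 0..n-1, so reuse df's index to keep rows aligned.
parsed.index = df.index

# Drop the raw JSON column and attach the flattened columns in one join.
out = df.drop(columns="data").join(parsed)
print(out)
```

Keys that are missing from some rows (mealIndex here) come out as NaN, and nested keys appear as columns such as nutrientsData.iron. If a parsed key could clash with an existing column name, join would need lsuffix or rsuffix.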

【Comments】:

I used something similar to method 2. I converted the JSON column to a DataFrame using json.load with apply, then merged the two DataFrames on the index.