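【Posted】: 2021-02-14 13:52:26
【Question】:
I have some CSV data that needs to be converted to a specific JSON format. I wrote code that handles some levels of nesting, but not the nesting I need.
Here is my CSV data: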
title context answers question id
tit1 con1 text1 que1 id1
tit1 con1 text2 que2 id2
tit2 con2 text3 que3 id3
tit2 con2 text4 que4 id4
tit2 con3 text5 que5 id5
My code:
import json
import pandas as pd

df = pd.read_csv('processedOutput.csv')
finalList = []
finalDict = {}
grouped = df.groupby(['context'])
for key, value in grouped:
    dictionary = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['context'] = j.at[0, 'context']
    dictList = []
    # created once per group, outside the row loop below
    anotherDict = {}
    for i in j.index:
        anotherDict['answers'] = j.at[i, 'answers']
        anotherDict['question'] = j.at[i, 'question']
        anotherDict['id'] = j.at[i, 'id']
        dictList.append(anotherDict)
    dictionary['qas'] = dictList
    finalList.append(dictionary)

data = json.dumps(finalList)
The output structure is fine, but every entry in a group holds only the values of that group's last row:
[{"context": "con1",
"qas": [
{"answers": "text2", "question": "que2", "id": "id2"},
{"answers": "text2", "question": "que2", "id": "id2"}
]
},
{"context": "con2",
"qas": [
{"answers": "text4", "question": "que4", "id": "id4"},
{"answers": "text4", "question": "que4", "id": "id4"}
]
},
{"context": "con3",
"qas": [
{"answers": "text5", "question": "que5", "id": "id5"}
]
}
]
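A likely cause of the repeated entries, sketched here without pandas: a single dict object is appended to the list on every iteration, so all list entries refer to the same object and show whatever was written last. Building a new dict inside the loop avoids this. The names `rows`, `shared`, `buggy`, and `fixed` are hypothetical and only serve the illustration.

```python
# Two rows standing in for one "context" group (values taken from the sample data).
rows = [
    {"answers": "text1", "question": "que1", "id": "id1"},
    {"answers": "text2", "question": "que2", "id": "id2"},
]

# Pattern from the question: one dict reused across iterations.
shared = {}
buggy = []
for row in rows:
    shared["answers"] = row["answers"]
    shared["question"] = row["question"]
    shared["id"] = row["id"]
    buggy.append(shared)  # appends a reference to the same object each time

# Alternative: create a fresh dict for every row.
fixed = []
for row in rows:
    fixed.append(
        {"answers": row["answers"], "question": row["question"], "id": row["id"]}
    )

print(buggy)
print(fixed)
```

In the question's code, the equivalent change would be to move `anotherDict = {}` inside the `for i in j.index:` loop.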
I want the data nested one level deeper, with all fields, like this:
[
{
"title": "tit1",
"paragraph": [
{
"context": "con1",
"qas": [
{"answers": "text1","question": "que1","id": "id1"},
{"answers": "text2","question": "que2","id": "id2"}
]}]
},
{
"title": "tit2",
"paragraph": [
{
"context": "con2",
"qas": [
{"answers": "text3","question": "que3","id": "id3"},
{"answers": "text4","question": "que4","id": "id4"}
]
},
{
"context": "con3",
"qas": [
{"answers": "text5","question":"que5", "id": "id5"}
]
}
]
}
]
I have been stuck on this for a long time; any suggestions would be great.
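One possible approach, given as a minimal sketch and not a definitive solution: group by `title` first, then by `context` within each title, and build the `qas` list from the remaining columns. The sample rows are typed in directly so the snippet is self-contained; the variable names (`final_list`, `paragraphs`, `title_df`, `context_df`) are hypothetical, and `sort=False` is assumed to be wanted so that groups keep their order of appearance.

```python
import json

import pandas as pd

# Sample rows copied from the question, standing in for processedOutput.csv.
df = pd.DataFrame(
    [
        ["tit1", "con1", "text1", "que1", "id1"],
        ["tit1", "con1", "text2", "que2", "id2"],
        ["tit2", "con2", "text3", "que3", "id3"],
        ["tit2", "con2", "text4", "que4", "id4"],
        ["tit2", "con3", "text5", "que5", "id5"],
    ],
    columns=["title", "context", "answers", "question", "id"],
)

final_list = []
# Outer level: one entry per title.
for title, title_df in df.groupby("title", sort=False):
    paragraphs = []
    # Inner level: one entry per context within that title.
    for context, context_df in title_df.groupby("context", sort=False):
        # Each row becomes its own dict, so no object is shared between rows.
        qas = context_df[["answers", "question", "id"]].to_dict(orient="records")
        paragraphs.append({"context": context, "qas": qas})
    final_list.append({"title": title, "paragraph": paragraphs})

data = json.dumps(final_list, indent=2)
print(data)
```

With the sample data, `tit2` ends up with two separate paragraph objects, one for `con2` and one for `con3`.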
【Comments】:
- tl;dr. But have you tried the df.to_json method? pandas.pydata.org/pandas-docs/stable/reference/api/…
- That won't give me JSON in the grouped, nested format I need.
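To illustrate the point made in the reply, here is a small sketch of what `DataFrame.to_json` produces with `orient="records"` on the sample data: one flat object per row, with no grouping by `title` or `context`. The frame is built inline as an assumption, in place of reading the CSV file.

```python
import json

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    [
        ["tit1", "con1", "text1", "que1", "id1"],
        ["tit1", "con1", "text2", "que2", "id2"],
        ["tit2", "con2", "text3", "que3", "id3"],
        ["tit2", "con2", "text4", "que4", "id4"],
        ["tit2", "con3", "text5", "que5", "id5"],
    ],
    columns=["title", "context", "answers", "question", "id"],
)

# to_json returns a JSON string; parse it back to inspect the structure.
records = json.loads(df.to_json(orient="records"))
print(records[0])
```

Each record carries all five columns at the same level, so the nesting still has to be built separately.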
Tags: python json dataframe csv nested