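【Posted】:2021-11-15 00:08:51
【Question】:
I need to convert CSV data with a large number of records (about 3 thousand) into a list of objects/dicts. I started with Pandas, but now I'm not sure it is a good choice. The file has 5 columns. The structure of the CSV file is as follows: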
readTimestamp  school_subject    graduate  full_name  term
1611658200000  mathematics       3         Edd Ston   2
1611658200000  physics           5         Edd Ston   2
1611658200000  foreign language  5         Edd Ston   2
1611658200000  geography         4         Edd Ston   2
1611658200000  history           3         Edd Ston   2
1611658200000  Informatics       4         Kate Slow  1
1611658200000  chemistry         5         Kate Slow  1
1611658200000  mathematics       5         Kate Slow  1
1611658200000  foreign language  5         Kate Slow  1
I need to get the following structure:
[
    {
        "readTimestamp": 123123123,
        "full_name": "Edd Ston",
        "term": 2,
        "schools_subject": [
            {
                "mathematics": 3,
                "physics": 5,
                "foreign language": 5,
                "geography": 4,
                "history": 3
            }
        ]
    },
    {
        "readTimestamp": 345345345,
        "full_name": "Kate Slow",
        "term": 1,
        "schools_subject": [
            {
                "Informatics": 4,
                "chemistry": 3,
                "mathematics": 5,
                "foreign language": 5
            }
        ]
    }
]
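Since pandas may not be needed for this, here is a minimal sketch of building the structure above with plain Python. It assumes the rows have already been parsed into tuples in the column order shown in the file; the `rows` sample and the `group_rows` helper are hypothetical names used only for illustration.

```python
# Sample rows, assumed to be already parsed from the file
# (readTimestamp, school_subject, graduate, full_name, term).
rows = [
    (1611658200000, "mathematics", 3, "Edd Ston", 2),
    (1611658200000, "physics", 5, "Edd Ston", 2),
    (1611658200000, "foreign language", 5, "Edd Ston", 2),
    (1611658200000, "Informatics", 4, "Kate Slow", 1),
    (1611658200000, "chemistry", 5, "Kate Slow", 1),
]


def group_rows(rows):
    """Group rows by (readTimestamp, full_name, term) into the target shape."""
    grouped = {}  # regular dicts keep insertion order in Python 3.7+
    for read_ts, subject, grade, full_name, term in rows:
        key = (read_ts, full_name, term)
        if key not in grouped:
            # First row for this student/term: create the outer record.
            grouped[key] = {
                "readTimestamp": read_ts,
                "full_name": full_name,
                "term": term,
                "schools_subject": [{}],
            }
        # Add the subject -> grade pair to the single inner dict.
        grouped[key]["schools_subject"][0][subject] = grade
    return list(grouped.values())


result = group_rows(rows)
```

Each input row is visited once, so this stays cheap for a few thousand records.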
So far I have got:
df = df.groupby(['readTimestamp','full_name','term']).apply(lambda x: x[['school_subject', 'graduate']].to_dict(orient='records')).to_dict()
{(1611658200000, 'Edd Ston', 2): [{'school_subject': 'mathematics', 'graduate': 3},
                                  {'school_subject': 'physics', 'graduate': 5},
                                  {'school_subject': 'foreign language', 'graduate': 5},
                                  {'school_subject': 'geography', 'graduate': 4},
                                  {'school_subject': 'history', 'graduate': 3}],
 (1611658200000, 'Kate Slow', 1): [{'school_subject': 'Informatics', 'graduate': 4},
                                   {'school_subject': 'chemistry', 'graduate': 5},
                                   {'school_subject': 'mathematics', 'graduate': 5},
                                   {'school_subject': 'foreign language', 'graduate': 5}]}
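The output differs from the target in two ways: `to_dict(orient='records')` produces one dict per row with the column names as keys, and the final `.to_dict()` keys the result by the group tuple instead of returning a list. A minimal sketch of one possible adjustment follows. It assumes the data is already in a DataFrame with the five columns above; the inline sample frame and the `records` name are illustrative only.

```python
import pandas as pd

# Small sample frame standing in for the loaded CSV (assumed column names).
df = pd.DataFrame(
    [
        (1611658200000, "mathematics", 3, "Edd Ston", 2),
        (1611658200000, "physics", 5, "Edd Ston", 2),
        (1611658200000, "Informatics", 4, "Kate Slow", 1),
        (1611658200000, "chemistry", 5, "Kate Slow", 1),
    ],
    columns=["readTimestamp", "school_subject", "graduate", "full_name", "term"],
)

records = []
# Iterating a groupby yields (key_tuple, sub_frame) pairs.
for (read_ts, full_name, term), group in df.groupby(
    ["readTimestamp", "full_name", "term"], sort=False
):
    # Pair each subject with its grade; tolist() gives plain Python ints.
    subjects = dict(zip(group["school_subject"], group["graduate"].tolist()))
    records.append(
        {
            "readTimestamp": int(read_ts),  # convert NumPy integer to int
            "full_name": full_name,
            "term": int(term),
            "schools_subject": [subjects],
        }
    )
```

Because the values are converted to built-in types, `records` should serialize with `json.dumps(records, indent=2)` if JSON output is needed.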
Thank you for your help and for explaining where I went wrong.
【Comments】: