Convert data from csv to list of dict (answers)

【Question title】: Convert data from csv to list of dict
【Posted】: 2021-11-15 00:08:51
【Question description】:

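I need to convert CSV data with a fairly large number of records (about 3,000) into a list of objects/dicts. I started with Pandas, but now I'm not sure it's a good choice. The file has 5 columns and is structured as follows: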

readTimestamp  school_subject    graduate  full_name  term
1611658200000  mathematics       3         Edd Ston   2
1611658200000  physics           5         Edd Ston   2
1611658200000  foreign language  5         Edd Ston   2
1611658200000  geography         4         Edd Ston   2
1611658200000  history           3         Edd Ston   2
1611658200000  Informatics       4         Kate Slow  1
1611658200000  chemistry         5         Kate Slow  1
1611658200000  mathematics       5         Kate Slow  1
1611658200000  foreign language  5         Kate Slow  1
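If pandas feels like overkill for about 3,000 rows, the same grouping can be sketched with only the standard library. This assumes the file is tab-separated (whitespace splitting would break "foreign language" into two fields); the sample is inlined through io.StringIO instead of a real file, and the helper name group_records is hypothetical.

```python
import csv
import io

# Inlined copy of the sample; assumes the real file is tab-separated with this header.
SAMPLE = (
    "readTimestamp\tschool_subject\tgraduate\tfull_name\tterm\n"
    "1611658200000\tmathematics\t3\tEdd Ston\t2\n"
    "1611658200000\tphysics\t5\tEdd Ston\t2\n"
    "1611658200000\tforeign language\t5\tEdd Ston\t2\n"
    "1611658200000\tgeography\t4\tEdd Ston\t2\n"
    "1611658200000\thistory\t3\tEdd Ston\t2\n"
    "1611658200000\tInformatics\t4\tKate Slow\t1\n"
    "1611658200000\tchemistry\t5\tKate Slow\t1\n"
    "1611658200000\tmathematics\t5\tKate Slow\t1\n"
    "1611658200000\tforeign language\t5\tKate Slow\t1\n"
)


def group_records(lines):
    """Hypothetical helper: one dict per (readTimestamp, full_name, term) group."""
    groups = {}  # dicts keep insertion order, so groups appear in file order
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (int(row["readTimestamp"]), row["full_name"], int(row["term"]))
        if key not in groups:
            groups[key] = {
                "readTimestamp": key[0],
                "full_name": key[1],
                "term": key[2],
                "schools_subject": {},
            }
        # collect every subject of the group into a single subject -> grade mapping
        groups[key]["schools_subject"][row["school_subject"]] = int(row["graduate"])
    return list(groups.values())


result = group_records(io.StringIO(SAMPLE))
print(result)
```

With a real file, the same helper can be fed an open handle, e.g. `with open(path, newline="") as f: result = group_records(f)`.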

I need to get the following structure:

[
  {
    "readTimestamp": 123123123,
    "full_name": "Edd Ston",
    "term": 2,
    "schools_subject": [
      {
        "mathematics": 3,
        "physics": 5,
        "foreign language": 5,
        "geography": 4,
        "history": 3
      }
    ]
  },
  {
    "readTimestamp": 345345345,
    "full_name": "Kate Slow",
    "term": 1,
    "schools_subject": [
      {
        "Informatics": 4,
        "chemistry": 5,
        "mathematics": 5,
        "foreign language": 5
      }
    ]
  }
]

What I have so far:

df = df.groupby(['readTimestamp','full_name','term']).apply(lambda x: x[['school_subject', 'graduate']].to_dict(orient='records')).to_dict()    


{(1611658200000, 'Edd Ston', 2): [{'school_subject': 'mathematics', 'graduate': 3},
                                  {'school_subject': 'physics', 'graduate': 5},
                                  {'school_subject': 'foreign language', 'graduate': 5},
                                  {'school_subject': 'geography', 'graduate': 4},
                                  {'school_subject': 'history', 'graduate': 3}],
 (1611658200000, 'Kate Slow', 1): [{'school_subject': 'Informatics', 'graduate': 4},
                                   {'school_subject': 'chemistry', 'graduate': 5},
                                   {'school_subject': 'mathematics', 'graduate': 5},
                                   {'school_subject': 'foreign language', 'graduate': 5}]}
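The dict above already holds the right groups. It differs from the target in two ways: it is keyed by (readTimestamp, full_name, term) tuples, and each subject is a separate {'school_subject': ..., 'graduate': ...} record instead of one subject-to-grade mapping. As a minimal sketch, such a dict (named res here, a hypothetical name, trimmed to the first two records for brevity) can be reshaped in plain Python:

```python
# `res` stands in for the tuple-keyed dict printed above (trimmed to two records).
res = {
    (1611658200000, "Edd Ston", 2): [
        {"school_subject": "mathematics", "graduate": 3},
        {"school_subject": "physics", "graduate": 5},
    ],
}

target = [
    {
        "readTimestamp": ts,
        "full_name": name,
        "term": term,
        # collapse the per-subject records into one {subject: grade} dict
        "schools_subject": {r["school_subject"]: r["graduate"] for r in records},
    }
    for (ts, name, term), records in res.items()
]
print(target)
```

The root cause is the lambda returning `to_dict(orient='records')` per group, which yields a list of row dicts rather than a single mapping.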

Thanks for your help, and please explain where I went wrong.

【Question discussion】:

Tags: python pandas csv

【Solution 1】:

I think your solution needs only a small change: create one dictionary per group, then convert the result to a list of dicts with orient='records':

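import pandas as pd

# df holds the CSV rows; group by the identifying columns, build a
# {school_subject: graduate} dict per group, then export the rows as records
d = (df.groupby(['readTimestamp','full_name','term'])
       .apply(lambda x: x.set_index('school_subject')['graduate'].to_dict())
       .reset_index(name='schools_subject')
       .to_dict(orient='records'))

print(d)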

[{
    'readTimestamp': 1611658200000,
    'full_name': 'Edd Ston',
    'term': 2,
    'schools_subject': {
        'mathematics': 3,
        'physics': 5,
        'foreign language': 5,
        'geography': 4,
        'history': 3
    }
}, {
    'readTimestamp': 1611658200000,
    'full_name': 'Kate Slow',
    'term': 1,
    'schools_subject': {
        'Informatics': 4,
        'chemistry': 5,
        'mathematics': 5,
        'foreign language': 5
    }
}]
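To try the answer end to end, here is a self-contained sketch that loads the sample rows from an inline string. It assumes the file is tab-separated. Selecting ['school_subject', 'graduate'] right after groupby is a small variation on the answer's code: it keeps the grouping columns out of the frame passed to apply, which recent pandas releases otherwise warn about.

```python
import io

import pandas as pd

# Inlined copy of the sample; assumes the real file is tab-separated.
SAMPLE = (
    "readTimestamp\tschool_subject\tgraduate\tfull_name\tterm\n"
    "1611658200000\tmathematics\t3\tEdd Ston\t2\n"
    "1611658200000\tphysics\t5\tEdd Ston\t2\n"
    "1611658200000\tforeign language\t5\tEdd Ston\t2\n"
    "1611658200000\tgeography\t4\tEdd Ston\t2\n"
    "1611658200000\thistory\t3\tEdd Ston\t2\n"
    "1611658200000\tInformatics\t4\tKate Slow\t1\n"
    "1611658200000\tchemistry\t5\tKate Slow\t1\n"
    "1611658200000\tmathematics\t5\tKate Slow\t1\n"
    "1611658200000\tforeign language\t5\tKate Slow\t1\n"
)

df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

d = (
    df.groupby(["readTimestamp", "full_name", "term"])[["school_subject", "graduate"]]
    # one {subject: grade} dict per (readTimestamp, full_name, term) group
    .apply(lambda x: x.set_index("school_subject")["graduate"].to_dict())
    # the group keys become ordinary columns again
    .reset_index(name="schools_subject")
    .to_dict(orient="records")
)
print(d)
```

Because groupby sorts its keys by default, the groups come out ordered by readTimestamp, then full_name, then term.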

【Discussion】:

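Thanks for your help. Could you explain why school_subject has to be set as the index (set_index)?
@Sebastian - because the school_subject values should become the keys of the dictionary, that column is moved to the index and graduate is selected. Calling to_dict() on the Series x.set_index('school_subject')['graduate'] then creates, for each group, a dictionary with school_subject as keys and graduate as values.
So close this time, and yet so far. Thanks for your help and the explanation.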
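The set_index explanation in the comments can be seen on a single group. This sketch builds one group's rows (Kate Slow, term 1, taken from the sample) by hand and compares the question's per-group call with the answer's:

```python
import pandas as pd

# One group's rows (Kate Slow, term 1) from the sample, built by hand
x = pd.DataFrame(
    {
        "school_subject": ["Informatics", "chemistry", "mathematics", "foreign language"],
        "graduate": [4, 5, 5, 5],
    }
)

# Question's approach: one record per row, so the subjects stay separate dicts
as_records = x[["school_subject", "graduate"]].to_dict(orient="records")

# Answer's approach: subjects become the index, so Series.to_dict maps subject -> grade
as_mapping = x.set_index("school_subject")["graduate"].to_dict()

print(as_records)
print(as_mapping)
```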