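【Question title】: Convert data from csv to list of dict
【Posted】: 2021-11-15 00:08:51
【Question description】:

I need to convert csv data with a large number of records (about 3 thousand) into a list of objects/dicts. I started with Pandas, but now I'm not sure it is a good choice. The file contains 5 columns. The structure of the csv file is as follows: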

readTimestamp   school_subject     graduate   full_name   term
1611658200000   mathematics        3          Edd Ston    2
1611658200000   physics            5          Edd Ston    2
1611658200000   foreign language   5          Edd Ston    2
1611658200000   geography          4          Edd Ston    2
1611658200000   history            3          Edd Ston    2
1611658200000   Informatics        4          Kate Slow   1
1611658200000   chemistry          5          Kate Slow   1
1611658200000   mathematics        5          Kate Slow   1
1611658200000   foreign language   5          Kate Slow   1

I need to get the following structure:

[
  {
    "readTimestamp": 123123123,
    "full_name": "Edd Ston",
    "term": 2,
    "schools_subject": [
      {
        "mathematics": 3,
        "physics": 5,
        "foreign language": 5,
        "geography": 4,
        "history": 3
      }
    ]
  },
  {
    "readTimestamp": 345345345,
    "full_name": "Kate Slow",
    "term": 1,
    "schools_subject": [
      {
        "Informatics": 4,
        "chemistry": 3,
        "mathematics": 5,
        "foreign language": 5
      }
    ]
  }
]

This is what I have so far:

df = df.groupby(['readTimestamp','full_name','term']).apply(lambda x: x[['school_subject', 'graduate']].to_dict(orient='records')).to_dict()    


{(1611658200000, 'Edd Ston', 2): [{'school_subject': 'mathematics', 'graduate': 3}, {'school_subject': 'physics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}, {'school_subject': 'geography', 'graduate': 4}, {'school_subject': 'history', 'graduate': 3}], (1611658200000, 'Kate Slow', 1): [{'school_subject': 'Informatics', 'graduate': 4}, {'school_subject': 'chemistry', 'graduate': 5}, {'school_subject': 'mathematics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}]}

Thank you for your help, and for explaining where I made a mistake.

【Question comments】:

    Tags: python pandas csv


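    【Solution 1】:

    I think your solution needs only a small change: create a dictionary for each group, then call reset_index and convert the result to a list of dicts with to_dict(orient='records'):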

    d = (df.groupby(['readTimestamp','full_name','term'])
           .apply(lambda x: x.set_index('school_subject')['graduate'].to_dict())
           .reset_index(name='schools_subject')
           .to_dict(orient='records'))
    

    print(d)
    
    [{
        'readTimestamp': 1611658200000,
        'full_name': 'Edd Ston',
        'term': 2,
        'schools_subject': {
            'mathematics': 3,
            'physics': 5,
            'foreign language': 5,
            'geography': 4,
            'history': 3
        }
    }, {
        'readTimestamp': 1611658200000,
        'full_name': 'Kate Slow',
        'term': 1,
        'schools_subject': {
            'Informatics': 4,
            'chemistry': 5,
            'mathematics': 5,
            'foreign language': 5
        }
    }]
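
The output above holds each group's subjects as a plain dict, while the structure requested in the question wraps that dict in a one-element list. Below is a minimal sketch of one way to get that shape with an explicit loop over the groups. It is an illustrative variant rather than the answer's own code: the tab separator, the inline sample text, and the names `raw`, `keys`, and `records` are assumptions made here for the example.

```python
import io
import json

import pandas as pd

# Sample rows copied from the question. A tab-separated file is an assumption;
# adjust `sep` to match the real csv.
raw = (
    "readTimestamp\tschool_subject\tgraduate\tfull_name\tterm\n"
    "1611658200000\tmathematics\t3\tEdd Ston\t2\n"
    "1611658200000\tphysics\t5\tEdd Ston\t2\n"
    "1611658200000\tforeign language\t5\tEdd Ston\t2\n"
    "1611658200000\tgeography\t4\tEdd Ston\t2\n"
    "1611658200000\thistory\t3\tEdd Ston\t2\n"
    "1611658200000\tInformatics\t4\tKate Slow\t1\n"
    "1611658200000\tchemistry\t5\tKate Slow\t1\n"
    "1611658200000\tmathematics\t5\tKate Slow\t1\n"
    "1611658200000\tforeign language\t5\tKate Slow\t1\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

keys = ["readTimestamp", "full_name", "term"]
records = []
# sort=False keeps the groups in order of first appearance in the file.
for (timestamp, name, term), group in df.groupby(keys, sort=False):
    # Map each subject to its grade; tolist() yields plain Python ints.
    subjects = dict(zip(group["school_subject"], group["graduate"].tolist()))
    records.append({
        "readTimestamp": int(timestamp),
        "full_name": name,
        "term": int(term),
        # Wrap the dict in a list to match the structure asked for.
        "schools_subject": [subjects],
    })

# Every value is a built-in type, so the result can be serialized directly.
print(json.dumps(records, indent=2))
```

Converting the group keys with int() keeps NumPy integer types out of the result, which matters if the list is later passed to json.dumps.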
    

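    【Discussion】:

    • Thank you for your help. Could you explain why school_subject needs to be made the index (set_index)?
    • @Sebastian - Because the dictionary needs the school_subject values as its keys, that column is converted to the index and then graduate is selected. The resulting Series, x.set_index('school_subject')['graduate'], is turned into a dictionary for each group, with keys taken from school_subject and values taken from graduate.
    • So close, and yet so far. Thank you for your help and the explanation.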
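
To illustrate the set_index point raised in the discussion above, here is a small sketch on a single group, using rows copied from the question's sample data. The variable names `group`, `by_position`, and `by_subject` are made up for this example.

```python
import pandas as pd

# One group's rows, copied from the question's sample data.
group = pd.DataFrame({
    "school_subject": ["Informatics", "chemistry", "mathematics", "foreign language"],
    "graduate": [4, 5, 5, 5],
})

# Without set_index, the default integer index supplies the dictionary keys.
by_position = group["graduate"].to_dict()

# With set_index, the subject names become the index and therefore the keys.
by_subject = group.set_index("school_subject")["graduate"].to_dict()

print(by_position)
print(by_subject)
```

Series.to_dict() always uses the index as keys, so moving school_subject into the index is what turns positions 0 to 3 into subject names.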