【Question title】: Python: transform data from CSV to an array of dictionaries and group by field value
【Posted】: 2020-05-10 23:01:15
【Question】:

I have a CSV like this:

id,company_name,country,country_id
1,batstop,usa, xx
2,biorice,italy, yy
1,batstop,italy, yy
3,legstart,canada, zz

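I want to import an array of dictionaries into Firebase. The rows for the same company, which differ only in their country fields, need to be grouped into a nested list of dictionaries. This is the desired output: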

[ {'id':'1', 'agency_name':'batstop', 'countries': [{'country':'usa','country_id':'xx'}, {'country':'italy','country_id':'yy'}]},
 {'id':'2', 'agency_name':'biorice', 'countries': [{'country':'italy','country_id':'yy'}]},
 {'id':'3', 'agency_name':'legstart', 'countries': [{'country':'canada','country_id':'zz'}]} ]

【Question discussion】:

    Tags: python arrays firebase csv dictionary


    【Solution 1】:

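    I recently had a similar task, and two functions from the Python standard library helped me a lot: groupby from itertools and itemgetter from operator. Here is the code for your CSV. Note how important it is to define the primary keys of the CSV dataset, because the rows are sorted and grouped by them.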

    import csv
    import json
    
    from operator import itemgetter
    from itertools import groupby
    
    
    primary_keys = ['id', 'company_name']
    
    # Start extraction
    with open('input.csv', 'r', newline='') as file:
        # Read data from csv: each row becomes a dict keyed by the header fields
        reader = csv.DictReader(file)
    
        # Sort data by the primary keys so rows of the same company are adjacent
        # (groupby only groups consecutive items with equal keys)
        rows = sorted(reader, key=itemgetter(*primary_keys))
    
    # Create a list of tuples
    # Each tuple contains a dict of the group's primary keys and values, and the list of the group's row dicts
    groups = [
        (dict(zip(primary_keys, key)), list(group_rows))
        for key, group_rows in groupby(rows, key=itemgetter(*primary_keys))
    ]
    
    # Create formatted dicts to be converted into Firebase objects
    group_dicts = []
    for keys, group_rows in groups:
        group_dict = {
            "id": keys['id'],
            "agency_name": keys['company_name'],
            "countries": [
                dict(country=row['country'], country_id=row['country_id']) for row in group_rows
            ],
        }
        group_dicts.append(group_dict)
    
    print("\n".join(json.dumps(d, indent=2) for d in group_dicts))
    

    Here is the output:

    {
      "id": "1",
      "agency_name": "batstop",
      "countries": [
        {
          "country": "usa",
          "country_id": " xx"
        },
        {
          "country": "italy",
          "country_id": " yy"
        }
      ]
    }
    {
      "id": "2",
      "agency_name": "biorice",
      "countries": [
        {
          "country": "italy",
          "country_id": " yy"
        }
      ]
    }
    {
      "id": "3",
      "agency_name": "legstart",
      "countries": [
        {
          "country": "canada",
          "country_id": " zz"
        }
      ]
    }
    

    No external libraries needed. I hope it works for you!
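Why the primary keys and the sort matter: itertools.groupby only groups consecutive items with equal keys, so rows belonging to the same company must be adjacent before grouping. A minimal sketch with three hypothetical in-memory rows (trimmed to a few fields, not read from input.csv) shows the difference:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows shaped like csv.DictReader output for the question's CSV
rows = [
    {'id': '1', 'company_name': 'batstop', 'country': 'usa'},
    {'id': '2', 'company_name': 'biorice', 'country': 'italy'},
    {'id': '1', 'company_name': 'batstop', 'country': 'italy'},
]
key = itemgetter('id', 'company_name')

# Unsorted: company 1 shows up as two separate groups because its rows are not adjacent
unsorted_groups = [k for k, _ in groupby(rows, key=key)]
print(unsorted_groups)  # [('1', 'batstop'), ('2', 'biorice'), ('1', 'batstop')]

# Sorted by the same key first: each company appears exactly once
sorted_groups = [k for k, _ in groupby(sorted(rows, key=key), key=key)]
print(sorted_groups)  # [('1', 'batstop'), ('2', 'biorice')]
```

Because the id values are read as strings, the sort is lexicographic ('10' sorts before '2'); that changes only the order of the output, not how the rows are grouped.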

    【Discussion】:

      【Solution 2】:

      You can try this. You may need to change some parts to make it work with your CSV, but hopefully it is enough to get you started:

      rows = [
          "1,batstop,usa, xx",
          "2,biorice,italy, yy",
          "1,batstop,italy, yy",
          "3,legstart,canada, zz"
      ]
      
      output = {}  # dict keyed by id, so we avoid searching the list for existing ids
      
      # Parse each row
      for line in rows:
          # Split on commas and strip the stray spaces (e.g. " xx" -> "xx")
          cols = [col.strip() for col in line.split(',')]
          row_id = int(cols[0])
          agency_name = cols[1]
          country = cols[2]
          country_id = cols[3]
      
          if row_id in output:
              # Known company: append this country as a dict (not wrapped in a list)
              output[row_id]['countries'].append({'country': country,
                                                  'country_id': country_id})
          else:
              # First time we see this company: create its entry
              output[row_id] = {'id': row_id,
                                'agency_name': agency_name,
                                'countries': [{'country': country,
                                               'country_id': country_id}]
                                }
      
      
      # Put into list (dicts keep insertion order in Python 3.7+)
      json_output = list(output.values())
      
      
      # Check output
      for row in json_output:
          print(row)
      

      【Discussion】:

      • I made a mistake: instead of int(cols[0]), you can keep the ID as a string, depending on the type you need.
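Along the lines of Solution 2, a variant that reads the actual file instead of a hard-coded list might look like the sketch below. It assumes the CSV has the header row shown in the question, and the function name group_companies is hypothetical. csv.DictReader with skipinitialspace=True ignores the space after each comma, so country_id comes out as 'xx' rather than ' xx'; the ID is kept as a string, as in the desired output; and json.dumps on the whole list produces a single JSON array rather than separate objects.

```python
import csv
import io
import json


def group_companies(csv_file):
    """Hypothetical helper: group CSV rows by id into dicts with a nested 'countries' list."""
    output = {}  # keyed by id, so we avoid searching a list for existing ids
    # skipinitialspace=True ignores the space after each comma (" xx" -> "xx")
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        # First time an id is seen, create its entry; otherwise reuse the existing one
        entry = output.setdefault(row['id'], {
            'id': row['id'],  # kept as a string, as in the desired output
            'agency_name': row['company_name'],
            'countries': [],
        })
        entry['countries'].append({'country': row['country'],
                                   'country_id': row['country_id']})
    # dicts keep insertion order (Python 3.7+), so companies appear in first-seen order
    return list(output.values())


# Demo with the question's data held in memory; for a real file use
#     with open('input.csv', newline='') as f:
#         result = group_companies(f)
sample = io.StringIO(
    "id,company_name,country,country_id\n"
    "1,batstop,usa, xx\n"
    "2,biorice,italy, yy\n"
    "1,batstop,italy, yy\n"
    "3,legstart,canada, zz\n"
)
result = group_companies(sample)

# Print the whole result as one JSON array
print(json.dumps(result, indent=2))
```

Using a dict keyed by id keeps this a single pass with no sorting, at the cost of holding all companies in memory until the end.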