【Question title】: Converting CSV to Hierarchical JSON output
【Posted】: 2021-01-01 20:44:35
【Question】:

I am trying to convert a CSV file into a hierarchical JSON file. The input CSV is shown below; it has two columns, gene and disease.

gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms

The expected output should have the following format:

{
     "name": "A1BG",
     "children": [
      {"name": "Adenocarcinoma"},
      {"name": "apnea"},
      {"name": "Athritis"}
      ]
    },

{
     "name": "A2M",
     "children": [
      {"name": "Asthma"},
      {"name": "Astrocytoma"},
      {"name": "Diabetes"}
      ]
    },


{
     "name": "NAT1",
     "children": [
      {"name": "polyps"},
      {"name": "lymphoma"},
      {"name": "neoplasms"}
      ]
    }
   

The Python code I wrote is below. Please let me know what I need to change to get the desired output.

import json
finalList = []
finalDict = {}
grouped = df.groupby(['gene'])

for key, value in grouped:

    dictionary = {}
    dictList = []
    anotherDict = {}

    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']

    for i in j.index:    
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)

    dictionary['children'] = dictList
    finalList.append(dictionary)

with open('outputresult3.json', "w") as out:
    json.dump(finalList,out)

【Comments】:

    Tags: python json pandas csv dictionary


    【Solution 1】:

    Use DataFrame.groupby with a custom lambda function that converts each group's values to a list of dictionaries via DataFrame.to_dict:

    import json

    L = (df.rename(columns={'disease': 'name'})
           .groupby('gene')
           # build one list of {'name': disease} records per gene
           .apply(lambda x: x[['name']].to_dict('records'))
           .reset_index(name='children')
           .rename(columns={'gene': 'name'})
           .to_dict('records')
           )
    print(L)
    # [{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
    #                                {'name': 'apnea'},
    #                                {'name': 'Athritis'}]},
    #  {'name': 'A2M', 'children': [{'name': 'Asthma'},
    #                               {'name': 'Astrocytoma'},
    #                               {'name': 'Diabetes'}]},
    #  {'name': 'NAT1', 'children': [{'name': 'polyps'},
    #                                {'name': 'lymphoma'},
    #                                {'name': 'neoplasms'}]}]

    with open('outputresult3.json', "w") as out:
        json.dump(L, out)
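
    If pandas is not a hard requirement, the same grouping can be sketched with the standard library alone. This is only an illustration: the function name `csv_to_hierarchy` is made up here, and the sample rows from the question are inlined so the snippet runs without a file on disk. It keeps genes in first-seen order, whereas `groupby` sorts the keys by default.

```python
import csv
import io
import json

# Sample rows copied from the question, inlined so the sketch is self-contained.
CSV_TEXT = """gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
"""


def csv_to_hierarchy(lines):
    """Group diseases under each gene (hypothetical helper, not from the answers)."""
    children_by_gene = {}  # plain dicts keep insertion order in Python 3.7+
    for row in csv.DictReader(lines):
        # Create the gene's list on first sight, then append a fresh child dict.
        children_by_gene.setdefault(row["gene"], []).append({"name": row["disease"]})
    return [{"name": gene, "children": kids} for gene, kids in children_by_gene.items()]


hierarchy = csv_to_hierarchy(io.StringIO(CSV_TEXT))
print(json.dumps(hierarchy, indent=2))
```

    To read from a real file instead, pass an open file object (opened with `newline=''`) to the same function.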
    

    【Discussion】:

      【Solution 2】:
      import json
      
      json_data = []
      
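      # group the data by each unique gene; passing the key as a string (not a
      # one-element list) keeps `gene` a plain string rather than a 1-tuple in pandas 2.x
      for gene, data in df.groupby("gene"):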
      
          # obtain a list of diseases for the current gene
          diseases = data["disease"].tolist()
      
          # create a new list of dictionaries to satisfy json requirements
          children = [{"name": disease} for disease in diseases]
          
          entry = {"name": gene, "children": children}
          json_data.append(entry)
          
      with open('outputresult3.json', "w") as out:
          json.dump(json_data, out)
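
      As for why the loop in the question does not produce the desired output: it creates `anotherDict` once per gene and appends that same object on every iteration, so each child ends up showing the last disease, and it uses the key 'disease' where the target format needs 'name'. A minimal sketch of the difference, using the A2M diseases from the question (the variable names `shared`, `buggy`, and `fixed` are illustrative only):

```python
import json

diseases = ["Asthma", "Astrocytoma", "Diabetes"]

# Pattern from the question: one dict created before the loop and reused.
shared = {}
buggy = []
for disease in diseases:
    shared["disease"] = disease  # overwrites the value on the same object
    buggy.append(shared)         # appends another reference to that object

# Fix: build a new dict for every row, keyed "name" as the target format needs.
fixed = [{"name": disease} for disease in diseases]

print(json.dumps(buggy))
# [{"disease": "Diabetes"}, {"disease": "Diabetes"}, {"disease": "Diabetes"}]
print(json.dumps(fixed))
# [{"name": "Asthma"}, {"name": "Astrocytoma"}, {"name": "Diabetes"}]
```

      Moving the dictionary creation inside the inner loop and changing the key to 'name' would be enough to repair the original code; the list comprehension above does both in one step.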
      

      【Discussion】:
