【Question title】: Convert pandas dataframe to JSON with a specific format
【Posted】: 2020-12-09 02:15:41
【Question】:

I am trying to convert the dataframe shown below into the required JSON.

column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE

into JSON like this:

{
    "_id": "5f2193c39448c44f0c1b65e0",
    "data_format": "TEXT",
    "file_type": "FACT",
    "columns": [
        {
            "column_id": 1,
            "column_name": "PAT_ID",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        },
        {
            "column_id": 2,
            "column_name": "PAT_NAME",
            "mandatory": "false",
            "column_data_type": "VARCHAR",
            "column_data_length": 2500
        }
    ],
    "active_ind": "true"
}

I tried several approaches to group by column name and column ID:

  1. This groups the rows by column but does not keep all the values: cac = df.groupby('column_id').apply(lambda x: x.to_json(orient='records'))

  2. cac = df.to_json(orient='records')

I am unable to separate the id from the columns.

Please help me.

【Comments】:

    Tags: json python-3.x pandas


    【Solution 1】:

    I would do it like this:

    import json

    import pandas as pd

    # Load data
    df = pd.read_csv('data.csv')
    
    # Build one dict per row holding the per-column fields, stored in a new 'columns' column
    col_set = ['column_id', 
               'column_name', 
               'mandatory', 
               'column_data_type', 
               'column_data_length']
    df['columns'] = df[col_set].apply(lambda x: x.to_dict(), axis=1)
    reorder = ['column_id', 
               'column_name', 
               'mandatory', 
               'column_data_type', 
               'column_data_length', 
               'columns', 
               'active_ind', 
               '_id', 
               'data_format', 
               'file_type']
    df = df[reorder]
    
    # Group rows that share the same parent fields and collect their dicts into a list
    col_set_2 = ['_id', 'data_format', 'file_type', 'columns', 'active_ind']
    col_set_3 = ['_id', 'data_format', 'file_type', 'active_ind']
    df2 = df[col_set_2].groupby(col_set_3)['columns'].apply(lambda x: list(x)).reset_index()
    df2 = df2[col_set_2]
    
    # Dataframe to JSON; parsed[0] is the single group in this sample data
    parsed = json.loads(df2.to_json(orient='records', indent=4))
    result = json.dumps(parsed[0], indent=4)
    
    print(result)
    
    # Output:
    
    {
        "_id": "5f2193c39448c44f0c1b65e0",
        "data_format": "TEXT",
        "file_type": "FACT",
        "columns": [
            {
                "column_id": 1,
                "column_name": "PAT_ID",
                "mandatory": false,
                "column_data_type": "VARCHAR",
                "column_data_length": 2500
            },
            {
                "column_id": 2,
                "column_name": "PAT_NAME",
                "mandatory": false,
                "column_data_type": "VARCHAR",
                "column_data_length": 2500
            },
            {
                "column_id": 3,
                "column_name": "PAT_AGE",
                "mandatory": false,
                "column_data_type": "VARCHAR",
                "column_data_length": 2500
            }
        ],
        "active_ind": true
    }
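
As a possible alternative to the grouping steps above, the nesting can also be done in plain Python after one pass through `to_json`. This is only a sketch under stated assumptions: the function name `nest_columns`, the constants `PARENT_KEYS` and `CHILD_KEYS`, and the inline `CSV_TEXT` sample (copied from the question) are illustrative names, not part of the original answer. It returns one document per distinct combination of parent fields, so it should also cope with input that contains more than one `_id`.

```python
import io
import json

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
CSV_TEXT = """column_id,column_name,mandatory,column_data_type,column_data_length,_id,data_format,file_type,active_ind
1,PAT_ID,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
2,PAT_NAME,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
3,PAT_AGE,FALSE,VARCHAR,2500,5f2193c39448c44f0c1b65e0,TEXT,FACT,TRUE
"""

# Hypothetical names: fields that identify a document vs. fields nested under "columns".
PARENT_KEYS = ['_id', 'data_format', 'file_type', 'active_ind']
CHILD_KEYS = ['column_id', 'column_name', 'mandatory',
              'column_data_type', 'column_data_length']


def nest_columns(df):
    """Return a list of dicts, one per distinct combination of PARENT_KEYS."""
    # Round-trip through to_json so every value is a plain, JSON-safe Python type.
    records = json.loads(df.to_json(orient='records'))
    documents = {}
    for rec in records:
        key = tuple(rec[k] for k in PARENT_KEYS)
        if key not in documents:
            # Key order follows the layout requested in the question.
            documents[key] = {
                '_id': rec['_id'],
                'data_format': rec['data_format'],
                'file_type': rec['file_type'],
                'columns': [],
                'active_ind': rec['active_ind'],
            }
        documents[key]['columns'].append({k: rec[k] for k in CHILD_KEYS})
    return list(documents.values())


documents = nest_columns(pd.read_csv(io.StringIO(CSV_TEXT)))
print(json.dumps(documents[0], indent=4))
```

With the sample data this yields a single document with three entries under `columns`. As in the answer's output, `mandatory` and `active_ind` come out as JSON booleans rather than the quoted strings shown in the question, because `read_csv` parses `TRUE`/`FALSE` as booleans by default.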
    

    【Comments】:
