【Question title】: Grouping CSV file by ID and extracting JSON column
【Posted】: 2019-05-03 03:06:12
【Question description】:

I currently have a CSV like this:

A    B    C
1    10   {"a":"one","b":"two","c":"three"}
1    10   {"a":"four","b":"five","c":"six"}
1    10   {"a":"seven","b":"eight","c":"nine"}
1    10   {"a":"ten","b":"eleven","c":"twelve"}
2    10   {"a":"thirteen","b":"fourteen","c":"fifteen"}
2    10   {"a":"sixteen","b":"seventeen","c":"eighteen"}
2    10   {"a":"nineteen","b":"twenty","c":"twenty-one"}
3    10   {"a":"twenty-two","b":"twenty-three","c":"twenty-four"}
3    10   {"a":"twenty-five","b":"twenty-six","c":"twenty-seven"}
3    10   {"a":"twenty-eight","b":"twenty-nine","c":"thirty"}
3    10   {"a":"thirty-one","b":"thirty-two","c":"thirty-three"}

I want to group by column A, ignore column B, and keep only the "b" field of the JSON in column C, giving this output:

A    C
1    ['two','five','eight','eleven']
2    ['fourteen','seventeen','twenty']
3    ['twenty-three','twenty-six','twenty-nine','thirty-two']

Is this possible? I have pandas, if that helps. I would also like the output file to be tab-delimited.

【Question discussion】:

    Tags: json python-3.x pandas csv dataframe


    【Solution 1】:

    Try this:

    import pandas as pd
    import json
    
    # read a whitespace-separated file that looks exactly as shown above
    # (sep=r"\s+" is the current spelling of the deprecated delim_whitespace=True)
    df = pd.read_csv("file.csv", sep=r"\s+")
    
    # drop the 'B' column
    del df['B']
    
    # 'C' starts life as a JSON string: parse it and keep only the 'b' value
    df['C'] = df['C'].map(lambda x: json.loads(x)['b'])
    
    # 'C' now holds just the 'b' values; collect them into one list per 'A'
    df = df.groupby('A').C.agg(list)
    
    print(df)
    

    This prints:

    A
    1                           [two, five, eight, eleven]
    2                        [fourteen, seventeen, twenty]
    3    [twenty-three, twenty-six, twenty-nine, thirty...
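The question also asks for a tab-delimited output file, which the answer above does not write. A minimal sketch, assuming a recent pandas; the short inline sample (a subset of the question's rows) and the output file name `output.tsv` are hypothetical stand-ins for the real input and output:

```python
import io
import json

import pandas as pd

# Hypothetical inline sample (a few rows from the question) so the sketch
# runs without file.csv.
raw = """A    B    C
1    10   {"a":"one","b":"two","c":"three"}
1    10   {"a":"four","b":"five","c":"six"}
2    10   {"a":"thirteen","b":"fourteen","c":"fifteen"}
"""

# Split on runs of whitespace, as in the answer above.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Keep only the "b" field of each JSON object, then build one list per A.
df["C"] = df["C"].map(lambda s: json.loads(s)["b"])
result = df.groupby("A")["C"].agg(list)

# Write a tab-separated file; the index (A) is written as the first column
# and the header row is "A<TAB>C". "output.tsv" is a hypothetical name.
result.to_csv("output.tsv", sep="\t")
```

Each list is written as its Python `repr`, for example `['two', 'five']`, which matches the shape in the question apart from the space after each comma.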
    

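    【Discussion】:

      【Solution 2】:

      IIUC (if I understand correctly), and assuming column C already holds parsed dicts rather than JSON strings: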

      df.groupby('A').C.apply(lambda x : [y['b'] for y in x ])
      A
      1                           [two, five, eight, eleven]
      2                        [fourteen, seventeen, twenty]
      3    [twenty-three, twenty-six, twenty-nine, thirty...
      Name: C, dtype: object
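The one-liner above indexes `y['b']` directly, so it expects C to contain dicts. If C is read straight from the file, each cell is a JSON string and `y['b']` raises `TypeError`. One way to get dicts is the `converters` argument of `pd.read_csv`, which applies `json.loads` to every cell of C while reading. A minimal sketch, using a small hypothetical inline sample in place of the real file:

```python
import io
import json

import pandas as pd

# Hypothetical inline sample standing in for the question's file.
raw = """A    B    C
1    10   {"a":"one","b":"two","c":"three"}
1    10   {"a":"four","b":"five","c":"six"}
2    10   {"a":"thirteen","b":"fourteen","c":"fifteen"}
"""

# converters={"C": json.loads} parses each JSON string in C into a dict
# at load time, so no separate parsing pass is needed afterwards.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", converters={"C": json.loads})

# With dicts in C, the list comprehension from the answer applies directly.
result = df.groupby("A").C.apply(lambda x: [y["b"] for y in x])
print(result)
```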
      

      【Discussion】:
