【Question title】: How to aggregate summary information from multiple columns in a pandas DataFrame as a list of strings?
【Posted】: 2020-07-29 16:26:46
【Question description】:

I have a DataFrame like the following:

df = 
                               time_id gt_class  num_missed_base  num_missed_feature  num_objects_base  num_objects_feature
   5G21A6P00L4100023:1566617404450336      CAR               11                   4                27                   30
   5G21A6P00L4100023:1566617404450336  BICYCLE                4                   6                27                   30
   5G21A6P00L4100023:1566617404450336   PERSON                2                   3                27                   30
   5G21A6P00L4100023:1566617404450336    TRUCK                1                   0                27                   30
   5G21A6P00L4100023:1566617428450689      CAR               25                  14                60                   67
   5G21A6P00L4100023:1566617428450689   PERSON                7                   6                60                   67
   5G21A6P00L4100023:1566617515950900  BICYCLE                1                   1                59                   65
   5G21A6P00L4100023:1566617515950900      CAR               20                   9                59                   65
   5G21A6P00L4100023:1566617515950900   PERSON               10                   2                59                   65
   5G21A6P00L4100037:1567169649450046      CAR                8                   0                29                   32
   5G21A6P00L4100037:1567169649450046   PERSON                1                   0                29                   32
   5G21A6P00L4100037:1567169649450046    TRUCK                1                   0                29                   32

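For each time_id, it shows how many objects the base model missed (num_missed_base), how many the feature model missed (num_missed_feature), and how many objects were present at that time for the base and feature models (num_objects_base, num_objects_feature).

I need to produce the following DataFrame: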

    time_id                             gt_class                  num_missed_base  num_missed_feature  hover_base                                          hover_feature
0   5G21A6P00L4100023:1566617404450336  CAR,BICYCLE,PERSON,TRUCK  18               13                  ['CAR: 11', 'BICYCLE: 4', 'PERSON: 2', 'TRUCK: 1']  ['CAR: 4', 'BICYCLE: 6', 'PERSON: 3', 'TRUCK: 0']
1   5G21A6P00L4100023:1566617428450689  CAR,PERSON                32               20                  ['CAR: 25', 'PERSON: 7']                            ['CAR: 14', 'PERSON: 6']
2   5G21A6P00L4100023:1566617515950900  BICYCLE,CAR,PERSON        31               12                  ['BICYCLE: 1', 'CAR: 20', 'PERSON: 10']             ['BICYCLE: 1', 'CAR: 9', 'PERSON: 2']
3   5G21A6P00L4100037:1567169649450046  CAR,PERSON,TRUCK          10               0                   ['CAR: 8', 'PERSON: 1', 'TRUCK: 1']                 ['CAR: 0', 'PERSON: 0', 'TRUCK: 0']

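【Question comments】:

    Tags: python pandas dataframe grouping summary


    【Solution 1】:

    You can group by time_id and then apply the relevant aggregation function to each column. Reference: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html Note: this is a similar but simpler example.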

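    import pandas as pd

    df = pd.DataFrame(data={
        'time_id': ['2020-01-01','2020-01-01','2020-01-01','2020-01-02','2020-01-02','2020-01-02'],
        'val1': ['car', 'bicycle', 'person', 'truck', 'aeroplane', 'train'],
        'val2': [0,1,2,9,8,7],
        'val3': [9,2,3,4,5,6]
    })

    def func(row):
        # Join all val1 entries of one group into a comma-separated string.
        return ','.join(row.tolist())

    def multi_column1(row):
        # `row` holds one group's val3 values; its index is used to look up
        # the matching val1 entries in the global df (hacky, see the comments).
        l = []
        for n in row.index:
            x = df.loc[n, 'val1']
            y = df.loc[n, 'val3']
            w = '{} : {}'.format(x, y)
            l.append(w)
        return l

    # One aggregation per column: joined string, total, and list of labels.
    ans = df.groupby('time_id').agg({'val1': func, 'val2': 'sum', 'val3': multi_column1})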
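As a rough sketch of how the same grouping idea might be mapped onto the columns from the question: the "CLASS: count" strings can be built as ordinary columns first and then collected with `list`. The DataFrame below is only a subset of the question's data (the first two time_id groups), and the names `labeled` and `summary` are made up for illustration.

```python
import pandas as pd

# Subset of the question's data (first two time_id groups only).
df = pd.DataFrame({
    "time_id": ["5G21A6P00L4100023:1566617404450336"] * 4
               + ["5G21A6P00L4100023:1566617428450689"] * 2,
    "gt_class": ["CAR", "BICYCLE", "PERSON", "TRUCK", "CAR", "PERSON"],
    "num_missed_base": [11, 4, 2, 1, 25, 7],
    "num_missed_feature": [4, 6, 3, 0, 14, 6],
})

# Build the "CLASS: count" label for every row with string concatenation.
labeled = df.assign(
    hover_base=df["gt_class"] + ": " + df["num_missed_base"].astype(str),
    hover_feature=df["gt_class"] + ": " + df["num_missed_feature"].astype(str),
)

# Named aggregation: each output column is an (input column, function) pair.
# sort=False keeps the groups in order of first appearance.
summary = labeled.groupby("time_id", sort=False).agg(
    gt_class=("gt_class", ",".join),
    num_missed_base=("num_missed_base", "sum"),
    num_missed_feature=("num_missed_feature", "sum"),
    hover_base=("hover_base", list),
    hover_feature=("hover_feature", list),
).reset_index()

print(summary)
```

Because the labels are prepared before grouping, no aggregation function has to reach back into the original frame.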
    

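    【Comments】:

    • Thanks. But I need to produce hover_base and hover_feature, which are not simple lists. How can I build a list of class: num_missed entries as in the example above?
    • Made a change to accommodate the multi-column use case, but it is hacky.
    • Is there any way to get rid of the for loop inside multi_column1?
    • A list comprehension, but apart from looping over the index I don't see any conventional way to access the other columns' data.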
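
On the last point, one possible way to reach several columns at once without looking anything up in the global df is to select those columns and use `groupby(...).apply`, which passes each group to the function as a sub-DataFrame. This is only a sketch on the answer's toy data (val2 left out for brevity); the helper name `label_rows` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time_id": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "val1": ["car", "bicycle", "person", "truck", "aeroplane", "train"],
    "val3": [9, 2, 3, 4, 5, 6],
})

def label_rows(group):
    # `group` is the sub-DataFrame for one time_id, so both columns are
    # available directly; zip pairs them up row by row.
    return [f"{name} : {count}" for name, count in zip(group["val1"], group["val3"])]

# Selecting the two columns explicitly keeps the grouping key out of `group`.
labels = df.groupby("time_id")[["val1", "val3"]].apply(label_rows)

print(labels)
```

The result is a Series indexed by time_id with one list of strings per group, which can be joined back to the other aggregated columns if needed.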