【Question title】: Split a column into several columns based on several conditions and group by
【Posted】: 2021-11-25 23:49:18
【Question】:

I have a sample dataframe, shown below.

import pandas as pd

data = {'ID':['A','A','A','A','A','A','A','A','A','C','C','C','C','C','C','C','C'],
    'Week': ['Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2','Week3',
             'Week1','Week1','Week1','Week1','Week2','Week2','Week2','Week2'],
    'Risk':['High','','','','','','','','','High','','','','','','',''],
    'Testing':['','Pos','','Neg','','','','','Pos', '', '','','Neg','','','','Pos'],
    'Week1_adher':['','','','','','','','','', '','','','','','','',''],
    'Week2_adher':['','','','','','','','','','','','','','','','',''],
    'Week3_adher':['','','','','','','','','','','','','','','','','']}
    
df1 = pd.DataFrame(data)
df1 

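Now I want to compute each participant's adherence for every week. The rule is: if a participant has 2 or more entries (positive or negative) in the Testing column within a week, adherence for that week is "Yes"; otherwise it is "No".

For example, for participant A, Week1_adher is "Yes" because there are 2 entries in the Testing column for Week1. Week2_adher is "No".

I also want the adherence results for all weeks to appear in the first row of each participant.

The final dataframe should look like the one shown below.

I have been stuck on this for a long time. Any help is greatly appreciated. Thanks.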

【Comments】:

    Tags: python-3.x pandas dataframe data-science data-processing


    【Solution 1】:

    Try:

    adher = (df1.Testing.ne('')       # check for non-empty string
        .groupby([df1.ID, df1.Week])  # groupby ID and week
        .sum().ge(2)                  # count and check >= 2
        .unstack(fill_value=False)
        .replace({True:'Yes', False:'No'})
        .add_suffix('_adher')
    )
    
    # mask selecting the first row of each ID
    mask = ~df1['ID'].duplicated()
    
    df1.loc[mask, adher.columns] = adher.loc[df1.loc[mask,'ID']].values
    

    Output:

       ID   Week  Risk   Testing Week1_adher Week2_adher Week3_adher
    0   A  Week1  High                   Yes          No          No
    1   A  Week1             Pos                                    
    2   A  Week1                                                    
    3   A  Week1             Neg                                    
    4   A  Week2                                                    
    5   A  Week2                                                    
    6   A  Week2                                                    
    7   A  Week2                                                    
    8   A  Week3             Pos                                    
    9   C  Week1  High                    No          No          No
    10  C  Week1                                                    
    11  C  Week1                                                    
    12  C  Week1             Neg                                    
    13  C  Week2                                                    
    14  C  Week2                                                    
    15  C  Week2                                                    
    16  C  Week2             Pos                                    
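
To see how the chained expression in this answer arrives at its result, the steps can be run one at a time. The following is a minimal sketch on a small made-up frame (the rows are an assumption for illustration, not the question's data); the intermediate names `has_test`, `counts`, and `wide` are hypothetical and do not appear in the answer.

```python
import pandas as pd

# Small hypothetical frame with the same column names as the question.
df = pd.DataFrame({
    'ID':      ['A', 'A', 'A', 'C', 'C'],
    'Week':    ['Week1', 'Week1', 'Week2', 'Week1', 'Week1'],
    'Testing': ['Pos', 'Neg', '', 'Neg', ''],
})

# Step 1: True where a test result was recorded (non-empty string).
has_test = df['Testing'].ne('')

# Step 2: summing booleans counts the True values per (ID, Week) pair.
counts = has_test.groupby([df['ID'], df['Week']]).sum()

# Step 3: pivot weeks into columns; an ID with no rows for a week gets 0.
wide = counts.unstack(fill_value=0)

# Step 4: apply the "2 or more entries" rule and label the result.
adher = wide.ge(2).replace({True: 'Yes', False: 'No'}).add_suffix('_adher')
print(adher)
```

Here participant C has no Week2 rows at all, so `unstack` fills that cell before the threshold is applied, and the week ends up as "No" rather than as a missing value.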
    

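    【Comments】:

    • This works for non-empty strings (the first line). What if the values are np.nan instead of empty strings? Should I just replace it with Testing.ne(np.nan)? I tried that, but it does not work.
    • Use notna() instead of ne() to check for non-NaN values.
    • Got it. Thanks a lot! The solution is elegant, too.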
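
The comments above explain why `ne(np.nan)` is not a usable filter: NaN compares unequal to every value, including itself, so the comparison cannot single out the missing rows. A minimal sketch of the `notna()` variant follows, assuming missing results are stored as `np.nan`; the frame is made up for illustration and the name `has_test` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where missing test results are NaN, not ''.
df = pd.DataFrame({
    'ID':      ['A', 'A', 'A', 'C'],
    'Week':    ['Week1', 'Week1', 'Week2', 'Week1'],
    'Testing': ['Pos', 'Neg', np.nan, np.nan],
})

# notna() is True only where a value is actually present.
has_test = df['Testing'].notna()

# Same pipeline as the answer, with the new mask swapped in.
adher = (has_test
    .groupby([df['ID'], df['Week']])  # group by ID and week
    .sum().ge(2)                      # count recorded tests, check >= 2
    .unstack(fill_value=False)        # weeks to columns; absent weeks are False
)
print(adher)
```

If a column mixes empty strings and NaN, one option is to combine both checks, for example `df['Testing'].notna() & df['Testing'].ne('')`.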