【Question title】: In a pandas DataFrame, keep repeated values only while they form one consecutive group; if a value repeats after other values, print a message
【Posted】: 2021-05-15 22:44:48
【Question】:

Example DataFrame:

A1
A1
A1 # these values are OK because they repeat consecutively
A2
A3
A4
A1 # this is a duplicate because it does not continue the run above
A5

【Comments】:

    Tags: python pandas dataframe series


    【Solution 1】:

    Use:

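    import pandas as pd

    # sample data matching the output below
    df = pd.DataFrame({'col': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A4', 'A1', 'A5']})

    # test if duplicated; the first occurrence of each value is False
    df['dup'] = df['col'].duplicated()
    # label consecutive groups (runs of equal values)
    df['g'] = df['col'].ne(df['col'].shift()).cumsum()
    # True unless every row of the group is a duplicate, so False marks out-of-sequence repeats
    df['new'] = ~df.groupby('g')['dup'].transform('all')
    print(df)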
      col    dup  g    new
    0  A1  False  1   True
    1  A1   True  1   True
    2  A1   True  1   True
    3  A2  False  2   True
    4  A2   True  2   True
    5  A3  False  3   True
    6  A4  False  4   True
    7  A1   True  5  False
    8  A5  False  6   True
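
The question also asks for a message when a value reappears out of sequence, which the flags above do not print by themselves. A minimal sketch, assuming the same sample data as in this solution; the names `ok` and `messages` and the message wording are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'col': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A4', 'A1', 'A5']})

# True where the value was already seen earlier in the column
dup = df['col'].duplicated()
# run label: increases by one each time the value changes
g = df['col'].ne(df['col'].shift()).cumsum()
# a run made up only of duplicates is an out-of-sequence repeat
ok = ~dup.groupby(g).transform('all')

# build one message per flagged row (wording is an assumption)
messages = [
    f"row {idx}: {val!r} repeats after other values"
    for idx, val in df.loc[~ok, 'col'].items()
]
for msg in messages:
    print(msg)

# keep only the rows that are in sequence
kept = df[ok]
```

With this data only row 7 is reported, and `kept` holds the remaining eight rows.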
    

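    If only repeats that come back as a single row should be flagged (a repeat that comes back as a run of two or more rows is kept):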

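    import pandas as pd

    # sample data for the second variant
    df = pd.DataFrame({'col': ['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A4', 'A1', 'A1', 'A5', 'A1']})
    print(df)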
       col
    0   A1
    1   A1
    2   A1
    3   A2
    4   A2
    5   A3
    6   A4
    7   A1
    8   A1
    9   A5
    10  A1
    

    # same as in the first solution
    df['dup'] = df['col'].duplicated()
    df['g'] = df['col'].ne(df['col'].shift()).cumsum()
    df['rem1'] = ~df.groupby('g')['dup'].transform('all')
    # True if the consecutive group g has more than one row
    df['rem2'] = df['g'].duplicated(keep=False)
    # combine with | (element-wise OR)
    df['new'] = df['rem1'] | df['rem2']
    print(df)
       col    dup  g   rem1   rem2    new
    0   A1  False  1   True   True   True
    1   A1   True  1   True   True   True
    2   A1   True  1   True   True   True
    3   A2  False  2   True   True   True
    4   A2   True  2   True   True   True
    5   A3  False  3   True  False   True
    6   A4  False  4   True  False   True
    7   A1   True  5  False   True   True
    8   A1   True  5  False   True   True
    9   A5  False  6   True  False   True
    10  A1   True  7  False  False  False
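
Both variants can be wrapped in one helper so the choice becomes a parameter. This is only a sketch under the same assumptions as the answer: `flag_rows` and its `keep_repeated_runs` argument are hypothetical names introduced here, not pandas API.

```python
import pandas as pd


def flag_rows(s: pd.Series, keep_repeated_runs: bool = False) -> pd.Series:
    """Return True for rows to keep, False for out-of-sequence repeats.

    Hypothetical helper: with keep_repeated_runs=True, a value that comes
    back as a run of two or more rows is kept (second variant above).
    """
    dup = s.duplicated()                    # seen earlier in the Series
    g = s.ne(s.shift()).cumsum()            # label runs of equal values
    keep = ~dup.groupby(g).transform('all') # False if the whole run is duplicates
    if keep_repeated_runs:
        keep = keep | g.duplicated(keep=False)  # runs longer than one row
    return keep


s = pd.Series(['A1', 'A1', 'A1', 'A2', 'A2', 'A3', 'A4', 'A1', 'A1', 'A5', 'A1'],
              name='col')
strict = flag_rows(s)
lenient = flag_rows(s, keep_repeated_runs=True)

# report the rows flagged by the lenient variant (message wording is an assumption)
for idx, val in s[~lenient].items():
    print(f"row {idx}: {val!r} repeats after other values")
```

Here `strict` reproduces the `rem1` column and `lenient` reproduces the `new` column of the table above.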
    
