【Question title】: Pandas: compare rows in a DataFrame
【Posted】: 2017-02-28 15:20:22
【Question】:

I have the following DataFrame (represented by the dict below):

{'Name': {0: '204',
  1: '110838',
  2: '110999',
  3: '110998',
  4: '111155',
  5: '111710',
  6: '111157',
  7: '111156',
  8: '111144',
  9: '118972',
  10: '111289',
  11: '111288',
  12: '111145',
  13: '121131',
  14: '118990',
  15: '110653',
  16: '110693',
  17: '110694',
  18: '111577',
  19: '111702',
  20: '115424',
  21: '115127',
  22: '115178',
  23: '111578',
  24: '115409',
  25: '115468',
  26: '111711',
  27: '115163',
  28: '115149',
  29: '115251'},
 'Sequence_new': {0: 1.0,
  1: 2.0,
  2: 3.0,
  3: 4.0,
  4: 5.0,
  5: 6.0,
  6: 7.0,
  7: 8.0,
  8: 9.0,
  9: 10.0,
  10: 11.0,
  11: 12.0,
  12: nan,
  13: 13.0,
  14: 14.0,
  15: 15.0,
  16: 16.0,
  17: 17.0,
  18: 18.0,
  19: 19.0,
  20: 20.0,
  21: 21.0,
  22: 22.0,
  23: 23.0,
  24: 24.0,
  25: 25.0,
  26: 26.0,
  27: 27.0,
  28: 28.0,
  29: 29.0},
 'Sequence_old': {0: 1,
  1: 2,
  2: 3,
  3: 4,
  4: 5,
  5: 6,
  6: 7,
  7: 8,
  8: 9,
  9: 10,
  10: 11,
  11: 12,
  12: 13,
  13: 14,
  14: 15,
  15: 16,
  16: 17,
  17: 18,
  18: 19,
  19: 20,
  20: 21,
  21: 22,
  22: 23,
  23: 24,
  24: 25,
  25: 26,
  26: 27,
  27: 28,
  28: 29,
  29: 30}}
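A minimal sketch of loading this dict into pandas. It is truncated here to four rows around the missing value, and the variable name data is an assumption. Because one value is NaN, pandas stores Sequence_new as float64, and NaN never compares equal to anything (not even itself), so missing values have to be detected with isnull() rather than ==:

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's dict (the full dict loads the same way)
data = {'Name': {11: '111288', 12: '111145', 13: '121131', 14: '118990'},
        'Sequence_new': {11: 12.0, 12: np.nan, 13: 13.0, 14: 14.0},
        'Sequence_old': {11: 12, 12: 13, 13: 14, 14: 15}}

# Outer keys become columns, inner keys become the row index
df = pd.DataFrame(data)

print(df.dtypes)                          # Sequence_new is float64 because of the NaN
print(np.nan == np.nan)                   # False: NaN is never equal to itself
print(df.Sequence_new.isnull().tolist())  # [False, True, False, False]
```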

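I am trying to find the changes between the old and new sequences. If, for a given Name, Sequence_old equals Sequence_new, nothing has changed. If Sequence_new is NaN, the Name has been removed. Can you help me implement this in pandas? Nothing I have tried so far has worked: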

for i in range(0, len(Merge)):
    if Merge.iloc[i]['Sequence_x'] == Merge.iloc[i]['Sequence_y']:
        Merge.iloc[i]['New'] = 'N'
    else:
        Merge.iloc[i]['New'] = 'Y'

Thanks
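The loop in the question most likely has no effect because Merge.iloc[i]['New'] = ... is chained indexing: Merge.iloc[i] returns a temporary row, so the assignment writes to that copy rather than to Merge (pandas typically emits a SettingWithCopyWarning). A minimal sketch of the same loop using a single .loc[row, column] assignment, with an extra branch for NaN. It assumes the columns are named Sequence_old/Sequence_new as in the dict (in the question's Merge frame they appear as Sequence_x/Sequence_y), and uses a small stand-in frame:

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (column names follow the dict, not Sequence_x/Sequence_y)
Merge = pd.DataFrame({'Name': ['111288', '111145', '121131'],
                      'Sequence_new': [12.0, np.nan, 13.0],
                      'Sequence_old': [12, 13, 14]})

for idx in Merge.index:
    new = Merge.loc[idx, 'Sequence_new']
    old = Merge.loc[idx, 'Sequence_old']
    # One .loc call per write, so the value lands in Merge itself, not in a copy
    if pd.isnull(new):
        Merge.loc[idx, 'New'] = 'Removed'
    elif new == old:
        Merge.loc[idx, 'New'] = 'N'
    else:
        Merge.loc[idx, 'New'] = 'Y'

print(Merge)
```

Looping row by row is slow on large frames, so a vectorized approach such as np.where is usually preferable.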

【Question discussion】:

    Tags: python pandas nan missing-data


    【Solution 1】:

    You can use two nested numpy.where calls, with isnull for the NaN condition:

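    import numpy as np

    # True where old and new positions agree (NaN never compares equal, so removed rows are False)
    mask = df.Sequence_old == df.Sequence_new

    # NaN in Sequence_new -> 'Removed'; otherwise 'N' if unchanged, 'Y' if the position changed
    df['New'] = np.where(df.Sequence_new.isnull(), 'Removed',
                np.where(mask, 'N', 'Y'))

    print(df)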
         Name  Sequence_new  Sequence_old      New
    0      204           1.0             1        N
    1   110838           2.0             2        N
    2   110999           3.0             3        N
    3   110998           4.0             4        N
    4   111155           5.0             5        N
    5   111710           6.0             6        N
    6   111157           7.0             7        N
    7   111156           8.0             8        N
    8   111144           9.0             9        N
    9   118972          10.0            10        N
    10  111289          11.0            11        N
    11  111288          12.0            12        N
    12  111145           NaN            13  Removed
    13  121131          13.0            14        Y
    14  118990          14.0            15        Y
    15  110653          15.0            16        Y
    16  110693          16.0            17        Y
    17  110694          17.0            18        Y
    18  111577          18.0            19        Y
    19  111702          19.0            20        Y
    20  115424          20.0            21        Y
    21  115127          21.0            22        Y
    22  115178          22.0            23        Y
    23  111578          23.0            24        Y
    24  115409          24.0            25        Y
    25  115468          25.0            26        Y
    26  111711          26.0            27        Y
    27  115163          27.0            28        Y
    28  115149          28.0            29        Y
    29  115251          29.0            30        Y
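As a possible alternative to nesting np.where, numpy.select takes a list of conditions and a matching list of choices and, for each row, uses the first condition that is true; this stays readable if more categories are added later. A sketch on a few rows of the question's data; the condition order matters, because the NaN check has to come first:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['111288', '111145', '121131'],
                   'Sequence_new': [12.0, np.nan, 13.0],
                   'Sequence_old': [12, 13, 14]})

# Conditions are checked in order; the first True one wins for each row
conditions = [df.Sequence_new.isnull(),
              df.Sequence_old == df.Sequence_new]
choices = ['Removed', 'N']

# Rows matching neither condition (still present but moved) get the default 'Y'
df['New'] = np.select(conditions, choices, default='Y')
print(df)
```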
    

    【Discussion】:

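    • Can you explain "If Sequence_new is 'nan', Name removed" in more detail? What is the desired output?
    • Yes, NaN means the name has been removed; I would like a smarter boolean condition to find these names. Thanks.
    • Thanks. What do you mean by smarter? Can you add the desired output for this input?
    • If Sequence_new is NaN, the name has been removed and I want to flag it as removed. If both Sequence_old and Sequence_new are present, the name is kept.
    • By "remove", do you mean deleting every row with NaN in Sequence_new (e.g. "12 111145 NaN 13 Y")? Or should all of that row's values be set to NaN?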
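The comment thread leaves open whether removed names should be flagged or dropped. A minimal sketch of both readings, using isnull() to flag rows and dropna(subset=[...]) to drop them; the column name Removed and the variable kept are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['111288', '111145', '121131'],
                   'Sequence_new': [12.0, np.nan, 13.0],
                   'Sequence_old': [12, 13, 14]})

# Reading 1: keep every row and flag the names whose new position is NaN
df['Removed'] = df.Sequence_new.isnull()

# Reading 2: drop rows whose new position is NaN (returns a new frame; df is unchanged)
kept = df.dropna(subset=['Sequence_new'])

print(df[df.Removed].Name.tolist())  # ['111145']
print(kept.Name.tolist())            # ['111288', '121131']
```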
    【Solution 2】:
    dic_new = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 11.0, 11: 12.0,
               12: 'Nan', 13: 13.0, 14: 14.0, 15: 15.0, 16: 16.0, 17: 17.0, 18: 18.0, 19: 19.0, 20: 20.0, 21: 21.0,
               22: 22.0, 23: 23.0, 24: 24.0, 25: 25.0, 26: 26.0, 27: 27.0, 28: 28.0, 29: 29.0}
    dic_old = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 12, 12: 13, 13: 14, 14: 15, 15: 16,
               16: 17, 17: 18, 18: 19, 19: 20, 20: 21, 21: 22, 22: 23, 23: 24, 24: 25, 25: 26, 26: 27, 27: 28, 28: 29,
               29: 30}
    
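    # Loop version: compare new and old positions, skipping removed rows (new value is NaN)
    for a, b in zip(dic_new.items(), dic_old.items()):
        # a = (key, new value), b = (key, old value); str() handles both floats and the 'Nan' string
        if str(a[1]).lower() != 'nan':
            # You can add whatever print statement you want here
            print(a[1] == b[1])

    # Comprehension version: collects the same comparisons into a list instead of printing each one
    changes = [a[1] == b[1] for a, b in zip(dic_new.items(), dic_old.items())
               if str(a[1]).lower() != 'nan']
    print(changes)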
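If the missing value is stored as a real float NaN (as in the question's dict) rather than the string 'Nan', math.isnan can replace the string comparison. A sketch that keeps Solution 2's plain-dict approach but produces the same 'Removed'/'N'/'Y' labels as Solution 1; the dicts are shortened and the name labels is an assumption:

```python
import math

# Shortened versions of the two dicts, with a real NaN instead of the string 'Nan'
dic_new = {11: 12.0, 12: float('nan'), 13: 13.0}
dic_old = {11: 12, 12: 13, 13: 14}

labels = {}
for key, new in dic_new.items():
    # Look up by key instead of relying on both dicts iterating in the same order
    old = dic_old[key]
    if math.isnan(new):
        labels[key] = 'Removed'
    elif new == old:
        labels[key] = 'N'
    else:
        labels[key] = 'Y'

print(labels)  # {11: 'N', 12: 'Removed', 13: 'Y'}
```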
    

    【Discussion】:
