【Question title】: Looping through DataFrame with previous/next values
【Posted】: 2021-10-28 14:29:45
【Question】:

I have the following DataFrame df, in which the rows are already sorted in ascending order by user (and by count within each user):

user  count  status

A     1      completed
A     2      not completed
B     1      not completed
B     2      completed
B     3      not completed
C     1      completed
C     2      not completed
C     3      completed
D     1      not completed
D     2      completed
D     3      not completed
D     4      completed

I need to flag both rows wherever, for the same user, a "not completed" status is immediately followed by "completed". So the logic should look like this:

for each user:
  set selection = 1 on every "not completed" row that is immediately followed by "completed"
  set selection = 1 on every "completed" row that immediately follows "not completed"
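The rule above can be checked on a single user's statuses in isolation. The sketch below is plain Python with no pandas; the helper name `flag_pairs` is hypothetical, and it assumes the list holds one user's statuses already in `count` order. It marks both rows of every adjacent "not completed" / "completed" pair.

```python
def flag_pairs(statuses):
    """Return a 0/1 flag per status (hypothetical helper).

    A row gets 1 if it is part of an adjacent ("not completed", "completed")
    pair, else 0. Assumes `statuses` is one user's statuses in count order.
    """
    flags = [0] * len(statuses)
    # Compare each status with the one right after it.
    for i in range(len(statuses) - 1):
        if statuses[i] == "not completed" and statuses[i + 1] == "completed":
            flags[i] = 1       # the "not completed" row
            flags[i + 1] = 1   # the "completed" row that follows it
    return flags


print(flag_pairs(["not completed", "completed", "not completed"]))  # user B -> [1, 1, 0]
```

Because a flag is only ever set to 1, overlapping pairs such as user D's four rows are all flagged without any special handling.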

This is the desired result:

user  count  status           selection

A     1      completed        0
A     2      not completed    0
B     1      not completed    1
B     2      completed        1
B     3      not completed    0
C     1      completed        0
C     2      not completed    1
C     3      completed        1
D     1      not completed    1
D     2      completed        1
D     3      not completed    1
D     4      completed        1

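I would prefer a solution that uses iterrows() or itertuples(), but I ran into trouble flagging both rows and selecting the previous/next index. I'd be glad to see any possible solutions to this problem.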
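Since the question asks for itertuples(), here is a minimal loop-based sketch. It assumes df is sorted by user and count as shown above. Instead of looking up previous/next indices, it materializes the rows once and compares each row with the next one by position, flagging both rows when they belong to the same user and form a "not completed" / "completed" pair. Expect it to be much slower than the vectorized answers on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    "user":   ["A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"],
    "count":  [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
    "status": ["completed", "not completed",
               "not completed", "completed", "not completed",
               "completed", "not completed", "completed",
               "not completed", "completed", "not completed", "completed"],
})

# Materialize the rows once so the next row can be reached by position.
rows = list(df.itertuples(index=False))
selection = [0] * len(rows)

for i in range(len(rows) - 1):
    cur, nxt = rows[i], rows[i + 1]
    # Require the same user so a pair never spans two users (e.g. B3 -> C1).
    if (cur.user == nxt.user
            and cur.status == "not completed"
            and nxt.status == "completed"):
        selection[i] = 1       # the "not completed" row
        selection[i + 1] = 1   # the "completed" row right after it

df["selection"] = selection
print(df)
```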

【Question comments】:

    Tags: python pandas dataframe loops iteration


    【Solution 1】:

    A bit verbose, but you can use groupby with transform and np.select:

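    import numpy as np

    # df is the DataFrame from the question
    def func(d):
        # d is one user's "status" column, so shift() never reaches into another user
        s = np.select([(d.eq("not completed") & d.shift(-1).eq("completed")),  # next row is "completed"
                       (d.eq("completed") & d.shift().eq("not completed"))],   # previous row is "not completed"
                      [1, 1], 0)  # 1 if either condition holds, otherwise 0
        return s
    
    df["new"] = df.groupby("user")["status"].transform(func)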
    
    print (df)
    
       user  count         status  new
    0     A      1      completed    0
    1     A      2  not completed    0
    2     B      1  not completed    1
    3     B      2      completed    1
    4     B      3  not completed    0
    5     C      1      completed    0
    6     C      2  not completed    1
    7     C      3      completed    1
    8     D      1  not completed    1
    9     D      2      completed    1
    10    D      3  not completed    1
    11    D      4      completed    1
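The two conditions passed to np.select can also be built without a per-group function. GroupBy.shift shifts within each user, so the first/last row of a user gets NaN instead of a neighbouring user's status, and comparing NaN with a string gives False. A minimal sketch, rebuilding the question's df so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "user":   ["A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "D"],
    "count":  [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
    "status": ["completed", "not completed",
               "not completed", "completed", "not completed",
               "completed", "not completed", "completed",
               "not completed", "completed", "not completed", "completed"],
})

# Previous and next status within the same user (NaN at user boundaries).
prev_status = df.groupby("user")["status"].shift(1)
next_status = df.groupby("user")["status"].shift(-1)

starts_pair = df["status"].eq("not completed") & next_status.eq("completed")
ends_pair = df["status"].eq("completed") & prev_status.eq("not completed")

df["new"] = (starts_pair | ends_pair).astype(int)
print(df)
```

A plain df["status"].shift() would be wrong here: row C1 ("completed") would see B3 ("not completed") as its predecessor and get flagged.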
    

    【Discussion】:

      【Solution 2】:

      You can use .groupby().apply() on the status column.

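      On each row, check whether the status meets either condition:

      current row "not completed" comes before "completed"

      tested with g.eq('not completed') & g.shift(-1).eq('completed') (.shift(-1) fetches the next row's value)

      or: current row "completed" comes after "not completed"

      tested with g.eq('completed') & g.shift(1).eq('not completed') (.shift(1) fetches the previous row's value)

      Combining both conditions with | gives: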

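      # group_keys=False keeps the original row index so the result aligns with df
      # (recent pandas versions otherwise prepend the user key to the index).
      df['selection'] = (df.groupby('user', group_keys=False)['status']
                          .apply(lambda g:
                                     # & binds tighter than |; parentheses make that explicit
                                     (g.eq('not completed') & g.shift(-1).eq('completed')) |
                                     (g.eq('completed') & g.shift(1).eq('not completed'))
                                ).astype(int)
                        )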
      

      Result:

      print(df)
      
         user  count         status  selection
      0     A      1      completed          0
      1     A      2  not completed          0
      2     B      1  not completed          1
      3     B      2      completed          1
      4     B      3  not completed          0
      5     C      1      completed          0
      6     C      2  not completed          1
      7     C      3      completed          1
      8     D      1  not completed          1
      9     D      2      completed          1
      10    D      3  not completed          1
      11    D      4      completed          1
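To see what g.shift(-1) and g.shift(1) actually compare against, here is a small sketch on user B's statuses alone (taken from the question), applying the same combined mask as the lambda above:

```python
import pandas as pd

# User B's statuses from the question, in count order.
g = pd.Series(["not completed", "completed", "not completed"])

# shift(-1) pulls the next row's value up; the last row has no next row, so it gets NaN.
print(g.shift(-1).tolist())
# shift(1) pushes values down; the first row has no previous row, so it gets NaN.
print(g.shift(1).tolist())

mask = ((g.eq("not completed") & g.shift(-1).eq("completed")) |
        (g.eq("completed") & g.shift(1).eq("not completed")))
print(mask.astype(int).tolist())  # [1, 1, 0]
```

Because apply hands each user's rows to the lambda separately, these NaN boundaries fall at the start and end of every user, which is what keeps pairs from spanning two users.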
      

      【Discussion】:
