【Question title】: How to find the maximum count of consecutive zeros in a pandas column?
【Posted】: 2020-12-23 12:04:47
【Question】:

I have a DataFrame and want to find the maximum count of consecutive zero values in column B.

Example input and expected output:

df = pd.DataFrame({'B':[1,3,4,0,0,11,1,15,0,0,0,87]})

df_out = pd.DataFrame({'max_count':[3]})

How can this be done?

【Comments】:

    Tags: python-3.x pandas numpy pandas-groupby


    【Solution 1】:

    A NumPy approach:

    a = df['B'].values
    m1 = np.r_[False, a==0, False]
    idx = np.flatnonzero(m1[:-1] != m1[1:])
    out = (idx[1::2]-idx[::2]).max()
    

    Step-by-step run:

    # Input data as array
    In [83]: a
    Out[83]: array([ 1,  3,  4,  0,  0, 11,  1, 15,  0,  0,  0, 87])
    
    # Zero mask, padded with False at both ends so every island of 0s has a start and an end
    In [193]: m1
    Out[193]: 
    array([False, False, False, False,  True,  True, False, False, False,
            True,  True,  True, False, False])
    
    # Indices of those starts and ends
    In [85]: idx
    Out[85]: array([ 3,  5,  8, 11])
    
    # Finally the differencing between starts and ends and max for final o/p
    In [86]: out
    Out[86]: 3
    

    This can be condensed into a one-liner:

    np.diff(np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2),axis=1).max()
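
The snippet above calls `.max()` on the run lengths, which raises an error when the column contains no zeros at all. A minimal sketch of a reusable wrapper follows; the function name `max_zero_run` and the choice to return 0 when there are no zeros are assumptions, not part of the original answer.

```python
import numpy as np

def max_zero_run(values):
    """Length of the longest run of consecutive zeros; 0 if there are none."""
    a = np.asarray(values)
    # Pad with False on both sides so runs touching either edge get a start and an end.
    mask = np.r_[False, a == 0, False]
    # Positions where the mask flips: even entries are run starts, odd entries are run ends.
    edges = np.flatnonzero(mask[:-1] != mask[1:])
    if edges.size == 0:
        return 0  # assumed convention for "no zeros found"
    return int((edges[1::2] - edges[::2]).max())

print(max_zero_run([1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]))  # 3
```

With a DataFrame, it could be called as `max_zero_run(df['B'].values)`.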
    

    【Comments】:

    • What is the reason for using np.r_? Isn't m1 = a==0 enough?
    • @Ehsan It accounts for any 0s at the start.
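
To illustrate the point about `np.r_` from the comments, here is a small sketch comparing the flip indices with and without padding. The input array and the helper name `flips` are made up for the illustration; the array is chosen so that runs of zeros touch both edges.

```python
import numpy as np

a = np.array([0, 0, 0, 5, 0])  # hypothetical input: zero runs at both ends

def flips(mask):
    # Indices where consecutive mask entries differ
    return np.flatnonzero(mask[:-1] != mask[1:])

unpadded = flips(a == 0)
padded = flips(np.r_[False, a == 0, False])

print(unpadded)  # [2 3]     -> the start of the first run and the end of the last run are missing
print(padded)    # [0 3 4 5] -> clean (start, end) pairs
print((padded[1::2] - padded[::2]).max())  # 3
```

Without the padding, pairing the indices as (start, end) would report a longest run of 1 here instead of 3.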
    【Solution 2】:

    You can assign a group id to each run of consecutive equal values:

    # create group for consecutive numbers
    df['grp'] = (df['B'] != df['B'].shift()).cumsum()
    
         B  grp
    0    1    1
    1    3    2
    2    4    3
    3    0    4
    4    0    4
    5   11    5
    6    1    6
    7   15    7
    8    0    8
    9    0    8
    10   0    8
    11  87    9
    
    
    # check size of groups having 0 value
    max_count = df.query("B == 0").groupby('grp').size().max()
    
    print(max_count)
    3
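
A possible variation on the same grouping idea that avoids adding a `grp` column to `df`. This is a sketch: the names `is_zero`, `run_id` and `run_lengths` are illustrative, and returning 0 for an empty column is an assumed convention.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})

is_zero = df['B'].eq(0)
# Start a new run id every time the zero / non-zero state changes
run_id = is_zero.ne(is_zero.shift()).cumsum()
# Summing the boolean mask per run gives the run length for zero runs and 0 for other runs
run_lengths = is_zero.groupby(run_id).sum()
max_count = int(run_lengths.max()) if len(run_lengths) else 0

print(max_count)  # 3
```

Grouping on the boolean mask rather than on the raw values means that neighbouring non-zero values share one run, so fewer groups are created.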
    

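    【Comments】:

      【Solution 3】:

      The idea is to take the cumulative sum of a non-zero mask, which acts as a counter that stays constant across each run of zeros, keep only the positions where B is 0, count them with Series.value_counts, and take the maximum: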

      s = df['B'].ne(0)
      
      a = s.cumsum()[~s].value_counts().max()
      print (a)
      3
      
      df_out=pd.DataFrame({'max_count':[a]})
      

      Details:

      print (s.cumsum())
      0     1
      1     2
      2     3
      3     3
      4     3
      5     4
      6     5
      7     6
      8     6
      9     6
      10    6
      11    7
      Name: B, dtype: int32
      
      print (s.cumsum()[~s])
      3     3
      4     3
      8     6
      9     6
      10    6
      Name: B, dtype: int32
      
      print (s.cumsum()[~s].value_counts())
      6    3
      3    2
      Name: B, dtype: int64
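
The cumulative-sum steps shown above can be wrapped into a small helper. This is a sketch under stated assumptions: the name `max_zero_run_cumsum` is hypothetical, and the function returns 0 when the column has no zeros, a case in which `value_counts().max()` on the empty result would otherwise give NaN.

```python
import pandas as pd

def max_zero_run_cumsum(series):
    nonzero = series.ne(0)
    # The cumulative sum does not change across a run of zeros, so it labels each run
    run_sizes = nonzero.cumsum()[~nonzero].value_counts()
    return int(run_sizes.max()) if not run_sizes.empty else 0

df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
df_out = pd.DataFrame({'max_count': [max_zero_run_cumsum(df['B'])]})
print(df_out)
```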
      

      【Comments】:

        【Solution 4】:

        Maybe you can adapt this to Python. In Java, you can find the length of the longest run of consecutive 0s with the following code:

        int B[] = {1,3,4,0,0,11,1,15,0,0,0,87};
        
        int max_zeroes = 0;
        int zeroes = 0;
        for(int i = 0; i < B.length; i++) {
            if( B[i] == 0) {
                zeroes += 1;
                if(zeroes > max_zeroes) {
                    max_zeroes = zeroes;
                }
            } else {
                zeroes = 0;
            }
        }
        

        If you also want the start and end indices of the longest run of consecutive 0s in the array, you can use the following logic:

        int max_zeroes = 0;
        int zeroes = 0;
        int endIndex = -1;
        for (int i = 0; i < B.length; i++) {
            if (B[i] == 0) {
                zeroes += 1;
                if (zeroes > max_zeroes) {
                    max_zeroes = zeroes;
                    endIndex = i;
                }
            } else {
                zeroes = 0;
            }
        }
        
        // Walk backwards from the end of the longest run to find where it starts
        int startIndex = endIndex;
        for (int i = endIndex - 1; i > -1; i--) {
            if (B[i] == 0) {
                startIndex = i;
            } else {
                break; // stop at the first non-zero value
            }
        }
        
        System.out.println("Max zeroes is: " + max_zeroes + " at start index " + startIndex + " and end index: " + endIndex);
        

        The same logic should carry over to Python.
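
One possible Python adaptation of the Java loops above, as a sketch. The function name `longest_zero_run` and the `(0, -1, -1)` return value for input without zeros are assumptions. The start index is derived from the end index and the run length instead of a second backward loop.

```python
def longest_zero_run(values):
    """Return (length, start, end) of the first longest run of zeros; (0, -1, -1) if none."""
    best_len, best_end = 0, -1
    current = 0
    for i, v in enumerate(values):
        if v == 0:
            current += 1
            if current > best_len:
                best_len, best_end = current, i
        else:
            current = 0
    # A run of best_len zeros ending at best_end starts best_len - 1 positions earlier
    best_start = best_end - best_len + 1 if best_len else -1
    return best_len, best_start, best_end

B = [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]
print(longest_zero_run(B))  # (3, 8, 10)
```

For a DataFrame column, `longest_zero_run(df['B'].tolist())` would apply the same loop, though the vectorized NumPy and pandas answers are likely faster on large columns.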

        【Comments】:
