【Question title】: How to fill elements between intervals of a list
【Posted】: 2020-08-01 04:51:19
【Question】:

I have a list like this:

list_1 = [np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, 1, np.nan, 0, np.nan,  1, np.nan]

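So there are intervals that start with 1 and end with 0. How can I replace the values inside these intervals, say with 1? The result would look like this: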

list_2 = [np.nan, np.nan, 1, 1, 1, 1, 0, np.nan, 1, 1, 0, 1, 1, 0, np.nan, 1, np.nan]

I used NaN in this example, but a general solution that works for any value would also be welcome.

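【Comments on the question】:

  • Is a 1 always followed by a 0? Or can other 1s appear between consecutive 1-0 pairs? In other words, does every 1 have a 0 after it?

Tags: python pandas numpy dataframe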


【Solution 1】:

A pandas solution:

import numpy as np
import pandas as pd

s = pd.Series(list_1)
s1 = s.eq(1)
s0 = s.eq(0)
m = (s1 | s0).where(s1.cumsum().ge(1),False).cumsum().mod(2).eq(1)
s.loc[m & s.isna()] = 1
print(s.tolist())
#[nan, nan, 1.0, 1.0, 1.0, 1.0, 0.0, nan, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, nan, 1.0, 1.0]
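
The one-line mask can be easier to follow when it is split into named steps. The sketch below only illustrates the same idea; the intermediate names `is_one`, `markers` and `inside` are not from the answer, and it assumes that 1s and 0s strictly alternate once the first 1 has appeared. Like the snippet above, it also fills a trailing interval that is never closed by a 0.

```python
import numpy as np
import pandas as pd

# np.nan (lowercase) also works on NumPy 2.x, where the np.NaN alias was removed.
list_1 = [np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
          1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan]
s = pd.Series(list_1)

is_one = s.eq(1)
is_zero = s.eq(0)

# A marker is any 1 or 0, ignoring 0s that come before the first 1.
markers = (is_one | is_zero) & is_one.cumsum().ge(1)

# An odd running count of markers means the position follows an unmatched 1.
inside = markers.cumsum().mod(2).eq(1)

# Only overwrite missing values; the markers themselves are kept.
filled = s.mask(inside & s.isna(), 1.0)
print(filled.tolist())
```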

But if the list contains only 1, 0 and NaN, you can do this:

s = pd.Series(list_1)
s.fillna(s.ffill().where(lambda x: x.eq(1))).tolist()

Output:

[nan,
 nan,
 1.0,
 1.0,
 1.0,
 1.0,
 0.0,
 nan,
 1.0,
 1.0,
 0.0,
 1.0,
 1.0,
 0.0,
 nan,
 1.0,
 1.0]
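
To see why the forward-fill shortcut works, it may help to look at the intermediate Series. This is a sketch with made-up variable names (`last_marker`, `fill_values`), assuming the data holds nothing but 1, 0 and NaN: forward-filling tells each position which marker was seen most recently, and only positions whose most recent marker is a 1 receive a fill value.

```python
import numpy as np
import pandas as pd

list_1 = [np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
          1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan]
s = pd.Series(list_1)

# Most recent non-missing value at or before each position.
last_marker = s.ffill()

# Keep that value only where it is a 1; everything else becomes NaN.
fill_values = last_marker.where(last_marker.eq(1))

# Missing entries of s take the value at the same index in fill_values.
result = s.fillna(fill_values)
print(result.tolist())
```

Because nothing checks for a closing 0, the NaN after the final 1 is filled as well.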

【Comments】:

    【Solution 2】:

    Here is a NumPy-based approach using np.cumsum:

    import numpy as np

    a = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan, 
                  1, np.nan, 0, 1, np.nan, 0, np.nan,  1, np.nan])
    
    ix0 = (a == 0).cumsum()
    ix1 = (a == 1).cumsum()
    dec = (ix1 - ix0).astype(float)
    # Only necessary if the seq can end with an unclosed interval
    ix = len(a)-(a[::-1]==1).argmax()
    last = ix1[-1]-ix0[-1]
    if last > 0:
        dec[ix:] = a[ix:]
    # -----
    out = np.where(dec==1, dec, a)
    

    print(out)
    array([nan, nan,  1.,  1.,  1.,  1.,  0., nan,  1.,  1.,  0.,  1.,  1.,
            0., nan,  1., nan])
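
The difference of the two cumulative sums acts as a running count of intervals that have been opened but not yet closed. The short trace below is an illustration only (the names `opened`, `closed` and `depth` are not from the answer) and shows why the tail needs special handling: the count stays at 1 after the final 1 because no 0 follows it.

```python
import numpy as np

a = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
              1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan])

opened = (a == 1).cumsum()   # number of 1s seen so far
closed = (a == 0).cumsum()   # number of 0s seen so far
depth = opened - closed      # 1 while inside an interval, 0 outside
print(depth.tolist())
# -> [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]

# The last two entries are 1 although no 0 closes that interval.
last_one = np.flatnonzero(a == 1)[-1]
unclosed = depth[-1] > 0
print(last_one, unclosed)
```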
    

    【Comments】:

      【Solution 3】:

      Here is a NumPy-based one -

      def fill_inbetween(a):
          m1 = a==1
          m2 = a==0
          id_ar = m1.astype(int)-m2
          idc = id_ar.cumsum()
          idc[len(m1)-m1[::-1].argmax():] =  0
          return np.where(idc.astype(bool), 1, a)
      

      Sample run -

      In [44]: a # input as array
      Out[44]: 
      array([nan, nan,  1., nan, nan, nan,  0., nan,  1., nan,  0.,  1., nan,
              0., nan,  1., nan])
      
      In [45]: fill_inbetween(a)
      Out[45]: 
      array([nan, nan,  1.,  1.,  1.,  1.,  0., nan,  1.,  1.,  0.,  1.,  1.,
              0., nan,  1., nan])
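
The question also asks for something that works with arbitrary values. The following is a sketch of one way to do that, not the answer's method: instead of a cumulative-sum difference it looks up the most recent marker with `np.maximum.accumulate`. The function name `fill_between_markers` and the parameters `start`, `stop` and `fill` are hypothetical. It is written so that a trailing interval without a closing marker is left untouched and a final closed interval is still filled; as far as I can tell, `fill_inbetween` above zeroes everything after the last 1 unconditionally, which suits the sample because it ends with an unclosed 1.

```python
import numpy as np

def fill_between_markers(a, start=1, stop=0, fill=1):
    """Hypothetical helper: fill non-marker entries that lie after a `start`
    marker and before a later `stop` marker."""
    a = np.asarray(a, dtype=float)
    is_start = a == start
    is_stop = a == stop
    is_marker = is_start | is_stop

    # Index of the most recent marker at or before each position (-1 if none).
    idx = np.arange(a.size)
    last_marker = np.maximum.accumulate(np.where(is_marker, idx, -1))
    after_start = (last_marker >= 0) & is_start[last_marker]

    # True where at least one stop marker occurs at or after the position.
    stop_ahead = is_stop[::-1].cumsum()[::-1] > 0

    inside = after_start & stop_ahead & ~is_marker
    return np.where(inside, fill, a)

a = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
              1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan])
print(fill_between_markers(a, fill=-999))
```

The input is cast to float here because the example uses NaN; other dtypes would need a different conversion.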
      

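      Benchmarking the NumPy solutions with array input

      For simplicity, we scale the given sample up 10,000x by tiling it and test the NumPy-based solutions on the result.

      Other NumPy solutions -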

      #@yatu's soln
      def func_yatu(a):
          ix0 = (a == 0).cumsum()
          ix1 = (a == 1).cumsum()
          dec = (ix1 - ix0).astype(float)
          ix = len(a)-(a[::-1]==1).argmax()
          last = ix1[-1]-ix0[-1]
          if last > 0:
              dec[ix:] = a[ix:]
          out = np.where(dec==1, dec, a)
          return out
      
      # @FBruzzesi's soln (with the output returned in a separate array)
      def func_FBruzzesi(a, value=1):
          ones = np.squeeze(np.argwhere(a==1))
          zeros = np.squeeze(np.argwhere(a==0))   
          if ones[0]>zeros[0]:
              zeros = zeros[1:]   
          out = a.copy()
          for i,j in zip(ones,zeros):
              out[i+1:j] = value
          return out
      
      # @Ehsan's soln (with the output returned in a separate array)
      def func_Ehsan(list_1):
          zeros_ind = np.where(list_1 == 0)[0]
          ones_ind = np.where(list_1 == 1)[0]
          ones_ind = ones_ind[:zeros_ind.size]        
          indexer = np.r_[tuple([np.s_[i:j] for (i,j) in zip(ones_ind,zeros_ind)])]
          out = list_1.copy()
          out[indexer] = 1
          return out
      

      Timings -

      In [48]: list_1 = [np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, 1, np.nan, 0, np.nan,  1, np.nan]
          ...: a = np.array(list_1)
      
      In [49]: a = np.tile(a,10000)
      
      In [50]: %timeit func_Ehsan(a)
          ...: %timeit func_FBruzzesi(a)
          ...: %timeit func_yatu(a)
          ...: %timeit fill_inbetween(a)
      4.86 s ± 325 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      253 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      3.39 ms ± 205 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      2.01 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      

      The copy takes a negligible share of the runtime, so it can be ignored -

      In [51]: %timeit a.copy()
      78.3 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
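
The `%timeit` magic only exists in IPython. Outside of it, a rough equivalent can be put together with the standard-library `timeit` module. This is a minimal sketch: `n_runs` is an arbitrary choice, only `fill_inbetween` is timed, and the numbers will differ from those above depending on the machine.

```python
import timeit
import numpy as np

def fill_inbetween(a):
    # Copied from the answer above.
    m1 = a == 1
    m2 = a == 0
    id_ar = m1.astype(int) - m2
    idc = id_ar.cumsum()
    idc[len(m1) - m1[::-1].argmax():] = 0
    return np.where(idc.astype(bool), 1, a)

base = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
                 1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan])
big = np.tile(base, 10000)   # 170,000 elements

n_runs = 20                  # arbitrary repeat count for this sketch
total = timeit.timeit(lambda: fill_inbetween(big), number=n_runs)
print(f"fill_inbetween: {total / n_runs * 1e3:.2f} ms per call")
```

One caveat when reading such timings: tiling puts the unclosed trailing 1 of each copy directly before the first 1 of the next copy, so the running counts drift and the functions need not return identical arrays on the tiled input. The comparison is about speed only.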
      

      【Comments】:

        【Solution 4】:

        Assuming every 1 is followed by a 0 (except the last 1):

        list_1 = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, 1, np.nan, 0, np.nan,  1, np.nan])
        zeros_ind = np.where(list_1 == 0)[0]
        ones_ind = np.where(list_1 == 1)[0]
        ones_ind = ones_ind[:zeros_ind.size]
        
        #create a concatenated list of ranges of indices you desire to slice
        indexer = np.r_[tuple([np.s_[i:j] for (i,j) in zip(ones_ind,zeros_ind)])]
        #slice using numpy indexing
        list_1[indexer] = 1
        

        Output:

        [nan nan  1.  1.  1.  1.  0. nan  1.  1.  0.  1.  1.  0. nan  1. nan]
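
The key step is how `np.r_` turns a tuple of slices into one flat index array. The small demonstration below hard-codes the index positions from the sample (2, 8, 11 for the 1s and 6, 10, 13 for the 0s) purely for illustration.

```python
import numpy as np

ones_ind = np.array([2, 8, 11])    # positions of the 1s that have a closing 0
zeros_ind = np.array([6, 10, 13])  # positions of the matching 0s

# Each slice i:j expands to i, i+1, ..., j-1; np.r_ concatenates the results.
indexer = np.r_[tuple(np.s_[i:j] for i, j in zip(ones_ind, zeros_ind))]
print(indexer.tolist())
# -> [2, 3, 4, 5, 8, 9, 11, 12]
```

Note that each range includes the position of the 1 itself. That is harmless when the fill value is 1, but with any other fill value the slices would presumably need to start at `i + 1` to keep the opening marker.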
        

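        【Comments】:

          【Solution 5】:

          Here is some code in which the variable replace decides whether an element should be replaced. The for loop iterates over the indices from 0 to len. When it finds a 1, replace becomes True and the following elements are replaced. When it finds the next 0, replace becomes False and elements are left alone until another 1 appears.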

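          replace = False
          result = list(interval)  # work on a copy; `interval` is the input list
          for i in range(len(interval)):
              if interval[i] == 1:
                  replace = True    # an interval opens
              elif interval[i] == 0:
                  replace = False   # the interval closes
              if replace:
                  result[i] = 1     # fill while inside an interval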
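
A flag-based loop like this fills everything after a 1 until the next 0, including a trailing run that is never closed. If the expected result from the question is wanted (the final NaN stays NaN), one option is to remember the candidate positions and only write them once the closing marker shows up. The function below is a sketch under that assumption; its name and the `start`, `stop` and `fill` parameters are made up for illustration.

```python
import math

def fill_closed_intervals(values, start=1, stop=0, fill=1):
    """Hypothetical helper: fill only intervals that are closed by `stop`."""
    result = list(values)
    pending = []      # non-marker positions seen since the interval opened
    inside = False
    for i, v in enumerate(values):
        if v == start:
            inside = True
        elif v == stop:
            if inside:
                for k in pending:   # the interval is closed, so commit the fill
                    result[k] = fill
            inside = False
            pending = []
        elif inside:
            pending.append(i)
    return result

nan = float("nan")
sample = [nan, nan, 1, nan, nan, nan, 0, nan, 1, nan, 0, 1, nan, 0, nan, 1, nan]
print(fill_closed_intervals(sample))
```

Since it uses plain comparisons and a Python list, it works for non-numeric markers too, at the cost of being much slower than the vectorized answers on large inputs.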
          

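          【Comments】:

          • Please don't post only code as an answer; also explain what your code does and how it solves the problem in the question. Answers with an explanation are usually of higher quality and more likely to attract upvotes.
          【Solution 6】:

          You can use np.argwhere to retrieve the indices of the 1s and 0s, then fill the value into each slice: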

          import numpy as np
          
          a = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, 1, np.nan, 0, np.nan,  1, np.nan])
          
          ones = np.squeeze(np.argwhere(a==1))
          zeros = np.squeeze(np.argwhere(a==0))
          
          if ones[0]>zeros[0]:
              zeros = zeros[1:]
          
          value = -999
          for i,j in zip(ones,zeros):
              a[i+1:j] = value
          
          a
          array([  nan,   nan,    1., -999., -999., -999.,    0.,   nan,    1.,
                 -999.,    0.,    1., -999.,    0.,   nan,    1.,   nan])
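
One edge case to be aware of: `np.squeeze(np.argwhere(...))` collapses to a 0-d array when there is exactly one match, so `ones[0]` would then raise an `IndexError`. A variant using `np.flatnonzero`, which always returns a 1-D array, avoids that. This is a sketch rather than the answer's code; the name `fill_slices` is hypothetical, and it still assumes that 1s and 0s alternate after the first 1.

```python
import numpy as np

def fill_slices(a, value=1):
    """Hypothetical variant of the slice-filling approach."""
    a = np.asarray(a, dtype=float)
    ones = np.flatnonzero(a == 1)
    zeros = np.flatnonzero(a == 0)
    if ones.size:
        # Drop every 0 that comes before the first 1, not just one of them.
        zeros = zeros[zeros > ones[0]]
    out = a.copy()
    # zip stops at the shorter sequence, so an unclosed trailing 1 is skipped.
    for i, j in zip(ones, zeros):
        out[i + 1:j] = value
    return out

a = np.array([np.nan, np.nan, 1, np.nan, np.nan, np.nan, 0, np.nan,
              1, np.nan, 0, 1, np.nan, 0, np.nan, 1, np.nan])
print(fill_slices(a, value=-999))
```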
          

          【Comments】:
