Question title: Calculate the cumulative count of NaN values in a specific column
Posted: 2021-07-27 18:05:20
Question:

I have a DataFrame:

import numpy as np
import pandas as pd

# create example df
df = pd.DataFrame(index=[1,2,3,4,5,6,7])
df['ID'] = [1,1,1,1,2,2,2]
# freq="M" means month end; pandas 2.2+ prefers the alias "ME"
df['election_date'] = pd.date_range("01/01/2010", periods=7, freq="M")
df['stock_price'] = [1,np.nan,np.nan,4,5,np.nan,7]

# sort newest first, then renumber the rows 0..6
df.sort_values(['election_date'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True)
df

   ID   election_date   stock_price
0   2   2010-07-31       7.0
1   2   2010-06-30       NaN
2   2   2010-05-31       5.0
3   1   2010-04-30       4.0
4   1   2010-03-31       NaN
5   1   2010-02-28       NaN
6   1   2010-01-31       1.0

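For each ID, I want a cumulative count of the np.nan values in the stock_price column, counted from the oldest row upward: each row should show how many NaNs occur in earlier-dated rows of the same ID, not counting the row itself.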

The expected result is:

df

   ID   election_date   stock_price  cum_count_nans
0   2   2010-07-31       7.0            1
1   2   2010-06-30       NaN            0
2   2   2010-05-31       5.0            0   
3   1   2010-04-30       4.0            2  
4   1   2010-03-31       NaN            1
5   1   2010-02-28       NaN            0
6   1   2010-01-31       1.0            0
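To pin down the rule the expected output implies, here is a deliberately naive reference sketch (not from the original post). It assumes "earlier" means a strictly earlier election_date within the same ID; the helper name `naive_cum_count_nans` is hypothetical. It is O(n²) and is only meant for checking a faster answer on small data.

```python
import numpy as np
import pandas as pd

# The example frame from the question, already sorted newest first.
df = pd.DataFrame({
    "ID": [2, 2, 2, 1, 1, 1, 1],
    "election_date": pd.to_datetime([
        "2010-07-31", "2010-06-30", "2010-05-31",
        "2010-04-30", "2010-03-31", "2010-02-28", "2010-01-31",
    ]),
    "stock_price": [7.0, np.nan, 5.0, 4.0, np.nan, np.nan, 1.0],
})

def naive_cum_count_nans(frame: pd.DataFrame) -> list[int]:
    """Hypothetical helper: for each row, count NaN stock prices in rows of the
    same ID with a strictly earlier election_date (the row itself is excluded)."""
    counts = []
    for _, row in frame.iterrows():
        earlier = frame[(frame["ID"] == row["ID"]) &
                        (frame["election_date"] < row["election_date"])]
        counts.append(int(earlier["stock_price"].isna().sum()))
    return counts

df["cum_count_nans"] = naive_cum_count_nans(df)
print(df)
```

The printed `cum_count_nans` column should match the expected result above.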

Is there a way to do this?

Question discussion:

    Tags: pandas numpy count nan


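    Solution 1:

    The idea is to reverse the row order with iloc[::-1], so each ID is processed oldest row first. Then, in a custom function, test for missing values, shift the flags down by one row so that a row's own NaN is not counted, and take the cumulative sum: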

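    # per group (oldest first): flag NaNs, shift so a row's own NaN is excluded, running sum
    f = lambda x: x.isna().shift(fill_value=False).cumsum()
    # iloc[::-1] reverses the rows; transform aligns the result back to df's index
    df['cum_count_nans'] = df.iloc[::-1].groupby('ID')['stock_price'].transform(f)
    print(df)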
       ID election_date  stock_price cum_count_nans
    0   2    2010-07-31          7.0              1
    1   2    2010-06-30          NaN              0
    2   2    2010-05-31          5.0              0
    3   1    2010-04-30          4.0              2
    4   1    2010-03-31          NaN              1
    5   1    2010-02-28          NaN              0
    6   1    2010-01-31          1.0              0
    
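To see what the lambda does inside one group, the sketch below replays its three steps by hand on ID 1's values (index labels 3 to 6 as in the sorted frame). The variable names are illustrative, not from the answer. It uses `fill_value=False` so the shifted flags stay boolean; with `fill_value=0` the counts come out the same.

```python
import numpy as np
import pandas as pd

# ID 1's stock prices from the question, newest first (as the frame is sorted).
prices = pd.Series([4.0, np.nan, np.nan, 1.0], index=[3, 4, 5, 6])

oldest_first = prices.iloc[::-1]                # index 6, 5, 4, 3
is_nan = oldest_first.isna()                    # False, True, True, False
prev_is_nan = is_nan.shift(fill_value=False)    # False, False, True, True
running = prev_is_nan.cumsum()                  # 0, 0, 1, 2

steps = pd.DataFrame({
    "stock_price": oldest_first,
    "isna": is_nan,
    "shifted": prev_is_nan,
    "cumsum": running,
})
print(steps)

# Aligning back by index label restores the newest-first order: [2, 1, 0, 0]
print(running.reindex(prices.index).tolist())
```

The shift is what makes a NaN row show the count of NaNs before it rather than including itself, which is why rows 4 and 5 get 1 and 0 instead of 2 and 1.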

    Discussion:
