【Question title】: Drop columns if all of their values in a specific date range are NaNs using Pandas
【Posted】: 2021-11-29 10:43:12
【Question】:

Given the following data sample:

          date  value1    value2    value3
0   2021-10-12   1.015  1.115668  1.015000
1   2021-10-13     NaN  1.104622  1.030225
2   2021-10-14     NaN  1.093685       NaN
3   2021-10-15   1.015  1.082857       NaN
4   2021-10-16   1.015  1.072135  1.077284
5   2021-10-29   1.015  1.061520  1.093443
6   2021-10-30   1.015  1.051010  1.109845
7   2021-10-31   1.015       NaN  1.126493
8    2021-11-1   1.015       NaN       NaN
9    2021-11-2   1.015  1.020100       NaN
10   2021-11-3     NaN  1.010000       NaN
11  2021-11-30   1.015  1.000000       NaN

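Suppose I want to drop every column whose values are all NaN in November 2021, i.e. in the range 2021-11-01 to 2021-11-30 (start and end dates inclusive).

Under this requirement, value3 will be dropped because all of its values in 2021-11 are NaN. The other columns have some NaNs in 2021-11, but not all, so they will be kept.

How can I achieve this in Pandas? Thanks.

Edit: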

df['date'] = pd.to_datetime(df['date'])  
mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
df.loc[mask]

Output:

         date  value1  value2  value3
8  2021-11-01   1.015     NaN     NaN
9  2021-11-02   1.015  1.0201     NaN
10 2021-11-03     NaN  1.0100     NaN
11 2021-11-30   1.015  1.0000     NaN

【Comments】:

    Tags: python-3.x pandas dataframe


    【Solution 1】:

    You can filter the rows for November 2021 and then test, column by column, whether all of the filtered values are NaN:

    df['date'] = pd.to_datetime(df['date'])
    
    # keep the columns that are NOT entirely NaN in November 2021 ('M' is the monthly period alias)
    df = df.loc[:, ~df[df['date'].dt.to_period('M') == pd.Period('2021-11')].isna().all()]
    

    Or:

    df['date'] = pd.to_datetime(df['date'])
    
    # keep the columns that have at least one non-NaN value in November 2021
    df = df.loc[:, df[df['date'].dt.to_period('M') == pd.Period('2021-11')].notna().any()]
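
The two variants above select the same columns, because "not all values are NaN" is the same statement as "at least one value is not NaN". A minimal sketch to check this, assuming a small hand-built frame (`nov`, `keep_a` and `keep_b` are illustrative names) that mirrors the November rows from the question:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the November 2021 rows from the question.
nov = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-01", "2021-11-02", "2021-11-03", "2021-11-30"]),
    "value1": [1.015, 1.015, np.nan, 1.015],
    "value2": [np.nan, 1.0201, 1.01, 1.0],
    "value3": [np.nan, np.nan, np.nan, np.nan],
})

keep_a = ~nov.isna().all()   # NOT (every value in the column is NaN)
keep_b = nov.notna().any()   # at least one value in the column is not NaN

print(keep_a.equals(keep_b))  # both masks agree column by column
print(keep_b)                 # only value3 is False
```

Either mask can be passed to `df.loc[:, mask]`; which one to use is mostly a matter of readability.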
    

    Edit: the same check using the date-range mask from the question:

    mask = (df['date'] >= '2021-11-01') & (df['date'] <= '2021-11-30')
    df = df.loc[:, df.loc[mask].notna().any()]
    

    Output:

             date  value1    value2
    0  2021-10-12   1.015  1.115668
    1  2021-10-13     NaN  1.104622
    2  2021-10-14     NaN  1.093685
    3  2021-10-15   1.015  1.082857
    4  2021-10-16   1.015  1.072135
    5  2021-10-29   1.015  1.061520
    6  2021-10-30   1.015  1.051010
    7  2021-10-31   1.015       NaN
    8  2021-11-01   1.015       NaN
    9  2021-11-02   1.015  1.020100
    10 2021-11-03     NaN  1.010000
    11 2021-11-30   1.015  1.000000
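
If the same check is needed for several date ranges, it could be wrapped in a small helper. This is only a sketch: the function name `drop_all_nan_in_range` and its parameters are made up for illustration, and it uses `Series.between`, which includes both endpoints by default, instead of the two explicit comparisons.

```python
import numpy as np
import pandas as pd

def drop_all_nan_in_range(df, start, end, date_col="date"):
    """Return df without the columns that are entirely NaN between start and end (inclusive)."""
    # Series.between is inclusive on both ends by default.
    in_range = df[date_col].between(pd.Timestamp(start), pd.Timestamp(end))
    # True for columns with at least one non-NaN value inside the range.
    has_data = df.loc[in_range].notna().any()
    return df.loc[:, has_data]

# Shortened version of the sample data (illustrative values taken from the question).
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-30", "2021-10-31", "2021-11-01", "2021-11-30"]),
    "value1": [1.015, 1.015, 1.015, 1.015],
    "value2": [1.05101, np.nan, np.nan, 1.0],
    "value3": [1.109845, 1.126493, np.nan, np.nan],
})

result = drop_all_nan_in_range(df, "2021-11-01", "2021-11-30")
print(list(result.columns))
```

Note that if no row falls inside the range, `notna().any()` is False for every column, so this sketch would drop them all, including `date`; add a guard if that case matters.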
    

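    Edit: if you need to manually exclude some columns from the check (for example, to always keep value4):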

    import numpy as np

    # add an all-NaN column that should be kept anyway
    df = df.assign(value4=np.nan)
    print (df)
              date  value1    value2    value3  value4
    0   2021-10-12   1.015  1.115668  1.015000     NaN
    1   2021-10-13     NaN  1.104622  1.030225     NaN
    2   2021-10-14     NaN  1.093685       NaN     NaN
    3   2021-10-15   1.015  1.082857       NaN     NaN
    4   2021-10-16   1.015  1.072135  1.077284     NaN
    5   2021-10-29   1.015  1.061520  1.093443     NaN
    6   2021-10-30   1.015  1.051010  1.109845     NaN
    7   2021-10-31   1.015       NaN  1.126493     NaN
    8    2021-11-1   1.015       NaN       NaN     NaN
    9    2021-11-2   1.015  1.020100       NaN     NaN
    10   2021-11-3     NaN  1.010000       NaN     NaN
    11  2021-11-30   1.015  1.000000       NaN     NaN
    

    df['date'] = pd.to_datetime(df['date'])
    
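    # True for the columns that are entirely NaN in November 2021
    m = df[df['date'].dt.to_period('M') == pd.Period('2021-11')].isna().all()
    # never drop value4, even though it is all NaN in that month
    m.loc['value4'] = False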
    
    print (m)
    date      False
    value1    False
    value2    False
    value3     True
    value4    False
    dtype: bool
    
    df = df.loc[:, ~m]
    print (df)
             date  value1    value2  value4
    0  2021-10-12   1.015  1.115668     NaN
    1  2021-10-13     NaN  1.104622     NaN
    2  2021-10-14     NaN  1.093685     NaN
    3  2021-10-15   1.015  1.082857     NaN
    4  2021-10-16   1.015  1.072135     NaN
    5  2021-10-29   1.015  1.061520     NaN
    6  2021-10-30   1.015  1.051010     NaN
    7  2021-10-31   1.015       NaN     NaN
    8  2021-11-01   1.015       NaN     NaN
    9  2021-11-02   1.015  1.020100     NaN
    10 2021-11-03     NaN  1.010000     NaN
    11 2021-11-30   1.015  1.000000     NaN
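
The steps above (build the all-NaN mask, then switch it off for the columns you want to protect) could also be collected into one function. A rough sketch, assuming a hypothetical helper `drop_all_nan_in_month` with a `protected` parameter; neither name comes from pandas:

```python
import numpy as np
import pandas as pd

def drop_all_nan_in_month(df, month, protected=(), date_col="date"):
    """Drop columns that are all NaN in the given month, except the protected ones."""
    in_month = df[date_col].dt.to_period("M") == pd.Period(month, freq="M")
    # True for columns that are entirely NaN within the month.
    all_nan = df.loc[in_month].isna().all()
    # Protected columns are never dropped, whatever their content.
    to_drop = all_nan & ~all_nan.index.isin(list(protected))
    return df.loc[:, ~to_drop]

# Shortened sample with an extra all-NaN column, as in the edit above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-31", "2021-11-01", "2021-11-02", "2021-11-30"]),
    "value1": [1.015, 1.015, 1.015, 1.015],
    "value2": [np.nan, np.nan, 1.0201, 1.0],
    "value3": [1.126493, np.nan, np.nan, np.nan],
    "value4": [np.nan, np.nan, np.nan, np.nan],
})

result = drop_all_nan_in_month(df, "2021-11", protected=["value4"])
print(list(result.columns))
```

Using `index.isin` rather than assigning `False` by label means a protected name that is not a column of the frame is simply ignored instead of raising an error.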
    

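    【Comments】:

    • Thanks. I added a snippet to the question that uses loc to select 2021-11; is it useful?
    • I tested all three of your approaches and they all work perfectly. :)
    • Sorry, one more question: if I have a column value4 that I want to keep even when it meets the drop condition, how can I do that?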