【Question title】: Combine DataFrame rows to eliminate NaNs
【Posted】: 2021-12-22 02:22:25
【Question description】:

I have a DataFrame created from a data logger, where every data point has its own timestamp, like this:

import numpy as np
import pandas as pd

# Each reading arrives on its own row, so every row holds one value and two NaNs.
df_orig = pd.DataFrame(
    {
        "val1": [1, np.nan, np.nan, 11, np.nan, np.nan, 21, np.nan, np.nan],
        "val2": [np.nan, 2, np.nan, np.nan, 12, np.nan, np.nan, 22, np.nan],
        "val3": [np.nan, np.nan, 3, np.nan, np.nan, 13, np.nan, np.nan, 23],
    },
    index=pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:00:01", "2021-01-01 00:00:02",
        "2021-01-01 00:01", "2021-01-01 00:01:01", "2021-01-01 00:01:02",
        "2021-01-01 00:02", "2021-01-01 00:02:01", "2021-01-01 00:02:02",
    ]),
)
print(df_orig)
                     val1  val2  val3
2021-01-01 00:00:00   1.0   NaN   NaN
2021-01-01 00:00:01   NaN   2.0   NaN
2021-01-01 00:00:02   NaN   NaN   3.0
2021-01-01 00:01:00  11.0   NaN   NaN
2021-01-01 00:01:01   NaN  12.0   NaN
2021-01-01 00:01:02   NaN   NaN  13.0
2021-01-01 00:02:00  21.0   NaN   NaN
2021-01-01 00:02:01   NaN  22.0   NaN
2021-01-01 00:02:02   NaN   NaN  23.0

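I don't actually need the exact timestamp of each individual data point. I want to condense the DataFrame by eliminating the NaNs and merging rows that are very close together in time. The result should look like this: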

                     val1  val2  val3
2021-01-01 00:00:00     1     2     3
2021-01-01 00:01:00    11    12    13
2021-01-01 00:02:00    21    22    23

Is there a way to do this?

【Question discussion】:

    Tags: python pandas dataframe datetime nan


    【Solution 1】:

    If possible, simplify the solution by resampling per minute and aggregating with max, min, or first:

    df = df_orig.resample('Min').max()
    print(df)
                         val1  val2  val3
    2021-01-01 00:00:00   1.0   2.0   3.0
    2021-01-01 00:01:00  11.0  12.0  13.0
    2021-01-01 00:02:00  21.0  22.0  23.0
    

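    【Discussion】:

    • Thanks. My real DataFrame doesn't have such a regular frequency between data points, so resampling it produces a great many rows full of NaNs. I solved that with this line: data = data_orig.resample("100ms").first().dropna(how="all").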
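
The line from the comment can be tried on a small made-up dataset. This is only a sketch: the timestamps and values below are hypothetical (bursts of readings a few milliseconds apart, separated by an irregular gap), and `binned` is an illustrative intermediate name.

```python
import numpy as np
import pandas as pd

# Hypothetical logger data: two bursts of three readings, seconds apart.
data_orig = pd.DataFrame(
    {
        "val1": [1, np.nan, np.nan, 11, np.nan, np.nan],
        "val2": [np.nan, 2, np.nan, np.nan, 12, np.nan],
        "val3": [np.nan, np.nan, 3, np.nan, np.nan, 13],
    },
    index=pd.to_datetime([
        "2021-01-01 00:00:00.010", "2021-01-01 00:00:00.020", "2021-01-01 00:00:00.030",
        "2021-01-01 00:00:07.410", "2021-01-01 00:00:07.420", "2021-01-01 00:00:07.430",
    ]),
)

# Resampling creates one row per 100 ms bin, including every empty bin
# in the gap between the two bursts.
binned = data_orig.resample("100ms").first()

# Dropping rows where all columns are NaN removes the empty bins again.
data = binned.dropna(how="all")

print(len(binned), len(data))
print(data)
```

One caveat with this approach: the bin edges are fixed, so a burst of readings that happens to straddle an edge is split across two rows, each still containing NaNs.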