【Question title】: Split a time series DataFrame into one with a column per year, each named after that year
【Posted】: 2021-04-11 18:41:47
【Question】:

I have the following hourly time series:

       date_time                     system_load
0     2013-01-01 00:00:00.000000     599.2
1     2013-01-01 00:59:59.999999     759.2
2     2013-01-01 02:00:00.000001     954.5
3     2013-01-01 03:00:00.000000     190.9
4     2013-01-01 03:59:59.999999     465.2
...                          ...     ...
70123 2020-12-31 18:59:59.999999     355.9
70124 2020-12-31 20:00:00.000001     752.1
70125 2020-12-31 21:00:00.000000     928.5
70126 2020-12-31 21:59:59.999999     299.2
70127 2020-12-31 23:00:00.000001     478.5

What I want is a new DataFrame that looks like this:

       Year2013   Year2014   Year2015   Year2016   Year2017   Year2018   Year2019   Year2020
0      599.2      ...        ...        ...        ...        ...        ...        355.9
1      759.2      ...        ...        ...        ...        ...        ...        752.1
2      954.5      ...        ...        ...        ...        ...        ...        928.5
3      190.9      ...        ...        ...        ...        ...        ...        299.2
4      465.2      ...        ...        ...        ...        ...        ...        478.5
...    ...        ...        ...        ...        ...        ...        ...        ...
8760   ...        ...        ...        ...        ...        ...        ...        ...
8761   NaN        NaN        NaN        ...        NaN        NaN        NaN        ...
...    NaN        NaN        NaN        ...        NaN        NaN        NaN        ...
8784   NaN        NaN        NaN        ...        NaN        NaN        NaN        ...

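Leap years need to be taken into account: they have more rows than the other years, so the shorter years should be padded with NaN. Any help getting this result would be appreciated.
Thanks in advance.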

【Question comments】:

    Tags: python pandas time-series


    【Solution 1】:

    I assume you have this DataFrame:

                         date_time  system_load
    0   2013-01-01 00:00:00.000000        599.2
    1   2013-01-01 00:59:59.999999        759.2
    2   2013-01-01 02:00:00.000001        954.5
    3   2013-01-01 03:00:00.000000        190.9
    4   2013-01-01 03:59:59.999999        465.2
    5   2020-12-31 18:59:59.999999        355.9
    6   2020-12-31 20:00:00.000001        752.1
    7   2020-12-31 21:00:00.000000        928.5
    8   2020-12-31 21:59:59.999999        299.2
    9   2020-12-31 23:00:00.000001        478.5
    10  2020-12-31 23:00:01.000001        400.0
    

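    Then:

    import pandas as pd
    
    # Parse the timestamps and extract the calendar year of each row.
    df["date_time"] = pd.to_datetime(df["date_time"])
    df["year"] = df["date_time"].dt.year
    # Number the rows within each year (0, 1, 2, ...); this becomes the new row index.
    df["index"] = df.groupby("year").cumcount()
    
    # One column per year; years with fewer rows are padded with NaN at the end.
    print(
        df.pivot(columns="year", index="index", values="system_load").add_prefix(
            "Year"
        )
    )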
    

    Prints:

    year   Year2013  Year2020
    index                    
    0         599.2     355.9
    1         759.2     752.1
    2         954.5     928.5
    3         190.9     299.2
    4         465.2     478.5
    5           NaN     400.0
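
To check the leap-year behaviour the question asks about, here is a minimal sketch that applies the same group-wise row numbering and pivot to synthetic data. The helper name `split_by_year`, the `row` column, and the generated 2019-2020 hourly series are assumptions made for illustration only; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd


def split_by_year(df, time_col="date_time", value_col="system_load"):
    """Return a wide frame with one 'Year<YYYY>' column per calendar year.

    Hypothetical helper: rows are numbered within each year after sorting
    by time, so shorter (non-leap) years end up padded with NaN.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values(time_col)
    out["year"] = out[time_col].dt.year
    # Position of each row inside its own year: 0, 1, 2, ...
    out["row"] = out.groupby("year").cumcount()
    wide = out.pivot(index="row", columns="year", values=value_col)
    return wide.add_prefix("Year")


# Synthetic hourly data (an assumption): 2019 is a normal year, 2020 a leap year.
times = pd.date_range("2019-01-01", "2020-12-31 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"date_time": times, "system_load": np.arange(len(times), dtype=float)}
)

wide = split_by_year(df)
print(wide.shape)                 # (8784, 2)
print(wide.notna().sum())         # Year2019: 8760, Year2020: 8784
print(wide.tail(3))               # last rows: NaN for 2019, values for 2020
```

Because the row number is simply each observation's position within its year, this only lines up hour for hour when every year has a complete, gap-free hourly record. With missing hours, later rows would shift upward, and from March onward the same row number refers to a calendar date one day earlier in a leap year than in a normal year.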
    

    【Comments】:
