【Question title】: Aligning time series data sets with missing values for plotting
【Posted】: 2020-04-13 23:17:17
【Question】:

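I have three data sets with missing values. Each data set consists of a time column and a data column. The minimum time difference between two rows is 1 second (00:00:01):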

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81                          00:00:00    70
00:00:01    81                      
00:00:02    81                      
00:00:03    81                          00:00:03    99
00:00:04    81                          00:00:04    100
00:00:05    80      00:00:05    80      00:00:05    101
00:00:06    80      00:00:06    100         
                    00:00:07    92      00:00:07    88
00:00:08    83      00:00:08    80      00:00:08    88
00:00:09    84      00:00:09    83      00:00:09    87
00:00:10    86                      
00:00:11    89                      
00:00:12    90                      
00:00:13    92                          00:00:13    92
00:00:14    94                          00:00:14    94
00:00:15    94      00:00:15    96      00:00:15    93
00:00:16    96      00:00:16    97          
00:00:17    98      00:00:17    100     00:00:17    99
00:00:18    100                         00:00:18    99
00:00:19    101                         00:00:19    101
00:00:20    103                     

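For clarity, the table above shows missing values as blank fields. The real data is dense, with no blank rows, and looks like this, for example: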

Dataset 1:          Dataset 2:          Dataset 3:  
00:00:00    81      00:00:05    80      00:00:00    70
00:00:01    81      00:00:06    100     00:00:03    99
00:00:02    81      00:00:07    92      00:00:04    100
00:00:03    81      00:00:08    80      00:00:05    101
00:00:04    81      00:00:09    83      00:00:07    88
00:00:05    80      00:00:15    96      00:00:08    88
00:00:06    80      00:00:16    97      00:00:09    87
00:00:08    83      00:00:17    100     00:00:13    92
00:00:09    84                          00:00:14    94
00:00:10    86                          00:00:15    93
00:00:11    89                          00:00:17    99
00:00:12    90                          00:00:18    99
00:00:13    92                          00:00:19    101
00:00:14    94                      
00:00:15    94                      
00:00:16    96                      
00:00:17    98                      
00:00:18    100                     
00:00:19    101                     
00:00:20    103                     

Now I want to align the data so that it can be plotted like this:

[image not preserved]

My naive approach would be:

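  1. Find the minimum and maximum time in each data set.
  2. Create a table with one row per time step and three columns, with every cell initialized to n/a.
  3. Loop over each data set and assign its values to the table.
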
Is there a Python function or library that performs these steps efficiently? Or is there a better approach?

Regards,

【Question comments】:

    Tags: python pandas numpy matplotlib


    【Solution 1】:

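    You can concat all the DataFrames, using the time column of each one as its index:

    import pandas as pd

    # df1, df2, df3 each have a 'time' column and a 'val' column
    dfs = [df1, df2, df3]
    # align the three 'val' Series on their time index; keys become the column names
    df = pd.concat([x.set_index('time')['val'] for x in dfs],
                    axis=1,
                    keys=['a','b','c'],
                    sort=True)
    print(df)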
                  a      b      c
    00:00:00   81.0    NaN   70.0
    00:00:01   81.0    NaN    NaN
    00:00:02   81.0    NaN    NaN
    00:00:03   81.0    NaN   99.0
    00:00:04   81.0    NaN  100.0
    00:00:05   80.0   80.0  101.0
    00:00:06   80.0  100.0    NaN
    00:00:07    NaN   92.0   88.0
    00:00:08   83.0   80.0   88.0
    00:00:09   84.0   83.0   87.0
    00:00:10   86.0    NaN    NaN
    00:00:11   89.0    NaN    NaN
    00:00:12   90.0    NaN    NaN
    00:00:13   92.0    NaN   92.0
    00:00:14   94.0    NaN   94.0
    00:00:15   94.0   96.0   93.0
    00:00:16   96.0   97.0    NaN
    00:00:17   98.0  100.0   99.0
    00:00:18  100.0    NaN   99.0
    00:00:19  101.0    NaN  101.0
    00:00:20  103.0    NaN    NaN
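
For readers who want to try the concat step without the full data, here is a minimal self-contained sketch. The three small DataFrames are hypothetical, shortened stand-ins for the data sets in the question; only the `time` and `val` column names are taken from the answer.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the three data sets.
df1 = pd.DataFrame({"time": ["00:00:00", "00:00:01", "00:00:02"], "val": [81, 81, 81]})
df2 = pd.DataFrame({"time": ["00:00:02", "00:00:04"], "val": [80, 100]})
df3 = pd.DataFrame({"time": ["00:00:00", "00:00:04"], "val": [70, 99]})

# Each Series is indexed by time; concat along axis=1 takes the union of the indexes
# and fills the cells a Series does not cover with NaN.
aligned = pd.concat(
    [d.set_index("time")["val"] for d in (df1, df2, df3)],
    axis=1,
    keys=["a", "b", "c"],
    sort=True,
)
print(aligned)
```

Note that 00:00:03 does not appear in any of the three inputs here, so it is also absent from the result. The union of the indexes only contains timestamps that occur at least once.
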
    

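    If some timestamps are missing from every DataFrame, add DataFrame.asfreq to insert those rows. It requires a DatetimeIndex:

    # asfreq needs a DatetimeIndex, so parse the time values first
    df.index = pd.to_datetime(df.index)
    # one row per second; newer pandas versions prefer the lowercase alias 's'
    df = df.asfreq('S')
    # keep only the time of day in the index again
    df.index = df.index.time
    print(df)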
                  a      b      c
    00:00:00   81.0    NaN   70.0
    00:00:01   81.0    NaN    NaN
    00:00:02   81.0    NaN    NaN
    00:00:03   81.0    NaN   99.0
    00:00:04   81.0    NaN  100.0
    00:00:05   80.0   80.0  101.0
    00:00:06   80.0  100.0    NaN
    00:00:07    NaN   92.0   88.0
    00:00:08   83.0   80.0   88.0
    00:00:09   84.0   83.0   87.0
    00:00:10   86.0    NaN    NaN
    00:00:11   89.0    NaN    NaN
    00:00:12   90.0    NaN    NaN
    00:00:13   92.0    NaN   92.0
    00:00:14   94.0    NaN   94.0
    00:00:15   94.0   96.0   93.0
    00:00:16   96.0   97.0    NaN
    00:00:17   98.0  100.0   99.0
    00:00:18  100.0    NaN   99.0
    00:00:19  101.0    NaN  101.0
    00:00:20  103.0    NaN    NaN
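
In the output above nothing changes, because every second already occurs in at least one data set. A small sketch with a real gap shows what asfreq adds. The sample table is hypothetical, and the explicit `format` argument and the `pd.offsets.Second(1)` offset are choices made here to keep the example deterministic, not part of the original answer.

```python
import pandas as pd

nan = float("nan")

# Hypothetical aligned table: 00:00:01 and 00:00:03 are absent from every column.
aligned = pd.DataFrame(
    {"a": [81.0, 81.0, 80.0], "b": [nan, 80.0, 100.0]},
    index=["00:00:00", "00:00:02", "00:00:04"],
)

# asfreq needs a DatetimeIndex; with this format the date part defaults to 1900-01-01.
aligned.index = pd.to_datetime(aligned.index, format="%H:%M:%S")

# Conform to a regular one-second grid; rows that did not exist are filled with NaN.
full = aligned.asfreq(pd.offsets.Second(1))

# Drop the placeholder date and keep only the time of day.
full.index = full.index.time
print(full)
```
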
    

    Finally, plot the result with DataFrame.plot:

    df.plot()
    

    To draw each column in its own subplot instead:

    df.plot(subplots=True)
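
Both plotting calls can be tried on a small table like the one below. This is a sketch under assumptions: the data is hypothetical, the Agg backend is selected only so that the code runs without a display, and the `marker` argument is an addition that keeps isolated points visible when their neighbours are NaN.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display required
import pandas as pd

nan = float("nan")

# Hypothetical aligned table with gaps in every column.
aligned = pd.DataFrame(
    {
        "a": [81.0, 81.0, nan, 83.0],
        "b": [nan, 80.0, 92.0, 80.0],
        "c": [70.0, nan, 88.0, 88.0],
    },
    index=["00:00:00", "00:00:01", "00:00:02", "00:00:03"],
)

# All columns on one set of axes; lines break wherever a value is NaN.
ax = aligned.plot(marker="o")
ax.set_xlabel("time")

# One subplot per column, sharing the x axis.
axes = aligned.plot(subplots=True, sharex=True, marker="o")
```

In an interactive session, `matplotlib.pyplot.show()` displays the figures, and `ax.figure.savefig(...)` writes one to a file.
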
    

    【Comments】:

    • Thanks! Works perfectly. I only had to add some interpolation: df = df.interpolate(method='linear')
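
The interpolation mentioned in the comment can be illustrated on a single column. The data below is hypothetical, and the `limit_area="inside"` variant is an optional extra that the commenter did not mention.

```python
import pandas as pd

nan = float("nan")

# Hypothetical column with a leading gap, an interior gap and a trailing gap.
df = pd.DataFrame({"b": [nan, 80.0, nan, nan, 92.0, nan]})

# Linear interpolation fills the interior gap. By default the trailing NaN is
# filled with the last valid value, while the leading NaN is left as NaN.
filled = df.interpolate(method="linear")
print(filled["b"].tolist())  # [nan, 80.0, 84.0, 88.0, 92.0, 92.0]

# Restrict filling to gaps that are surrounded by valid values on both sides.
inside_only = df.interpolate(method="linear", limit_area="inside")
print(inside_only["b"].tolist())  # [nan, 80.0, 84.0, 88.0, 92.0, nan]
```

Whether to interpolate at all depends on the plot's purpose, since interpolated lines hide where data was actually missing.
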