【Question title】: Python pandas time series interpolation of datetime data
【Posted】: 2018-02-02 08:21:55
【Question】:

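From this question I know how to interpolate a time series at given timestamps. I would like to know how to interpolate the timestamps themselves from given values, for example to estimate the NaT values in the following example.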

interval            datetime             
0.782296 2012-11-19 12:40:10
0.795469                 NaT
0.821426 2012-11-19 12:35:10
0.834957                 NaT
0.864383 2012-11-19 12:30:10
0.906240 2012-11-19 12:25:10

P.S. I tried calling df['datetime'].interpolate() directly, but it failed.

【Question comments】:

    Tags: python pandas


    【Answer 1】:

    This seems to work. The code could probably be cleaned up a bit, but you get the gist.

    import calendar
    from datetime import datetime
    import pandas as pd
    
    # Create data
    df = pd.DataFrame({ 'interval' : [0.782296, 0.795469, 0.821426, 0.834957, 
                                      0.864383, 0.906240],
                        'datetime' : [datetime(2012, 11, 19, 12, 40, 10), pd.NaT,
                                      datetime(2012, 11, 19, 12, 35, 10), pd.NaT,
                                      datetime(2012, 11, 19, 12, 30, 10),
                                      datetime(2012, 11, 19, 12, 25, 10)
                                      ]})
    
    
    # Cast each datetime to seconds since the epoch (NaT becomes NaN).
    # calendar.timegm treats the naive datetime as UTC, matching
    # datetime.utcfromtimestamp below; time.mktime would use local time
    # and shift every result by the machine's UTC offset.
    df['seconds'] = [calendar.timegm(t.timetuple()) if pd.notna(t) else float('nan')
                     for t in df['datetime']]
    
    # Set the interval as the index, as interpolation uses the index
    df.set_index('interval', inplace=True)
    # method='values' uses the actual values of the index, not the row spacing
    df['interpolated'] = df['seconds'].interpolate(method='values')
    # Cast the interpolated seconds back to datetime
    df['datetime2'] = [datetime.utcfromtimestamp(t) for t in df['interpolated']]
    
    # Clean up
    df.reset_index(inplace=True)
    df = df[['interval', 'datetime2']]
    
    print(df)
    # Output:
    #    interval                  datetime2
    # 0  0.782296 2012-11-19 12:40:10.000000
    # 1  0.795469 2012-11-19 12:38:29.005878
    # 2  0.821426 2012-11-19 12:35:10.000000
    # 3  0.834957 2012-11-19 12:33:35.503178
    # 4  0.864383 2012-11-19 12:30:10.000000
    # 5  0.906240 2012-11-19 12:25:10.000000
    

    Hope this is what you wanted.
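For reference, `interpolate('values')` fills each missing row by ordinary linear interpolation of the time against the `interval` value, between the nearest known rows on either side. Worked by hand for row 1 (neighbours 0.782296 at 12:40:10 and 0.821426 at 12:35:10, i.e. 300 s apart), it gives roughly 12:38:29.006:

```latex
t(x) = t_0 + \frac{x - x_0}{x_1 - x_0}\,(t_1 - t_0)

\begin{aligned}
t(0.795469) &= \text{12:40:10} + \frac{0.795469 - 0.782296}{0.821426 - 0.782296}\,(-300\ \text{s}) \\
            &\approx \text{12:40:10} - 0.336647 \times 300\ \text{s} \\
            &\approx \text{12:40:10} - 100.994\ \text{s} \approx \text{12:38:29.006}
\end{aligned}
```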

    【Comments】:

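    • Thanks for the answer; I was also thinking about converting the datetimes to floats.
    • No problem. I edited the answer because the first version was not quite right: I had left out the 'values' argument of the interpolate call.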
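A minimal sketch of the same idea written with pandas operations only, assuming the same six-row example. Subtracting a fixed epoch and dividing by `pd.Timedelta('1s')` turns the column into float seconds (NaT becomes NaN) without going through the `time` module, so the local time zone never enters the calculation; `epoch`, `seconds` and `filled` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'interval': [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    'datetime': pd.to_datetime(['2012-11-19 12:40:10', None,
                                '2012-11-19 12:35:10', None,
                                '2012-11-19 12:30:10', '2012-11-19 12:25:10']),
})

epoch = pd.Timestamp('1970-01-01')
# Float seconds since the epoch; NaT turns into NaN
seconds = (df['datetime'] - epoch) / pd.Timedelta('1s')
# Interpolate against the numeric 'interval' values, not the row positions
seconds.index = df['interval']
filled = seconds.interpolate(method='values')
# Back to timestamps (assigned positionally); round away float noise below 1 µs
df['datetime'] = (epoch + pd.to_timedelta(filled.to_numpy(), unit='s')).round('us')
print(df)
```

The final `.round('us')` only trims sub-microsecond noise left over from the float round trip.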
    【Answer 2】:

    This is very easy with RedBlackPy. It works with Python 3 on macOS and Linux.

    import redblackpy as rb
    from datetime import datetime
    
    # No need to keep the NaT rows:
    # you can interpolate without adding those keys to the container
    index = [0.782296, 0.821426, 0.864383, 0.906240]
    values = [datetime(2012, 11, 19, 12, 40, 10),
              datetime(2012, 11, 19, 12, 35, 10),
              datetime(2012, 11, 19, 12, 30, 10),
              datetime(2012, 11, 19, 12, 25, 10) ]
    
    # init Series with specific interpolation type (floor, ceil, nn, linear)
    data = rb.Series(index=index, values=values, dtype='object',  
                     interpolation='linear')
    

    Now you can access any key, with interpolation!

    # your index, where you wanted to interpolate
    int_index = [0.795469, 0.834957]
    # access a key that is not in the series
    print( data[int_index[0]] ) # this prints 2012-11-19 12:38:29.005878
    
    # change the interpolation type
    data.set_interpolation('floor')
    
    print( data[int_index[0]] ) # this prints 2012-11-19 12:40:10
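For readers without RedBlackPy, here is a rough pure-Python sketch of what a 'linear' and a 'floor' lookup compute for a key that is not stored, assuming the keys are kept sorted. It is not RedBlackPy's implementation; `lookup` and its `mode` argument are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from datetime import datetime

keys = [0.782296, 0.821426, 0.864383, 0.906240]  # must be sorted ascending
vals = [datetime(2012, 11, 19, 12, 40, 10), datetime(2012, 11, 19, 12, 35, 10),
        datetime(2012, 11, 19, 12, 30, 10), datetime(2012, 11, 19, 12, 25, 10)]


def lookup(x, mode='linear'):
    """Hypothetical helper: value at key x, interpolated between stored keys."""
    i = bisect_right(keys, x)  # number of stored keys <= x
    if i > 0 and keys[i - 1] == x:
        return vals[i - 1]  # exact hit, nothing to interpolate
    if i == 0 or i == len(keys):
        raise KeyError(f'{x} is outside the stored key range')
    if mode == 'floor':
        return vals[i - 1]  # value of the largest stored key below x
    x0, x1, v0, v1 = keys[i - 1], keys[i], vals[i - 1], vals[i]
    # datetime - datetime is a timedelta, and a timedelta can be scaled by a float
    return v0 + (v1 - v0) * ((x - x0) / (x1 - x0))


print(lookup(0.795469))
print(lookup(0.795469, 'floor'))
```

Scaling the `timedelta` rounds to whole microseconds, so the result stays a regular `datetime`.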
    

    If you want to add the interpolated values to the Series, just use insert or setitem, as follows:

    # this adds the interpolated values to data
    for el in int_index:
        data[el] = data[el]
    
    print(data)
    

    Since the most recently set interpolation type is 'floor', print(data) outputs:

    Series object Untitled
    0.782296: 2012-11-19 12:40:10
    0.795469: 2012-11-19 12:40:10
    0.821426: 2012-11-19 12:35:10
    0.834957: 2012-11-19 12:35:10
    0.864383: 2012-11-19 12:30:10
    0.90624: 2012-11-19 12:25:10
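In plain pandas, a comparable way to add the interpolated keys is to reindex onto the union of the stored and requested `interval` values, then fill the new rows. A minimal sketch, assuming only the four known rows are stored (as in the RedBlackPy example above); `known`, `wanted`, `linear` and `floor` are illustrative names, and `ffill` is a swapped-in pandas stand-in for a 'floor' lookup, not RedBlackPy's own code.

```python
import pandas as pd

# Only the rows with a known timestamp are stored, keyed by 'interval'
known = pd.Series(pd.to_datetime(['2012-11-19 12:40:10', '2012-11-19 12:35:10',
                                  '2012-11-19 12:30:10', '2012-11-19 12:25:10']),
                  index=[0.782296, 0.821426, 0.864383, 0.906240])
wanted = [0.795469, 0.834957]           # keys to add
full_index = known.index.union(wanted)  # sorted union of stored and new keys

# 'linear': interpolate float seconds since the epoch against the key values
epoch = pd.Timestamp('1970-01-01')
seconds = ((known - epoch) / pd.Timedelta('1s')).reindex(full_index)
filled = seconds.interpolate(method='values')
linear = pd.Series(epoch + pd.to_timedelta(filled.to_numpy(), unit='s'),
                   index=full_index).dt.round('us')

# 'floor': carry the last known value forward along the sorted keys
floor = known.reindex(full_index).ffill()

print(linear)
print(floor)
```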
    


      【Answer 3】:

      A shorter variant of @montysporty's answer:

      >>> import calendar
      >>> from datetime import datetime
      >>> import numpy as np
      >>> df.set_index('interval', inplace=True)
      >>> (df['datetime']
      ...     .map(lambda x: calendar.timegm(x.timetuple()) if not pd.isna(x) else np.nan)
      ...     .interpolate('values')
      ...     .map(datetime.utcfromtimestamp))
      interval
      0.782296   2012-11-19 12:40:10.000000
      0.795469   2012-11-19 12:38:29.005878
      0.821426   2012-11-19 12:35:10.000000
      0.834957   2012-11-19 12:33:35.503178
      0.864383   2012-11-19 12:30:10.000000
      0.906240   2012-11-19 12:25:10.000000
      Name: datetime, dtype: datetime64[ns]
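Another route, sketched here under the assumption that the naive timestamps are meant as UTC, is to call `numpy.interp` directly on integer microseconds since the epoch. Nothing passes through `time.mktime`, so the machine's time zone cannot shift the result; `us`, `xp` and `filled_us` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'interval': [0.782296, 0.795469, 0.821426, 0.834957, 0.864383, 0.906240],
    'datetime': pd.to_datetime(['2012-11-19 12:40:10', None,
                                '2012-11-19 12:35:10', None,
                                '2012-11-19 12:30:10', '2012-11-19 12:25:10']),
})

known = df['datetime'].notna()
# Known timestamps as integer microseconds since the epoch
us = df.loc[known, 'datetime'].to_numpy().astype('datetime64[us]').astype('int64')
# np.interp needs increasing x-coordinates; 'interval' is already sorted here
xp = df.loc[known, 'interval'].to_numpy()
filled_us = np.rint(np.interp(df['interval'].to_numpy(), xp, us)).astype('int64')
df['datetime'] = pd.to_datetime(filled_us, unit='us')
print(df)
```

Working in microseconds rather than nanoseconds keeps the values below 2**53, so float64 holds the known timestamps exactly during interpolation.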
      
