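【Question title】: Hourly time series in minutes between two timestamps using Pandas
【Posted】: 2018-07-17 00:21:35
【Question】:

I have a set of timestamps, each with a start time and an end time. I want to generate the number of minutes that fall in each hour between the two timestamps: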

import pandas as pd

start_time = pd.to_datetime('2013-03-26 21:49:08',infer_datetime_format=True)
end_time = pd.to_datetime('2013-03-27 05:21:00', infer_datetime_format=True)

pd.date_range(start_time, end_time, freq='h')

This gives:

DatetimeIndex(['2013-03-26 21:49:08', '2013-03-26 22:49:08',
               '2013-03-26 23:49:08', '2013-03-27 00:49:08',
               '2013-03-27 01:49:08', '2013-03-27 02:49:08',
               '2013-03-27 03:49:08', '2013-03-27 04:49:08'],
              dtype='datetime64[ns]', freq='H')

Example result: I want the number of minutes that fall within each clock hour between the start and end times, like this:

 2013-03-26 21:00:00  - 10m 52secs
 2013-03-26 22:00:00  - 60m
 2013-03-26 23:00:00  - 60m

 2013-03-27 05:00:00  - 21m

I have looked at pandas resample, but I am not sure how to achieve this with it. Any pointers are appreciated.

【Comments】:

    Tags: python pandas timestamp time-series


    【Solution 1】:

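    Construct two Series holding the start and the end of each clock hour. Clip them to the time span you want with clip_lower and clip_upper (clip(lower=...) and clip(upper=...) in pandas 1.0 and later, where the older methods were removed), then subtract:

    # hourly range, floored to the start of each hour
    rng = pd.date_range(start_time.floor('h'), end_time.floor('h'), freq='h')
    
    # get the left and right endpoints for each hour,
    # clipped to lie within [start_time, end_time]
    left = pd.Series(rng, index=rng).clip(lower=start_time)
    # rng + 1 relied on integer addition, which pandas 1.0 and later reject
    right = pd.Series(rng + pd.Timedelta(hours=1), index=rng).clip(upper=end_time)
    
    # construct a series of the lengths
    s = right - left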
    

    Resulting output:

    2013-03-26 21:00:00   00:10:52
    2013-03-26 22:00:00   01:00:00
    2013-03-26 23:00:00   01:00:00
    2013-03-27 00:00:00   01:00:00
    2013-03-27 01:00:00   01:00:00
    2013-03-27 02:00:00   01:00:00
    2013-03-27 03:00:00   01:00:00
    2013-03-27 04:00:00   01:00:00
    2013-03-27 05:00:00   00:21:00
    Freq: H, dtype: timedelta64[ns]
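
Since the question asks for minutes rather than Timedelta values, the result can be converted with `.dt.total_seconds()`. The following is a minimal, self-contained sketch of that extra step, assuming a pandas version in which `Series.clip` accepts the `lower` and `upper` keywords. The name `minutes` is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

start_time = pd.Timestamp('2013-03-26 21:49:08')
end_time = pd.Timestamp('2013-03-27 05:21:00')

# Hour bins covering the interval, floored to the start of each hour
rng = pd.date_range(start_time.floor('h'), end_time.floor('h'), freq='h')

# Clip the endpoints of each bin so they lie within [start_time, end_time]
left = pd.Series(rng, index=rng).clip(lower=start_time)
right = pd.Series(rng + pd.Timedelta(hours=1), index=rng).clip(upper=end_time)

# Convert each Timedelta to fractional minutes
minutes = (right - left).dt.total_seconds() / 60
print(minutes.round(2))
```

The per-hour values should add up to the total length of the interval, which makes a quick sanity check: `minutes.sum()` can be compared against `(end_time - start_time).total_seconds() / 60`.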
    

    【Discussion】:

      【Solution 2】:

      This seems to be a workable solution:

      import pandas as pd
      import datetime as dt
      
      def bounded_min(t, range_time):
          """ For a given timestamp t and considered time interval range_time,
          return the desired bounded value in minutes and seconds"""
      
          # min() takes care of the end of the time interval, 
          # max() takes care of the beginning of the interval
          s = (min(t + dt.timedelta(hours=1), range_time.max()) - 
               max(t, range_time.min())).total_seconds() 
          if s%60:
              return "%dm %dsecs" % (s/60, s%60)
          else:
              return "%dm" % (s/60)
      
      start_time = pd.to_datetime('2013-03-26 21:49:08',infer_datetime_format=True)
      end_time = pd.to_datetime('2013-03-27 05:21:00', infer_datetime_format=True)
      
      range_time = pd.date_range(start_time, end_time, freq='h')
      # Include the end of the time range using the union() trick, as described at:
      # https://stackoverflow.com/questions/37890391/how-to-include-end-date-in-pandas-date-range-method
      range_time = range_time.union([end_time])
      
      # This is essentially timestamps for beginnings of hours 
      index_time = pd.Series(range_time).apply(lambda x: dt.datetime(year=x.year, 
                                    month=x.month, 
                                    day=x.day,
                                    hour=x.hour, 
                                    minute=0, 
                                    second=0))
      
      bounded_mins = index_time.apply(lambda x: bounded_min(x, range_time))
      
      # Put timestamps and values together
      bounded_df = pd.DataFrame(bounded_mins, columns=["Bounded Mins"]).set_index(index_time)
      
      print(bounded_df)
      

      Gotta love powerful lambda expressions :). There may be a simpler way, though.

      Output:

                            Bounded Mins
      2013-03-26 21:00:00   10m 52secs
      2013-03-26 22:00:00          60m
      2013-03-26 23:00:00          60m
      2013-03-27 00:00:00          60m
      2013-03-27 01:00:00          60m
      2013-03-27 02:00:00          60m
      2013-03-27 03:00:00          60m
      2013-03-27 04:00:00          60m
      2013-03-27 05:00:00          21m
      

      【Discussion】:

      • Finally, instead of creating a DataFrame, you can set the index on the bounded_mins Series directly: bounded_mins.index = index_time
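
The suggestion in the comment above can be sketched as follows. This is an illustrative variant, not the answerer's code: it assumes `Series.dt.floor('h')` as a replacement for the lambda that rebuilds each timestamp, and the helper `fmt` is a hypothetical name introduced here.

```python
import pandas as pd

start_time = pd.Timestamp('2013-03-26 21:49:08')
end_time = pd.Timestamp('2013-03-27 05:21:00')

# Same timestamps as in the answer: hourly steps from start_time, plus end_time
range_time = pd.date_range(start_time, end_time, freq='h').union([end_time])

# Floor every timestamp to the start of its hour (replaces the lambda).
# drop_duplicates guards against end_time landing in the same hour as the last step.
index_time = pd.Series(range_time).dt.floor('h').drop_duplicates()

def fmt(seconds):
    """Format seconds as 'Xm' or 'Xm Ysecs' (hypothetical helper)."""
    m, s = divmod(int(seconds), 60)
    return f"{m}m {s}secs" if s else f"{m}m"

# Seconds of each hour bin that fall inside [start_time, end_time]
secs = [
    (min(t + pd.Timedelta(hours=1), end_time) - max(t, start_time)).total_seconds()
    for t in index_time
]

# Set the index on the Series directly, with no intermediate DataFrame
bounded_mins = pd.Series([fmt(x) for x in secs], name="Bounded Mins")
bounded_mins.index = pd.Index(index_time)
print(bounded_mins)
```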
      【Solution 3】:

      Using datetime.timedelta() inside a for loop of some kind seems to be what you are looking for.

      https://docs.python.org/2/library/datetime.html#datetime.timedelta
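
The answer gives no code, so the following is only one possible reading of it: a plain-Python loop that walks the clock hours and measures how much of each hour overlaps the interval. The function name `minutes_per_hour` is hypothetical and introduced here for illustration.

```python
from datetime import datetime, timedelta

def minutes_per_hour(start, end):
    """Yield (hour_start, overlap) for each clock hour overlapping [start, end)."""
    one_hour = timedelta(hours=1)
    # Begin at the top of the hour that contains the start time
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        # Portion of [hour, hour + 1h) that lies inside [start, end]
        overlap = min(hour + one_hour, end) - max(hour, start)
        yield hour, overlap
        hour += one_hour

start_time = datetime(2013, 3, 26, 21, 49, 8)
end_time = datetime(2013, 3, 27, 5, 21, 0)

for hour, overlap in minutes_per_hour(start_time, end_time):
    minutes, seconds = divmod(int(overlap.total_seconds()), 60)
    print(f"{hour}  {minutes}m {seconds}s")
```

This addresses the point raised in the comment below: a single `t2 - t1` gives only the total duration, whereas the loop subtracts per hour.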

      【Discussion】:

      • I tried timedelta as in t2 - t1. However, I want the number of minutes for every hour between the two timestamps.