【Question title】: Python: how to compute date ranges from a list of dates?
【Posted】: 2011-08-03 22:37:42
【Question】:

I have a list of dates, for example:

['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08']

How can I find the ranges of consecutive dates contained in this list? For the example above, the ranges should be:

[{"start_date": '2011-02-27', "end_date": '2011-03-01'},
 {"start_date": '2011-04-12', "end_date": '2011-04-13'},
 {"start_date": '2011-06-08', "end_date": '2011-06-08'}
]

Thanks.

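【Comments】:

  • I'm not even sure how you arrived at the solution in your example. Where did the '2011-02-28' date go?
  • '2011-02-28' is inside the range {"start_date": '2011-02-27', "end_date": '2011-03-01'}
  • OK, so your second code block, the list of dicts, is not the answer but just a second argument? If so, could you post the result you expect to get back?
  • The second code block is the answer. The whole idea is to take the 6 dates in the first list and represent them as ranges of consecutive dates, i.e. the 3 ranges in the second code block.

Tags: python datetime date python-datetime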


【Solution 1】:

This works, but I wasn't happy with it, so I worked on a cleaner solution and edited the answer. Done; here is a clean, working solution:

import datetime
import pprint

def parse(date):
    return datetime.date(*[int(i) for i in date.split('-')])

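def get_ranges(dates):
    # `dates` must be a sorted list of datetime.date objects.
    while dates:
        # Measure the run of consecutive days at the front of the list.
        end = 1
        try:
            while dates[end] - dates[end - 1] == datetime.timedelta(days=1):
                end += 1
        except IndexError:
            # The run extends to the end of the list.
            pass

        yield {
            'start-date': dates[0],
            'end-date': dates[end-1]
        }
        # Drop the run just reported and continue with the remaining dates.
        dates = dates[end:]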

dates = [
    '2011-02-27', '2011-02-28', '2011-03-01',
    '2011-04-12', '2011-04-13',
    '2011-06-08'
]

# Parse each date into a date object and sort the result; you can drop
# 'sorted' if the input is already in order.
dates = sorted([parse(d) for d in dates]) 

pprint.pprint(list(get_ranges(dates)))

And the corresponding output:

[{'end-date': datetime.date(2011, 3, 1),
  'start-date': datetime.date(2011, 2, 27)},
 {'end-date': datetime.date(2011, 4, 13),
  'start-date': datetime.date(2011, 4, 12)},
 {'end-date': datetime.date(2011, 6, 8),
  'start-date': datetime.date(2011, 6, 8)}]
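Each pass of `get_ranges` rebinds `dates = dates[end:]`, which copies the remaining list, so an input made of many short ranges costs quadratic time overall. A minimal sketch of the same forward scan that advances an index instead of slicing; `get_ranges_indexed` is a hypothetical name, and it assumes a sorted list of `datetime.date` objects (here parsed with `date.fromisoformat`, which needs Python 3.7+):

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)

def get_ranges_indexed(dates):
    """Yield {'start-date', 'end-date'} dicts for runs of consecutive days.

    Assumes `dates` is a sorted list of datetime.date objects.
    """
    start = 0
    n = len(dates)
    while start < n:
        end = start + 1
        # Extend the run while each date is exactly one day after the previous one.
        while end < n and dates[end] - dates[end - 1] == ONE_DAY:
            end += 1
        yield {'start-date': dates[start], 'end-date': dates[end - 1]}
        # The next run begins at the first date that broke the chain.
        start = end

dates = sorted(datetime.date.fromisoformat(s) for s in [
    '2011-02-27', '2011-02-28', '2011-03-01',
    '2011-04-12', '2011-04-13', '2011-06-08',
])
for r in get_ranges_indexed(dates):
    print(r)
```

The output matches the original generator, but each date is examined only a constant number of times.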

【Comments】:

    【Solution 2】:

    Trying to ninja GaretJax's edit ;)

    import datetime

    def date_to_number(date):
      return datetime.date(*[int(i) for i in date.split('-')]).toordinal()
    
    def number_to_date(number):
      return datetime.date.fromordinal(number).strftime('%Y-%m-%d')
    
    def day_ranges(dates):
      if not dates:
        return  # min()/max() below would fail on an empty input
      day_numbers = set(date_to_number(d) for d in dates)
      start = None
      # We loop including one element guaranteed not to be in the set, to force the
      # closing of any range that's currently open.
      for n in range(min(day_numbers), max(day_numbers) + 2):
        if start is None:
          if n in day_numbers: start = n
        else:
          if n not in day_numbers: 
            yield {
              'start_date': number_to_date(start),
              'end_date': number_to_date(n - 1)
            }
            start = None
    
    print(list(
      day_ranges([
        '2011-02-27', '2011-02-28', '2011-03-01',
        '2011-04-12', '2011-04-13', '2011-06-08'
      ])
    ))
    

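    【Comments】:

    • Are you aware that your solution performs a lot of useless iterations? In this example it does 103; mine does 4 on the same dataset... ;-)
    • Oh, by the way, it chokes on this dataset: ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10']... ;-)
    • Yes, I definitely got the algorithm wrong, especially for sparse date sets. :) It works fine for me with the new dataset, though.

    【Solution 3】: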
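A different technique that avoids stepping through every calendar day: in the sorted, de-duplicated list, `toordinal()` minus a date's position stays constant along a run of consecutive days, so `itertools.groupby` on that difference splits the runs while touching only the dates themselves. This is a sketch swapped in for comparison, not code from the thread; `day_ranges_grouped` is a hypothetical name, and `date.fromisoformat` assumes Python 3.7+:

```python
import datetime
from itertools import groupby

def day_ranges_grouped(dates):
    # Parse, de-duplicate and sort; the work depends on the number of dates,
    # not on how many calendar days lie between them.
    days = sorted({datetime.date.fromisoformat(d) for d in dates})
    # For consecutive days, ordinal - index is the same value, so each group
    # produced by groupby is exactly one contiguous run.
    for _, group in groupby(enumerate(days), key=lambda p: p[1].toordinal() - p[0]):
        run = [day for _, day in group]
        yield {
            'start_date': run[0].isoformat(),
            'end_date': run[-1].isoformat(),
        }

print(list(day_ranges_grouped([
    '2011-02-27', '2011-02-28', '2011-03-01',
    '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10',
])))
```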
    from datetime import datetime, timedelta
    
    dates = ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08']
    d = [datetime.strptime(date, '%Y-%m-%d') for date in dates]
    test = lambda x: x[1] - x[0] != timedelta(1)
    slices = [0] + [i+1 for i, x in enumerate(zip(d, d[1:])) if test(x)] + [len(dates)]
    ranges = [{"start_date": dates[s], "end_date": dates[e-1]} for s, e in zip(slices, slices[1:])]
    

    The result is:

    >>> pprint.pprint(ranges)
    [{'end_date': '2011-03-01', 'start_date': '2011-02-27'},
     {'end_date': '2011-04-13', 'start_date': '2011-04-12'},
     {'end_date': '2011-06-08', 'start_date': '2011-06-08'}]
    

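    The slices list comprehension collects every index at which the previous date is not the day before the current date. After prepending 0 and appending len(dates), each date range is the slice dates[slices[i]:slices[i+1]]: it starts at dates[slices[i]] and ends at dates[slices[i+1]-1]. This assumes the input list is already sorted.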
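To make the slicing concrete, here is the same computation on the example data with the intermediate `slices` list exposed. The names follow the answer above; `breaks` and `groups` are illustrative additions, and the input is assumed to be sorted:

```python
from datetime import datetime, timedelta

dates = ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08']
d = [datetime.strptime(date, '%Y-%m-%d') for date in dates]

# Index i+1 starts a new range whenever dates i and i+1 are not one day apart.
breaks = [i + 1 for i, (a, b) in enumerate(zip(d, d[1:])) if b - a != timedelta(1)]
slices = [0] + breaks + [len(dates)]
print(slices)  # [0, 3, 5, 6]

# Each (s, e) pair is a half-open slice dates[s:e] holding one contiguous run.
groups = [dates[s:e] for s, e in zip(slices, slices[1:])]
for g in groups:
    print(g)
```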

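    【Comments】:

      【Solution 4】:

      My slight variation on the theme (I originally built lists of starts and ends and zipped them to return tuples, but I prefer @Karl Knechtel's generator approach):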

      from datetime import date, timedelta
      
      ONE_DAY = timedelta(days=1)
      
      def find_date_windows(dates):
          # guard against getting empty list
          if not dates:
              return
      
          # convert strings to sorted list of datetime.dates
          dates = sorted(date(*map(int,d.split('-'))) for d in dates)
      
          # build list of window starts and matching ends
          lastStart = lastEnd = dates[0]
          for d in dates[1:]:
              if d-lastEnd > ONE_DAY:
                  yield {'start_date':lastStart, 'end_date':lastEnd}
                  lastStart = d
              lastEnd = d
          yield {'start_date':lastStart, 'end_date':lastEnd}
      

      Here are the test cases:

      tests = [
          ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08'],
          ['2011-06-08'],
          [],
          ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10'],
      ]
      for dates in tests:
          print(dates)
          for window in find_date_windows(dates):
              print(window)
          print()
      

      Prints:

      ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08']
      {'start_date': datetime.date(2011, 2, 27), 'end_date': datetime.date(2011, 3, 1)}
      {'start_date': datetime.date(2011, 4, 12), 'end_date': datetime.date(2011, 4, 13)}
      {'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)}
      
      ['2011-06-08']
      {'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)}
      
      []
      
      ['2011-02-27', '2011-02-28', '2011-03-01', '2011-04-12', '2011-04-13', '2011-06-08', '2011-06-10']
      {'start_date': datetime.date(2011, 2, 27), 'end_date': datetime.date(2011, 3, 1)}
      {'start_date': datetime.date(2011, 4, 12), 'end_date': datetime.date(2011, 4, 13)}
      {'start_date': datetime.date(2011, 6, 8), 'end_date': datetime.date(2011, 6, 8)}
      {'start_date': datetime.date(2011, 6, 10), 'end_date': datetime.date(2011, 6, 10)}
      
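The answer mentions an earlier version that built separate lists of window starts and ends and zipped them into tuples. That code isn't shown, so the following is only a hypothetical reconstruction of the idea (`find_date_window_tuples` is an invented name), reusing the same parsing and the same `ONE_DAY` gap test:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def find_date_window_tuples(dates):
    """Return a list of (start, end) date tuples, one per contiguous window."""
    days = sorted(date(*map(int, d.split('-'))) for d in dates)
    if not days:
        return []
    starts = [days[0]]
    ends = []
    for prev, cur in zip(days, days[1:]):
        # A gap of more than one day closes the current window and opens a new one.
        if cur - prev > ONE_DAY:
            ends.append(prev)
            starts.append(cur)
    ends.append(days[-1])
    return list(zip(starts, ends))

print(find_date_window_tuples(['2011-06-08', '2011-06-10']))
```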

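      【Comments】:

        【Solution 5】:

        Here is an alternative solution: it returns a list of (start, finish) tuples, because that's exactly what I needed ;).

        It mutates the list, so it works on a (sorted) copy, which obviously increases memory usage. Also note that list.pop(0) is O(n) in CPython, because every remaining element has to be shifted down one slot; collections.deque.popleft() is O(1).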

        import datetime

        def collapse_dates(date_list):
            if not date_list:
                return date_list
            result = []
            # We are going to alter the list, so create a (sorted) copy.
            date_list = sorted(date_list)
            while date_list:
                # Grab the first item: this is both the start and end of the range.
                start = current = date_list.pop(0)
                # While the first item in the list is the next day, pop that and
                # set it to the end of the range.
                while date_list and date_list[0] == current + datetime.timedelta(1):
                    current = date_list.pop(0)
                # That's a completed range.
                result.append((start, current))
        
            return result
        

        You can easily change the append line to append a dict instead, or yield each range rather than appending it to a list.
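For example, a generator variant that yields dicts with the question's keys instead of appending tuples might look like the sketch below. `iter_collapsed_dates` is a hypothetical name; like the original, it assumes the inputs are already `datetime.date` objects and keeps the same `pop(0)` loop:

```python
import datetime

def iter_collapsed_dates(date_list):
    # Same algorithm as collapse_dates, but yields dicts instead of appending tuples.
    # sorted() returns a new list, so the caller's list is not mutated.
    date_list = sorted(date_list)
    while date_list:
        start = current = date_list.pop(0)
        # Absorb following dates while each one is the day after `current`.
        while date_list and date_list[0] == current + datetime.timedelta(1):
            current = date_list.pop(0)
        yield {'start_date': start, 'end_date': current}
```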

        Oh, and I'm assuming they are already date objects.

        【Comments】:
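Regarding the `list.pop(0)` cost raised in this answer: a sketch of the same algorithm using `collections.deque`, whose `popleft()` is O(1). `collapse_dates_deque` is a hypothetical name; it still assumes `datetime.date` inputs and returns (start, end) tuples:

```python
import datetime
from collections import deque

def collapse_dates_deque(date_list):
    """Return (start, end) tuples; like collapse_dates but with O(1) pops."""
    # Sorted copy held in a deque, so popping from the left does not shift elements.
    pending = deque(sorted(date_list))
    result = []
    while pending:
        start = current = pending.popleft()
        while pending and pending[0] == current + datetime.timedelta(1):
            current = pending.popleft()
        result.append((start, current))
    return result
```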
