【Question title】: Split a Python list given the start indices
【Posted】: 2017-02-08 12:38:14
【Question】:

I have seen this: Split list into sublist based on index ranges

But my problem is slightly different. I have a list:

List = ['2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report']

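I need to split it into sublists based on the dates. It is basically an event log, but because of poor database design the system concatenates the separate update messages of an event into one big list of strings. I have: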

import re
date_regex_return_all = r"(\d+-\d+-\d+)"
Event_indices = [i for i, word in enumerate(List)
                 if re.match(date_regex_return_all, word)]

which in my example gives:

[0,3]

Now I need to split the list into separate lists based on those indices. For my example, ideally I would like to get:

[List[0], [List[1], List[2]]], [List[3], [List[4],  List[5], List[6]] ]

So the format is:

[event_date, [list of other text]], [event_date, [list of other text]]

There are also some edge cases where items have no date string, in the following form:

Special_case = ['blah', 'blah', 'stuff']
Special_case_2 = ['blah', 'blah', '2015-01-01', 'blah', 'blah']

result_special_case = ['', [Special_case[0], Special_case[1],Special_case[2] ]]
result_special_case_2 = [ ['', [ Special_case_2[0], Special_case_2[1] ] ], 
                          [Special_case_2[2], [ Special_case_2[3],Special_case_2[4] ] ] ]
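
As an illustration of the index-based idea described above, here is a minimal sketch that computes the date start indices and slices between them. The names `DATE_RE` and `split_at_indices` are made up for this example, and it assumes that a single undated record should still be wrapped in an outer list (unlike `result_special_case`, which is written as a bare record).

```python
import re

DATE_RE = re.compile(r"\d+-\d+-\d+")  # same date pattern as in the question


def split_at_indices(items):
    """Split items into [date, [other text]] records using date start indices."""
    starts = [i for i, word in enumerate(items) if DATE_RE.match(word)]
    records = []
    # Anything before the first date (or the whole list when there is no date)
    # becomes one undated record with '' in place of the date.
    first = starts[0] if starts else len(items)
    if first > 0:
        records.append(['', items[:first]])
    # Each date owns the items up to the next date (or the end of the list).
    for start, stop in zip(starts, starts[1:] + [len(items)]):
        records.append([items[start], items[start + 1:stop]])
    return records


print(split_at_indices(['blah', 'blah', '2015-01-01', 'blah', 'blah']))
# [['', ['blah', 'blah']], ['2015-01-01', ['blah', 'blah']]]
```

This needs the whole list in memory because it slices by position.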

【Question comments】:

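  • The format [event_date, [list of other text]] does not match the output [List[3], List[4]]; should it be [List[3], [List[4]]]? Also, for the case with no date string, what is the expected output for that input: [date, [thing1]], ["", thing2] or [date, thing1, thing2]?
  • Are all empty strings in the input list treated as the no-date-string case?
  • Fixed the example. I changed it later and forgot to update that part. `[date, [thing1]], ["", [thing2] ]` is the expected output when there is no date string. Yes, every empty string is treated as having no date string.
  • I still don't know what your definition of "no date string" is; could you be more specific?
  • Added example inputs and results for the special cases.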

Tags: python string list


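【Solution 1】:

You don't need a two-pass grouping at all, because itertools.groupby can segment the input into dates and their associated events in a single pass. Since there are no indices to compute and no list to slice, the input can be a generator that supplies one value at a time, which avoids memory problems when the input is large. To demonstrate, I have taken your original List and extended it to show that the edge cases are handled correctly: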

import re

from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed
def group_by_date(it):
    # Make iterator that groups dates with dates and non-dates with non-dates
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))

Which outputs (newlines added for readability):

[['', ['undated', 'garbage', 'then', 'twodates']],
 ['2015-12-31', []],
 ['2016-01-01', ['stuff happened', 'details']],
 ['2016-01-02', ['more stuff happened', 'details', 'report']],
 ['2016-01-03', []]]
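
To illustrate the point about lazy input, here is a minimal sketch that feeds a generator rather than a list into a condensed restatement of group_by_date. The `log_entries` generator is a hypothetical stand-in for a streamed data source, and using `next()` with a default in place of the try/except is a stylistic variation, not part of the answer's code.

```python
import re
from itertools import groupby

datere = re.compile(r"\d+-\d+-\d+")


def group_by_date(it):
    # Condensed restatement of the generator above
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # Leading undated items
            yield ['', list(g)]
            continue
        # Every date in a run except the last one has no events
        lastdate = next(g)
        for date in g:
            yield [lastdate, []]
            lastdate = date
        # The last date takes the following group of non-dates, if there is one
        _, events = next(grouped, (None, ()))
        yield [lastdate, list(events)]


def log_entries():
    # Hypothetical lazy source; stands in for rows streamed from elsewhere
    yield from ['blah', 'blah', '2015-01-01', 'blah', 'blah']


for record in group_by_date(log_entries()):
    print(record)
# ['', ['blah', 'blah']]
# ['2015-01-01', ['blah', 'blah']]
```

Each record is produced as soon as its group has been read, so only one group of events is held in memory at a time.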

【Comments】:

    【Solution 2】:

    Try:

    import re

    def split_by_date(arr, patt=r'\d+-\d+-\d+'):
        results = []
        srch = re.compile(patt)
        rec = ['', []]
        for item in arr:
            if srch.match(item):
                if rec[0] or rec[1]:
                    results.append(rec)
                rec = [item, []]
            else:
                rec[1].append(item)
        if rec[0] or rec[1]:
            results.append(rec)
        return results
    

    Then:

    normal_case = ['2016-01-01', 'stuff happened', 'details', 
                   '2016-01-02', 'more stuff happened', 'details', 'report']
    special_case_1 = ['blah', 'blah', 'stuff', '2016-11-11']
    special_case_2 = ['blah', 'blah', '2015/01/01', 'blah', 'blah']
    
    print(split_by_date(normal_case))
    print(split_by_date(special_case_1))
    print(split_by_date(special_case_2, r'\d+/\d+/\d+'))
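
The answer does not show what these calls print. The sketch below repeats split_by_date so that it runs on its own (with raw-string patterns, which is an editorial choice) and lists the results for the three inputs.

```python
import re


def split_by_date(arr, patt=r'\d+-\d+-\d+'):
    results = []
    srch = re.compile(patt)
    rec = ['', []]  # current record: [date, [other text]]
    for item in arr:
        if srch.match(item):
            # A new date closes the current record, unless it is still empty
            if rec[0] or rec[1]:
                results.append(rec)
            rec = [item, []]
        else:
            rec[1].append(item)
    if rec[0] or rec[1]:
        results.append(rec)
    return results


normal_case = ['2016-01-01', 'stuff happened', 'details',
               '2016-01-02', 'more stuff happened', 'details', 'report']
special_case_1 = ['blah', 'blah', 'stuff', '2016-11-11']
special_case_2 = ['blah', 'blah', '2015/01/01', 'blah', 'blah']

print(split_by_date(normal_case))
# [['2016-01-01', ['stuff happened', 'details']],
#  ['2016-01-02', ['more stuff happened', 'details', 'report']]]
print(split_by_date(special_case_1))
# [['', ['blah', 'blah', 'stuff']], ['2016-11-11', []]]
print(split_by_date(special_case_2, r'\d+/\d+/\d+'))
# [['', ['blah', 'blah']], ['2015/01/01', ['blah', 'blah']]]
```

The first output is wrapped across two comment lines for readability; print emits it on one line.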
    

    【Comments】:
