Split a Python list given the start indices: answers

【Question title】: split python list given the index start
【Posted】: 2017-02-08 12:38:14
【Question】:

I have looked at this: Split list into sublist based on index ranges

But my question is slightly different. I have a list

List = ['2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report']

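I need to split it into sublists based on the dates. It is basically an event log, but because of poor database design the system concatenates the separate update messages for each event into one big list of strings. I have: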

date_regex = r"\d+-\d+-\d+"  # matches strings that start like 2016-01-01
Event_indices = [i for i, word in enumerate(List)
                 if re.match(date_regex, word)]

In my example this gives:

[0,3]

Now I need to split the list into separate lists based on those indices. So for my example, ideally I would like to get:

[List[0], [List[1], List[2]]], [List[3], [List[4],  List[5], List[6]] ]

So the format is:

[event_date, [list of other text]], [event_date, [list of other text]]

There are also some edge cases where a date string is missing, which look like this:

Special_case = ['blah', 'blah', 'stuff']
Special_case_2 = ['blah', 'blah', '2015-01-01', 'blah', 'blah']

result_special_case = ['', [Special_case[0], Special_case[1],Special_case[2] ]]
result_special_case_2 = [ ['', [ Special_case_2[0], Special_case_2[1] ] ], 
                          [Special_case_2[2], [ Special_case_2[3],Special_case_2[4] ] ] ]
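
For reference, the index-based splitting described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name split_at_indices and the constant DATE_RE are made up here, the date pattern is assumed to be the one from the question, and the function always returns a list of records (so the no-date case comes back wrapped in an outer list).

```python
import re

# Assumed date pattern, taken from the question (e.g. '2016-01-01')
DATE_RE = re.compile(r"\d+-\d+-\d+")

def split_at_indices(items):
    """Split items into [date, [other text]] records using the date indices."""
    indices = [i for i, word in enumerate(items) if DATE_RE.match(word)]
    records = []
    # Anything before the first date (or the whole list if there is no date)
    # becomes a record with an empty date string.
    first = indices[0] if indices else len(items)
    if first > 0:
        records.append(['', items[:first]])
    # Each date owns the items up to the next date or the end of the list.
    for start, end in zip(indices, indices[1:] + [len(items)]):
        records.append([items[start], items[start + 1:end]])
    return records

print(split_at_indices(['2016-01-01', 'stuff happened', 'details',
                        '2016-01-02', 'more stuff happened', 'details', 'report']))
print(split_at_indices(['blah', 'blah', 'stuff']))
print(split_at_indices(['blah', 'blah', '2015-01-01', 'blah', 'blah']))
```

Pairing each index with the following one (and with len(items) for the last date) avoids special handling for the final slice.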

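【Comments】:

The format [event_date, [list of other text]] does not match the output [List[3], List[4]]; should it be [List[3], [List[4]]]? Also, for the case with no date string, what is the expected output: [date, [thing1]], ["", thing2] or [date, thing1, thing2]?
Are all empty strings in the input list treated as the no-date-string case?
Fixed the example. I changed the example later and forgot to update that part. `[date, [thing1]], ["", [thing2]]` is the expected output when there is no date string. Yes, every empty string counts as having no date string.
I still don't know what your definition of "no date string" is. Can you be specific?
Added example inputs and results for the special cases.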

Tags: python string list

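【Solution 1】:

You don't need two passes at all, because itertools.groupby can segment the input into dates and their associated events in a single pass. Since there is no need to compute indices and then slice the list with them, this also works on a generator that supplies one value at a time, which avoids memory problems when the input is large. To demonstrate, I took your original List and extended it to show that the edge cases are handled correctly: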

import re

from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed
def group_by_date(it):
    # Make iterator that groups dates with dates and non-dates with non-dates
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))

Which outputs (newlines added for readability):

[['', ['undated', 'garbage', 'then', 'twodates']],
 ['2015-12-31', []],
 ['2016-01-01', ['stuff happened', 'details']],
 ['2016-01-02', ['more stuff happened', 'details', 'report']],
 ['2016-01-03', []]]
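
To see why a single pass is enough, it may help to look at what itertools.groupby produces on its own: one (key, run) pair for each consecutive run of items with the same key. The small sketch below uses an invented sample list and a helper named is_date, neither of which comes from the answer.

```python
import re
from itertools import groupby

date_re = re.compile(r"\d+-\d+-\d+")  # same kind of pattern as in the answer

def is_date(text):
    """Key function: True for strings that start with a date."""
    return date_re.match(text) is not None

# Invented sample input: one undated item, two dates in a row, then events
log = ['note', '2016-01-01', '2016-01-02', 'stuff happened', 'details']

# groupby yields one (key, run) pair per consecutive run of equal keys
runs = [(key, list(run)) for key, run in groupby(log, key=is_date)]
print(runs)
# [(False, ['note']), (True, ['2016-01-01', '2016-01-02']), (False, ['stuff happened', 'details'])]
```

A run of dates is therefore always followed either by a run of events or by the end of the input, which is the property the answer's generator relies on when it pulls the next group for the final date.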

【Discussion】:

【Solution 2】:

Try:

import re

def split_by_date(arr, patt=r'\d+-\d+-\d+'):
    results = []
    srch = re.compile(patt)
    rec = ['', []]
    for item in arr:
        if srch.match(item):
            if rec[0] or rec[1]:
                results.append(rec)
            rec = [item, []]
        else:
            rec[1].append(item)
    if rec[0] or rec[1]:
        results.append(rec)
    return results

Then:

normal_case = ['2016-01-01', 'stuff happened', 'details', 
               '2016-01-02', 'more stuff happened', 'details', 'report']
special_case_1 = ['blah', 'blah', 'stuff', '2016-11-11']
special_case_2 = ['blah', 'blah', '2015/01/01', 'blah', 'blah']

print(split_by_date(normal_case))
print(split_by_date(special_case_1))
print(split_by_date(special_case_2, r'\d+/\d+/\d+'))
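
The answer does not show what these calls print. As a rough check, here is a lazy variant of the same loop, written as a generator so it also accepts inputs that are not lists. The name iter_by_date is hypothetical and this is a sketch under those assumptions, not the answerer's code. The results for the three inputs above are shown in the comments.

```python
import re

def iter_by_date(items, patt=r'\d+-\d+-\d+'):
    """Yield [date, [texts]] records one at a time (hypothetical helper)."""
    srch = re.compile(patt)
    date, texts = '', []
    for item in items:
        if srch.match(item):
            # A new date starts a new record; emit the previous one if non-empty
            if date or texts:
                yield [date, texts]
            date, texts = item, []
        else:
            texts.append(item)
    # Emit the trailing record, unless the input was empty
    if date or texts:
        yield [date, texts]

normal_case = ['2016-01-01', 'stuff happened', 'details',
               '2016-01-02', 'more stuff happened', 'details', 'report']
special_case_1 = ['blah', 'blah', 'stuff', '2016-11-11']
special_case_2 = ['blah', 'blah', '2015/01/01', 'blah', 'blah']

print(list(iter_by_date(normal_case)))
# [['2016-01-01', ['stuff happened', 'details']],
#  ['2016-01-02', ['more stuff happened', 'details', 'report']]]
print(list(iter_by_date(special_case_1)))
# [['', ['blah', 'blah', 'stuff']], ['2016-11-11', []]]
print(list(iter_by_date(special_case_2, r'\d+/\d+/\d+')))
# [['', ['blah', 'blah']], ['2015/01/01', ['blah', 'blah']]]
```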

【Discussion】: