Stream generator. Using an iterator in detail

【Question title】: Stream generator. Using an iterator in detail
【Posted】: 2021-10-29 13:34:29
【Question description】:

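I'm trying to figure out how iterators work using this example:

There is a function that produces a stream generator for a given iterable (list, generator, etc.) whose elements contain a position and a value and are sorted by order of appearance. The stream generator is equal to the initial stream (without the positions), with the gaps filled with zeroes.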

from itertools import count

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    sorted_iterator = iter(sorted_iterable)
    iterable = count() if total is None else range(total)
    try:
        current_extracted_record = extractor(next(sorted_iterator))
    except StopIteration:
        current_extracted_record = None
    for i in iterable:
        if current_extracted_record:
            if i == current_extracted_record[0]:
                try:
                    yield current_extracted_record[1]
                    current_extracted_record = extractor(next(sorted_iterator))
                except StopIteration:
                    current_extracted_record = None
            else:
                yield 0
        else:
            yield 0

For example:

gen = gen_stream(9,[(4,111),(7,12)])
list(gen) 
[0, 0, 0, 0, 111, 0, 0, 12, 0] # the first element has index 0, so 111 is in the fifth position and 12 is in the eighth

The function also supports a custom position-value extractor for more advanced cases, e.g.

def day_extractor(x):
    months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # days per month, non-leap year
    acc = sum(months[:x[1] - 1]) + x[0] - 1
    return acc, x[2]

precipitation_days = [(3,1,4),(5,2,6)]
list(gen_stream(59,precipitation_days,day_extractor)) # 59 = days in January + February, to limit the output
[0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The precipitation_days format is (d, m, mm), where d is the day of the month, m is the month, and mm is the precipitation in millimeters. So, for example:

(3,1,4) # January 3, precipitation: 4 mm
(5,2,6) # February 5, precipitation: 6 mm

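The extractor is passed as an optional third parameter with a default value: a lambda function that handles (position, value) pairs, as in the first example.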

The questions start here

Question 1: Can I replace

    try:
        current_extracted_record = extractor(next(sorted_iterator))
    except StopIteration:
        current_extracted_record = None

with a single line of code that uses the default value of the function next instead of catching the StopIteration exception?

current_extracted_record = extractor(next((sorted_iterator), None))

Will it always work correctly in other cases?

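Question 2: How can I rewrite the part below using the default value of next() and a while loop instead of a for loop? In theory, the code should be shorter.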

    for i in iterable:
        if current_extracted_record:
            if i == current_extracted_record[0]:
                try:
                    yield current_extracted_record[1]
                    current_extracted_record = extractor(next(sorted_iterator))
                except StopIteration:
                    current_extracted_record = None
            else:
                yield 0
        else:
            yield 0

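Question 3: This may seem like a silly question, but as far as I understand, the extractor has no indexes. So what do the numbers in square brackets mean?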

current_extracted_record[0] 
current_extracted_record[1]

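Thanks in advance if you can help.

I apologize for putting 3 questions in one thread, but it seems to me that they describe the same problem in different details.

Answer (questions 1 and 2)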

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    elem_iter = iter(map(extractor, sorted_iterable))
    pos, val = next(elem_iter, (None, None))
    cnt = 0
    while total is None or cnt < total:
        if cnt == pos:
            yield val
            pos, val = next(elem_iter, (None, None))
        else:
            yield 0
        cnt += 1

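【Question comments】:

Each question here should focus on a single problem, not a list of problems.
@Prophet Sorry, questions 1 and 2 are almost the same. Question 3 is just an explanation of the second question. I agree about question 4 - it is separate. Should I delete it?
I'm not a moderator here, but this question may simply be closed because of that. What to do is up to you.

Tags: python function loops iterator generator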

【Solution 1】:

I would simplify the whole function to

from itertools import count, islice, chain, repeat


def gen_stream(sorted_iterable, *, extractor=lambda x: x, total=None):
    itr = chain(map(extractor, sorted_iterable), repeat(None))

    current = next(itr)
    for i in islice(count(), total):
        if current is None or i != current[0]:
            yield 0
        else:
            yield current[1]
            current = next(itr)

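We create an iterator whose elements are the return values of extractor applied to the original iterable, followed by None for as long as necessary. There is no need to check explicitly for StopIteration.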

If total is None, then islice(count(), total) is equivalent to count(); otherwise, it is equivalent to range(total).
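A quick way to check that equivalence, as a minimal sketch (the helper name indexes is hypothetical and not part of the answer):

```python
from itertools import count, islice

def indexes(total=None):
    # With stop=None, islice never stops, so this behaves like count().
    # With an integer stop, it yields 0 .. total-1, like range(total).
    return islice(count(), total)

print(list(indexes(5)))                # [0, 1, 2, 3, 4]
print(list(islice(indexes(None), 3)))  # [0, 1, 2]
```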

Alternatively,

def gen_stream(sorted_iterable, *, extractor=lambda x: x, total=None):
    def stream():
        pos = 0
        for new_pos, value in map(extractor, sorted_iterable):
            yield from repeat(0, new_pos - pos)
            yield value
            pos = new_pos + 1
    yield from islice(chain(stream(), repeat(0)), total)

The inner generator patches the gaps with 0s, and the final yield from then produces the desired stream.
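To see what the inner generator contributes on its own, here is a small sketch that copies it into a standalone function (gaps_filled is a hypothetical name used only for illustration):

```python
from itertools import repeat

def gaps_filled(pairs):
    # Hypothetical standalone copy of the inner stream() generator.
    pos = 0
    for new_pos, value in pairs:
        yield from repeat(0, new_pos - pos)  # zeros for the gap before new_pos
        yield value
        pos = new_pos + 1

print(list(gaps_filled([(4, 111), (7, 12)])))
# [0, 0, 0, 0, 111, 0, 0, 12]
```

Note that the output stops right after the last value. The trailing zeros and the cut-off at total come from chain(stream(), repeat(0)) and islice in the outer function.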

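In both cases I made the two optional parameters keyword-only, because I kept typing gen_stream([...], 9) and accidentally setting the extractor instead of the total. You can keep the original signature or use some other variation if you prefer.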
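A short sketch of what the keyword-only signature buys you, reusing the first version from this answer: the mistaken positional call is rejected immediately instead of silently binding 9 to extractor.

```python
from itertools import chain, count, islice, repeat

def gen_stream(sorted_iterable, *, extractor=lambda x: x, total=None):
    # Same body as the first version in this answer.
    itr = chain(map(extractor, sorted_iterable), repeat(None))
    current = next(itr)
    for i in islice(count(), total):
        if current is None or i != current[0]:
            yield 0
        else:
            yield current[1]
            current = next(itr)

print(list(gen_stream([(4, 111), (7, 12)], total=9)))
# [0, 0, 0, 0, 111, 0, 0, 12, 0]

try:
    gen_stream([(4, 111), (7, 12)], 9)  # positional second argument
except TypeError as exc:
    # Arguments are bound when the generator function is called,
    # so the error appears here, before any iteration happens.
    print("rejected:", exc)
```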

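【Comments】:

Nice idea to slice count. There's no need to repeat None, though. You could also use my better-default trick.
Yes, the alternative version was just all kinds of broken. I think the newer version works correctly.
Yes, the new one looks good :-). I still don't like the repeat(None) in the first one, though. It suggests that you might read from it repeatedly, but you don't. So it's misleading/confusing. I'd replace it with [None].
You might read from it repeatedly, if you ask for more elements than the iterable provides. It just turns all the "if an exception is raised, yield 0" cases into "if the iterator yields None, yield 0" cases.
No, once you have None, you never read from it again. From then on you keep going into the if branch and only yield zeros. You never reach the else branch with next(itr) again.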
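The last comment's claim can be checked with a small sketch in which a single trailing None replaces repeat(None). The function name gen_stream_single_none is hypothetical; the body is otherwise the first version from this answer.

```python
from itertools import chain, count, islice

def gen_stream_single_none(sorted_iterable, *, extractor=lambda x: x, total=None):
    # Hypothetical variant: one trailing None instead of repeat(None).
    itr = chain(map(extractor, sorted_iterable), [None])
    current = next(itr)
    for i in islice(count(), total):
        if current is None or i != current[0]:
            yield 0
        else:
            yield current[1]
            current = next(itr)  # the final call returns the single None

print(list(gen_stream_single_none([(1, 5)], total=6)))
# [0, 5, 0, 0, 0, 0]
```

Once current is None, only the first branch runs, so next(itr) is never called on the exhausted iterator and no StopIteration escapes.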

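【Solution 2】:

Question 1

No, extractor(next(sorted_iterator, None)) will fail if next returns None and extractor can't handle it. For example, your day_extractor can't (it crashes), and the default identity extractor doesn't return a pair of index and value, so code later on will fail.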
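A minimal sketch of that failure mode, using the question's day_extractor with calendar month lengths and an already exhausted iterator (the names exhausted and identity are illustrative only):

```python
def day_extractor(x):
    months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    acc = sum(months[:x[1] - 1]) + x[0] - 1
    return acc, x[2]

exhausted = iter([])  # an iterator with nothing left

try:
    day_extractor(next(exhausted, None))  # the extractor receives None
except TypeError as exc:
    print("day_extractor failed:", exc)

# The identity extractor does not crash; it hands None back,
# so whatever uses the result must be prepared for a non-pair.
identity = lambda x: x
print(identity(next(exhausted, None)))  # None
```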

Question 2

You can shorten the whole thing by mapping extractor over sorted_iterable and asking that map object for its next value:

from itertools import count

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    specials = map(extractor, sorted_iterable)
    iterable = count() if total is None else range(total)
    current_extracted_record = next(specials, None)
    for i in iterable:
        if current_extracted_record and i == current_extracted_record[0]:
            yield current_extracted_record[1]
            current_extracted_record = next(specials, None)
        else:
            yield 0

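Since you have no negative indexes, we can also use [-1] instead of None as the default to shorten the inner if condition (or really anything whose first element never appears in count(), e.g. [None]). Let me rename a few things as well.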

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    specials = map(extractor, sorted_iterable)
    indexes = count() if total is None else range(total)
    next_special = next(specials, [-1])
    for i in indexes:
        if i == next_special[0]:
            yield next_special[1]
            next_special = next(specials, [-1])
        else:
            yield 0

Question 3

current_extracted_record is what the extractor returned, i.e. the next special (index, value) pair. That pair has indexes 0 and 1.
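As a tiny illustration, assuming the default extractor and the item (4, 111) from the first example:

```python
record = (4, 111)       # what the default extractor returns for the item (4, 111)
position = record[0]    # 4, the index in the output stream
value = record[1]       # 111, the value to yield at that index
print(position, value)  # 4 111

# Tuple unpacking expresses the same thing without explicit indexes.
position, value = record
```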

【Comments】:

Can code like this be used with an infinite generator? Won't there be memory problems in that case?
Yes. No, it's all lazy. Try it :-)
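One way to try it, as a sketch: feed the second version from this answer an infinite source and take only a few items with islice. The generator every_third is a hypothetical input invented for this demonstration.

```python
from itertools import count, islice

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    specials = map(extractor, sorted_iterable)
    indexes = count() if total is None else range(total)
    next_special = next(specials, [-1])
    for i in indexes:
        if i == next_special[0]:
            yield next_special[1]
            next_special = next(specials, [-1])
        else:
            yield 0

def every_third():
    # Hypothetical infinite source: (0, 'x'), (3, 'x'), (6, 'x'), ...
    for pos in count(0, 3):
        yield pos, "x"

stream = gen_stream(None, every_third())  # infinite, but nothing is computed yet
print(list(islice(stream, 8)))
# ['x', 0, 0, 'x', 0, 0, 'x', 0]
```

Only one pending (index, value) pair is held at a time, so memory use does not grow with the length of the stream.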
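Could you compare my answer with the others in this thread?

【Solution 3】:

Your first question asks whether the two-argument form of next can help you simplify this code: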

try:
    current_extracted_record = extractor(next(sorted_iterator))
except StopIteration:
    current_extracted_record = None

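My answer is: maybe. You can't just pass None as the second argument to the existing next call, because there is no guarantee that the extractor function can handle None as input. The current code never passes None to extractor, and while the default no-op function supports None, other extractor functions (like your day-indexing one) may not be happy with it. Now, you could change the specification of the main function to require every extractor to support None (e.g. by passing it through unchanged), but if you don't, you need at least a little extra complexity to handle None. Here's how I would do it: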

raw_record = next(sorted_iterator, None)
current_extracted_record = extractor(raw_record) if raw_record is not None else None

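Your second question asks whether you can simplify the loop somehow. I don't think there is any obvious simplification of the loop itself (using while instead of for would probably be no more efficient), but you can reduce the two if statements to one, since both do the same thing (yield 0) when their condition is not met: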

for i in iterable:
    if current_extracted_record and i == current_extracted_record[0]: # combine both conditions
        yield current_extracted_record[1]
        raw_record = next(sorted_iterator, None)      # might as well use the code from q1 here
        current_extracted_record = extractor(raw_record) if raw_record is not None else None
    else:
        yield 0

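As for your third question, I'm not sure why you think current_extracted_record can't be indexed. It's a 2-tuple, so there will always be two items you can access. In the code, they are an index and a value. With the default extractor, they come directly from the input list (e.g. (4, 111)), but with a custom extractor they can be computed from the original values rather than being a direct part of them (e.g. the acc and x[2] values produced by your day_extractor function).

【Comments】:

Not quite safe; consider e.g. gen_stream(5, [None], lambda _: (2, 'aha!')) (i.e., you might confuse a given None with the default one).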
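One common way around the concern in this comment is a private sentinel object as the default for next, so that a None item in the input is still treated as real data. The sketch below is an assumption-laden variant, not code from any of the answers; _MISSING and next_record are hypothetical names.

```python
from itertools import count

_MISSING = object()  # hypothetical sentinel; never identical to any input item

def gen_stream(total, sorted_iterable, extractor=lambda x: x):
    sorted_iterator = iter(sorted_iterable)
    indexes = count() if total is None else range(total)

    def next_record():
        raw = next(sorted_iterator, _MISSING)
        # Only an exhausted iterator produces _MISSING; a None item is real data.
        return None if raw is _MISSING else extractor(raw)

    record = next_record()
    for i in indexes:
        if record is not None and i == record[0]:
            yield record[1]
            record = next_record()
        else:
            yield 0

print(list(gen_stream(5, [None], lambda _: (2, "aha!"))))
# [0, 0, 'aha!', 0, 0]
```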