Using a regular Python generator in a Tornado coroutine

【Question title】: Using regular Python generator in Tornado coroutine
【Posted】: 2016-02-02 14:22:24
【Question】:

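Python generators are a great feature. They let me encode complex, possibly recursive traversal logic and keep it separate from the code that consumes the results. I usually use them like this: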

TREE = {
    1: [2, 3],
    2: [],
    3: [4, 5],
    4: [6],
    5: [],
    6: [],
}

def query_children(node):
    return TREE[node]

def walk_tree(root):
    # recursive tree traversal logic
    yield root
    children = query_children(root)
    for child in children:
        for node in walk_tree(child):
            yield node

def do_something(root):
    # nice linear iterator
    for node in walk_tree(root):
        print(node)

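Tornado uses generators to implement coroutines, which is also a nice way to build asynchronous functions without callbacks.

However, I get confused when I try to use the two together.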

from tornado import gen

@gen.coroutine
def query_children(node):
    ...
    raise gen.Return(children)


def walk_tree(root):
    # recursive tree traversal logic
    yield root
    children = yield query_children(root)
    for child in children:
        for node in walk_tree(child):
            yield node


def do_something(root):
    # nice linear iterator
    for node in walk_tree(root):
        print(node)

In the new walk_tree, the first yield is a regular Python yield. The second yield is Tornado's. Can they work together?

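【Comments】:

Did you forget to consume the generator walk_tree(child) on line 5? You only create the generator object but never use it. You need to add one more loop: for node in walk_tree(child): yield node
Edited to make the code runnable.

Tags: python generator tornado coroutine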

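【Solution 1】:

I got this working. Instead of using yield in the non-coroutine walk_tree(), I run the query synchronously by calling IOLoop.run_sync. I am new to Tornado, so please comment on whether this is a legitimate solution or whether there is a better way.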

from tornado import gen
from tornado.ioloop import IOLoop

TREE = {
    1: [2, 3],
    2: [],
    3: [4, 5],
    4: [6],
    5: [],
    6: [],
}

@gen.coroutine
def query_children_async(node):
    raise gen.Return(TREE[node])

# this is a regular Python generator
def walk_tree(root):
    # recursive tree traversal logic
    yield root
    # run the coroutine to completion synchronously (see Edit 1: .result() did not work)
    children = IOLoop.instance().run_sync(lambda: query_children_async(root))
    for child in children:
        for node in walk_tree(child):
            yield node

@gen.coroutine
def do_something(root):
    # just collect the result
    result = [node for node in walk_tree(root)]
    raise gen.Return(result)

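Edit 1. The original proposal of using .result() does not work. With a non-trivial query_children_async() I get "DummyFuture does not support blocking for results".

【Comments】:

Replaced the .result() in the submission with IOLoop.instance().run_sync.
Calling run_sync from inside another coroutine is not safe. The IOLoop should be started once, at the top level of the file with run_sync or start, and then left running for the lifetime of the program.
Is there any hope of using .result()? Any insight?
No. In Tornado, .result() is only useful together with .add_done_callback(), and it is rarely used explicitly (it is normally called for you by yield).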
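
A minimal sketch of the advice in the comments above, assuming Tornado is installed. Here the traversal is itself a coroutine that returns a list (so lazy iteration is given up), and run_sync is called exactly once, at the top level. The name walk_tree_async and the in-memory TREE lookup are illustrative assumptions, not part of the original answer.

```python
from tornado import gen
from tornado.ioloop import IOLoop

TREE = {1: [2, 3], 2: [], 3: [4, 5], 4: [6], 5: [], 6: []}


@gen.coroutine
def query_children_async(node):
    # Assumption: a trivial lookup stands in for a real asynchronous query.
    raise gen.Return(TREE[node])


@gen.coroutine
def walk_tree_async(root):
    # Hypothetical coroutine version of walk_tree: collects nodes in
    # pre-order and never calls run_sync itself.
    result = [root]
    children = yield query_children_async(root)
    for child in children:
        subtree = yield walk_tree_async(child)
        result.extend(subtree)
    raise gen.Return(result)


def main():
    # The IOLoop is driven from exactly one place: the top level.
    return IOLoop.current().run_sync(lambda: walk_tree_async(1))


if __name__ == "__main__":
    print(main())
```

The trade-off is that the caller receives the whole list at once instead of iterating node by node.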

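【Solution 2】:

The Python generator protocol is built on a synchronous interface; it is not possible to use asynchronous code such as a coroutine as part of a generator that is consumed by for. (The most important rule of coroutines: anything that calls a coroutine must also be a coroutine, or at least be aware of coroutines. The for statement knows nothing about them, and it is what calls your generator.) Instead, I suggest using tornado.queues.Queue: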

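import tornado.queues
from tornado import gen
from tornado.ioloop import IOLoop


@gen.coroutine
def query_children(node):
    ...
    raise gen.Return(children)


@gen.coroutine
def walk_tree(queue, root):
    # recursive tree traversal logic: every node goes onto the queue
    yield queue.put(root)
    children = yield query_children(root)
    for child in children:
        # recurse as a coroutine; the child call queues its own subtree
        yield walk_tree(queue, child)


@gen.coroutine
def produce(queue, root):
    # helper added here so the end-of-stream sentinel is sent exactly once
    yield walk_tree(queue, root)
    yield queue.put(None)


@gen.coroutine
def do_something(root):
    queue = tornado.queues.Queue()
    IOLoop.current().spawn_callback(produce, queue, root)
    while True:
        node = yield queue.get()
        if node is None:
            break
        print(node)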
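
The rule stated at the top of this answer can be seen without Tornado at all. The sketch below is only an illustration: PendingResult is a hypothetical stand-in for a Future, and the three-node tree is made up. A plain for loop receives the placeholder as if it were a tree node and then resumes the generator with None, so the traversal fails.

```python
class PendingResult:
    """Hypothetical stand-in for a Future: a placeholder, not the children."""

    def __init__(self, value):
        self.value = value


def query_children(node):
    # Assumption: pretend this is asynchronous and returns only a placeholder.
    tree = {1: [2, 3], 2: [], 3: []}
    return PendingResult(tree[node])


def walk_tree(root):
    yield root
    # A coroutine runner would send the resolved list back in here;
    # a plain for loop resumes the generator with None instead.
    children = yield query_children(root)
    for child in children:
        for node in walk_tree(child):
            yield node


def consume_with_for(root):
    # Collect what the for loop sees, and the error that ends the loop.
    seen = []
    error = None
    try:
        for item in walk_tree(root):
            seen.append(item)
    except TypeError as exc:
        error = exc
    return seen, error


if __name__ == "__main__":
    seen, error = consume_with_for(1)
    print(seen, repr(error))
```

The loop sees the root and then the placeholder object, and the next step raises TypeError because children is None.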

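【Comments】:

Unfortunately, compared with a for loop over a generator, using a Queue adds a lot of overhead to do_something() :( Unless there are other suggestions, I will mark the Queue answer as accepted.
Another possibility is to add a callback argument to walk_tree and invoke the callback for each node, instead of yielding it or putting it on a queue. This is the fastest option available, although structuring things as callbacks can sometimes be awkward.
Can I think of Queue(1) as an asynchronous generator or a pipe? Is there any problem with using Queue(1) to export 100000+ documents from Motor's fetch_next?
I wouldn't say a queue is a pipe, but you can definitely build a pipe out of queues. With 100k+ documents, Queue(1) may not perform as well as a queue with a larger buffer, but it will still work.
@WaiYipTung When you say "a lot of overhead", what do you mean? Run time? Code complexity?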
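
A rough sketch of the callback variant suggested in the comments above, assuming Tornado is installed. walk_tree stays a coroutine but takes a callback and calls it once per node; nothing is yielded to the consumer and no queue is involved. The in-memory TREE lookup and the use of list.append as the callback are illustrative assumptions.

```python
from tornado import gen
from tornado.ioloop import IOLoop

TREE = {1: [2, 3], 2: [], 3: [4, 5], 4: [6], 5: [], 6: []}


@gen.coroutine
def query_children(node):
    # Assumption: a trivial lookup stands in for a real asynchronous query.
    raise gen.Return(TREE[node])


@gen.coroutine
def walk_tree(root, callback):
    # Hand each node to the callback instead of yielding or queueing it.
    callback(root)
    children = yield query_children(root)
    for child in children:
        yield walk_tree(child, callback)


def main():
    visited = []
    # The callback here just records nodes; it could equally print them.
    IOLoop.current().run_sync(lambda: walk_tree(1, visited.append))
    return visited


if __name__ == "__main__":
    print(main())
```

The per-node work lives in the callback, which is what makes this harder to read than a for loop when the consumer logic is more than a line or two.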