Twisted Reactor not Restartable - Using asynchronous threading repeatedly (answers)

[Question title]: Twisted Reactor not Restartable - Using asynchronous threading repeatedly
[Posted]: 2015-10-05 22:15:39
[Question]:

I have a list of URLs. I want to fetch their contents asynchronously every 10 seconds.

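from twisted.internet import defer, reactor
from twisted.web import client  # client.getPage is a legacy API (see the answer below)

urls = [
    'http://www.python.org',
    'http://stackoverflow.com',
    'http://www.twistedmatrix.com',
    'http://www.google.com',
    'http://launchpad.net',
    'http://github.com',
    'http://bitbucket.org',
]

# Start one request per URL; saveResults (not shown) receives the list of page bodies
waiting = [client.getPage(url) for url in urls]
defer.gatherResults(waiting).addCallback(saveResults)

reactor.run()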

How can I do this? This code only lets me fetch the URLs' contents once; calling it again raises error.ReactorNotRestartable().

Thanks :)

[Comments]:

There are no threads in this example. Are you planning to add some?

Tags: multithreading twisted reactor

[Answer 1]:

Twisted can definitely do this.

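First, although this is somewhat beside the point of your question, please don't use getPage. It is a very limited API with poor default settings for HTTPS security. Use treq instead.

Now, on to your main question.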

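The important thing to understand about reactor.run() is that it doesn't mean "run this code here"; it means "run the whole program". When reactor.run() returns, it's time for your program to exit. A reactor can only be started once, which is why calling reactor.run() a second time raises ReactorNotRestartable.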

Fortunately, Twisted has a good built-in way to do things periodically: LoopingCall.

Here's a working example using treq and LoopingCall:

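from twisted.internet import reactor
from twisted.internet.defer import gatherResults
from twisted.internet.task import LoopingCall
from treq import get, content

urls = [
    'http://www.python.org',
    'http://stackoverflow.com',
    'http://www.twistedmatrix.com',
    'http://www.google.com',
    'http://launchpad.net',
    'http://github.com',
    'http://bitbucket.org',
]

def saveResults(responses):
    # responses is a list of response bodies (bytes), one per URL
    print("total: {} bytes"
          .format(sum(len(response) for response in responses)))

def logFailure(failure):
    # Report the error and swallow it; an unhandled failure would stop the LoopingCall
    print("fetch failed: {}".format(failure.getErrorMessage()))

def fetchWebPages():
    # Issue all requests at once; gatherResults fires when every body has arrived
    d = gatherResults([get(url).addCallback(content) for url in urls],
                      consumeErrors=True)
    d.addCallback(saveResults)
    d.addErrback(logFailure)
    # Returning the Deferred makes LoopingCall wait for this round to finish
    return d

repeatedly = LoopingCall(fetchWebPages)
repeatedly.start(10.0)  # run now, then every 10 seconds

reactor.run()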

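As a bonus, this copes sensibly when fetchWebPages takes longer than 10 seconds. Because fetchWebPages returns a Deferred, LoopingCall does not start the next round until the current one has finished, so unfinished requests don't pile up; and it keeps to the original 10-second schedule instead of drifting later and later as requests slow down.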
