Twisted Reactor not restarting in scrapy (answer)

【Question title】: Twisted Reactor not restarting in scrapy
【Posted】: 2018-03-02 23:30:09
【Question】:

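I'm trying to run a crawler through a Telegram bot using the python-telegram-bot API wrapper. With the code below, I can successfully run the spider and forward the scraped results to the bot, but only once per run of the script. When I try to run the spider again through the bot (via a Telegram bot command), I get the error twisted.internet.error.ReactorNotRestartable.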

from twisted.internet import reactor
from scrapy import cmdline
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters, RegexHandler
import logging
import os
import ConfigParser
import json
import textwrap
from MIS.spiders.moodle_spider import MySpider
from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerRunner, CrawlerProcess
from scrapy.utils.log import configure_logging


# Read settings from config file
config = ConfigParser.RawConfigParser()
config.read('./spiders/creds.ini')
TOKEN = config.get('BOT', 'TOKEN')
#APP_NAME = config.get('BOT', 'APP_NAME')
#PORT = int(os.environ.get('PORT', '5000'))
updater = Updater(TOKEN)

# Setting Webhook
#updater.start_webhook(listen="0.0.0.0",
#                      port=PORT,
#                      url_path=TOKEN)
#updater.bot.setWebhook(APP_NAME + TOKEN)

logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)

dispatcher = updater.dispatcher

# Real stuff

def doesntRun(bot, update):
    #process = CrawlerProcess(get_project_settings())
    #process.crawl(MySpider)
    #process.start()
    ############

    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    runner = CrawlerRunner({
        'FEED_FORMAT' : 'json',
        'FEED_URI' : 'output.json'
        })

    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop())
    reactor.run(installSignalHandlers=0) # the script will block here until the crawling is finished
    #reactor.stop()

    with open("./output.json", 'r') as file:
        contents = file.read()
        a_r = json.loads(contents)
        AM = a_r[0]['AM']
        ...
        ...

        message_template = textwrap.dedent("""
                AM: {AM}
                ...
                """)
        messageContent = message_template.format(AM=AM, ...)
        #print messageContent
        bot.sendMessage(chat_id=update.message.chat_id, text=messageContent)
        #reactor.stop()


# Handlers
test_handler = CommandHandler('doesntRun', doesntRun)

# Dispatchers
dispatcher.add_handler(test_handler)

updater.start_polling()
updater.idle()
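The reactor error is not the only problem a second run can hit. The JSON read at the end of doesntRun() can break as well. As far as I know, Scrapy's local-file feed storage appends to an existing file by default (newer Scrapy versions add an `overwrite` feed option), so a second crawl into the same output.json would leave two JSON arrays back to back. Below is a minimal sketch that assumes the feed is a JSON list of item dicts. `remove_stale_feed`, `build_message` and the shortened template are hypothetical helpers that are not part of the question's code. Only the `AM` field is used, because the other fields are elided in the question.

```python
import json
import os

# Hypothetical, shortened template: only the AM field is visible in the question.
MESSAGE_TEMPLATE = "AM: {AM}"


def remove_stale_feed(path):
    """Delete output left by an earlier crawl so the next crawl starts a fresh file."""
    if os.path.exists(path):
        os.remove(path)


def build_message(path):
    """Parse a Scrapy JSON feed (a list of item dicts) and format its first item."""
    with open(path) as feed:
        items = json.load(feed)
    if not items:
        raise ValueError("the crawl produced no items in %s" % path)
    return MESSAGE_TEMPLATE.format(AM=items[0]["AM"])
```

In the handler, you would call `remove_stale_feed("output.json")` before `runner.crawl(MySpider)` and call `build_message("output.json")` once the crawl has finished. This sketch does not fix ReactorNotRestartable by itself.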

I based this on the example in the Scrapy documentation: https://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script

That example is:

from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

d = runner.crawl(MySpider)
d.addBoth(lambda _: reactor.stop())
reactor.run() # the script will block here until the crawling is finished

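【Comments】:

Duplicate of stackoverflow.com/questions/39946632/…
The linked duplicate doesn't even have a clear answer, and it uses CrawlerProcess, whereas my code uses CrawlerRunner.
See whether these help: stackoverflow.com/questions/1979112/…, blog.pythonlibrary.org/2016/09/14/restarting-a-twisted-reactor. Ideally, you would use scrapyd and schedule the same scraper through it.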
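Here is how the scrapyd suggestion might look. scrapyd exposes a JSON web API, and its schedule.json endpoint queues a crawl when it receives a POST request with `project` and `spider` fields. scrapyd launches each job in its own process, so the bot process never runs a reactor at all. This is a minimal Python 3 sketch. It assumes scrapyd is listening on its default port 6800 on the same machine. The project name `MIS` and the spider name `moodle` are hypothetical, because the question does not show how the project is deployed or what the spider's `name` attribute is.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: scrapyd runs locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build a POST request for scrapyd's schedule.json endpoint."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Passing data makes urllib send the request as a POST.
    return Request(base_url + "/schedule.json", data=data)


def schedule_crawl(project, spider, base_url=SCRAPYD_URL):
    """Queue a crawl on scrapyd and return its parsed JSON reply (status, jobid)."""
    with urlopen(build_schedule_request(project, spider, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Note that schedule.json only queues the job and returns immediately. The bot would still need a way to collect the results after the job finishes, for example by polling scrapyd's listjobs.json endpoint.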

Tags: python scrapy python-telegram-bot

【Solution 1】:

OK, I finally solved my problem.

The python-telegram-bot API wrapper provides an easy way to restart the bot.

I simply added these lines:

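import sys, time  # in addition to os, which the script already imports
time.sleep(0.2)
os.execl(sys.executable, sys.executable, *sys.argv)  # replace this process with a fresh run of the same script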

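at the end of the doesntRun() function. Now, whenever I invoke the function through the bot, it scrapes the page, stores the results, forwards them, and then restarts itself. os.execl replaces the running process with a new Python interpreter running the same script, so every run starts with a reactor that has never been started. This lets me run the spider as often as I like.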
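A less drastic variant of the same idea is to restart only the crawl and not the whole bot. This is offered as an alternative and is not what the answer does. Each crawl runs as a child process, and the parent reads the child's feed afterwards. Every child is a new Python interpreter with its own reactor, so ReactorNotRestartable cannot occur, and the bot keeps running between crawls. This is a Python 3 sketch, while the question's script is Python 2 (`ConfigParser`). The function name `crawl_in_subprocess` is hypothetical.

```python
import json
import os
import subprocess


def crawl_in_subprocess(command, output_path):
    """Run one crawl as a separate process and return the items from its JSON feed.

    Each call starts a new process, so each crawl gets its own Twisted reactor
    that has never been started; the calling process never runs a reactor.
    """
    # Remove output from an earlier run so the new feed is not appended to it.
    if os.path.exists(output_path):
        os.remove(output_path)
    # Blocks until the child exits; raises CalledProcessError on a non-zero exit code.
    subprocess.check_call(command)
    with open(output_path) as feed:
        return json.load(feed)
```

A possible call from the bot handler, run from the Scrapy project directory (the one containing scrapy.cfg) with a hypothetical spider name: `items = crawl_in_subprocess(["scrapy", "crawl", "moodle", "-o", "output.json"], "output.json")`.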
