[Posted]: 2021-07-27 10:19:15
[Question]:
I have been trying to run the following code, which I found in this question, in Spyder:
import scrapy
import scrapy.crawler as crawler
from multiprocessing import Process, Queue
from twisted.internet import reactor

# your spider
class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['http://quotes.toscrape.com/tag/humor/']

    def parse(self, response):
        for quote in response.css('div.quote'):
            print(quote.css('span.text::text').extract_first())

# the wrapper to make it run more times
def run_spider(spider):
    def f(q):
        try:
            runner = crawler.CrawlerRunner()
            deferred = runner.crawl(spider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

print('first run:')
run_spider(QuotesSpider)

print('\nsecond run:')
run_spider(QuotesSpider)
However, when I run it, I get the following error:
AttributeError: Can't pickle local object 'run_spider.<locals>.f'
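For context, here is a minimal sketch of where this error most likely comes from. It leaves Scrapy out entirely. Under the spawn start method (the default on Windows and macOS), multiprocessing pickles the Process target to send it to the child process. pickle stores a function by its qualified name, which it cannot do for a function defined inside another function. The names module_level_worker, make_local_worker and can_pickle are made up for this illustration and are not part of the original code. Both exception types are caught on the assumption that the exact type may differ between Python versions.

```python
import pickle


def module_level_worker(q):
    """Defined at module level, so pickle can refer to it by name."""
    q.put(None)


def make_local_worker():
    """Return a nested function, like f inside run_spider."""
    def f(q):  # qualified name is 'make_local_worker.<locals>.f'
        q.put(None)
    return f


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Assumption: the exception type depends on the Python version.
        return False


if __name__ == "__main__":
    # Run as a plain script: the first call is expected to succeed,
    # the second to fail because the function is local.
    print("module-level:", can_pickle(module_level_worker))
    print("nested:", can_pickle(make_local_worker()))
```

The '<locals>' part of the qualified name is the same marker that appears in the error message above.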
I have seen one answer that suggests:
Had small issue regarding 'AttributeError: Can't pickle local object 'run_spider.<locals>.f', but moving function called f outside resolved my issue, and I could run the code
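I tried moving the function f outside of run_spider, and even into a different file, but it still does not work.
Any help would be greatly appreciated. Thanks.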
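For reference, a minimal sketch of the "move f outside" layout described in that comment, using only the standard library. It is a sketch under stated assumptions, not a confirmed fix for the Spyder case. crawl_job is a hypothetical stand-in for the Scrapy part of f (creating the CrawlerRunner, calling runner.crawl and running the reactor). child_main and run_in_subprocess are likewise made-up names. The sketch also assumes that the top-level calls need an if __name__ == "__main__" guard, because a spawned child re-imports the main module.

```python
from multiprocessing import Process, Queue


def crawl_job():
    """Hypothetical stand-in for the Scrapy work done inside f().

    In the real script this body would create the CrawlerRunner,
    call runner.crawl(spider) and run the Twisted reactor.
    """
    return None


def child_main(q, job):
    """Module-level replacement for the nested f(q)."""
    try:
        job()
        q.put(None)  # report success to the parent
    except Exception as e:
        q.put(e)  # hand the exception back to the parent


def run_in_subprocess(job):
    """Run job in a fresh process and re-raise anything it reported."""
    q = Queue()
    p = Process(target=child_main, args=(q, job))
    p.start()
    result = q.get()
    p.join()
    if result is not None:
        raise result


if __name__ == "__main__":
    # The guard keeps a spawned child from re-running these calls
    # when it imports this module.
    print("first run:")
    run_in_subprocess(crawl_job)
    print("\nsecond run:")
    run_in_subprocess(crawl_job)
```

Everything handed to Process (the target and its arguments) has to be picklable under spawn, so the job is passed as a module-level function too. An interactive console may still be unable to re-import the script in the child process, so running the file as a plain script from a terminal may be necessary. That last point is an assumption and has not been verified here.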
[Comments]:
Tags: python python-3.x scrapy multiprocessing web-crawler