【Question title】: Python shell not running crawler
【Posted】: 2016-06-22 04:01:52
【Question description】:

I have been trying to run this code:

import scrapy
from scrapy.cmdline import execute
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from metacritic.items import MetacriticItem
class MetacriticSpider(scrapy.spider):
    name = "metacritic" # Name of the spider, to be used when crawling
    allowed_domains = ["metacritic.com"] # Where the spider is allowed to go
    start_urls = [
        "http://www.metacritic.com/browse/games/title/pc?page=0"
    ]
    def parse(self, response):
        hxs = HtmlXPathSelector(response) # The XPath selector
        sites = hxs.select('//li[contains(@class, "product game_product")]/div[@class="product_wrap"]')
        items = []
        for site in sites:
            item = MetacriticItem()
            item['title'] = site.select('div[@class="basic_stat product_title"]/a/text()').extract()
            item['link'] = site.select('div[@class="basic_stat product_title"]/a/@href').extract()
            item['cscore'] = site.select('div[@class="basic_stat product_score brief_metascore"]/div/div/span[contains(@class, "data metascore score")]/text()').extract()
            item['uscore'] = site.select('div[@class="more_stats condensed_stats"]/ul/li/span[contains(@class, "data textscore textscore")]/text()').extract()
            item['date'] = site.select('div[@class="more_stats condensed_stats"]/ul/li/span[@class="data"]/text()').extract()
            items.append(item)
        return items

I have tried several ways to fix this code, but I always get this error:

/home/kautsar/metacritic 2/metacritic/spiders/metacritic_spider.py:3: ScrapyDeprecationWarning: Module scrapy.spider is deprecated, use scrapy.spiders instead
  from scrapy.spider import BaseSpider
Traceback (most recent call last):
  File "/home/kautsar/metacritic 2/metacritic/spiders/metacritic_spider.py", line 6, in <module>
    class MetacriticSpider(scrapy.spider):
TypeError: Error when calling the metaclass bases
    module.__init__() takes at most 2 arguments (3 given)

Does anyone know how to fix this?

【Question discussion】:

    Tags: python scrapy pycharm


    【Solution 1】:
    from scrapy.spider import BaseSpider
    ...
    class MetacriticSpider(scrapy.spider):
    

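    As you can see, scrapy.spider is the name of a module, and your class is trying to inherit from it. A class normally has to inherit from another class, which in this case is probably BaseSpider.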
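The same failure can be reproduced without Scrapy. The sketch below is only an illustration: it uses the standard-library `json` module as a stand-in for `scrapy.spider`, and the helper name `try_inherit` is hypothetical. It shows that naming a module as a base class raises `TypeError`, while naming a class defined in that module works.

```python
import json  # stand-in for scrapy.spider; any module behaves the same way


def try_inherit(base):
    """Try to define a class whose parent is `base`.

    Returns the name of the exception type if the class statement fails,
    or None if the class is created successfully.
    """
    try:
        class Child(base):
            pass
    except TypeError as exc:
        return type(exc).__name__
    return None


# A module is not a class, so using it as a base fails.
module_result = try_inherit(json)

# A class defined inside that module is a valid base.
class_result = try_inherit(json.JSONDecoder)

print(module_result, class_result)
```

Applied to the spider in the question, this suggests changing the class line to `class MetacriticSpider(BaseSpider):`. In newer Scrapy releases the base class is also available as `scrapy.Spider`, so `class MetacriticSpider(scrapy.Spider):` (capital `S`) should work as well.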

    【Discussion】:
