【Question Title】: Scrapy-Splash ERROR 400: "description": "Required argument is missing: url"
【Posted】: 2019-12-30 02:10:07
【Question Description】:

I'm using scrapy-splash in my code to render the JavaScript-generated HTML.

But Splash gives me this back from render.html:

{
    "error": 400,
    "type": "BadOption",
    "description": "Incorrect HTTP API arguments",
    "info": {
        "type": "argument_required",
        "argument": "url",
        "description": "Required argument is missing: url"
    }
}
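For context, this error means the request that actually reached Splash's render.html endpoint carried no `url` argument. A minimal sketch of a payload that would satisfy that check (the helper name and the local Splash address are assumptions for illustration, not part of the question):

```python
import json

SPLASH_RENDER_HTML = "http://127.0.0.1:8050/render.html"  # assumed local Splash

def build_splash_payload(url, wait=1.5):
    """Build the JSON body for Splash's render.html endpoint.

    'url' is the one required argument; omitting it triggers the
    400 BadOption / "Required argument is missing: url" error above.
    """
    return json.dumps({"url": url, "wait": wait})

body = build_splash_payload("https://www.empresia.es/empresa/repsol/")
print(body)
```

Posting a body like this to `SPLASH_RENDER_HTML` would not produce the 400; the scrapy-splash middleware normally builds an equivalent payload for you from the request.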

So I can't get a response containing the JavaScript-generated HTML. Here is my spider.py:

import scrapy
from scrapy.utils.response import open_in_browser
from scrapy_splash import SplashRequest


class ThespiderSpider(scrapy.Spider):
    name = 'thespider'
    #allowed_domains = ['https://www.empresia.es/empresa/repsol/']
    start_urls = ['https://www.empresia.es/empresa/repsol/']

    def start_requests(self):
        yield scrapy.Request( 'http://example.com', self.fake_start_requests )

    def fake_start_requests(self, response):
        for url in self.start_urls:
            yield SplashRequest( url, self.parse,
                                 args={'wait': 1.5, 'http_method': 'POST'},
                                 endpoint='render.html'
                                 )

    def parse(self, response):
        open_in_browser(response)
        title = response.css("title").extract()
        # har = response.data["har"]["log"]["pages"]
        headers = response.headers.get('Content-Type')
        names = response.css('.fa-user-circle-o+ a::text').extract()
        yield {
            'title': title,
            #'har': har,
            'headers': headers,
            'names': names,
            'length': len(names)
        }

And here is my settings.py:

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

# Splash Settings
DOWNLOADER_MIDDLEWARES = {
    # Engine side
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    # Downloader side
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
SPLASH_URL = 'http://127.0.0.1:8050/'
# SPLASH_URL = 'http://192.168.59.103:8050/'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

Thanks for your help.

【Question Comments】:

  • The API you are interacting with doesn't seem to be receiving what it expects. Do you have documentation or source code for the API?
  • Here is the API documentation - splash.readthedocs.io/en/stable/faq.html I can't figure out what I'm missing.
  • Hi, I'm running into the same error - did you ever find a solution?

Tags: python error-handling scrapy scrapy-splash


【Solution 1】:

Providing the url parameter inside the args dict fixes it:

yield SplashRequest(callback=self.parse,
                    args={'url': url, 'wait': 1.5, 'http_method': 'POST'},
                    endpoint='render.html'
                    )
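Why this helps: everything in `args` is forwarded to Splash as its HTTP API payload, and the target page normally arrives there as `args['url']`; spelling it out makes the required argument explicit regardless of how the request itself is issued. A rough stand-in for that merge (this helper is illustrative only, not scrapy-splash's real API):

```python
def splash_payload(request_url, args):
    """Roughly mimic how the Splash payload is assembled: the args dict
    is sent to Splash as-is, with the request URL filled in as
    args['url'] when it isn't already present."""
    payload = dict(args)
    if request_url is not None:
        payload.setdefault("url", request_url)
    return payload

# With 'url' supplied in args, render.html always gets its required argument:
payload = splash_payload(None, {"url": "https://www.empresia.es/empresa/repsol/",
                                "wait": 1.5, "http_method": "POST"})
assert payload["url"] == "https://www.empresia.es/empresa/repsol/"
```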

【Discussion】:
