Scraping data from Ajax Form Requests using Scrapy

【Question title】: Scraping data from Ajax Form Requests using Scrapy
【Posted】: 2018-07-17 18:22:37
【Question description】:

I am trying to scrape all the hospital data from this website: https://www.german-hospital-directory.com/search/Bundesland/Baden-Wuerttemberg.html

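Looking at the network requests, the page issues a form request, and I cannot reach that request through scrapy shell.

The response payload contains the entire HTML content. How can I extract the data for each hospital (URL, NAME, IMAGE) and iterate over all hospitals? Any help would be appreciated, as I am new to Scrapy.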

Do I need to use Selenium, or can I achieve this with Scrapy alone?

【Comments】:

Tags: python python-3.x scrapy web-crawler scrapy-spider

【Solution 1】:

You first need to GET your URL (to receive the cookies): https://www.german-hospital-directory.com/search/Bundesland/Baden-Wuerttemberg.html

Then you need to GET this URL, which returns the search results: https://www.german-hospital-directory.com/search/_files/main-search/Suchergebnis.jsf

Something like this:

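import scrapy

class HospitalSpider(scrapy.Spider):
    name = "hospitals"  # placeholder name; the original answer does not give one
    start_urls = ['https://www.german-hospital-directory.com/search/Bundesland/Baden-Wuerttemberg.html']

    def parse(self, response):
        # The first response only sets the session cookies; Scrapy sends
        # them automatically with the follow-up request.
        yield scrapy.Request(
            url="https://www.german-hospital-directory.com/search/_files/main-search/Suchergebnis.jsf",
            callback=self.parse_hospitals
        )

    def parse_hospitals(self, response):
        # here you have the hospitals data
        ...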

【Comments】: