Scrape multiple links from a JSON file: answers

【Question title】: Scrape multiple links from a json file
【Posted】: 2022-01-08 11:51:15
【Question】:

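I am trying to scrape multiple links that I scraped earlier and saved in a JSON file.

What I have so far works, but I don't want to scrape just one URL; I want to scrape the URLs from my JSON file.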

import scrapy
import json

class RatingSpider(scrapy.Spider):
    name = "rating"

    def start_requests(self):
        urls = [
            'https://www.darkpattern.games/game/3478/0/ragnarok-m-eternal-love-rom-.html'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
            
    def parse(self, response):
        for rating in response.css('div.score_box'):
            yield {
                'reported': rating.css('div.score_heading *::text').extract()
            }

The JSON file looks like this:

[
  {
    "title": [
      "\n\t\t\t\t\t\t",
      "Ragnarok M: Eternal Love(ROM)",
      "\n\t\t\t\t\t\t",
      "\t\t\t\t\t\t",
      "The classic adventure returns",
      "\n\t\t\t\t\t"
    ],
    "link": [
      "/game/3478/0/ragnarok-m-eternal-love-rom-.html"
    ],
    "rating": [
      "\n\t\t\t\t\t\t",
      "\n\t\t\t\t\t\t",
      "-4.68",
      "\n\t\t\t\t\t"
    ]
  }
]

Any suggestions on how to do this?

【Question comments】:

Tags: python json scrapy web-crawler

【Solution 1】:

I don't see where in your example you read from the JSON file. You need to do something like this:

import json

# Load the previously scraped items from the JSON file.
with open("your json file", "r") as f:
    jsonlist = json.load(f)

# Each item stores its link as a one-element list.
for item in jsonlist:
    url = item["link"][0]
    # Do something with url: run a request, store it in a list, etc.

Also, your sample JSON contains a relative URL, so I assume the rest of the file is the same and the base URL is https://www.darkpattern.games. You would therefore need to join the base URL with each relative URL before running the requests.
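
The suggestion above could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the names `BASE_URL`, `absolute_urls`, and `load_urls` are hypothetical helpers introduced for illustration, and the base URL is an assumption taken from the hard-coded start URL in the question. It relies on the standard-library `urllib.parse.urljoin` to combine the base URL with each relative link.

```python
import json
from urllib.parse import urljoin

# Assumption: base URL inferred from the hard-coded start URL in the question.
BASE_URL = "https://www.darkpattern.games"


def absolute_urls(items, base_url=BASE_URL):
    """Return one absolute URL per item that has a non-empty "link" list."""
    urls = []
    for item in items:
        links = item.get("link") or []
        if links:
            # urljoin resolves the relative path against the base URL.
            urls.append(urljoin(base_url, links[0]))
    return urls


def load_urls(path, base_url=BASE_URL):
    """Read the JSON file at `path` and build absolute URLs from its items."""
    with open(path, "r", encoding="utf-8") as f:
        return absolute_urls(json.load(f), base_url)


# Inline sample mirroring the structure of the JSON file in the question.
sample = [{"link": ["/game/3478/0/ragnarok-m-eternal-love-rom-.html"]}]
print(absolute_urls(sample))
```

Inside the spider's `start_requests`, one could then loop over the result of `load_urls("your json file")` and yield `scrapy.Request(url=url, callback=self.parse)` for each URL, in place of the hard-coded `urls` list.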

【Comments】:

Thanks, I'll give it a try.
How do I join the absolute (base) URL with the relative links from the JSON?