[Posted]: 2022-01-08 11:51:15
[Question]:
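I am trying to scrape multiple links that I scraped earlier and saved in a JSON file.
So far the spider below works, but it only scrapes a single hardcoded URL; I want it to scrape the URLs from my JSON file instead.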
import scrapy
import json


class RatingSpider(scrapy.Spider):
    name = "rating"

    def start_requests(self):
        urls = [
            'https://www.darkpattern.games/game/3478/0/ragnarok-m-eternal-love-rom-.html'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for rating in response.css('div.score_box'):
            yield {
                'reported': rating.css('div.score_heading *::text').extract()
            }
The JSON file looks like this:
[
    {
        "title": [
            "\n\t\t\t\t\t\t",
            "Ragnarok M: Eternal Love(ROM)",
            "\n\t\t\t\t\t\t",
            "\t\t\t\t\t\t",
            "The classic adventure returns",
            "\n\t\t\t\t\t"
        ],
        "link": [
            "/game/3478/0/ragnarok-m-eternal-love-rom-.html"
        ],
        "rating": [
            "\n\t\t\t\t\t\t",
            "\n\t\t\t\t\t\t",
            "-4.68",
            "\n\t\t\t\t\t"
        ]
    }
]
Any suggestions on how to do this?
[Discussion]:
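One possible approach, offered as a minimal sketch rather than a definitive answer: read the JSON file, collect every entry in each item's "link" list, and join it with the site root so that the relative paths become absolute URLs. The helper name `load_game_urls` and the file name `games.json` used below are hypothetical; the site root is taken from the URL in the question, and the sketch assumes every item has the same shape as the sample shown.

```python
import json
from urllib.parse import urljoin

# Site root, taken from the hardcoded URL in the question.
BASE_URL = "https://www.darkpattern.games"


def load_game_urls(json_path, base_url=BASE_URL):
    """Return absolute URLs built from the "link" lists in the JSON file.

    Hypothetical helper: assumes the file holds a list of objects, each
    with a "link" key whose value is a list of relative paths.
    """
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)

    urls = []
    for item in items:
        # Items without a "link" key are skipped instead of raising.
        for link in item.get("link", []):
            link = link.strip()
            if link:
                urls.append(urljoin(base_url, link))
    return urls
```

In the spider, `start_requests` could then loop over `load_game_urls("games.json")` and yield `scrapy.Request(url=url, callback=self.parse)` for each URL, replacing the hardcoded `urls` list. The path is resolved relative to the directory the crawl is started from, so an absolute path may be safer.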
Tags: python json scrapy web-crawler