【Question title】: Scraping JSON data with Scrapy
【Posted】: 2022-08-04 03:26:36
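【Question description】:
I am trying to scrape the site below and have gotten as far as loading the response body. How do I access the keys in the response, such as name, rating, title, description, and reviews?
Code: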
import scrapy
import json
from pprint import pprint

class nykacr(scrapy.Spider):
    name = 'nykaa'
    allowed_domains = ['nykaa.com']
    start_urls = ["https://www.nykaa.com/gateway-api/products/683166/reviews?pageNo=1&filters=DEFAULT&domain=nykaa"]

    def parse(self, response):
        datas = json.loads(response.body)
Tags:
json
web-scraping
scrapy
【Solution 1】:
You just need to get the reviewData field, which is nested under the response key, and iterate over it as a list.
For example:
import scrapy

class nykacr(scrapy.Spider):
    name = 'nykaa'
    allowed_domains = ['nykaa.com']
    start_urls = ["https://www.nykaa.com/gateway-api/products/683166/reviews?pageNo=1&filters=DEFAULT&domain=nykaa"]

    def parse(self, response):
        # Each entry in reviewData is a dict describing one review.
        for item in response.json()["response"]["reviewData"]:
            yield {
                "id": item["id"],
                "childId": item["childId"],
                "title": item["title"],
                "description": item["description"],
                "name": item["name"],
                "createdOn": item["createdOn"],
                "reviewCreationText": item["reviewCreationText"],
                "likeCount": item["likeCount"],
                "rating": item["rating"],
                "isLikedByUser": item["isLikedByUser"],
                "isBuyer": item["isBuyer"],
            }
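If some reviews may lack a key, indexing with item["..."] raises KeyError. The sketch below shows one possible way to make the same extraction tolerant of missing keys. It is plain Python so it can be tried without running the spider. The names extract_reviews and FIELDS are hypothetical, the sample payload is made up, and the payload shape (response, then reviewData) is assumed from the answer above rather than checked against the live API.

```python
import json

# Subset of the keys used in the answer above; extend as needed.
FIELDS = ("id", "title", "description", "name", "rating")

def extract_reviews(payload):
    """Yield one dict per review, with None for any key a review lacks."""
    # Assumed shape: {"response": {"reviewData": [ {...}, ... ]}}
    reviews = payload.get("response", {}).get("reviewData") or []
    for item in reviews:
        yield {field: item.get(field) for field in FIELDS}

# Made-up sample that mimics the assumed shape (not real API output).
sample = json.loads(
    '{"response": {"reviewData": [{"id": 1, "name": "A", "rating": 5}]}}'
)
rows = list(extract_reviews(sample))
```

Inside the spider's parse method this could be used as: yield from extract_reviews(response.json()). Note that response.json() is available in Scrapy 2.2 and later; on older versions, json.loads(response.text) gives the same dict.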