【Posted】:2021-07-05 13:52:58
【Question】:
# -*- coding: utf-8 -*-
import scrapy
from ..items import HomedepotItem
import re
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup


class HomedepotSpider(scrapy.Spider):
    name = 'homeDepot'
    start_urls = ['https://www.homedepot.com/p/ZLINE-Kitchen-and-Bath-36-DuraSnow-Stainless-Steel-Range-Hood-with-Hand-Hammered-Copper-Shell-8654HH-36-8654HH-36/311287560']

    def parse(self, response):
        for item in self.parseHomeDepot(response):
            yield item
        pass

    def parseHomeDepot(self, response):
        item = HomedepotItem()  # items from items.py
        jsonresponse = json.loads(response.text)
        productPrice = jsonresponse(["offers"][0]["price"])
        #item['productPrice'] = productPrice  # display price and assign to variable
        yield item
I'm trying to parse data from the JSON embedded in this web page. I had a previous question about JSON answered, and ["offers"]["price"] was the way to go, because the page's JSON is:
"offers":{"@type":"Offer","url":"https://www.homedepot.com/p/ZLINE-Kitchen-and-Bath-36-DuraSnow-Stainless-Steel-Range-Hood-with-Hand-Hammered-Copper-Shell-8654HH-36-8654HH-36/311287560","priceCurrency":"USD","price":1449.95,"priceValidUntil":"4/7/2021","availability":"https://schema.org/InStock"}
So now I get the error: raise JSONDecodeError("Expecting value", s, err.value) from None
Any help would be much appreciated!
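The error can be reproduced without Scrapy, which may help narrow it down. `json.loads` raises `JSONDecodeError("Expecting value")` when its input does not begin with a JSON value, and an HTML document is such an input. The sketch below is only an illustration: the `html` string is a made-up stand-in for `response.text`, and `fragment` is the offers snippet from the question, trimmed and wrapped in braces so that it forms a complete JSON object.

```python
import json

# Hypothetical stand-in for response.text: an HTML page, not a JSON document.
html = "<!DOCTYPE html><html><head><title>Range Hood</title></head><body></body></html>"

try:
    json.loads(html)
    error_message = None
except json.JSONDecodeError as exc:
    # Same failure as in the question: the parser finds no JSON value to read.
    error_message = exc.msg
    print(f"JSONDecodeError: {exc.msg} at position {exc.pos}")

# The offers snippet from the question (trimmed), wrapped in braces so it is
# a complete JSON object. Parsing only this part works.
fragment = '{"offers": {"@type": "Offer", "priceCurrency": "USD", "price": 1449.95}}'
data = json.loads(fragment)

# "offers" is an object in this snippet, so it is indexed by key, without [0].
# Note the square brackets on data: a dict is subscripted, not called.
price = data["offers"]["price"]
print(price)
```

Two things stand out in this sketch: the parse only succeeds on the JSON part of the page, and the lookup is `data["offers"]["price"]` rather than a call such as `jsonresponse([...])`.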
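【Comments】:
- You are getting the error because you are running json.loads on the entire web page, not just on the JSON component.
- @tomjn So I would load the offers object from my JSON response and then loop over it to try to get the price?
- @TowsifAhamedLabib I don't think I can use response.css, because the content is dynamically generated.
- Since you mentioned a previous question, I looked at it, and it is the answer to this question. What am I missing here? You can use response.css to load the JSON, similar to the answer to your previous question...
- @tomjn I did try that, but I was probably loading it incorrectly, thanks!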
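The suggestion in the comments (select the JSON with `response.css`, then parse only that) could look roughly like the sketch below. It rests on an assumption that the thread does not confirm: that the offers data sits in a `<script type="application/ld+json">` tag in the HTML that Scrapy downloads. If the tag is only injected by JavaScript, the selector returns nothing and this approach will not work as written. `extract_price` is a hypothetical helper name, and the `scripts` list at the bottom is stand-in data, not real page content.

```python
import json


def extract_price(script_texts):
    """Return the first offers.price found in an iterable of JSON strings, else None."""
    for text in script_texts:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # skip script bodies that are not valid JSON

        # A JSON-LD block may hold a single object or a list of objects.
        candidates = data if isinstance(data, list) else [data]
        for obj in candidates:
            if not isinstance(obj, dict):
                continue
            offers = obj.get("offers")
            # "offers" may itself be a list of offers; take the first one.
            if isinstance(offers, list):
                offers = offers[0] if offers else None
            if isinstance(offers, dict) and "price" in offers:
                return offers["price"]
    return None


# Inside the spider, the strings would come from the downloaded HTML
# (assumed selector, see the note above):
#   scripts = response.css('script[type="application/ld+json"]::text').getall()
#   item['productPrice'] = extract_price(scripts)

# Stand-in data for a quick check, trimmed from the snippet in the question.
scripts = [
    "not json at all",
    '{"@type": "Product", "offers": {"@type": "Offer", "priceCurrency": "USD", "price": 1449.95}}',
]
print(extract_price(scripts))
```

Keeping the parsing in a plain function that takes strings means it can be checked without a network request, and the spider only has to supply the selector output.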
Tags: python web-scraping beautifulsoup scrapy