【Question title】: Scrapy extracting data from json response
【Posted】: 2022-01-22 02:23:45
【Question description】:

I am trying to use Scrapy to extract data from a JSON response. The goal is to get the products listed in the response:

import scrapy
import json

class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']
def parse(self, response):
    data = json.loads(response.body)
    yield from data['meta']['products']

I get the following error:

ERROR: Spider error processing <GET https://webapi.depop.com/api/v2/search/products/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance> (referer: None)

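【Comments】:

  • Hey man, why do you need Scrapy for this?
  • @y.y I'm trying to use Scrapy rather than the alternatives. Using Scrapy with JSON responses is also new to me, so I'd be glad to learn how to do it properly.
  • Okay, I understand :) Scrapy is really good, but I'd say it's the wrong module for parsing JSON requests. Please see my answer below.

Tags: python web-scraping scrapy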


【Solution 1】:

Here is a minimal working example using Scrapy and json.

Script:

import scrapy
import json

class DepopSpider(scrapy.Spider):
    name = 'depop'

    def start_requests(self):
        # Request the JSON search endpoint directly.
        yield scrapy.Request(
            url='https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance',
            method='GET',
            callback=self.parse,
        )

    def parse(self, response):
        # response.json() decodes the body; 'products' is a top-level key.
        products = response.json()['products']

        # Optional: dump the raw product list to a file.
        # json_data = json.dumps(products)
        # with open('data.json', 'w') as f:
        #     f.write(json_data)

        for item in products:
            yield {
                'Name': item['slug'],
                'price': item['price']['priceAmount'],
            }

Output:

{'Name': 'kicksbrothers-exclusive-genuine-blue-inc', 'price': '22.98'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'isabellaimogen-crew-clothing-full-length-slim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-shirt-size', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-brand-1990s', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'tommkent-high-waisted-vintage-jeans-washed', 'price': '24.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'megsharp-super-cute-flowery-anchor-blue', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'moniulka2607-sweat-wear-for-man-shorts', 'price': '30.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'quynheu-free-uk-shipping-anchor-blue-07e1', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bradymonster-oversized-stone-washed-shirt-from', 'price': '14.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bonebear-vintage-funky-mens-large-shirt', 'price': '9.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'katy_potaty-vintage-anchor-blue-mom-jeanstrousers', 'price': '20.00'}       
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'urielbongco-washed-up-denim-jacket-preloved', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--thick-thermal-heavy-t-shirt', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--vintage-egypt-tourist-tee', 'price': '16.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'kristoferjohnson-blue-harbour-mens-tailored-fit', 'price': '7.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'ravsonline-blue-willis-pure-indigo-cotton', 'price': '27.20'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'shikhalamode-anchor-blue-low-rise-denim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>

... and so on

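【Comments】:

  • I have a question about Item Loaders: how would I implement one for this response? For example, say I create a class for the item. Then, inside your loop, I create a loader with loader = ItemLoader(DepopItem(), selector=item). How would I select data such as name and price? I'm only used to getting XPaths this way. I'm guessing it is loader.add_value(?????)
  • Sure! See the new post: *.com/questions/70424004/…
  • I'll give it a try, I have some experience with this. It may take a while because I'm a bit busy, so you could also look for help online. Thanks
  • Never mind! I managed to solve it in the end. Thanks for the template!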
【Solution 2】:

If you want to handle the response of a JSON request, you can try this:

import requests

url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"

payload={}
headers = {}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

Your output then looks like this:

{
    "meta": {
        "resultCount": 20,
        "cursor": "MnwyMHwxNjQwMDA1ODc3",
        "hasMore": false,
        "totalCount": 20
    },
    "products": [
        {
            "id": 215371070,
            "slug": "kicksbrothers-exclusive-genuine-blue-inc",
            "status": "ONSALE",
            "hasVideo": false,
            "price": {
                "priceAmount": "22.98",
                "currencyName": "GBP",
                "nationalShippingCost": "4.99",
                "internationalShippingCost": "10.00"
            },
            "preview": {
                "150": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P2.jpg",
                "210": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P4.jpg",
                "320": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P5.jpg",
                "480": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P6.jpg",
                "640": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P1.jpg",
                "960": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P7.jpg",
                "1280": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P8.jpg"
            },
            "variantSetId": 93,
            "variants": {
                "7": 1
            },
            "isLiked": false
        },

How to parse the JSON response:

import requests
import json

def get_requests():
    url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    payload={}
    headers = {}
    response = requests.request("GET", url, headers=headers, data=payload)
    return response.text

# Fetch the raw JSON text with get_requests()
x = get_requests()

data_json = json.loads(x)
# Each product dict holds both the id and the price, so one loop is enough.
for product in data_json['products']:
    print(product['id'])
    print(product['price']['priceAmount'])

Output:

215371070
22.98
256715789
8.00
202721541
5.00
202722546
5.00
274328291
24.00
221641139
10.00
245419941
30.00
192541316
8.00
147762409
14.00
158406248
9.99
234693030
20.00
213377081
10.00
228630951
10.00
203627182
16.00
159958157
7.99
151413456
27.20
250985338
8.00
185488012
15.00
154423470
20.00
193888222
10.00

The loop walks over the JSON response and prints the values of the keys "id" and "price" for each product.
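The same traversal can be checked offline against a small sample. This sketch uses only the standard library; the sample string is a shortened stand-in built from the ids and prices shown above, and `extract_rows` is a hypothetical helper name. It also shows that "products" sits at the top level of the payload, not under "meta", so a lookup like data['meta']['products'] raises KeyError.

```python
import json

# Shortened sample shaped like the response shown in this answer.
sample = '''
{
    "meta": {"resultCount": 2, "hasMore": false, "totalCount": 2},
    "products": [
        {"id": 215371070, "price": {"priceAmount": "22.98", "currencyName": "GBP"}},
        {"id": 256715789, "price": {"priceAmount": "8.00", "currencyName": "GBP"}}
    ]
}
'''


def extract_rows(data):
    # Collect (id, price) pairs from the top-level 'products' list.
    return [(p['id'], p['price']['priceAmount']) for p in data['products']]


data = json.loads(sample)
rows = extract_rows(data)
print(rows)

# 'meta' only holds paging information, so this lookup fails.
try:
    data['meta']['products']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```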

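【Comments】:

  • Ah, I see! I may need to include headers, which might solve the problem.
  • Take a look at my edited answer now. Are you still missing something? :)
  • OP asked a question about Scrapy, but this answer directs the user to requests! That's why I downvoted this answer!
  • Well, should I explain it to him the wrong way then? @Ahmed
  • He's right, I was only looking for a Scrapy answer. Alternatively, you may want to look at my other question: *.com/questions/70423056/…. Answers using requests are welcome there.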