【Question title】: Extract content from a <script> tag when scraping with BS4
【Posted】: 2025-12-20 08:30:12
【Question】:

I am trying to extract information from a "script" tag with the following code:

    response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
 
    marca = soup.find("h3", {"class":"OEhtt9 ka2E9k uMhVZi uc9Eq5 pVrzNP _5Yd-hZ"}).text
    nombre = soup.find("h1", {"class":"OEhtt9 ka2E9k uMhVZi z-oVg8 pVrzNP w5w9i_ _1PY7tW _9YcI4f"}).text
    color = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi dgII7d z-oVg8 pVrzNP"}).text
    precio = soup.find("span", {"class":"uqkIZw ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
    talla = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
    imagen = soup.find("img", {"class": "_6uf91T z-oVg8 u-6V88 ka2E9k uMhVZi FxZV-M _2Pvyxl JT3_zV EKabf7 mo6ZnF _1RurXL mo6ZnF PZ5eVw"})['src']


    sku355 = api + str(soup.find_all('script')[15]).split('sku":"')[3][:-137]
    sku36 = api + str(soup.find_all('script')[15]).split('sku":"')[4][:-139]
    sku365 = api + str(soup.find_all('script')[15]).split('sku":"')[5][:-139]
    sku375 = api + str(soup.find_all('script')[15]).split('sku":"')[6][:-137]
    sku38 =  api + str(soup.find_all('script')[15]).split('sku":"')[7][:-139]
    sku385 = api + str(soup.find_all('script')[15]).split('sku":"')[8][:-137]
    sku39 = api + str(soup.find_all('script')[15]).split('sku":"')[9][:-137]
    sku40 = api + str(soup.find_all('script')[15]).split('sku":"')[10][:-139]
    sku405 = api + str(soup.find_all('script')[15]).split('sku":"')[11][:-137]
    sku41 = api + str(soup.find_all('script')[15]).split('sku":"')[12][:-137]
    sku42 = api + str(soup.find_all('script')[15]).split('sku":"')[13][:-139]
    sku425 = api + str(soup.find_all('script')[15]).split('sku":"')[14][:-137]
    sku43 = api + str(soup.find_all('script')[15]).split('sku":"')[15][:-125]

    print (sku355)
    print (sku36)
    print (sku365)
    print (sku375)
    print (sku38)
    print (sku385)
    print (sku39)
    print (sku40)
    print (sku405)
    print (sku41)
    print (sku42)
    print (sku425)
    print (sku43)

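Everything works perfectly for this shoe, but when I switch to the link below, it returns something else. What I want to extract is the SKU for each size, regardless of the link.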

https://www.zalando.es/nike-sportswear-air-force-1-gtx-unisex-zapatillas-anthraciteblackbarely-grey-ni115o01u-q11.html
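
For illustration, here is a minimal sketch of why the fixed slices such as `[:-137]` break on other pages, and one way to avoid them: match each value up to its closing quote instead of cutting a fixed number of characters. It assumes the script text contains the values in the form `"sku":"..."` with no whitespace, as the `split('sku":"')` calls above imply. `find_skus` and the sample string are hypothetical names and data, not part of the site.

```python
import re

# Assumption: values appear as "sku":"<value>", the same form the
# split('sku":"') calls in the question rely on.
SKU_PATTERN = re.compile(r'"sku":"([^"]+)"')


def find_skus(script_text):
    """Return every distinct "sku" value in the text, in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(SKU_PATTERN.findall(script_text)))


# Hypothetical sample; the real script content is much longer.
sample = '{"units":[{"sku":"AAA-1","size":"36"},{"sku":"BBB-22","size":"36.5"},{"sku":"AAA-1"}]}'
print(find_skus(sample))
```

On a real page this could be applied to `str(script)` for each tag in `soup.find_all('script')`, so the position of the tag no longer matters. It returns every `sku` value it finds, which may include entries that are not size variants, so some filtering would probably still be needed.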

【Comments】:

Tags: python beautifulsoup screen-scraping


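【Solution 1】:

I can't reproduce your example; it would help if you could improve your question.

Just in case:

If you only want the sizes (with their SKUs), try the following: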

import json

import requests
from bs4 import BeautifulSoup

headers = {"user-agent": "Mozilla/5.0"}
response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)

soup = BeautifulSoup(response.content, 'lxml')

# The product data is JSON wrapped in a CDATA section inside this script tag.
# The two splits strip the CDATA markers; the result parses as a list, hence json_object[0] below.
json_object = json.loads(soup.select_one('script#z-vegas-pdp-props').contents[0].split('CDATA')[1].split(']>')[0])

# Each unit carries the SKU ('id') and the local size of one variant.
for item in json_object[0]['model']['articleInfo']['units']:
    print('sku:{0} - size:{1}'.format(item['id'],item['size']['local']))

Output

sku:JOC11A024-G110005000 - size:35.5
sku:JOC11A024-G110055000 - size:36
sku:JOC11A024-G110006000 - size:36.5
sku:JOC11A024-G110065000 - size:37.5
sku:JOC11A024-G110007000 - size:38
sku:JOC11A024-G110075000 - size:38.5
sku:JOC11A024-G110008000 - size:39
sku:JOC11A024-G110085000 - size:40
sku:JOC11A024-G110009000 - size:40.5
sku:JOC11A024-G110095000 - size:41
sku:JOC11A024-G110010000 - size:42
sku:JOC11A024-G110105000 - size:42.5
sku:JOC11A024-G110011000 - size:43
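
If a lookup from size to SKU is more convenient than printed lines, the same idea can be sketched as a small function. This is only a sketch under assumptions: it expects the script text to be a JSON object, optionally wrapped in `<![CDATA[ ... ]]>`, with the `model` / `articleInfo` / `units` layout used above. `size_to_sku` and the trimmed-down `sample` are hypothetical, and the real page may change its structure at any time.

```python
import json


def size_to_sku(script_text):
    """Map each local size to its SKU, given the text of the props <script> tag."""
    raw = script_text.strip()
    # Remove the CDATA wrapper if present so only the JSON text remains.
    prefix, suffix = "<![CDATA[", "]]>"
    if raw.startswith(prefix) and raw.endswith(suffix):
        raw = raw[len(prefix):-len(suffix)]
    data = json.loads(raw)
    units = data["model"]["articleInfo"]["units"]
    return {unit["size"]["local"]: unit["id"] for unit in units}


# Hypothetical, trimmed-down sample shaped like the structure used in the answer.
sample = (
    '<![CDATA[{"model": {"articleInfo": {"units": ['
    '{"id": "JOC11A024-G110005000", "size": {"local": "35.5"}},'
    '{"id": "JOC11A024-G110055000", "size": {"local": "36"}}'
    ']}}}]]>'
)

for size, sku in size_to_sku(sample).items():
    print(f"size {size}: {sku}")
```

With a real response, the input would be the text of the tag, for example `soup.select_one('script#z-vegas-pdp-props').string`. Because the function looks values up by key rather than by character offset, it does not depend on how many sizes a product has.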

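【Comments】:

  • How do I print that output? Can I change it, for example to the SKU for each size? Thanks!!
  • The expected output isn't clearly defined in your question, but see the changes in my example, which show how to print the id/SKU.
  • Thank you very much. Could you help me with a new question? *.com/questions/65764426/…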