[Posted]: 2021-03-29 08:23:14
[Question]:
From the HTML below, I want to scrape a list of all products and whether each one is "instock" or "outofstock".
<div class="js-product-sizer sizes__layout built" data-scope="conversion-zone">
<div class="sizes__wrapper sizes__wrapper--visible" data-id="1042303" data-sellableonline="true">
<span class="sizes__button sizes__button--selected" role="button" data-quantity="">Taglia</span>
<ul class="sizes__list" role="listbox">
<li class="sizes__size" data-id="969834" data-belowthreshold="false" data-quantity="true" data-available-quantity="1631" data-backinstockqualified="false" data-locale="it" data-storeprice="€1,49" data-storename="Cavallino, Lecce" data-price="1.49" data-weight="0.528" data-favstore-stock="10" data-favstore-above-threshold="false" data-favstore-cnc1h="true" aria-labelledby="size-selector-title" role="option">
<span class="sizes__info" data-tnr-size-selector-bootstrap-by-text="">0,5 KG</span>
<span class="sizes__stock">
<span class="sizes__stock__info" data-tnr-size-selector-stock-info="">Disponibile</span>
</span>
</li>
<li class="sizes__size" data-id="969842" data-belowthreshold="false" data-quantity="false" data-available-quantity="0" data-backinstockqualified="true" data-locale="it" data-storeprice="€3,49" data-storename="Cavallino, Lecce" data-price="3.49" data-weight="1.074" data-displayname="Disco ghisa bodybuilding 28mm" data-favstore-stock="0" data-favstore-above-threshold="false" data-favstore-cnc1h="false" aria-labelledby="size-selector-title" role="option">
<span class="sizes__info" data-tnr-size-selector-bootstrap-by-text="">1 KG</span>
<span class="sizes__stock">
<span class="sizes__stock__info sizes__stock__info--nostock" data-tnr-size-selector-stock-info="">0 disponibili</span>
I ran the following code:
import requests
from bs4 import BeautifulSoup
import time
r = requests.get('https://www.decathlon.it/p/disco-ghisa-bodybuilding-28mm/_/R-p-7278?mc=1042303&c=NERO')
soup = BeautifulSoup(r.text, 'html.parser')
for anchor_tag in soup.find_all(class_="js-product-sizer sizes__layout built")[0].findChildren():
    if "sizes_stock" in anchor_tag['class']:
        print(f"Size {anchor_tag.text} OOS")
    else:
        print(f"Size {anchor_tag.text} in stock!")
But it gives me the following error:
IndexError: list index out of range
[Comments]:
-
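Where did you copy this HTML from: your browser, or the response returned by requests? Note that requests cannot fetch content that is created dynamically by JavaScript. If the content is generated by JavaScript, you should use Selenium.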
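One quick way to tell the two cases apart is to check whether the class you are searching for appears anywhere in the HTML that was actually downloaded. If it does not, find_all returns an empty list and indexing it with [0] raises the IndexError above. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the names ClassFinder, has_class, static_html and js_only_html are made up for this example. In practice you would pass r.text to has_class.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the wanted CSS class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.found = True


def has_class(html, wanted):
    """Return True if some element in html has the class `wanted`."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical inputs: one page with the sizer in the markup,
# one page where a script would have to build it in the browser.
static_html = '<div class="js-product-sizer sizes__layout built"></div>'
js_only_html = '<div id="app"></div><script src="bundle.js"></script>'

print(has_class(static_html, "js-product-sizer"))
print(has_class(js_only_html, "js-product-sizer"))
```

If the check comes back False for the real response, the markup is probably being built in the browser, and no change to the parsing code will find it.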
-
Can you give me an example of Selenium code?
-
I haven't used Selenium; I used BeautifulSoup in my own project. Before reaching for Selenium, try this: change the soup.find_all(...) part of your "for anchor_tag in ..." loop like this:
soup.find_all("div", {"class": "js-product-sizer sizes__layout built"})
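Assuming the markup from the question is present in the HTML you are parsing, the stock status can also be read from the data-available-quantity attribute on each li.sizes__size element rather than from the class names of the nested spans. The following is a minimal sketch, not a tested scraper for the live site: it again swaps BeautifulSoup for the standard library's html.parser so it runs on its own, SAMPLE is an abridged copy of the snippet from the question, SizeStockParser is a hypothetical name, and treating a quantity above zero as "instock" is an assumption about what that attribute means.

```python
from html.parser import HTMLParser

# Abridged copy of the HTML from the question (most attributes dropped).
SAMPLE = """
<ul class="sizes__list" role="listbox">
<li class="sizes__size" data-id="969834" data-available-quantity="1631">
<span class="sizes__info">0,5 KG</span>
</li>
<li class="sizes__size" data-id="969842" data-available-quantity="0">
<span class="sizes__info">1 KG</span>
</li>
</ul>
"""


class SizeStockParser(HTMLParser):
    """Collects (size label, available quantity) pairs."""

    def __init__(self):
        super().__init__()
        self.sizes = []
        self._quantity = None   # quantity of the <li> currently open
        self._in_info = False   # True inside <span class="sizes__info">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "sizes__size" in classes:
            self._quantity = int(attrs.get("data-available-quantity") or 0)
        elif tag == "span" and "sizes__info" in classes:
            self._in_info = True

    def handle_data(self, data):
        # Only keep text that sits inside a size label within a size item
        if self._in_info and self._quantity is not None:
            self.sizes.append((data.strip(), self._quantity))

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_info = False
        elif tag == "li":
            self._quantity = None


parser = SizeStockParser()
parser.feed(SAMPLE)
parser.close()

for label, quantity in parser.sizes:
    status = "instock" if quantity > 0 else "outofstock"
    print(f"{label}: {status}")
```

With BeautifulSoup the same idea would be to loop over the li elements with class sizes__size and read the attribute from each one, after first checking that the search returned anything at all.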
Tags: python python-3.x web-scraping beautifulsoup python-requests