Python beautifulsoup looping through span

【Question title】: Python beautifulsoup looping through span
【Posted】: 2020-12-02 09:39:16
【Question description】:

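My code below returns a large block of span results. How can I loop through each span (there are about six; one example is shown below) to extract "data-stock"? I noticed the spans have no class, which is why I'm stuck on how to loop over them.

Many thanks!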

import requests
from bs4 import BeautifulSoup
url = "https://www.smythstoys.com/uk/en-gb/video-games-and-tablets/playstation-5/playstation-5-games/sackboy-a-big-adventure-ps5/p/191447"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")

gear = soup.find_all('div', class_='instoreMessage')
print(gear)

This produces:

[

Only available in store.

<span data-channel="CLICK_AND_COLLECT" data-location="" data-stock="PREORDER" style="display:none">
    
    
        <table border="0" cellpadding="0" cellspacing="0" width="100%">
            <tbody>
                <tr>
                    <td class="check_i" valign="top" width="3%"><i class="fa fa-check green-check"></i></td>
                    <td>Smyths <a data-target="#price-promise-mdl" data-toggle="modal" style="cursor: pointer;">Pre-order Price Promise</a></td>
                </tr>
            </tbody>
        </table>
    
    
</span>

</div>]

【Question discussion】:

Tags: python web-scraping beautifulsoup

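【Solution 1】:

find_all returns a list, and here gear contains only one element. Take that element, gear[0], and call find_all('span') on it to collect every <span> tag inside it. Then iterate over that list and read each span's data-stock attribute: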

import requests
from bs4 import BeautifulSoup

url = "https://www.smythstoys.com/uk/en-gb/video-games-and-tablets/playstation-5/playstation-5-games/sackboy-a-big-adventure-ps5/p/191447"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")

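# find_all returns a list of every <div class="instoreMessage"> on the page
gear = soup.find_all('div', class_='instoreMessage')
print(gear)

# The spans have no class, so select them by tag name within the first div
spans = gear[0].find_all('span')
for span in spans:
    # Tag attributes are read with dictionary-style indexing
    print(span['data-stock'])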

Output:

CCUNAVAILABLEONLYPREORDER
PREORDER
PREORDER
PREORDER
PREORDER
INSTOCK
INSTOCK
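
The loop above indexes span['data-stock'] directly, which raises KeyError if any span lacks that attribute, and gear[0] raises IndexError if the div is not found. The following is a minimal sketch of a more defensive variant, not the answerer's code. It runs offline against a hand-written HTML fragment modeled on the example span in the question, so SAMPLE_HTML, the HOME_DELIVERY channel value, and the extract_stock helper are all assumptions made for illustration. It also uses Python's built-in "html.parser" instead of html5lib so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page markup (an assumption for illustration;
# the live page has more spans and may have changed since the question).
SAMPLE_HTML = """
<div class="instoreMessage">
  <span data-channel="CLICK_AND_COLLECT" data-location="" data-stock="PREORDER" style="display:none">Pre-order</span>
  <span data-channel="HOME_DELIVERY" data-stock="INSTOCK">In stock</span>
  <span>this span has no data-stock attribute</span>
</div>
"""


def extract_stock(html):
    """Return the data-stock value of every span inside div.instoreMessage."""
    # "html.parser" ships with Python, so html5lib is not required here.
    soup = BeautifulSoup(html, "html.parser")
    values = []
    # Looping over the divs (instead of using gear[0]) avoids IndexError
    # when no matching div exists.
    for div in soup.find_all("div", class_="instoreMessage"):
        # attrs={"data-stock": True} keeps only spans that carry the
        # attribute, so the indexing below cannot raise KeyError.
        for span in div.find_all("span", attrs={"data-stock": True}):
            values.append(span["data-stock"])
    return values


stock_values = extract_stock(SAMPLE_HTML)
print(stock_values)
```

On this sample the third span is skipped because it has no data-stock attribute. If a CSS selector reads more naturally to you, soup.select("div.instoreMessage span[data-stock]") should match the same spans.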

【Discussion】:

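Thank you so much! You make it look so easy. Sorry, I come from VBA, so this is all brand new to me!
No worries! I remember learning all this stuff (and I'm still learning). It's fun! Stick with it and keep practicing... it gets easier.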