【Question Title】: Python beautifulsoup looping through span
【Posted】: 2020-12-02 09:39:16
【Question】:

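My code below returns a large result full of spans. How do I loop through each span (there are about 6; see one example below) to extract the "data-stock" value? I noticed the spans have no class, which is why I'm stuck on how to loop through them.

Many thanks!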

import requests
from bs4 import BeautifulSoup

url = "https://www.smythstoys.com/uk/en-gb/video-games-and-tablets/playstation-5/playstation-5-games/sackboy-a-big-adventure-ps5/p/191447"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
headers = {'User-Agent': user_agent,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html5lib")

gear = soup.find_all('div', class_='instoreMessage')
print(gear)

This produces:

[<div class="instoreMessage">

Available in store only.
<span data-channel="CLICK_AND_COLLECT" data-location="" data-stock="PREORDER" style="display:none">


        <table border="0" cellpadding="0" cellspacing="0" width="100%">
            <tbody>
                <tr>
                    <td class="check_i" valign="top" width="3%"><i class="fa fa-check green-check"></i></td>
                    <td>Smyths <a data-target="#price-promise-mdl" data-toggle="modal" style="cursor: pointer;">Pre-order Price Promise</a></td>
                </tr>
            </tbody>
        </table>


</span>

</div>]

【Comments】:

    Tags: python web-scraping beautifulsoup


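    【Solution 1】:

    From your gear result (it contains only one element), find all the <span> tags. Then simply iterate over that list and read the data-stock attribute of each one: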

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.smythstoys.com/uk/en-gb/video-games-and-tablets/playstation-5/playstation-5-games/sackboy-a-big-adventure-ps5/p/191447"
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14'
    headers = {'User-Agent': user_agent,
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html5lib")
    
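    # find_all returns a list; here it holds the single matching <div>
    gear = soup.find_all('div', class_='instoreMessage')
    print(gear)
    
    # Collect every <span> inside that <div> and print its data-stock attribute
    spans = gear[0].find_all('span')
    for span in spans:
        print(span['data-stock'])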
    

    Output:

    CCUNAVAILABLEONLYPREORDER
    PREORDER
    PREORDER
    PREORDER
    PREORDER
    INSTOCK
    INSTOCK
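
The same idea can be tried offline. The following is only a sketch under stated assumptions: SAMPLE_HTML is a hypothetical snippet modelled on the output in the question (not the real page), extract_stock_values is an illustrative helper name, and the built-in "html.parser" is used in place of html5lib so that nothing extra needs installing. It also filters on the attribute itself, since span['data-stock'] raises a KeyError for any span that lacks it.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page, modelled on the question's output.
SAMPLE_HTML = """
<div class="instoreMessage">
  <span data-channel="CLICK_AND_COLLECT" data-stock="PREORDER" style="display:none">a</span>
  <span data-channel="CLICK_AND_COLLECT" data-stock="INSTOCK" style="display:none">b</span>
  <span>no stock attribute here</span>
</div>
"""


def extract_stock_values(html):
    """Return the data-stock value of every <span> inside div.instoreMessage."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for div in soup.find_all("div", class_="instoreMessage"):
        # attrs={"data-stock": True} keeps only spans that carry the attribute,
        # so the subscript lookup below cannot raise KeyError.
        for span in div.find_all("span", attrs={"data-stock": True}):
            values.append(span["data-stock"])
    return values


print(extract_stock_values(SAMPLE_HTML))
```

Looping over every matching div, rather than indexing gear[0], also means the function simply returns an empty list when the page has no instoreMessage block at all.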
    

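    【Discussion】:

    • Thank you so much! You made it look simple. Sorry, I'm coming from VBA, so this is all brand new to me!
    • No worries! I remember learning all of this (and I'm still learning). It's a lot of fun! Stick with it and keep practicing... it gets easier.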