[Question title]: How can I get the Price, Title, and the Link from this element tag?
[Posted]: 2018-08-28 00:19:36
[Question]:

This is the exported element tag for a search result on Craigslist.

I'm using BeautifulSoup and Python.

How can I get the 3 items buried in the markup below?
1.) 2 BED/1 BATH CONDO UNIT WITH FANTASTIC LAYOUT
2.) https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html
3.) $1400

<li class="result-row" data-pid="6682563732" data-repost-of="6672289062">
<a class="result-image gallery" data-ids="1:00O0O_9WrVUmeuy5e,1:01010_8pEn0AcGEYo,1:00707_77eVL7Ade68,1:00e0e_dxzhja1yrDa,1:00k0k_2F9337g40vD,1:00O0O_eaVv31Gd4yw,1:00Z0Z_jccPvNndfg7,1:00505_eicoHPPOUcN,1:00U0U_7ligL02j3Mr,1:00u0u_3RnaxaZyl81,1:00S0S_ld5VSNTlzAJ,1:00U0U_dU2swTovLEJ,1:00O0O_d7PeITBKlmL,1:00x0x_3kcVk30PK30,1:00i0i_xhm6pORJuQ,1:00m0m_kIMUZ1PgCHb,1:00D0D_aTGfSGlJ1Ru,1:00v0v_8M4NXLErFqM,1:00c0c_jiTQuztqh9J,1:00Z0Z_3rlFLU7MCbq" href="https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html">
<span class="result-price">$1400</span>
</a>
<p class="result-info">
<span class="icon icon-star" role="button">
<span class="screen-reader-text">favorite this post</span>
</span>
<time class="result-date" datetime="2018-08-27 16:54" title="Mon 27 Aug 04:54:01 PM">Aug 27</time>
<a class="result-title hdrlnk" data-id="6682563732" href="https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html">2 BED/1 BATH CONDO UNIT WITH FANTASTIC LAYOUT</a>
<span class="result-meta">
<span class="result-price">$1400</span>
<span class="housing">
                    2br -
                    1430ft<sup>2</sup> -
                </span>
<span class="result-hood"> (RICHMOND)</span>
<span class="result-tags">
                    pic
                    <span class="maptag" data-pid="6682563732">map</span>
</span>
<span class="banish icon icon-trash" role="button">
<span class="screen-reader-text">hide this posting</span>
</span>
<span aria-hidden="true" class="unbanish icon icon-trash red" role="button"></span>
<a class="restore-link" href="#">
<span class="restore-narrow-text">restore</span>
<span class="restore-wide-text">restore this posting</span>
</a>
</span>
</p>
</li>

[Question discussion]:

    Tags: python beautifulsoup craigslist


    [Solution 1]:

    You can find the elements by their class names:

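    from bs4 import BeautifulSoup as soup

    # content: the <li class="result-row"> HTML from the question, as a string
    d = soup(content, 'html.parser')
    # Exactly one result-title link here: unpack its text and href
    [titles] = [[i.text, i['href']] for i in d.find_all('a', {'class': 'result-title'})]
    # Two result-price spans (image link and meta line); keep the second
    _, price = [i.text for i in d.find_all('span', {'class': 'result-price'})]
    print([*titles, price])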
    

    Output:

    ['2 BED/1 BATH CONDO UNIT WITH FANTASTIC LAYOUT', 'https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html', '$1400']
    
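The answer relies on unpacking: [titles] = ... assumes exactly one result-title link, and _, price = ... assumes exactly two result-price spans. That holds for the single row in the question, but on a full search page with many rows it would raise a ValueError. A minimal sketch under that assumption, walking each li.result-row with CSS selectors (select / select_one) instead. The html string is a trimmed copy of the question's markup, and parse_rows is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <li class="result-row"> markup from the question.
# A real search page would contain many of these rows.
html = """
<li class="result-row" data-pid="6682563732">
  <a class="result-image gallery" href="https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html">
    <span class="result-price">$1400</span>
  </a>
  <p class="result-info">
    <a class="result-title hdrlnk" data-id="6682563732" href="https://vancouver.craigslist.ca/rch/apa/d/2-bed-1-bath-condo-unit-with/6682563732.html">2 BED/1 BATH CONDO UNIT WITH FANTASTIC LAYOUT</a>
    <span class="result-meta">
      <span class="result-price">$1400</span>
    </span>
  </p>
</li>
"""


def parse_rows(markup):
    """Return a (title, link, price) tuple for every result row (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for row in soup.select("li.result-row"):
        title_link = row.select_one("a.result-title")
        if title_link is None:
            continue  # skip rows that have no title link
        # Read the price from the meta line; it may be missing on some listings.
        price_tag = row.select_one("span.result-meta span.result-price")
        rows.append((
            title_link.get_text(strip=True),
            title_link["href"],
            price_tag.get_text(strip=True) if price_tag else None,
        ))
    return rows


for title, link, price in parse_rows(html):
    print(title, link, price)
```

Scoping each lookup to its own row keeps the title, link, and price of one listing together, and a missing price yields None instead of breaking the whole parse.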

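    [Discussion]:

    • Why did you use {'class':re.compile('result\-price')} instead of simply {'class':'result-price'}? Is there a reason I'm missing? (Just curious)
    • @newbie At first I thought the OP also wanted the 1430ft, which is also inside a span, so re.compile('result\-price|housing') would be shorter than soup.find('span', {'class':'result-price'}) plus soup.find('span', {'class':'housing'}). After rereading the question I left it in, in case the specific values are needed later.
    • Wow! Short answer and quick reply, thanks! This is why I don't like Python: there are so many libraries and shortcuts to remember. If I'm new to Python, how do I learn all of this?
    • Do you have a longer version of this? What if we'd rather not use the 're' approach?
    • @Ajax1234 I still don't get any data from d.find_all('a', {'class': 'result-tile'}); the result is an empty [].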