【Question title】: Find a Specific HTML Tag in BeautifulSoup
【Posted】: 2020-10-05 06:58:16
【Question】:

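I have spent several hours trying to find the right combination of soup.select_one or find_next to locate the Zestimate tag below. Can you help me find the right BeautifulSoup code?

Here is the URL: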

https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/

I am trying to return: $486,997

<div id="home-details-home-values">
   <h2>Home Value</h2>
   <div class="zestimate-summary">
      <div class="zsg-content-component zestimate-above-toggle">
         <div class="primary-zestimate-item">
            <div>
               <div class="title zsg-h3 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate</span></span></div>
               <div class="content">
                  <div class="zestimate-value">$486,997</div>
               </div>
            </div>
            <div class="left-spacer"></div>
            <div class="right-spacer"></div>
            <div class="zillow-offers-upsell-wrapper">
               <div class="sc-kgoBCf pnJxW">
                  <div class="zsg-h3 zsg-content_collapsed">Zillow Offer</div>
                  <a href="/offers/?t=omhdp-zestimate&amp;zpid=251036192">Get your Zillow Offer</a>
               </div>
            </div>
         </div>
         <div class="secondary-zestimate-items">
            <div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
               <span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPlplc3RpbWF0ZV9SYW5nZTwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTIzLjgwNCAxMy41MDF2MTAuNTExYzAgLjY0OC0uMzI1IDEuNTEyLTEuNTEzIDEuNTEyaC01Ljk0VjE0Ljc2MmgtNS45NHYxMC43NjJINC40N2MtMS4xODggMC0xLjUxMi0uODY0LTEuNTEyLTEuNTEydi0xMC41MUguNThjLS44NjQgMC0uNjQ4LS40MzMtLjEwOC0xLjA4TDEyLjM1NC40MzFjLjMyNC0uMzI0LjY0OS0uNDMyIDEuMDgtLjQzMi40MzMgMCAuNzU3LjIxNiAxLjA4LjQzMmwxMS44ODIgMTEuOTljLjY0OC42NDcuODY0IDEuMDgtLjEwOCAxLjA4aC0yLjQ4NHoiIGlkPSJiIi8+PG1hc2sgaWQ9ImQiIHg9IjAiIHk9IjAiIHdpZHRoPSIyNi45NSIgaGVpZ2h0PSIyNS41MjQiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxNSAxNSkiLz48L2c+PC9zdmc+" role="presentation"></span>
               <div class="secondary-wrapper">
                  <div class="title zsg-h4 zsg-content_collapsed"><span tabindex="0" role="button"><span class="ds-dashed-underline">Zestimate Range</span></span></div>
                  <div class="content">$463,000 - $511,000</div>
               </div>
            </div>
            <div class="zsg-lg-1-3 zsg-md-1-1 secondary-row">
               <span class="zestimate-icon"><img src="data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIHZpZXdCb3g9IjAgMCA1NiA1NiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+PHRpdGxlPjMwX0RheXNfRG93bjwvdGl0bGU+PGRlZnM+PGVsbGlwc2UgaWQ9ImEiIGN4PSIyOCIgY3k9IjI4IiByeD0iMjgiIHJ5PSIyOCIvPjxtYXNrIGlkPSJjIiB4PSIwIiB5PSIwIiB3aWR0aD0iNTYiIGhlaWdodD0iNTYiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2EiLz48L21hc2s+PHBhdGggZD0iTTI4LjcwNiAxMy43NjVMMTYuNDcgMS41MjlDMTYgMS4wNiAxNS40MS44MjQgMTQuNzA2LjgyNGMtLjcwNiAwLTEuMjk0LjIzNS0xLjY0Ny43MDVMLjcwNiAxMy43NjVjLS40Ny40Ny0uNzA2IDEuMDU5LS43MDYgMS43NjQgMCAuNzA2LjIzNSAxLjE3Ny43MDYgMS42NDdsMS40MTIgMS40MTJjLjQ3LjQ3IDEuMDU4LjcwNiAxLjY0Ny43MDYuNzA2IDAgMS4yOTQtLjIzNSAxLjY0Ny0uNzA2bDUuNTMtNS41M3YxMy4yOTVjMCAuNzA2LjIzNCAxLjE3Ni43MDUgMS42NDdhMi44OSAyLjg5IDAgMCAwIDEuNzY1LjU4OGgyLjQ3QTIuODkgMi44OSAwIDAgMCAxNy42NDcgMjhjLjQ3LS4zNTMuNzA2LS45NDEuNzA2LTEuNjQ3VjEzLjA1OWw1LjUzIDUuNTNjLjQ3LjQ3IDEuMDU4LjcwNSAxLjY0Ni43MDUuNzA2IDAgMS4yOTUtLjIzNSAxLjc2NS0uNzA2bDEuNDEyLTEuNDEyYy40Ny0uNDcuNzA2LTEuMDU4LjcwNi0xLjY0NyAwLS43MDUtLjIzNi0xLjI5NC0uNzA2LTEuNzY0eiIgaWQ9ImIiLz48bWFzayBpZD0iZCIgeD0iMCIgeT0iMCIgd2lkdGg9IjI5LjQxMiIgaGVpZ2h0PSIyNy43NjUiIGZpbGw9IiNmZmYiPjx1c2UgeGxpbms6aHJlZj0iI2IiLz48L21hc2s+PC9kZWZzPjxnIHN0cm9rZT0iIzAwNzRFNCIgc3Ryb2tlLXdpZHRoPSIyIiBmaWxsPSIjRkZGIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjx1c2UgbWFzaz0idXJsKCNjKSIgeGxpbms6aHJlZj0iI2EiLz48dXNlIG1hc2s9InVybCgjZCkiIHhsaW5rOmhyZWY9IiNiIiB0cmFuc2Zvcm09Im1hdHJpeCgxIDAgMCAtMSAxMyA0MykiLz48L2c+PC9zdmc+" role="presentation"></span>
               <div class="secondary-wrapper">
                  <div class="title zsg-h4 zsg-content_collapsed">Last 30 Day Change</div>
                  <div class="content">-$2,830 <span class="percent-decrease">(-0.6 %)</span></div>
               </div>
            </div>
         </div>
      </div>
      <div class="toggle-section">
         <div class="zsg-content-component module-separator hide">
            <div class="additional-zestimate-info zsg-wrapper-body-hidden"></div>
         </div>
         <div class="zsg-content-item"><a class="toggle zsg-lg-1-1 zsg-centered">Zestimate history &amp; details <span class="zsg-icon-expando-down"></span></a></div>
      </div>
   </div>
</div>

This is the code I am using:

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    r = s.get(link, headers=req_headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    #     soup = BeautifulSoup(requests.get(url, headers=req_headers).content, 'html.parser')
    results = soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    print(results)

【Comments】:

    Tags: python html beautifulsoup html-parsing


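    【Solution 1】:

    Based on my earlier answer: Zillow seems to serve more than one type of page to users. First check that you are not getting a CAPTCHA page. If you are not, use this script: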

    import requests
    from bs4 import BeautifulSoup
    
    
    url = 'https://www.zillow.com/homedetails/8612-Silverthorne-St-Austin-TX-78744/251036192_zpid/'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    
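    # Layout A: an <h4> "Home value" heading followed by a <p> holding the price.
    # Note: newer soupsieve releases spell :contains() as :-soup-contains().
    home_value = soup.select_one('h4:contains("Home value")')
    if not home_value:
        # Layout B: the price is the last whitespace-separated token of .zestimate
        home_value = soup.select_one('.zestimate').text.split()[-1]
    else:
        home_value = home_value.find_next('p').get_text(strip=True)
    
    print(home_value)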
    

    Prints:

    $486,997
    

    For url = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/', it prints:

    $324,493
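
If the server returns the markup exactly as posted in the question, neither selector in the script above matches it, since the price there sits in a div with class zestimate-value. A minimal sketch of a third fallback follows, assuming that class name is still present on the page. The names CANDIDATE_SELECTORS, extract_zestimate and SAMPLE_HTML are made up for this illustration, and SAMPLE_HTML is a trimmed copy of the question's snippet so the example runs without a network request:

```python
from bs4 import BeautifulSoup

# Selectors tried in order. Both target the class used in the question's markup;
# the first is scoped to the "home values" container, the second is looser.
CANDIDATE_SELECTORS = (
    "#home-details-home-values .zestimate-value",
    ".zestimate-value",
)


def extract_zestimate(html):
    """Return the text of the first matching tag, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in CANDIDATE_SELECTORS:
        tag = soup.select_one(selector)
        if tag is not None:
            return tag.get_text(strip=True)
    return None


# Trimmed copy of the markup posted in the question (hypothetical test input).
SAMPLE_HTML = """
<div id="home-details-home-values">
  <h2>Home Value</h2>
  <div class="zestimate-summary">
    <div class="primary-zestimate-item">
      <div class="content">
        <div class="zestimate-value">$486,997</div>
      </div>
    </div>
  </div>
</div>
"""

print(extract_zestimate(SAMPLE_HTML))  # prints: $486,997
```

Returning None instead of raising lets the caller decide what to do when the page layout is one that none of the selectors cover.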
    

    This may need more testing.
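
The CAPTCHA check mentioned at the top of this answer could be sketched roughly as below. This is only a heuristic under a loud assumption: that the challenge page contains the word "captcha" somewhere in its HTML. I have not confirmed what Zillow's challenge page actually looks like, and a normal listing page that happens to mention the word would give a false positive. The function name looks_like_captcha and the sample strings are hypothetical:

```python
def looks_like_captcha(page):
    """Heuristic: report True if the response body mentions 'captcha'.

    Assumption (not verified against the real site): a challenge page
    includes the word "captcha" somewhere in its HTML.
    """
    # requests exposes the body as bytes via r.content, so accept both types.
    if isinstance(page, bytes):
        page = page.decode("utf-8", errors="ignore")
    return "captcha" in page.lower()


# Hypothetical bodies, for illustration only.
print(looks_like_captcha('<div class="captcha-container"></div>'))         # True
print(looks_like_captcha(b'<div class="zestimate-value">$486,997</div>'))  # False
```

In the loop from the question, one could call this on r.content before parsing and skip or retry the link when it returns True.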

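    【Discussion】:

    • I was actually going to reach out to you, since you came up with the answer to my original question a few days ago! Trying this now. Thanks again. I know I'm being annoying.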