【Question title】: Web crawler: Beautiful Soup
【Posted】: 2021-09-10 07:39:37
【Question】:

How do I filter for rows that contain class="high" and then print the event title, "Average Hourly Earnings m/m"?

  <tr class="calendar__row calendar_row calendar__row--grey calendar__row--no-grid nogrid" data-eventid="117390" data-ecobaseid="159" data-touchable="">

    <td class="calendar__cell calendar__impact impact calendar__impact calendar__impact--high">
        <div class="calendar__impact-icon calendar__impact-icon--screen">
            <span title="High Impact Expected" class="high"></span>
        </div>
        <div class="calendar__impact-icon calendar__impact-icon--print">
            <img src="https://resources.faireconomy.media/images/sprites/mm-impact-red.png" alt="" width="14" height="12">
        </div>
    </td>

    <td class="calendar__cell calendar__currency currency calendar__currency--right-of-impact" title="United States">
            US
    </td>   

    <td class="calendar__cell calendar__event event">
        <div>
            <span class="calendar__event-title">Average Hourly Earnings m/m</span>
        </div>
    </td>

  </tr>

【Comments】:

    Tags: python html beautifulsoup web-crawler


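    【Solution 1】:

    If I understand correctly, you want to find the calendar rows that contain an element with class="high" and then print the event for each of them: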

    from bs4 import BeautifulSoup
    
    html_doc = """
      <tr class="calendar__row calendar_row calendar__row--grey calendar__row--no-grid nogrid" data-eventid="117390" data-ecobaseid="159" data-touchable="">
    
        <td class="calendar__cell calendar__impact impact calendar__impact calendar__impact--high">
            <div class="calendar__impact-icon calendar__impact-icon--screen">
                <span title="High Impact Expected" class="high"></span>
            </div>
            <div class="calendar__impact-icon calendar__impact-icon--print">
                <img src="https://resources.faireconomy.media/images/sprites/mm-impact-red.png" alt="" width="14" height="12">
            </div>
        </td>
    
        <td class="calendar__cell calendar__currency currency calendar__currency--right-of-impact" title="United States">
                US
        </td>   
    
        <td class="calendar__cell calendar__event event">
            <div>
                <span class="calendar__event-title">Average Hourly Earnings m/m</span>
            </div>
        </td>
    
      </tr>
    """
    
    soup = BeautifulSoup(html_doc, "html.parser")
    
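    # Walk every calendar row and skip those without a class="high" element.
    for calendar_row in soup.select("tr.calendar_row"):
        if not calendar_row.find(class_="high"):
            continue
    
        # The event cell carries the class "event"; print its stripped text.
        event = calendar_row.find(class_="event")
        print(event.get_text(strip=True))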
    

    Prints:

    Average Hourly Earnings m/m
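
    On a page with many rows, it may be handy to collect the titles instead of printing them, and to skip rows that lack an event cell. The following is only a sketch of that idea: the helper name high_impact_events and the two-row sample are assumptions made for illustration (the second, low-impact row does not come from the question).

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample, trimmed from the markup in the question.
# The second (low-impact) row is invented for illustration only.
SAMPLE_HTML = """
<table>
  <tr class="calendar__row calendar_row">
    <td class="calendar__cell calendar__impact impact"><span class="high"></span></td>
    <td class="calendar__cell calendar__event event">
      <span class="calendar__event-title">Average Hourly Earnings m/m</span>
    </td>
  </tr>
  <tr class="calendar__row calendar_row">
    <td class="calendar__cell calendar__impact impact"><span class="low"></span></td>
    <td class="calendar__cell calendar__event event">
      <span class="calendar__event-title">Example Low Impact Event</span>
    </td>
  </tr>
</table>
"""


def high_impact_events(html):
    """Return the event titles of all calendar rows containing class="high"."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for row in soup.select("tr.calendar_row"):
        if row.find(class_="high") is None:
            continue  # not a high-impact row
        event = row.find(class_="event")
        if event is None:
            continue  # no event cell; skip rather than raise AttributeError
        titles.append(event.get_text(strip=True))
    return titles


print(high_impact_events(SAMPLE_HTML))
```

    Returning a list keeps the filtering separate from whatever is done with the titles afterwards (printing, saving, comparing).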
    

    Alternatively, using only a CSS selector:

    event = soup.select_one(".calendar_row:has(.high) .event")
    print(event.get_text(strip=True))
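
    Note that select_one returns only the first match, and None when nothing matches. As a rough sketch, the same selector can be passed to select to gather every high-impact event. The three-row sample and the event names in it are made up for illustration, and the :has() pseudo-class assumes a Beautiful Soup version whose select is backed by Soup Sieve (4.7.0 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical sample with made-up event names: two high-impact rows, one low.
html_rows = """
<table>
  <tr class="calendar_row"><td><span class="high"></span></td><td class="event">Event A</td></tr>
  <tr class="calendar_row"><td><span class="low"></span></td><td class="event">Event B</td></tr>
  <tr class="calendar_row"><td><span class="high"></span></td><td class="event">Event C</td></tr>
</table>
"""

soup = BeautifulSoup(html_rows, "html.parser")

# select() returns a list of every matching element (empty if none match).
selector = ".calendar_row:has(.high) .event"
titles = [cell.get_text(strip=True) for cell in soup.select(selector)]
print(titles)

# select_one() returns None when nothing matches, so guard before using it.
missing = soup.select_one(".calendar_row:has(.medium) .event")
if missing is None:
    print("no medium-impact rows found")
```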
    

    【Discussion】:
