【Question title】: Web crawler with Beautiful Soup
【Posted】: 2021-09-10 07:39:37
【Question】:
How can I filter for rows where class="high" and then print the event title, "Average Hourly Earnings m/m"?
<tr class="calendar__row calendar_row calendar__row--grey calendar__row--no-grid nogrid" data-eventid="117390" data-ecobaseid="159" data-touchable="">
<td class="calendar__cell calendar__impact impact calendar__impact calendar__impact--high">
<div class="calendar__impact-icon calendar__impact-icon--screen">
<span title="High Impact Expected" class="high"></span>
</div>
<div class="calendar__impact-icon calendar__impact-icon--print">
<img src="https://resources.faireconomy.media/images/sprites/mm-impact-red.png" alt="" width="14" height="12">
</div>
</td>
<td class="calendar__cell calendar__currency currency calendar__currency--right-of-impact" title="United States">
US
</td>
<td class="calendar__cell calendar__event event">
<div>
<span class="calendar__event-title">Average Hourly Earnings m/m</span>
</div>
</td>
【Comments】:
Tags:
python
html
beautifulsoup
web-crawler
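【Solution 1】:
If I understand correctly, you want to find the calendar rows that contain an element with class="high" and then print the event for each such row: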
from bs4 import BeautifulSoup
html_doc = """
<tr class="calendar__row calendar_row calendar__row--grey calendar__row--no-grid nogrid" data-eventid="117390" data-ecobaseid="159" data-touchable="">
<td class="calendar__cell calendar__impact impact calendar__impact calendar__impact--high">
<div class="calendar__impact-icon calendar__impact-icon--screen">
<span title="High Impact Expected" class="high"></span>
</div>
<div class="calendar__impact-icon calendar__impact-icon--print">
<img src="https://resources.faireconomy.media/images/sprites/mm-impact-red.png" alt="" width="14" height="12">
</div>
</td>
<td class="calendar__cell calendar__currency currency calendar__currency--right-of-impact" title="United States">
US
</td>
<td class="calendar__cell calendar__event event">
<div>
<span class="calendar__event-title">Average Hourly Earnings m/m</span>
</div>
</td>
</tr>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for calendar_row in soup.select("tr.calendar_row"):
    # Skip rows that have no descendant with class="high"
    if not calendar_row.find(class_="high"):
        continue
    # The event cell carries the class "event"
    event = calendar_row.find(class_="event")
    print(event.get_text(strip=True))
Prints:
Average Hourly Earnings m/m
Alternatively, using only a CSS selector:
event = soup.select_one(".calendar_row:has(.high) .event")
print(event.get_text(strip=True))
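Note that select_one returns only the first match, or None when nothing matches. If the real page has several rows, a minimal sketch along the following lines may be more robust. It assumes the same class names as in the question; the helper name high_impact_titles, the second row with class="low", and its event title are hypothetical and exist only to show the filtering. The :has() selector relies on the Soup Sieve package that ships with recent Beautiful Soup releases.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two rows, only the first one contains class="high".
# The "low" class and the second event title are made up for illustration.
SAMPLE_HTML = """
<table>
<tr class="calendar__row calendar_row">
  <td class="calendar__cell calendar__impact impact"><span class="high"></span></td>
  <td class="calendar__cell calendar__event event">
    <span class="calendar__event-title">Average Hourly Earnings m/m</span>
  </td>
</tr>
<tr class="calendar__row calendar_row">
  <td class="calendar__cell calendar__impact impact"><span class="low"></span></td>
  <td class="calendar__cell calendar__event event">
    <span class="calendar__event-title">Some Low Impact Event</span>
  </td>
</tr>
</table>
"""


def high_impact_titles(html):
    """Return the event titles of all rows that contain a class="high" element."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # :has(.high) keeps only rows with a descendant whose class is "high"
    for row in soup.select("tr.calendar_row:has(.high)"):
        title = row.select_one(".calendar__event-title")
        # Guard against rows that lack an event title element
        if title is not None:
            titles.append(title.get_text(strip=True))
    return titles


print(high_impact_titles(SAMPLE_HTML))
```

Returning a list instead of printing inside the loop makes it easy to check for an empty result before using the titles.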