BeautifulSoup: Reading Span Class Elements

【Question Title】: BeautifulSoup: Reading Span Class Elements
【Posted】: 2020-08-28 22:27:02
【Question】:

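I'm having trouble scraping the text of a span element with a specific class from a particular page, using the BeautifulSoup and requests modules in Python. It keeps returning blank information: "". Here is my code: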

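import bs4
import requests

def get_weather():
    headers = {'User-Agent': 'Mozilla/5.0'}
    res = requests.get('https://www.theweathernetwork.com/ca/weather/ontario/toronto',
                       headers=headers)  # actually send the custom User-Agent
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # find() returns the <span class="wxcondition"> tag, or None if it is absent
    weather_elem = soup.find('span', {'class': 'wxcondition'})
    weather = weather_elem.get_text(strip=True) if weather_elem else ''
    print(weather)  # reported to print an empty string
    return weather

get_weather()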
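A minimal offline sketch of the likely symptom, assuming the server sends the span as an empty placeholder that JavaScript fills in later in the browser. The HTML string below is hypothetical, not the site's real markup. Because BeautifulSoup only parses markup and never runs scripts, find() locates the tag but its text stays empty:

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the span exists but is empty; in a real
# browser the <script> would fill it in, but BeautifulSoup never executes it.
STATIC_HTML = '''
<html><body>
  <span class="wxcondition"></span>
  <script>document.querySelector('.wxcondition').textContent = 'Partly cloudy';</script>
</body></html>
'''

soup = BeautifulSoup(STATIC_HTML, 'html.parser')
elem = soup.find('span', {'class': 'wxcondition'})

print(elem)                   # the tag itself is found
print(repr(elem.get_text()))  # but its text is empty: ''
```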

【Question Comments】:

Tags: python json web-scraping beautifulsoup python-requests

【Answer 1】:

The data is loaded via JavaScript, so BeautifulSoup never sees it. However, you can simulate the Ajax request with the requests module:

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.theweathernetwork.com/ca/weather/ontario/toronto'
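soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Take the part of the <link rel="alternate"> href after the last '=' as the place code
place_code = soup.select_one('link[rel="alternate"]')['href'].split('=')[-1].lower()
# Query the JSON observation endpoint that the page loads via Ajax
ajax_url = 'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' + place_code
data = requests.get(ajax_url).json()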

# uncomment to print all data:
# print(json.dumps(data, indent=4))

print(data['observation']['weatherCode']['text'])

Prints:

Partly cloudy
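The answer's two steps (derive the place code from the page, then read one field from the JSON) can be split into small functions that are testable without network access. This is only a sketch: observation_url and condition_text are hypothetical helper names, the sample href format (ending in "=CAON0696") is an assumption inferred from the .split('=')[-1] call, and the JSON shape is limited to the keys the answer itself reads.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Endpoint prefix taken from the answer above.
API_BASE = 'https://weatherapi.pelmorex.com/api/v1/observation/placecode/'


def observation_url(page_html: str) -> str:
    """Build the observation API URL from the page's <link rel="alternate"> href."""
    soup = BeautifulSoup(page_html, 'html.parser')
    link = soup.select_one('link[rel="alternate"]')
    if link is None or not link.get('href'):
        raise ValueError('no <link rel="alternate"> with an href found')
    # Same logic as the answer: everything after the last '=', lowercased.
    place_code = link['href'].split('=')[-1].lower()
    return API_BASE + place_code


def condition_text(data: dict) -> Optional[str]:
    """Return observation -> weatherCode -> text, or None if any level is missing."""
    observation = data.get('observation') or {}
    weather_code = observation.get('weatherCode') or {}
    return weather_code.get('text')
```

In live use you would pass requests.get(url).text to observation_url and requests.get(api_url).json() to condition_text; returning None instead of raising KeyError makes a changed JSON layout easier to detect.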

【Comments】: