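【Question Title】: BeautifulSoup: Reading Span Class Elements
【Posted】: 2020-08-28 22:27:02
【Question Description】:
I'm having trouble scraping information from a span class element on a particular page, using the beautifulsoup and requests modules in Python. It keeps returning blank information: "". Here is my code: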
import bs4
import requests

# Browser-like User-Agent, passed to the request below
headers = {'User-Agent': 'Mozilla/5.0'}
res = requests.get('https://www.theweathernetwork.com/ca/weather/ontario/toronto', headers=headers)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# Look up the span whose class is "wxcondition"
weather_elem = soup.find('span', {'class': 'wxcondition'})
weather = weather_elem
print(weather)
【Discussion】:
Tags:
python
json
web-scraping
beautifulsoup
python-requests
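【Solution 1】:
The data is loaded via JavaScript, so the span is empty in the HTML that requests downloads and BeautifulSoup has nothing to read. However, you can simulate the Ajax call with the requests module: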
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.theweathernetwork.com/ca/weather/ontario/toronto'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
place_code = soup.select_one('link[rel="alternate"]')['href'].split('=')[-1].lower()
ajax_url = 'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' + place_code
data = requests.get(ajax_url).json()
# uncomment to print all data:
# print(json.dumps(data, indent=4))
print(data['observation']['weatherCode']['text'])
Prints:
Partly cloudy
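
If you want to reuse this, the steps above can be split into small functions, roughly as sketched below. This is only a sketch under assumptions: the function names (`place_code_from_href`, `observation_url`, `condition_text`, `fetch_condition`) are made up for illustration, the `timeout` value is arbitrary, and the sample href and JSON payload in the offline check are hypothetical. The endpoint and the JSON keys are the ones used in the answer above; they are not a documented API and may change.

```python
API_BASE = 'https://weatherapi.pelmorex.com/api/v1/observation/placecode/'


def place_code_from_href(href):
    """Return the lower-cased value after the last '=' in href."""
    if '=' not in href:
        raise ValueError('no "=" in href: %r' % href)
    return href.split('=')[-1].lower()


def observation_url(place_code):
    """Build the observation URL the same way the answer above does."""
    return API_BASE + place_code


def condition_text(data):
    """Return data['observation']['weatherCode']['text'], or None if missing."""
    try:
        return data['observation']['weatherCode']['text']
    except (KeyError, TypeError):
        return None


def fetch_condition(page_url, timeout=10):
    """Network part: page -> place code -> JSON -> condition text."""
    # Imported here so the pure helpers above can be used without
    # third-party packages installed.
    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0'}
    page = requests.get(page_url, headers=headers, timeout=timeout)
    page.raise_for_status()
    soup = BeautifulSoup(page.content, 'html.parser')

    link = soup.select_one('link[rel="alternate"]')
    if link is None or not link.get('href'):
        raise RuntimeError('no <link rel="alternate"> with an href found')

    api_url = observation_url(place_code_from_href(link['href']))
    api = requests.get(api_url, headers=headers, timeout=timeout)
    api.raise_for_status()
    return condition_text(api.json())


if __name__ == '__main__':
    # Offline check with a made-up href and payload (both hypothetical).
    code = place_code_from_href('https://example.com/page?placecode=CAON0696')
    print(observation_url(code))
    print(condition_text({'observation': {'weatherCode': {'text': 'Partly cloudy'}}}))
```

Calling `fetch_condition('https://www.theweathernetwork.com/ca/weather/ontario/toronto')` performs the two requests. Returning `None` from `condition_text` when a key is missing lets the caller tell "the response shape changed" apart from a network error, which raises instead.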