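【Title】: BeautifulSoup: Reading Span Class Elements
【Posted】: 2020-08-28 22:27:02
【Question】:

I'm having trouble scraping information from a span class element on a specific page using the beautifulsoup and requests packages in Python. It keeps returning blank information: "". Here is my code: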

import bs4
import requests

# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0'}
res = requests.get('https://www.theweathernetwork.com/ca/weather/ontario/toronto', headers=headers)
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Look up the span that should hold the current weather condition
weather_elem = soup.find('span', {'class': 'wxcondition'})
weather = weather_elem
print(weather)

【Comments】:

    Tags: python json web-scraping beautifulsoup python-requests


    【Solution 1】:

    The data is loaded via JavaScript, so it is not in the HTML that BeautifulSoup parses. However, you can simulate the Ajax request with the requests module:

    import json
    import requests
    from bs4 import BeautifulSoup
    
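    url = 'https://www.theweathernetwork.com/ca/weather/ontario/toronto'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    # The place code is the text after the last '=' in the href of the first <link rel="alternate">
    place_code = soup.select_one('link[rel="alternate"]')['href'].split('=')[-1].lower()
    # Request the observation data as JSON, the same way the page's Ajax call does
    ajax_url = 'https://weatherapi.pelmorex.com/api/v1/observation/placecode/' + place_code
    data = requests.get(ajax_url).json()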
    
    # uncomment to print all data:
    # print(json.dumps(data, indent=4))
    
    print(data['observation']['weatherCode']['text'])
    

    Prints:

    Partly cloudy
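
The lookup data['observation']['weatherCode']['text'] raises an error if the response ever lacks one of those keys. Below is a minimal sketch of a more defensive version of the two parsing steps, assuming the JSON keeps the shape used above. The helper names place_code_from_href and condition_text are hypothetical (not part of any library), and the href in the usage note is a made-up example, not the real value from the page.

```python
def place_code_from_href(href):
    """Return the lower-cased text after the last '=' in href.

    Mirrors the split('=')[-1].lower() step from the answer.
    """
    return href.split('=')[-1].lower()


def condition_text(data):
    """Return data['observation']['weatherCode']['text'].

    Returns None instead of raising if any level is missing
    or is not a dict.
    """
    node = data
    for key in ('observation', 'weatherCode', 'text'):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

With the `data` dict from the code above, condition_text(data) returns the same string that the print statement shows, or None when the structure differs. place_code_from_href can be tried on a made-up value such as 'https://example.com/feed?placecode=ABC123'.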
    

    【Comments】:
