【Question title】: Cannot extract some of the div tags with BeautifulSoup
【Posted】: 2019-06-13 09:58:30
【Question】:

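I want to get the daily weather for a chosen region and date from the website "https://www.timeanddate.com/weather". However, the code below cannot reach the div classes I need. What should I do?

I tried to extract the data with BeautifulSoup. The information I want is in the 'temp' and 'wdesc' div classes (the temperature and the weather condition, e.g. "Passing clouds."), so I tried the following code: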

import requests
from bs4 import BeautifulSoup

url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'

result = requests.get(url, verify=False)
soup = BeautifulSoup(result.text, "html.parser")
w1 = soup.find_all('div', attrs={'class': 'temp'})
w2 = soup.find_all('div', attrs={'class': 'wdesc'})

I expected to get the temperature (13 / 11 °C) from w1 and the weather condition (Scattered clouds.) from w2. Instead, both w1 and w2 are empty lists.

【Discussion】:

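  • If I go to the site and view the page source, I can't find a div with class wdesc anywhere. Are you sure you are looking in the right place?
  • @Oddmar Dam When I open "Inspect Element" and point the inspector at the "Passing clouds" text on the site, it shows a div with the class "wdesc".
  • Yes, I can see it in the Firefox web developer toolbar, but I'm not sure you get the same result if you just download the text in Python. Can you search the result text for wdesc in Python?
  • Yes, you're right, the soup in Python and Inspect Element are not the same. Shouldn't they be the same? If not, do you know how to get what Inspect Element shows? Otherwise, isn't it very hard to extract what you want?
  • ``` import requests url = 'timeanddate.com/weather/spain/salou/…' result = requests.get(url, verify = False) print(result.text.find('wdesc')) ``` The result is -1, i.e. not found.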

Tags: python web-scraping beautifulsoup
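Why the search comes back empty: requests only receives the HTML the server sends, while Inspect Element shows the page after its JavaScript has run. The sketch below uses a made-up, simplified page (STATIC_HTML is hypothetical, not the real timeanddate.com markup) in which the values exist only inside a <script>:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified page: the weather values only exist inside a <script>.
# A browser would run that script and build <div class="wdesc"> elements from it;
# requests + BeautifulSoup never run JavaScript, so those divs never appear.
STATIC_HTML = """
<html><body>
  <div id="weather"></div>
  <script>var data = {"detail": [{"desc": "Passing clouds.", "temp": 14}]};</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# Same search as in the question: nothing matches in the static markup.
divs = soup.find_all("div", class_="wdesc")
print(divs)  # []

# The value is still in the downloaded text, but only as script source.
script_text = soup.find("script").string
print("Passing clouds." in script_text)  # True
```

Against this snippet, find_all returns [] exactly as in the question, even though the text is present in the downloaded page.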


【Solution 1】:

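Your data is inside a script (a JavaScript variable on the page), so one solution is to use Selenium. If you don't have ChromeDriver yet, you can download it here: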

https://chromedriver.storage.googleapis.com/index.html?path=2.35/

Here is the code:

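from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the chromedriver executable you downloaded, e.g. r'C:\Download\chromedriver.exe'
driver_path = r'chromedriverpath'
# Selenium 4 style; Selenium 3 used webdriver.Chrome(executable_path=driver_path)
browser = webdriver.Chrome(service=Service(driver_path))
browser.get("https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014")
# The page keeps its weather values in a JavaScript variable named `data`;
# execute_script returns it converted to Python dicts and lists
meta = browser.execute_script('return data')
browser.quit()

detail = meta['detail']

print(detail)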

Output (from the original Python 2 run, hence the u'' prefixes):

[{u'hlsh': u'1 Oca', u'templow': 4, u'temp': 14, u'hum': 77, u'hls': u'1 Oca \xc7ar', u'ts': u'06:00', u'wd': 30, u'wind': 5, u'hl': True, u'date': 1388556000000, u'icon': 2, u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 06:00 \u2014 12:00', u'baro': 1019, u'desc': u'Passing clouds.'},
 {u'templow': 11, u'temp': 15, u'hum': 72, u'ts': u'12:00', u'wd': 210, u'wind': 9, u'date': 1388577600000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2},
 {u'templow': 9, u'temp': 11, u'hum': 90, u'ts': u'18:00', u'wd': 0, u'wind': 5, u'date': 1388599200000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 18:00 \u2014 00:00', u'baro': 1015, u'icon': 14},
 {u'hlsh': u'2 Oca', u'wd': 0, u'hum': 0, u'hls': u'2 Oca Per', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388620800000, u'icon': 36, u'ds': u'2 Ocak 2014 Per\u015fembe, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
 {u'templow': 6, u'temp': 15, u'hum': 93, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388642400000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 06:00 \u2014 12:00', u'baro': 1013, u'icon': 2},
 {u'templow': 15, u'temp': 18, u'hum': 61, u'ts': u'12:00', u'wd': 0, u'wind': 7, u'date': 1388664000000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 12:00 \u2014 18:00', u'baro': 1013, u'icon': 2},
 {u'templow': 13, u'temp': 15, u'hum': 80, u'ts': u'18:00', u'wd': 0, u'wind': 4, u'date': 1388685600000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 18:00 \u2014 00:00', u'baro': 1014, u'icon': 14},
 {u'hlsh': u'3 Oca', u'wd': 0, u'hum': 0, u'hls': u'3 Oca Cum', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388707200000, u'icon': 36, u'ds': u'3 Ocak 2014 Cuma, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
 {u'templow': 9, u'temp': 18, u'hum': 76, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388728800000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 06:00 \u2014 12:00', u'baro': 1015, u'icon': 2},
 {u'templow': 17, u'temp': 20, u'hum': 55, u'ts': u'12:00', u'wd': 290, u'wind': 11, u'date': 1388750400000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2},
 ... (continues up to the end)

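Once you have this list (Selenium already converts the JavaScript object into Python lists and dicts), you can process it further with json, pandas, or anything else. Using Selenium is one option.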
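For example, a list of dicts like the one above converts directly into a pandas DataFrame, which makes a per-day summary easy. A sketch using three entries copied from the output (trimmed to four fields); reading `date` as milliseconds since the Unix epoch is an assumption, though it matches the `ts` times of these rows when interpreted as UTC:

```python
import pandas as pd

# Three entries copied from the output above (only the fields used here).
detail = [
    {'date': 1388556000000, 'ts': '06:00', 'temp': 14, 'templow': 4},
    {'date': 1388577600000, 'ts': '12:00', 'temp': 15, 'templow': 11},
    {'date': 1388599200000, 'ts': '18:00', 'temp': 11, 'templow': 9},
]

df = pd.DataFrame(detail)
# Assumption: 'date' is a JavaScript timestamp in milliseconds (UTC)
df['date'] = pd.to_datetime(df['date'], unit='ms')

# One row per calendar day: highest 'temp' and lowest 'templow'
daily = df.groupby(df['date'].dt.date).agg(high=('temp', 'max'), low=('templow', 'min'))
print(daily)
```

Named aggregation (`high=('temp', 'max')`) keeps the resulting column names readable; it needs pandas 0.25 or newer.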

【Comments】:

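  • Thanks! Should I write the chrome driver path there?
  • It's the path where you downloaded it, I mean something like C://Download//chromedriver.exe @GulsahAyhan
  • One more question, @OmerTekbiyik: how did you decide on "return data" in meta = browser.execute_script('return data')? It doesn't work with a different URL, so I'm wondering how you found this "return data" part.
  • If you view the page source of the URL, the data is stored in a script variable named 'data'. execute_script('return data') returns the value of that variable. You can change it to whatever variable name the page's script uses. @GulsahAyhan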
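If the variable is already present in the HTML that requests downloads, you may not need a browser at all: you can pull it out of the script text directly. A minimal sketch, assuming the script contains `var data = {...};` with a JSON-compatible object literal (unverified for this site); `extract_js_variable` and the sample page are hypothetical:

```python
import json
import re


def extract_js_variable(html, name):
    """Return the object assigned by `var <name> = {...};`, or None if absent.

    Hypothetical helper. It assumes the object literal is valid JSON and ends
    at the first '};' after the assignment; check the real page source first.
    """
    pattern = r"var\s+" + re.escape(name) + r"\s*=\s*(\{.*?\});"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up page snippet in the shape that meta['detail'] implies.
page = '<script>var data = {"detail": [{"temp": 14, "desc": "Passing clouds."}]};</script>'
data = extract_js_variable(page, "data")
print(data["detail"][0]["temp"])  # 14
```

If the real object literal uses unquoted keys or single quotes, json.loads will reject it and Selenium's execute_script remains the simpler route.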

【Solution 2】:

I think you want to look at the table with the id wt-his. It seems to have rows with all the values you are looking for.
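A minimal sketch of reading that table with BeautifulSoup, assuming the data rows sit in the `<tbody>` of a `<table id="wt-his">`. The `parse_history_table` helper and the sample markup are hypothetical and heavily simplified; for the real page you would pass in `requests.get(url).text` instead:

```python
from bs4 import BeautifulSoup


def parse_history_table(html, table_id="wt-his"):
    """Return each row of <table id=table_id> as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    # Prefer <tbody> so header rows in <thead> are skipped
    body = table.find("tbody") or table
    rows = []
    for tr in body.find_all("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Hypothetical markup; the real table has more columns.
SAMPLE = """
<table id="wt-his">
  <thead><tr><th>Time</th><th>Temp</th><th>Weather</th></tr></thead>
  <tbody>
    <tr><th>7:00 am</th><td>39 °F</td><td>Clear.</td></tr>
    <tr><th>8:00 am</th><td>41 °F</td><td>Passing clouds.</td></tr>
  </tbody>
</table>
"""
rows = parse_history_table(SAMPLE)
print(rows[1])  # ['8:00 am', '41 °F', 'Passing clouds.']
```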

【Comments】:

【Solution 3】:

If you just want the table (as long as it is inside a <table> tag), pulling it with pandas is much easier than using BeautifulSoup directly.

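    import pandas as pd
    
    url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'
    # read_html returns a list of DataFrames, one per <table> on the page
    # (it needs lxml, or BeautifulSoup plus html5lib, to be installed)
    tables = pd.read_html(url)
    # the hourly history table is the last table on this page
    df = tables[-1]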
    

Output:

    print (df.to_string())
                      Unnamed: 0_level_0                        Conditions                                                                                               Comfort                                                                                    Unnamed: 7_level_0                Unnamed: 8_level_0
                                    Time                Unnamed: 1_level_1                              Temp                           Weather                              Wind                Unnamed: 5_level_1                          Humidity                         Barometer                        Visibility
    0                  7:00 amWed, Jan 1                               NaN                             39 °F                            Clear.                             3 mph                                 ↑                               93%                         30.07 "Hg                             10 mi
    1                            7:30 am                               NaN                             41 °F                            Clear.                             3 mph                                 ↑                               87%                         30.07 "Hg                             10 mi
    2                            8:00 am                               NaN                             41 °F                   Passing clouds.                             5 mph                                 ↑                               87%                         30.07 "Hg                               NaN
    3                            8:30 am                               NaN                             43 °F                   Passing clouds.                             6 mph                                 ↑                               81%                         30.07 "Hg                               NaN
    4                            9:00 am                               NaN                             43 °F                   Passing clouds.                             2 mph                                 ↑                               87%                         30.07 "Hg                               NaN
    5                            9:30 am                               NaN                             46 °F                   Passing clouds.                             5 mph                                 ↑                               76%                         30.07 "Hg                               NaN
    6                           10:00 am                               NaN                             48 °F                   Passing clouds.                             3 mph                                 ↑                               76%                         30.09 "Hg                               NaN
    7                           10:30 am                               NaN                             54 °F                   Passing clouds.                           No wind                                 ↑                               67%                         30.09 "Hg                               NaN
    8                           11:00 am                               NaN                             55 °F                   Passing clouds.                           No wind                                 ↑                               63%                         30.09 "Hg                               NaN
    9                           11:30 am                               NaN                             55 °F                   Passing clouds.                             3 mph                                 ↑                               63%                         30.09 "Hg                               NaN
    10                          12:00 pm                               NaN                             57 °F                   Passing clouds.                             6 mph                                 ↑                               63%                         30.07 "Hg                               NaN
    11                          12:30 pm                               NaN                             57 °F                   Passing clouds.                             8 mph                                 ↑                               67%                         30.07 "Hg                               NaN
    12                           1:00 pm                               NaN                             59 °F                   Passing clouds.                             9 mph                                 ↑                               68%                         30.04 "Hg                               NaN
    13                           2:00 pm                               NaN                             59 °F                   Passing clouds.                            10 mph                                 ↑                               68%                         30.01 "Hg                               NaN
    14                           2:30 pm                               NaN                             59 °F                   Passing clouds.                             9 mph                                 ↑                               68%                         30.01 "Hg                               NaN
    15                           3:00 pm                               NaN                             57 °F                   Passing clouds.                             7 mph                                 ↑                               67%                         30.01 "Hg                               NaN
    16                           3:30 pm                               NaN                             57 °F                   Passing clouds.                             6 mph                                 ↑                               67%                         29.98 "Hg                               NaN
    17                           4:00 pm                               NaN                             57 °F                   Passing clouds.                             3 mph                                 ↑                               72%                         30.01 "Hg                               NaN
    18                           4:30 pm                               NaN                             55 °F                   Passing clouds.                             3 mph                                 ↑                               77%                         30.01 "Hg                               NaN
    19                           5:00 pm                               NaN                             55 °F                   Passing clouds.                             1 mph                                 ↑                               77%                         29.98 "Hg                               NaN
    20                           5:30 pm                               NaN                             54 °F                   Passing clouds.                           No wind                                 ↑                               82%                         30.01 "Hg                               NaN
    21                           6:00 pm                               NaN                             52 °F                   Passing clouds.                             1 mph                                 ↑                               88%                         29.98 "Hg                               NaN
    22                           6:30 pm                               NaN                             52 °F                   Passing clouds.                             1 mph                                 ↑                               88%                         29.98 "Hg                               NaN
    23                           7:30 pm                               NaN                             50 °F                   Passing clouds.                             3 mph                                 ↑                               94%                         29.98 "Hg                               NaN
    24                           8:00 pm                               NaN                             50 °F                   Passing clouds.                             3 mph                                 ↑                               94%                         29.98 "Hg                               NaN
    25                           8:30 pm                               NaN                             52 °F                   Passing clouds.                             7 mph                                 ↑                               88%                         29.98 "Hg                               NaN
    26                           9:00 pm                               NaN                             52 °F                   Passing clouds.                             5 mph                                 ↑                               82%                         29.98 "Hg                               NaN
    27                           9:30 pm                               NaN                             50 °F                   Passing clouds.                             5 mph                                 ↑                               88%                         29.98 "Hg                               NaN
    28                          10:00 pm                               NaN                             48 °F       Light rain. Passing clouds.                             1 mph                                 ↑                               94%                         29.95 "Hg                               NaN
    29  
    

Update:

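To get multiple days, we fetch the data through the site's AJAX endpoint (cityajax.php), sending one request per day in a loop. The returned text also needs a little processing because it is not fully valid JSON (some keys are unquoted), but the format seems consistent, so that shouldn't be a problem.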

Note: change start_date and num_of_days to get the range you need. This example starts on January 1, 2014 and fetches that day plus the following 9 days (10 days in total).
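The date arithmetic behind start_date and num_of_days can be isolated into a small helper, which lets you check which days will be requested before sending anything. A sketch; `daily_payloads` is a hypothetical name, and the query parameters are the ones this answer's loop sends, not a documented API:

```python
import datetime


def daily_payloads(start_date, num_of_days, city='spain/salou'):
    """Return one query-parameter dict per day, starting at start_date (YYYYMMDD)."""
    first = datetime.datetime.strptime(start_date, '%Y%m%d')
    payloads = []
    for offset in range(num_of_days):
        day = first + datetime.timedelta(days=offset)
        payloads.append({
            'n': city,
            'mode': 'historic',
            'hd': day.strftime('%Y%m%d'),
            'month': str(day.month),  # no leading zero, on every OS
            'year': str(day.year),
            'json': '1',
        })
    return payloads


payloads = daily_payloads('20140101', 10)
print(payloads[0]['hd'], payloads[-1]['hd'])  # 20140101 20140110
```

Because timedelta handles month and year boundaries, a range such as daily_payloads('20140130', 3) correctly moves on to 20140201 with month '2'.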

    import datetime
    import json
    import re

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    start_date = '20140101'   # first day to fetch (YYYYMMDD)
    num_of_days = 10          # number of consecutive days to fetch

    url = 'https://www.timeanddate.com/scripts/cityajax.php'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

    datetime_object = datetime.datetime.strptime(start_date, '%Y%m%d')


    def cell_soup(cell):
        """Parse the HTML fragment stored under the 'h' key of one table cell."""
        return BeautifulSoup(cell['h'], 'html.parser')


    rows = []
    date = None  # the date label only appears on the first row of each day
    for x in range(num_of_days):
        parse_time = datetime_object + datetime.timedelta(days=x)
        str_time = parse_time.strftime('%Y%m%d')
        month = str(parse_time.month)  # portable; '%#m' is Windows-specific
        year = parse_time.strftime('%Y')

        payload = {
            'n': 'spain/salou',
            'mode': 'historic',
            'hd': str_time,
            'month': month,
            'year': year,
            'json': '1'}

        jsonStr = requests.get(url, headers=headers, params=payload).text
        # The response uses unquoted keys (c:, h:, s:); quote them so json can parse it
        jsonStr = jsonStr.replace('c:', '"c":')
        jsonStr = jsonStr.replace('h:', '"h":')
        jsonStr = jsonStr.replace('s:', '"s":')

        jsonData = json.loads(jsonStr)

        for alpha in jsonData:
            row = alpha['c']

            # Keep the previous row's date when this row has no date <span>
            span = cell_soup(row[0]).find('span')
            if span is not None:
                date = span.text

            time = re.findall(r'\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm]))', cell_soup(row[0]).text)[0][0]
            temp = cell_soup(row[2]).text.replace('\xa0', ' ')
            condition = cell_soup(row[3]).text
            wspd = cell_soup(row[4]).text
            wdir = cell_soup(row[5]).text
            wdesc = cell_soup(row[5]).find('span')['title']
            humd = cell_soup(row[6]).text
            barm = cell_soup(row[7]).text
            vis = cell_soup(row[8]).text.replace('\xa0', ' ')

            rows.append([date, time, temp, condition, wspd, wdir, wdesc, humd, barm, vis])
            print('Processed: %s %s' % (date, time))

    # DataFrame.append was removed in pandas 2.0, so build the frame once at the end
    results = pd.DataFrame(rows, columns=['Date', 'Time', 'Temp', 'Weather', 'Wind Speed', 'Wind Direction', 'Wind Description', 'Humidity', 'Barometer', 'Visibility'])
    

Output:

    print (results)
                Date      Time   Temp  ... Humidity  Barometer Visibility
    0     Wed, Jan 1   7:00 am  39 °F  ...      93%  30.07 "Hg      10 mi
    1     Wed, Jan 1   7:30 am  41 °F  ...      87%  30.07 "Hg      10 mi
    2     Wed, Jan 1   8:00 am  41 °F  ...      87%  30.07 "Hg        N/A
    3     Wed, Jan 1   8:30 am  43 °F  ...      81%  30.07 "Hg        N/A
    4     Wed, Jan 1   9:00 am  43 °F  ...      87%  30.07 "Hg        N/A
    5     Wed, Jan 1   9:30 am  46 °F  ...      76%  30.07 "Hg        N/A
    6     Wed, Jan 1  10:00 am  48 °F  ...      76%  30.09 "Hg        N/A
    7     Wed, Jan 1  10:30 am  54 °F  ...      67%  30.09 "Hg        N/A
    8     Wed, Jan 1  11:00 am  55 °F  ...      63%  30.09 "Hg        N/A
    9     Wed, Jan 1  11:30 am  55 °F  ...      63%  30.09 "Hg        N/A
    10    Wed, Jan 1  12:00 pm  57 °F  ...      63%  30.07 "Hg        N/A
    11    Wed, Jan 1  12:30 pm  57 °F  ...      67%  30.07 "Hg        N/A
    12    Wed, Jan 1   1:00 pm  59 °F  ...      68%  30.04 "Hg        N/A
    13    Wed, Jan 1   2:00 pm  59 °F  ...      68%  30.01 "Hg        N/A
    14    Wed, Jan 1   2:30 pm  59 °F  ...      68%  30.01 "Hg        N/A
    15    Wed, Jan 1   3:00 pm  57 °F  ...      67%  30.01 "Hg        N/A
    16    Wed, Jan 1   3:30 pm  57 °F  ...      67%  29.98 "Hg        N/A
    17    Wed, Jan 1   4:00 pm  57 °F  ...      72%  30.01 "Hg        N/A
    18    Wed, Jan 1   4:30 pm  55 °F  ...      77%  30.01 "Hg        N/A
    19    Wed, Jan 1   5:00 pm  55 °F  ...      77%  29.98 "Hg        N/A
    20    Wed, Jan 1   5:30 pm  54 °F  ...      82%  30.01 "Hg        N/A
    21    Wed, Jan 1   6:00 pm  52 °F  ...      88%  29.98 "Hg        N/A
    22    Wed, Jan 1   6:30 pm  52 °F  ...      88%  29.98 "Hg        N/A
    23    Wed, Jan 1   7:30 pm  50 °F  ...      94%  29.98 "Hg        N/A
    24    Wed, Jan 1   8:00 pm  50 °F  ...      94%  29.98 "Hg        N/A
    25    Wed, Jan 1   8:30 pm  52 °F  ...      88%  29.98 "Hg        N/A
    26    Wed, Jan 1   9:00 pm  52 °F  ...      82%  29.98 "Hg        N/A
    27    Wed, Jan 1   9:30 pm  50 °F  ...      88%  29.98 "Hg        N/A
    28    Wed, Jan 1  10:00 pm  48 °F  ...      94%  29.95 "Hg        N/A
    29    Thu, Jan 2   7:00 am  43 °F  ...     100%  29.89 "Hg        N/A
    ..           ...       ...    ...  ...      ...        ...        ...
    307  Sat, Jan 11   7:30 am  52 °F  ...      82%  30.07 "Hg        N/A
    308  Sat, Jan 11   8:00 am  52 °F  ...      82%  30.07 "Hg        N/A
    309  Sat, Jan 11   8:30 am  54 °F  ...      82%  30.07 "Hg        N/A
    310  Sat, Jan 11   9:00 am  54 °F  ...      77%  30.09 "Hg        N/A
    311  Sat, Jan 11   9:30 am  54 °F  ...      82%  30.09 "Hg        N/A
    312  Sat, Jan 11  10:00 am  54 °F  ...      82%  30.12 "Hg       4 mi
    313  Sat, Jan 11  10:30 am  54 °F  ...      82%  30.12 "Hg       4 mi
    314  Sat, Jan 11  11:00 am  54 °F  ...      82%  30.12 "Hg       4 mi
    315  Sat, Jan 11  11:30 am  55 °F  ...      77%  30.12 "Hg       4 mi
    316  Sat, Jan 11  12:00 pm  57 °F  ...      72%  30.12 "Hg       4 mi
    317  Sat, Jan 11  12:30 pm  57 °F  ...      72%  30.12 "Hg        N/A
    318  Sat, Jan 11   1:00 pm  57 °F  ...      72%  30.09 "Hg        N/A
    319  Sat, Jan 11   1:30 pm  57 °F  ...      72%  30.09 "Hg        N/A
    320  Sat, Jan 11   2:00 pm  57 °F  ...      72%  30.09 "Hg        N/A
    321  Sat, Jan 11   2:30 pm  59 °F  ...      72%  30.09 "Hg        N/A
    322  Sat, Jan 11   3:00 pm  59 °F  ...      72%  30.07 "Hg        N/A
    323  Sat, Jan 11   3:30 pm  59 °F  ...      72%  30.09 "Hg        N/A
    324  Sat, Jan 11   4:00 pm  57 °F  ...      77%  30.09 "Hg        N/A
    325  Sat, Jan 11   4:30 pm  57 °F  ...      77%  30.09 "Hg        N/A
    326  Sat, Jan 11   5:00 pm  55 °F  ...      88%  30.09 "Hg        N/A
    327  Sat, Jan 11   5:30 pm  55 °F  ...      88%  30.09 "Hg       6 mi
    328  Sat, Jan 11   6:00 pm  55 °F  ...      88%  30.12 "Hg       3 mi
    329  Sat, Jan 11   6:30 pm  55 °F  ...      94%  30.12 "Hg       3 mi
    330  Sat, Jan 11   7:00 pm  55 °F  ...      94%  30.12 "Hg       4 mi
    331  Sat, Jan 11   7:30 pm  54 °F  ...     100%  30.12 "Hg       4 mi
    332  Sat, Jan 11   8:00 pm  54 °F  ...     100%  30.15 "Hg       6 mi
    333  Sat, Jan 11   8:30 pm  54 °F  ...     100%  30.15 "Hg       6 mi
    334  Sat, Jan 11   9:00 pm  54 °F  ...     100%  30.15 "Hg       6 mi
    335  Sat, Jan 11   9:30 pm  54 °F  ...      94%  30.15 "Hg       6 mi
    336  Sat, Jan 11  10:00 pm  54 °F  ...      94%  30.15 "Hg       6 mi
    
    [337 rows x 10 columns]
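The three str.replace calls in the loop work on this response, but they would also rewrite any c:, h: or s: that happens to occur inside a cell's text. A slightly safer sketch, assuming the bare keys only ever follow `{` or `,`; `quote_keys` and the sample response string are hypothetical:

```python
import json
import re


def quote_keys(js_text, keys=("c", "h", "s")):
    """Wrap bare object keys in double quotes so json.loads accepts the text.

    Hypothetical helper: it only quotes a key that directly follows '{' or ','
    (optionally with whitespace), so a matching word inside a string value,
    such as the "s:" at the end of "Gusts:", is left alone.
    """
    names = "|".join(re.escape(k) for k in keys)
    return re.sub(r"([{,]\s*)(" + names + r")\s*:", r'\1"\2":', js_text)


# Made-up response in the shape the loop expects: [{c: [{h: ...}, ...]}]
raw = '[{c:[{h:"7:00 am"},{h:"Gusts: 9 mph",s:1}]}]'

parsed = json.loads(quote_keys(raw))
print(parsed[0]["c"][1]["h"])  # Gusts: 9 mph

# The plain str.replace approach also rewrites the "s:" inside "Gusts:".
naive = raw.replace("c:", '"c":').replace("h:", '"h":').replace("s:", '"s":')
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False
print(naive_ok)  # False
```

This is still a heuristic rather than a JavaScript parser; it only narrows where a key can be matched.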
    

【Comments】:

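  • Thanks, this is an alternative solution I can use.
  • There's one more problem with this solution: I need the daily weather, but these tables only contain one day of weather information. How can I get all the days of the month?
  • @GulsahAyhan, then we need to find another approach. Check my solution again for the edit.
  • Thanks a lot, great solution!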