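Answers to: Scraping data from span tags using BeautifulSoup

【Question title】: To scrape the data from span tag using beautifulsoup
【Posted】: 2020-05-19 22:32:33
【Question description】:

I am trying to scrape a web page and need to load an entire table into a DataFrame. I am using Beautiful Soup for this. Some of the td tags contain span tags without any text, yet values are displayed on the web page inside those particular span tags.

The HTML below corresponds to that web page: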

<td>
  <span class="nttu">::after</span>
  <span class="ntbb">::after</span>
  <span class="ntyc">::after</span>
  <span class="nttu">::after</span>
</td>

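However, the value displayed in this td tag is 23.8. I tried to scrape it, but I get empty text.

How can I scrape this value using Beautiful Soup?

URL: https://en.tutiempo.net/climate/ws-432950.html

My code for scraping the table is given below: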

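import requests
from bs4 import BeautifulSoup

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retreived_data = requests.get(http_url).text

soup = BeautifulSoup(retreived_data, "lxml")
climate_table = soup.find("table", attrs={"class": "medias mensuales numspan"})
climate_data = climate_table.find_all("tr")
# climate_df is assumed to be a pandas DataFrame created earlier (not shown),
# with one column per table column
for data in climate_data[1:-2]:  # skip the first row and the last two rows
  table_data = data.find_all("td")
  row_data = []
  for row in table_data:
    row_data.append(row.get_text())  # comes back empty for cells made of empty spans
  climate_df.loc[len(climate_df)] = row_data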
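
The empty result can be reproduced without touching the network. The sketch below is only an illustration: the HTML snippet and the class names (`ntaa`, `ntbb`, `ntcc`, `ntdd`) are made up, not copied from the site. It shows that when the visible characters come from CSS `::after` rules, the spans have no text nodes, so `get_text()` has nothing to return.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page: the characters live in CSS
# rules, not in the markup, so every span in the cell is empty.
sample_html = """
<style>
span.ntaa::after{content:"2"}
span.ntbb::after{content:"3"}
span.ntcc::after{content:"."}
span.ntdd::after{content:"8"}
</style>
<table><tr><td><span class="ntaa"></span><span class="ntbb"></span><span class="ntcc"></span><span class="ntdd"></span></td></tr></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
cell = soup.find("td")

# Beautiful Soup only sees the markup; it does not apply stylesheets,
# so the text a browser would draw from the ::after rules is not there.
cell_text = cell.get_text()
print(repr(cell_text))
```

A browser would render 23.8 for this cell, while the parsed cell contains four empty spans.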

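【Comments on the question】:

The page is probably dynamic, so you would need to extract the HTML from the rendered page. Unless you share the URL, nobody can help much further.
@chitown88, I have added the site's URL; you can see the problem in the 5th row itself. Thanks
You'd better include your code, otherwise it is hard to see what the problem is ;)
@ThananjayaS, are you just trying to pull that table?
@Isma, I have added the code for your reference, thanks

Tags: python python-3.x web-scraping beautifulsoup

【Solution 1】:

I misunderstood your question at first because you referenced 2 different URLs. I see what you mean now.

Yes, it is odd: in that second table they use CSS to fill in the content of some of the <td> tags. What you need to do is pull those special cases out of the <style> tag. Once you have them, you can replace the matching elements in the HTML source and finally parse the result into a DataFrame. I use pandas, since it uses BeautifulSoup under the hood to parse <table> tags. I believe this gets you what you want: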

import re
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retreived_data = requests.get(http_url).text

soup = BeautifulSoup(retreived_data, "lxml")

# The second <style> block holds the rules that give each span class its character
hiddenData = str(soup.find_all('style')[1])
hiddenSpan = {}
for group in re.findall(r'span\.(.+?)}', hiddenData):
    class_attr = group.split('span.')[-1].split('::')[0]  # class name before the '::'
    content = group.split('"')[1]                         # character between the quotes
    hiddenSpan[class_attr] = content

# Replace every empty span in the table's HTML with the character its class stands for
climate_table = str(soup.find("table", attrs={"class": "medias mensuales numspan"}))
for k, v in hiddenSpan.items():
    climate_table = climate_table.replace('<span class="%s"></span>' % k, v)

# Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(climate_table))[0]

Output:

print (df.to_string())
                          Day                          T                         TM                         Tm                        SLP                          H                         PP                         VV                          V                         VM                         VG                         RA                         SN                         TS                         FG
0                           1                       23.4                       30.3                         19                          -                         59                          0                        6.3                        4.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
1                           2                       22.4                       30.3                       16.9                          -                         57                          0                        6.9                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
2                           3                         24                       31.8                       16.9                          -                         51                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
3                           4                       24.2                         32                       17.4                          -                         53                          0                          6                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
4                           5                       23.8                         32                         18                          -                         58                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
5                           6                       23.3                         31                       18.3                          -                         60                          0                        6.9                          5                        9.4                          -                        NaN                        NaN                        NaN                        NaN
6                           7                       22.8                       30.2                       17.6                          -                         55                          0                        7.7                        3.7                        7.6                          -                        NaN                        NaN                        NaN                        NaN
7                           8                       23.1                       30.6                       17.4                          -                         46                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
8                           9                       22.9                       30.6                       17.4                          -                         51                          0                        6.9                        3.5                        3.5                          -                        NaN                        NaN                        NaN                        NaN
9                          10                       22.3                         30                         17                          -                         56                          0                        6.3                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
10                         11                       22.3                       29.4                         17                          -                         53                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
11                         12                       21.8                       29.4                       15.7                          -                         54                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
12                         13                       22.3                       30.1                       15.7                          -                         43                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
13                         14                       21.8                       30.6                       14.8                          -                         41                          0                        6.9                        1.9                        5.4                          -                        NaN                        NaN                        NaN                        NaN
14                         15                       21.6                       30.6                       14.2                          -                         43                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
15                         16                       21.1                       29.9                       15.4                          -                         55                          0                        6.9                        4.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
16                         17                       20.4                       28.1                       15.4                          -                         59                          0                        6.9                          5                       11.1                          -                        NaN                        NaN                        NaN                        NaN
17                         18                       21.2                       28.3                       14.5                          -                         53                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
18                         19                       21.6                       29.6                       16.4                          -                         58                          0                        6.9                        2.2                        3.5                          -                        NaN                        NaN                        NaN                        NaN
19                         20                       21.9                       29.6                       16.6                          -                         58                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
20                         21                       22.3                       29.9                       17.5                          -                         55                          0                        6.9                        3.1                        5.4                          -                        NaN                        NaN                        NaN                        NaN
21                         22                       21.9                       29.9                       15.1                          -                         46                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
22                         23                       21.3                         29                       15.2                          -                         50                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
23                         24                       21.3                       28.8                       14.6                          -                         45                          0                        6.9                          3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
24                         25                       21.6                       29.1                       15.5                          -                         47                          0                        7.7                        4.8                        7.6                          -                        NaN                        NaN                        NaN                        NaN
25                         26                       21.8                       29.2                       14.6                          -                         41                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
26                         27                       22.3                       30.1                       15.6                          -                         40                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
27                         28                       22.4                       30.3                         16                          -                         51                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
28                         29                         23                       30.3                       16.9                          -                         53                          0                        6.6                        2.8                        5.4                          -                        NaN                        NaN                        NaN                          o
29                         30                       23.1                         30                       17.8                          -                         54                          0                        6.9                        5.4                        7.6                          -                        NaN                        NaN                        NaN                        NaN
30                         31                       22.1                       29.8                       17.3                          -                         54                          0                        6.9                        5.2                        9.4                          -                        NaN                        NaN                        NaN                        NaN
31  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:
32                        NaN                       22.3                         30                       16.4                          -                       51.6                          0                        6.9                        3.5                        6.3                        NaN                          0                          0                          0                          1
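
The same idea can also be sketched offline with Beautiful Soup alone, which may be easier to fit into a loop over `td` cells. This is a sketch under assumptions rather than the answer's exact method: the HTML, the class names and the characters are hypothetical, and the CSS rules are assumed to have the form `span.X::after{content:"c"}`, which is what the regex above implies. Instead of replacing strings in the serialized HTML, it writes each character into its span so that `get_text()` returns the displayed value.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified page: class names and characters are made up.
sample_html = """
<style>
.numspan span.ntaa::after{content:"2"}
.numspan span.ntbb::after{content:"3"}
.numspan span.ntcc::after{content:"."}
.numspan span.ntdd::after{content:"8"}
</style>
<table class="numspan">
<tr><th>Day</th><th>T</th></tr>
<tr><td>5</td><td><span class="ntaa"></span><span class="ntbb"></span><span class="ntcc"></span><span class="ntdd"></span></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Build {class name: character} from every rule shaped like
# span.X::after{content:"c"} found in any <style> block.
css = " ".join(str(style) for style in soup.find_all("style"))
span_chars = dict(
    re.findall(r'span\.([\w-]+)::after\s*\{\s*content\s*:\s*"([^"]*)"', css)
)

# Write the character into each span whose class appears in the mapping,
# so that get_text() can see it.
for span in soup.find_all("span"):
    for cls in span.get("class", []):
        if cls in span_chars:
            span.string = span_chars[cls]

# Collect the table as a list of rows, header first.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]
print(rows)
```

If a DataFrame is wanted, the list can be passed on with `pd.DataFrame(rows[1:], columns=rows[0])`. Editing the parsed tree avoids depending on the exact serialized form `<span class="..."></span>`, at the cost of a little more code.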

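【Comments】:

Not sure why this was downvoted at first, but now I get it. You used 2 different URLs in your question. I was looking at the first one, and that is the table I provided. The one used in your code, which is the one you were actually referring to, is the one causing the problem.
@Thananjaya S, I updated the code to answer your question. With 2 different URLs referenced it was not clear at first, but after another look I see what you mean.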