【问题标题】:To scrape the data from span tag using beautifulsoup使用 beautifulsoup 从 span 标签中抓取数据
【发布时间】:2020-05-19 22:32:33
【问题描述】:

我正在尝试抓取网页,我需要将整个表格解码为数据帧。为此,我正在使用漂亮的汤。在某些td 标签中,有span 标签没有任何文本。但是这些值显示在该特定跨度标记中的网页上。

下面的html代码对应那个网页,

<td>
  <span class="nttu">::after</span>
  <span class="ntbb">::after</span>
  <span class="ntyc">::after</span>
  <span class="nttu">::after</span>
</td>

但是,这个td 标签中显示的值是23.8。我试图刮掉它,但我得到的是空文本。

如何使用漂亮的汤来刮取这个值。

网址:https://en.tutiempo.net/climate/ws-432950.html

我的代码用于抓取下表给出,

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retreived_data = requests.get(http_url).text

soup = BeautifulSoup(retreived_data, "lxml")
climate_table = soup.find("table", attrs={"class": "medias mensuales numspan"})
climate_data = climate_table.find_all("tr")
for data in climate_data[1:-2]:
  table_data = data.find_all("td")
  row_data = []
  for row in table_data:
    row_data.append(row.get_text())
  climate_df.loc[len(climate_df)] = row_data

【问题讨论】:

  • 页面可能是动态的,您需要从呈现的页面中提取 html。除非您分享网址,否则没有人能提供更多帮助
  • @chitown88,我已经添加了该站点的 URL,您可以在其中发现第 5 行本身存在问题。谢谢
  • 你最好包含你的代码,否则很难看出问题是什么;)
  • @ThananjayaS,你只是想拉那张桌子吗?
  • @Isma,我已添加代码供您参考,谢谢

标签: python python-3.x web-scraping beautifulsoup


【解决方案1】:

误解了您的问题,因为您引用了 2 个不同的网址。我现在明白你的意思了。

是的,很奇怪,在第二个表格中,他们使用 CSS 填充了其中一些 &lt;td&gt; 标记的内容。您需要做的是从&lt;style&gt; 标签中提取那些特殊情况。一旦你有了它,你就可以替换 html 源代码中的那些元素,最后将它解析成一个数据框。我使用 pandas,因为它在后台使用 BeautifulSoup 来解析 &lt;table&gt; 标签。但我相信这会让你得到你想要的:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

http_url = "https://en.tutiempo.net/climate/01-2013/ws-432950.html"
retreived_data = requests.get(http_url).text

soup = BeautifulSoup(retreived_data, "lxml")

hiddenData = str(soup.find_all('style')[1])
hiddenSpan = {}
for group in re.findall(r'span\.(.+?)}',hiddenData):
    class_attr = group.split('span.')[-1].split('::')[0]
    content = group.split('"')[1]
    hiddenSpan[class_attr] = content

climate_table = str(soup.find("table", attrs={"class": "medias mensuales numspan"}))   
for k, v in hiddenSpan.items():
    climate_table = climate_table.replace('<span class="%s"></span>' %(k), hiddenSpan[k])


df = pd.read_html(climate_table)[0]

输出:

print (df.to_string())
                          Day                          T                         TM                         Tm                        SLP                          H                         PP                         VV                          V                         VM                         VG                         RA                         SN                         TS                         FG
0                           1                       23.4                       30.3                         19                          -                         59                          0                        6.3                        4.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
1                           2                       22.4                       30.3                       16.9                          -                         57                          0                        6.9                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
2                           3                         24                       31.8                       16.9                          -                         51                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
3                           4                       24.2                         32                       17.4                          -                         53                          0                          6                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
4                           5                       23.8                         32                         18                          -                         58                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
5                           6                       23.3                         31                       18.3                          -                         60                          0                        6.9                          5                        9.4                          -                        NaN                        NaN                        NaN                        NaN
6                           7                       22.8                       30.2                       17.6                          -                         55                          0                        7.7                        3.7                        7.6                          -                        NaN                        NaN                        NaN                        NaN
7                           8                       23.1                       30.6                       17.4                          -                         46                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
8                           9                       22.9                       30.6                       17.4                          -                         51                          0                        6.9                        3.5                        3.5                          -                        NaN                        NaN                        NaN                        NaN
9                          10                       22.3                         30                         17                          -                         56                          0                        6.3                        3.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
10                         11                       22.3                       29.4                         17                          -                         53                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
11                         12                       21.8                       29.4                       15.7                          -                         54                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
12                         13                       22.3                       30.1                       15.7                          -                         43                          0                        6.9                        2.8                        5.4                          -                        NaN                        NaN                        NaN                        NaN
13                         14                       21.8                       30.6                       14.8                          -                         41                          0                        6.9                        1.9                        5.4                          -                        NaN                        NaN                        NaN                        NaN
14                         15                       21.6                       30.6                       14.2                          -                         43                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
15                         16                       21.1                       29.9                       15.4                          -                         55                          0                        6.9                        4.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
16                         17                       20.4                       28.1                       15.4                          -                         59                          0                        6.9                          5                       11.1                          -                        NaN                        NaN                        NaN                        NaN
17                         18                       21.2                       28.3                       14.5                          -                         53                          0                        6.9                        3.1                        7.6                          -                        NaN                        NaN                        NaN                        NaN
18                         19                       21.6                       29.6                       16.4                          -                         58                          0                        6.9                        2.2                        3.5                          -                        NaN                        NaN                        NaN                        NaN
19                         20                       21.9                       29.6                       16.6                          -                         58                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
20                         21                       22.3                       29.9                       17.5                          -                         55                          0                        6.9                        3.1                        5.4                          -                        NaN                        NaN                        NaN                        NaN
21                         22                       21.9                       29.9                       15.1                          -                         46                          0                        6.9                        4.3                        7.6                          -                        NaN                        NaN                        NaN                        NaN
22                         23                       21.3                         29                       15.2                          -                         50                          0                        6.9                        3.3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
23                         24                       21.3                       28.8                       14.6                          -                         45                          0                        6.9                          3                        5.4                          -                        NaN                        NaN                        NaN                        NaN
24                         25                       21.6                       29.1                       15.5                          -                         47                          0                        7.7                        4.8                        7.6                          -                        NaN                        NaN                        NaN                        NaN
25                         26                       21.8                       29.2                       14.6                          -                         41                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
26                         27                       22.3                       30.1                       15.6                          -                         40                          0                        6.9                        2.4                        5.4                          -                        NaN                        NaN                        NaN                        NaN
27                         28                       22.4                       30.3                         16                          -                         51                          0                        6.9                        2.8                        3.5                          -                        NaN                        NaN                        NaN                        NaN
28                         29                         23                       30.3                       16.9                          -                         53                          0                        6.6                        2.8                        5.4                          -                        NaN                        NaN                        NaN                          o
29                         30                       23.1                         30                       17.8                          -                         54                          0                        6.9                        5.4                        7.6                          -                        NaN                        NaN                        NaN                        NaN
30                         31                       22.1                       29.8                       17.3                          -                         54                          0                        6.9                        5.2                        9.4                          -                        NaN                        NaN                        NaN                        NaN
31  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:  Monthly means and totals:
32                        NaN                       22.3                         30                       16.4                          -                       51.6                          0                        6.9                        3.5                        6.3                        NaN                          0                          0                          0                          1

【讨论】:

  • 不知道为什么这最初被否决了。但现在我明白了。您在问题中使用了 2 个不同的 URL。我在看第一个,这是我提供的表格。但是使用的代码以及您实际指的是给您带来了问题。
  • @Thananjaya S,我更新了代码来回答你的问题。鉴于 2 个不同的 url 引用,最初并不清楚,但再看一遍我明白你的意思了。