【Question title】: Scraping values from table with Beautiful Soup
【Posted】: 2018-06-14 20:39:36
【Question】:

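I am trying to scrape some historical weather data, but I can't figure out how to extract the values from the table. I have been able to print the rows of the table, but when I try to pull the 'td' elements (more specifically, their values) out of each row, I get an AttributeError. This is what I have so far: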

import requests
from random import choice
from bs4 import BeautifulSoup
import pandas as pd

#---------------------------------------------------------------------------------------#
url = "https://www.wunderground.com/history/airport/KORD/2017/4/1/CustomHistory.html?dayend=10&monthend=4&yearend=2017&req_city=&req_state=&req_statename=&reqdb.zip=&reqdb.magic=&reqdb.wmo="
page = requests.get(url)

soup = BeautifulSoup(page.text,"lxml")
#---------------------------------------------------------------------------------------#
table = soup.find('table', id='obsTable')

table_head = table.find('thead')
header_1 = []    
for th in table_head.find_all('th'):
    key_1 = th.get_text()
    header_1.append(key_1)
#---------------------------------------------------------------------------------------#
table_head_2 = table.find_all('tr')[1]
header_2 = []
for td in table_head_2.find_all('td'):
    key_2 = td.get_text()
    header_2.append(key_2)
#---------------------------------------------------------------------------------------#    
rows = table.find_all('tr')[2]

for row in rows.find_all('td'):
    print(row)

When I print a single row of data, it returns:

<tr>
<td><a href="/history/airport/KORD/2017/4/1/DailyHistory.html">1</a></td>
<td>
<span class="wx-value">59</span>
</td>
<td>
<span class="wx-value">47</span>
</td>
<td>
<span class="wx-value">34</span>
</td>
<td>
<span class="wx-value">31</span>
</td>
<td>
<span class="wx-value">23</span>
</td>
<td>
<span class="wx-value">16</span>
</td>
<td>
<span class="wx-value">82</span>
</td>
<td>
<span class="wx-value">51</span>
</td>
<td>
<span class="wx-value">20</span>
</td>
<td>
<span class="wx-value">30.24</span>
</td>
<td>
<span class="wx-value">30.19</span>
</td>
<td>
<span class="wx-value">30.09</span>
</td>
<td>
<span class="wx-value">10</span>
</td>
<td>
<span class="wx-value">10</span>
</td>
<td>
<span class="wx-value">10</span>
</td>
<td>
<span class="wx-value">13</span>
</td>
<td>
<span class="wx-value">6</span>
</td>
<td>
<span class="wx-value">17</span>
</td>
<td>
<span class="wx-value">0.00</span>
</td>
<td>
     
</td>
</tr>

I would really appreciate it if someone could help.

【Comments】:

  • print(row.text)

Tags: python beautifulsoup


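【Solution 1】:

If you just want to print the values, you can read the text attribute of each cell (row.text in the loop below, where row is actually a single 'td'). The values are surrounded by a lot of whitespace, though, so you need to strip() them.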

for row in rows.find_all('td'):
    print(row.text.strip())

This returns:

1
59
47
34
31
23
16
82
51
20
30.24
30.19
30.09
10
10
10
13
6
17
0.00
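
If you want the values from every data row rather than printing one row, the same idea extends to a nested loop. The following is only a minimal sketch under some assumptions: SAMPLE_HTML is a trimmed-down, hypothetical stand-in for the real page (the second data row is made up), extract_rows is a helper name invented here, and it uses the built-in "html.parser" so it runs without lxml. It assumes, as in the question, that the first two 'tr' elements are header rows.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the page's obsTable markup.
# The first data row reuses values from the question; the second is made up.
SAMPLE_HTML = """
<table id="obsTable">
  <thead><tr><th>2017</th><th>Temp</th><th>Temp</th><th>Events</th></tr></thead>
  <tbody>
    <tr><td>Apr</td><td>high</td><td>avg</td><td></td></tr>
    <tr>
      <td><a href="/history/airport/KORD/2017/4/1/DailyHistory.html">1</a></td>
      <td>
        <span class="wx-value">59</span>
      </td>
      <td>
        <span class="wx-value">47</span>
      </td>
      <td>&nbsp;</td>
    </tr>
    <tr>
      <td><a href="#">2</a></td>
      <td>
        <span class="wx-value">61</span>
      </td>
      <td>
        <span class="wx-value">48</span>
      </td>
      <td>&nbsp;</td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(html, table_id="obsTable", header_rows=2):
    """Return the stripped cell text of every data row as a list of lists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # find() returns None when nothing matches; calling methods on that
        # None is a common source of AttributeError.
        return []
    data = []
    # Skip the header rows, then read every <td> of each remaining <tr>.
    for tr in table.find_all("tr")[header_rows:]:
        data.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return data


rows = extract_rows(SAMPLE_HTML)
for values in rows:
    print(values)
```

Note that find_all() returns a list-like ResultSet, so you have to loop over it (or index into it) before calling find_all() or .text on an individual element. Empty cells come back as empty strings, which you may want to filter out or convert before building a DataFrame.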

【Comments】:
