【Question Title】: pandas read_html - no tables found
【Posted】: 2019-08-17 00:20:54
【Question】:

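I am trying to see whether I can read a data table from WU.com, but I get an error because no tables are found. (I am also a first-timer at web scraping.) Another person asked a very similar Stack Overflow question about WU data tables, but the solution there is a bit too complicated for me.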

import pandas as pd

df_list = pd.read_html('https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26')

print(df_list)

On the webpage of historical data for Milwaukee, this is the data table (daily observations) that I am trying to retrieve into pandas:

Any tips help, thank you.

【Comments】:

  • What about print(df_list[0])?

Tags: python pandas web-scraping beautifulsoup


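【Solution 1】:

The page is dynamic: the table is built by JavaScript after the page loads, so it must be rendered before it can be parsed. Use something like Selenium to render the page, then pull the tables with pandas .read_html():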

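from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object;
# older releases accepted it as the first positional argument.
driver = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
driver.get("https://www.wunderground.com/history/daily/us/wi/milwaukee/KMKE/date/2013-6-26")

# page_source returns the HTML of the page as currently rendered by the browser
html = driver.page_source

# Wrap the string in StringIO: recent pandas versions deprecate passing literal HTML
tables = pd.read_html(StringIO(html))
data = tables[1]  # in this answer, index 1 held the daily observations table

driver.quit()  # end the browser session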

Output:

print (data)
        Time Temperature      ...       Precip Accum      Condition
0    6:52 PM        68 F      ...             0.0 in  Mostly Cloudy
1    7:52 PM        69 F      ...             0.0 in  Mostly Cloudy
2    8:52 PM        70 F      ...             0.0 in  Mostly Cloudy
3    9:52 PM        67 F      ...             0.0 in         Cloudy
4   10:52 PM        65 F      ...             0.0 in  Partly Cloudy
5   11:42 PM        66 F      ...             0.0 in  Mostly Cloudy
6   11:52 PM        68 F      ...             0.0 in  Mostly Cloudy
7   12:08 AM        68 F      ...             0.0 in         Cloudy
8   12:52 AM        68 F      ...             0.0 in  Mostly Cloudy
9    1:52 AM        70 F      ...             0.0 in         Cloudy
10   2:13 AM        70 F      ...             0.0 in         Cloudy
11   2:52 AM        71 F      ...             0.0 in         Cloudy
12   3:52 AM        70 F      ...             0.0 in  Mostly Cloudy
13   4:19 AM        70 F      ...             0.0 in         Cloudy
14   4:29 AM        70 F      ...             0.0 in         Cloudy
15   4:52 AM        70 F      ...             0.0 in         Cloudy
16   5:25 AM        70 F      ...             0.0 in  Mostly Cloudy
17   5:52 AM        71 F      ...             0.0 in         Cloudy
18   6:52 AM        73 F      ...             0.0 in         Cloudy
19   7:52 AM        74 F      ...             0.0 in         Cloudy
20   8:52 AM        73 F      ...             0.0 in         Cloudy
21   9:52 AM        71 F      ...             0.0 in         Cloudy
22  10:52 AM        71 F      ...             0.0 in         Cloudy
23  11:52 AM        70 F      ...             0.0 in         Cloudy
24  12:52 PM        72 F      ...             0.0 in  Mostly Cloudy
25   1:52 PM        70 F      ...             0.0 in  Mostly Cloudy
26   2:52 PM        71 F      ...             0.0 in  Mostly Cloudy
27   3:52 PM        71 F      ...             0.0 in  Partly Cloudy
28   4:52 PM        68 F      ...             0.0 in  Mostly Cloudy
29   5:52 PM        66 F      ...             0.0 in  Mostly Cloudy

[30 rows x 11 columns]
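
Before reaching for Selenium, it can help to confirm that the page really is dynamic. The following is a minimal sketch using only the standard library: it counts the <table> tags present in raw markup. The names TableCounter and count_tables, and the two sample strings, are hypothetical stand-ins for illustration and are not taken from the Weather Underground page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "table":
            self.count += 1


def count_tables(html_text):
    """Return the number of <table> elements present in the raw markup."""
    parser = TableCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


# Hypothetical stand-ins for what a server might return.
static_page = "<html><body><table><tr><td>68 F</td></tr></table></body></html>"
dynamic_page = "<html><body><app-root></app-root><script src='main.js'></script></body></html>"

print(count_tables(static_page))   # 1
print(count_tables(dynamic_page))  # 0
```

To check a real page, pass the text of the plain HTTP response (fetched without a browser) to count_tables. A result of 0 suggests the tables are built by JavaScript, which would explain why pd.read_html() on the URL finds nothing while the rendered page_source works.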

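【Comments】:

【Solution 2】:

Also check that your file name is correct. If you try to open a file that does not exist, you get the same "No tables found" error. I made this mistake: my file was X.htm, but I was looking for X.html.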
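
A small guard can surface this kind of mistake before the path ever reaches pandas. This is only a sketch under stated assumptions: existing_html_path is a hypothetical helper, not part of pandas, and it checks only for the .htm/.html mix-up described above.

```python
from pathlib import Path


def existing_html_path(path_str):
    """Return path_str as a Path if the file exists, else raise FileNotFoundError.

    If a sibling file differing only in the .htm/.html extension exists,
    it is named in the error message.
    """
    path = Path(path_str)
    if path.is_file():
        return path

    # Look for the same file name with the other common HTML extension.
    swapped = {".htm": ".html", ".html": ".htm"}.get(path.suffix.lower())
    hint = ""
    if swapped is not None:
        sibling = path.with_suffix(swapped)
        if sibling.is_file():
            hint = f" (did you mean {sibling.name}?)"

    raise FileNotFoundError(f"No such file: {path}{hint}")
```

A call such as pd.read_html(str(existing_html_path("X.html"))) would then fail with a clear FileNotFoundError that names X.htm, instead of a misleading "No tables found".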

【Comments】:
