【Question title】: How to read html table as dataframe (urllib.error.URLError: <urlopen error unknown url type: https>)?
【Posted】: 2019-02-03 12:10:13
【Question】:

I would appreciate it if you could tell me how to convert an HTML table into a dataframe.

import pandas as pd
df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)

Error:

C:\Users\t\Anaconda3\python.exe C:/Users/t/Downloads/hyperopt12.py
Traceback (most recent call last):
  File "C:/Users/t/Downloads/hyperopt12.py", line 12, in <module>
    df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html
    displayed_only=displayed_only)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse
    raise_with_traceback(retained)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
    raise exc.with_traceback(traceback)
urllib.error.URLError: <urlopen error unknown url type: https>

Thanks in advance.

【Comments】:

Tags: python-3.x pandas


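【Solution 1】:

You need to pick the right table on the page. read_html returns a list of DataFrame objects, one per table it finds, so index into that list to get the table you want. See the pandas documentation for read_html.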

import pandas as pd
tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)
df = tables[2]
df
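
To see why indexing is needed, here is a minimal sketch that runs read_html on an in-memory page instead of the real site. The HTML string and its table contents are made up for illustration, and the sketch assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content standing in for the real site (not its actual data).
HTML = """
<table>
  <tr><th>Country</th><th>Status</th></tr>
  <tr><td>A</td><td>Required</td></tr>
</table>
<table>
  <tr><th>Year</th><th>Count</th></tr>
  <tr><td>2018</td><td>3</td></tr>
  <tr><td>2019</td><td>5</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element it parses.
tables = pd.read_html(StringIO(HTML))

# Inspect every table to find the index of the one you need.
for i, table in enumerate(tables):
    print(i, table.shape, list(table.columns))

# Select by position once the right index is known.
df = tables[1]

# Alternatively, keep only the tables whose text matches a pattern.
year_tables = pd.read_html(StringIO(HTML), match="Year")
```

On the real page the index (2 in the answer above) depends on how many tables the site serves, so printing the shapes first is a quick way to confirm it.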

【Discussion】:

  • Thanks. Same error:
    tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)
    File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html
      displayed_only=displayed_only)
    File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse
      raise_with_traceback(retained)
    File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
      raise exc.with_traceback(traceback)
    urllib.error.URLError: <urlopen error unknown url type: https>
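  • This works for me in Jupyter Notebook. I cannot reproduce your error, sorry. Maybe this post will be useful: stackoverflow.com/questions/28376506/…
  • Thanks. I tried it in R as well, and I could not read the page in R either. Also, the link provided in the comments section can be read in R.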
  • > library(XML)
    > library(RCurl)
    > library(rlist)
    > theurl <- getURL("https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs",.opts = list(ssl.verifypeer = FALSE) )
    Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
    > tables <- readHTMLTable(theurl)
    Error in readHTMLTable(theurl) : object 'theurl' not found
    > tables <- list.clean(tables, fun = is.null, recursive = FALSE)
    Error in list.clean(tables, fun = is.null, recursive = FALSE) : object 'tables' not found
  • There was a problem with my PyCharm installation. I have accepted your answer anyway. Thanks.
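
The thread does not say what exactly was wrong with the installation. One known cause of "unknown url type: https" is an interpreter that cannot import its ssl module, because urllib only defines its HTTPS handler when ssl is available. The sketch below is a hedged diagnostic using only the standard library; the function name https_support_report is made up for illustration. Run it with the same interpreter the IDE uses.

```python
import sys
import urllib.request


def https_support_report():
    """Describe whether this interpreter can open https:// URLs via urllib."""
    report = {"executable": sys.executable}
    try:
        import ssl
    except ImportError as exc:
        # Without ssl, urllib has no handler for the https scheme.
        report["ssl_available"] = False
        report["detail"] = str(exc)
    else:
        report["ssl_available"] = True
        report["detail"] = ssl.OPENSSL_VERSION
    # HTTPSHandler exists in urllib.request only if ssl could be imported.
    report["https_handler"] = hasattr(urllib.request, "HTTPSHandler")
    return report


if __name__ == "__main__":
    for key, value in https_support_report().items():
        print(f"{key}: {value}")
```

If ssl_available comes back False, the printed executable path shows which Python installation needs fixing. Comparing that path between a working environment (such as the Jupyter Notebook mentioned above) and the failing one can narrow things down.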