How to handle pandas read_html gracefully when it fails to find a table? Answers

[Question title]: How to deal with pandas read_html gracefully when it fails to find a table?
[Posted]: 2021-11-05 18:03:15
[Question description]:

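pandas read_html is a very fast way to parse tables. However, if it does not find a table with the specified attributes, it raises an error, and that error brings the whole script down.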

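I am trying to scrape thousands of web pages, and it is very annoying when a single page without the table raises an error and terminates the entire program. Is there a way to catch these errors and let the code keep running instead of stopping?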

import requests
import pandas as pd
from io import StringIO

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)
wiki_table = pd.read_html(StringIO(req.text), attrs={"class": "infobox vcard"})
df = wiki_table[0]

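This makes the whole script fail. How do I handle it? I feel it has something to do with exception handling or error catching, but I am not familiar with Python or with how to do that.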

[Question comments]:

Tags: python pandas web-scraping exception wikipedia

[Solution 1]:

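Wrap pd.read_html in a try ... except ... exception handler. When no table matches the given attrs, read_html raises a ValueError ("No tables found ..."), so catch that exception specifically. Also pass the page's HTML (req.text, wrapped in StringIO) rather than the requests Response object itself: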

import requests
import pandas as pd
from io import StringIO

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)

wiki_table = None
try:
    # read_html needs the HTML text (here wrapped in StringIO), not the Response object
    wiki_table = pd.read_html(StringIO(req.text), attrs={"class": "infobox vcard"})
except ValueError as e:  # catch only the "No tables found" error, not every exception
    print(str(e))  # optional but useful

if wiki_table:
    df = wiki_table[0]

    # rest of your code
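Applied to the thousands-of-pages case from the question, the same idea goes inside the loop: each page gets its own try/except, so a page without the table is skipped and recorded instead of stopping the run. This is a minimal sketch under stated assumptions: the page HTML is assumed to be downloaded already as (name, html) pairs, and the helper names first_matching_table and scrape_many are hypothetical. The reader parameter defaults to pd.read_html and exists only so the functions can be exercised without a network connection or an HTML parser.

```python
from io import StringIO

import pandas as pd


def first_matching_table(html, attrs, reader=pd.read_html):
    """Return the first table matching `attrs`, or None if the page has none.

    Hypothetical helper. pd.read_html raises ValueError when no table
    matches, so only that exception is treated as "no table here";
    anything else (a genuine bug) still propagates.
    """
    try:
        tables = reader(StringIO(html), attrs=attrs)
    except ValueError:
        return None
    return tables[0]


def scrape_many(pages, attrs, reader=pd.read_html):
    """Collect one table per page and list the pages that had none.

    Hypothetical helper. `pages` is assumed to be an iterable of
    (name, html) pairs that were downloaded beforehand.
    """
    found = {}
    missing = []
    for name, html in pages:
        table = first_matching_table(html, attrs, reader=reader)
        if table is None:
            missing.append(name)  # skip this page and keep going
        else:
            found[name] = table
    return found, missing
```

Calling scrape_many(pages, {"class": "infobox vcard"}) returns the tables that were found plus a list of page names to inspect later. Catching ValueError rather than every exception keeps genuine programming errors visible instead of silently turning them into "no table".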

[Comments]:

[Solution 2]:

Use try/except for this:

import requests
import pandas as pd
from io import StringIO

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)
try:
    # No error
    wiki_table = pd.read_html(StringIO(req.text), attrs={"class": "infobox vcard"})
except ValueError:
    # Error
    print("Error")
else:
    df = wiki_table[0]
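One reason to prefer except ValueError over a bare except: in a long scraping run is that a bare except catches every BaseException, including KeyboardInterrupt (Ctrl+C) and SystemExit, so the script can no longer be stopped cleanly. The toy functions below (hypothetical names, no pandas involved) show the difference.

```python
def bare_handler(step):
    """Run step() with a bare except, which catches every BaseException."""
    try:
        step()
    except:  # noqa: E722 - bare on purpose, to show what it catches
        return "swallowed"
    return "ok"


def narrow_handler(step):
    """Run step() catching only ValueError (what read_html raises for no table)."""
    try:
        step()
    except ValueError:
        return "no table"
    return "ok"


def no_table():
    raise ValueError("No tables found")


def user_pressed_ctrl_c():
    raise KeyboardInterrupt
```

With the narrow handler, a page without the table is reported as "no table", while Ctrl+C still propagates and stops the loop.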

[Comments]:

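The # No error comment in the try clause is a bit misleading: an exception can be raised in that part, and it is the except clause that catches the error.
Thanks, but I don't think this affects the code.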
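The point about the try clause can be made concrete: code in try may or may not raise, and an else clause runs only when it did not. The small trace below (hypothetical function names, plain Python) records which clauses run in each case, which is why a line like df = wiki_table[0] belongs in an else clause or behind a check rather than unconditionally after the whole try statement.

```python
def trace(step):
    """Return the list of clauses that ran for one call of step()."""
    ran = []
    try:
        ran.append("try")
        step()
    except ValueError:
        ran.append("except")
    else:
        ran.append("else")  # only reached if step() did not raise
    finally:
        ran.append("finally")  # always reached
    return ran


def ok():
    return None


def fails():
    raise ValueError("No tables found")
```

Here trace(ok) returns ["try", "else", "finally"] and trace(fails) returns ["try", "except", "finally"].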