How to extract multiple tables from HTML in Python: answers

【Question title】: How to extract multiple tables from HTML in Python
【Posted】: 2021-06-15 01:32:29
【Question】:

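I want to extract all the data from the security bulletin tables at https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html. With my code I can only extract the tables one at a time; it cannot extract the data from all of the tables at once.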

Here is my code:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html"
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())
gdp = soup.find_all("table")  # every <table> on the page

table = gdp[0]  # only the first table is processed
body = table.find_all("tr")
head = body[0]  # the first row holds the column headings
body_rows = body[1:]

headings = []
for item in head.find_all("td"):
    headings.append(item.text.rstrip("\n"))

all_rows = []  # a list of lists, one inner list per row
for body_row in body_rows:  # one row at a time
    row = []  # the cell values for this row
    for row_item in body_row.find_all("td"):
        # remove non-breaking spaces, newlines and commas
        row.append(re.sub("(\xa0)|(\n)|,", "", row_item.text))
    all_rows.append(row)

df = pd.DataFrame(data=all_rows, columns=headings)
df.to_csv('C:/Users//AdobeAir-APSB16-23 Security Update Available for Adobe AIR.csv')
df.head()

The output of the code is:

Bulletin ID Date Published  Priority
0   APSB21-13   February 09 2021    3

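For this code I imported the BeautifulSoup, requests, pandas and re libraries. I hope someone can show me how to extract the data from all the tables at once and convert it to CSV format. Thanks.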
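One way to generalize the code above is to apply the same row-by-row logic to every element of soup.find_all("table") instead of only gdp[0]. The sketch below assumes each table's first row is its header and that no cell uses colspan or rowspan; the helper names table_to_frame and all_tables are hypothetical, and it uses the built-in "html.parser" so lxml is not required.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_frame(table):
    """Turn one <table> element into a DataFrame, using its first row as the header.

    Assumes every row has the same number of cells (no colspan/rowspan).
    """
    rows = table.find_all("tr")
    if not rows:
        return pd.DataFrame()
    # Collect <th> and <td> cells alike; " ".join(text.split()) collapses
    # newlines and non-breaking spaces (\xa0) into single spaces.
    cells = [
        [" ".join(cell.get_text().split()) for cell in row.find_all(["th", "td"])]
        for row in rows
    ]
    header, body = cells[0], cells[1:]
    return pd.DataFrame(body, columns=header)


def all_tables(html):
    """Return a list with one DataFrame per <table> in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [table_to_frame(table) for table in soup.find_all("table")]
```

For the Adobe page you would call all_tables(requests.get(url).text) and then save the resulting list however suits you. Unlike the question's regex, this version keeps commas, so "February 09, 2021" stays intact.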

【Comments on the question】:

Tags: python pandas dataframe beautifulsoup

【Answer 1】:

You can let pandas do the heavy lifting for you with read_html:

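import pandas as pd

url = 'https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html'
dfs = pd.read_html(url, header=0)  # one DataFrame per <table>; header=0 uses each table's first row as the header
dfs[1]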

Output:

             Product  Affected Versions           Platform
0  Adobe Dreamweaver               20.2  Windows and macOS
1  Adobe Dreamweaver               21.0  Windows and macOS

P.S. read_html returns a list of every table it finds in the HTML. For example, dfs[0] is the first table:

  Bulletin ID     Date Published  Priority
0   APSB21-13  February 09, 2021         3
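Because read_html returns an ordinary Python list, saving every table is just a loop over it. A minimal sketch that writes each DataFrame to its own CSV file; the function name save_tables and the table_0.csv, table_1.csv, ... naming scheme are hypothetical choices.

```python
from pathlib import Path

import pandas as pd


def save_tables(dfs, out_dir):
    """Write each DataFrame in dfs to out_dir/table_<i>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, df in enumerate(dfs):
        path = out / f"table_{i}.csv"
        df.to_csv(path, index=False)  # index=False drops the 0, 1, ... row labels
        paths.append(path)
    return paths
```

Usage would be save_tables(dfs, "adobe_tables") right after the read_html call.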

【Comments】:

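Thanks for your reply. Nice, the code is much simpler than before! I want the data from all the tables in a single CSV file, so I added a bit more code to see the combined data:

pd.concat([dfs[0], dfs[1], dfs[2], dfs[3]], ignore_index=True).to_csv('C:/Users/MY-PC/test.csv')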
Output:

Bulletin ID  Date Published  Priority  Product  Affected Versions  Platform  Updated Version  Priority rating  Vulnerability Category  Vulnerability Impact  Severity  CVE Numbers
0  APSB21-13  February 09, 2021  3
1  Adobe Dreamweaver  20.2  Windows and macOS
2  Adobe Dreamweaver  21  Windows and macOS
3  Adobe Dreamweaver  Windows and macOS  20.2.1  3
4  Adobe Dreamweaver  Windows and macOS  21.1  3
5  Uncontrolled Search Path Element  Information disclosure  Important  CVE-2021-21055

Hope this is useful to others too.
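pd.concat stacks the rows and takes the union of all column names, so each row only fills the columns of the table it came from and the rest are NaN, which is why the combined output above is so sparse. If the goal is simply one file to read, an alternative sketch is to write each table with its own header row into the same CSV, separated by blank lines. The function name write_tables_to_one_csv is hypothetical, and the result is a human-readable dump rather than a single rectangular table.

```python
import os

import pandas as pd


def write_tables_to_one_csv(dfs, path):
    """Write each DataFrame, with its own header row, into one CSV file,
    separating consecutive tables with a blank line."""
    # newline="" stops Python from translating line endings a second time,
    # since to_csv already writes os.linesep line terminators to the handle.
    with open(path, "w", newline="", encoding="utf-8") as f:
        for i, df in enumerate(dfs):
            if i:
                f.write(os.linesep)  # blank line between tables
            df.to_csv(f, index=False)
```

With the list from read_html this would be write_tables_to_one_csv(dfs, 'C:/Users/MY-PC/test.csv').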