【Question title】:How to extract multiple tables from HTML in Python
【Posted】:2021-06-15 01:32:29
【Question】:

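I want to extract all the data from the tables in the security bulletin at https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html. With my code I can only extract the tables one at a time; it cannot pull the data from all the tables in one go.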

Here is my code:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumption: html_content is fetched with requests (this step is not shown in the original post)
url = "https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html"
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())
gdp = soup.find_all("table")

table = gdp[0]  # only the first table is processed
body = table.find_all("tr")
head = body[0]
body_rows = body[1:]

headings = []
for item in head.find_all("td"):
    item = (item.text).rstrip("\n")
    headings.append(item)

all_rows = []  # will be a list of lists, one per row
for row_num in range(len(body_rows)):  # a row at a time
    row = []  # this will hold the entries for one row
    for row_item in body_rows[row_num].find_all("td"):
        aa = re.sub("(\xa0)|(\n)|,", "", row_item.text)
        row.append(aa)
    all_rows.append(row)

df = pd.DataFrame(data=all_rows, columns=headings)
df.to_csv('C:/Users//AdobeAir-APSB16-23 Security Update Available for Adobe AIR.csv')
df.head()

The output of the code is:

Bulletin ID Date Published  Priority
0   APSB21-13   February 09 2021    3

For this code I imported BeautifulSoup, requests, pandas and re. I hope someone can help me extract the data from all the tables at once and convert it to CSV. Thanks.

【Comments】:

    Tags: python pandas dataframe beautifulsoup


    【Solution 1】:

    You can let pandas do the heavy lifting for you with read_html:

    import pandas as pd

    url = 'https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html'
    dfs = pd.read_html(url, header=0)  # one DataFrame per <table>; the first row is used as the header
    dfs[1]
    

    Output:

                 Product  Affected Versions           Platform
    0  Adobe Dreamweaver               20.2  Windows and macOS
    1  Adobe Dreamweaver               21.0  Windows and macOS
    

    P.S. It returns a list of all the tables found in the HTML. For example, dfs[0] is the first table:

      Bulletin ID     Date Published  Priority
    0   APSB21-13  February 09, 2021         3
    
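If you would rather stay with BeautifulSoup, as in the question, the same idea is to loop over every table instead of indexing gdp[0]. The following is only a minimal sketch under stated assumptions: the inline html_content is a made-up stand-in for the bulletin page, table_to_frame is a hypothetical helper name, and each table is assumed to have a header row followed by rows with the same number of cells. It uses the standard-library "html.parser" backend so the example does not depend on lxml.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the bulletin page: two small tables.
html_content = """
<table>
  <tr><td>Bulletin ID</td><td>Priority</td></tr>
  <tr><td>APSB21-13</td><td>3</td></tr>
</table>
<table>
  <tr><td>Product</td><td>Affected Versions</td></tr>
  <tr><td>Adobe Dreamweaver</td><td>20.2</td></tr>
  <tr><td>Adobe Dreamweaver</td><td>21.0</td></tr>
</table>
"""


def table_to_frame(table):
    """Turn one <table> tag into a DataFrame, using its first row as the header."""
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]
    return pd.DataFrame(rows[1:], columns=rows[0])


soup = BeautifulSoup(html_content, "html.parser")

# Build one DataFrame per table, skipping any table that has no rows at all.
frames = [table_to_frame(t) for t in soup.find_all("table") if t.find("tr")]

for i, frame in enumerate(frames):
    print(f"table {i}: {frame.shape[0]} rows, columns {list(frame.columns)}")
```

Each frame can then be saved on its own, for example with frame.to_csv(f"table_{i}.csv", index=False), where the file name is only a placeholder. All cell values come back as strings here, whereas read_html tries to infer numeric types.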

    【Comments】:

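    • Thanks for your reply. Great! The code is much simpler than before. I wanted the data from all the tables in a single CSV file, so I added a bit more code to see the combined data: pd.concat([dfs[0], dfs[1], dfs[2], dfs[3]], ignore_index=True).to_csv('C:/Users/MY-PC/test.csv')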
    • Output (header line, then rows 0 to 5 as pasted, with empty cells not shown):
      Bulletin ID Date Published Priority Product Affected Versions Platform Updated Version Priority rating Vulnerability Category Vulnerability Impact Severity CVE Numbers
      0 APSB21-13 February 09, 2021 3
      1 Adobe Dreamweaver 20.2 Windows and macOS
      2 Adobe Dreamweaver 21 Windows and macOS
      3 Adobe Dreamweaver Windows and macOS 20.2.1 3
      4 Adobe Dreamweaver Windows and macOS 21.1 3
      5 Uncontrolled Search Path Element Information disclosure Important CVE-2021-21055
      Hope this is useful to others too.
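
Because the tables share few columns, the concatenated CSV above is mostly empty cells and no longer shows which table a row came from. One possible refinement, sketched here under assumptions, is to pass keys= to pd.concat so each row carries the index of its source table. The two small DataFrames are made-up stand-ins for the list returned by pd.read_html, and the "table" and "row" column names are arbitrary choices.

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames returned by pd.read_html.
dfs = [
    pd.DataFrame({"Bulletin ID": ["APSB21-13"], "Priority": [3]}),
    pd.DataFrame({
        "Product": ["Adobe Dreamweaver", "Adobe Dreamweaver"],
        "Affected Versions": ["20.2", "21.0"],
    }),
]

# keys= labels each block with the position of the table it came from;
# names= gives the two resulting index levels readable names.
combined = pd.concat(dfs, keys=list(range(len(dfs))), names=["table", "row"])

# Move the labels into ordinary columns so they survive in a flat CSV.
combined = combined.reset_index()

csv_text = combined.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path to to_csv. Cells that a given table does not define are left empty in the CSV, as in the output above.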