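【Posted】: 2021-06-15 01:32:29
【Question】:
I want to extract all the data from the security bulletin tables in the HTML at https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html. With my code I can only extract the data from the tables one at a time; it does not extract all of the data in one pass.
Here is my code: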
import re
import pandas as pd
from bs4 import BeautifulSoup

# html_content holds the page source fetched earlier (e.g. with requests)
soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())
gdp = soup.find_all("table")
table = gdp[0]  # only the first table on the page is selected here
body = table.find_all("tr")
head = body[0]
body_rows = body[1:]
headings = []
for item in head.find_all("td"):
    item = item.text.rstrip("\n")
    headings.append(item)
all_rows = []  # will be a list of lists, one per row
for row_num in range(len(body_rows)):  # one row at a time
    row = []  # this will hold the entries for one row
    for row_item in body_rows[row_num].find_all("td"):
        # strip non-breaking spaces, newlines and commas from the cell text
        aa = re.sub("(\xa0)|(\n)|,", "", row_item.text)
        row.append(aa)
    all_rows.append(row)
df = pd.DataFrame(data=all_rows, columns=headings)
df.to_csv('C:/Users//AdobeAir-APSB16-23 Security Update Available for Adobe AIR.csv')
df.head()
The output of the code is:
  Bulletin ID    Date Published Priority
0   APSB21-13  February 09 2021        3
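For this code I imported the BeautifulSoup, requests, pandas and re libraries. I would appreciate any help with extracting the data from all the tables at once and converting it to CSV. Thank you.
【Discussion】:
Tags: python pandas dataframe beautifulsoup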
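One possible direction, offered as a minimal sketch rather than a definitive fix: the posted code indexes `gdp[0]`, so only the first table is ever processed. Looping over every element returned by `soup.find_all("table")` would build one DataFrame per table. The names `SAMPLE_HTML`, `clean` and `tables_to_frames` are hypothetical, the inline HTML (including "Example Product") is a made-up stand-in for the downloaded page, and the sketch assumes every table uses its first row as the header and has the same number of cells in each row. It uses the built-in "html.parser" backend instead of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in HTML with two tables (an assumption for illustration);
# in practice this would be the page source held in html_content.
SAMPLE_HTML = """
<table>
  <tr><td>Bulletin ID</td><td>Date Published</td><td>Priority</td></tr>
  <tr><td>APSB21-13</td><td>February 09, 2021</td><td>3</td></tr>
</table>
<table>
  <tr><td>Product</td><td>Affected Versions</td><td>Platform</td></tr>
  <tr><td>Example Product</td><td>1.0</td><td>Windows and macOS</td></tr>
</table>
"""


def clean(text):
    """Turn non-breaking spaces into spaces and collapse runs of whitespace."""
    return " ".join(text.replace("\xa0", " ").split())


def tables_to_frames(html):
    """Return one DataFrame per <table>, treating the first row as the header."""
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    for table in soup.find_all("table"):
        rows = [
            [clean(cell.get_text()) for cell in tr.find_all(["td", "th"])]
            for tr in table.find_all("tr")
        ]
        rows = [r for r in rows if r]  # drop rows that contain no cells
        if len(rows) < 2:
            continue  # skip tables with no data rows
        frames.append(pd.DataFrame(rows[1:], columns=rows[0]))
    return frames


frames = tables_to_frames(SAMPLE_HTML)

# to_csv() without a path returns the CSV text; pass a file path to write to disk.
csv_texts = [frame.to_csv(index=False) for frame in frames]
```

If a single CSV file is wanted rather than one per table, the frames could be written one after another to the same file, since tables with different columns do not combine cleanly into one DataFrame.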