【Posted】: 2021-08-01 05:33:47
【Question】:
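I am using bs4 in Python. I want to take the contents of a Python list and insert them into HTML with bs4, so that the resulting HTML table can be published to a web page with the requests.put() method. In the HTML, each row is wrapped in the tag: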
<tr></tr>
Each cell, that is, the one data element of a row that belongs to a given column, is represented by the tag:
<td></td>
So each data element goes inside a td tag and is wrapped in a p tag, for example:
<tr><td><p>data 1 in cell 1</p></td><td><p>data 2 in cell 2</p></td></tr>
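The nesting described above can be sketched with bs4 as follows. This is a minimal illustration rather than the code from the question: `build_row` is a hypothetical helper name, and the sketch assumes the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup


def build_row(cells):
    """Return a <tr> string with one <td><p>...</p></td> per value (hypothetical helper)."""
    soup = BeautifulSoup("<tr></tr>", "html.parser")
    tr = soup.tr
    for value in cells:
        td = soup.new_tag("td")  # one cell per value
        p = soup.new_tag("p")    # the paragraph that holds the text
        p.string = value
        td.append(p)             # <p> goes inside its own <td>
        tr.append(td)            # <td> goes inside the row
    return str(tr)


row_html = build_row(["data 1 in cell 1", "data 2 in cell 2"])
print(row_html)
```

Keeping a reference to each new `td` and `p` means the text always lands in the tag that was just created.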
The data that should go into the HTML table is a list that looks like this:
rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]
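So each element of the list corresponds to one row, and each row is split into cells on "````", so that 1 goes into the first cell and Jam goes into the third cell of the first row. The HTML table string should start with a table header and end with a table footer, as shown below: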
html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"
html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
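The splitting of one list element into its cells, as described above, can be sketched like this. `DELIMITER` and `split_row` are hypothetical names introduced only for illustration; the sample row is the first element of the `rows` list.

```python
DELIMITER = "````"  # the four-backtick separator used in the rows list


def split_row(row):
    """Split one list element into its cell values (hypothetical helper)."""
    return row.split(DELIMITER)


row = DELIMITER.join([
    "1",
    "Mon, 22 Feb 2021 13:44:27 -0800",
    "Jam",
    "IAP-5998",
    "10004",
    "Model Observing a ModelIPCException",
    "1ba4416fdd7",
])
cells = split_row(row)
print(len(cells), cells[0], cells[2])
```

With the sample row, the first cell is `1` and the third cell is `Jam`, which matches the layout described in the question.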
So the complete HTML that makes up the table should look like this:
<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr><tr><td><p>1</p></td><td><p>Mon, 22 Feb 2021 13:44:27 -0800</p></td><td><p>Jam</p></td><td><p>IAP-5998</p></td><td><p>10004</p></td><td><p>Model Observing a ModelIPCException</p></td><td><p>1ba4416fdd7</p></td></tr><tr><td><p>2</p></td><td><p>Mon, 30 Feb 2021 13:44:27 -0800</p></td><td><p>Rizwan</p></td><td><p>IAP-6998</p></td><td><p>10014</p></td><td><p>Model Observing</p></td><td><p>1ba4416fdd7</p></td></tr>....................................Other elements in list according to rows go here.............</tbody></table><p class=\"auto-cursor-target\"><br /></p>
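For comparison, the same header + rows + footer assembly can be sketched without bs4, using plain string formatting and `html.escape` from the standard library. This is a swapped-in technique, not the approach the question asks about. `row_to_html` and `build_table` are hypothetical names, and the header, footer, and rows are shortened stand-ins for the real ones.

```python
from html import escape

DELIMITER = "````"  # separator between cell values in each row string


def row_to_html(row):
    """Turn one delimited row string into a <tr> element (hypothetical helper)."""
    cells = row.split(DELIMITER)
    return "<tr>" + "".join(f"<td><p>{escape(c)}</p></td>" for c in cells) + "</tr>"


def build_table(header, rows, footer):
    """Concatenate header, all row elements, and footer (hypothetical helper)."""
    return header + "".join(row_to_html(r) for r in rows) + footer


# Shortened stand-ins for html_table_header, html_table_footer, and rows.
header = "<table><tbody><tr><th><p>No.</p></th><th><p>Author</p></th></tr>"
footer = "</tbody></table>"
rows = ["1````Jam", "2````Rizwan"]

table_html = build_table(header, rows, footer)
print(table_html)
```

Escaping each value matters because a title containing `<` or `&` would otherwise be interpreted as markup.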
This is the code I used:
import re
import sys
import requests
import json
from requests.auth import HTTPBasicAuth
from bs4 import BeautifulSoup
html_table_header = "<p><br /></p><table><colgroup><col style=\"width: 115.0px;\" /><col style=\"width: 95.0px;\" /><col style=\"width: 58.0px;\" /><col style=\"width: 105.0px;\" /><col style=\"width: 110.0px;\" /><col style=\"width: 215.0px;\" /><col style=\"width: 215.0px;\" /></colgroup><tbody><tr><th><p>No.</p></th><th><p>Date and Time</p></th><th><p>Author</p></th><th><p>Jira</p></th><th><p>PR</p></th><th><p>Title</p></th><th><p>Commit ID</p></th></tr>"
html_table_footer = "</tbody></table><p class=\"auto-cursor-target\"><br /></p>"
rows = ["1" + "````" + "Mon, 22 Feb 2021 13:44:27 -0800" + "````" + "Jam" + "````" + "IAP-5998" + "````" + "10004" + "````" + "Model Observing a ModelIPCException" + "````" + "1ba4416fdd7", "2" + "````" + "Mon, 30 Feb 2021 13:44:27 -0800" + "````" + "Rizwan" + "````" + "IAP-6998" + "````" + "10014" + "````" + "Model Observing." + "````" + "3ba4416fdd7", "3" + "````" + "Fri, 20 Mar 2021 13:44:27 -0800" + "````" + "John" + "````" + "ATL-5998" + "````" + "10456" + "````" + "Exception during JumpToROM function call." + "````" + "8ca4416fdd7", "4" + "````" + "Mon, 14 Feb 2021 13:44:27 -0800" + "````" + "Brock Lesnar" + "````" + "IAP-6005" + "````" + "10009" + "````" + "RAM flushing JumpToROM function call." + "````" + "1ba4416fd10"]
row_string = ""
for idx in range(0, len(rows)):
    soup = BeautifulSoup("<tr></tr>", 'html.parser')
    for cell_id in range(0, 7):
        original_tag = soup.tr
        new_tag = soup.new_tag("td")
        original_tag.append(new_tag)
        p_tag = soup.new_tag("p")
        original_tag.td.next_sibling.append(p_tag)
        original_tag.p.string = rows[idx].split("````")[cell_id]
    row_string += str(original_tag)
pass_str = html_table_header + row_string + html_table_footer
pass_string = str(pass_str).replace('\"', '\\"')
headers = {
    'Content-Type': 'application/json',
}
data = '{"id":"534756378","type":"page", "title":"GL_Engine Output","space":{"key":"CSSAI"},"body":{"storage":{"value":"' + pass_string + '","representation":"storage"}}, "version":{"number":2}}'
response = requests.put('https://confluence.ai.com/rest/api/content/534756378', headers=headers, data=data,
                        auth=HTTPBasicAuth('svc-Automation@ai.com', 'AIengineering1@ai'))
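But with my code, only the first value of each list element (the numbers 1, 2, 3, and so on) ends up in the correct cell; the remaining values are also inserted into the first column. The table therefore looks wrong once it is published to the site: only the header row is correct, and all the other values are squeezed together in the first column. I inspected the HTML that the REST API published to my page, and it looks wrong, as shown in the attached image.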
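One possible cause, offered as a hedged reading of the loop rather than a confirmed diagnosis: in bs4, attribute access such as `original_tag.td` or `original_tag.p` is shorthand for `find()`, so it returns the first matching descendant, not the tag created in the current iteration. The sketch below first demonstrates that behavior, then keeps a reference to each new `<td>` and `<p>`, and finally builds the request body with `json.dumps` so that quotes are escaped automatically. `build_rows` and `build_payload` are hypothetical helper names, the header and footer are shortened stand-ins, and no request is sent.

```python
import json

from bs4 import BeautifulSoup

DELIMITER = "````"

# Attribute access returns the FIRST match, so this edits only the first <p>.
demo = BeautifulSoup("<tr><td><p>a</p></td><td><p>b</p></td></tr>", "html.parser")
demo.tr.p.string = "changed"
demo_html = str(demo.tr)


def build_rows(rows):
    """Return the concatenated <tr> strings for all rows (hypothetical helper)."""
    parts = []
    for row in rows:
        soup = BeautifulSoup("<tr></tr>", "html.parser")
        tr = soup.tr
        for value in row.split(DELIMITER):
            td = soup.new_tag("td")
            p = soup.new_tag("p")
            p.string = value  # write to the tag just created, not to tr.p
            td.append(p)
            tr.append(td)
        parts.append(str(tr))
    return "".join(parts)


def build_payload(page_id, title, space_key, html, version):
    """Serialize the page update as JSON (hypothetical helper)."""
    return json.dumps({
        "id": page_id,
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": html, "representation": "storage"}},
        "version": {"number": version},
    })


# Shortened stand-ins for the real header, footer, and rows.
header = "<table><tbody>"
footer = "</tbody></table>"
rows = ["1````Jam", "2````Rizwan"]

payload = build_payload("534756378", "GL_Engine Output", "CSSAI",
                        header + build_rows(rows) + footer, 2)
print(payload)
```

The resulting `payload` string could then be passed as `data=payload` to `requests.put` with the same headers and authentication as in the question, which removes the need for the manual `replace('\"', '\\"')` step.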
【Discussion】:
Tags: python html beautifulsoup html-table python-requests