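[Posted]: 2021-07-05 10:51:55
[Question]:
I have many HTML tables that I am trying to convert to JSON, but my code only works for the first, horizontal table (first image) and not for the second, vertical table (second image).
I have attached my code and sample tables here.
The code I have tried so far: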
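import itertools
from pathlib import Path

from bs4 import BeautifulSoup

html_data = Path("Table2.html").read_text()
# Collect the text of every <td> cell, one list per <tr> row
table_data = [[cell.text for cell in row("td")]
              for row in BeautifulSoup(html_data, features="lxml")("tr")]
json_data = []
for list1 in table_data:
    list1 = [i.replace('\n', '') for i in list1]
    # Pair consecutive cells as key/value; an odd cell count is padded with ""
    dict1 = dict(itertools.zip_longest(*[iter(list1)] * 2, fillvalue=""))
    json_data.append(dict1)
print(json_data)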
Output for the HTML tables above:
[{'Address': '41 B Market street'}, {'City': 'Gujarat'}, {'Postal/Zip Code': '123456'}, {'Product Details': ''}, {'Pallet Dimension': '10" x 10" x 10"'}, {'Total Weight': '1375 LBS'}]
[{'Pickup Location': 'Description', '': ''}, {'Some Address': 'Rubics cube', '': ''}, {}, {'PLTS': 'total weight', 'L': 'W', 'H': ''}, {'1': '20', '40': ''}, {'2': '60', '40': ''}]
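To see why the second output looks the way it does, it may help to isolate the pairing step. The sketch below wraps the `zip_longest` idiom from the code above in a helper called `pair_cells` (a name made up for this illustration) and feeds it hand-typed cell lists rather than parsed HTML. It treats every row as alternating key, value, key, value, which suits a row like `Address | 41 B Market street` but pairs a header row with itself.

```python
import itertools


def pair_cells(cells):
    """Group a flat list of cell texts into consecutive (key, value) pairs."""
    # [iter(cells)] * 2 repeats the SAME iterator twice, so zip_longest
    # pulls two items at a time; a leftover cell is paired with "".
    return dict(itertools.zip_longest(*[iter(cells)] * 2, fillvalue=""))


# A key/value row from the horizontal table pairs up as intended.
horizontal_row = pair_cells(["Address", "41 B Market street"])

# A header row from the vertical table is paired with its own neighbours,
# not with the data rows underneath it.
header_row = pair_cells(["PLTS", "total weight", "L", "W", "H"])

# A data row loses values: the repeated key "40" is overwritten.
data_row = pair_cells(["1", "20", "40", "40", "40", ""])

print(horizontal_row)
print(header_row)
print(data_row)
```

So the pairing itself is working as written; it simply assumes a row-wise key/value layout that the vertical table does not have.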
HTML code for table 2:
<table>
<tbody>
<tr style="height:15.0pt">
<td colspan="2" style="width:130.9pt; border-top:solid windowtext 1.0pt; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid black 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:15.0pt" width="175">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">Pickup Location</span></b></p>
</td>
<td colspan="3" style="width:130.1pt; border-top:solid windowtext 1.0pt; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid black 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:15.0pt" width="173">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">Description</span></b></p>
</td>
<td style="width:1.5pt; padding:0in 0in 0in 0in; height:15.0pt" width="2">
<p class="MsoNormal"></p>
</td>
<td style="width:.3pt; padding:0in 0in 0in 0in; height:15.0pt" width="0"></td>
</tr>
<tr style="height:13.15pt">
<td colspan="2" rowspan="2" style="width:130.9pt; border-top:none; border-left:none; border-bottom:solid black 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="175">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">Some Address</span></b></p>
</td>
<td colspan="3" rowspan="2" style="width:130.1pt; border-top:none; border-left:none; border-bottom:solid black 1.0pt; border-right:solid black 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="173">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">Rubics cube</span></b></p>
</td><td style="width:1.5pt; padding:0in 0in 0in 0in; height:13.15pt" width="2">
<p class="MsoNormal"></p>
</td>
<td style="width:.3pt; padding:0in 0in 0in 0in; height:13.15pt" width="0"></td>
</tr>
<tr style="height:15.75pt">
</tr>
<tr style="height:.3in">
<td style="width:42.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; background:#D9E1F2; padding:0in 5.4pt 0in 5.4pt; height:.3in" width="56" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">PLTS</span></b><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif"></span></b></p>
</td>
<td style="width:88.75pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; background:#D9E1F2; padding:0in 5.4pt 0in 5.4pt; height:.3in" width="118" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">total weight</span></b><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif"></span></b></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; background:#D9E1F2; padding:0in 5.4pt 0in 5.4pt; height:.3in" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">L</span></b><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif"></span></b></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; background:#D9E1F2; padding:0in 5.4pt 0in 5.4pt; height:.3in" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">W</span></b><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif"></span></b></p>
</td>
<td style="width:17.0pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; background:#D9E1F2; padding:0in 5.4pt 0in 5.4pt; height:.3in" width="23" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif; color:black">H</span></b><b><span style="font-size:10.0pt; font-family:"Arial",sans-serif"></span></b></p>
</td>
</tr>
<tr style="height:13.9pt">
<td style="width:42.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.9pt" width="56" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif; color:black">1</span></p>
</td>
<td style="width:88.75pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.9pt" width="118" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">20</span></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.9pt" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.9pt" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
<td style="width:17.0pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.9pt" width="23" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
<td style="width:.3pt; padding:0in 0in 0in 0in; height:13.9pt" width="0"></td>
</tr>
<tr style="height:13.15pt">
<td style="width:42.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="56" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif; color:black">2</span></p>
</td>
<td style="width:88.75pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="118" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">60</span></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
<td style="width:20.15pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="27" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
<td style="width:17.0pt; border-top:none; border-left:none; border-bottom:solid windowtext 1.0pt; border-right:solid windowtext 1.0pt; padding:0in 5.4pt 0in 5.4pt; height:13.15pt" width="23" valign="bottom" nowrap="nowrap">
<p class="MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; font-family:"Arial",sans-serif">40</span></p>
</td>
</tr>
</tbody>
</table>
If the table is horizontal (table 1), the existing output is sufficient:
[{'Address': '41 B Market street'}, {'City': 'Gujarat'}, {'Postal/Zip Code': '123456'}, {'Product Details': ''}, {'Pallet Dimension': '10" x 10" x 10"'}, {'Total Weight': '1375 LBS'}]
If the table is vertical (table 2), the output should look like this:
[{'Pickup address': 'some address'}, {'Description': 'Rubicks cube'}, {'PLTS': ['1','2']}, {'Total weight': ['20','60']}, {'L':['40','40']}, {'W':['40','40']},{'H':['40','40']}]
I have tried changing the code, but it does not work for me. Any suggestions?
[Discussion]:
-
How would you like the output for the second table to be structured? Could you include that in your question?
-
@HenryEcker I have added that to the question.
-
{'PLTS': '1','2'} is not a valid dict in Python. Do you want a string '1,2' or a list ['1', '2']?
-
@HenryEcker A list should be fine.
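One possible direction for the vertical table, given that lists are acceptable, is to read each header cell together with the cells below it instead of pairing neighbours within a row. The sketch below is not a confirmed solution. It works on the `table_data` list of rows from the question, and `rows_to_records` is a name made up for this illustration. It makes two assumptions: trailing empty cells are layout padding, and consecutive rows with the same number of remaining cells form one block whose first row is the header. It ignores `colspan`/`rowspan` and is not meant for the horizontal table.

```python
import itertools


def rows_to_records(table_data):
    """Turn column-oriented rows into a list of {header: value(s)} dicts."""
    cleaned = []
    for row in table_data:
        cells = [cell.strip() for cell in row]
        # Assumption: trailing empty cells are layout padding, not data.
        while cells and cells[-1] == "":
            cells.pop()
        if cells:  # skip rows that end up empty
            cleaned.append(cells)

    records = []
    # Assumption: consecutive rows of equal width belong to one block,
    # and the first row of each block holds the column headers.
    for _, block in itertools.groupby(cleaned, key=len):
        header, *data_rows = block
        for index, name in enumerate(header):
            values = [data_row[index] for data_row in data_rows]
            # A single data row gives a plain string, several give a list.
            records.append({name: values[0] if len(values) == 1 else values})
    return records


# Hand-typed stand-in for the cell texts of table 2.
table_data = [
    ["\nPickup Location\n", "\nDescription\n", "\n\n", ""],
    ["\nSome Address\n", "\nRubics cube\n", "\n\n", ""],
    [],
    ["\nPLTS\n", "\ntotal weight\n", "\nL\n", "\nW\n", "\nH\n"],
    ["\n1\n", "\n20\n", "\n40\n", "\n40\n", "\n40\n", ""],
    ["\n2\n", "\n60\n", "\n40\n", "\n40\n", "\n40\n"],
]
records = rows_to_records(table_data)
print(records)
```

The keys come out exactly as they appear in the HTML (`Pickup Location`, `total weight`), so any renaming to match the desired output would be a separate step. The equal-width heuristic would also misgroup two adjacent blocks that happen to have the same number of columns.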
Tags: python html json beautifulsoup