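【Posted】: 2017-04-03 03:33:35
【Question】:
This is a follow-up to: Iterate over python dictionary to retrieve only required rows
My HTML is being reformatted by an external application, as shown below. When I process this HTML input with the following code: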
from xml.etree import ElementTree as ET
s = """<table class="darshan" style="width: 290px;">
<thead>
<tr>
<th style="background-color: #efefef; width: 55px;">Release</th>
<th style="background-color: #efefef; width: 63px;">REFDB</th>
<th style="background-color: #efefef; width: 151px;">URL</th>
</tr>
</thead>
<tbody>
<tr>
<td style="width: 55px;">3.7.3</td>
<td style="width: 63px;">
<p>12345</p>
<p>232323</p>
<p>4343454</p>
<p>5454554</p>
</td>
<td style="width: 151px;">
<p><a class="jive-link-external-small" href="http://google.com" rel="nofollow">http://google.com</a>
</p>
<p><a class="jive-link-external-small" href="http://test12213.com" rel="nofollow">http://test12213.com</a>
</p>
</td>
</tr>
<tr>
<td style="width: 55px;">3.7.4</td>
<td style="width: 63px;">
<p>456789</p>
<p>54545</p>
<p>5454545</p>
<p>545454</p>
</td>
<td style="width: 151px;"><a class="jive-link-external-small" href="http://foo.com" rel="nofollow">http://foo.com</a>
</td>
</tr>
</tbody>
</table>
"""
def find_version(ver):
    table = ET.XML(s)
    rows = iter(table)
    headers = [col.text for col in next(rows)]
    for row in rows:
        values = [col.text for col in row]
        out = dict(zip(headers, values))
        if out['Release'] == ver:
            return out
    return None

res = find_version('3.7.3')
if res:
    for x in res.items():
        print(' - '.join(x))
else:
    print('Version not found')
I get the following output:
trs: [<Element 'th' at 0x0431CDE0>, <Element 'th' at 0x0431CE40>, <Element 'th' at 0x0431CEA0>]
ths: []
tds: []
out: OrderedDict()
Traceback (most recent call last):
  File "parse_html.py", line 141, in <module>
    res = find_version(ver)
  File "parse_html.py", line 136, in find_version
    if out['Release'] == ver:
KeyError: 'Release'
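A likely cause of the KeyError is that the reformatted table now wraps its rows in thead and tbody, so iterating over the table no longer yields the tr rows directly, and the REFDB and URL cells now hold nested p and a elements whose text is not in col.text. The sketch below is one possible way to handle that layout, not a confirmed fix. SAMPLE is a shortened stand-in for the HTML in the question, and cell_text and find_version_sketch are hypothetical names introduced only for this illustration.

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the question's table (assumption: same layout,
# with rows wrapped in <thead>/<tbody> and values nested in <p>/<a>).
SAMPLE = """<table>
<thead>
<tr><th>Release</th><th>REFDB</th><th>URL</th></tr>
</thead>
<tbody>
<tr>
<td>3.7.3</td>
<td>
<p>12345</p>
<p>232323</p>
</td>
<td>
<p><a href="http://google.com">http://google.com</a>
</p>
</td>
</tr>
<tr>
<td>3.7.4</td>
<td>
<p>456789</p>
</td>
<td><a href="http://foo.com">http://foo.com</a>
</td>
</tr>
</tbody>
</table>"""


def cell_text(cell):
    """Join every non-blank text fragment inside a cell, nested tags included."""
    return ' '.join(t.strip() for t in cell.itertext() if t.strip())


def find_version_sketch(markup, ver):
    """Return the row whose Release column equals ver, or None."""
    table = ET.XML(markup)
    # Address header and body rows by path instead of relying on child order.
    headers = [cell_text(th) for th in table.findall('./thead/tr/th')]
    for row in table.findall('./tbody/tr'):
        values = [cell_text(td) for td in row.findall('td')]
        out = dict(zip(headers, values))
        # .get() avoids a KeyError if the header row could not be read.
        if out.get('Release') == ver:
            return out
    return None


res = find_version_sketch(SAMPLE, '3.7.3')
if res:
    for key, value in res.items():
        print(key, '-', value)
else:
    print('Version not found')
```

Joining the fragments with a space is only one choice; returning a list per cell may suit the multi-value REFDB column better.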
【Comments】:
- Use print() to see what is in the variables. It helps to find errors.
- @furas - I have updated the HTML as you asked in the original question.
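Following the print() suggestion, a minimal sketch of that kind of inspection might look like this. MINI is a hypothetical one-row stand-in for the table in the question, used only to show what iterating over the parsed table yields.

```python
from xml.etree import ElementTree as ET

# Hypothetical one-row stand-in for the question's table.
MINI = ("<table><thead><tr><th>Release</th></tr></thead>"
        "<tbody><tr><td>3.7.3</td></tr></tbody></table>")

table = ET.XML(MINI)

# Direct children of <table>: the grouping elements, not the rows.
top_level = [child.tag for child in table]
print('top level:', top_level)

# What the first iteration step contains: a <tr>, not the <th> cells.
first_group = next(iter(table))
first_group_tags = [child.tag for child in first_group]
print('inside first child:', first_group_tags)

# The "headers" collected this way are the text of <tr>, never 'Release'.
header_texts = [child.text for child in first_group]
print('header texts:', header_texts)
```

Printing the tags at each level shows where the header cells actually sit before any dictionary is built from them.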
Tags: python python-3.x dictionary xpath elementtree