【Posted】: 2021-02-18 11:33:35
【Question】:
I'm stuck trying to extract information from HTML with BeautifulSoup. I obtained the HTML snippet below with the following steps:
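import requests
from bs4 import BeautifulSoup
# url and headers are defined elsewhere (not shown in the question)
result = requests.get(url, headers=headers)
soup = BeautifulSoup(result.text, 'lxml')
tably = soup.find("table", id="table4")
last_row = tably.find_all('tr')[-1]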
Now I would like to get the following output:
Classification: Mass murderer
Characteristics: Militant Al-Takfir wa al-Hijran (Renunciation and Exile) faction
Number of victims: 23
Sample HTML:
<tr>
<td style="font-size: 8pt; color: #000000" width="100%">
<font color="#000000" face="Verdana">
Classification: <b>Mass murderer</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Characteristics: <b>Militant Al-Takfir wa
al-Hijran </b>(Renunciation and Exile)<b> faction</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Number of victims: <b>23</b></font></td>
</tr>
</font>
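One possible way to get that output, as a minimal sketch rather than a definitive answer: take the text of each `td` with `get_text()` and collapse the whitespace, so that the pieces split across `<b>` tags and line breaks are joined into one line. The sketch assumes each `td` holds exactly one "label: value" pair and that the inner tags are `<font>` elements (inferred from the closing `</font>` tags). The `html` string and the `extract_lines` helper are hypothetical names used for illustration, and `html.parser` is used here only so that the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question; the real page may differ.
html = """
<table id="table4">
<tr>
<td style="font-size: 8pt; color: #000000" width="100%">
<font color="#000000" face="Verdana">
Classification: <b>Mass murderer</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Characteristics: <b>Militant Al-Takfir wa
al-Hijran </b>(Renunciation and Exile)<b> faction</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Number of victims: <b>23</b></font></td>
</tr>
</table>
"""


def extract_lines(markup):
    """Return one whitespace-normalized text line per non-empty <td> (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    lines = []
    for td in soup.find_all("td"):
        # get_text() concatenates the text of all nested tags;
        # split() + join() collapses newlines and repeated spaces.
        text = " ".join(td.get_text().split())
        if text:
            lines.append(text)
    return lines


for line in extract_lines(html):
    print(line)
```

Applied to the code in the question, the same loop would presumably run over `tably.find_all("td")` instead of only the last row, since `last_row` holds a single `tr`.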
【Comments】:
Tags: web-scraping beautifulsoup tags