【Question title】: Extract specific text in a tr with BeautifulSoup
【Posted】: 2021-02-18 11:33:35
【Question description】:

I'm stuck trying to get information out of some HTML with BeautifulSoup. I extracted the HTML fragment below with the following steps:

import requests
from bs4 import BeautifulSoup

# url and headers are defined elsewhere
result = requests.get(url, headers=headers)
soup = BeautifulSoup(result.text, 'lxml')
tably = soup.find("table", id="table4")
last_row = tably.find_all('tr')[-1]
    

Now I would like to get the following output:

Classification: Mass murderer
Characteristics: Militant Al-Takfir wa al-Hijran (Renunciation and Exile) faction
Number of victims: 23

Sample HTML:

    <tr>
      <td style="font-size: 8pt; color: #000000" width="100%">
        <font color="#000000" face="Verdana">
        Classification: <b>Mass murderer</b></font></td>
    </tr>
    <tr>
      <td width="100%" style="font-size: 8pt; color: #000000">
        <font style="font-size: 8pt" color="#000000" face="Verdana">
        Characteristics:&nbsp;<b>Militant Al-Takfir wa
        al-Hijran </b>(Renunciation and Exile)<b> faction</b></font></td>
    </tr>
    <tr>
      <td width="100%" style="font-size: 8pt; color: #000000">
        <font style="font-size: 8pt" color="#000000" face="Verdana">
        Number of victims:&nbsp;<b>23</b></font></td>
    </tr>
    </font>

【Comments】:

    Tags: web-scraping beautifulsoup tags


    【Solution 1】:

    You might want to try this:

    import requests
    from bs4 import BeautifulSoup
    from tabulate import tabulate

    # Send a browser-like User-Agent with the request.
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36"
    }

    page = requests.get("https://murderpedia.org/male.A/a/abbas.htm", headers=headers).text
    table = BeautifulSoup(page, "html5lib").find("table", {"id": "table4"})

    # For each non-empty cell: collapse whitespace, split into [label, value]
    # on ":", then keep only the first nine rows.
    output = [
        " ".join(i.getText(strip=True).split()).split(":")
        for i in table.find_all("td")
        if i.getText(strip=True)
    ][:9]

    print(tabulate(output))
    

    Output:

    -----------------  --------------------------------------------------------------
    Classification     Mass murderer
    Characteristics    Militant Al-Takfir wa al-Hijran(Renunciation and Exile)faction
    Number of victims  23
    Date of murders    December 8,2000
    Date of birth      1967
    Victims profile    Maleworshippers
    Method of murder   Shooting(Kalashnikov assault rifle)
    Location           Omdurman, Sudan
    Status             Shot to death by police
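
    Note that this output loses the spaces where inline tags meet ("Maleworshippers", "al-Hijran(Renunciation"), because getText(strip=True) strips every text fragment before joining them with no separator. Below is a minimal sketch of one way to keep those spaces. It is not part of the original answer. It assumes a simplified, non-nested table, and SAMPLE_HTML and extract_fields are hypothetical names introduced only for illustration; the sample is modeled on the question's HTML so the snippet runs without a network request. It also splits on the first colon only, so a value that itself contains ":" stays in one piece.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample modeled on the question's HTML (no network needed).
SAMPLE_HTML = """
<table id="table4">
  <tr><td>Classification: <b>Mass murderer</b></td></tr>
  <tr><td>Characteristics:&nbsp;<b>Militant Al-Takfir wa
      al-Hijran </b>(Renunciation and Exile)<b> faction</b></td></tr>
  <tr><td>Number of victims:&nbsp;<b>23</b></td></tr>
  <tr><td></td></tr>
</table>
"""


def extract_fields(html):
    """Return (label, value) pairs from the cells of the table with id="table4"."""
    table = BeautifulSoup(html, "html.parser").find("table", id="table4")
    fields = []
    for cell in table.find_all("td"):
        # get_text() without strip=True keeps the whitespace between inline tags;
        # split() + join() then collapses every whitespace run (newlines and
        # non-breaking spaces included) into a single space.
        text = " ".join(cell.get_text().split())
        if ":" not in text:
            continue  # skip empty cells and cells that are not "label: value"
        # Split on the first colon only, so colons inside the value survive.
        label, value = text.split(":", 1)
        fields.append((label.strip(), value.strip()))
    return fields


for label, value in extract_fields(SAMPLE_HTML):
    print(f"{label}: {value}")
```

    On the real page the table contains far more cells than the three shown here, so a slice such as the [:9] used above (or some other filter) would probably still be needed.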
    

    【Discussion】:
