【Question Title】:How to scrape a specific tr or td from a table in Python
【Posted】:2020-01-03 16:50:16
【Question】:

I want to scrape the first name and last name from this website so that I can use them as input in an automated browser.

from lxml import html
import requests

page = requests.get('https://www.getnewidentity.com/uk-identity-generator.php')
tree = html.fromstring(page.content)


firstname = tree.xpath('//*[@id="reslist"]/tbody/tr[3]/td[2]/text()')

lastname = tree.xpath('//*[@id="reslist"]/tbody/tr[4]/td[2]/text()')

print ('FirstName: ', firstname)
print ('LastName: ', lastname)

input("close")

The website is this one: https://www.getnewidentity.com/uk-identity-generator.php

<table class="table table-bordered table-striped" id="reslist"><thead><tr><th colspan="2" class="bg-primary">General Information</th></tr></thead><tbody><tr><td style="width:150px;">Name</td><td><b>Kamila Harmon</b></td></tr>
<tr><td>Gender</td><td>Female</td></tr>
<tr><td>First Name</td><td>Kamila</td></tr>
<tr><td>Last Name</td><td>Harmon</td></tr>
<tr><td>Birthday</td><td>12/26/1989</td></tr>

【Discussion】:

    Tags: python beautifulsoup screen-scraping


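    【Solution 1】:
    • find_all() returns a collection of the matching elements.
    • strip() is a built-in Python string method that removes all leading and trailing whitespace from a string.

    For example: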

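    from bs4 import BeautifulSoup
    import requests

    # The table is not in the page's initial HTML; the page loads it with a
    # POST request to this endpoint, so request that endpoint directly.
    response = requests.post('https://www.getnewidentity.com/data/uk-identity-generator.php',
                             data={"num": "undefine", "add": "address", "unique": "true"})

    soup = BeautifulSoup(response.content, 'lxml')
    td = soup.find_all("td")
    data = {}
    # Cells alternate label, value: pair each even-indexed cell with the next one.
    for x in range(0, len(td) - 1, 2):
        data[td[x].text.strip()] = td[x + 1].text.strip()

    print(data)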
    

    Output:

    {'Name': 'Jayda Key', 'Gender': 'Female', 'First Name': 'Jayda', 'Last Name': 'Key', 
    'Birthday': '55', 'NINO': 'EB 29 38 84 B', 'Address': 'Flat 31l\nMartin Walk, Leoberg, S81
     0HT', 'Street Address': 'Flat 31l\nMartin Walk', 'State': 'Leoberg', 'Zip Code': 'S81 0HT',
    'Phone': '+44(0)9487 957056', 'Credit Card Type': 'MasterCard', 'Credit Card Number': 
    '5246585772859818', 'CVV': '899', 'Expires': '02/2022', 'Username': 'twinhero', 'Email': 
    'Gamestomper@gmail.com', 'Password': 'Go7ByznZ', 'User Agent': 'Mozilla/5.0 (Macintosh; 
    Intel Mac OS X 10_11_6) AppleWebKit/601.7.7 (KHTML, like Gecko) Version/9.1.2 
    Safari/601.7.7', 'Height': '1.85m (6.17ft)', 'Weight': '75.22kg (158.31pounds)', 
    'Blood type': 'O−'}
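
The flat pairing above assumes that every cell in the response belongs to a two-cell label/value row. A minimal sketch of a row-by-row variant follows: it pairs the cells inside each tr and skips rows of any other shape. It runs offline against the markup quoted in the question. `SAMPLE_HTML` and `table_to_dict` are illustrative names, not part of the original answer, and the live response may be structured differently.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the live response may differ.
SAMPLE_HTML = """
<table id="reslist"><tbody>
<tr><td style="width:150px;">Name</td><td><b>Kamila Harmon</b></td></tr>
<tr><td>Gender</td><td>Female</td></tr>
<tr><td>First Name</td><td>Kamila</td></tr>
<tr><td>Last Name</td><td>Harmon</td></tr>
<tr><td>Birthday</td><td>12/26/1989</td></tr>
</tbody></table>
"""


def table_to_dict(html):
    """Map each row's first cell (label) to its second cell (value)."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 2:
            continue  # skip header rows or rows with an unexpected shape
        label, value = (cell.get_text(strip=True) for cell in cells)
        data[label] = value
    return data


record = table_to_dict(SAMPLE_HTML)
print(record["First Name"], record["Last Name"])
```

Pairing per row keeps one malformed row from shifting every later label onto the wrong value, which the flat index-based loop cannot guard against.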
    

    【Discussion】:

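      【Solution 2】:

      You said you want the first and last names; with bs4 4.7.1+ you can use :contains to target the right cells. As the other answer already details, the content is retrieved dynamically through a POST XHR.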

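      from bs4 import BeautifulSoup as bs
      import requests

      r = requests.post('https://www.getnewidentity.com/data/uk-identity-generator.php',
                        data={"num": "undefine", "add": "address", "unique": "true"})
      soup = bs(r.content, 'lxml')
      # 'td:contains("X") + td' selects the cell immediately after the label cell.
      first_name = soup.select_one('td:contains("First Name") + td').text
      last_name = soup.select_one('td:contains("Last Name") + td').text
      # "Name" also matches "First Name" and "Last Name"; select_one returns the
      # first match, which is the plain "Name" row because it comes first.
      full_name = soup.select_one('td:contains("Name") + td').text
      print(first_name, last_name, full_name)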
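
The `td:contains("Name")` lookup works only because the plain Name row comes before the First Name and Last Name rows. A sketch of an exact-match alternative follows, using Beautiful Soup's `find(string=...)` and `find_next_sibling()` instead of a CSS selector. It runs offline against the markup quoted in the question. `SAMPLE_HTML` and `value_for` are illustrative names, not part of the original answer. Note also that newer Soup Sieve releases prefer the spelling `:-soup-contains` over `:contains`, so the selector form may emit a warning depending on the installed version.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the live response may differ.
SAMPLE_HTML = """
<table id="reslist"><tbody>
<tr><td style="width:150px;">Name</td><td><b>Kamila Harmon</b></td></tr>
<tr><td>Gender</td><td>Female</td></tr>
<tr><td>First Name</td><td>Kamila</td></tr>
<tr><td>Last Name</td><td>Harmon</td></tr>
</tbody></table>
"""


def value_for(soup, label):
    """Return the text of the cell after the <td> whose text is exactly `label`."""
    label_cell = soup.find("td", string=label)  # exact match, not substring
    if label_cell is None:
        return None
    value_cell = label_cell.find_next_sibling("td")
    return value_cell.get_text(strip=True) if value_cell else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for(soup, "First Name"), value_for(soup, "Last Name"))
```

Because the match is exact, the lookup for "Name" cannot land on the First Name or Last Name rows regardless of row order, and a missing label yields None instead of raising AttributeError.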
      

      【Discussion】:
