【Question Title】: How to extract the text of the value attribute inside a tag using Selenium
【Posted】: 2017-09-26 15:01:19
【Question】:

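I want to extract, say, value="THE TEXT IWANNA EXTRACT ;0" from the HTML below. Specifically, I want every string held in the value attribute of the input inside each td class="regu" cell. I have already extracted the people's names, but I cannot find a way to get the strings inside the value attribute. I have been stuck for about 24 hours. I am open to using other libraries, as long as I can extract the values. Any help is much appreciated. Thanks.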

<table class="dbtable" border="0" width="100%">
                           <tbody><tr>
                             <td class="tableheader" align="center" width="1%"><b>#</b></td>
                                       <td class="tableheader" align="center" width="60%"><b>User Name</b></td>
                                       <td class="tableheader" align="center"><b>User Type</b></td>
                           </tr><tr bgcolor="#ffffff">
                         <td class="regu"><input name="chkStud" value="THE TEXT IWANNA EXTRACT ;0" type="checkbox"></td>
                         <td class="regu">NAME OF STUDENT HERE   </td>
                         <td class="regu">&nbsp;Student</td>
                       </tr><tr bgcolor="#ffffff">
                         <td class="regu"><input name="chkStud" value="PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0" type="checkbox"></td>
                         <td class="regu">FOO BAR FOO BAR</td>
                         <td class="regu">&nbsp;Student</td>
                         </tbody></table>

Here is the Python code:

#!/usr/bin/python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys


from bs4 import BeautifulSoup
import logging
driver = webdriver.Firefox()
driver.get("http://somewebsite/iwannascrape/login.php") #page requires a login T_T
assert "Student" in driver.title
elem = driver.find_element_by_name("txtUser")
elem.clear()
elem.send_keys("YOU_SIR_NAME") #login creds. please dont mind :D 
elem2 = driver.find_element_by_name("txtPwd")
elem2.clear()
elem2.send_keys("PASSSWORDHERE")  
elem2.send_keys(Keys.RETURN)
driver.get("http://somewebsite/iwannascrape/afterlogin/illhere")

# Grab only the table with class='dbtable' so it is easier to work with.
table_clas = driver.find_element_by_xpath("//*[@class='dbtable']")

# outerHTML holds the markup of the table and all of its children.
source_code = table_clas.get_attribute("outerHTML")
print source_code

for i in range(10):  # blank lines for readability
    print "\n"

print table_clas.text  # this prints out the names

    Tags: python html selenium web-scraping


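    【Solution 1】:

    Once you have located the element you want, call its get_attribute() method. The table is identified by class="dbtable", so the selector uses .dbtable. find_element returns only the first match; find_elements_by_css_selector returns every match as a list:

    elm = driver.find_element_by_css_selector(".dbtable input[name=chkStud]")
    print(elm.get_attribute("value"))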
    

      【Solution 2】:

      # Select the table first to narrow down the HTML.
      table_clas = driver.find_element_by_xpath("//*[@class='dbtable']")

      # Then hunt down the specific elements you want. The leading "." keeps
      # the search inside table_clas; a bare "//" would search the whole page.
      # find_element returns a single element, while find_elements returns
      # a list you can iterate over.
      td = table_clas.find_elements_by_xpath(".//*[@name='chkStud']")

      for things in td:
          print(things.get_attribute("value"))

      This prints:

      THE TEXT IWANNA EXTRACT ;0

      PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0
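
The question already captures the table markup with get_attribute("outerHTML") and is open to other libraries, so the values can also be pulled out of that string without further Selenium calls. The following is a minimal sketch, assuming Python 3 and only the standard-library html.parser module. CheckboxValueParser and extract_values are hypothetical names introduced here, and the source_code string is a trimmed stand-in for the real outerHTML.

```python
from html.parser import HTMLParser


class CheckboxValueParser(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, input_name="chkStud"):
        super().__init__()
        self.input_name = input_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute_name, attribute_value) pairs.
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.input_name and "value" in attributes:
            self.values.append(attributes["value"])


def extract_values(html, input_name="chkStud"):
    """Return the value strings of all matching <input> tags in html."""
    parser = CheckboxValueParser(input_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Stand-in for table_clas.get_attribute("outerHTML") from the question.
source_code = """
<table class="dbtable"><tbody>
<tr><td class="regu"><input name="chkStud" value="THE TEXT IWANNA EXTRACT ;0" type="checkbox"></td>
<td class="regu">NAME OF STUDENT HERE</td></tr>
<tr><td class="regu"><input name="chkStud" value="PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0" type="checkbox"></td>
<td class="regu">FOO BAR FOO BAR</td></tr>
</tbody></table>
"""

for value in extract_values(source_code):
    print(value)
```

Matching on the name attribute mirrors the selectors used in the answers above. If the page later reuses that name outside the table, feeding only the table's outerHTML keeps the results scoped to it.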
