【Question title】: How to scrape data from a table
【Posted】: 2021-11-28 13:57:33
【Question description】:

I am trying to scrape data from a table, but I get an empty list.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
driver= webdriver.Chrome('C:\Program Files (x86)\chromedriver.exe')
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
tabledata = driver.find_elements_by_xpath("//tbody/tr")
print(tabledata)

【Comments】:

    Tags: python selenium web-scraping


    【Solution 1】:

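    Since the <table> element is inside an <iframe>, you have to:

    • Use WebDriverWait to wait until the desired frame is available and switch to it.

    • Use WebDriverWait with visibility_of_element_located() to wait for the desired element. You can use either of the following locator strategies:

      • Using XPATH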

        driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
        WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='Inline Frame Example']")))
        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//tbody"))).text)
        
      • Using CSS_SELECTOR

        driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
        WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#sites tbody"))).text)
        
    • Note: You have to add the following imports:

       from selenium.webdriver.support.ui import WebDriverWait
       from selenium.webdriver.common.by import By
       from selenium.webdriver.support import expected_conditions as EC
      
    • Console output:

      FAM-1293 AmTech Ingredients Albert Lea UNITED STATES Valid 2020-10-08 2023-10-07
      FAM-0841 3F FEED & FOOD S L Vizcolozano SPAIN Valid 2020-04-17 2023-04-16
      FAM-1361 5N Plus Additives GmbH Eisenhüttenstadt GERMANY Valid 2020-10-01 2023-09-30
      FAM-1301-01 A & V Corp. Limited Xiamen CHINA Valid 2020-09-09 2023-09-08
      FAM-1146 A. + E. Fischer-Chemie GmbH & Co. KG Wiesbaden GERMANY Valid 2020-06-05 2023-06-04
      FAM-1589 A.M FOOD CHEMICAL CO LIMITED Jinan CHINA Valid 2020-01-07 2023-01-06
      FAM-0613-01 A.W.P. S.r.l Crevalcore ITALY Valid 2020-02-27 2023-02-07
      FAM-0867 AB AGRI POLSKA Sp. z o.o. Smigiel POLAND Valid 2020-08-03 2023-03-19
      FAM-1510-02 AB Vista Marlborough UNITED KINGDOM Valid 2020-04-16 2023-04-15
      FAM-1510-01 AB Vista * Rotterdam NETHERLANDS Valid 2020-04-16 2023-04-15
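
The console output above is the `.text` of the whole `<tbody>`, so each row arrives as one space-separated line and the cell boundaries are lost. As a rough sketch of what can still be recovered from such text, the snippet below splits each line into the certificate number, the trailing status and the two dates. The names `CertificateRow`, `parse_row` and `parse_table_text` are hypothetical, and the code assumes the status is a single word (such as `Valid`) and that both dates are in `YYYY-MM-DD` form, as in the output shown. Company, city and country cannot be separated reliably this way, so they stay together in one field.

```python
import re
from typing import List, NamedTuple, Optional

# One printed row: certificate number, free text, one-word status, two ISO dates.
# The one-word status is an assumption based on the sample output.
ROW_PATTERN = re.compile(
    r"^(?P<number>\S+)\s+(?P<middle>.+?)\s+(?P<status>\S+)\s+"
    r"(?P<valid_from>\d{4}-\d{2}-\d{2})\s+(?P<valid_to>\d{4}-\d{2}-\d{2})$"
)


class CertificateRow(NamedTuple):
    number: str
    middle: str  # company, city and country; not separable from .text alone
    status: str
    valid_from: str
    valid_to: str


def parse_row(line: str) -> Optional[CertificateRow]:
    """Parse one line of table text; return None if it does not look like a row."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    return CertificateRow(**match.groupdict())


def parse_table_text(text: str) -> List[CertificateRow]:
    """Parse every recognizable row in the multi-line text of a <tbody>."""
    rows = []
    for line in text.splitlines():
        row = parse_row(line)
        if row is not None:
            rows.append(row)
    return rows
```

If the individual columns are needed, reading the `<td>` elements of each row is likely a better fit than parsing the flattened text.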
      


    【Comments】:

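    【Solution 2】:
    1. The table element you are trying to access is inside an iframe. You have to switch to that iframe before you can access those elements.
    2. To print the rows, you should extract their text rather than print the element objects.
      Try this: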
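    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Raw string so the backslashes in the Windows path are not read as escapes.
    # Passing the path through Service is the Selenium 4 style.
    driver = webdriver.Chrome(service=Service(r'C:\Program Files (x86)\chromedriver.exe'))
    wait = WebDriverWait(driver, 20)
    driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
    # The table lives inside an iframe, so switch into it first
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#inlineFrameExample")))
    # Wait for all rows; visibility_of_element_located would return only the first one
    rows = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//tbody/tr")))
    for row in rows:
        print(row.text)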
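
Once the driver has switched into the iframe, the rows can also be collected cell by cell instead of being printed as text. The following is only a sketch under a few assumptions. `extract_rows` and `scrape_certified_companies` are hypothetical names. The selectors `iframe#inlineFrameExample` and `table#sites tbody tr` are taken from the answers in this thread and may not match the live page. `webdriver.Chrome()` without a path assumes a Selenium version that can locate chromedriver itself, or a chromedriver on `PATH`. The locator strings `"css selector"` and `"tag name"` are the values of `By.CSS_SELECTOR` and `By.TAG_NAME` in Selenium 4.

```python
from typing import Any, List

# Locator strategy names; in Selenium 4 these equal By.CSS_SELECTOR and By.TAG_NAME.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def extract_rows(context: Any, row_selector: str = "table#sites tbody tr") -> List[List[str]]:
    """Return one list of cell texts per table row found under `context`.

    `context` is anything with a Selenium-style find_elements(by, value)
    method, for example a WebDriver that has already switched into the iframe.
    """
    records = []
    for row in context.find_elements(CSS_SELECTOR, row_selector):
        cells = [cell.text.strip() for cell in row.find_elements(TAG_NAME, "td")]
        if cells:  # skip rows that contain no <td> cells
            records.append(cells)
    return records


def scrape_certified_companies(url: str, timeout: int = 20) -> List[List[str]]:
    """Open the page, switch into the iframe and return the table rows."""
    # Imported here so extract_rows can be used and tested without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: chromedriver can be found automatically
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # The table is inside an iframe, so switch into it before searching.
        wait.until(EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, "iframe#inlineFrameExample")))
        # Wait until at least the first row is visible before reading the rows.
        wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, "table#sites tbody tr")))
        return extract_rows(driver)
    finally:
        driver.quit()
```

Calling `scrape_certified_companies("https://www.fami-qs.org/certified-companies-6-0.html")` would then return a list of rows, each a list of cell strings, provided the selectors still match the page.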
    

    【Comments】:
