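【Question title】: Getting the content of a table on the website with Selenium and Python
【Posted】: 2022-01-04 19:08:53
【Question description】:

When I go to the URL in the code, I don't get the content of the "Synonyms" section. The selection works, but it returns a list of elements rather than their text content.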

from selenium import webdriver

synonyms = []
driver = webdriver.Chrome()
url = "https://pubchem.ncbi.nlm.nih.gov/compound/71308229"
driver.get(url)
# Returns a list of WebElement objects, not their text
synonym = driver.find_elements_by_class_name("overflow-x-auto")
synonyms.append(synonym)
driver.close()

【Question discussion】:

    Tags: python selenium xpath webdriverwait pubchem


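    【Solution 1】:
    1. You are missing a wait/delay.
    2. You have to extract the text from the web elements.
    3. You appear to be using the wrong locator.

    I think this will give you what you are looking for: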

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    synonyms = []
    driver = webdriver.Chrome()
    url = "https://pubchem.ncbi.nlm.nih.gov/compound/71308229"
    driver.get(url)
    locator = (By.XPATH, "//div[@class='overflow-x-auto']//p")
    # Wait up to 20 seconds for the first matching paragraph to become visible
    wait = WebDriverWait(driver, 20)
    wait.until(EC.visibility_of_element_located(locator))
    time.sleep(0.1)  # brief pause so the remaining paragraphs can render
    # find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
    elements = driver.find_elements(*locator)
    for el in elements:
        # Extract the text from each web element
        synonyms.append(el.text)
    driver.close()
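
To make point 1 above more concrete, here is a rough, Selenium-free sketch of what an explicit wait does conceptually: it polls a condition until the condition returns something truthy or a timeout expires. The names `wait_until`, `poll_interval`, and `fake_lookup` are hypothetical and not part of Selenium; `WebDriverWait.until` works along similar lines, but it passes the driver to the condition and raises Selenium's own `TimeoutException`.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only; not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate content that only "appears" on the third check,
# much like rows rendered by JavaScript after the page loads.
calls = {"count": 0}


def fake_lookup():
    calls["count"] += 1
    return ["Lanthanum boride"] if calls["count"] >= 3 else []


found = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(found)
```

Querying the page once, immediately after `driver.get(url)`, corresponds to calling `fake_lookup()` a single time here: the result is an empty list because the content has not been rendered yet.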
    

    【Discussion】:

      【Solution 2】:

      You need to explicitly get the text of the elements:

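      from selenium import webdriver
      from selenium.webdriver.common.by import By

      synonyms = []
      driver = webdriver.Chrome()
      url = "https://pubchem.ncbi.nlm.nih.gov/compound/71308229"
      driver.get(url)
      # find_elements_by_class_name was removed in Selenium 4.3
      synonym = driver.find_elements(By.CLASS_NAME, "overflow-x-auto")
      # Collect the text of each element rather than the element objects
      synonyms.append([s.text for s in synonym])
      print(synonyms)
      driver.close()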
      

      Output:

      [['Lanthanum boride\n12008-21-8\nLanthanum hexaboride\nMFCD00151350\nB6La\nMore...', 'Lanthanum boride\n12008-21-8\nLanthanum hexaboride\nMFCD00151350\nB6La\nLanthanum Hexaboride Nanoparticles\nLanthanum boride, 99.5% (REO)\nIron Boride (FeB) Sputtering Targets\nFT-0693450\nLanthanum hexaboride, powder, 10 mum, 99%\nY1387\nLanthanum hexaboride LaB6 GRADE A (H?gan?s)\nLanthanum hexaboride, powder, -325 mesh, 99.5% metals basis\nLanthanum boride, powder, -325 mesh, 99.5% trace metals basis\nLine position and line shape standard for powder diffraction, NIST SRM 660c, Lanthanum hexaboride powder', 'Property Name Property Value Reference\nMolecular Weight 203.8 Computed by PubChem 2.1 (PubChem release 2021.05.07)\nHydrogen Bond Donor Count 0 Computed by Cactvs 3.4.8.18 (PubChem release 2021.05.07)\nHydrogen Bond Acceptor Count 2 Computed by Cactvs 3.4.8.18 (PubChem release 2021.05.07)\nRotatable Bond Count 0 Computed by Cactvs 3.4.8.18 (PubChem release 2021.05.07)\nExact Mass 203.965826 Computed by PubChem 2.1 (PubChem release 2021.05.07)\nMonoisotopic Mass 204.962194 Computed by PubChem 2.1 (PubChem release 2021.05.07)\nTopological Polar Surface Area 0 Ų Computed by Cactvs 3.4.8.18 (PubChem release 2021.05.07)\nHeavy Atom Count 7 Computed by PubChem\nFormal Charge -2 Computed by PubChem\nComplexity 132 Computed by Cactvs 3.4.8.18 (PubChem release 2021.05.07)\nIsotope Atom Count 0 Computed by PubChem\nDefined Atom Stereocenter Count 0 Computed by PubChem\nUndefined Atom Stereocenter Count 0 Computed by PubChem\nDefined Bond Stereocenter Count 0 Computed by PubChem\nUndefined Bond Stereocenter Count 0 Computed by PubChem\nCovalently-Bonded Unit Count 2 Computed by PubChem\nCompound Is Canonicalized Yes Computed by PubChem (release 2021.05.07)', 'Mixtures, Components, and Neutralized Forms 2 Records\nSimilar Compounds 2 Records', 'Same 25 Records']]
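
The output above shows two things: each matched element's `.text` is a single newline-joined string, and the class name also matches unrelated blocks such as the computed-properties table. As a rough sketch, assuming the synonyms sit in the first two blocks (as they do in the output above), the strings can be split into a flat, de-duplicated list. The helper name `flatten_synonyms` and its `max_blocks` parameter are made up for illustration, and the sample data is abbreviated from the output above.

```python
def flatten_synonyms(blocks, max_blocks=2):
    """Split newline-joined text blocks into a de-duplicated list of names.

    Hypothetical helper: `blocks` is a list of strings such as the `.text`
    values collected above; only the first `max_blocks` entries are used.
    """
    seen = set()
    names = []
    for block in blocks[:max_blocks]:
        for line in block.split("\n"):
            name = line.strip()
            # Skip blanks, the "More..." expander label, and repeats.
            if not name or name == "More..." or name in seen:
                continue
            seen.add(name)
            names.append(name)
    return names


# Abbreviated sample taken from the output above.
sample = [
    "Lanthanum boride\n12008-21-8\nLanthanum hexaboride\nMFCD00151350\nB6La\nMore...",
    "Lanthanum boride\n12008-21-8\nLanthanum hexaboride\nMFCD00151350\nB6La\nFT-0693450\nY1387",
    "Property Name Property Value Reference\nMolecular Weight 203.8 Computed by PubChem 2.1",
]

names = flatten_synonyms(sample)
print(names)
```

Relying on block position is fragile if the page layout changes, so a more specific locator is usually the safer choice when one is available.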
      

      【Discussion】:

        【Solution 3】:

        To extract the contents of the Synonyms table, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategy:

        • Using XPATH:

          driver.get("https://pubchem.ncbi.nlm.nih.gov/compound/71308229")
          print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//th[text()='Synonyms']//following::td[1]//p")))])
          
        • Console output:

          ['Lanthanum boride', '12008-21-8', 'Lanthanum hexaboride', 'MFCD00151350', 'B6La']
          
        • Note: You have to add the following imports:

          from selenium.webdriver.support.ui import WebDriverWait
          from selenium.webdriver.common.by import By
          from selenium.webdriver.support import expected_conditions as EC
          

        【Discussion】:
