【Question title】: WebDriverWait fails to locate an element even though the element is there
【Posted】: 2023-04-10 13:11:01
【Question】:

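I want to build a scraper for an idea of mine, and it needs to visit the following link: https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Huilliche_people. The information I want from that site is loaded dynamically, so I am using WebDriverWait, but it raises a TimeoutException even though the element is there and its classes are not repeated elsewhere on the page.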

    TimeoutException                          Traceback (most recent call last)
<ipython-input-44-426ebd9560dd> in <module>
     37 driver.get('https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Huilliche_people')
     38 
---> 39 element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.CLASS_NAME, "single-page-stats text-center")))
     40 
     41 content2 = element.get_attribute('innerHTML')

~\anaconda3\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)
     78             if time.time() > end_time:
     79                 break
---> 80         raise TimeoutException(message, screen, stacktrace)
     81 
     82     def until_not(self, method, message=''):

TimeoutException: Message:

Here is my code; the last part is where it goes wrong.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import pandas as pd
import html.parser

driver = webdriver.Chrome(executable_path=r'C:\Python39\Scripts\chromedriver')

namesList=[] #List to store the article titles
descriptionList=[] #List to store the first paragraph of each article
categoryList=[]
imagesList=[] #List to store the article images


driver.get('https://en.wikipedia.org/wiki/Huilliche_people')
content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')

name=soup.find('h1', attrs={'id':'firstHeading'})
image=soup.find('img', attrs={'class':'pi-image-thumbnail'})

for a in soup.findAll('div', attrs={'class':'mw-parser-output'}):
    description=''
    a = a.find('p')
    description+=a.get_text()
    break
    
print(description)
    
descriptionList.append(description)    
namesList.append(name.text)

driver.get('https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&range=latest-20&pages=Huilliche_people')

element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.CLASS_NAME, "single-page-stats text-center")))

content2 = element.get_attribute('innerHTML')
soup2 = BeautifulSoup(content2, 'html.parser')

viewsTotal=soup2.find('span', attrs={'class':'text-muted'}).text
viewsPerDay=soup2.find('span', attrs={'class':'hidden-lg'}).text
print(viewsTotal)
print(viewsPerDay)

data = {'Name': namesList, 'Description': descriptionList, 'Total views': viewsTotal, 'Average views per day': viewsPerDay}

df = pd.DataFrame(data=data)
df.to_excel("all.xlsx")
df.to_json("all.json")

driver.quit()

【Question comments】:

    Tags: python selenium selenium-webdriver beautifulsoup selenium-chromedriver


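    【Solution 1】:

    "single-page-stats text-center" is not one class name; it is two class names, "single-page-stats" and "text-center". By.CLASS_NAME takes a single class name, so you have to pick one of the two, or, if you want or need to match both, use the CSS selector ".single-page-stats.text-center" with By.CSS_SELECTOR. In a CSS selector, a leading . denotes a class name.

    See the W3C CSS selectors reference for more information.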
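
As an illustration of the point above, here is a minimal sketch of building the compound selector from a space-separated class attribute and waiting on it. The helper names `classes_to_css` and `wait_for_classes` are hypothetical (they are not part of Selenium), and the sketch assumes the class names contain no characters that would need CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper: assumes plain class names that need no CSS escaping.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class a and class b.
    return "".join("." + name for name in names)


def wait_for_classes(driver, class_attr: str, timeout: float = 60):
    """Wait until an element carrying all the given classes is in the DOM."""
    # Imported lazily so classes_to_css can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, classes_to_css(class_attr))
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

With the question's code, the failing line could then be written as `element = wait_for_classes(driver, "single-page-stats text-center")`, which waits on the selector `.single-page-stats.text-center`. Waiting on just one of the classes with `(By.CLASS_NAME, "single-page-stats")` should also work if that class is unique on the page.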

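    【Comments】:

    • I ended up scraping a different part of the page that is easier to reach and has less data pollution, but this pretty much does it.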