【Question Title】: Web scraping data from Glassdoor using Selenium
【Posted】: 2021-10-01 17:17:51
【Question】:

Please help me run this code (https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_scraper.py) to scrape job postings from Glassdoor.
Here is the code snippet:

from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import pandas as pd

options = webdriver.ChromeOptions()

#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')

#Change the path to where chromedriver is in your home folder.
path = "/path/to/chromedriver"  # placeholder: set this to your chromedriver location
driver = webdriver.Chrome(service=Service(executable_path=path), options=options)
driver.set_window_size(1120, 1000)

url = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword="+'data scientist'+"&sc.keyword="+'data scientist'+"&locT=&locId=&jobType="
#url = 'https://www.glassdoor.com/Job/jobs.htm?sc.keyword="' + keyword + '"&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=-1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=-1&employerSizes=0&applicationType=0&remoteWorkType=0'
driver.get(url)

#Let the page load. Change this number based on your internet speed.
#Or, wait until the webpage is loaded, instead of hardcoding it.
time.sleep(5)

#Test for the "Sign Up" prompt and get rid of it.
try:
    driver.find_element(By.CLASS_NAME, "selected").click()
except NoSuchElementException:
    pass
time.sleep(.1)
try:
    driver.find_element(By.CSS_SELECTOR, '[alt="Close"]').click() #clicking the X.
    print(' x out worked')
except NoSuchElementException:
    print(' x out failed')

#Going through each job in this page
job_buttons = driver.find_elements(By.CLASS_NAME, "jl")

I get an empty list:

job_buttons
[]
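The snippet builds the search URL by string concatenation, leaving a raw space inside 'data scientist'. A minimal sketch, assuming you want exactly the query parameters from the question's URL, is to let urllib.parse.urlencode do the escaping. The helper name build_search_url is hypothetical, and whether Glassdoor still honours these parameters has not been checked here.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/Job/jobs.htm"


def build_search_url(keyword: str) -> str:
    """Build the job-search URL from the question for `keyword` (hypothetical helper).

    Parameter names are copied from the question's URL; empty values are kept
    so the query string has the same shape as the original.
    """
    params = {
        "suggestCount": "0",
        "suggestChosen": "false",
        "clickSource": "searchBtn",
        "typedKeyword": keyword,
        "sc.keyword": keyword,
        "locT": "",
        "locId": "",
        "jobType": "",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("data scientist"))
```

The result can be passed straight to driver.get(...) in place of the concatenated string.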

【Question Comments】:

    Tags: python-3.x selenium web-scraping css-selectors try-catch


    【Answer 1】:

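    Your problem is that the except clause catches the wrong exception.
    The first try block clicks the element with the class name "selected", but no element on that page matches that class name. Looking it up therefore raises NoSuchElementException, as you can see for yourself, while your except clause is trying to catch ElementClickInterceptedException.
    To fix this, use a correct locator, or at least catch the correct exception in the except clause,
    like this: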

    from selenium.webdriver.common.by import By

    try:
        driver.find_element(By.CLASS_NAME, "selected").click()
    except NoSuchElementException:
        pass
    

    or even

    try:
        driver.find_element(By.CLASS_NAME, "selected").click()
    except Exception:  # swallow any error, including ElementClickInterceptedException
        pass
    

    I'm not sure which elements you want in job_buttons.
    The search results, which contain all the details for each job, can be found with:

    job_buttons = driver.find_elements(By.CSS_SELECTOR, "li.react-job-listing")
    

    【Comments】:

    • I changed the except clause to NoSuchElementException, and now I get an empty list.
    • OK, that's better. Which elements do you want in job_buttons? I don't see anything on that page matching the class name "jl".