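【Question title】: How to scrape the Glassdoor Rating using Selenium and Python
【Posted】: 2019-01-20 04:08:32
【Question】:

I am trying to extract the rating numbers with the code below. It fails with an "index out of range" error, and I need to get both the overall rating and the sub-ratings.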

from selenium import webdriver
import pandas as pd
import time
import re

init_url = 'https://www.glassdoor.co.in/Reviews/DXC-Technology-Reviews-E1603125.htm'

driver = webdriver.Chrome()
driver.maximize_window()
driver.get(init_url)
time.sleep(5)

i=0
while(i< 11):
    rate1 = driver.find_elements_by_xpath("//*[@class='rating']")
    rate = driver.find_element_by_xpath("//input[@title='3.0']")[i]    
    print(rate.text)
    i+=1

【Comments】:

    Tags: python selenium xpath css-selectors webdriver


    【Answer 1】:

    To extract the rating number, you can use either of the following solutions:

    • xpath:

      rating = driver.find_element_by_xpath("//div[@class='ratingsSummary cf']//span[@class='bigRating strong margRtSm h2']").get_attribute("innerHTML")
      
    • css_selector:

      rating = driver.find_element_by_css_selector("div.ratingsSummary.cf span.bigRating.strong.margRtSm.h2").get_attribute("innerHTML")
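
Selenium 4.3 and later removed the `find_element_by_*` helpers used above, so on a current install the same lookup goes through `find_element(By..., ...)`. A minimal sketch, assuming the class names from this answer still match the page (Glassdoor's markup may well have changed since 2019). `parse_rating` and `read_overall_rating` are hypothetical helper names, and the explicit wait stands in for the fixed `time.sleep(5)`:

```python
def parse_rating(raw):
    """Convert rating text such as ' 3.3 ' to a float; return None if it is not a number."""
    try:
        return float(raw.strip())
    except (AttributeError, ValueError):
        return None


def read_overall_rating(driver, timeout=10):
    """Wait for the overall rating element and return its value as a float (or None)."""
    # Imported here so parse_rating stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Selector taken from the answer above; it is an assumption that it still matches.
    locator = (By.CSS_SELECTOR, "div.ratingsSummary.cf span.bigRating.strong.margRtSm.h2")
    element = WebDriverWait(driver, timeout).until(EC.presence_of_element_located(locator))
    return parse_rating(element.get_attribute("innerHTML"))
```

If the element never appears, `WebDriverWait.until` raises `TimeoutException` instead of returning, which is usually easier to diagnose than an empty result after a blind sleep.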
      

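    【Comments】:

    • Please check the sub-ratings on the new link: it does not list any values, only stars.
    【Answer 2】:

    You should instead read the text of the following element: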

    <span class="bigRating strong margRtSm h1">3.3</span>
    

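    As you can see, it contains the rating you need.

    Also, since you need several ratings, the right way to do this in a loop is to count the available reviews first, so that the loop runs exactly that many times.

    Final code: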

    from selenium import webdriver
    import time
    import re
    
    driver = webdriver.Chrome(executable_path=r'//path')
    init_url = 'https://www.glassdoor.co.in/Reviews/bangalore-hcl-technologies-reviews-SRCH_IL.0,9_IM1091_KE10,26.htm'
    
    driver.get(init_url)
    driver.maximize_window()
    time.sleep(5)
    i=1
    count = len(driver.find_elements_by_xpath("//span[@class='bigRating strong margRtSm h1']"))
    while(i<= count):
        rate = driver.find_element_by_xpath("(//span[@class='bigRating strong margRtSm h1'])[" + str(i) + "]")
        print(rate.text)
        i+=1
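
The count-then-index loop above issues one extra XPath query per rating. `find_elements_by_xpath` already returns a list, so that list can be iterated directly. A sketch under the assumption that each matched element exposes its rating through `.text`; `ratings_from_elements` is a hypothetical helper, and the commented usage shows the Selenium 4 `find_elements(By.XPATH, ...)` form:

```python
def ratings_from_elements(elements):
    """Return the non-empty, stripped text of each element, in page order."""
    ratings = []
    for element in elements:
        text = (element.text or "").strip()
        if text:  # elements that are not displayed return empty text
            ratings.append(text)
    return ratings


# Usage with a live driver (Selenium 4 style, not executed here):
# from selenium.webdriver.common.by import By
# spans = driver.find_elements(By.XPATH, "//span[@class='bigRating strong margRtSm h1']")
# for rating in ratings_from_elements(spans):
#     print(rating)
```

Because the helper only reads `.text`, it can be exercised with simple stand-in objects before pointing it at a real page.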
    

    Edit: yes, for a URL like this one, you can extract the ratings as follows:

    from selenium import webdriver
    import time
    import re
    
    driver = webdriver.Chrome(executable_path=r'//path')
    init_url = 'https://www.glassdoor.co.in/Reviews/DXC-Technology-Reviews-E1603125.htm'
    
    driver.get(init_url)
    driver.maximize_window()
    time.sleep(5)
    i=1
    count = len(driver.find_elements_by_xpath("//span[@class='rating']/span[@class='value-title']"))
    print(count)
    while(i<= count):
        rate = driver.find_element_by_xpath("(//span[@class='rating']/span[@class='value-title'])[" + str(i) + "]")
        print(rate.get_attribute("title"))
        i+=1
    

    The rating is stored in the title attribute of the <span> element, so I extract it with get_attribute("title").

    To extract the sub-ratings (Work/Life Balance and so on), use the following:

    i = 1  # reset the counter; the previous loop leaves it at count + 1
    count = len(driver.find_elements_by_xpath("//ul[@class='undecorated']//div[@class='minor']"))
    while(i<= count):
        # the star value sits in the title attribute of the span that follows each label
        sub_rating = driver.find_element_by_xpath("(//ul[@class='undecorated']//div[@class='minor'])["  + str(i) + "]/following-sibling::span")
        sub_rating_title = driver.find_element_by_xpath("(//ul[@class='undecorated']//div[@class='minor'])["  + str(i) + "]")
        print(sub_rating_title.get_attribute("innerHTML") , "-" , sub_rating.get_attribute("title"))
        i+=1
    

    Console output:

    Work/Life Balance - 2.0
    Culture &amp; Values - 2.0
    Career Opportunities - 3.0
    Comp &amp; Benefits - 3.0
    Senior Management - 2.0
    Work/Life Balance - 5.0
    Culture &amp; Values - 3.0
    Career Opportunities - 4.0
    Comp &amp; Benefits - 2.0
    Senior Management - 2.0
    Work/Life Balance - 3.0
    Culture &amp; Values - 3.0
    Career Opportunities - 3.0
    Comp &amp; Benefits - 3.0
    Senior Management - 3.0
    Work/Life Balance - 5.0
    Culture &amp; Values - 5.0
    Career Opportunities - 5.0
    Comp &amp; Benefits - 2.0
    Senior Management - 2.0
    Work/Life Balance - 3.0
    Culture &amp; Values - 3.0
    Career Opportunities - 2.0
    Comp &amp; Benefits - 2.0
    Senior Management - 1.0
    Work/Life Balance - 3.0
    Culture &amp; Values - 3.0
    Career Opportunities - 4.0
    Comp &amp; Benefits - 5.0
    Senior Management - 2.0
    Work/Life Balance - 3.0
    Culture &amp; Values - 4.0
    Career Opportunities - 3.0
    Comp &amp; Benefits - 2.0
    Senior Management - 3.0
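
The output above is a flat stream in which the five categories repeat once per review. A minimal sketch of regrouping it into one dictionary per review, assuming every review lists the same five categories in the same order (a review with a missing sub-rating would shift the grouping). `group_sub_ratings` is a hypothetical helper; `html.unescape` turns `&amp;` back into `&`, since `innerHTML` returns escaped markup:

```python
import html


def group_sub_ratings(pairs, per_review=5):
    """Group (label, value) pairs into one {label: float} dict per review."""
    reviews = []
    for start in range(0, len(pairs), per_review):
        chunk = pairs[start:start + per_review]
        reviews.append({html.unescape(label).strip(): float(value) for label, value in chunk})
    return reviews


# First review from the console output above, as (label, value) pairs.
sample = [
    ("Work/Life Balance", "2.0"),
    ("Culture &amp; Values", "2.0"),
    ("Career Opportunities", "3.0"),
    ("Comp &amp; Benefits", "3.0"),
    ("Senior Management", "2.0"),
]
print(group_sub_ratings(sample))
```

A list of per-review dictionaries can be passed straight to `pandas.DataFrame`, which fits the `import pandas as pd` in the question.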
    

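    【Comments】:

    • Please check the sub-ratings on the new link: it does not list any values, only stars.
    • Thank you, sir, but is there any way to get the sub-ratings? 1. Work/Life Balance 2. Culture & Values 3. Career Opportunities 4. Comp & Benefits 5. Senior Management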