【Question title】: Extract Snapdeal star rating for reviews
【Posted】: 2021-10-13 07:22:51
【Question】:

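I am trying to scrape reviews and their star ratings for various products on Snapdeal. I reach each product through its product URL. On a given product page, I want to filter the reviews by star rating and collect the rating number along with the review text. I am using the following code to do so: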

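'''
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

# Assumed setup: the original post does not show how the driver was created.
driver = webdriver.Chrome()

url_snapdeal = 'https://www.snapdeal.com/'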

driver.get(url_snapdeal)
time.sleep(2)

search = driver.find_element_by_id('inputValEnter')
search.clear()
search.send_keys('smartphone')
search.send_keys(Keys.ENTER)
time.sleep(2)

for i in range(0,3):
    driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    time.sleep(1)

urls=[]

for link in driver.find_elements_by_xpath("//div[@class='product-desc-rating ']/a"):
    urls.append(link.get_attribute('href'))

snap_reviews=[]
snap_ratings=[]

for url in urls:
    driver.get(url)
    time.sleep(4)

    try:
        for x in range(2,7):
            driver.find_element_by_xpath("//div[@class='selectarea']").click()
            time.sleep(1)
            driver.find_element_by_xpath(f"//div[@class='options']/ul/li[{x}]").click()
            time.sleep(1)

            for rating in driver.find_elements_by_xpath("//div[@class='user-review']/div[1]"):
                stars = rating.find_elements_by_xpath("i[@class='sd-icon sd-icon-star active']")
                snap_ratings.append(len(stars))
    
    except NoSuchElementException:
        print('Not found')
'''

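The try block is supposed to click the star-rating filter dropdown and select 5 star, collect the star ratings and review text, click the dropdown again, select 4 star, collect the ratings and reviews, and so on.

My code manages to click the dropdown, but it cannot click the filter options (5 star, 4 star, etc.). It raises an ElementNotInteractable exception.

Any help or suggestions would be greatly appreciated. Thanks in advance.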

【Question discussion】:

    Tags: python selenium web-scraping


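    【Solution 1】:

    You can actually get the ratings directly from the product ID. So grab the product ID and feed it in (I haven't looked, but it may also be possible to get those IDs without Selenium). Then you can filter the DataFrame. Here is an example for one product: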

    import math

    import pandas as pd
    import requests


    productId = 639365186960

    url = 'https://www.snapdeal.com/acors/web/getSelfieList/v2'
    payload = {
        'productId': productId,
        'offset': 0}

    # The first request returns page 1 plus the total number of entries.
    jsonData = requests.get(url, params=payload).json()
    total_pages = math.ceil(jsonData['selfieTotal'] / 10)  # 10 entries per page

    # Start from page 1 so `ratings` exists even when there is only one page.
    ratings = jsonData['selfieList']

    # Fetch the remaining pages by moving the offset forward 10 at a time.
    for page in range(2, total_pages + 1):
        payload['offset'] = 10 * (page - 1)
        jsonData = requests.get(url, params=payload).json()
        ratings += jsonData['selfieList']


    df = pd.DataFrame(ratings)
    

    Output:

    df[df['rating'] == 4]
    Out[82]: 
                                selfieId  ...                                       reducedImage
    0   015cd9a6a1e80000dd22850445ac4f71  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    1   015bffb8bcb60000dd22850466cc1c72  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    6   015b4dd88df70000dd228504c9271488  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    7   015b4dd7f9b80000dd228504b8777fc8  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    8   015b1e574cdc0000dd228504d4a694ad  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    9   015b182be5640000dd22850418c0bdd6  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    10  015aa6e8700e0000dd228504a3378958  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    11  015a9df9ab640000dd2285045069dcff  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    14  015a4c7a37040000dd2285045daaa6d3  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    15  015a4b377b8b0000dd228504d3dd159b  ...  https://n1.sdlcdn.com/image/upload/h_162,w_162...
    
    [10 rows x 9 columns]
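
To feed the product URLs collected in the question into the request above, the product ID has to be pulled out of each URL first. Here is a minimal sketch, assuming the ID is the trailing numeric path segment of the product URL; that URL shape is an assumption and is not confirmed anywhere in this thread. The helper name `extract_product_id` and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def extract_product_id(product_url):
    """Return the trailing numeric path segment of a product URL, or None."""
    # urlparse drops the query string and fragment, so only the path is examined.
    last_segment = urlparse(product_url).path.rstrip("/").rsplit("/", 1)[-1]
    return int(last_segment) if last_segment.isdigit() else None


# Hypothetical URLs, shaped like the links gathered in the question.
urls = [
    "https://www.snapdeal.com/product/example-phone/639365186960",
    "https://www.snapdeal.com/products/mobiles",
]

# Keep only the URLs that actually end in a numeric ID.
product_ids = [pid for pid in map(extract_product_id, urls) if pid is not None]
print(product_ids)
```

Each ID in `product_ids` could then be used as the `productId` value in the payload. URLs that do not end in a number are skipped rather than guessed at.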
    

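    【Discussion】:

    • It has to be done using Selenium WebDriver only. Is there a way to solve this?
    • Yes. May I ask why you need to use Selenium?
    • Task criteria, I would say. Any suggested modifications to my code to get the job done?
    • I'll take a look tomorrow. But you can most likely do it.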
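
For the Selenium-only route raised in the comments, a common cause of ElementNotInteractable is clicking an option that exists in the DOM but is not yet visible or clickable. Below is an untested sketch that waits explicitly before each click and falls back to a JavaScript click for the filter option. The locators are copied from the question and may not match the live page. The helper names (`option_xpath`, `count_active_stars`, `collect_ratings`) and the 10-second timeout are assumptions for illustration, and whether a JavaScript click actually triggers Snapdeal's filter has not been verified.

```python
import time

STAR_XPATH = ".//i[@class='sd-icon sd-icon-star active']"


def option_xpath(position):
    """XPath of the filter option at a 1-based position in the dropdown list."""
    return f"//div[@class='options']/ul/li[{position}]"


def count_active_stars(review_header):
    """Count the highlighted star icons inside one review header element."""
    # "xpath" is the string value behind Selenium's By.XPATH constant.
    return len(review_header.find_elements("xpath", STAR_XPATH))


def collect_ratings(driver, positions=range(2, 7), timeout=10):
    """Apply each star filter in turn and return the star counts found."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    ratings = []
    for position in positions:
        # Open the dropdown only once Selenium reports it as clickable.
        wait.until(
            EC.element_to_be_clickable((By.XPATH, "//div[@class='selectarea']"))
        ).click()
        # Wait for the option to exist, then click it through JavaScript,
        # which sidesteps the interactability check (assumed workaround).
        option = wait.until(
            EC.presence_of_element_located((By.XPATH, option_xpath(position)))
        )
        driver.execute_script("arguments[0].click();", option)
        time.sleep(1)  # crude pause for the filtered reviews, as in the question
        for header in driver.find_elements(
            By.XPATH, "//div[@class='user-review']/div[1]"
        ):
            ratings.append(count_active_stars(header))
    return ratings
```

Inside the question's loop this would be called as `snap_ratings.extend(collect_ratings(driver))` after `driver.get(url)`. Note that the question's `except NoSuchElementException` does not catch ElementNotInteractableException or a wait timeout, so those would still propagate unless handled separately.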