【Question title】: How do I scrape data from Trip Advisor by using Selenium? - Python
【Posted】: 2021-11-01 15:57:22
【Question】:

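I am learning how to use Selenium with Python to scrape data from TripAdvisor. I want to extract hotel information from this page (https://en.tripadvisor.com.hk/Hotels-g294217-Hong_Kong-Hotels.html) after sorting it by "Traveler Ranked". For each hotel, I need the hotel name and the value of the "data-location=" attribute from the HTML page.

[HTML code for "data-location="][1]
[1]: https://i.stack.imgur.com/x668S.png

Here is my code. I don't know why it doesn't print the hotel names. I also don't know how to list the numbers stored in "data-location=".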

!pip install selenium

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome(executable_path=r'C:\ProgramData\Anaconda3\Lib\site-packages\jupyterlab\chromedriver.exe')
browser.get('https://en.tripadvisor.com.hk/Hotels-g294217-Hong_Kong-Hotels.html')

browser.maximize_window()
CheckinDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[1]')
CheckinDate.click()

CheckOutDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[2]')
CheckOutDate.click()

Roombutton = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div[4]/button')
Roombutton.click()

WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div'))).click()
browser.find_element(By.XPATH,'//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div[2]/div[1]/div').click()

results = browser.find_elements_by_css_selector('#bodycon_main .prw_meta_hsx_responsive_listing')
for result in results:
    try:
        link = result.find_element_by_xpath("./div/div[1]/div[2]/div[1]/div/a")
        print(link.text)
    except:
        continue

Thanks a lot!

【Comments on the question】:

    Tags: python selenium


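    【Solution 1】:

    The locator used for the results variable does not match anything on the page, so it returns an empty list and the loop prints nothing. The following code should work.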

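    Code snippet (it continues from the browser setup and imports in the question):

    import time  #needed for time.sleep below

    CheckinDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[1]')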
    CheckinDate.click()
    
    CheckOutDate = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div/div[2]/div/div[2]/div[1]/div[3]/div[3]/div[2]')
    CheckOutDate.click()
    
    Roombutton = browser.find_element(By.XPATH, '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[4]/div[2]/div/div[2]/div/div[4]/button')
    Roombutton.click()
    
    WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div'))).click()
    browser.find_element(By.XPATH,'//*[@id="component_15"]/div[2]/div[2]/span[1]/div/div[2]/div[1]/div').click()
    
    #time sleep to wait for all results to load after applying the preferences
    #can be adjusted accordingly
    time.sleep(10)
    
    #locate all hotel results
    #find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in later Selenium 4 releases
    results = browser.find_elements(By.XPATH, '//div[@class="prw_rup prw_meta_hsx_responsive_listing ui_section listItem"]')
    
    #for each hotel in page results
    for result in results:
        try:
            #find hotel name
            link = result.find_element(By.XPATH, '*//div[@class="listing_title"]/a')
    
            #find the div which carries the data-location attribute
            data_location = result.find_element(By.XPATH, '*//div[@class="pdWrapper node-preserve ajax_preserve"]').get_attribute("data-location")
            
            print(link.text)
            print(data_location)
    
        except Exception:
            #skip listings that lack the expected elements
            continue
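
As a possible follow-up to the snippet above, here is a minimal sketch (not part of the original answer) that swaps the fixed `time.sleep(10)` for an explicit wait and gathers the names and "data-location" values into a list instead of printing them. The names `wait_for_listings`, `collect_hotels`, and the three `*_XPATH` constants are hypothetical; the XPath strings are copied from the answer, and whether they still match the live TripAdvisor page is an assumption.

```python
# XPath expressions copied from the answer above; they are assumed to
# still match the page markup, which may have changed since then.
LISTING_XPATH = '//div[@class="prw_rup prw_meta_hsx_responsive_listing ui_section listItem"]'
TITLE_XPATH = '*//div[@class="listing_title"]/a'
WRAPPER_XPATH = '*//div[@class="pdWrapper node-preserve ajax_preserve"]'


def wait_for_listings(browser, timeout=30):
    """Block until at least one listing is present, then return all of them."""
    # Imported here so that collect_hotels below has no hard dependency
    # on selenium being installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(browser, timeout).until(
        EC.presence_of_all_elements_located((By.XPATH, LISTING_XPATH))
    )


def collect_hotels(results):
    """Return a list of (hotel_name, data_location) tuples.

    `results` is an iterable of Selenium WebElements, or of any objects
    offering the same find_element / get_attribute / text interface.
    """
    hotels = []
    for result in results:
        try:
            # "xpath" is the string value behind selenium's By.XPATH
            link = result.find_element("xpath", TITLE_XPATH)
            wrapper = result.find_element("xpath", WRAPPER_XPATH)
        except Exception:
            # Skip listings that lack the expected elements
            continue
        hotels.append((link.text, wrapper.get_attribute("data-location")))
    return hotels
```

After the sort option has been clicked, this could be used as `hotels = collect_hotels(wait_for_listings(browser))`, followed by a loop over `hotels` to print or store each pair. An explicit wait returns as soon as the listings appear, so it is usually faster than a fixed sleep, but it only checks for presence; if the page re-renders the list after sorting, a short extra delay may still be needed.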
    

    【Comments】:

    • It works. Thanks a lot!
    • @Wong_0606 You're welcome!!