【Question title】: Python selenium skips over necessary elements
【Posted】: 2016-10-16 11:29:00
【Question】:

Please don't vote to close this: it is different from my previous question, because I am using different logic here.

I am trying to iterate over all user reviews (class "partial_entry") on this page: https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or560-TAP-Portugal#REVIEWS

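If a review is not in English, I want to print its English translation. Otherwise, if the review is already in English, I want to print it as-is. However, my code skips those English comments (it does not print them). You can also see in the output that some comments are printed twice.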

There are 10 reviews/comments on this page (translated and untranslated), and all of them should be printed.
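The intended behaviour described above (exactly one line per review: the English translation when one exists, otherwise the original text) can be sketched without Selenium. `Review` and `reviews_to_print` are hypothetical names used only for illustration; in the real script each item would be built from the page's elements.

```python
from typing import List, NamedTuple, Optional


class Review(NamedTuple):
    """Hypothetical stand-in for one scraped review (not a Selenium type)."""
    text: str                          # text as shown on the page
    translation: Optional[str] = None  # English translation, None if already English


def reviews_to_print(reviews: List[Review]) -> List[str]:
    """Return exactly one string per review: the translation if present, else the text."""
    return [r.translation if r.translation is not None else r.text for r in reviews]


sample = [
    Review("Great crew and on-time departure."),
    Review("Voo muito bom.", translation="Very good flight."),
]
for line in reviews_to_print(sample):
    print(line)
```

Because the decision is made once per review, no shared counter such as `ctr` is needed and nothing can be printed twice.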

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()

driver.maximize_window()
url="https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or560-TAP-Portugal#REVIEWS"
driver.get(url)
ctr=0


def expand_reviews(driver):
    # TRYING TO EXPAND REVIEWS (& CLOSE A POPUP)    
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err"
    try:
        driver.find_element_by_class_name("ui_close_x").click()
    except:
        print "err2"
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err3"

# FIRST EXPAND THE REVIEWS BY CLICKING "MORE" BUTTON
expand_reviews(driver)




for j in driver.find_elements_by_xpath("//div[@class='wrap']"): # FIND ALL REVIEW ELEMENTS
    for ent in j.find_elements_by_xpath('.//p[@class="partial_entry"]'): # FIND REVIEW TEXT
        # FIRST CHECK IF TRANSLATION IS AVAILABLE (I.E. NON ENGLISH COMMENTS)
        if j.find_elements_by_css_selector('#REVIEWS .googleTranslation>.link'):
            #print 'NOW PRINTING TRANSLATED COMMENTS'
            gt= driver.find_elements(By.CSS_SELECTOR,"#REVIEWS .googleTranslation>.link")
            size=len(gt)
            while (ctr<size):
                for i in gt:
                    try:
                        if not i.is_displayed():
                            continue
                        driver.execute_script("arguments[0].click()",i)
                        wait = WebDriverWait(driver, 10)
                        wait.until(EC.element_to_be_clickable((By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
                        com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
                        print com.text
                        print "++" * 60
                        time.sleep(5)
                        driver.find_element_by_class_name("ui_close_x").click()
                        time.sleep(5)
                        #loop+=1
                    except Exception as e:
                        print "skipped"
                        pass
                    ctr+=1
        # COMMENT ALREADY IN ENGLISH, PRINT AS IT IS
        else:
            print ent
            print "="*60

driver.quit()

==================== Output ====================

<selenium.webdriver.remote.webelement.WebElement (session="15b6c83088a289e59c544a2c7787d27d", element="0.40753995907133644-28")>
============================================================
<selenium.webdriver.remote.webelement.WebElement (session="15b6c83088a289e59c544a2c7787d27d", element="0.40753995907133644-29")>
============================================================
<selenium.webdriver.remote.webelement.WebElement (session="15b6c83088a289e59c544a2c7787d27d", element="0.40753995907133644-30")>
============================================================
<selenium.webdriver.remote.webelement.WebElement (session="15b6c83088a289e59c544a2c7787d27d", element="0.40753995907133644-31")>
============================================================
<selenium.webdriver.remote.webelement.WebElement (session="15b6c83088a289e59c544a2c7787d27d", element="0.40753995907133644-32")>
============================================================
On my change my flight without asking my opinion or offer another solution without paying extra I stay more than 10 hours in boarding of room I have the urge to have something to eat I haven not even able to rest after my flight c is inadmissible night I no longer would resume this company and would not advise a person to take
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A little apprehensive before but quickly lifted. Very welcome and good service from the PNC, hot meal and good even for this short flight (1h50). Good punctuality and boarding more efficient
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Everything normal. Aircraft clean and almost full. Embarking on time, regular. Arrive slightly earlier. friendly and courteous staff. On board it was given a snack.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In the recent past I have traveled a few times from Venice to Lisbon and from Venice to Oporto via Lisbon. Good facilities on land and aboard; friendly service, clean air, punctuality and competitive rates. recommended
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sympathy and competence. The company strives to make passengers as comfortable as possible.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On my change my flight without asking my opinion or offer another solution without paying extra I stay more than 10 hours in boarding of room I have the urge to have something to eat I haven not even able to rest after my flight c is inadmissible night I no longer would resume this company and would not advise a person to take
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A little apprehensive before but quickly lifted. Very welcome and good service from the PNC, hot meal and good even for this short flight (1h50). Good punctuality and boarding more efficient
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Everything normal. Aircraft clean and almost full. Embarking on time, regular. Arrive slightly earlier. friendly and courteous staff. On board it was given a snack.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
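One plausible reading of the duplicated output (a hypothesis, not confirmed anywhere in this thread): inside the `try` block, `continue` on a hidden link jumps past `ctr+=1`, so hidden links are never counted. If the page holds hidden copies of the visible translation links (the answer's later use of `size/2` hints at this), one pass over `gt` leaves `ctr` below `size`, and the `while` loop starts over, which would explain translations reappearing. This pure-Python model uses hypothetical `(text, is_displayed)` pairs in place of WebElements.

```python
from typing import List, Tuple

# Hypothetical model of the translation links: (translated_text, is_displayed).
# Assumption: the page holds a hidden copy of each visible link.
Link = Tuple[str, bool]


def question_loop(links: List[Link]) -> List[str]:
    """Mirror the question's while/for/continue structure and collect what gets printed."""
    printed: List[str] = []
    ctr = 0
    size = len(links)
    # As in the original, if no link were displayed ctr would never grow
    # and this while loop would never end.
    while ctr < size:
        for text, displayed in links:
            if not displayed:
                continue  # jumps past `ctr += 1`, so hidden links are never counted
            printed.append(text)
            ctr += 1
    return printed


def single_pass(links: List[Link]) -> List[str]:
    """Visit each visible link exactly once; no counter required."""
    return [text for text, displayed in links if displayed]


links = [("T1", True), ("T2", True), ("T1", False), ("T2", False)]
print(question_loop(links))  # T1 and T2 each appear twice
print(single_pass(links))    # T1 and T2 each appear once
```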

【Question discussion】:

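  • On this page I only see comments that need translation. Can you share a page that has both English and non-English comments?
  • When you click "More" to expand the text, the text is contained in a "p" inside a "div" with the class "entry"...
  • @thebadguy This page has the first 5 comments in English and the remaining 5 in Portuguese: tripadvisor.com/…
  • @shalini ...your code runs fine on my machine... it prints the first English review... rather than the translated one.
  • I upvoted your question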
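The comment about the expanded text living in a plain "p" inside "div.entry" explains why a `partial_entry` selector misses reviews that have already been expanded. This sketch compares the question's selector with the answer's `.//div[@class="entry"]/p[1]` on a small, assumed HTML snippet, using the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) instead of a live browser. The markup is a simplification, not TripAdvisor's real page.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed markup: one review still collapsed (p.partial_entry) and
# one already expanded via "More" (plain <p> elements inside div.entry).
SAMPLE = """<div id="REVIEWS">
  <div class="wrap">
    <div class="entry"><p class="partial_entry">Collapsed review text...</p></div>
  </div>
  <div class="wrap">
    <div class="entry"><p>Expanded review text.</p><p>Second paragraph.</p></div>
  </div>
</div>"""

root = ET.fromstring(SAMPLE)

# Question's selector: only matches reviews that are still collapsed.
by_class = [p.text for p in root.findall('.//p[@class="partial_entry"]')]

# Answer's selector: the first <p> of every div.entry, collapsed or expanded.
by_entry = [p.text for p in root.findall('.//div[@class="entry"]/p[1]')]

print(by_class)
print(by_entry)
```

In the browser, `find_elements_by_xpath` evaluates full XPath 1.0, so the same expressions work there; ElementTree is used here only so the comparison runs on a static string.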

Tags: python selenium web-scraping


【Solution 1】:

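A tip to avoid putting the chromedriver path in every script: place chromedriver.exe in C:\Python27\Scripts. As long as that directory is on your PATH, you no longer need to pass the chromedriver path in each script and can simply use driver = webdriver.Chrome()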
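Calling `webdriver.Chrome()` with no driver path relies on chromedriver being discoverable through PATH (true for the Selenium 3 releases of that era; Selenium 4.6 and later can also fetch a driver automatically through Selenium Manager). A quick way to check what PATH lookup will find is the standard library's `shutil.which`; `find_on_path` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_on_path(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of executable `name` if it is found on PATH, else None."""
    # shutil.which scans the directories in the PATH environment variable
    # (and, on Windows, tries the PATHEXT extensions such as .exe).
    return shutil.which(name)


location = find_on_path()
if location is None:
    print("chromedriver not found on PATH")
else:
    print("chromedriver found at", location)
```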

I'm running this code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
url="https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or560-TAP-Portugal#REVIEWS"
driver.get(url)
ctr=0


for j in driver.find_elements_by_xpath("//div[@class='wrap']"): # FIND ALL REVIEW ELEMENTS
    for ent in j.find_elements_by_xpath('.//p[@class="partial_entry"]'): # FIND REVIEW TEXT
        # FIRST CHECK IF TRANSLATION IS AVAILABLE (I.E. NON ENGLISH COMMENTS)
        if j.find_elements_by_css_selector('#REVIEWS .googleTranslation>.link'):
            #print 'NOW PRINTING TRANSLATED COMMENTS'
            gt= driver.find_elements(By.CSS_SELECTOR,"#REVIEWS .googleTranslation>.link")
            size=len(gt)
            while (ctr<size):
                for i in gt:
                    try:
                        if not i.is_displayed():
                            continue
                        driver.execute_script("arguments[0].click()",i)
                        wait = WebDriverWait(driver, 10)
                        wait.until(EC.element_to_be_clickable((By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
                        com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
                        print com.text
                        print "++" * 60
                        time.sleep(5)
                        driver.find_element_by_class_name("ui_close_x").click()
                        time.sleep(5)
                        #loop+=1
                    except Exception as e:
                        print "skipped"
                        pass
                    ctr+=1
        # COMMENT ALREADY IN ENGLISH, PRINT AS IT IS
        else:
            print ent.text
            print "="*60

driver.quit()

The output I get is:

Quite comfortable for the economy class, with a friendly staff and good service. The food is good but could still be better.
============================================================
The pilot was amazing, soft take off, soft landing (even with ruf weather), very nice staff with amazing portuguese food and wine. The only downsize was the interior condition, although clean and without scratches or so you could see that is already aged. Appart from that all was good.
============================================================
Speedy check in process was very accurate and precise. They allowed cabin to be booked into the hold with no additional charges. Boarding was efficient and timely. the seats were very comfortable. Wide enough to fit me fairly comfortably with armrests that were able to lift during the flight. The really stand out thing for me was the leg space....
============================================================
My country's flag airline, It has struggle to survive in a hard economic cycle. Clever choice of unique African and south american cities, guarantied its continuity.~ Do not expect a exquisite food, alcoholic drinks, down to beer and wine, forget white spirits. Good safety record. Pilots well trained, good maintenance. I have flight TAP for the last 40 odd years...
============================================================
Our first trip to Europe on a long flight both ways. The flight TO Rome was good. I am tall and have back issues, and thank God we were able to get exit row seats. This made all the difference in the world. The food served was fair to good. There were movies offered which helped pass the time and...
============================================================
On my change my flight without asking my opinion or offer another solution without paying extra I stay more than 10 hours in boarding of room I have the urge to have something to eat I haven not even able to rest after my flight c is inadmissible night I no longer would resume this company and would not advise a person to take
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A little apprehensive before but quickly lifted. Very welcome and good service from the PNC, hot meal and good even for this short flight (1h50). Good punctuality and boarding more efficient
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Everything normal. Aircraft clean and almost full. Embarking on time, regular. Arrive slightly earlier. friendly and courteous staff. On board it was given a snack.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In the recent past I have traveled a few times from Venice to Lisbon and from Venice to Oporto via Lisbon. Good facilities on land and aboard; friendly service, clean air, punctuality and competitive rates. recommended
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sympathy and competence. The company strives to make passengers as comfortable as possible.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Update for handling the review content:

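  1. First install nltk with the command "pip install nltk"
  2. Once the installation finishes, open a Python shell, e.g. IDLE
  3. Enter the command: import nltk
  4. Now enter: nltk.download()
  5. A UI will open: click Models..., search for punkt and click Download
  6. After that, click Corpora..., search for stopwords and click Download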

Once these are installed, run the following program:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from nltk import word_tokenize
from nltk.corpus import stopwords



def detect_lang(text):
    lang_ratios = {}

    tokens = word_tokenize(text)
    words = [word.lower() for word in tokens]

    for language in stopwords.fileids():
        stopwords_set = set(stopwords.words(language))
        words_set = set(words)
        common_elements = words_set.intersection(stopwords_set)

        lang_ratios[language] = len(common_elements)
    return max(lang_ratios, key=lang_ratios.get)



driver = webdriver.Chrome()
driver.maximize_window()
url="https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or570-TAP-Portugal#REVIEWS"
driver.get(url)
ctr=0

time.sleep(5)

def expand_reviews(driver):
    # TRYING TO EXPAND REVIEWS (& CLOSE A POPUP)
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err"
    try:
        driver.find_element_by_class_name("ui_close_x").click()
    except:
        print "err2"
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err3"

# FIRST EXPAND THE REVIEWS BY CLICKING "MORE" BUTTON
expand_reviews(driver)

time.sleep(10)


for ent in driver.find_elements_by_xpath('.//div[@class="entry"]/p[1]'): # FIND REVIEW TEXT
    lang = detect_lang(ent.text)
    if (lang == 'english'):
        print ent.text
        print "=="*30
    else:
        if driver.find_elements_by_css_selector('#REVIEWS .googleTranslation>.link'):

            gt= driver.find_elements(By.CSS_SELECTOR,"#REVIEWS .googleTranslation>.link")
            size=len(gt)
            while (ctr<(size/2)):
                for i in gt:
                    try:
                        if not i.is_displayed():
                            continue
                        driver.execute_script("arguments[0].click()",i)
                        wait = WebDriverWait(driver, 10)
                        wait.until(EC.element_to_be_clickable((By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
                        com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
                        print com.text
                        print "++" * 60
                        time.sleep(5)
                        driver.find_element_by_class_name("ui_close_x").click()
                        time.sleep(5)
                        #loop+=1
                    except Exception as e:
                        print "skipped"
                        pass
                    ctr+=1

This prints the following output:

 Speedy check in process was very accurate and precise. They allowed cabin to be booked into the... read more
============================================================
Very pleasant flight, excellent service on board and on the ground, the best seats in the Buisness Class and Top Food and drinks during the flight.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Perfect atendimento.Bom care of Commissioners and Commissioners, punctuality. Good movies offered.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Overall, a good flight! Time (departure and arrival). Enough time for the change to Lisbon. Very nice crew!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
It was a flight noturno.Teve strong turbulence and I could not dormir.Rezei all night. After all it was a decent trip. only regret the discomfort of the aircraft but praise the good atendimento.Toda the crew was very kind and helpful.The journey back was quieter.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The food really is not the best to tell the truth, I could not even eat. But the service is very good.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Kindness, professionalism, and willingness on the part of the crew: good landing and includes drinks and light dinner
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 9 October flown with this company. By delayed entering the unit departed late. Atmosphere Loos routine operation. The evening meal consisted of a tuna sandwich and a liquid plum in plastic vial. A choice of meat or cheese was not there. For me and many others so no meal on this flight. Downright depressing.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Speedy check in process was very accurate and precise. They allowed cabin to be booked into the hold with no additional charges. Boarding was efficient and timely. the seats were very comfortable. Wide enough to fit me fairly comfortably with armrests that were able to lift during the flight. The really stand out thing for me was the leg space. I'm 6ft 4in and I was able to fully extend my legs under the chair in front of me. if for this reason alone I have started looking for other destinations they fly to. In flight entertainment was none existent but then it was only a short haul flight so I won't fault them. Food snack and beverages were included in the price. The in flight attendants were professional, courteous and well presented. I will definitely use them again in the future!
============================================================
My country's flag airline, It has struggle to survive in a hard economic cycle. Clever choice of unique African and south american cities, guarantied its continuity.~
Do not expect a exquisite food, alcoholic drinks, down to beer and wine, forget white spirits. Good safety record. Pilots well trained, good maintenance. I have flight TAP for the last 40 odd years I have seen many faces and crisis, however customer service really Bad.
============================================================
Our first trip to Europe on a long flight both ways. The flight TO Rome was good. I am tall and have back issues, and thank God we were able to get exit row seats. This made all the difference in the world. The food served was fair to good. There were movies offered which helped pass the time and blankets because the cabin got very cold. We had to sit in Lisbon Airport for six hours to complete our journey. It was tiring but still the trip was good (fun even)

The flight back home was not as pleasant. The first leg from Rome to Lisbon was horrible. The woman in front of me kept slamming her seat up against my legs and telling me she had to lay back and to put my tray table up. The flight attendant SAW this happen and did nothing. I found the crew to be very unfriendly on all legs of the flight. They were not warm and friendly and even bordered on rude at some point.
I would probably fly this airline again because the price was right and it was acceptable....
============================================================
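The `detect_lang` helper in the program above picks the language whose NLTK stopword list shares the most words with the review. This is a dependency-free sketch of the same scoring idea; the two tiny stopword sets and the regex tokenizer are assumptions standing in for `stopwords.words(...)` and `word_tokenize`. It also shows a weak spot: when a text contains no stopwords at all, every score is 0 and `max()` simply returns the first language listed.

```python
import re
from typing import Dict, Set

# Tiny illustrative stopword sets (an assumption for this sketch; NLTK's
# real lists from stopwords.words(...) are much longer).
STOPWORDS: Dict[str, Set[str]] = {
    "english": {"the", "and", "was", "a", "to", "of", "is", "it", "very"},
    "portuguese": {"o", "a", "e", "de", "que", "muito", "foi", "um", "do"},
}


def detect_lang(text: str, stopwords: Dict[str, Set[str]] = STOPWORDS) -> str:
    """Return the language whose stopword set overlaps most with the words of text."""
    # Lower-cased word tokens; a regex stands in for nltk.word_tokenize here.
    words = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(words & sw) for lang, sw in stopwords.items()}
    # On ties max() keeps the first key, so text with no stopword hits
    # falls back to whichever language is listed first.
    return max(scores, key=scores.get)


print(detect_lang("The flight was very good and the crew was friendly"))
print(detect_lang("O voo foi muito bom e a comida estava boa"))
print(detect_lang("Ok"))  # no stopwords at all: falls back to the first entry
```

With NLTK's full lists the same kind of tie can give very short or truncated reviews a non-English label, sending them into the translation branch.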

【Discussion】:

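  • Thanks, but the reviews are not expanded. If you look at the output, (lines) 3, 4 and 5 are not expanded. Before expansion all reviews are in "p.partial_entry", but after clicking the "More" button they are just in a plain "p". That is what causes all sorts of problems. Now that I've figured that out, how do I fix it?
  • Updated my question with the output I get. The reviews are not printed, and comments are printed twice... web scraping is really frustrating!!!!!
  • Could you help me with the "expanding reviews" part? Thanks a lot
  • @vinita ctr... I didn't use it in my second code. Just initialize it wherever you need it; it depends on how you use the code... I only provided a solution for printing the reviews, which sit under a p tag without any other identifier..
  • This happens because of the nltk library: the if check fails, so execution falls into the else branch, and the else runs first.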