Remove an element in a container using selenium (answers)

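【Question title】: Remove an element in a container using selenium
【Posted】: 2018-11-19 11:25:59
【Question description】:

I only want to scrape the information inside the black box and delete/remove/exclude the information inside the red box.

I am doing this because the class names "entry" and "partial_entry" exist in both boxes. Only the first "partial_entry" contains the information I need, so I plan to delete/remove/exclude the element with the class name "mgrRspnInLine".

My code is: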

while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for item in container:
        try:
            element = item.find_element_by_class_name('mgrRspnInline')
            driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element.click()
            time.sleep(2)
            rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
            for rate in rating:
                rate = rate.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
            time.sleep(2)
            stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
            for stayed in stay:
                stayed = stayed.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
            summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
            for comment in summary:
                comment = comment.text
                comments.append(comment)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
            rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
            for date in rating_date:
                date = date.get_attribute("title")
                date = str(date)
                review_date.append(date)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
            review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
            for comment in review:
                comment = comment.text
                print(comment)
                reviews.append(comment)
        except (NoSuchElementException) as e:
            continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break

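Basically, inside each "review-container" I first search for the class name "mgrRspnInLine" and then try to remove it with execute_script.

Unfortunately, the output still shows the content inside "mgrRspnInLine".

【Question comments】:

Your code for removing the element should work. There may be several elements (hidden ones?) with the class name mgrRspnInLine, so you may have removed the wrong one... You can simplify the code to driver.execute_script("""arguments[0].parentNode.removeChild(arguments[0]);""", element)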
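
A minimal sketch of the simplification suggested in this comment, assuming a Selenium-style `driver` and a `container` element. The helper names `remove_element` and `remove_all_within` are hypothetical, and the locator strategy is written as the plain string "class name" (the value of Selenium's `By.CLASS_NAME`) so the sketch has no import-time dependency. Passing the located element as `arguments[0]` removes exactly the node found inside the current container, rather than the first match in the whole document.

```python
# Plain-string equivalent of selenium.webdriver.common.by.By.CLASS_NAME.
CLASS_NAME = "class name"

# Same script as in the comment: detach the element passed as arguments[0].
REMOVE_JS = "arguments[0].parentNode.removeChild(arguments[0]);"


def remove_element(driver, element):
    """Detach one already-located element from the DOM."""
    driver.execute_script(REMOVE_JS, element)


def remove_all_within(driver, container, class_name):
    """Remove every descendant of `container` that has `class_name`.

    find_elements returns an empty list when nothing matches, so a
    container without a manager response is left untouched instead of
    raising NoSuchElementException. Returns the number of removals.
    """
    targets = container.find_elements(CLASS_NAME, class_name)
    for target in targets:
        remove_element(driver, target)
    return len(targets)
```

Inside the question's loop this would be called as `remove_all_within(driver, item, "mgrRspnInLine")` before reading the review text.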

Tags: python selenium selenium-webdriver web-scraping selenium-chromedriver

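【Solution 1】:

If you want to avoid matching the second element, you can modify the XPath as follows:

.//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]

This matches an element with the class name "partial_entry" only if it has no ancestor with the class name "mgrRspnInLine".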
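
The XPath above compares the ancestor's whole class attribute with `@class="mgrRspnInLine"`, which only matches when that element carries no other classes. As a hedged variation, the sketch below builds the same kind of expression but matches each class as a whitespace-separated token. The function names `has_class` and `xpath_excluding_ancestor` are hypothetical helpers, not part of Selenium.

```python
def has_class(name):
    """XPath 1.0 predicate that matches `name` as a whole class token."""
    return 'contains(concat(" ", normalize-space(@class), " "), " {} ")'.format(name)


def xpath_excluding_ancestor(wanted_class, excluded_ancestor_class):
    """Relative XPath for elements with `wanted_class` that are not nested
    inside any element carrying `excluded_ancestor_class`."""
    return ".//*[{} and not(ancestor::*[{}])]".format(
        has_class(wanted_class), has_class(excluded_ancestor_class)
    )


# Expression for the review text that skips the manager response block.
REVIEW_TEXT_XPATH = xpath_excluding_ancestor("partial_entry", "mgrRspnInLine")
```

The resulting string can be passed to the same XPath lookup on each review container that the question already uses.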

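【Comments】:

Awesome expression, @sir Andersson. There is always something new to learn.

【Solution 2】:

If you want the first occurrence, you can use a CSS class selector instead:

.partial_entry

and retrieve it with find_element_by_css_selector: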

find_element_by_css_selector(".partial_entry")
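
Note that the `find_element_by_*` helpers were removed in Selenium 4.3; the equivalent call there is `find_element(By.CSS_SELECTOR, ".partial_entry")`. The sketch below shows one way to take the first match per container without raising when nothing matches. `first_text` is a hypothetical helper, and the strategy is written as the plain string "css selector" (the value of `By.CSS_SELECTOR`) to keep the example dependency-free. It relies on the question's observation that the reviewer's text is the first "partial_entry" in each container.

```python
# Plain-string equivalent of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def first_text(container, css, default=None):
    """Return the text of the first element under `container` matching `css`.

    Uses find_elements so that a container with no match yields `default`
    instead of raising NoSuchElementException.
    """
    matches = container.find_elements(CSS_SELECTOR, css)
    return matches[0].text if matches else default
```

For example, `first_text(item, ".partial_entry")` would return the review body for one review container, or `None` if the container has no such element.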

【Comments】:

【Solution 3】:

You can remove all .mgrRspnInLine elements with the following command:

driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
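
A small variation on the one-liner above, offered as a sketch: the selector is passed in as `arguments[0]` instead of being hard-coded, and the script returns how many nodes it removed so the caller can check whether the selector matched anything. `remove_by_selector` and `REMOVE_ALL_JS` are hypothetical names; `forEach` replaces `map` only because the mapped array is not needed.

```python
# JavaScript run in the page: remove every node matching the selector
# given as arguments[0] and report how many were removed.
REMOVE_ALL_JS = """
const nodes = document.querySelectorAll(arguments[0]);
nodes.forEach(node => node.remove());
return nodes.length;
"""


def remove_by_selector(driver, css_selector):
    """Remove every element matching `css_selector`; return the count."""
    return driver.execute_script(REMOVE_ALL_JS, css_selector)
```

A return value of 0 from `remove_by_selector(driver, ".mgrRspnInLine")` would indicate that no element on the page currently matches the selector.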

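【Comments】:

【Solution 4】:

By combining Andersson's comment with the two answers from QHarr and pguardiario, I finally solved the problem.

The key is to target a container inside the container: all the information sits in the element with the class name "ui_column is-9", which is itself inside the element with the class name "review-container". This addresses Andersson's comment about there being multiple mgrRspnInLine elements.

In the nested loop, I use pguardiario's suggestion to remove the multiple existing mgrRspnInLine elements, then apply QHarr's answer for .partial_entry: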

while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for items in container:
        element = WebDriverWait(driver, 1000).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
        element.click()
        time.sleep(10)
        contained = items.find_elements_by_xpath('.//*[contains(@class,"ui_column is-9")]')
        for item in contained:
            try:
                driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
                # The leading "." keeps the search inside item; a bare "//" searches the whole page.
                rating = item.find_element_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
                rate = rating.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
                time.sleep(2)
                stay = item.find_element_by_xpath('.//*[contains(@class,"recommend-titleInline")]')
                stayed = stay.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
                summary = item.find_element_by_xpath('.//*[contains(@class,"noQuotes")]')
                comment = summary.text
                comments.append(comment)
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
                rating_date = item.find_element_by_xpath('.//*[contains(@class,"ratingDate")]')
                date = rating_date.get_attribute("title")
                date = str(date)
                review_date.append(date)
                WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
                review = item.find_element_by_css_selector(".partial_entry")
                comment = review.text
                print(comment)
            except (NoSuchElementException) as e:
                continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break
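
The rating step above takes the last two characters of the class attribute (`rate[-2:]`), which depends on the `bubble_NN` token being last. As a hedged alternative, the sketch below pulls the digits out with a regular expression regardless of class order. `bubble_score` and `bubble_rating` are hypothetical helpers, and reading the digits as the rating times ten (for example "45" as 4.5) is an assumption, not something stated in the thread.

```python
import re

# Matches the standalone class token "bubble_NN"; the "bubble" inside
# "ui_bubble_rating" is skipped because "_" counts as a word character.
_BUBBLE_RE = re.compile(r"\bbubble_(\d+)\b")


def bubble_score(class_attr):
    """Return the digits after 'bubble_' in a class attribute, or None."""
    match = _BUBBLE_RE.search(class_attr or "")
    return match.group(1) if match else None


def bubble_rating(class_attr):
    """Assumption: the digits encode the rating times ten ('45' -> 4.5)."""
    score = bubble_score(class_attr)
    return int(score) / 10 if score is not None else None
```

In the loop, `score_list.append(bubble_score(rating.get_attribute("class")))` would store the same two-digit string as the original slice when the token is present.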

【Comments】: