【Question title】:Remove an element in a container using selenium
【Posted】:2018-11-19 11:25:59
【Question】:

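I only want to scrape the required information inside the black box, and to delete/remove/exclude the information inside the red box.

I am doing this because the class names "entry" and "partial_entry" exist in both boxes. Only the first "partial_entry" contains the information I need, so I plan to delete/remove/exclude the element with the class name "mgrRspnInLine".

My code is: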

while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for item in container:
        try:
            element = item.find_element_by_class_name('mgrRspnInline')
            driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element.click()
            time.sleep(2)
            rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
            for rate in rating:
                rate = rate.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
            time.sleep(2)
            stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
            for stayed in stay:
                stayed = stayed.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
            summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
            for comment in summary:
                comment = comment.text
                comments.append(comment)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
            rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
            for date in rating_date:
                date = date.get_attribute("title")
                date = str(date)
                review_date.append(date)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
            review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
            for comment in review:
                comment = comment.text
                print(comment)
                reviews.append(comment)
        except (NoSuchElementException) as e:
            continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break

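Basically, within each "review-container" I first search for the class name "mgrRspnInLine" and then try to remove it with execute_script.

Unfortunately, the output still shows the content inside "mgrRspnInLine".

【Comments】: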

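  • Your code for removing the element should work. There may be several elements with the class name mgrRspnInLine (hidden ones?), so you may have removed the wrong one... You can simplify the code to driver.execute_script("""arguments[0].parentNode.removeChild(arguments[0]);""", element)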
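
A minimal sketch of the simplification suggested in the comment above: the WebElement that was already located is passed to execute_script and removed through arguments[0], instead of re-querying the whole document for the first match. The helper name `remove_element` is hypothetical, and `driver` is assumed to be a Selenium WebDriver.

```python
# JavaScript executed in the page; arguments[0] is the element passed from Python.
REMOVE_NODE_JS = "arguments[0].parentNode.removeChild(arguments[0]);"


def remove_element(driver, element):
    """Remove exactly the node that `element` refers to from the DOM.

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement
    located beforehand, e.g. with item.find_element_by_class_name(...).
    """
    driver.execute_script(REMOVE_NODE_JS, element)
```

Inside the loop from the question this would be called as `remove_element(driver, element)` right after `element` has been found within `item`, so the node removed is the one belonging to the current review container.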

Tags: python selenium selenium-webdriver web-scraping selenium-chromedriver


【Solution 1】:

If you want to avoid matching the second element via XPath, you can modify the XPath as follows:

.//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]

This matches an element with the class name "partial_entry" only if it has no ancestor whose class is "mgrRspnInLine".
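
The expression above compares the whole class attribute of the ancestor with `@class="..."`, so it only excludes ancestors whose class attribute is exactly that string. A hedged sketch of a variant that tests for the class as one whitespace-separated token is shown below; the helper names are hypothetical, and the class names are copied from this answer. XPath string comparisons are case-sensitive, so the name should be spelled exactly as it appears in the page's markup (the question's code uses `mgrRspnInline`).

```python
def class_token(name):
    """XPath 1.0 predicate: true when @class contains `name` as a whole token."""
    return 'contains(concat(" ", normalize-space(@class), " "), " {} ")'.format(name)


def xpath_without_ancestor(target_class, excluded_class):
    """Relative XPath for elements with `target_class` that have no
    ancestor carrying `excluded_class`."""
    return ".//*[{} and not(ancestor::*[{}])]".format(
        class_token(target_class), class_token(excluded_class)
    )


# Class names taken from the answer above; adjust to the real markup.
REVIEW_TEXT_XPATH = xpath_without_ancestor("partial_entry", "mgrRspnInLine")
```

The resulting string can be passed to the same call the question already uses, for example `item.find_elements_by_xpath(REVIEW_TEXT_XPATH)`.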

【Comments】:

  • Awesome expression, @sir Andersson. Always something new to learn.
【Solution 2】:

If you want the first occurrence, you can use a CSS class selector instead:

.partial_entry

and retrieve it with find_element_by_css_selector:

find_element_by_css_selector(".partial_entry")

【Comments】:

    【Solution 3】:

    You can remove all .mgrRspnInLine elements with the following command:

    driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
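
The one-liner above removes every match in the whole document. As a sketch under stated assumptions, the same idea can be scoped to a single review container by passing the container and the selector as script arguments; the helper name `remove_matching` and the returned count are illustrative additions, not part of the answer.

```python
# arguments[0] is the container WebElement, arguments[1] the CSS selector.
REMOVE_ALL_IN_SCOPE_JS = """
const nodes = arguments[0].querySelectorAll(arguments[1]);
nodes.forEach(el => el.remove());
return nodes.length;
"""


def remove_matching(driver, container, css_selector=".mgrRspnInLine"):
    """Remove every descendant of `container` that matches `css_selector`.

    Returns the number of removed nodes, which helps to confirm that the
    selector (including its capitalisation) matched anything at all.
    """
    return driver.execute_script(REMOVE_ALL_IN_SCOPE_JS, container, css_selector)
```

A return value of 0 would suggest the selector does not match the page's markup, which is one way to investigate why the manager response still shows up in the output.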
    

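    【Comments】:

      【Solution 4】:

      Combining Andersson's comment with the two answers provided by QHarr and pguardiario, I finally solved the problem.

      The key is to target a container inside the container: all the information sits in the element with the class name "ui_column is-9", which is itself inside the element with the class name "review-container". This addresses Andersson's comment about multiple mgrRspnInLine elements.

      Inside the nested loop, I use pguardiario's suggestion to remove the multiple existing mgrRspnInLine elements, then apply QHarr's answer for .partial_entry.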

      while True:
          container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
          for items in container:
              element = WebDriverWait(driver, 1000).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
              element.click()
              time.sleep(10)
              contained = items.find_elements_by_xpath('.//*[contains(@class,"ui_column is-9")]')
              for item in contained:
                  try:
                      driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
                      # Leading "." keeps the search relative to `item`; "//*" alone searches the whole page.
                      rating = item.find_element_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
                      rate = rating.get_attribute("class")
                      rate = str(rate)
                      rate = rate[-2:]
                      score_list.append(rate)
                      time.sleep(2)
                      stay = item.find_element_by_xpath('.//*[contains(@class,"recommend-titleInline")]')
                      stayed = stay.text
                      stayed = stayed.split(', ')
                      stayed.append(stayed[0])
                      travel_type.append(stayed[1])
                      WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
                      summary = item.find_element_by_xpath('.//*[contains(@class,"noQuotes")]')
                      comment = summary.text
                      comments.append(comment)
                      WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
                      rating_date = item.find_element_by_xpath('.//*[contains(@class,"ratingDate")]')
                      date = rating_date.get_attribute("title")
                      date = str(date)
                      review_date.append(date)
                      WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
                      review = item.find_element_by_css_selector(".partial_entry")
                      comment = review.text
                      print(comment)
                  except (NoSuchElementException) as e:
                      continue
          try:
              element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
              element.click()
              time.sleep(2)
          except (ElementClickInterceptedException,NoSuchElementException) as e:
              print(e)
              break
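
The `rate[-2:]` slice in the code above relies on the rating being the last two characters of the class attribute. A slightly more defensive sketch is given below. It assumes, which the thread does not state, that the numeric suffix after `bubble_` is the rating multiplied by ten; the function name `parse_bubble_rating` is hypothetical.

```python
import re

# Matches the numeric suffix of a class token such as "bubble_50".
_BUBBLE_RE = re.compile(r"bubble_(\d+)")


def parse_bubble_rating(class_attr):
    """Return the rating encoded in a class attribute, or None if absent.

    Assumption: the digits after "bubble_" are the rating times ten.
    """
    match = _BUBBLE_RE.search(class_attr or "")
    return int(match.group(1)) / 10 if match else None
```

It could replace the slicing step, for example `score_list.append(parse_bubble_rating(rating.get_attribute("class")))`, and it returns None instead of an arbitrary two-character string when the class is missing.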
      

      【Comments】:
