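【Posted】: 2018-11-19 11:25:59
【Question】:
I only want to scrape the information inside the black box, and delete/remove/exclude the information inside the red box.
I am doing this because the class names "entry" and "partial_entry" appear in both boxes. Only the first "partial_entry" contains the information I need, so I plan to delete/remove/exclude the element with class name "mgrRspnInline".
My code is: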
while True:
    container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
    for item in container:
        try:
            element = item.find_element_by_class_name('mgrRspnInline')
            driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
            element.click()
            time.sleep(2)
            rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
            for rate in rating:
                rate = rate.get_attribute("class")
                rate = str(rate)
                rate = rate[-2:]
                score_list.append(rate)
            time.sleep(2)
            stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
            for stayed in stay:
                stayed = stayed.text
                stayed = stayed.split(', ')
                stayed.append(stayed[0])
                travel_type.append(stayed[1])
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
            summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
            for comment in summary:
                comment = comment.text
                comments.append(comment)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
            rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
            for date in rating_date:
                date = date.get_attribute("title")
                date = str(date)
                review_date.append(date)
            WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
            review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
            for comment in review:
                comment = comment.text
                print(comment)
                reviews.append(comment)
        except (NoSuchElementException) as e:
            continue
    try:
        element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
        element.click()
        time.sleep(2)
    except (ElementClickInterceptedException,NoSuchElementException) as e:
        print(e)
        break
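Basically, inside each "review-container" I first look up the class name "mgrRspnInline" and then try to remove that element with execute_script.
Unfortunately, the output still shows the content contained in "mgrRspnInline".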
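One way to get the same result without deleting anything is to exclude the unwanted block in the XPath itself. The sketch below is only an illustration under assumptions: it assumes the unwanted text always sits inside an element whose class contains "mgrRspnInline", and the names `REVIEW_TEXT_XPATH` and `review_texts` are hypothetical helpers, not part of the original code. It uses the same Selenium 3 style `find_elements_by_xpath` call as the code above.

```python
# XPath 1.0 expression: elements whose class contains "partial_entry" and
# that have no ancestor whose class contains "mgrRspnInline".
REVIEW_TEXT_XPATH = (
    './/*[contains(@class,"partial_entry")'
    ' and not(ancestor::*[contains(@class,"mgrRspnInline")])]'
)


def review_texts(container):
    """Return the text of each matching node inside `container`.

    `container` is expected to be a WebElement such as one "review-container".
    find_elements_by_xpath is the Selenium 3 call used in the question; later
    Selenium 4 releases spell it container.find_elements(By.XPATH, ...).
    """
    nodes = container.find_elements_by_xpath(REVIEW_TEXT_XPATH)
    return [node.text for node in nodes]
```

Inside the loop this would be called as `reviews.extend(review_texts(item))`. Because nothing is removed from the page, it does not matter whether the page re-renders the excluded block later.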
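【Comments】:
- Your code for removing the element should work. There may be several elements with the class name mgrRspnInline (hidden ones?), so you may have removed the wrong element... You can simplify the code to driver.execute_script("""arguments[0].parentNode.removeChild(arguments[0]);""", element)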
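The suggestion in the comment above could be wrapped in a small helper, sketched here. The script in the question looks the node up again with document.getElementsByClassName("mgrRspnInline")[0] and never uses the `element` argument it is given, so it removes the first match in the whole document, which is not necessarily the one inside the current container. Passing the located element through `arguments[0]` avoids that. The names `REMOVE_NODE_JS` and `remove_manager_responses` are hypothetical, and the helper assumes Selenium 3 style locator methods as used in the question.

```python
# JavaScript that detaches exactly the element passed in from Python.
REMOVE_NODE_JS = "arguments[0].parentNode.removeChild(arguments[0]);"


def remove_manager_responses(driver, container, class_name="mgrRspnInline"):
    """Remove every element with `class_name` found inside `container`.

    Returns the number of elements removed. Using find_elements (plural)
    means a container without a match yields an empty list instead of
    raising NoSuchElementException.
    """
    # Selenium 3 call, as in the question; later Selenium 4 releases use
    # container.find_elements(By.CLASS_NAME, class_name) instead.
    responses = container.find_elements_by_class_name(class_name)
    for node in responses:
        driver.execute_script(REMOVE_NODE_JS, node)
    return len(responses)
```

In the loop from the question this would be called as `remove_manager_responses(driver, item)` before reading the "partial_entry" text. If the page re-inserts the block after a later click, the call would need to be repeated after that click.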
Tags: python selenium selenium-webdriver web-scraping selenium-chromedriver