【Posted】: 2015-10-17 08:45:35
【Question】:
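This runs and scrapes links exactly the way I want, except that when I run it in the terminal Python does not seem to recognize the value of 'scraped_pages'. 'scraped_pages' is incremented by 1 on every loop, yet the loop keeps going even once that integer is higher than 'page_nums'. When I set 'page_nums' to an integer lower than 5, it runs and stops at 5, but with anything more it crashes. I apologize if I have not phrased this in the best way. All of the code above this point works; this is the problem code. All modules are imported correctly as well. It uses Selenium, and I am not sure whether the explicit wait is working, because the script crashes before it reaches the 'page_nums' value.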
page_nums = raw_input("how many pages to scrape?: ")
urls_list = []
scraped_pages = 0
scraped_links = 0
while scraped_pages <= page_nums:
    for li in list_items:
        for a in li.find_all('a', href=True):
            url = a['href']
            if slicer(url,'http'):
                url1 = slicer(url,'http')
                urls_list.append(url1)
                scraped_links += 1
            elif slicer(url,'www'):
                url1 = slicer(url,'www')
                urls_list.append(url1)
                scraped_links += 1
            else:
                pass
    scraped_pages += 1
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]")))
    driver.find_element_by_xpath("/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]").click()
    print scraped_links
print urls_list
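One thing that may explain why the counter never seems to be compared properly: `raw_input` returns a string, and in CPython 2 comparing an integer with a string does not raise an error. The integer simply sorts as smaller, so `scraped_pages <= page_nums` stays true however large the counter gets. A minimal sketch of converting the input first, written so that it also runs on Python 3; `parse_page_count` and `pages_to_scrape` are hypothetical helper names, not part of the original script:

```python
def parse_page_count(text):
    """Convert the user's answer to a positive int, or raise ValueError."""
    count = int(text.strip())  # int() raises ValueError for non-numeric input
    if count < 1:
        raise ValueError("page count must be at least 1")
    return count


def pages_to_scrape(page_nums):
    """Yield 1-based page numbers, exactly page_nums of them."""
    scraped_pages = 0
    # '<' rather than '<=' so that asking for 3 pages visits 3, not 4.
    while scraped_pages < page_nums:
        scraped_pages += 1
        yield scraped_pages


if __name__ == "__main__":
    # "3" stands in for the string returned by raw_input(...) / input(...).
    page_nums = parse_page_count("3")
    for page in pages_to_scrape(page_nums):
        print(page)
```

With the count held as an integer, the loop ends on its own instead of running until the click fails.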
This is part of the error that is returned when the script runs.
1
2
Traceback (most recent call last):
  File "google page click 2.py", line 51, in <module>
    driver.find_element_by_xpath("/html/body/div[5]/div[4]/div[9]/div[1]/div[3]/div/div[5]/div/span[1]/div/table/tbody/tr/td[12]").click()
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 75, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 454, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 201, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 181, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: Element is not currently visible and so may not be interacted with
Stacktrace:
    at fxdriver.preconditions.visible (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:9981)
    at DelayedCommand.prototype.checkPreconditions_ (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12517)
    at DelayedCommand.prototype.executeInternal_/h (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12534)
    at DelayedCommand.prototype.executeInternal_ (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12539)
    at DelayedCommand.prototype.execute/< (file:///tmp/tmpzSHEeb/extensions/fxdriver@googlecode.com/components/command-processor.js:12481)
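The `ElementNotVisibleException` above suggests the element exists in the DOM but is hidden when the click is attempted. `presence_of_element_located` only checks that the element is present, whereas `element_to_be_clickable` also waits for it to be visible and enabled. A rough sketch of a paging loop built on that condition follows. `click_next_page` and `scrape_pages` are hypothetical names, and the two callables passed to `scrape_pages` are assumed to wrap the existing link-scraping code (re-reading the current page each time) and the click on the next-page element:

```python
def click_next_page(driver, xpath, timeout=10):
    """Click the element at xpath once it is clickable; return False on timeout."""
    # Imported lazily so the helper can be defined without a browser session.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        element = WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable((By.XPATH, xpath)))
    except TimeoutException:
        return False  # the element never became visible and enabled
    element.click()
    return True


def scrape_pages(page_nums, scrape_current_page, go_to_next_page):
    """Scrape up to page_nums pages, stopping early if paging fails.

    scrape_current_page() returns the list of links on the page being shown;
    go_to_next_page() returns True when the next page was opened.
    """
    urls_list = []
    for page in range(page_nums):
        urls_list.extend(scrape_current_page())
        is_last = page == page_nums - 1
        # Skip the click after the final page, and stop if paging fails.
        if is_last or not go_to_next_page():
            break
    return urls_list
```

It could be wired up along the lines of `scrape_pages(page_nums, scrape_links, lambda: click_next_page(driver, next_xpath))`, where `scrape_links` and `next_xpath` are placeholders for the existing scraping code and the XPath of the next-page element. Returning `False` on a timeout lets the loop end cleanly with the links collected so far instead of raising in the middle of a run.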
【Comments】:
Tags: python selenium selenium-webdriver