【Posted】: 2019-07-16 15:12:24
【Question】:
I'm trying to take a screenshot of an element inside a Bootstrap modal. After some struggle, I finally came up with this code:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` and `graduate` are created elsewhere in the job
driver.get('https://enlinea.sunedu.gob.pe/')
driver.find_element_by_xpath('//div[contains(@class, "img_publica")]').click()
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'modalConstancia')))
driver.find_element_by_xpath('//div[contains(@id, "modalConstancia")]').click()
active_element = driver.switch_to.active_element
active_element.find_elements_by_id('doc')[0].send_keys(graduate.id)
# Can't take this screenshot
active_element.find_elements_by_id('captchaImg')[0].screenshot_as_png('test.png')
The error is:
Traceback (most recent call last):
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/rq/worker.py", line 812, in perform_job
    rv = job.perform()
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/rq/job.py", line 588, in perform
    self._result = self._execute()
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/rq/job.py", line 594, in _execute
    return self.func(*self.args, **self.kwargs)
  File "./jobs/sunedu.py", line 82, in scrap_document_number
    record = scrap_and_recognize(driver, graduate)
  File "./jobs/sunedu.py", line 33, in scrap_and_recognize
    active_element.find_elements_by_id('captchaImg')[0].screenshot_as_png('test.png')
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 567, in screenshot_as_png
    return base64.b64decode(self.screenshot_as_base64.encode('ascii'))
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 557, in screenshot_as_base64
    return self._execute(Command.ELEMENT_SCREENSHOT)['value']
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/cesar/Development/manar/venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot take screenshot with 0 width."}
  (Session info: chrome=75.0.3770.100)
  (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Linux 4.4.0-154-generic x86_64)
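A side note on the traceback: the exception is raised inside `screenshot_as_png` itself, which in Selenium's Python bindings is a property that returns PNG bytes, not a method that takes a filename. The sketch below shows one way to read that property and write the bytes to disk. The helper name `save_element_png` is hypothetical, and `element` is only assumed to behave like a Selenium `WebElement`. It does not fix the zero-width problem, since reading the property is what triggers the capture.

```python
from pathlib import Path


def save_element_png(element, path):
    """Write an element's PNG screenshot to `path` and return the byte count.

    `element` is assumed to expose `screenshot_as_png` as a property that
    yields PNG bytes, as Selenium's WebElement does. A zero-size element
    would still raise here, because the property access requests the capture.
    """
    png_bytes = element.screenshot_as_png  # property access, no parentheses
    Path(path).write_bytes(png_bytes)
    return len(png_bytes)
```

With a real element, `element.screenshot('test.png')` is the built-in call that saves straight to a file.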
After some debugging, I realized that the element has no width or height:
(Pdb) active_element.find_elements_by_id('captchaImg')[0].rect
{'height': 0, 'width': 0, 'x': 0, 'y': 0}
(Pdb) active_element.find_elements_by_id('captchaImg')[0].size
{'height': 0, 'width': 0}
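I think this is the reason for the failure. Is there a way to solve this?
These are the steps:
- Click the link
- Wait for the modal and fill in the first input
- Try to take a screenshot of the captcha image
If I inspect the element in the browser (the span that holds the captcha image), I can see that it is actually 100x50.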
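The gap between the 0x0 size seen by the script and the 100x50 seen in the browser could have several causes. Two possibilities, both assumptions that the question does not confirm, are that the lookup runs before the image has been laid out, or that more than one element shares the id and the first match is a hidden one. A minimal sketch that covers both cases by polling until some candidate reports a non-zero size is shown below. The names `has_area`, `wait_for_sized_element` and `find_candidates` are hypothetical, and the elements are only assumed to expose a `size` dict like the one printed in the debugger session.

```python
import time


def has_area(element):
    """Return True when the element reports a non-zero width and height."""
    size = element.size
    return size["width"] > 0 and size["height"] > 0


def wait_for_sized_element(find_candidates, timeout=10.0, poll=0.5):
    """Poll `find_candidates()` until one returned element has a non-zero size.

    `find_candidates` is a hypothetical zero-argument callable, for example
    `lambda: driver.find_elements("id", "captchaImg")`.
    Returns the first sized element, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        for element in find_candidates():
            if has_area(element):
                return element
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

If this returns `None`, the element never gained a size within the timeout, which would point at something other than timing or a duplicate id.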
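【Comments】:
- You probably won't beat the captcha. You can't expect what you see when inspecting the element in the browser to match what a script sees, even if it visits the same page. Captchas are smart: it will know you are trying to scrape the page, and it won't work. Try taking a screenshot of the whole page instead of just that element.
- With the Firefox webdriver the element reports the correct size, but the function screenshot() always saves the full page.
- @c0lon The captcha is not a sentient being; it is just another element on the page, which can be interacted with and overcome using selenium.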
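The comments above suggest capturing the whole page, and note that Firefox reports a usable element size. Combining the two, one could crop the full-page image down to the element's rect. The sketch below only computes the crop box. The name `crop_box` and the `scale` parameter (device pixel ratio) are hypothetical, and it assumes the rect and the screenshot share the same origin, which may not hold if the page is scrolled.

```python
def crop_box(rect, scale=1.0):
    """Convert an element rect into a (left, top, right, bottom) pixel box.

    `rect` is a dict with 'x', 'y', 'width' and 'height' in CSS pixels, the
    shape shown by `element.rect` in the question. `scale` is the device
    pixel ratio, assumed to be 1.0 unless the display is high-DPI.
    """
    if rect["width"] <= 0 or rect["height"] <= 0:
        raise ValueError("element has no area to crop")
    left = round(rect["x"] * scale)
    top = round(rect["y"] * scale)
    right = round((rect["x"] + rect["width"]) * scale)
    bottom = round((rect["y"] + rect["height"]) * scale)
    return (left, top, right, bottom)
```

The resulting tuple matches the box format expected by Pillow's `Image.crop`, so it could be applied to the full-page PNG if Pillow is available.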
Tags: python python-3.x selenium web-scraping