How to grab headers in python selenium-webdriver

【Question title】: How to grab headers in python selenium-webdriver
【Posted】: 2017-02-14 09:51:37
【Question】:

I am trying to get the HTTP headers in selenium webdriver. Something similar to the following:

>>> import requests
>>> res=requests.get('http://google.com')
>>> print(res.headers)

I need to use the Chrome webdriver because it supports Flash and some other things I need in order to test the web page. Here is what I have so far in Selenium:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://login.comcast.net/login?r=comcast.net&s=oauth&continue=https%3A%2F%2Flogin.comcast.net%2Foauth%2Fauthorize%3Fclient_id%3Dxtv-account-selector%26redirect_uri%3Dhttps%3A%2F%2Fxtv-pil.xfinity.com%2Fxtv-authn%2Fxfinity-cb%26response_type%3Dcode%26scope%3Dopenid%2520https%3A%2F%2Flogin.comcast.net%2Fapi%2Flogin%26state%3Dhttps%3A%2F%2Ftv.xfinity.com%2Fpartner-success.html%26prompt%3Dlogin%26response%3D1&reqId=18737431-624b-44cb-adf0-2a85d91bd662&forceAuthn=1&client_id=xtv-account-selector')
driver.find_element_by_css_selector('#user').send_keys('XY@comcast.net')
driver.find_element_by_css_selector('#passwd').send_keys('XXY')
driver.find_element_by_css_selector('#passwd').submit()
print(driver.headers)  # How to do this?

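I have seen other answers that suggest running an entire Selenium server to get this information (https://github.com/derekargueta/selenium-profiler). How can I get it through Webdriver using something similar to the above?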

【Comments】:

Could you elaborate on which headers you want to extract and for what purpose? Thanks.
Pretty sure you can't do that out of the box.

Tags: python selenium

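【Answer 1】:

Unfortunately, you cannot get this information from the Selenium webdriver, and it does not look like you will be able to in the near future. An excerpt from a very long conversation on the subject:

This feature isn't going to happen.

The main reason, from what I gather from the discussion, is that webdriver is meant for "driving the browser", and in the developers' view, extending the API beyond that primary goal would cause the overall quality and reliability of the API to suffer.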

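One potential workaround that I have seen suggested in a number of places, including the conversation linked above, is to use BrowserMob Proxy, which can capture HTTP content and can be used with Selenium, although the linked example does not use the Python Selenium API. There does seem to be a Python wrapper for BrowserMob Proxy, but I cannot vouch for its efficacy because I have never used it.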
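
The proxy workaround could look roughly like the sketch below. It assumes the browsermob-proxy Python wrapper and a local BrowserMob Proxy binary; the function names, the `browsermob_path` argument and the "capture" label are hypothetical and chosen only for illustration. The proxy records traffic as a HAR document, and a small helper pulls the response headers out of it.

```python
def response_headers_by_url(har):
    """Map each requested URL to its response headers, given a HAR dict."""
    headers = {}
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        headers[url] = {
            h["name"]: h["value"] for h in entry["response"].get("headers", [])
        }
    return headers


def capture_headers(url, browsermob_path):
    """Load `url` in Chrome through BrowserMob Proxy and return response headers.

    Assumption: `browsermob_path` points at a local browsermob-proxy binary.
    """
    # Imported lazily so the helper above works without these packages installed.
    from browsermobproxy import Server
    from selenium import webdriver

    server = Server(browsermob_path)
    server.start()
    try:
        proxy = server.create_proxy()
        options = webdriver.ChromeOptions()
        # Route all browser traffic through the proxy.
        options.add_argument("--proxy-server={}".format(proxy.proxy))
        driver = webdriver.Chrome(options=options)
        try:
            proxy.new_har("capture", options={"captureHeaders": True})
            driver.get(url)
            return response_headers_by_url(proxy.har)
        finally:
            driver.quit()
    finally:
        server.stop()
```

For HTTPS pages the browser will probably have to trust the proxy's certificate (or be started with certificate errors ignored), since the proxy intercepts the traffic.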

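【Comments】:

How about executing JavaScript or something similar within the page to log it to the console? Is there a (hackish) way to do something like this?
One suggestion I have repeatedly seen on this subject is to use BrowserMob Proxy: github.com/lightbody/browsermob-proxy, which can be used with Selenium: github.com/lightbody/browsermob-proxy#using-with-selenium. However, I have no experience with this utility. Sorry I couldn't be of more help!
@David542 Please also see the last paragraph of my updated answer. It contains a link to a Python wrapper for BrowserMob Proxy, which may work for your use case.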

【Answer 2】:

Nowadays I think this is easy with https://pypi.org/project/selenium-wire/. It is an extension of Selenium. Use from seleniumwire import webdriver and proceed as usual.
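
A minimal sketch of how that might look, assuming selenium-wire is installed and a Chrome driver is available. The function names are hypothetical; the sketch relies on selenium-wire exposing captured traffic as `driver.requests`, where each request has a `url` and an optional `response` carrying `headers`.

```python
def collect_response_headers(requests):
    """Return {url: {header: value}} for every captured request with a response."""
    collected = {}
    for request in requests:
        response = request.response
        if response is None:  # the request never received a response
            continue
        collected[request.url] = dict(response.headers.items())
    return collected


def fetch_headers(url):
    """Open `url` in Chrome via selenium-wire and return captured response headers."""
    # Imported lazily so the helper above can be used without selenium-wire installed.
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return collect_response_headers(driver.requests)
    finally:
        driver.quit()
```

Note that flattening into a plain dict keeps only one value per header name, so repeated headers such as Set-Cookie would need a different container.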

【Comments】:

I hope it becomes more popular and that updates keep coming; it is so intuitive...

【Answer 3】:

You can try Mobilenium, a Python package (still in development) that binds BrowserMob Proxy and Selenium.

A usage example:

>>> from mobilenium import mobidriver
>>>
>>> browsermob_path = 'path/to/browsermob-proxy'
>>> mob = mobidriver.Firefox(browsermob_binary=browsermob_path)
>>> mob.get('http://python-requests.org')
301
>>> mob.response['redirectURL']
'http://docs.python-requests.org'
>>> mob.headers['Content-Type']
'application/json; charset=utf8'
>>> mob.title
'Requests: HTTP for Humans \u2014 Requests 2.13.0 documentation'
>>> mob.find_elements_by_tag_name('strong')[1].text
'Behold, the power of Requests'

【Comments】:

I am getting the error ImportError: cannot import name 'mobidriver'. Any idea how to fix this?
@west pip install mobilenium
@West My mistake, there is a problem with that repo. You can work around it with: pip install -U git+https://github.com/rafpyprog/Mobilenium

【Answer 4】:

You can get the headers from the log (source: Mma's answer)

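from selenium import webdriver
import json

# PhantomJS support was removed in Selenium 4, so this needs an older Selenium release.
driver = webdriver.PhantomJS(executable_path=r"your_path")
driver.get("http://google.com")  # load a page first; substitute your own URL
har = json.loads(driver.get_log('har')[0]['message'])  # the HAR log arrives as a JSON string
print('headers: ', har['log']['entries'][0]['request']['headers'])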
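
Since the question asks for Chrome, here is a sketch of a similar log-based idea using Chrome's performance log rather than the PhantomJS HAR log from this answer. It is a different technique, offered under assumptions: the function names are hypothetical, and it relies on the `goog:loggingPrefs` capability and on log messages following the DevTools `Network.responseReceived` event shape.

```python
import json


def response_headers_from_performance_log(entries):
    """Extract (url, status, headers) tuples from Chrome performance log entries."""
    results = []
    for entry in entries:
        # Each entry wraps a DevTools event as a JSON string under "message".
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        results.append((response["url"], response["status"], response["headers"]))
    return results


def fetch_headers(url):
    """Load `url` in Chrome with performance logging on and return response headers."""
    # Imported lazily so the parser above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return response_headers_from_performance_log(driver.get_log("performance"))
    finally:
        driver.quit()
```

The result contains one tuple per response, including sub-resources such as scripts and images, so the main document is usually the entry whose URL matches the page that was loaded.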

【Comments】:

Where do you put the website URL?

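【Answer 5】:

You can use JavaScript built-in properties, for example navigator.userAgent to read the User-Agent value.

However, this can only be done after the driver has been created.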

from selenium import webdriver
driver = webdriver.Chrome()
# Store it in a variable and print the value
agent = driver.execute_script("return navigator.userAgent")
print(agent)
# directly print the value
print(driver.execute_script("return navigator.userAgent"))
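
The snippet above only returns the user agent string. A hackish extension of the same idea, sketched here under assumptions, is to have the page issue a synchronous XMLHttpRequest to its own URL and read `getAllResponseHeaders()`. The names `FETCH_HEADERS_JS`, `parse_raw_headers` and `headers_of_current_page` are hypothetical. Because this makes a second request, the headers may differ from those of the original navigation, and the browser exposes only the headers it allows scripts to see.

```python
# JavaScript run inside the page: re-request the current URL and return raw headers.
FETCH_HEADERS_JS = """
var req = new XMLHttpRequest();
req.open('GET', document.location.href, false);  // synchronous request
req.send(null);
return req.getAllResponseHeaders();
"""


def parse_raw_headers(raw):
    """Turn a raw 'name: value' header block into a dict with lowercase names."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            headers[name.strip().lower()] = value.strip()
    return headers


def headers_of_current_page(driver):
    """Return response headers for the page currently loaded in `driver`."""
    return parse_raw_headers(driver.execute_script(FETCH_HEADERS_JS))
```

Call `headers_of_current_page(driver)` after `driver.get(...)` so that there is a page for the script to run in.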

【Comments】: