Get the content of an HTML comment tag with Selenium

【Question title】: Get content of HTML comment tag with Selenium
【Posted】: 2017-02-24 16:48:34
【Question】:

I am having a hard time, using Python 2.7 and Selenium, getting the content of an HTML comment located inside the head tag of an HTML page.

<head>
   <!-- I would like to get this sentence -->
   [...]
</head>

I obtained the XPath of that comment with FirePath/FireBug (so I assume it is correct): html/head/comment()[1].

Then:

This: given_driver.find_element_by_xpath('html/head/comment()[1]') gives me an InvalidSelectorException saying Message: The given selector html/head/comment()[1] is either invalid or does not result in a WebElement. The following error occurred: InvalidSelectorError: The result of the xpath expression "html/head/comment()[1]" is: [object Comment]. It should be an element.
This: head_element = given_driver.find_element_by_xpath('html/head') followed by head_element.get_attribute('innerHTML') gives me the entire HTML code inside the head tag, like: u'\n [...]

But I want to get only the content of the comment inside the head tag. I am starting to suspect this is not possible with Selenium, which seems strange to me. How can I get it?

【Comments】:

Tags: html python-2.7 selenium comments

【Solution 1】:

The Selenium API does not support comment nodes. However, you can easily get the comment with this piece of JavaScript:

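from selenium.webdriver.common.by import By

def get_element_comment(element):
    # Return the text of every comment node that is a direct child of `element`.
    # nodeType 8 is Node.COMMENT_NODE; `element.parent` is the owning WebDriver.
    # The newline is written as '\\n' so that JavaScript receives the escape
    # sequence rather than a raw line break inside its string literal.
    return element.parent.execute_script("""
      return Array.prototype.slice.call(arguments[0].childNodes)
        .filter(function(e) { return e.nodeType === 8 })
        .map(function(e) { return e.nodeValue.trim() })
        .join('\\n');
      """, element)

# `driver` is assumed to be an initialized WebDriver with the page already loaded.
head = driver.find_element(By.CSS_SELECTOR, "head")
comment = get_element_comment(head)
print(comment)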
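
As a further sketch along the same lines, the XPath from the question can be evaluated inside the browser with `document.evaluate`, which, unlike Selenium's element lookup, can return the string value of a comment node. The helper name `get_comment_by_xpath` and its default XPath are assumptions made for illustration; `driver` stands for any WebDriver instance that has already loaded the page.

```python
# Hypothetical helper (not part of Selenium): evaluates an XPath in the
# browser and returns the string value of the first matching node.
# For a comment node, the string value is the comment's text.
_XPATH_STRING_JS = """
return document.evaluate(
    arguments[0], document, null, XPathResult.STRING_TYPE, null
).stringValue;
"""


def get_comment_by_xpath(driver, xpath="/html/head/comment()[1]"):
    """Return the stripped text of the node selected by `xpath`."""
    value = driver.execute_script(_XPATH_STRING_JS, xpath)
    return value.strip() if value else ""
```

Calling `get_comment_by_xpath(driver)` returns an empty string when the XPath matches nothing, so the caller can tell a missing comment from a match without handling an exception.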

【Discussion】:

I used BeautifulSoup to work around this:
head_content_soup = BeautifulSoup(given_driver.find_element_by_xpath('html/head').get_attribute('innerHTML'), 'html.parser')
element_from_comment_tag = head_content_soup.findAll(text=lambda text: isinstance(text, Comment))
I was just hoping there was a way to solve this with Selenium alone.

【Solution 2】:

You have to get the page source and find (parse) the desired comment in it. Something like this:

driver.Navigate().GoToUrl("your url");
var src = driver.PageSource;

Then parse src.
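
For the Python setup described in the question, the parsing step might look like the sketch below. It uses only the standard library's `html.parser` and therefore assumes Python 3 rather than the asker's Python 2.7; the names `CommentCollector` and `head_comments` are hypothetical, and `page_source` stands for the string returned by the driver's `page_source` attribute.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of comments that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # Tag names are passed in lowercase by HTMLParser.
        if tag == "head":
            self.in_head = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

    def handle_comment(self, data):
        # `data` is the text between "<!--" and "-->".
        if self.in_head:
            self.comments.append(data.strip())


def head_comments(page_source):
    """Return the stripped text of every comment found inside <head>."""
    parser = CommentCollector()
    parser.feed(page_source)
    parser.close()
    return parser.comments
```

With a live driver this would be called as `head_comments(driver.page_source)`; the first item of the returned list corresponds to the comment targeted by the XPath in the question.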

【Discussion】: