Extract 'href' from elements using Selenium: answers

【Question title】: Extract 'href' from elements using Selenium
【Posted】: 2012-12-23 21:18:12
【Question】:

This is my current XPath:

//table//tr/td/div/div[1]/div/a/@href

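It matches ten URLs on the page I am looking at, all of this form: jobs/720800-Associate-Partner-Investment-Consulting-Vancouver-Job-ID-39708.aspx

I am trying to pull the @href string with selenium.get_text(); however, my calls come back blank (note: they do not fail, they just return empty strings). I can successfully extract strings from other elements on the same page.

I have searched and cannot find anything that addresses my problem. Does anyone have a suggestion?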

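【Question comments】:

More sample input is needed. For example, what does //table//tr/td/div/div[1]/div/a[@href]/@href produce?
Do you want the text of one specific link, or of all the links?
XPather tells me this is a valid XPath; however, when I run my script, get_text returns an empty string and get_attribute gives me an invalid XPath error.
Yes, I am looking for the text of all the links.

Tags: python xpath selenium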

【Solution 1】:

This may be a little late, but if you are using Python Selenium (as your tags suggest), you can do the following (as of v2.44.0):

from selenium import webdriver
# set the driver
driver = webdriver.Firefox()
# get the element
elem = driver.find_element_by_xpath('//table//tr/td/div/div[1]/div/a')
# get the attribute value
link = elem.get_attribute('href')
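
The snippet above reads a single link, because find_element_by_xpath returns only the first match, while the question asks for all ten. A minimal sketch of collecting every href is below. It assumes the generic find_elements(by, value) method, which newer Selenium 4 releases keep after dropping the find_element_by_* helpers. The helper name collect_hrefs is hypothetical, and the literal "xpath" stands in for selenium's By.XPATH constant so the function itself needs no import.

```python
def collect_hrefs(driver, xpath):
    """Return the href of every element matched by `xpath`.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver
    (or any object exposing the same find_elements method).
    """
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH
    elements = driver.find_elements("xpath", xpath)
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors whose href is missing or empty
            hrefs.append(href)
    return hrefs
```

With a driver that has already loaded the page, collect_hrefs(driver, '//table//tr/td/div/div[1]/div/a') should return the URL strings. Note that the XPath stops at the a element rather than at @href, since the locator has to select elements, and that the browser typically reports each href as an absolute URL.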

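【Comments】:

【Solution 2】:

If I understand correctly, the problem is that some of the <a href="XXX"> elements on that path have an empty href while other anchors do not, and you only want the non-empty hrefs. In that case, use this expression: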

//table//tr/td/div/div[1]/div/a[@href!=""]/@href

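【Comments】:

Thanks. They all come back empty, but I know from viewing the source that they are not; I was just accessing them the wrong way.

【Solution 3】:

Try this: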

get_attribute("//table//tr/td/div/div[1]/div/a@href");

【Comments】:

Thanks, Santoshsarma. Unfortunately, when I run this command I get an invalid XPath error: Invalid xpath [2]: (//div[@id='event-listings']/ul/li/p/a, where the "@href" follows the ".../p/a".
Try it like this: //div[@id='event-listings']/ul/li/p/a@href

【Solution 4】:

Reference only the anchor tag, not the href attribute. Once you have all the elements, call get_attribute() on each one to read its href.

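# assumes `driver` is an existing WebDriver instance (Selenium 2/3 style API)
elements = driver.find_elements_by_xpath("//table//tr/td/div/div[1]/div/a[@href]")
for element in elements:
    print(element.get_attribute("href"))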

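I hope this helps.

【Comments】:

This gives me the text of the /a elements, not the text of the href (i.e. the URL string). Thanks for the suggestion, though.
Sorry for the trouble, mate. Check my updated answer; it should work. After retrieving the elements, we need to call get_attribute() on each one, instead of GetText(), to get the URL string.
I tried a for loop using find_elements_by_xpath and find_element_by_xpath, both on their own and prefixed with "driver." or "browser.". I keep getting a "name: find_element not defined" error, or an error saying driver or browser is not defined. My for loop takes this form: for i in find_element_by_xpath("//div[@id='event-listings']/ul/li/p/a[@href]"): print i.get_attribute("href")
Might I need to import some extension beyond plain selenium?
I was finally able to do this with get_attribute. My mistake was leaving out "xpath=" when referencing the XPath. This is the specific command: element1 = sel.get_attribute("xpath=//div[@id='event-listings']/ul/li[2]/p/a@href"). I would like it to pick up every element that matches a given XPath, but I can work with this.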
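
The last comment uses the older Selenium RC client (the sel object) and reads one link at a time by hard-coding li[2]. A rough sketch of how that call might be repeated for every match follows. It assumes the RC client's get_xpath_count method and the "xpath=...@href" attribute locator form from the comment; the helper name rc_collect_hrefs and the "(xpath)[i]" indexing are illustrative assumptions rather than something confirmed in the thread.

```python
def rc_collect_hrefs(sel, xpath):
    """Read the href of each element matching `xpath` via a Selenium RC session.

    Hypothetical helper: `sel` is assumed to expose get_xpath_count and
    get_attribute as the Selenium RC Python client does.
    """
    # get_xpath_count takes the bare XPath (no "xpath=" prefix)
    count = int(sel.get_xpath_count(xpath))
    hrefs = []
    # XPath positions are 1-based, so iterate from 1 to count inclusive
    for i in range(1, count + 1):
        # attribute locator: an element locator, then "@", then the attribute name
        locator = "xpath=(%s)[%d]@href" % (xpath, i)
        hrefs.append(sel.get_attribute(locator))
    return hrefs
```

Called as rc_collect_hrefs(sel, "//div[@id='event-listings']/ul/li/p/a"), this would issue one get_attribute call per matching anchor instead of a single call for li[2].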