Get URL from onclick attribute of a tag (Python): answers

[Question title]: Get URL from onclick attribute of a tag (Python)
[Posted]: 2019-10-18 04:00:24
[Question]:

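I'm trying to use Selenium with Python to get the URL inside the onclick attribute of an <a> tag. The URL is passed to a JavaScript function. I've tried various techniques but haven't found a solution yet. I tried triggering the click with the execute_script method. I also tried get_attribute to read the onclick function, but it returned nothing. I want the URL that is passed to the openPopUpFullScreen function.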

Here is the HTML:

<td class="tdAction">
<div class="formResponseBtn icon-only">
<a href="#fh" onclick="javascript: openPopUpFullScreen('/esop/toolkit/negotiation/rfq/publicRfqSummaryReport.do?rfqId=rfq_229969', '');" class="openNewPage" title="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions">
<img src="/esop_custom/images/buttons/print_button.png" title="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions" alt="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions"><img src="/esop_custom/images/buttons/openNewWindow_button.png" title="(Opens in new window)" alt="(Opens in new window)">
</a>
</div>
</td>

Here is the Python code:

url=browser.find_element_by_xpath("//img[@title='Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions']").click()
print(browser.current_url)
# prints the URL of the page I was already on

Here is another attempt:

id=browser.find_element_by_css_selector(".openNewPage").get_attribute("onclick")
print(id)
# prints None

I need the URL inside the openPopUpFullScreen function, but I can't figure out the right way to get it.

Update: I also tried extracting the onclick function with BeautifulSoup, but it doesn't seem to be there:

Here is my code:

content = browser.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
res = soup.find("a",{"class":"openNewPage"})
print(res)
#it returns the complete tag but it does not contain onclick attribute
# I tried using this
res = soup.find("a",{"class":"openNewPage"})[onclick]
#it returns an error NameError: name 'onclick' is not defined
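The NameError happens because [onclick] looks up a Python variable named onclick; attribute names are string keys, so the lookup has to be ['onclick']. Once that is fixed, a KeyError means the parsed tag really has no onclick attribute. The sketch below swaps BeautifulSoup for the standard library's html.parser (an assumption, chosen only so it runs without extra packages) and shows that .get() returns None for a missing attribute instead of raising. The helper name onclick_of_first_anchor is hypothetical.

```python
from html.parser import HTMLParser


class AnchorAttrs(HTMLParser):
    """Collect the attribute dict of every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs; a dict allows key lookup
            self.anchors.append(dict(attrs))


def onclick_of_first_anchor(html):
    """Hypothetical helper: return the onclick of the first <a>, or None."""
    parser = AnchorAttrs()
    parser.feed(html)
    if not parser.anchors:
        return None
    # .get() returns None when the attribute is missing instead of raising KeyError
    return parser.anchors[0].get("onclick")


with_onclick = '<a href="#fh" onclick="javascript: openPopUpFullScreen(\'/x\', \'\');" class="openNewPage">'
without_onclick = '<a href="#fh" class="openNewPage">'
print(onclick_of_first_anchor(with_onclick))
print(onclick_of_first_anchor(without_onclick))
```

If the second case is what you see on the live page, the problem is which element (or which page source) is being parsed, not the attribute lookup itself.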

[Question comments]:

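Try ['onclick'] instead of [onclick].
That raises an error in __getitem__: return self.attrs[key] KeyError: 'onclick'
It looks like you're already using a browser emulator. Add a delay so the page can load completely.
Yes, I'm using Selenium, but as I mentioned above, get_attribute returns nothing when I use it with Selenium. I already have time.sleep(10) before running this code.
Use presence_of_element_located: selenium-python.readthedocs.io/waits.html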

Tags: javascript python html selenium web-scraping

[Answer 1]:

See below:

from bs4 import BeautifulSoup


html = '''<td class="tdAction">
<div class="formResponseBtn icon-only">
<a href="#fh" onclick="javascript: openPopUpFullScreen('/esop/toolkit/negotiation/rfq/publicRfqSummaryReport.do?rfqId=rfq_229969', '');" class="openNewPage" title="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions">
<img src="/esop_custom/images/buttons/print_button.png" title="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions" alt="Open a new window to view > View or download a Summary of this PQQ/ITT which includes details of the PQQ/ITT settings, format and questions"><img src="/esop_custom/images/buttons/openNewWindow_button.png" title="(Opens in new window)" alt="(Opens in new window)">
</a>
</div>
</td>'''


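soup = BeautifulSoup(html, features="lxml")
a = soup.find('a')
# attribute names are string keys in the tag's attrs dict
onclick = a.attrs['onclick']
# the URL is the text between the first pair of single quotes
left = onclick.find("'")
right = onclick.find("'", left + 1)
print('URL is: {}'.format(onclick[left + 1:right]))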

Output:

URL is: /esop/toolkit/negotiation/rfq/publicRfqSummaryReport.do?rfqId=rfq_229969
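The index arithmetic above relies on the URL being the first single-quoted string in the attribute. A regular expression anchored on the function name is a slightly stricter alternative, and urllib.parse.urljoin can turn the site-relative path into an absolute URL. This is a sketch under assumptions: the helper name popup_url is hypothetical, and the base URL https://example.com/... is a placeholder; in a Selenium session browser.current_url would be the natural base.

```python
import re
from urllib.parse import urljoin

# Capture the first single-quoted argument of openPopUpFullScreen(...)
POPUP_URL = re.compile(r"openPopUpFullScreen\(\s*'([^']*)'")


def popup_url(onclick, base_url):
    """Hypothetical helper: absolute URL passed to openPopUpFullScreen, or None."""
    match = POPUP_URL.search(onclick or "")
    if match is None:
        return None
    # Resolve the site-relative path against the page the link was found on
    return urljoin(base_url, match.group(1))


onclick = ("javascript: openPopUpFullScreen('/esop/toolkit/negotiation/rfq/"
           "publicRfqSummaryReport.do?rfqId=rfq_229969', '');")
# Placeholder base URL; substitute browser.current_url in a real session
print(popup_url(onclick, "https://example.com/esop/some/page.do"))
```

Because the path starts with "/", urljoin keeps only the scheme and host of the base URL and replaces the rest, so the query string rfqId=rfq_229969 survives unchanged.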

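[Comments]:

onclick = a.attrs['onclick'] KeyError: 'onclick' That's what I get.
@PrakharSood Did you copy/paste my code and get a KeyError?
No, I applied it in my own code, where the variable html contains the page's URL.
@PrakharSood Then start by checking my code. It is based on the HTML you posted and it works. Once you understand how it works, check what's wrong with the way you use it in your code.
Cheers mate! I went over my code and found the problem. Thanks for your help!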

[Answer 2]:

About get_attribute:

I think you selected the wrong element, one that has no "onclick" attribute.

You should make the CSS selector more specific and confirm that it matches exactly one element.
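To illustrate how a broad selector can pick the wrong element, here is a standard-library sketch over hypothetical markup containing two links with class openNewPage, only one of which has an onclick. It collects every match, so the count can be checked, and then keeps only matches that actually carry the attribute. In Selenium the analogous check would be len(browser.find_elements(By.CSS_SELECTOR, ...)) with a narrower selector such as a.openNewPage[onclick]; that selector is an assumption and has not been tested against the real page. The class name ClassMatcher and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collect attributes of every <a> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class may be absent or valueless; treat both as "no classes"
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self.matches.append(attrs)


# Hypothetical markup: the first matching link has no onclick attribute
page = """
<a href="/help" class="openNewPage">Help</a>
<a href="#fh" class="openNewPage"
   onclick="javascript: openPopUpFullScreen('/esop/report.do?rfqId=1', '');">Report</a>
"""

matcher = ClassMatcher("openNewPage")
matcher.feed(page)
print(len(matcher.matches))               # how many elements the broad selector hits
print(matcher.matches[0].get("onclick"))  # the first match lacks onclick
with_onclick = [a for a in matcher.matches if "onclick" in a]
print(len(with_onclick))                  # matches that actually carry onclick
```

Taking the first match here yields None, which mirrors the symptom in the question; filtering on the attribute (or narrowing the selector) removes the ambiguity.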

About current_url:

You need to switch to the new window first. Try the following code:

# window_handles[-1] refer to last window created.
browser.switch_to.window(browser.window_handles[-1])
print(browser.current_url)
