How to get the href link from this a tag? (Answer)

【Question title】: How to get the href link from this a tag?
【Posted】: 2020-01-25 19:10:35
【Question】:

I successfully extracted the href links from the http://quotes.toscrape.com/ example with:

response.css('div.quote > span > a::attr(href)').extract()

It returns the relative link stored in the href of each a tag:

['/author/Albert-Einstein', '/author/J-K-Rowling', '/author/Albert-Einstein', '/author/Jane-Austen', '/author/Marilyn-Monroe', '/author/Albert-Einstein', '/author/Andre-Gide', '/author/Thomas-A-Edison', '/author/Eleanor-Roosevelt', '/author/Steve-Martin']

By the way, in the example above every a tag has this format:

<a href="/author/Albert-Einstein">(about)</a>

So I tried to do the same for this site: http://www.thegoodscentscompany.com/allproc-1.html. The problem is that the a tags there are written a bit differently:

<a href="#" onclick="openMainWindow('http://www.thegoodscentscompany.com/data/rw1247381.html');return false;">formaldehyde</a>

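As you can see, the href only contains "#", so the approach above does not return the link. The URL I want (http://www.thegoodscentscompany.com/data/rw1247381.html) sits inside the onclick attribute instead. How can I extract it?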

【Comments】:

One fairly naive approach that might work: response.xpath('//a/@onclick').re(r"openMainWindow\('(.*?)'\)") ?
Have you tried anything? What exactly is the problem?
@Jon Clements, thanks.

Tags: python python-3.x web-scraping scrapy

【Solution 1】:

Try this: response.css('a::attr(onclick)').re(r"Window\('(.*?)'\)")
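
To check the pattern outside of a Scrapy shell, here is a minimal sketch that uses only the Python standard library. It applies the same regex to the onclick value of the sample tag from the question. The names OnclickLinkParser and extract_onclick_links are hypothetical helpers made up for this illustration and are not part of Scrapy; the sketch also assumes every target link appears as a single-quoted argument followed directly by a closing parenthesis.

```python
import re
from html.parser import HTMLParser

# Sample markup copied from the question.
HTML = (
    '<a href="#" onclick="openMainWindow('
    "'http://www.thegoodscentscompany.com/data/rw1247381.html'"
    ');return false;">formaldehyde</a>'
)

# Same pattern as in the answer: capture the text between Window(' and ').
ONCLICK_URL = re.compile(r"Window\('(.*?)'\)")


class OnclickLinkParser(HTMLParser):
    """Hypothetical helper: collects URLs found in the onclick of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # A missing onclick (or one without a value) yields an empty string.
        onclick = dict(attrs).get("onclick") or ""
        self.links.extend(ONCLICK_URL.findall(onclick))


def extract_onclick_links(html):
    """Return every URL matched by ONCLICK_URL in the given HTML string."""
    parser = OnclickLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_onclick_links(HTML)
print(links)
```

Inside a spider, the one-liner above does the same job, because .re() runs the regex over each selected onclick value and returns the captured groups as a list of strings. Tags without an onclick attribute, such as the ones on quotes.toscrape.com, simply contribute nothing to the result.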

【Comments】: