How can I get the link from href in "a" with class name by using python 3: answers

【Question title】: How can I get the link from href in "a" with class name by using python 3
【Posted】: 2018-09-06 11:15:28
【Question】:

I'm trying to get the Google Maps link from this element:

<div class="something1">
  <span class="something2"></span>
  <a data-track-id="Google Map" href="https://www.google.com/maps/dir//11111/@22222" target="_blank" class="something3">Google Map</a>
</div>

I only want to get https://www.google.com/maps/dir//11111/@22222

My code is:

 gpslocation = []
 for gps in (secondpage_parser.find("a", {"data-track-id":"Google Map"})):
     gpslocation.append(gps.attrs["href"])

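I'm using two URLs (a main page and a second page) to scrape a blog site whose content is on the second page. Other information, such as story titles or author names, is shown as text, so I can use get_text().

But in this case I can't get the link in the href. Please help.

P.S. If I only want the latitude and longitude from the link (11111 and 22222), is there a way to do it with str.rsplit?

Thanks a lot.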

【Question discussion】:

Tags: python python-3.x web-scraping beautifulsoup jupyter-notebook

【Solution 1】:

You can use the following:

secondpage_parser.find("a", {"data-track-id":"Google Map"})['href']
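Indexing works where the question's loop does not because find() returns a single Tag (or None if nothing matches), not a list, so "for gps in ..." walks that tag's children, which here is only the text "Google Map". The sketch below assumes BeautifulSoup 4 with the built-in "html.parser" and the sample markup from the question. It contrasts looping with indexing and, in case the page holds several such links, collects them into a list with find_all().

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = """
<div class="something1">
  <span class="something2"></span>
  <a data-track-id="Google Map" href="https://www.google.com/maps/dir//11111/@22222" target="_blank" class="something3">Google Map</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns one Tag; looping over it yields its children, not matching links
link = soup.find("a", {"data-track-id": "Google Map"})
children = [str(child) for child in link]
print(children)  # ['Google Map']

# Indexing the Tag reads the attribute value directly
href = link["href"]
print(href)  # https://www.google.com/maps/dir//11111/@22222

# find_all() returns a list of Tags; href=True skips <a> tags without an href
gpslocation = [
    a["href"]
    for a in soup.find_all("a", {"data-track-id": "Google Map"}, href=True)
]
print(gpslocation)
```

On a real page, check that find() did not return None before indexing it.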

【Discussion】:

Thanks! But I don't get any link after running it. Am I on the right track? The code I entered: gpslocation = [] for gps in (secondpage_parser.find("a", {"data-track-id":"Google Map"})['href']): gpslocation.append(gps.attrs ["href"])

【Solution 2】:
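In the code from this comment, find(...)['href'] is already the URL string, so the for loop iterates over its individual characters, and a character (a plain str) has no .attrs. A plain-Python sketch of the problem and the fix, with the URL hard-coded as a stand-in for the scraped value:

```python
# Stand-in for secondpage_parser.find("a", {"data-track-id": "Google Map"})["href"]
href = "https://www.google.com/maps/dir//11111/@22222"

# Looping over a string yields one character at a time
first_chars = [ch for ch in href][:5]
print(first_chars)  # ['h', 't', 't', 'p', 's']

# Each character is a plain str with no .attrs, so gps.attrs["href"] cannot work
print(hasattr(first_chars[0], "attrs"))  # False

# The indexed value is already the link, so append it directly instead of looping
gpslocation = []
gpslocation.append(href)
print(gpslocation)
```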

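Use soup.find(...)['href'] to get the href of the first matching link, or soup.find_all('a', ..., href=True) to find all links that have an href.
And yes, you can use split to get the latitude and longitude:
- first split on // and take the last part with [-1]
- then split that on /@ to get the latitude and longitude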

from bs4 import BeautifulSoup

data = """
<div class="something1">
  <span class="something2"></span>
  <a data-track-id="Google Map" href="https://www.google.com/maps/dir//11111/@22222" target="_blank" class="something3">Google Map</a>
</div>
"""

soup = BeautifulSoup(data, "html.parser")
for gps in soup.find_all('a', href=True):
    href = gps['href']
    print(href)
    lati, longi = href.split("//")[-1].split('/@')
    print(lati)
    print(longi)
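Since the question's P.S. asks about str.rsplit, here is an equivalent sketch using it. It assumes the URL always ends in /<first>/@<second>, exactly like the example link; real Google Maps URLs may be shaped differently.

```python
href = "https://www.google.com/maps/dir//11111/@22222"

# Split at most twice, starting from the right:
# ['https://www.google.com/maps/dir/', '11111', '@22222']
parts = href.rsplit("/", 2)
lati = parts[-2]
longi = parts[-1].lstrip("@")  # drop the leading '@'
print(lati, longi)  # 11111 22222
```

Limiting the split count means the "//" pairs earlier in the URL are never touched, so only the last two path segments are separated out.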

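【Discussion】:

Thanks! But it turns out the URL comes from another class. How should I specify that this is the one I want? Or is there more detail I should give you? I tried 1) for gps in secondpage_parser.find_all('a',{"data-track-id":"Google Map"}, href=True): href = gps['href'] print(href ) 2) for gps in secondpage_parser.find_all('a', href=True): href = gps['href'] print(href) but neither of them works.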
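The comment does not show the real markup, so the following is only a sketch under that uncertainty. Besides data-track-id, BeautifulSoup 4 can match the link by class with class_= or by a CSS attribute selector on the href itself through select(). It reuses the question's sample markup, so "something3" and the "google.com/maps" substring are placeholders for whatever the live page actually uses. Either way, BeautifulSoup only sees the HTML you downloaded; if the link is inserted by JavaScript after the page loads, no selector will find it in that source.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the live page's class names may differ
html = """
<div class="something1">
  <span class="something2"></span>
  <a data-track-id="Google Map" href="https://www.google.com/maps/dir//11111/@22222" target="_blank" class="something3">Google Map</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match by class name (class_ avoids clashing with the Python keyword "class")
by_class = soup.find("a", class_="something3")
print(by_class["href"])

# Match any <a> whose href contains "google.com/maps", whatever its class
by_href = [a["href"] for a in soup.select('a[href*="google.com/maps"]')]
print(by_href)
```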