Cannot reach the href in an HTML element - BeautifulSoup: answers

【Question title】: Cannot reach the href in an HTML element - BeautifulSoup
【Posted】: 2021-04-23 12:16:08
【Question】:

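I have been trying to load a random Wikipedia page and get the URL of that random page. I can collect every link on the page, but for some reason I cannot reach the following piece of HTML and read its href.
An example from a random Wikipedia page: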

<a accesskey="v" href="https://en.wikipedia.org/wiki/T%C5%99eb%C3%ADvlice?action=edit" class="oo-ui-element-hidden"></a>

Every Wikipedia page has this element. I need its href so that I can manipulate it into the URL of the current page.
The code I have written so far:

from bs4 import BeautifulSoup
import requests
links = []
for x in range(0, 1):
    source = requests.get("https://en.wikipedia.org/wiki/Special:Random").text
    soup = BeautifulSoup(source, "lxml")
    print(soup.find(id="firstHeading"))
    for link in soup.findAll('a'):
        links.append(link.get('href'))
    print(links)

Getting the current URL directly would also help, but I could not find a solution online.
I am also on Linux, in case that helps.

【Comments】:

Help us help you: please show your code and improve your question so that we can reproduce your problem. See How to create a Minimal, Reproducible Example. Thanks.

Tags: python python-3.x url web-scraping beautifulsoup

【Solution 1】:

Look at the attributes

Use the attributes of the <a> to narrow your search:

soup.find_all('a', accesskey='e')

Example

import requests
from bs4 import BeautifulSoup
links = []
for x in range(0, 1):
    source = requests.get("https://en.wikipedia.org/wiki/Special:Random").text
    soup = BeautifulSoup(source, "lxml")
    print(soup.find(id="firstHeading"))
    for link in soup.find_all('a', accesskey='e'):
        links.append(link.get('href'))
    print(links)

Output

<h1 class="firstHeading" id="firstHeading" lang="en">James Stack (golfer)</h1>
['/w/index.php?title=James_Stack_(golfer)&action=edit']
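
The attribute filter can also be tried offline. The following is a minimal sketch, not part of the original answer: the HTML string is a hand-written stand-in for a Wikipedia page (not real page source), and it uses the built-in "html.parser" so that lxml is not required. It shows three equivalent ways to select the <a> whose accesskey is "e".

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for a Wikipedia page; this is NOT real page source.
HTML = """
<h1 id="firstHeading">Example page</h1>
<a href="/wiki/Main_Page" accesskey="z">Main page</a>
<a href="/w/index.php?title=Example_page&amp;action=edit" accesskey="e">Edit</a>
<a href="/w/index.php?title=Example_page&amp;action=history" accesskey="h">History</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 1) Keyword argument, as used in the answer above.
by_keyword = [a["href"] for a in soup.find_all("a", accesskey="e")]

# 2) The same filter written as an attrs dictionary.
by_attrs = [a["href"] for a in soup.find_all("a", attrs={"accesskey": "e"})]

# 3) A CSS attribute selector.
by_css = [a["href"] for a in soup.select('a[accesskey="e"]')]

print(by_keyword)
```

All three lists contain only the edit link; the other two anchors are skipped because their accesskey values differ.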

Just in case

You don't need the second loop. If you only want to process a single <a>, use find() instead of find_all().

Example

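import requests
from bs4 import BeautifulSoup
links = []

# Fetch five random pages and keep one href from each
for x in range(0, 5):
    source = requests.get("https://en.wikipedia.org/wiki/Special:Random").text
    soup = BeautifulSoup(source, "lxml")
    # find() returns only the first matching <a>, so no inner loop is needed
    links.append(soup.find('a', accesskey='e').get('href'))

# In a notebook or REPL this displays the collected hrefs
links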

Output

['/w/index.php?title=Rick_Moffat&action=edit',
 '/w/index.php?title=Mount_Burrows&action=edit',
 '/w/index.php?title=The_Rock_Peter_and_the_Wolf&action=edit',
 '/w/index.php?title=Yamato,_Yamanashi&action=edit',
 '/w/index.php?title=Craig_Henderson&action=edit']
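
The hrefs above are relative edit links, while the question asks for the page URL. One possible way to convert them, sketched with the standard library only: read the title query parameter and append it to the /wiki/ path. The helper name article_url_from_edit_href and the BASE constant are made up for this illustration, and the /wiki/<title> layout is assumed from the URLs shown in the question.

```python
from urllib.parse import parse_qs, quote, urljoin, urlsplit

# Assumed site root; the question's example URL uses this host.
BASE = "https://en.wikipedia.org"


def article_url_from_edit_href(href, base=BASE):
    """Turn '/w/index.php?title=X&action=edit' into '<base>/wiki/X'.

    Hypothetical helper written for this example.
    """
    absolute = urljoin(base, href)
    query = parse_qs(urlsplit(absolute).query)
    titles = query.get("title")
    if not titles:
        raise ValueError(f"no title parameter in {href!r}")
    # parse_qs percent-decodes the value, so re-encode it for use in a path.
    # Parentheses, commas and colons are left readable.
    title = quote(titles[0], safe="(),:")
    return urljoin(base, "/wiki/" + title)


print(article_url_from_edit_href("/w/index.php?title=Yamato,_Yamanashi&action=edit"))
```

Note that find() returns None when no element matches, so soup.find('a', accesskey='e') should be checked before calling .get('href') on it and passing the result to a helper like this.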

【Discussion】:

Oh, so I can take the titles and prepend en.wikipedia.org/wiki to them to get the URLs of their pages. Thanks.
Happy to help. Yes, you can do that!
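
The question also asked whether the current URL can be obtained directly. A possible shortcut, sketched here and not part of the original answer: requests follows redirects by default and exposes the final address as Response.url, so if Special:Random answers with a redirect to the article, no HTML parsing is needed. The function name random_article_url, the User-Agent string, and the injectable get parameter are assumptions made for this example.

```python
import requests

RANDOM_URL = "https://en.wikipedia.org/wiki/Special:Random"
# Hypothetical User-Agent; replace it with one that identifies your own script.
HEADERS = {"User-Agent": "random-page-demo/0.1 (example script)"}


def random_article_url(get=requests.get):
    """Return the URL that Special:Random redirected to.

    `get` defaults to requests.get; it is a parameter only so the
    function can be exercised without network access.
    """
    response = get(RANDOM_URL, headers=HEADERS, timeout=10)
    response.raise_for_status()
    # Once redirects have been followed, .url holds the final address.
    return response.url
```

Calling random_article_url() performs a real HTTP request, and the returned URL differs on every call.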