Using Python & BeautifulSoup to scrape HTML tag identifier values [duplicate]: answers

【Question title】: Using Python & BeautifulSoup to scrape HTML tag identifier values [duplicate]
【Posted】: 2023-09-06 03:40:01
【Question】:

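I'm still learning Python and have been using BeautifulSoup to scrape some web data. My question is: is it possible to scrape the attribute values of a tag?

An example is probably the clearest way to show it. The HTML I'm working with looks like this: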

<A CLASS="someClass" uniqueID="someValue" anotherID="someOtherValue">
Here is the data I can scrape right now.
</A>

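With the example above, I can successfully scrape the content between the A tags, but I don't know how to scrape the values of "uniqueID" and "anotherID" that sit inside the opening A tag.

Thanks for any pointers!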

【Question comments】:

Please see *.com/questions/19468438/…
Use the get() method.

Tags: python web-scraping beautifulsoup

【Solution 1】:

To get the attributes of an element, you can use the .get() method (Python 3), e.g.:

<A CLASS="someClass" uniqueID="someValue" anotherID="someOtherValue">
Here is the data I can scrape right now.
</A>

...

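# xmlSoup is the BeautifulSoup object built from the HTML above
_as = xmlSoup.find_all('a')

for a in _as:
    # HTML parsers lowercase attribute names, so look them up in lowercase
    print(a.get('class'))      # multi-valued attribute, returned as a list
    print(a.get('uniqueid'))
    print(a.get('anotherid'))
    print(a.text)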

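The code above loops over every a tag in the HTML and prints the specified attributes of each one. .get() returns None when an attribute is missing.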
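
As a self-contained sketch of the same approach, the snippet below parses the sample markup from the question and collects the attribute values. It assumes the built-in "html.parser" backend; the names html_doc, soup, and results, and the attribute "nosuchattr", are illustrative and not part of the original answer. HTML parsers lowercase tag and attribute names, hence the lowercase keys.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question (mixed-case tag and attribute names).
html_doc = """
<A CLASS="someClass" uniqueID="someValue" anotherID="someOtherValue">
Here is the data I can scrape right now.
</A>
"""

# Assumption: the built-in html.parser backend, which lowercases names.
soup = BeautifulSoup(html_doc, "html.parser")

results = []
for a in soup.find_all("a"):
    results.append({
        "class": a.get("class"),                 # multi-valued, so a list
        "uniqueid": a.get("uniqueid"),
        "anotherid": a.get("anotherid"),
        "missing": a.get("nosuchattr", "n/a"),   # default for absent attributes
        "text": a.get_text(strip=True),
    })

print(results)
```

Using a["uniqueid"] instead of a.get("uniqueid") also works, but it raises KeyError when the attribute is absent, so .get() is the safer choice for uneven markup.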

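【Comments】:

Care to comment on the downvote?
You made an edit while I was writing my comment. It's worth noting that the asker didn't explicitly ask for find_all.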

【Solution 2】:

Please see the link in the comment I posted, but I think what you are trying to do is something like this:

soup.find("a", {"uniqueid": "someValue"})  # HTML parsers lowercase attribute names

If you post a sample of your code I can tailor this to it, but since you haven't, the answer is fairly generic.
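
To make the lookup concrete, here is a minimal sketch that selects tags by attribute value rather than reading attributes off every tag. It assumes the "html.parser" backend. The three-link markup and the names html_doc, match, with_id, unique_ids, and all_attrs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three links, only two of which carry uniqueID.
html_doc = """
<a class="someClass" uniqueID="someValue" anotherID="someOtherValue">first</a>
<a class="someClass" uniqueID="differentValue">second</a>
<a class="someClass">third</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Match on an exact attribute value; the key is lowercase because the
# HTML parser lowercased the attribute name while parsing.
match = soup.find("a", {"uniqueid": "someValue"})

# Passing True matches any tag that has the attribute, whatever its value.
with_id = soup.find_all("a", attrs={"uniqueid": True})
unique_ids = [a["uniqueid"] for a in with_id]

# .attrs exposes every attribute of a tag as a plain dict.
all_attrs = match.attrs

print(match.get_text())
print(unique_ids)
print(all_attrs)
```

find() returns None when nothing matches, so check the result before reading attributes from it.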

【Comments】: