Python 3, Beautiful Soup, get next tag: answers

【Question title】: Python 3, beautiful soup, get next tag
【Posted】: 2013-05-31 15:43:25
【Question】:

I have the following HTML fragment, which is repeated many times with different href links:

<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">

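Now I want to get every href link in this document that comes directly after a div tag with the class "product-list-item". I am very new to BeautifulSoup, and nothing I have come up with has worked.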

Thanks for your ideas.

EDIT: It does not have to be BeautifulSoup; if it can be done with regex and Python's HTML parser, that is fine too.

EDIT2: What I tried (I am quite new to Python, so what I did may look completely silly from an advanced point of view):

import bs4

soup = bs4.BeautifulSoup(htmlsource, "html.parser")  # htmlsource: the page's HTML as a string
x = soup.find_all("div")
for i in range(len(x)):
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].get("class"))

This prints the classes of every "product-list-item" div, but then I tried something like

print(x[i].get("class").next_element)

because I thought next_element or next_sibling should give me the next tag, but it only results in AttributeError: 'list' object has no attribute 'next_element'. So I tried it with just the first list element:

print(x[i][0].get("class").next_element)

which results in this error: return self.attrs[key] KeyError: 0. I also tried .find_all("href") and .get("href"), but these all lead to the same errors.
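Both errors can be reproduced in isolation. The sketch below assumes a made-up one-line snippet modelled on the HTML above and the built-in "html.parser". It suggests that .get("class") returns a plain Python list of class names, which has no navigation attributes, and that indexing a tag with 0 is treated as an attribute lookup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
html = (
    '<div class="product-list-item  margin-bottom">'
    '<a href="http://www.urlexample.com/example_1">x</a></div>'
)
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# "class" is a multi-valued attribute, so its value is a list of strings.
classes = div.get("class")
print(type(classes).__name__, classes)

# Indexing a tag looks up an attribute named 0, which does not exist.
try:
    div[0]
except KeyError as exc:
    print("KeyError:", exc)

# Navigation attributes live on the tag itself, not on the attribute value.
first = div.next_element
print(first.name, first.get("href"))
```

In other words, next_element has to be called on x[i] itself rather than on x[i].get("class").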

EDIT3: OK, it seems I have found a way to solve it; now I do:

x = soup.find_all("div")

for i in range(len(x)):    
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].next_element.next_element.get("href"))

This can also be shortened by passing the class name as a second argument to find_all:

x = soup.find_all("div", "product-list-item")
for i in x:
    print(i.next_element.next_element.get("href"))
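A likely reason two next_element steps are needed: the line break between the opening <div> and the <a> is parsed as a text node of its own. A small sketch, assuming the fragment from the top of the question with closing tags added for illustration:

```python
from bs4 import BeautifulSoup, NavigableString

# The question's fragment, completed with hypothetical closing tags.
html = '''<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">link</a>
</div>'''
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", "product-list-item")

first = div.next_element      # the newline text between <div ...> and <a ...>
second = first.next_element   # the <a> tag itself

print(repr(first), isinstance(first, NavigableString))
print(second.get("href"))
```

The chain therefore depends on the exact whitespace in the page: if the newline were missing, the second step would land on the link text instead of the tag.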

Regards

【Comments】:

Can you show us what you have tried? Thanks

Tags: python python-3.x beautifulsoup

【Solution 1】:

I want to get every href link in this document that comes directly after a div tag with the class "product-list-item"

To find the first <a href> element inside each <div>:

links = []
for div in soup.find_all('div', 'product-list-item'):
    a = div.find('a', href=True)  # find <a> anywhere in <div>
    if a is not None:
        links.append(a['href'])

This assumes the link is inside the <div>. Any elements in the <div> that come before the first <a href> are ignored.
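As a self-contained illustration of this approach, here is a sketch run against invented sample markup. The helper name first_links and the extra <div> blocks are assumptions for the example, not part of the original page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one matching div with a link, one unrelated div,
# and one matching div that contains no link at all.
html = """
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1">one</a>
</div>
<div class="other"><a href="http://www.urlexample.com/ignored">skip</a></div>
<div class="product-list-item"><span>no link here</span></div>
"""


def first_links(markup):
    """Collect the first <a href> found inside each product-list-item div."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    for div in soup.find_all("div", "product-list-item"):
        a = div.find("a", href=True)  # searches all descendants of the div
        if a is not None:
            links.append(a["href"])
    return links


links = first_links(html)
print(links)
```

The div without a link is skipped because find returns None for it, and the link inside the unrelated div is never visited.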

If you like, you can be stricter, e.g. take the link only if it is the very first child of the <div>:

a = div.contents[0] # take the very first child even if it is not a Tag
if a.name == 'a' and a.has_attr('href'):
    links.append(a['href'])
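With markup like the question's, where a line break separates <div> from <a>, div.contents[0] is that newline text node, so the strict check above would reject the link. One possible variation, sketched here with a hypothetical helper first_child_link, skips text nodes and inspects the first child that is a tag:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def first_child_link(div):
    """Return the href of the first child tag if it is an <a href>, else None."""
    # Skip whitespace and other text nodes; look only at direct child tags.
    first_tag = next((c for c in div.children if isinstance(c, Tag)), None)
    if first_tag is not None and first_tag.name == "a" and first_tag.has_attr("href"):
        return first_tag["href"]
    return None


# Hypothetical sample markup for illustration.
html = """
<div class="product-list-item">
<a href="http://www.urlexample.com/example_1">first child</a>
</div>
<div class="product-list-item"><span>x</span><a href="http://www.urlexample.com/late">late</a></div>
"""
soup = BeautifulSoup(html, "html.parser")
results = [first_child_link(d) for d in soup.find_all("div", "product-list-item")]
print(results)
```

The second div yields None because its first child tag is a <span>, which keeps the "first child only" rule while tolerating whitespace.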

Or, if the <a> is not inside the <div>:

a = div.find_next('a', href=True) # find <a> that appears after <div>
if a is not None:
    links.append(a['href'])

There are many ways to search and navigate in BeautifulSoup.

If you search with lxml.html, you can also use XPath and CSS expressions, if you are familiar with them.
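CSS selectors are also available without leaving BeautifulSoup, through its select() method. The sketch below uses that instead of lxml.html, so it needs no additional package; the sample markup is invented for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: a direct child link, a nested link, and an unrelated div.
html = """
<div class="product-list-item  margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1">one</a>
</div>
<div class="product-list-item"><p><a href="http://www.urlexample.com/nested">nested</a></p></div>
<div class="other"><a href="http://www.urlexample.com/ignored">skip</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div > a": only <a href> elements that are direct children of the div.
direct = [a["href"] for a in soup.select("div.product-list-item > a[href]")]

# "div a": any <a href> descendant of the div, at any depth.
anywhere = [a["href"] for a in soup.select("div.product-list-item a[href]")]

print(direct)
print(anywhere)
```

The child combinator (>) corresponds roughly to the stricter "directly inside the div" reading, while the descendant form matches what div.find_all('a', href=True) would return.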

【Comments】: