How to solve: finding two of each link (BeautifulSoup, Python) - Answers

【Question title】: How to solve, finding two of each link (Beautifulsoup, python)
【Posted】: 2017-03-21 23:33:13
【Question description】:

I am using beautifulsoup4 to parse a web page and collect all the href values with this code:

import requests
from bs4 import BeautifulSoup

# Collect links from 'new' page
pageRequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
soup = BeautifulSoup(pageRequest.content, "html.parser")
links = soup.select("div.turbolink_scroller a")

# Every <a> with class "name-link", wherever it appears on the page
allProductInfo = soup.find_all("a", class_="name-link")
print(allProductInfo)

linksList1 = []
for href in allProductInfo:
    linksList1.append(href.get('href'))

print(linksList1)

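linksList1 prints every link twice. I believe this is because it picks up the link from the title as well as the one from the item color. I have tried a few things, but I cannot get BS to parse only the title links and list each link once instead of twice. I imagine it is something really simple, but I am missing it. Thanks in advance.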

【Question comments】:

Make linksList1 a set() instead of a list().
Thanks a lot.

Tags: python parsing beautifulsoup href

【Solution 1】:

set(linksList1)        # use set() to remove duplicate links
list(set(linksList1))  # use list() to convert the set back to a list if you need one
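
One caveat with the set() approach is that a set does not keep the original order of the links. If the order matters, a minimal sketch like the following may help. The sample URLs and the helper name `dedupe_keep_order` are hypothetical and only serve as illustration; they are not taken from the page in the question.

```python
# Hypothetical sample data: each product URL appears twice, as in the question.
links_list = [
    "/shop/shirts/abc/red",
    "/shop/shirts/abc/red",
    "/shop/shirts/def/blue",
    "/shop/shirts/def/blue",
]


def dedupe_keep_order(links):
    """Return the links with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(links))


unique_links = dedupe_keep_order(links_list)
print(unique_links)
```

This removes the duplicates after the fact; it does not change which tags BeautifulSoup matches.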

【Discussion】:

【Solution 2】:

linksList1 = []
# Take only the link inside the <h1> of each product block
alldiv = soup.find_all("div", class_="inner-article")
for div in alldiv:
    linksList1.append(div.h1.a['href'])

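【Discussion】:

【Solution 3】:

This code gives you the result without duplicates (using set(), as @Tarum Gupta suggested, may also be a good idea), but I changed the way you crawl the page.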

import requests
from bs4 import BeautifulSoup

#Collect links from 'new' page
pageRequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
soup = BeautifulSoup(pageRequest.content, "html.parser")
links = soup.select("div.turbolink_scroller a")

# Gets all divs with class inner-article, then searches for an <a> with the
# name-link class that is inside an h1 tag
allProductInfo = soup.select("div.inner-article h1 a.name-link")
# print (allProductInfo)

linksList1 = []
for href in allProductInfo:
    linksList1.append(href.get('href'))

print(linksList1)
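
To check the selector above without hitting the network, it can be tried on a small inline fragment. This is only a sketch: the markup below is an assumption about the page layout (a title link inside an h1 and a color link inside a p, both with class name-link), not a copy of the real site, and the names `SAMPLE_HTML`, `all_links` and `title_links` are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed layout, not copied from the real page.
SAMPLE_HTML = """
<div class="turbolink_scroller">
  <div class="inner-article">
    <h1><a class="name-link" href="/shop/shirts/abc/red">Oxford Shirt</a></h1>
    <p><a class="name-link" href="/shop/shirts/abc/red">Red</a></p>
  </div>
  <div class="inner-article">
    <h1><a class="name-link" href="/shop/shirts/def/blue">Flannel Shirt</a></h1>
    <p><a class="name-link" href="/shop/shirts/def/blue">Blue</a></p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches both the title link and the color link of every product.
all_links = [a.get("href") for a in soup.find_all("a", class_="name-link")]

# Matches only the link nested inside the <h1> of each product.
title_links = [a.get("href") for a in soup.select("div.inner-article h1 a.name-link")]

print(len(all_links), len(title_links))
```

Under this assumed layout the class-only search returns each href twice, while the narrower CSS selector returns it once per product.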

【Discussion】: