【Posted】: 2021-05-10 12:15:38
【Question】:
I'm trying to get a list of website addresses from this page: https://www.wer-zu-wem.de/dienstleister/filmstudios.html
My code:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')
links = soup.find_all('a', {'class': 'col-md-4 col-lg-5 col-xl-4 text-center text-lg-right'})
print(links)
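The check below shows how BeautifulSoup matches class strings. It runs on a small hypothetical snippet modelled on the classes above, not on the real page markup. It suggests why the first attempt finds nothing if, as the second attempt assumes, those classes sit on a div rather than on the a tags.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup modelled on the classes used in the question.
HTML = """
<div class="col-md-4 col-lg-5 col-xl-4 text-center text-lg-right">
  <a class="btn btn-primary external-link" href="https://example.com">Website</a>
</div>
"""
CONTAINER = "col-md-4 col-lg-5 col-xl-4 text-center text-lg-right"

soup = BeautifulSoup(HTML, "html.parser")

# The container classes sit on the <div>, so searching <a> tags for them finds nothing.
on_anchor = soup.find_all("a", {"class": CONTAINER})
# A multi-class string must equal the attribute value exactly, in the same order.
on_div = soup.find_all("div", class_=CONTAINER)
reordered = soup.find_all(
    "div", class_="text-center col-md-4 col-lg-5 col-xl-4 text-lg-right"
)
# CSS selectors match each class individually, regardless of order.
via_css = soup.select("div.text-center.col-md-4 a.external-link")

print(len(on_anchor), len(on_div), len(reordered), len(via_css))  # 0 1 0 1
```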

import requests
from bs4 import BeautifulSoup

webLinksList = []
result = requests.get(
    "https://www.wer-zu-wem.de/dienstleister/filmstudios.html")
src = result.content
soup = BeautifulSoup(src, 'lxml')
website_Links = soup.find_all(
    'div', class_='col-md-4 col-lg-5 col-xl-4 text-center text-lg-right')
if website_Links != "":
    print("List is empty")
for website_Link in website_Links:
    try:
        realLink = website_Link.find(
            "a", attrs={"class": "btn btn-primary external-link"})
        webLinksList.append(featured_challenge.attrs['href'])
    except:
        continue
for link in webLinksList:
    print(link)
"List is empty" is printed at the start, and nothing ever gets added to the list.
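Here is a minimal corrected sketch of the second attempt. It assumes the page really wraps each link in those divs as a.btn.btn-primary.external-link anchors, which has not been verified against the live site. Besides the emptiness test, the original loop appends featured_challenge.attrs['href'], a name that is never defined. The bare except swallows the resulting NameError, so webLinksList would stay empty even if website_Links were not. The helper name extract_links is hypothetical. The sketch also swaps lxml for the built-in html.parser so no extra parser package is required.

```python
from typing import List

from bs4 import BeautifulSoup

URL = "https://www.wer-zu-wem.de/dienstleister/filmstudios.html"
# Assumption: same container and button classes as in the question.
# A CSS selector matches each class on its own, regardless of attribute order.
LINK_SELECTOR = (
    "div.col-md-4.col-lg-5.col-xl-4.text-center.text-lg-right "
    "a.btn.btn-primary.external-link"
)


def extract_links(html: str) -> List[str]:
    """Return the href of every matching external-link anchor in html."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser; lxml is optional
    return [a["href"] for a in soup.select(LINK_SELECTOR) if a.has_attr("href")]


def main() -> None:
    import requests  # network dependency only needed when actually fetching

    response = requests.get(URL, timeout=10)
    response.raise_for_status()  # stop on 403/404 instead of parsing an error page
    links = extract_links(response.text)
    if not links:  # an empty list is falsy; a list never equals ""
        print("List is empty")
    for link in links:
        print(link)


if __name__ == "__main__":
    main()
```

Keeping the parsing in its own function makes it easy to test against saved HTML without hitting the site.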
【Discussion】:
-
Which links on that site are you interested in?
-
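You have if website_Links != "":, so you get List is empty whenever website_Links is anything other than the empty string. .find_all doesn't return a string; it returns a list (a ResultSet). A list never compares equal to a string, so website_Links != "" always evaluates to True. -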
You're right, but as a list it's empty too.
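Since website_Links itself comes back empty, the fetched HTML may differ from what the browser shows. The server might answer the script with an error or a different page. The links might also be added by JavaScript, which requests does not execute. Which of these applies to this site is not known. Below is a rough diagnostic sketch; diagnose() is a hypothetical helper, and the browser-like User-Agent header is an assumption, not something the site is known to require.

```python
def diagnose(status_code: int, html: str, marker: str = "external-link") -> str:
    """Give a rough reason why a selector might have matched nothing (hypothetical helper)."""
    if status_code != 200:
        # e.g. 403 if the server rejects the default python-requests User-Agent
        return f"HTTP {status_code}: the server did not return the normal page"
    if marker not in html:
        # the markup may be injected later by JavaScript, which requests does not run
        return "marker missing from raw HTML: inspect response.text, not the browser DOM"
    return "marker present: the selector itself is probably wrong"


def main() -> None:
    import requests  # imported here so diagnose() works without network deps

    # Assumption: a browser-like User-Agent; whether this site needs one is unverified.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(
        "https://www.wer-zu-wem.de/dienstleister/filmstudios.html",
        headers=headers,
        timeout=10,
    )
    print(diagnose(response.status_code, response.text))


if __name__ == "__main__":
    main()
```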
Tags: python html css web-scraping beautifulsoup