【Posted】: 2021-02-05 06:57:42
【Question】:
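Here is my code for extracting supplier information across several pages. I am trying to get the company name, contact person, and so on from each supplier URL on every page, but the contact lookup returns None. Appending .text to it therefore raises an error: 'NoneType' object has no attribute 'text'.
I have checked that the User-Agent header matches my browser, since I am using Google Chrome (85.0.4183.102), but the returned value is still None. What could be the problem? I am stuck here.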
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import pandas as pd

base_url = "https://idn.bizdirlib.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'}

supplierlinks = []
for x in range(1, 2):  # Loop through the listing pages (this range only covers page 1)
    response = requests.get(f"https://idn.bizdirlib.com/taxonomy/term/946?page={x}", headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    supplierslist = soup.find_all("div", class_="views-field views-field-title")
    for element in supplierslist:  # Look through each of the items
        for link in element.find_all("a", href=True):
            # urljoin avoids a doubled slash when href already starts with "/"
            supplierlinks.append(urljoin(base_url, link['href']))

Database = []
for link in supplierlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'lxml')
    companyname = soup.find_all("span", attrs={"itemprop": "name"})[-1].get_text()
    country = soup.find_all("span", attrs={"itemprop": "location"})[-1].get_text()
    address = soup.find_all("span", attrs={"itemprop": "address"})[-1].get_text()
    # find() returns None here, so contact.text raises AttributeError
    contact = soup.find("span", attrs={"itemprop": "contactPoint"})
    #print(soup.select_one('strong:contains("Contact") + *').text )
    # Prepare a dictionary to store all of it
    data = {"Company Name": companyname,
            "Country": country,
            "Address": address,
            "Contact Person": contact
            }
    Database.append(data)
    print(data)
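The error described in the question comes from find() returning None when no tag matches, after which .text has nothing to act on. The snippet below is a minimal sketch of guarding against that, not a confirmed fix for this site. The SAMPLE_HTML markup and the text_or_default helper are hypothetical, invented for illustration; the real supplier pages may be structured differently, and the "Contact" label fallback is only an assumption suggested by the commented-out selector in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real supplier page may differ.
SAMPLE_HTML = """
<div>
  <span itemprop="name">Example Supplier</span>
  <strong>Contact</strong>
  <span>Jane Doe</span>
</div>
"""


def text_or_default(node, default=""):
    """Return the stripped text of a tag, or `default` if the lookup found nothing."""
    return node.get_text(strip=True) if node is not None else default


# html.parser is used here only to avoid the lxml dependency in this sketch.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No span carries itemprop="contactPoint" in the sample, so find() returns None.
missing = soup.find("span", attrs={"itemprop": "contactPoint"})
contact = text_or_default(missing, default="N/A")

# Assumed fallback: find the visible "Contact" label, then read the span after it.
label = soup.find("strong", string="Contact")
next_span = label.find_next_sibling("span") if label is not None else None
contact_from_label = text_or_default(next_span, default="N/A")

name = text_or_default(soup.find("span", attrs={"itemprop": "name"}))

print(name, contact, contact_from_label)
```

Checking the lookup result before reading its text keeps the scraping loop running on pages where a field is absent, and printing r.text for one supplier page would show whether the contactPoint attribute is present in the HTML that requests actually receives.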
【Discussion】:
Tags: python html css beautifulsoup