【Posted】:2021-01-13 12:32:20
【Question】:
I'm writing web-scraping code, but I'm getting the error above.
import requests
import lxml
import bs4
title = ''
date = ''
text = ''
top = []
link = []
web_link = 'https://timesofindia.indiatimes.com/{}/'
web_link = web_link.format('india')
req = requests.get(web_link)
soup = bs4.BeautifulSoup(req.text, 'lxml')
topi = soup.find('div', {'class':'main-content'})
topi = topi.find_all('span', {'class':'w_tle'})
for i in range(len(topi)):
    top = topi[i].find('a').get('href')
    link.append('https://timesofindia.indiatimes.com' + top)
for i in range(len(link)):
    rq = requests.get(link[i])
    sp = bs4.BeautifulSoup(rq.text, 'lxml')
    title = sp.find('div', {'class':'_2NFXP'})
    title = title.find('h1',{'class':'_23498'})
Traceback:
Traceback (most recent call last):
  File "C:\Users\xxx\xxx\py\so65702068.py", line 26, in <module>
    title=title.find('h1',{'class':'_23498'})
AttributeError: 'NoneType' object has no attribute 'find'
I'm new to web scraping, and I don't understand why this error appears.
【Comments】:
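- It looks like the preceding statement, title = sp.find('div', {'class':'_2NFXP'}), found no match, so title is None by the time title = title.find('h1',{'class':'_23498'}) runs. By the way, please also post the error message/traceback; I've added it here for you.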
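The diagnosis above suggests checking each find() result before calling find() on it again. A minimal sketch of that guard follows; the helper name find_nested is hypothetical (it is not part of bs4), and it only assumes an object whose find(name, attrs) method returns None when nothing matches, as BeautifulSoup and Tag objects do.

```python
from typing import Any, Optional, Sequence, Tuple

# One lookup step: (tag name, attribute filter), e.g. ('div', {'class': '_2NFXP'}).
Step = Tuple[str, dict]


def find_nested(root: Any, steps: Sequence[Step]) -> Optional[Any]:
    """Apply a chain of .find(name, attrs) calls, stopping at the first miss.

    `root` is anything with a BeautifulSoup-style .find(name, attrs) method
    that returns None when nothing matches. (Hypothetical helper, not a bs4 API.)
    """
    node = root
    for name, attrs in steps:
        if node is None:
            # An earlier .find() matched nothing; return None instead of
            # raising AttributeError: 'NoneType' object has no attribute 'find'.
            return None
        node = node.find(name, attrs)
    return node
```

In the question's second loop this could be called as title = find_nested(sp, [('div', {'class':'_2NFXP'}), ('h1', {'class':'_23498'})]), followed by a check such as "if title is None: continue" to skip that article. A None result then means the fetched page contains no element matching those selectors, which is worth confirming by inspecting rq.text for that link.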
Tags: python python-3.x web-scraping