【Posted】:2011-10-27 21:02:02
【Question】:
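Currently, if there is an error retrieving a web page, soup is not populated with the page and instead holds whatever BeautifulSoup returns by default.
I am looking for a way to check for this, so that if fetching the page fails I can skip a chunk of code, for example:
    if soup:
        do stuff
But I don't want to terminate the program altogether. Apologies for the newbie question.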
import urllib2
import BaseHTTPServer

# appendlog() is my own logging helper, defined elsewhere.
def getwebpage(address):
    try:
        user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
        headers = { 'User-Agent' : user_agent }
        req = urllib2.Request(address, None, headers)
        web_handle = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        error_desc = BaseHTTPServer.BaseHTTPRequestHandler.responses[e.code][0]
        appendlog('HTTP Error: ' + str(e.code) + ': ' + address)
        return
    except urllib2.URLError, e:
        appendlog('URL Error: ' + e.reason[1] + ': ' + address)
        return
    except:
        appendlog('Unknown Error: ' + address)
        return
    return web_handle

def test():
    # getwebpage() returns None on any error, and that None is passed straight in here
    soup = BeautifulSoup(getwebpage('http://doesnotexistblah.com/'))
    print soup
    if soup:
        pass  # do stuff

test()
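One possible way to get the "skip but keep running" behaviour described above is to test the return value of getwebpage() before it ever reaches BeautifulSoup, since the function already returns None on every error path. The sketch below is only an illustration under several assumptions: it is ported to Python 3 (urllib.request instead of urllib2), print() stands in for appendlog(), and the opener parameter, the process() function, and the two fake openers are hypothetical names added so the example can run without network access.

```python
import io
import urllib.error
import urllib.request


def getwebpage(address, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the fetch failed."""
    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    req = urllib.request.Request(address, None, headers)
    try:
        with opener(req) as web_handle:
            return web_handle.read()
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print('HTTP Error: %s: %s' % (e.code, address))
    except urllib.error.URLError as e:
        print('URL Error: %s: %s' % (e.reason, address))
    except Exception:
        print('Unknown Error: ' + address)
    return None


def process(address, opener=urllib.request.urlopen):
    """Return True if the page was fetched and handled, False if skipped."""
    page = getwebpage(address, opener)
    if page is None:
        # Fetch failed: skip the parsing code, but do not exit the program.
        return False
    # soup = BeautifulSoup(page) and the "do stuff" code would go here.
    return True


# Hypothetical stand-ins for urlopen, used only to exercise both paths offline.
def failing_opener(req):
    raise urllib.error.URLError('no such host')


def ok_opener(req):
    return io.BytesIO(b'<html></html>')


skipped = process('http://doesnotexistblah.com/', failing_opener)
handled = process('http://doesnotexistblah.com/', ok_opener)
```

Checking `page is None` rather than the truthiness of soup keeps the decision tied to whether the fetch succeeded, instead of to how BeautifulSoup happens to represent an empty document.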
【Discussion】:
Tags: python beautifulsoup