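【Posted】:2011-05-18 04:05:37
【Question】:
I am using Python's urllib2 to download pages from the web. I am not setting a User-Agent or any other custom headers. I am running into errors like the samples below. Can anyone suggest a simple way to avoid them?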
http://www.rottentomatoes.com/m/foxy_brown/
The server couldn't fulfill the request.
Error code: 403
http://www.spiritus-temporis.com/marc-platt-dancer-/
The server couldn't fulfill the request.
Error code: 503
http://www.golf-equipment-guide.com/news/Mark-Nichols-(golfer).html!!
The server couldn't fulfill the request.
Error code: 500
http://www.ehx.com/blog/mike-matthews-in-fuzz-documentary!!
We failed to reach a server.
Reason: timed out
IncompleteRead(5621 bytes read)
Traceback (most recent call last):
  File "download.py", line 43, in <module>
    localFile.write(response.read())
  File "/usr/lib/python2.6/socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.6/httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.6/httplib.py", line 563, in _read_chunked
    raise IncompleteRead(value)
IncompleteRead: IncompleteRead(5621 bytes read)
Thanks,
Bala
【Comments】:
Tags: python http web-crawler web-scraping
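
For reference, the failures listed in the question fall into three rough groups: a 403 that some sites return when they see the default Python User-Agent (this is a common cause, not a guaranteed one), server-side 5xx errors and timeouts that can at best be retried, and a truncated chunked response (IncompleteRead). The following is a minimal sketch of handling all three, not a definitive fix. The names `fetch`, `USER_AGENT`, `open_fn`, and `sleep_fn` are hypothetical and do not come from the question; the imports fall back to `urllib2` and `httplib` on Python 2, as used in the question.

```python
import socket
import time

try:  # Python 3: urllib2 was split into urllib.request and urllib.error
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
    from http.client import IncompleteRead
except ImportError:  # Python 2, as in the question
    from urllib2 import Request, urlopen, HTTPError, URLError
    from httplib import IncompleteRead

# Hypothetical identifier; the exact string is an assumption, not a requirement.
USER_AGENT = "Mozilla/5.0 (compatible; example-downloader/0.1)"


def fetch(url, retries=3, timeout=10, backoff=1.0,
          open_fn=urlopen, sleep_fn=time.sleep):
    """Return the body of url as bytes, or None if every attempt fails."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(retries):
        try:
            response = open_fn(request, timeout=timeout)
            return response.read()
        except HTTPError as err:
            # A 4xx code means the server rejected the request itself,
            # so repeating the same request is unlikely to help.
            if err.code < 500:
                return None
        except (URLError, socket.timeout, IncompleteRead):
            # Unreachable server, timeout, or truncated chunked body:
            # fall through and try again.
            pass
        if attempt + 1 < retries:
            # Wait a little longer after each failed attempt.
            sleep_fn(backoff * (attempt + 1))
    return None
```

With a helper like this, the failing line in download.py could become something like `body = fetch(url)` followed by `localFile.write(body)` only when `body` is not None. A 403 that persists even with a User-Agent set usually means the site is blocking automated access for another reason, and no header change in this sketch will get around that.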