[Posted]: 2014-11-09 14:40:49
[Question]:
I want to write a spider that downloads web pages using gevent in Python 3. Here is my code:
import gevent
import gevent.pool
import gevent.monkey
import urllib.request

gevent.monkey.patch_all()

def download(url):
    return urllib.request.urlopen(url).read(10)

urls = ['http://www.google.com'] * 100
jobs = [gevent.spawn(download, url) for url in urls]
gevent.joinall(jobs)
But when I run it, I get this error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/gevent/greenlet.py", line 340, in run
    result = self._run(*self.args, **self.kwargs)
  File "e.py", line 8, in download
    return urllib.request.urlopen(url).read(10)
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  ......
    return greenlet.switch(self)
gevent.hub.LoopExit: This operation would block forever
<Greenlet at 0x7f4b33d2fdf0: download('http://www.google.com')> failed with LoopExit
......
It seems that urllib.request blocks, so the program cannot run. How can I fix this?
[Discussion]:
-
You should consider using requests (docs.python-requests.org/en/latest) for this. It will be far less painful than raw urllib if monkey-patching isn't enough, it supports Python 3, and it offers "native" gevent support through grequests (github.com/kennethreitz/grequests).
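A common fix for this LoopExit (sketched below as an assumption, not a confirmed answer from this thread) is to call gevent.monkey.patch_all() before importing urllib.request, so that urllib is built on top of gevent's cooperative sockets rather than the blocking standard-library ones. The error handling and the reduced URL count are illustrative additions so the sketch completes even without network access:

```python
# Sketch of the fix: monkey-patch BEFORE importing urllib.request, so
# urllib binds to gevent's cooperative socket implementation.
import gevent
import gevent.monkey
gevent.monkey.patch_all()

import urllib.request  # must come after patch_all()

def download(url):
    try:
        # read only the first 10 bytes, as in the original snippet
        return urllib.request.urlopen(url, timeout=5).read(10)
    except OSError:
        # network failures (URLError, timeouts) just yield None here
        return None

urls = ['http://www.google.com'] * 10
jobs = [gevent.spawn(download, url) for url in urls]
gevent.joinall(jobs)
results = [job.value for job in jobs]
```

With patching done first, urlopen cooperatively yields to the gevent hub instead of blocking it, and all greenlets can make progress.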
Tags: python web-crawler block gevent