[Posted]: 2015-02-05 12:26:43
[Problem description]:
I've run into a problem: scrapy worked fine until I uninstalled and reinstalled it. Since 0.24 wasn't available on Debian at the time, I added the Ubuntu repository to my /etc/apt/sources.list.d and installed it with apt-get, as described here: http://doc.scrapy.org/en/0.24/topics/ubuntu.html
Today, seeing that it had become available on Debian, I ran apt-get remove scrapy-0.24 (the one installed from the Ubuntu repo) and then apt-get install python-scrapy.
Now when I run scrapy shell www.google.fr, the output is as follows:
2014-12-07 22:08:26+0100 [scrapy] INFO: Scrapy 0.24.2 started (bot: scrapybot)
2014-12-07 22:08:26+0100 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-12-07 22:08:26+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled item pipelines:
2014-12-07 22:08:26+0100 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-07 22:08:26+0100 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-12-07 22:08:26+0100 [default] INFO: Spider opened
2014-12-07 22:08:26+0100 [default] DEBUG: Retrying <GET file:///home/lotso/www.google.fr> (failed 1 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
2014-12-07 22:08:26+0100 [default] DEBUG: Retrying <GET file:///home/lotso/www.google.fr> (failed 2 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
2014-12-07 22:08:26+0100 [default] DEBUG: Gave up retrying <GET file:///home/lotso/www.google.fr> (failed 3 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/commands/shell.py", line 50, in run
    shell.start(url=url, spider=spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/shell.py", line 45, in start
    self.fetch(url, spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
IOError: [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
As you'd expect:
➜ ~ pwd
/home/lotso
If I change directory, the path in the error changes to whatever directory I'm in.
I tried uninstalling python-scrapy with purge and then installing it via pip, but I hit the same problem.
I'm at a loss now. I suspect an environment variable somewhere, but I haven't been able to solve it myself...
[Comments]:
-
You are not passing a fully-qualified URL to your script.
-
omg... can I retract my question? Thanks for opening my eyes :)
-
Here's an example:
scrapy shell 'http://scrapy.org' -> doc.scrapy.org/en/latest/topics/…
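In other words, without a scheme such as http://, this version of scrapy interprets the argument as a local file path relative to the current directory, which is why the error mentions /home/lotso/www.google.fr. As a minimal sketch of the idea (the helper name normalize_url is mine, not part of scrapy), one could guard against this before building the command:

```python
from urllib.parse import urlparse

def normalize_url(url):
    """Prepend a scheme if the URL lacks one, so it is not
    mistaken for a local file path (hypothetical helper)."""
    if not urlparse(url).scheme:
        return "http://" + url
    return url

print(normalize_url("www.google.fr"))    # prepends http://
print(normalize_url("http://scrapy.org"))  # already qualified, unchanged
```

The simpler fix, as the comment says, is just to type the full URL: scrapy shell 'http://www.google.fr'.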
Tags: python linux debian scrapy