【Title】: scrapy install on Debian
【Posted】: 2015-02-05 12:26:43
【Problem Description】:

I've run into a problem: scrapy worked fine until I uninstalled and reinstalled it. Since 0.24 was not available on debian at the time, I added the ubuntu repository to my /etc/apt/sources.list.d and installed it with apt-get, as described here: http://doc.scrapy.org/en/0.24/topics/ubuntu.html

Today, seeing that it is now available on debian, I did apt-get remove scrapy-0.24 (the one installed from the ubuntu repo) and then apt-get install python-scrapy.

Now when I run scrapy shell www.google.fr, the output is as follows:

2014-12-07 22:08:26+0100 [scrapy] INFO: Scrapy 0.24.2 started (bot: scrapybot)
2014-12-07 22:08:26+0100 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-12-07 22:08:26+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-07 22:08:26+0100 [scrapy] INFO: Enabled item pipelines: 
2014-12-07 22:08:26+0100 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-07 22:08:26+0100 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2014-12-07 22:08:26+0100 [default] INFO: Spider opened
2014-12-07 22:08:26+0100 [default] DEBUG: Retrying <GET file:///home/lotso/www.google.fr> (failed 1 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
2014-12-07 22:08:26+0100 [default] DEBUG: Retrying <GET file:///home/lotso/www.google.fr> (failed 2 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
2014-12-07 22:08:26+0100 [default] DEBUG: Gave up retrying <GET file:///home/lotso/www.google.fr> (failed 3 times): [Errno 2] No such file or directory: '/home/lotso/www.google.fr'
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 4, in <module>
    execute()
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/dist-packages/scrapy/commands/shell.py", line 50, in run
    shell.start(url=url, spider=spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/shell.py", line 45, in start
    self.fetch(url, spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
IOError: [Errno 2] No such file or directory: '/home/lotso/www.google.fr'

As you can guess, ➜ ~ pwd gives /home/lotso. If I change directory, it appends whatever directory I'm currently in instead. I tried purging python-scrapy and then installing it via pip, and I hit the same problem.

I'm at a loss now; I suspect an environment variable somewhere, but I couldn't sort it out myself...

【Question Comments】:

  • You are not passing a fully qualified URL to your script.
  • omg.... can I take my question back? Thanks for opening my eyes :)
  • Here's an example: scrapy shell 'http://scrapy.org' -> doc.scrapy.org/en/latest/topics/… (see the sketch right after these comments)
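
As the traceback above shows, Scrapy treated the bare argument www.google.fr as a local file path (file:///home/lotso/www.google.fr), so the fix is simply to pass a fully qualified URL. A minimal sketch of the corrected invocation (the URLs are just examples):

scrapy shell 'http://www.google.fr'    # include the scheme so Scrapy fetches over HTTP
scrapy shell 'http://scrapy.org'       # the example from the comment above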

Tags: python linux debian scrapy


【Solution 1】:

How to install Scrapy 1.3 with Python 2.7 on Debian 8 (x86)

Reboot your machine, then run the following as root (or with sudo).

apt-get update 
apt-get upgrade
apt-get install virtualenv

Virtualenv basics: https://virtualenv.pypa.io/en/stable/userguide/

virtualenv ENV
cd ENV
source bin/activate

The virtual environment is now activated (the command "deactivate" simply turns it off).
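
A quick way to confirm the environment is really the one in use (a sketch; ENV is the directory created above) is to check which interpreter and pip are on the PATH:

which python    # should resolve to .../ENV/bin/python while the venv is active
which pip       # likewise for pip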

apt-get install gcc
apt-get install python-pip
apt-get install cython
apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
pip install pip --upgrade
pip install scrapy
pip install scrapy --upgrade

This worked for me; I applied it to a clean installation.
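
To sanity-check the result from inside the activated virtualenv, a minimal check (assuming the steps above finished without errors):

scrapy version    # should report the freshly installed Scrapy release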

【Comments】:

  • Tested on x64 Debian 8 with Python 3.4; just installed Scrapy 1.4.0.
【Solution 2】:

This method works with scrapy 1.0.3 on debian 8.2.

  1. Install the dependencies
    sudo apt-get install python-twisted python-libxml2 python-libxml2-dbg python-openssl python-simplejson
  2. Download scrapy from this site: http://scrapy.org/ (for example, pick the tarball)
  3. Extract and install
    tar zxf Scrapy-X.X.X.tar.gz; cd Scrapy-X.X.X; sudo python setup.py install
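
To confirm the setup.py install worked, a minimal check (a sketch; X.X.X stands for whichever tarball was downloaded):

    python -c "import scrapy; print(scrapy.__version__)"    # should print X.X.X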

【Comments】: