Thread blocking with Scrapy on Windows Server: answers

【Question title】: Thread blocking with Scrapy on Windows Server
【Posted】: 2014-03-03 22:13:31
【Question description】:

I get an error when I run the following command on a Windows server:

scrapy shell "http://www.yahoo.com"

However, I have no problem with sites that do not redirect to https. I think the problem is thread blocking. Can someone help me?

Here is the error message:

C:\Documents and Settings\mahyar>scrapy shell "http://www.yahoo.com"
2014-03-03 15:49:38-0600 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapybot)
2014-03-03 15:49:38-0600 [scrapy] INFO: Optional features available: ssl, http11
2014-03-03 15:49:38-0600 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-03-03 15:49:38-0600 [scrapy] INFO: Enabled item pipelines:
2014-03-03 15:49:38-0600 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-03-03 15:49:38-0600 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-03-03 15:49:38-0600 [default] INFO: Spider opened
2014-03-03 15:49:38-0600 [default] DEBUG: Redirecting (301) to <GET https://www.yahoo.com/> from <GET http://www.yahoo.com>
Traceback (most recent call last):
  File "c:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 168, in <module>
    execute()
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "c:\Python27\lib\site-packages\scrapy\cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "c:\Python27\lib\site-packages\scrapy\commands\shell.py", line 50, in run
    shell.start(url=url, spider=spider)
  File "c:\Python27\lib\site-packages\scrapy\shell.py", line 45, in start
    self.fetch(url, spider)
  File "c:\Python27\lib\site-packages\scrapy\shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "c:\Python27\lib\site-packages\twisted\internet\threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
OverflowError: integer 2147486719 does not fit '32-bit int'

【Comments on the question】:

Tags: python web-crawler scrapy twisted

【Solution 1】:

It looks like you are running a 32-bit version of Windows, whereas Scrapy requires a 64-bit operating system.

【Comments】:

Can you provide a reference that says so?

【Solution 2】:

This is caused by a pyOpenSSL bug: https://github.com/pyca/cryptography/issues/773
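
To see why the value in the traceback overflows, note that it has the top bit of a 32-bit word set: it fits an unsigned 32-bit integer but not a signed one. The following is a minimal sketch in plain Python. The helper names are hypothetical and it only checks the arithmetic; it does not reproduce the pyOpenSSL code path itself.

```python
# Value copied from the OverflowError message in the question.
value = 2147486719

INT32_MIN = -2**31       # smallest signed 32-bit integer
INT32_MAX = 2**31 - 1    # largest signed 32-bit integer (2147483647)
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit integer


def fits_signed_32(n):
    """Return True if n is representable as a signed 32-bit integer."""
    return INT32_MIN <= n <= INT32_MAX


def fits_unsigned_32(n):
    """Return True if n is representable as an unsigned 32-bit integer."""
    return 0 <= n <= UINT32_MAX


# Hexadecimal form makes the set top bit (0x80000000) visible.
print(format(value, '#x'))
print("fits signed 32-bit:  ", fits_signed_32(value))
print("fits unsigned 32-bit:", fits_unsigned_32(value))
print("excess over INT32_MAX:", value - INT32_MAX)
```

Passing such a value to a C function that expects a signed 32-bit integer raises exactly this kind of error, which is consistent with the failure appearing only once the request is redirected to https.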

【Comments】: