【Question title】: Scrapy: starting a new project
【Posted】: 2016-10-18 09:39:18
【Question】:

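I installed Python 2.7.12 on Windows 7, along with pywin32 and Visual C++. When I run pip --version, it produces no output; the cursor just moves to the next line and blinks.

However, python -m pip --version does print the pip version. To install Scrapy I likewise had to use python -m pip install scrapy. Scrapy installed successfully.

I have set the path correctly in the environment variables: C:\Python27;C:\Python27\Scripts;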

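When I tried to start a new Scrapy project, I ran scrapy startproject project_name. Again the cursor moved to the next line and blinked. There was no result, not even an error message.

After I tried again and again, it eventually created the project folder with the corresponding files in the directory.

Once I had written my code and tried to run the spider with scrapy crawl name, the same problem occurred: no response.

Now the same problem has come back, and I cannot create a new project.

Could someone please suggest the possible cause of this and a solution?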

Update: it worked when I followed the Scrapy tutorial using python -m scrapy <command> <arguments>. Everything was fine up to and including the crawl command. But when I ran python -m scrapy.cmdline shell 'http://quotes.toscrape.com/page/1/', it showed an error:

C:\Users\MinorMiracles\Desktop\tutorial>python -m scrapy.cmdline crawl quotes
2016-10-19 10:26:15 [scrapy] INFO: Scrapy 1.2.0 started (bot: tutorial)
2016-10-19 10:26:15 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tu
torial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'ROBOTSTXT_OBEY': True,
 'BOT_NAME': 'tutorial'}
2016-10-19 10:26:16 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-10-19 10:26:17 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-10-19 10:26:17 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-10-19 10:26:17 [scrapy] INFO: Enabled item pipelines:
[]
2016-10-19 10:26:17 [scrapy] INFO: Spider opened
2016-10-19 10:26:17 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 i
tems (at 0 items/min)
2016-10-19 10:26:17 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-19 10:26:18 [scrapy] DEBUG: Crawled (404) <GET http://quotes.toscrape.co
m/robots.txt> (referer: None)
2016-10-19 10:26:18 [scrapy] DEBUG: Crawled (200) <GET http://quotes.toscrape.co
m/page/1/> (referer: None)
2016-10-19 10:26:18 [quotes] DEBUG: Saved file quotes-1.html
2016-10-19 10:26:18 [scrapy] DEBUG: Crawled (200) <GET http://quotes.toscrape.co
m/page/2/> (referer: None)
2016-10-19 10:26:19 [quotes] DEBUG: Saved file quotes-2.html
2016-10-19 10:26:19 [scrapy] INFO: Closing spider (finished)
2016-10-19 10:26:19 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 675,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 5974,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 10, 19, 4, 56, 19, 56000),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'response_received_count': 3,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 10, 19, 4, 56, 17, 649000)}
2016-10-19 10:26:19 [scrapy] INFO: Spider closed (finished)

C:\Users\MinorMiracles\Desktop\tutorial>python -m scrapy.cmdline shell 'http://q
uotes.toscrape.com/page/1/'
2016-10-19 11:11:40 [scrapy] INFO: Scrapy 1.2.0 started (bot: tutorial)
2016-10-19 11:11:40 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tu
torial.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy.dupefilters
.BaseDupeFilter', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'
, 'LOGSTATS_INTERVAL': 0}
2016-10-19 11:11:40 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-10-19 11:11:40 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-10-19 11:11:40 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-10-19 11:11:40 [scrapy] INFO: Enabled item pipelines:
[]
2016-10-19 11:11:40 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-19 11:11:40 [scrapy] INFO: Spider opened
2016-10-19 11:11:42 [scrapy] DEBUG: Retrying <GET http://'http:/robots.txt> (fai
led 1 times): DNS lookup failed: address "'http:" not found: [Errno 11004] getad
drinfo failed.
2016-10-19 11:11:45 [scrapy] DEBUG: Retrying <GET http://'http:/robots.txt> (fai
led 2 times): DNS lookup failed: address "'http:" not found: [Errno 11004] getad
drinfo failed.
2016-10-19 11:11:47 [scrapy] DEBUG: Gave up retrying <GET http://'http:/robots.t
xt> (failed 3 times): DNS lookup failed: address "'http:" not found: [Errno 1100
4] getaddrinfo failed.
2016-10-19 11:11:47 [scrapy] ERROR: Error downloading <GET http://'http:/robots.
txt>: DNS lookup failed: address "'http:" not found: [Errno 11004] getaddrinfo f
ailed.
DNSLookupError: DNS lookup failed: address "'http:" not found: [Errno 11004] get
addrinfo failed.
2016-10-19 11:11:49 [scrapy] DEBUG: Retrying <GET http://'http://quotes.toscrape
.com/page/1/'> (failed 1 times): DNS lookup failed: address "'http:" not found:
[Errno 11004] getaddrinfo failed.
2016-10-19 11:11:51 [scrapy] DEBUG: Retrying <GET http://'http://quotes.toscrape
.com/page/1/'> (failed 2 times): DNS lookup failed: address "'http:" not found:
[Errno 11004] getaddrinfo failed.
2016-10-19 11:11:54 [scrapy] DEBUG: Gave up retrying <GET http://'http://quotes.
toscrape.com/page/1/'> (failed 3 times): DNS lookup failed: address "'http:" not
 found: [Errno 11004] getaddrinfo failed.
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 161, in <module>
    execute()
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 88, in _run_print
_help
    func(*a, **kw)
  File "C:\Python27\lib\site-packages\scrapy\cmdline.py", line 149, in _run_comm
and
    cmd.run(args, opts)
  File "C:\Python27\lib\site-packages\scrapy\commands\shell.py", line 71, in run

    shell.start(url=url)
  File "C:\Python27\lib\site-packages\scrapy\shell.py", line 47, in start
    self.fetch(url, spider)
  File "C:\Python27\lib\site-packages\scrapy\shell.py", line 112, in fetch
    reactor, self._schedule, request, spider)
  File "C:\Python27\lib\site-packages\twisted\internet\threads.py", line 122, in
 blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.internet.error.DNSLookupError: DNS lookup failed: address "'http:" not f
ound: [Errno 11004] getaddrinfo failed.

Can anyone tell me what is going wrong?
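The second log reports a DNS lookup for the host "'http:" and a request for http://'http://quotes.toscrape.com/page/1/', which suggests the single quotes reached Scrapy as part of the argument. cmd.exe does not treat single quotes as quoting characters, so that is plausible on Windows. Below is a minimal sketch of what such an argument parses to. The helper host_seen_by_client is hypothetical and only imitates the "http://" prefix visible in the log; it is not Scrapy's own code.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7, as used in the question


def host_seen_by_client(raw_arg):
    """Hypothetical helper: prefix 'http://' when the argument does not
    already start with a scheme, then return the network location."""
    if raw_arg.startswith(("http://", "https://")):
        url = raw_arg
    else:
        url = "http://" + raw_arg
    return urlparse(url).netloc


# Argument as cmd.exe would pass it when wrapped in single quotes.
print(host_seen_by_client("'http://quotes.toscrape.com/page/1/'"))
# Argument as received when wrapped in double quotes (or not quoted at all).
print(host_seen_by_client("http://quotes.toscrape.com/page/1/"))
```

If that is the cause, wrapping the URL in double quotes should avoid it: python -m scrapy.cmdline shell "http://quotes.toscrape.com/page/1/"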

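【Comments】:

  • It would be useful to paste some code here, for example your scrapy.cfg and the contents of your spider. You may be running the crawler incorrectly, as in your previous question. It is hard to help without any data.
  • The startproject command itself is not working. I cannot even create the project folder, and only after that can I write my spider. As I said in my previous question, the project somehow got created yesterday, and then I ran into this problem.
  • Do scrapy version -v and scrapy shell work? You can also invoke Scrapy commands with the alternative python -m scrapy.cmdline <command> <arguments> (for example python -m scrapy.cmdline version -v).
  • But why should I have to use python -m scrapy.cmdline every time instead of simply scrapy?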
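
The thread does not establish why the bare scrapy and pip commands hang. One thing that can be checked is which files those names resolve to, since python -m runs the module with the interpreter directly, without going through the launcher that pip normally places in C:\Python27\Scripts. Below is a minimal sketch that approximates a PATH/PATHEXT lookup. The function find_on_path is hypothetical, and unlike cmd.exe it does not search the current directory first.

```python
import os


def find_on_path(name, path, pathext=".COM;.EXE;.BAT;.CMD", sep=";"):
    """Return every file in the directories of `path` that matches
    `name` plus one of the extensions in `pathext`, in lookup order."""
    exts = [e.lower() for e in pathext.split(sep) if e]
    matches = []
    for directory in path.split(sep):
        if not directory:
            continue  # skip empty entries such as a trailing ';'
        for ext in exts:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    path = os.environ.get("PATH", "")
    pathext = os.environ.get("PATHEXT", ".COM;.EXE;.BAT;.CMD")
    for tool in ("pip", "scrapy"):
        print("%s -> %s" % (tool, find_on_path(tool, path, pathext, os.pathsep)))
```

If the first match is not under C:\Python27\Scripts, or several matches appear, a different or stale launcher may be shadowing the expected one. This is only a diagnostic aid, not a confirmed explanation of the hang.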

Tags: python python-2.7 web-scraping scrapy scrapy-spider


【Solution 1】:

Using the alternative command python -m scrapy.cmdline <command> <arguments> (for example python -m scrapy.cmdline version -v) works.

Thanks, Paul.
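
For anyone who wants to script this workaround, here is a minimal sketch that runs the same python -m scrapy.cmdline form from Python. The helper names scrapy_argv and run_scrapy are hypothetical, and the sketch assumes Scrapy is installed for the interpreter that runs it.

```python
import subprocess
import sys


def scrapy_argv(command, *args):
    """Build the argument list for: python -m scrapy.cmdline <command> <arguments>."""
    return [sys.executable, "-m", "scrapy.cmdline", command] + list(args)


def run_scrapy(command, *args):
    """Run the command with the current interpreter and return its exit code."""
    return subprocess.call(scrapy_argv(command, *args))


if __name__ == "__main__":
    # Only print the command here; call run_scrapy("version", "-v") to execute it.
    print(scrapy_argv("version", "-v"))
```

Because the arguments are passed as a list, a URL such as http://quotes.toscrape.com/page/1/ can be given as a plain string without any shell quoting.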
