【Question title】: Scrapy: ERROR: Spider error processing
【Posted】: 2018-08-19 17:05:21
【Question】:

I'm new to Python and Scrapy. I tried to run some existing code, but I get this error for every URL:

>     2015-07-02 01:52:19 [scrapy] DEBUG: Crawled (200) <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
>     2015-07-02 01:52:19 [scrapy] ERROR: Spider error processing <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
>     Traceback (most recent call last):
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
>         yield next(it)
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
>         for x in result:
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
>         return (_set_referer(r) for r in result or ())
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
>         return (r for r in result or () if _filter(r))
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
>         return (r for r in result or () if _filter(r))
>       File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
>         cb_res = callback(response, **cb_kwargs) or ()
>       File "/home/talmosko/Documents/scrapy/tripAdvisor/spiders/tripAdvisor.py", line 30, in parse_item
>         item['state'] =  hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
>     IndexError: list index out of range

Here is my code: http://pastebin.com/XzM5DrDD

What is the problem? It looks like the spider isn't getting a response.

Thanks!

【Question comments】:

    Tags: python scrapy


    【Solution 1】:

    You are trying to access an element that does not exist. The error is on this line:

    item['state'] =  hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
    

    Most likely the list returned by

    hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()
    

    is empty, and you are then trying to access its first element. You have two options:
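
The answer does not spell the two options out here. As a minimal sketch of two common ways to guard against an empty result (not necessarily the ones the answerer had in mind), one can either check the list before indexing or catch the `IndexError`. The helper names `first_or_default` and `first_or_none` and the sample data are hypothetical; plain lists stand in for what `.extract()` returns.

```python
def first_or_default(values, default=u""):
    """Return the first element of `values`, or `default` when it is empty."""
    # Option 1: check the list before indexing into it.
    return values[0] if values else default


def first_or_none(values):
    """Return the first element of `values`, or None when it is empty."""
    # Option 2: index directly and handle the IndexError.
    try:
        return values[0]
    except IndexError:
        return None


# Made-up stand-ins for the result of .extract() on two different pages.
page_with_state = [u"Ile-de-France"]
page_without_state = []

print(first_or_default(page_with_state))
print(repr(first_or_default(page_without_state)))
print(first_or_none(page_without_state))
```

In the spider, this would mean wrapping the `.extract()` call, for example `first_or_default(hxs.xpath(...).extract())`, before calling `.encode(...)` on the result. Scrapy 1.0 and later also provide `extract_first()` on selector lists, which returns `None` (or a supplied `default`) instead of raising when nothing matches.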

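    【Discussion】:

    • So the problem isn't that I'm not getting any response?
    • You are writing a crawler, and within the same site some pages may contain a given piece of information while others do not. I am not going to check whether every TripAdvisor page has a 'state'.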