【Question title】: Scrapy returns an empty JSON file
【Posted】: 2016-02-06 11:34:08
【Question】:

I am using Scrapy to extract data from a website. When I open the resulting JSON file, it is always empty. Here is my Scrapy code:

from scrapy import Spider


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["youtube.com"]
    start_urls = ["https://www.youtube.com/results?search_query=Motorcycle+Accident+Stunt+Rider+Knocks+Himself+Out+Stunt+Fail+2015"]

    def parse(self,response):
        questions = Selector(response).xpath('//a')
        for question in questions:
            item = StackItem()
            item['title'] = question.xpath(
                'a/text()').extract()
            item['url'] = question.xpath('//@href]').extract()
            yield item

【Comments】:

    Tags: json python-2.7 xpath scrapy


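    【Solution 1】:

    It looks like you are trying to scrape each node's text and its href attribute. You only need to change the XPath expressions to get results.

    Try the code below: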

    item['title'] = question.xpath('./text()').extract()
    item['url'] = question.xpath('./@href').extract()
    

    Here is some output from trying this in the scrapy shell:

    In [38]: questions = Selector(response).xpath('//a')
    In [39]: for question in questions:
                 print question.xpath('./text()').extract()
    [u'Motorcycle Accident Crash During Wheelie on the Highway Crash 2015']
    [u'STREETFIGHTERZ']
    []
    [u'Motorcycle Crash Compilation 2015 || Ep.#15 of October']
    [u'Car Crash Weekly']
    []
    [u'Motorcycle Accident Burnout On Highway Crash 2015']
    [u'STREETFIGHTERZ']
    []
    [u'Streetfighterz Ride The Murder Biz Ride 2015 Insane Motorcycle Stunts']
    [u'STREETFIGHTERZ']
    In [40]: for question in questions:
                 print question.xpath('./@href').extract()
    [u'/results?filters=movie&lclk=movie&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=show&lclk=show&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=short&lclk=short&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=long&lclk=long&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=4k&lclk=4k&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=hd&lclk=hd&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=cc&lclk=cc&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=creativecommons&lclk=creativecommons&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=3d&lclk=3d&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=live&lclk=live&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=purchased&lclk=purchased&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?filters=spherical&lclk=spherical&search_query=motorcycle+accident+stunt+rider+knocks+himself+out+stunt+fail+2015']
    [u'/results?search_sort=video_date_uploaded&search_query=Motorcycle+Accident+Stunt+Rider+Knocks+Himself+Out+Stunt+Fail+2015']
    

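    You are already inside the <a> node, so use ./ to select relative to it. A path such as a/text() looks for an <a> child nested inside the current <a>, and //@href searches the whole document instead of the current node.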

    【Comments】:
