【Question title】: How do I get my returned values into a specific order when using Scrapy?
【Posted】: 2019-07-12 16:02:30
【Question description】:

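I have a working scraper that writes its output to a CSV file without trouble, but the values always come back in a strange order.

I checked that the fields in items.py are in the right order and tried rearranging the fields in the spider, but I can't figure out why they are yielded in this odd order.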

import scrapy
from scrapy.spiders import CrawlSpider
from scrapy import Selector
from scrapy.loader import ItemLoader
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from sofifa_scraper.items import Player


class FifaInfoScraper(scrapy.Spider):
    name = "player2_scraper"
    start_urls = ["https://www.futhead.com/19/players/?level=all_nif&bin_platform=ps"]


    def parse(self,response):
        for href in response.css("li.list-group-item > div.content > a::attr(href)"):
            yield response.follow(href, callback = self.parse_name)



    def parse_name(self,response):
        item = Player()

        item['name'] = response.css("div[itemprop = 'child'] > span[itemprop = 'title']::text").get() #Get player name

        club_league_nation = response.css("div.col-xs-5 > a::text").getall()    #club, league, nation are all stored under same selectors, so pull them all at once

        item['club'],item['league'],item['nation'] = club_league_nation         #split the selected info from club_league_nation into 3 separate categories
        yield item

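I want the scraper to return the player name in the first column; I don't much care about the order after that. However, the player name always ends up in some other column, and this happens even when I extract only the name and one other value.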

【Question comments】:

    Tags: python scrapy


    【Solution 1】:

    Just add FEED_EXPORT_FIELDS to your settings.py (see the Scrapy documentation). It sets which fields are exported and the order of the columns:

    FEED_EXPORT_FIELDS = ["name", "club", "league", "nation"]
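
To see why an explicit field list pins the column order, here is a minimal sketch using only the standard library's csv.DictWriter. It is not Scrapy's own exporter, and the sample rows, the EXPORT_FIELDS name and the rows_to_csv helper are made up for illustration. The idea is the same as with FEED_EXPORT_FIELDS: the list, not the order in which the keys were assigned, decides the columns.

```python
import csv
import io

# Hypothetical scraped rows; the keys deliberately arrive in different orders.
rows = [
    {"club": "FC Example", "league": "Example League",
     "nation": "Exampleland", "name": "Player One"},
    {"nation": "Otherland", "name": "Player Two",
     "league": "Other League", "club": "Other FC"},
]

# Same list as FEED_EXPORT_FIELDS above: it fixes the columns and their order.
EXPORT_FIELDS = ["name", "club", "league", "nation"]


def rows_to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text with columns in `fieldnames` order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()    # header row follows `fieldnames`
    writer.writerows(rows)  # each row is reordered to match `fieldnames`
    return buffer.getvalue()


csv_text = rows_to_csv(rows, EXPORT_FIELDS)
print(csv_text)
```

With the list in place, "name" is always the first column, however each row's dictionary was built.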
    

    【Comments】:
