Python: Scrapy CSV exports incorrectly? Answers

【Question title】: Python: Scrapy CSV exports incorrectly?
【Posted】: 2015-04-22 20:29:20
【Question description】:

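I am just trying to write to a CSV file. However, I have two separate for statements, so the data from each one is exported independently and the row order breaks. Any suggestions?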

def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//td[@class="title"]')
        subtext = hxs.select('//td[@class="subtext"]')
        items = []
        for title in titles:
            item = HackernewsItem()
            item["title"] = title.select("a/text()").extract()
            item["url"] = title.select("a/@href").extract()
            items.append(item)
        for score in subtext:
            item = HackernewsItem()
            item["score"] = score.select("span/text()").extract()
            items.append(item)
        return items

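As the image below shows, the output of the second for statement is printed below the other rows rather than "in" them, the way the titles are.

CSV screenshot attached:

GitHub link to the full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv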

【Question comments】:

Can you upload the CSV file?
@boussouira Yes! Updated.

Tags: python csv export scrapy

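【Solution 1】:

The order in which you export the elements matches what you see in the CSV file: all the titles are exported first, and then all the subtext elements.
I guess you are trying to scrape HN articles, so here is my suggestion: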

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        # Build one item per title so all three fields share a CSV row.
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        # Untested relative XPath: looks for a subtext cell under the title cell's parent row.
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I have not tested it, but it should give you the idea.
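
To see why the two-loop version produces a staggered CSV while a single loop does not, here is a minimal sketch using only the standard library. Plain dicts stand in for HackernewsItem, and the sample titles, URLs, and scores are made up for illustration. It also assumes the titles and scores line up one-to-one, which the real page may not guarantee.

```python
import csv
import io

FIELDS = ["title", "url", "score"]

# Made-up sample data standing in for the scraped title and subtext cells.
titles = [("First story", "http://example.com/1"),
          ("Second story", "http://example.com/2")]
scores = ["10 points", "25 points"]


def to_csv(rows):
    """Serialize a list of dicts to CSV text; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def separate_items():
    """Mimic the question: one item per title, then one more item per score."""
    rows = [{"title": t, "url": u} for t, u in titles]
    rows += [{"score": s} for s in scores]
    return rows


def combined_items():
    """Mimic the fix: build each item once, with all three fields filled in."""
    return [{"title": t, "url": u, "score": s}
            for (t, u), s in zip(titles, scores)]


if __name__ == "__main__":
    print(to_csv(separate_items()))  # score rows trail after the title rows
    print(to_csv(combined_items()))  # one complete row per story
```

The first call yields four data rows, two with an empty score and two with only a score, which is the pattern in the question's items.csv. The second yields two complete rows.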

【Comments】:

【Solution 2】:

The csv module in Python 2.7 does not support Unicode, so I recommend using unicodecsv.

$ pip install unicodecsv

unicodecsv is a drop-in replacement for Python 2's csv module that supports Unicode strings without any hassle.

Then use this in place of import csv:

import unicodecsv as csv

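【Comments】:

I usually export to CSV through a command-line argument at launch (scrapy crawl myspider -o items.csv). Can I somehow make that work with unicodecsv?
You would have to check which CSV module Scrapy uses. It seems they use the regular csv module. Move to Python 3 and this problem goes away.
@Charles Watson Do you know how to use unicodecsv at launch? I have the same problem as you.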
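
As a rough illustration of the Python 3 point above: the built-in csv module works on text strings there, so non-ASCII values round-trip without a replacement module. The file name and sample row below are hypothetical, and this is only a sketch of standard library behaviour, not of how Scrapy's exporter is implemented.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a UTF-8 CSV file using the standard csv module (Python 3)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the rows back as lists of str."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def roundtrip(rows):
    """Write rows to a temporary file, read them back, and clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "items.csv")  # hypothetical file name
        write_rows(path, rows)
        return read_rows(path)


if __name__ == "__main__":
    sample = [["title", "score"], ["Café déjà vu", "42 points"]]
    print(roundtrip(sample))
```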