【Question Title】: Python CSV to JSON parser add quotes to output
【Posted】: 2012-12-11 08:31:30
【Question】:

Thanks to user Petri, I have a CSV-to-JSON Python script that converts a Geonames CSV dump into MongoImport-friendly JSON.

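The problem is that Geonames has a field called alternatenames, which is currently quoted and treated as one long string, so it cannot be queried properly in MongoDB. I would like to turn that field into an array of strings, for example: "alternatenames":["name1", "name2"]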

The Python script looks like this:

import csv, simplejson, decimal, codecs

data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')

with codecs.open("cities.json", "w", encoding="utf-8") as out:
   for r in reader:
      for k, v in r.items():
         # make sure nulls are generated
         if not v:
            r[k] = None
         # parse and generate decimal arrays
         elif k == "loc":
            r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
         # generate a number
         elif k == "geonameid":
            r[k] = int(v)
      out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")

My CSV has the following fields (header row followed by a few sample records):

"geonameid","name","asciiname","alternatenames","loc","feature_class","feature_code","country_code","cc2","admin1_code","admin2_code","admin3_code","admin4_code"
3,"Zamīn Sūkhteh","Zamin Sukhteh","Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]","P","PPL","IR",,"15",,,
5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]","P","PPL","IR",,"15",,,
7,"Tarvīḩ ‘Adāī","Tarvih `Adai","Tarvih `Adai,Tarvīḩ ‘Adāī","[48.2,32.1]","P","PPL","IR",,"15",,,

My current JSON output looks like this:

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}

I would like to change the JSON output so that alternatenames becomes an array of strings, as follows (scroll right to alternatenames):

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi", "Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}

Also, should I change the quotechar in the CSV exported from Access 2010 from " to ^ to avoid the double quotes?

Thanks for your help.

【Discussion】:

    Tags: python json mongodb csv geonames


    【Answer 1】:

    Add another "elif" branch next to the existing ones to handle "alternatenames":

         elif k == "alternatenames":
            r[k] = [name.strip() for name in v.split(",")]
    

    This first splits the string on commas, then strips leading and trailing whitespace from each name.
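For reference, here is a minimal sketch of the whole loop with this branch added. It assumes Python 3 and swaps simplejson for the standard-library json module; because json cannot serialize decimal.Decimal, loc is parsed with float here rather than Decimal. The helper names convert_row and convert are hypothetical. It builds a new dict instead of mutating the row while iterating over it, and since the "if not v" check runs first, an empty alternatenames cell still becomes null.

```python
import csv
import json


def convert_row(row):
    """Return a copy of one CSV row with types fixed for mongoimport (sketch)."""
    out = {}
    for k, v in row.items():
        if not v:
            # empty CSV cells become JSON null
            out[k] = None
        elif k == "loc":
            # "[48.9,32.5]" -> [48.9, 32.5] (float, not Decimal, in this sketch)
            out[k] = [float(n) for n in v.strip("[]").split(",")]
        elif k == "geonameid":
            out[k] = int(v)
        elif k == "alternatenames":
            # "Yekahi,Yekāhī" -> ["Yekahi", "Yekāhī"]; blank pieces are dropped
            out[k] = [name.strip() for name in v.split(",") if name.strip()]
        else:
            out[k] = v
    return out


def convert(csv_path, json_path):
    """Write one JSON document per CSV row (hypothetical helper)."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(json_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(convert_row(row), ensure_ascii=False) + "\n")
```

Calling convert("cities.txt", "cities.json") would then produce the line-per-document file that mongoimport reads, with alternatenames as an array.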

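    【Discussion】:

      【Answer 2】:

      I don't think your quotechar is the problem here. You have to tell the script explicitly that you want this field converted to a list of strings.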

      Warning: untested code follows.

      elif k == "alternatenames":
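          r[k] = v.split(',')
      

      Calling split on v itself works whether v is a unicode string or a plain ASCII byte string, whereas unicode.split(v, ',') fails when v is a byte string (and the name unicode does not exist in Python 3).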
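To illustrate why the default quotechar is not the problem: the csv module keeps a quoted value such as "Yekahi,Yekāhī" together as one field even though it contains a comma, so switching the Access export to ^ should not be necessary. A minimal Python 3 check, using a header and record trimmed to the first five columns of the question's sample (SAMPLE and read_rows are illustrative names):

```python
import csv
import io

# Header and one record taken from the question's sample CSV (first five columns only).
SAMPLE = (
    '"geonameid","name","asciiname","alternatenames","loc"\n'
    '5,"Yekāhī","Yekahi","Yekahi,Yekāhī","[48.9,32.5]"\n'
)


def read_rows(text):
    # The default dialect already uses quotechar='"', so commas inside
    # a quoted field stay part of that field.
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
print(rows[0]["alternatenames"])             # Yekahi,Yekāhī
print(rows[0]["alternatenames"].split(","))  # ['Yekahi', 'Yekāhī']
```

The field arrives as a single string, which is why an explicit split is still needed to get an array.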

      【Discussion】:

        【Answer 3】:

        Try adding the following:

        elif k == "alternatenames":
           r[k] = v.split(",")
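Note that str.split already returns a list, so wrapping the call in another pair of brackets, as in [v.split(",")], would store a list nested inside a list rather than the flat array shown in the desired output. A quick check (the variable names are only for illustration):

```python
import json

value = "Yekahi,Yekāhī"

nested = [value.split(",")]  # a list containing one list
flat = value.split(",")      # the flat array the question asks for

print(json.dumps(nested, ensure_ascii=False))  # [["Yekahi", "Yekāhī"]]
print(json.dumps(flat, ensure_ascii=False))    # ["Yekahi", "Yekāhī"]
```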
        

        【Discussion】:
