【Question title】:How to add commas in between JSON objects present in a .txt file and then convert it into JSON array in Python
【Posted】:2019-08-28 12:29:52
【Question】:

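I am reading a .txt file that contains JSON objects which are not separated by commas. I would like to add commas between the JSON objects and put them all into a JSON list or array.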

I tried json.loads, but I get a JSON decode error. So I realized I need to add commas between the separate objects present in the .txt file.
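
For readers who want to reproduce the failure, here is a minimal sketch with a made-up two-object string (not the real file contents). `json.loads` parses the first object and then complains about whatever follows it:

```python
import json

# Two objects back to back, mimicking the layout of the .txt file (sample data is made up)
concatenated = '{"a": 1}{"b": 2}'

try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    # json.loads accepts exactly one top-level value per string
    error_message = str(exc)

print(error_message)  # the message starts with "Extra data"
```

The error is about the second object being present at all, so strictly speaking the input has to be split or wrapped before parsing.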

Below is a sample of the contents of the .txt file:

{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
}{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}

Expected result:

[
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
},
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}
]


【Comments】:

Tags: python json python-3.x


【Solution 1】:

You can use a regular expression to add commas between the objects:

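import re

# src/dst are used instead of input/output so the built-in input() is not shadowed
with open('name.txt', 'r') as src, open('out.txt', 'w') as dst:
    dst.write("[\n")
    for line in src:
        # insert a comma wherever one object ends and the next one starts
        line = re.sub(r'\}\{', '},{', line)
        dst.write('    ' + line)
    dst.write("]\n")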

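【Comments】:

  • Using regex to alter hierarchical/nested structures is a very, very bad idea: you can inadvertently change deeper nested structures or values this way (e.g. "ftext": "I contain {this}{that}" would get an extra comma). Also, regex is overkill if all you want is a simple string replacement; str.replace() does the job just fine.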
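
The concern in the comment above can be checked with a small sketch. The input string is hypothetical (built from the commenter's example value), and the point is that both the regex and `str.replace()` rewrite text inside string values, not only the boundary between objects:

```python
import json
import re

# Hypothetical input: the first object's value happens to contain "}{"
raw = '{"ftext": "I contain {this}{that}"}{"ftext": "second"}'

via_regex = re.sub(r'\}\{', '},{', raw)
via_replace = raw.replace('}{', '},{')

# The result is still valid JSON, so the damage goes unnoticed at parse time
records = json.loads('[' + via_replace + ']')

print(via_regex == via_replace)  # True: both approaches behave identically here
print(records[0]['ftext'])       # I contain {this},{that}
```

The file parses without errors, but the stored value has silently gained a comma, which is why a parser-based split is the safer route.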
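【Solution 2】:

If you can always guarantee that your JSON will be formatted as in your example, i.e. a new JSON object starts on the same line where the previous one ends and that line has no indentation, you can simply read the JSON into a buffer until you encounter such a line and then send the buffer off for JSON parsing. Rinse and repeat: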

import json

parsed = []  # a list to hold individually parsed JSON objects
with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line[0] == '}':  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well

This produces:

[
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
      "ftail": "\n",
      "ftext": "Sanjeev Saxena"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "607-619"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1996"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "33"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "number": {
      "ftail": "\n",
      "ftext": "7"
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
  },
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
      "ftail": "\n",
      "ftext": "Hans-Ulrich Simon"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "227-248"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1983"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "20"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
  }
]

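If your case is not as clear-cut (i.e. you cannot predict the formatting), you can try an iterative/event-based JSON parser (e.g. ijson) that can tell you when a 'root' object has closed, so that you can 'split' the parsed JSON objects into a sequence.

UPDATE: On second thought, you don't need anything beyond the built-in json module, even if your concatenated JSON is not properly formatted or indented. You can use json.JSONDecoder.raw_decode() (and its undocumented second argument) to walk through your data and iteratively look for valid JSON structures until you have traversed the whole file (or hit an error). For example: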

import json

parser = json.JSONDecoder()
parsed = []  # a list to hold individually parsed JSON structures
with open('test.json') as f:
    data = f.read()
head = 0  # hold the current position as we parse
while True:
    # find the nearest '{' or '[' at or after head; stop when there is none left
    starts = [i for i in (data.find('{', head), data.find('[', head)) if i != -1]
    if not starts:
        break
    try:
        # raw_decode returns the parsed value and the index just past it
        struct, head = parser.raw_decode(data, min(starts))
        parsed.append(struct)
    except ValueError:  # json.JSONDecodeError is a ValueError: no more valid JSON
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well

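This should give you the same result as above, but this time it does not rely on } being the first character of a new line when a JSON object 'closes'. It should also work for JSON arrays stacked back to back.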
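
As a variation on the same raw_decode() idea, here is a sketch packaged as a generator. The function name `iter_json_values` and the sample string are made up for illustration. It skips whitespace between values instead of searching for brackets, and it lets a `json.JSONDecodeError` propagate on malformed input rather than stopping silently:

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found in a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so do it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # returns the decoded value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical sample: objects and an array back to back, one value containing "}{"
sample = '{"a": 1}{"b": "x}{y"}\n[1, 2]  {"c": null}'
values = list(iter_json_values(sample))
print(json.dumps(values))
# [{"a": 1}, {"b": "x}{y"}, [1, 2], {"c": null}]
```

To get the JSON array file the question asks for, the resulting list can be written out with `json.dump(values, f, indent=2)` on a file opened for writing.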

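【Comments】:

  • Thank you very much for your help. However, the JSON I receive is not as well formatted as the sample I provided. How can I use ijson to split the JSON based on the root node?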