如果值不存在，则追加 Python JSON答案

【问题标题】：Python JSON append if value doesn't exist如果值不存在，则追加 Python JSON
【发布时间】：2020-07-03 08:57:09
【问题描述】：

我有一个 json 文件，其中包含 30 个左右的“dicts”块，其中每个块都有和 ID，如下所示：

{
      "ID": "23926695",
      "webpage_url": "https://.com",
      "logo_url": null,
      "headline": "aewafs",
      "application_deadline": "2020-03-31T23:59:59",
}

由于我的脚本不止一次以相同的方式从 API 中提取信息，因此我想仅在 JSON 文件中不存在 ID 的情况下将新的“块”附加到 json 文件中。

到目前为止，我有类似的东西：

import os

check_empty = os.stat('pbdb.json').st_size
if check_empty == 0:
    with open('pbdb.json', 'w') as f:
        f.write('[\n]')    # Writes '[' then linebreaks with '\n' and writes ']'
output = json.load(open("pbdb.json"))

for i in jobs:
    output.append({
        'ID': job_id, 
        'Title': jobtitle, 
        'Employer' : company, 
        'Employment type' : emptype, 
        'Fulltime' : tid, 
        'Deadline' : deadline, 
        'Link' : webpage
    })

with open('pbdb.json', 'w') as job_data_file:
    json.dump(output, job_data_file)

但如果 ID 在 Json 文件中不存在，我只想执行“output.append”部分。

【问题讨论】：

这不是有效的 JSON。它是 JSON 行，您必须每次扫描文件以查看 ID 是否存在；这将是 O(N)，因此随着数据集的增长会逐渐变慢。或者您可以尝试在内存中保留一些内容以跟踪看到的 ID？您是否有理由不使用数据库？
实际上，它甚至不是 JSON Lines，因为您将回车符放在列表中。我不知道你想做什么
它将 output.append 中的数据写入一个 json 文件，它看起来很像这个图像：cloud.google.com/bigquery/images/create-schema-array.png 我不知道这是否有效，但它不会抛出任何错误 atm，这对我来说只是一个有趣的项目，而且由于数据集永远不会那么大，我认为 JSON 或 CSV 文件可以很好地工作，我对此很陌生，数据库会让这更容易吗？跨度>
这是有效的 JSON，但您没有显示换行符。是的，数据库会让这更容易
有什么好的、轻量级的数据库可以很好地与 python 配合使用的建议吗？

标签： python json python-3.x loops if-statement

【解决方案1】：

我无法完成您提供的代码，但我添加了一个示例来展示如何实现无重复作业列表（希望它有所帮助）：

# suppose `data` is you input data with duplicate ids included
data = [{'id': 1, 'name': 'john'}, {'id': 1, 'name': 'mary'}, {'id': 2, 'name': 'george'}]

# using dictionary comprehension you can eliminate the duplicates and finally get the results by calling the `values` method on dict.
noduplicate = list({itm['id']:itm for itm in data}.values())

with open('pbdb.json', 'w') as job_data_file:
    json.dump(noduplicate, job_data_file)

【讨论】：

这没有意义，因为它只是覆盖了重复数据
你是对的，但是有一个 id 意味着数据是重复的，从问题来看，似乎每个副本都有一个副本就足够了，覆盖应该无关紧要。
这不是真的。通常，使用 JSON，您无法保证记录的顺序。所以你不知道重复键的哪个值会被保存
这是问题Since my script pulls information in the same way from an API more than once, I would like to append new "blocks" to the json file only if the ID doesn't already exist in the JSON file.的引用，这意味着重复的数据正在进入。
太棒了。而您的答案与此无关。以写入模式打开文件的行为只是擦除其内容

【解决方案2】：

我只是和数据库一起去，谢谢你的时间，我们现在可以关闭这个帖子

【讨论】：