[Title]: Merge/update JSON objects from two different JSON files
[Posted]: 2021-12-05 11:28:38
[Question]:

I have two JSON files, both with the same number of objects, and each object has an ID key "DOCN". The problem is that some objects have different keys; for example, in file1 the object "DOCN": "000093019" has 4 keys, while in file2 the same object has 5.

I am trying to create a new file containing the merged objects from both files (find the keys missing from each object in file1 and file2 and add them to that object).

Example:

File 1:

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    },
    {
        "DOCN": "000093085",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]

File 2:

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    }
]

What I want to achieve, File 3:

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]

[Comments]:

    Tags: python json python-3.x


    [Solution 1]:

    I would do it like this: load the two files with pandas, concat the dataframes, group by DOCN and take the first record (`.first()` takes the first non-null value per column), then convert to a list of records and drop the None entries -

    import pandas as pd

    df1 = pd.read_json("my_file1.json")
    df2 = pd.read_json("my_file2.json")
    df = pd.concat([df1, df2])
    grp = df.groupby("DOCN").first().reset_index()
    merged = [{k: v for k, v in record.items() if v} for record in grp.to_dict(orient="records")]
    

    [Discussion]:

    • Very interesting approach; however, it doesn't work in my case, because some keys have the value None in both files.
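    • For the None-in-both-files case, one workaround is a plain-dict merge that prefers non-None values, so a key survives even when its value is None in both files. This is only a sketch (the helper name `merge_records` and the tiny inline records are made up, standing in for the real files):

```python
# Merge two per-DOCN records so that a non-None value from either file wins;
# keys whose value is None in both files survive as None instead of vanishing.
def merge_records(rec1, rec2):
    merged = dict(rec1)
    for k, v in rec2.items():
        if merged.get(k) is None:  # fills both missing keys and None values
            merged[k] = v
    return merged

file1 = [{"DOCN": "000093019", "A": None, "B": "blabla"}]
file2 = [{"DOCN": "000093019", "A": "blabla", "B": None, "C": None}]

# Index file1 by DOCN, then fold file2's records in.
by_id = {rec["DOCN"]: rec for rec in file1}
for rec in file2:
    by_id[rec["DOCN"]] = merge_records(by_id.get(rec["DOCN"], {}), rec)

print(list(by_id.values()))
# [{'DOCN': '000093019', 'A': 'blabla', 'B': 'blabla', 'C': None}]
```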
    [Solution 2]:

    I would read them into two different arrays and map over one of them to create the new one.

    // read file1 instead using `fs`
    const arr1 = [
        {
            "DOCN": "000093019",
            "A": "blabla",
            "B": "blabla",
            "C": "blabla"
        },
        {
            "DOCN": "000093085",
            "B": "blabla",
            "C": "blabla",
            "D": "blabla"
        }
    ]
    // read file2 instead
    const arr2 = [
        {
            "DOCN": "000093019",
            "A": "blabla",
            "C": "blabla",
            "D": "blabla",
            "E": "blabla"
        },
        {
            "DOCN": "000093085",
            "A": "blabla",
            "B": "blabla",
            "C": "blabla"
        }
    ]
    
    const arr3 = arr1.map(x => {
        const match = arr2.find(y => y.DOCN === x.DOCN)
        return { ...x, ...match }
    })

    // write arr3 to a new file
    

    [Discussion]:

      [Solution 3]:

      Well, this is a simple operation on dictionaries. I can't say it performs best on large datasets, but you can merge the dictionaries based on the key "DOCN". (There may be better ways! ;-))

      f1 = [
          {
              "DOCN": "000093019",
              "A": "blabla",
              "B": "blabla",
              "C": "blabla"
          },
          {
              "DOCN": "000093085",
              "B": "blabla",
              "C": "blabla",
              "D": "blabla"
          }
      ]
      
      f2 = [
          {
              "DOCN": "000093019",
              "A": "blabla",
              "C": "blabla",
              "D": "blabla",
              "E": "blabla"
          },
          {
              "DOCN": "000093085",
              "A": "blabla",
              "B": "blabla",
              "C": "blabla"
          }
      ]
      
      f1 = {item["DOCN"]: item for item in f1}
      f2 = {item["DOCN"]: item for item in f2}
      
      keys = f1.keys() | f2.keys()
      
      output = []
      for key in keys:
          # .get(key, {}) guards against a DOCN that appears in only one file
          output.append({**f1.get(key, {}), **f2.get(key, {})})
      
      print(output)
      

      The output is:

      [
          {
              "DOCN": "000093019",
              "A": "blabla",
              "B": "blabla",
              "C": "blabla",
              "D": "blabla",
              "E": "blabla"
          },
          {
              "DOCN": "000093085",
              "B": "blabla",
              "C": "blabla",
              "D": "blabla",
              "A": "blabla"
          }
      ]
      

      [Discussion]:
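      • The lists in the answer are inlined for illustration; the same merge can read and write the actual files with the json module. A sketch under assumptions: the file names file1.json/file2.json/file3.json are made up, and the block writes tiny sample inputs first so it runs on its own (in practice the input files already exist):

```python
import json

# Create small sample inputs so the sketch is self-contained;
# in practice file1.json and file2.json already exist on disk.
with open("file1.json", "w") as fh:
    json.dump([{"DOCN": "000093019", "A": "blabla", "B": "blabla"}], fh)
with open("file2.json", "w") as fh:
    json.dump([{"DOCN": "000093019", "D": "blabla", "E": "blabla"}], fh)

# Load each file and index its records by DOCN.
with open("file1.json") as fh:
    f1 = {item["DOCN"]: item for item in json.load(fh)}
with open("file2.json") as fh:
    f2 = {item["DOCN"]: item for item in json.load(fh)}

# Union of DOCNs covers IDs present in only one file;
# .get(k, {}) keeps the merge from failing on those.
merged = [{**f1.get(k, {}), **f2.get(k, {})} for k in sorted(f1.keys() | f2.keys())]

with open("file3.json", "w") as fh:
    json.dump(merged, fh, indent=4)
```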
