How to correctly merge 2 JSON files including arrays using jq?

【Question title】: How to correctly merge 2 JSON files including arrays using jq?
【Posted】: 2022-06-11 20:37:14
【Question description】:

I am using jq to try to merge 2 JSON files into a single file.

The result is close to what I want, but not quite right.

File 1:

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ]
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ]
    }
  ]
}

File 2:

{
  "series": "Harry Potter Movie Series",
  "producer": "David Heyman",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002"
    }
  ]
}

Expected result:

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ]
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ]
    }
  ],
  "producer": "David Heyman"
}

The best result I have got so far (only the arrays with the actors are missing):

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "year": "2002"
    }
  ],
  "producer": "David Heyman"
}

using one of the following commands:

jq -s '.[0] * .[1]' file1 file2

jq --slurp 'add' file1 file2

jq '. * input' file1 file2

If I switch the order of the files, I end up losing either "actors" from file 1 or "year" from file 2.

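How it should work:

Elements in file 2 take the lead and should replace matching elements in file 1.
Elements in file 1 that do not exist in file 2 (such as writer and movies[].actors) should not be removed.
Elements in file 2 that do not yet exist in file 1 (such as producer and movies[].year) should be added.
Titles are unique and should not occur more than once by default, but if they do, the duplicates should be removed.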

I assume there is a solution that lets these movie arrays merge cleanly with jq.

【Discussion】:

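All three commands produce your expected result. Perhaps you reversed the order of the files (order matters). If the order of the fields in the object bothers you, try appending ` | {series, writer, movies, producer}` to whichever command you prefer (although, for comparison purposes, objects have no field order). You might also want to have a look at this question.
Note that the first and last of your three commands use * (not +), which performs a deep merge, while the middle one uses add, which iterates over the array using +, so it is only a top-level merge. Iterating over the slurped files using * would be jq --slurp 'reduce .[] as $i ({}; . * $i)' file1 file2 (only useful for more than two files or a variable number of them; otherwise .[0] * .[1] is just as good).
Thanks for getting back to me. I added "year" to file2 to point out the problem more specifically. If I switch the file order, I lose either "actors" from file1 or "year" from file2.
You certainly do, because the latter overwrites the former. If you want arrays (rather than objects) to be merged, please describe the mechanism you envision for such an operation. Should elements be added (giving you each title twice), should duplicates be removed (what if one file already contains duplicates), ...?
The values in file 2 will be leading (apart from the writer and movies[].actors elements). All movie elements in file 2 should replace the matching elements in file 1. If "year" does not yet exist in file 1, it should be added. Titles should be unique and should not occur more than once, but if duplicates do occur, they should be removed.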
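
The difference between + (or add) and * described in the comments above can be checked on a small example. This is a minimal sketch that assumes jq 1.5 or later is installed; the objects a and b and their fields are hypothetical and are not taken from the question's files. It also shows why switching the file order loses data: neither operator merges arrays, so the right-hand array always replaces the left-hand one.

```shell
# Two small hypothetical objects with a nested "meta" object and a "tags" array.
a='{"meta": {"x": 1, "y": 2}, "tags": ["a"]}'
b='{"meta": {"y": 3}, "tags": ["b"]}'

# Shallow merge with +: the right-hand "meta" replaces the left-hand one entirely.
shallow=$(jq -n -c -S --argjson a "$a" --argjson b "$b" '$a + $b')

# Recursive merge with *: nested objects are merged key by key,
# but arrays are still replaced as a whole by the right-hand side.
deep=$(jq -n -c -S --argjson a "$a" --argjson b "$b" '$a * $b')

echo "$shallow"   # {"meta":{"y":3},"tags":["b"]}
echo "$deep"      # {"meta":{"x":1,"y":3},"tags":["b"]}
```

The -S flag only sorts the keys so that the printed output is stable; it does not affect the merge itself.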

Tags: jq

【Solution 1】:

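You are looking for a solution that "merges" both objects and arrays. For the former, you have already found + (or add) for a top-level merge and * for a recursive merge, but merging arrays (i.e. the two .movies fields) needs more specification, as there is no canonical solution.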

In a comment you stated

.movies[0] always corresponds to the same movie in both files

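This lets you use transpose to align the items of the two arrays (it turns [[a1, a2], [b1, b2]] into [[a1, b1], [a2, b2]]), and then apply an object merge to each pair of corresponding items. Here is an approach that uses add to merge the array items as well as the other top-level fields: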

jq -s 'add + {movies: map(.movies) | transpose | map(add)}' file1 file2

{
  "series": "Harry Potter Movie Series",
  "writer": "J.K. Rowling",
  "movies": [
    {
      "title": "Harry Potter and the Philosopher's Stone",
      "actors": [
        {
          "names": [
            "Emma Watson",
            "Other actor"
          ],
          "other": "Some value"
        }
      ],
      "year": "2001"
    },
    {
      "title": "Harry Potter and the Chamber of Secrets",
      "actors": [
        {
          "names": [
            "Emma Watson"
          ],
          "other": "Some value"
        }
      ],
      "year": "2002"
    }
  ],
  "producer": "David Heyman"
}
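
The command above pairs movies by position. The question also asks that movies be matched by title and that duplicate titles be collapsed, so here is a hedged alternative sketch, not part of the original answer, that groups by .title instead. It assumes jq 1.5 or later; the scratch directory and the trimmed-down sample files are hypothetical stand-ins for file 1 and file 2, and file 2 deliberately lists the movies in the opposite order.

```shell
# Hypothetical scratch files holding trimmed-down versions of file 1 and file 2.
dir=$(mktemp -d)
cat > "$dir/file1.json" <<'EOF'
{"series": "Harry Potter Movie Series", "writer": "J.K. Rowling",
 "movies": [
   {"title": "Harry Potter and the Philosopher's Stone",
    "actors": [{"names": ["Emma Watson", "Other actor"]}]},
   {"title": "Harry Potter and the Chamber of Secrets",
    "actors": [{"names": ["Emma Watson"]}]}
 ]}
EOF
cat > "$dir/file2.json" <<'EOF'
{"series": "Harry Potter Movie Series", "producer": "David Heyman",
 "movies": [
   {"title": "Harry Potter and the Chamber of Secrets", "year": "2002"},
   {"title": "Harry Potter and the Philosopher's Stone", "year": "2001"}
 ]}
EOF

merged=$(jq -c -s '
  (.[0] * .[1])                  # top-level fields; file 2 wins on conflicts
  + {movies: (
      map(.movies) | add         # concatenate both movie arrays, file 1 first
      | group_by(.title)         # one group per title, sorted by title
      | map(reduce .[] as $m ({}; . * $m))   # merge each group, later entries win
    )}
' "$dir/file1.json" "$dir/file2.json")

echo "$merged"
rm -r "$dir"
```

One design consequence to be aware of: group_by sorts the groups by title, so the movies come out in title order rather than in the order of either input file. If the original order matters, a different grouping strategy would be needed.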
