How to retrieve top n children of each node in a tree in one query

【Question title】: How to retrieve top n children of each node in a tree in one query
【Posted】: 2018-07-15 22:07:29
【Question description】:

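I have recently been evaluating graph databases, or any other kind of database, against one specific requirement:

The ability to retrieve, in one query, the top n children of each node, ranked by an aggregated property of the node's direct children and all of its direct and indirect children. The result should return the correct hierarchy structure.

Example

Each node has a property recording how many direct children it has, and the tree has no more than 8 levels. Say I want to run a query over the whole tree that returns, for every node at every level, its top 2 children, meaning the 2 children with the most direct and indirect children. It would give us the following:

I would like to know whether any graph database, or any other database, supports this kind of query efficiently, and if so, how?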

Tags: database neo4j orientdb arangodb

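【Solution 1】:

Using Neo4j

You can do this with Neo4j, but you need the APOC Procedures plugin installed to get access to some of the map and collection functions and procedures.

One thing to note first. You didn't define any criteria for choosing between child nodes when their descendant counts are tied. As a result, the output of the queries below may not match yours exactly, since a different node (with a tied count) may have been selected. If you do need additional ordering and selection criteria, you will have to add them to your description so I can modify the queries accordingly.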
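
To make the tie issue concrete, here is a minimal Python sketch of one possible tie-break rule. The rule itself (prefer the lower id when counts are equal) is purely an assumption for illustration, since the question does not specify one, and `top_n` is a made-up helper name. The ids and counts are illustrative.

```python
def top_n(children_with_counts, n=2):
    """Pick the n children with the highest descendant count.

    Assumed tie-break (not from the question): the lower id wins.
    """
    ranked = sorted(children_with_counts, key=lambda pair: (-pair[1], pair[0]))
    return [child for child, _ in ranked[:n]]


# (child id, descendant count): two children are tied at 0.
print(top_n([(121, 0), (122, 3), (123, 0)]))  # [122, 121]
```

Without a secondary sort key like this, either of the tied children could legitimately appear in the result.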

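Creating the test graph

First, let's create the test dataset. We can do this through the Neo4j browser.

Start by setting the parameter we need to create the graph: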

:param data => [{id:11, children:[111, 112, 113]}, {id:12, children:[121, 122, 123]}, {id:13, children:[131,132,133,134]}, {id:14, children:[]}, {id:111, children:[1111]}, {id:112, children:[1121, 1122, 1123]}, {id:122, children:[1221,1222,1223]}]

Now we can run this query, which uses that parameter, to create the graph:

UNWIND $data as row
MERGE (n:Node{id:row.id})
FOREACH (x in row.children |
 MERGE (c:Node{id:x})
 MERGE (n)-[:CHILD]->(c))

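We're working with nodes of type :Node, connected to each other by :CHILD relationships that point outward toward the leaf nodes.

Let's add a :Root:Node at the top level to make some of our later queries a bit easier: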

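MERGE (r:Node:Root{id:0})
WITH r
MATCH (n:Node)
// skip the root itself, which also has no incoming :CHILD relationship,
// so that it is not connected to itself
WHERE n <> r AND NOT ()-[:CHILD]->(n)
MERGE (r)-[:CHILD]->(n)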

The :Root node is now connected to the top nodes (11, 12, 13, 14), and our test graph is ready.
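
If you want to sanity-check the shape of the test graph without a running Neo4j instance, the following Python sketch mirrors the two steps above in memory. It is only a model of the data, not a description of how Neo4j stores it; `build_tree` and `ROOT_ID` are names made up for this illustration.

```python
# Same rows as the $data parameter above.
DATA = [
    {"id": 11, "children": [111, 112, 113]},
    {"id": 12, "children": [121, 122, 123]},
    {"id": 13, "children": [131, 132, 133, 134]},
    {"id": 14, "children": []},
    {"id": 111, "children": [1111]},
    {"id": 112, "children": [1121, 1122, 1123]},
    {"id": 122, "children": [1221, 1222, 1223]},
]
ROOT_ID = 0  # mirrors the id of the :Root node


def build_tree(rows, root_id=ROOT_ID):
    """Return {node_id: [child_ids]} with a synthetic root on top."""
    children = {}
    for row in rows:
        children.setdefault(row["id"], []).extend(row["children"])
        for child in row["children"]:
            children.setdefault(child, [])  # leaves get an empty list
    # Nodes that never appear as a child are the top nodes.
    has_parent = {c for kids in children.values() for c in kids}
    tops = [n for n in children if n not in has_parent]
    children[root_id] = tops
    return children


tree = build_tree(DATA)
print(tree[ROOT_ID])  # [11, 12, 13, 14]
```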

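The actual query

Because the aggregation you want needs the count of all descendants of a node, not just its direct children, we can't simply use the property that records how many direct children a node has. Or rather, we could, by summing that property over the node and all of its descendants, but since that requires traversing to every descendant anyway, it is easier to count the descendants directly and avoid property access entirely.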
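
As a small check of that reasoning, the Python sketch below computes the descendant count both ways on the test data: by counting descendants directly, and by summing the direct-child counts of a node and everything beneath it. The function names are made up for this illustration, and `len(kids(node))` stands in for the direct-children property from the question.

```python
CHILDREN = {
    11: [111, 112, 113],
    12: [121, 122, 123],
    13: [131, 132, 133, 134],
    111: [1111],
    112: [1121, 1122, 1123],
    122: [1221, 1222, 1223],
}


def kids(node):
    """Direct children of a node; leaves have none."""
    return CHILDREN.get(node, [])


def descendant_count(node):
    """Count every direct and indirect child by traversal."""
    return sum(1 + descendant_count(c) for c in kids(node))


def summed_child_property(node):
    """Sum the direct-child counts of the node and all its descendants."""
    return len(kids(node)) + sum(summed_child_property(c) for c in kids(node))


for n in (11, 12, 13, 14):
    assert descendant_count(n) == summed_child_property(n)
    print(n, descendant_count(n))
# 11 7
# 12 6
# 13 4
# 14 0
```

Both approaches visit every descendant, which is why the property buys nothing here.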

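Here is the query in its entirety; you should be able to run it as is on the test graph. I have broken it into sections with line breaks and comments to better show what each part does.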

// for each node and its direct children, 
// order by the child's descendant count
MATCH (n:Node)-[:CHILD]->(child)
WITH n, child, size((child)-[:CHILD*]->()) as childDescCount
ORDER BY childDescCount DESC
// now collect the ordered children and take the top 2 per node
WITH n, collect(child)[..2] as topChildren

// from the above, per row, we have a node and a list of its top 2 children.
// we want to gather all of these children into a single list, not nested
// so we collect the lists (to get a list of lists of nodes), then flatten it with APOC
WITH apoc.coll.flatten(collect(topChildren)) as topChildren

// we now have a list of the nodes that can possibly be in our path
// although some will not be in the path, as their parents (or ancestors) are not in the list
// to get the full tree we need to match down from the :Root node and ensure
// that for each path, the only nodes in the path are the :Root node or one of the topChildren
MATCH path=(:Root)-[:CHILD*]->()
WHERE all(node in nodes(path) WHERE node:Root OR node in topChildren)

RETURN path

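Without the comments, this is just an 8-line query.

Note that this actually returns multiple paths, one path per row. If you view the graphical results, all of the paths taken together form the visual tree you're after.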
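
For anyone who finds the two phases easier to follow outside Cypher, here is a rough in-memory Python model of the same idea: first gather the top 2 children of every node into one set, then keep only the paths from the root whose nodes all belong to that set. This is a sketch, not what Neo4j executes; the helper names are made up, and ties are broken by lower id, which is an assumption the query itself does not make.

```python
CHILDREN = {
    0: [11, 12, 13, 14],  # 0 plays the role of the :Root node
    11: [111, 112, 113],
    12: [121, 122, 123],
    13: [131, 132, 133, 134],
    111: [1111],
    112: [1121, 1122, 1123],
    122: [1221, 1222, 1223],
}


def kids(node):
    return CHILDREN.get(node, [])


def descendant_count(node):
    return sum(1 + descendant_count(c) for c in kids(node))


def top_children(n=2):
    """Union of the top n children of every parent (flattened, like the query)."""
    allowed = set()
    for parent in CHILDREN:
        # Assumed tie-break: lower id first.
        ranked = sorted(kids(parent), key=lambda c: (-descendant_count(c), c))
        allowed.update(ranked[:n])
    return allowed


def root_paths(root, allowed):
    """Yield each downward path from root whose non-root nodes are all allowed."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        for child in kids(path[-1]):
            if child in allowed:
                longer = path + [child]
                yield longer
                stack.append(longer)


allowed = top_children(2)
paths = sorted(root_paths(0, allowed))
for p in paths:
    print(p)
```

Nodes 131 and 132 end up in `allowed` as the top children of 13, yet no returned path contains them, because 13 itself is not among the root's top 2. That is the same pruning the `all(...)` predicate performs in the query.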

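Getting the results as a tree in JSON

However, if you are not using a visualization tool to view the results graphically, you will probably want a JSON representation of the tree. We can get that by collecting all of the result paths and using a procedure from APOC to produce the JSON tree structure. Here is the query, slightly modified with those changes: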

MATCH (n:Node)-[:CHILD]->(child)
WITH n, child, size((child)-[:CHILD*]->()) as childDescCount
ORDER BY childDescCount DESC
WITH n, collect(child)[..2] as topChildren
WITH apoc.coll.flatten(collect(topChildren)) as topChildren
MATCH path=(:Root)-[:CHILD*]->()
WHERE all(node in nodes(path) WHERE node:Root OR node in topChildren)
// below is the new stuff to get the JSON tree
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) YIELD value as map
RETURN map

The result will look something like this:

{
  "_type": "Node:Root",
  "_id": 52,
  "id": 0,
  "child": [
    {
      "_type": "Node",
      "_id": 1,
      "id": 12,
      "child": [
        {
          "_type": "Node",
          "_id": 6,
          "id": 122,
          "child": [
            {
              "_type": "Node",
              "_id": 32,
              "id": 1223
            },
            {
              "_type": "Node",
              "_id": 31,
              "id": 1222
            }
          ]
        },
        {
          "_type": "Node",
          "_id": 21,
          "id": 123
        }
      ]
    },
    {
      "_type": "Node",
      "_id": 0,
      "id": 11,
      "child": [
        {
          "_type": "Node",
          "_id": 4,
          "id": 111,
          "child": [
            {
              "_type": "Node",
              "_id": 26,
              "id": 1111
            }
          ]
        },
        {
          "_type": "Node",
          "_id": 5,
          "id": 112,
          "child": [
            {
              "_type": "Node",
              "_id": 27,
              "id": 1121
            },
            {
              "_type": "Node",
              "_id": 29,
              "id": 1123
            }
          ]
        }
      ]
    }
  ]
}
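
To show why collecting the paths is enough to rebuild the hierarchy, here is a minimal Python sketch that merges root-anchored paths into one nested structure with a `child` list per node, similar in shape to the output above. It is not APOC's implementation and it omits the `_type` and `_id` fields; `paths_to_tree` is a made-up name, and each path is simplified to a list of node ids.

```python
import json


def paths_to_tree(paths):
    """Merge paths that all start at the same root into one nested dict."""
    nodes = {}          # node id -> its dict in the tree
    seen_edges = set()  # (parent id, child id) pairs already attached
    root = None
    for path in paths:
        for depth, node_id in enumerate(path):
            node = nodes.setdefault(node_id, {"id": node_id})
            if depth == 0:
                root = node
                continue
            edge = (path[depth - 1], node_id)
            if edge not in seen_edges:
                seen_edges.add(edge)
                nodes[path[depth - 1]].setdefault("child", []).append(node)
    return root


example_paths = [[0, 12], [0, 12, 122], [0, 12, 122, 1223], [0, 11]]
nested = paths_to_tree(example_paths)
print(json.dumps(nested, indent=2))
```

Because shared prefixes map to the same dict, overlapping paths collapse into a single branch instead of producing duplicates.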
