Traverse graph database from random seed nodes: answers

【Question title】: Traverse graph database from random seed nodes
【Posted】: 2019-07-28 23:13:31
【Question description】:

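I have been tasked with writing a query for a front-end application that visualizes a Neptune graph database. Say the first kind of vertex is an ITEM and the second kind is a USER. A user can create an item. There are also item-to-item relationships that show items derived from another item, as in the case of media clips cut from an original media clip. The first set of items created should be attached to a vertex such as a SERVER, which is what they are grouped by in the UI.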

The requirements are as follows:

    Find (Y) seed nodes that are not connected by any ITEM-ITEM relationships on the graph (relationships via USERs etc... are fine)
    Populate the graph with all relationships from these (Y) seed nodes with no limits on the relationships that are followed (relationships through USERs for example is fine).
    Stop populating the graph once the number of nodes (not records limit) hits the limit specified by (X)
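
The three requirements can be sketched outside the database to make the intended behaviour concrete. The following is only an in-memory Python illustration, not a Gremlin query: the vertex and edge data are copied from the sample graph in the question, while the function names `candidate_seeds` and `expand` are hypothetical, and "not connected by any ITEM-ITEM relationships" is read here as "has no incident ITEM-ITEM edge", which is one possible interpretation.

```python
from collections import deque

# Sample graph from the question: vertex id -> label, and (from, label, to) edges.
LABELS = {
    "server1": "SERVER", "server2": "SERVER",
    "item1": "ITEM", "item2": "ITEM", "item3": "ITEM",
    "item4": "ITEM", "item5": "ITEM",
    "user1": "USER",
}
EDGES = [
    ("item1", "STORED IN", "server1"),
    ("item2", "STORED IN", "server2"),
    ("item2", "RELATED TO", "item1"),
    ("item3", "DERIVED FROM", "item2"),
    ("item3", "CREATED BY", "user1"),
    ("user1", "CREATED", "item4"),
    ("item4", "RELATED TO", "item5"),
]


def candidate_seeds(labels, edges):
    """Vertices with no incident ITEM-ITEM edge (assumed reading of requirement 1)."""
    touched = set()
    for src, _, dst in edges:
        if labels[src] == "ITEM" and labels[dst] == "ITEM":
            touched.update((src, dst))
    return sorted(v for v in labels if v not in touched)


def expand(seeds, edges, max_nodes):
    """Breadth-first expansion in both directions, capped at max_nodes vertices."""
    seeds = list(seeds)[:max_nodes]
    seen = set(seeds)
    queue = deque(seeds)
    records = []
    used_edges = set()
    while queue:
        v = queue.popleft()
        for i, (src, label, dst) in enumerate(edges):
            if i in used_edges or v not in (src, dst):
                continue
            other = dst if v == src else src
            if other not in seen:
                if len(seen) >= max_nodes:
                    continue  # node budget exhausted: skip edges to new vertices
                seen.add(other)
                queue.append(other)
            used_edges.add(i)
            records.append({"V1": src, "E": label, "V2": dst})
    return seen, records
```

For example, `expand(["server1", "user1"], EDGES, 4)` stops once four distinct vertices have been collected, regardless of how many edge records that takes. Edges between two already-collected vertices are still recorded, since they add no new nodes.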

Here is a visual representation of the graph.

https://drive.google.com/file/d/1YNzh4wbzcdC0JeloMgD2C0oS6MYvfI4q/view?usp=sharing

Sample code to recreate this graph is below. The real graph can be much deeper; this is just a simple example. Please see the diagram:

g.addV('SERVER').property(id, 'server1')
g.addV('SERVER').property(id, 'server2')
g.addV('ITEM').property(id, 'item1')
g.addV('ITEM').property(id, 'item2')
g.addV('ITEM').property(id, 'item3')
g.addV('ITEM').property(id, 'item4')
g.addV('ITEM').property(id, 'item5')
g.addV('USER').property(id, 'user1')


g.V('item1').addE('STORED IN').to(g.V('server1'))
g.V('item2').addE('STORED IN').to(g.V('server2'))
g.V('item2').addE('RELATED TO').to(g.V('item1'))
g.V('item3').addE('DERIVED FROM').to(g.V('item2'))
g.V('item3').addE('CREATED BY').to(g.V('user1'))
g.V('user1').addE('CREATED').to(g.V('item4'))
g.V('item4').addE('RELATED TO').to(g.V('item5'))

If possible, the result should be in the following form:

[
 [
   {
     "V1": {},
     "E": {},
     "V2": {}
   }
 ]
]

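We have an API with an endpoint that accepts open-ended Gremlin queries. Our client application calls this endpoint to fetch the data that is rendered visually. I have written a query, but I do not think it is quite right. I would also like to know how to cap the number of nodes traversed and stop at X nodes.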

g.V().hasLabel('USER','SERVER').sample(5).aggregate('v1').
  repeat(__.as('V1').bothE().dedup().as('E').otherV().hasLabel('USER','SERVER').as('V2').
      aggregate('x').by(select('V1', 'E', 'V2'))).
    until(out().count().is(0)).
  as('V1').bothE().dedup().as('E').otherV().hasLabel(without('ITEM')).as('V2').
  aggregate('x').
    by(select('V1', 'E', 'V2')).
  cap('v1','x','v1').
  coalesce(select('x').unfold(),select('v1').unfold().project('V1'))

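If possible, I would appreciate a single query that returns this data set. If vertices in the result are not connected to anything, I still want to retrieve them and render them in the UI.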
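
The handling of unconnected vertices can be illustrated on the client side. The sketch below is an assumption about how the records might be post-processed in Python, not part of any query in this thread: the helper name `with_isolated` and the vertex id `lonely1` are hypothetical, and the records are assumed to be dictionaries keyed by `V1`, `E`, and `V2` as in the desired result form.

```python
def with_isolated(seeds, records):
    """Append a V1-only record for every seed that appears in no edge record."""
    connected = set()
    for rec in records:
        connected.add(rec["V1"])
        connected.add(rec["V2"])
    extra = [{"V1": s} for s in seeds if s not in connected]
    return records + extra


# Usage: "lonely1" is a made-up seed with no edges at all.
edge_records = [{"V1": "item1", "E": "STORED IN", "V2": "server1"}]
result = with_isolated(["server1", "lonely1"], edge_records)
```

Here `result` keeps the edge record and gains one extra entry containing only `V1` for `lonely1`, so the UI still has something to draw for a seed without relationships.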

【Question discussion】:

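@daniel-kuppitz, could you help with this? I would also really appreciate a variant of the query that limits the number of records. Thank you very much.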

Tags: graph-databases gremlin amazon-neptune

【Solution 1】:

I took another look at this and came up with the following query:

g.V().hasLabel(without('ITEM')).sample(2).aggregate('v1').
  repeat(__.as('V1').bothE().dedup().as('E').otherV().as('V2').
      aggregate('x').by(select('V1', 'E', 'V2'))).
    until(out().count().is(0)).
  as('V1').bothE().dedup().as('E').otherV().as('V2').
  aggregate('x').
    by(select('V1', 'E', 'V2')).
  cap('v1','x','v1').
  coalesce(select('x').unfold(),select('v1').unfold().project('V1')).limit(5)

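To satisfy the requirement of limiting by node count rather than record count, I can pass half of the node count supplied by the user to limit(), and then leave the edge E and vertex V2 of the last record out of what is rendered in the UI.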
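
Halving the limit assumes every record contributes two new vertices, which does not hold once records share vertices. As an alternative sketch, the records could be trimmed on the client by counting distinct vertices directly. This is plain Python over already-fetched records, not a Gremlin step; the helper name `take_until_node_limit` is hypothetical, and the record shape (`V1`, `E`, `V2` keys, with `V2` optional for unconnected vertices) is assumed from the desired result form.

```python
def take_until_node_limit(records, max_nodes):
    """Keep records in order until the next one would exceed max_nodes distinct vertices."""
    seen = set()
    kept = []
    for rec in records:
        # A record may carry only V1 when the vertex has no relationships.
        vertices = {v for v in (rec.get("V1"), rec.get("V2")) if v is not None}
        new = vertices - seen
        if len(seen) + len(new) > max_nodes:
            break
        seen |= new
        kept.append(rec)
    return kept, seen


# Usage with records that share vertices, taken from the sample graph.
sample_records = [
    {"V1": "item1", "E": "STORED IN", "V2": "server1"},
    {"V1": "item2", "E": "RELATED TO", "V2": "item1"},
    {"V1": "item3", "E": "DERIVED FROM", "V2": "item2"},
    {"V1": "item3", "E": "CREATED BY", "V2": "user1"},
]
kept, nodes = take_until_node_limit(sample_records, 4)
```

With a node limit of 4, this keeps the first three records, which together contain exactly four distinct vertices, whereas a record limit of 2 (half of 4) would yield only three. The trade-off is that the query may fetch more records than are finally rendered.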

I am open to any suggestions for a better way to handle this.

【Discussion】:

@daniel-kuppitz, could you offer any comments or suggestions on the solution above? Thank you very much.