Gremlin query for groupcount of last value (answers)

【Question title】: Gremlin query for groupcount of last value
【Posted】: 2023-08-21 16:58:01
【Question】:

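I have a Titan graph with two sets of vertices: one for UserIDs and one for Products. Each edge between a UserID vertex and a Product vertex stores the "Date" on which the product was purchased. I'm looking for a Gremlin query that gives me a group count of UserIDs by the last product each user purchased.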

For example, given data like this:

UserID,Date,Product
A,2016-01-01,'Razor'
A,2016-01-02,'Toothpaste'
B,2016-02-01,'Toothpaste'
B,2016-02-02,'Razor'
C,2016-01-04,'Toothpaste'

I'm looking for output like this:

Product, Count
'Toothpaste',2
'Razor',1

Any help is appreciated.

【Question discussion】:

Tags: titan gremlin

【Solution 1】:

This solution works for your sample data:

g.V().hasLabel('Product').as('p').inE('Purchase').order().by('Date', decr).outV().dedup().select('p').groupCount().by('Name')

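Here is the algorithm:

Start from the products
Traverse the Purchase edges
Order the edges by date, descending
Traverse to the users
Deduplicate the users; because of the ordering, only each user's most recent edge is kept
Go back to the product
Group count by product name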
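To make the ordering/dedup trick concrete, here is a minimal Python sketch of the same steps, assuming the purchases are available as a flat list of (user, date, product) tuples taken from the question's sample data. It is an analogy for what the traversal does, not Gremlin code; the function name last_product_counts_global is hypothetical.

```python
from collections import Counter
from datetime import date

# Sample purchase edges from the question: (user, purchase date, product)
purchases = [
    ("A", date(2016, 1, 1), "Razor"),
    ("A", date(2016, 1, 2), "Toothpaste"),
    ("B", date(2016, 2, 1), "Toothpaste"),
    ("B", date(2016, 2, 2), "Razor"),
    ("C", date(2016, 1, 4), "Toothpaste"),
]


def last_product_counts_global(edges):
    """Hypothetical analogue of order().by('Date', decr) then dedup() on users."""
    # Global barrier: every purchase edge is sorted at once, newest first
    ordered = sorted(edges, key=lambda e: e[1], reverse=True)
    seen_users = set()
    counts = Counter()
    for user, _, product in ordered:
        # dedup(): only the first (i.e. newest) edge per user survives
        if user in seen_users:
            continue
        seen_users.add(user)
        # select('p').groupCount().by('Name')
        counts[product] += 1
    return counts


print(last_product_counts_global(purchases))
```

The sort over all edges is the step that corresponds to the global OrderGlobalStep, which is the part that needs every edge in one place before dedup() can work.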

Here is a Gremlin Console dump showing it in action:

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = graph.addVertex(T.label, 'User', 'UserID', 'A')
==>v[0]
gremlin> b = graph.addVertex(T.label, 'User', 'UserID', 'B')
==>v[2]
gremlin> c = graph.addVertex(T.label, 'User', 'UserID', 'C')
==>v[4]
gremlin> r = graph.addVertex(T.label, 'Product', 'Name', 'Razor')
==>v[6]
gremlin> t = graph.addVertex(T.label, 'Product', 'Name', 'Toothpaste')
==>v[8]
gremlin> a.addEdge('Purchase', r, 'Date', new Date(2016, 0, 1))
==>e[10][0-Purchase->6]
gremlin> a.addEdge('Purchase', t, 'Date', new Date(2016, 0, 2))
==>e[11][0-Purchase->8]
gremlin> b.addEdge('Purchase', t, 'Date', new Date(2016, 1, 1))
==>e[12][2-Purchase->8]
gremlin> b.addEdge('Purchase', r, 'Date', new Date(2016, 1, 2))
==>e[13][2-Purchase->6]
gremlin> c.addEdge('Purchase', t, 'Date', new Date(2016, 0, 4))
==>e[14][4-Purchase->8]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:5 edges:5], standard]
gremlin> g.V().hasLabel('Product').as('p').inE('Purchase').order().by('Date', decr).outV().dedup().select('p').groupCount().by('Name')
==>[Toothpaste:2,Razor:1]

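【Discussion】:

Thanks a lot. This works fine for me in OLTP mode, just as you ran it. But in OLAP mode, i.e. with g = graph.traversal(computer()), I get the error Global traversals on GraphComputer may not contain mid-traversal barriers: OrderGlobalStep([decr(Date)]). I need to run something similar on a very large graph, so any pointers would be great. I'm running the titan-1.0.0-hadoop1 release.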

【Solution 2】:

The following query works in both OLTP and OLAP and does not touch any unnecessary vertices:

g.V().hasLabel("User").
  local(outE("purchased").order().by("date", decr).limit(1)).inV().
  groupCount().by("name")
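The difference from Solution 1 is that local() sorts and limits each user's own edges independently, so no ordering over all edges is needed; as the answer reports, that is what lets it run under GraphComputer. The minimal Python sketch below mimics that per-user shape on the question's sample data, assuming the same flat (user, date, product) tuples as input. It is an analogy rather than Gremlin, the function name last_product_counts_local is hypothetical, and max() stands in for order().by(..., decr).limit(1).

```python
from collections import Counter, defaultdict
from datetime import date

# Sample purchase edges from the question: (user, purchase date, product)
purchases = [
    ("A", date(2016, 1, 1), "Razor"),
    ("A", date(2016, 1, 2), "Toothpaste"),
    ("B", date(2016, 2, 1), "Toothpaste"),
    ("B", date(2016, 2, 2), "Razor"),
    ("C", date(2016, 1, 4), "Toothpaste"),
]


def last_product_counts_local(edges):
    """Hypothetical analogue of local(outE().order().by('date', decr).limit(1)).inV().groupCount()."""
    # Group edges by user: the analogue of each user vertex's out-edges
    by_user = defaultdict(list)
    for user, when, product in edges:
        by_user[user].append((when, product))
    counts = Counter()
    for user_edges in by_user.values():
        # local(... limit(1)): only this user's newest edge is considered
        _, product = max(user_edges, key=lambda e: e[0])
        # inV().groupCount().by("name")
        counts[product] += 1
    return counts


print(last_product_counts_local(purchases))
```

Each user's work depends only on that user's edges, which is the property the local() step relies on.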

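In addition, if you create a vertex-centric index on date, Titan can optimize this query very well, because it can then read each user's most recent purchase edge from the index instead of sorting all of that user's edges.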

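【Discussion】:

Thanks! This works perfectly. I only had to change the property names to match Jason's code. Thanks also for the suggestion about the vertex-centric index; I'll try it on the larger graph I have.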