【Posted】: 2025-11-26 21:05:01
【Question】:
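I am very new to neo4j and the Cypher query language.
My node/relationship data set basically looks like this:
- I have about 27000 User nodes in my database
- I have about 8000 Question nodes in my database
- A Question node can be answered by a User node, so there are relationships like (user)-[:ANSWERED]->(Question)
- Some Question nodes trigger a property for the user, so there are relationships like (user)-[:HAS_PROPERTY]->(Property)
- In addition, some Question nodes require certain properties before they can be answered, so there are relationships like (Question)-[:REQUIRES]->(Property)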
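For illustration, a minimal sketch of this model as sample data. The relationship type CONTAINS and the property name are hypothetical (the queries below match the relationship from :ActiveQuestions without naming its type), and all values are made up:

```cypher
// Illustrative sample data only; not taken from the real database.
// CONTAINS and Property.name are assumed names, not part of the original model.
CREATE (u:User {code: 'xyz'}),
       (aq:ActiveQuestions),
       (q1:Question {priority: 10}),
       (q2:Question {priority: 5}),
       (p:Property {name: 'example'}),
       (aq)-[:CONTAINS]->(q1),
       (aq)-[:CONTAINS]->(q2),
       (u)-[:ANSWERED]->(q1),
       (u)-[:HAS_PROPERTY]->(p),
       (q2)-[:REQUIRES]->(p)
```

In this sample, q1 has already been answered by the user, while q2 is unanswered and requires a property the user holds.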
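My query should find the questions that a specific user has not yet answered, taking the questions' property requirements into account, limited to 50 questions.
After fiddling around for a while, I came up with the following query: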
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
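The query above gives me the expected rows and is quite fast (about 150 ms), which is great.
What I don't understand is this:
When I replace the label lookup in the second line of the query with the user variable, the query becomes much slower, especially for users who have answered many or even all of the questions.
So the following query is much slower: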
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
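Why is that? I really don't understand it. I actually expected that reusing the already matched user as the basis for the second OPTIONAL MATCH would make the query cheaper.
While researching Cypher performance, I read many articles saying that you should avoid OPTIONAL MATCH whenever possible. So my first query looked like this: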
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user)
WITH q, user
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
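Same problem. The query above is much slower than the first one, roughly 20-30 times slower.
Finally, I would like to ask whether I am missing something and whether there is a better way to achieve my goal.
Any help would be greatly appreciated.
Regards,
Alex
Edit
Here are some profiling details:
Using the following query: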
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 26979 total db hits in 169 ms.
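For reference, figures like these come from prefixing a query with PROFILE, which executes it and reports the rows and db hits per operator. A minimal sketch on a shortened version of the query (the count(q) return is only there to keep the example small):

```cypher
// PROFILE runs the query and shows the executed plan with db hits per operator.
// Use EXPLAIN instead to see the plan without executing the query.
PROFILE
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
RETURN count(q)
```

Comparing the per-operator db hits of two variants is usually the quickest way to see which step accounts for the difference.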
Using the query suggested by Michael Hunger:
MATCH (user:User {code: 'abc'})
MATCH (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2622 ms.
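So my current query is faster and more efficient.
What I really don't understand, and the reason I titled this post "Strange neo4j cypher behavior", is the following. When I change the second line of my reasonably fast query from: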
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
to:
OPTIONAL MATCH (user)-[a:ANSWERED]->(q)
which would seem simpler and more logical to me, I get the following:
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
WHERE NOT (user)-[:ANSWERED]->(q)
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(:Property)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = 0 or rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 2337573 total db hits in 2391 ms.
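So I get exactly the same number of db hits as with the slow query mentioned earlier.
Does anyone have an explanation for this?
Also, it makes no difference when I change the first line
from: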
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q:Question)
to:
MATCH (user:User {code: 'xyz'})
MATCH (:ActiveQuestions)-[]->(q:Question)
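So I basically have two questions:
- Why is the query so much slower when I reuse the already defined user node variable (user) than when I use (user:User {code: 'xyz'})?
- In my second line I use the quasi-equivalent of an outer join. Contrary to all the recommendations I came across, this is much faster than using MATCH (q) WHERE NOT (q)<-[:ANSWERED]->(user), which I thought was also performing an outer join, but apparently it is not.
Edit
After some further analysis, I came up with an even cheaper query. See the profiling details below:
Using the following Cypher query: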
MATCH (user:User {code: 'xyz'}), (:ActiveQuestions)-[]->(q)
OPTIONAL MATCH (:User {code: 'xyz'})-[a:ANSWERED]->(q)
WITH q, user
WHERE a IS NULL
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)
WITH q, user, count(r) as rCount
OPTIONAL MATCH (q)-[r:REQUIRES]->(p)<-[h:HAS_PROPERTY]-(user)
WITH q, rCount, count(h) as hCount
WHERE rCount = hCount
RETURN q ORDER BY q.priority DESC LIMIT 50
Cypher version: CYPHER 2.2, planner: COST. 21669 total db hits in 120 ms.
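So I basically got rid of the explicit node labels (:Question) and (:Property) in the example, which sounds logical to me because the explicit label scans are no longer needed. This saved me about 5300 db hits.
Is there anything else in this query that could be tuned?
【Discussion】:
Tags: performance neo4j cypher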
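The last query also drops the rCount = 0 branch of the filter. That should not change the result: for a question with no REQUIRES relationships, the second OPTIONAL MATCH finds nothing, so count(h) is 0 as well and rCount = hCount already holds. A small sketch with hypothetical count pairs (not taken from the real data) to compare the two predicates:

```cypher
// Hypothetical (rCount, hCount) pairs: no requirements, all met, one missing.
UNWIND [[0, 0], [2, 2], [2, 1]] AS pair
WITH pair[0] AS rCount, pair[1] AS hCount
RETURN rCount, hCount,
       (rCount = 0 OR rCount = hCount) AS originalFilter,
       (rCount = hCount) AS simplifiedFilter
```

The two filter columns agree on every row, because hCount can never exceed rCount and is 0 whenever rCount is 0.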