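【Posted】: 2020-10-11 07:39:29
【Question】:
I have a database of people with a jsonb column interests. In my application, users can search for people by providing their hobbies, which come from a set of predefined values. I want to return the best matches, and to do that I want to score each match as the intersection over the union of the interests. That way, the top results won't simply be the people in my database who list a lot of hobbies.
Example:
Database records: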
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["knitting"]
User input in the app:
["reading", "swimming", "knitting", "cars"]
My script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
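To make the expected scores above concrete, here is a minimal Python sketch (not the poster's code) that treats the input and each interests array as sets and divides the size of their intersection by the size of their union. `PEOPLE`, `query` and `jaccard` are hypothetical names used only for illustration; running it reproduces the four numbers listed above.

```python
# Hypothetical in-memory copy of the example table from the question.
PEOPLE = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}


def jaccard(query, interests):
    """Intersection size divided by union size (Jaccard index) of two lists."""
    q, r = set(query), set(interests)
    union = q | r
    # Guard against two empty lists, where the union is empty.
    return len(q & r) / len(union) if union else 0.0


query = ["reading", "swimming", "knitting", "cars"]
for name, interests in PEOPLE.items():
    # Rounded to 5 places to match the listing in the question.
    print(name, round(jaccard(query, interests), 5))
```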
Right now I'm using
SELECT name
FROM people
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
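But this also returns people who have a lot of interests, and there is no way to order the results. Is there any way to do this in reasonable time, say at most 5 seconds on a database with about 400,000 records?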
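A rough Python model of what the query above does may help show the problem: `interests @> ANY(...)` is a yes/no test per row (does the array contain at least one of the input values?), so every person who shares a single hobby passes and there is no score left to sort by. The `PEOPLE` dict and the `any_contained` helper are hypothetical stand-ins for the table and the operator, assuming the interests arrays hold plain strings.

```python
# Hypothetical in-memory copy of the example table from the question.
PEOPLE = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}


def any_contained(interests, wanted):
    """Model of `interests @> ANY(wanted)`: True if any wanted value is in the array."""
    return any(w in interests for w in wanted)


wanted = ["reading", "swimming", "knitting", "cars"]
# The filter yields only True/False per person, so there is nothing to rank by.
matches = [name for name, interests in PEOPLE.items() if any_contained(interests, wanted)]
print(matches)  # ['Mary', 'John', 'Ann', 'Carl']
```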
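EDIT: I added another example to clarify my calculation. The calculation has to push down people who list a lot of hobbies, so the match should be computed as Intersection(input, db_record)/Union(input, db_record).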
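Written out, the score described in the edit is the Jaccard index of the input set I and a record's interest set R. By inclusion-exclusion the union size can be computed from the two array lengths and the overlap, which is the form the update query further down uses (with |I| = 4). This assumes neither array contains duplicate values, since jsonb_array_length counts duplicates.

```latex
\[
\operatorname{match}(I, R) = \frac{|I \cap R|}{|I \cup R|}
                           = \frac{|I \cap R|}{|I| + |R| - |I \cap R|}
\]
For Mary in the first example, $|I| = 4$, $|R| = 3$ and $|I \cap R| = 2$, so
\[
\operatorname{match}(I, R) = \frac{2}{4 + 3 - 2} = \frac{2}{5} = 0.4 .
\]
```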
Example:
Input = ["reading"]
Database records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["reading"]
Mary's match would be computed as LENGTH(["reading"]) / LENGTH(["swimming","reading","jogging"]), i.e. 0.3333
For Carl it is LENGTH(["reading"]) / LENGTH(["reading"]), i.e. 1
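Applying the same intersection-over-union formula to all four records of the second example (input I = {reading}) gives the values below; the John and Ann scores are not stated in the question but follow directly from the definition.

```latex
\[
\begin{aligned}
\text{Mary}: &\quad \frac{1}{1 + 3 - 1} = \frac{1}{3} \approx 0.3333 \\
\text{John}: &\quad \frac{1}{1 + 2 - 1} = \frac{1}{2} = 0.5 \\
\text{Ann}:  &\quad \frac{0}{1 + 3 - 0} = 0 \\
\text{Carl}: &\quad \frac{1}{1 + 1 - 1} = 1
\end{aligned}
\]
```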
UPDATE: I managed to do it with
SELECT result.id, result.name,
       -- overlap / (|record| + |input| - overlap); 4 is the number of input hobbies
       result.overlap_count/(jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal as score
FROM (SELECT t1.name as name, t1.id, COUNT(t1.name) as overlap_count
      -- one row per (person, interest element)
      FROM (SELECT name, id, jsonb_array_elements(interests)
            FROM persons) as t1
      -- keep only the elements that also appear in the input
      JOIN (SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb as elements) as t2 ON t1.jsonb_array_elements = t2.elements
      GROUP BY t1.name, t1.id) as result
JOIN persons ON result.id = persons.id ORDER BY score desc
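The update query works in three steps: explode each person's interests with jsonb_array_elements, inner-join those elements against the input values and count the matches per person, then divide that count by the union size len(record) + 4 - overlap, where 4 is the length of the input array. The sketch below mirrors those steps in Python so the arithmetic is easy to check; `score_like_query` is a hypothetical name, the arrays are assumed to be duplicate-free, and, like the SQL inner join, it drops people with zero overlap.

```python
from collections import Counter

# Hypothetical in-memory copy of the example table from the question.
PEOPLE = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}


def score_like_query(people, query):
    """Mimic the SQL: explode, join on the input, count, divide by union size, sort."""
    wanted = set(query)
    # Explode each interests array and keep only elements present in the input;
    # counting per name plays the role of GROUP BY ... COUNT(*).
    overlap = Counter(
        name
        for name, interests in people.items()
        for item in interests
        if item in wanted
    )
    n_input = len(wanted)  # the literal 4 in the SQL
    scores = {
        name: cnt / (len(people[name]) + n_input - cnt)
        for name, cnt in overlap.items()
    }
    # ORDER BY score DESC
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(score_like_query(PEOPLE, ["reading", "swimming", "knitting", "cars"]))
# Mary, Carl, John, Ann in that order
```

Because the literal 4 has to equal the number of input hobbies, a version of the SQL that takes the input as a parameter would need to derive that number from the input array instead of hardcoding it.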
Here is my fiddle: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
But it isn't fast enough, so I'd appreciate any improvements.
【Discussion】:
Tags: sql arrays postgresql union jsonb