SQL: counting likes and dislikes for a recommendation system, user-based collaborative filtering (answers)

【Question title】: SQL counting likes-dislikes for recommendation system, collaborative filtering User-Based
【Posted】: 2019-05-06 16:47:57
【Question description】:

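The idea is that users leave likes and dislikes on different items. I need to compare every other user's ratings with those of a selected user (USER_ID = 1) to determine how similar they are.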

RATING Column:
1 = like,
0 = dislike

Full table:

+---------+---------+--------+--------------------------------------------------+
| USER_ID | ITEM_ID | RATING |                      -EXAMPLE-                   |
+---------+---------+--------+--------------------------------------------------+
|       1 |       1 |      1 |-+
|       1 |       2 |      1 | |
|       1 |       3 |      1 | +-[1,1,1,0,0] user_1 vector of ratings
|       1 |       4 |      0 | |  |     | | 
|       1 |       5 |      0 |-+  |     | |     
|       3 |       1 |      1 |----+     + + total_match with user_1 = 3 [1,0,0]
|       3 |       2 |      0 |          | |        
|       3 |       3 |      0 |          | |       
|       3 |       4 |      0 |----------+ |
|       3 |       5 |      0 |------------+
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+

Match calculation:

user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3
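
The counting rule can be sketched outside SQL first. This is a minimal Python sketch, assuming each user's ratings are held in a dict keyed by item id; the `match_counts` helper and the variable names are illustrative and not part of the original question:

```python
def match_counts(reference, other):
    """Count items on which two users gave the same rating.

    Both arguments map item_id -> rating (1 = like, 0 = dislike).
    Items missing from `other` never count as a match.
    """
    likes = dislikes = 0
    for item_id, rating in reference.items():
        if other.get(item_id) == rating:
            if rating == 1:
                likes += 1
            else:
                dislikes += 1
    return likes, dislikes, likes + dislikes


# Rating vectors taken from the full table.
user_1 = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}
user_3 = {1: 1, 2: 0, 3: 0, 4: 0, 5: 0}
user_4 = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}

print(match_counts(user_1, user_3))  # (1, 2, 3)
print(match_counts(user_1, user_4))  # (3, 2, 5)
```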

How can I write an SQL query that produces the following result?

+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
|       3 |           1 |              2 |           3 |
|       4 |           3 |              2 |           5 |
+---------+-------------+----------------+-------------+

Any ideas?

【Question discussion】:

Are you familiar with the concept of a self-join?

Tags: sql vector recommendation-engine collaborative-filtering

【Solution 1】:

(This uses SQLite, but it should need little, if any, change to work on other databases.)

Given the following table:

CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
                   , PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
INSERT INTO ratings VALUES(1,1,1);
INSERT INTO ratings VALUES(1,2,1);
INSERT INTO ratings VALUES(1,3,1);
INSERT INTO ratings VALUES(1,4,0);
INSERT INTO ratings VALUES(1,5,0);
INSERT INTO ratings VALUES(3,1,1);
INSERT INTO ratings VALUES(3,2,0);
INSERT INTO ratings VALUES(3,3,0);
INSERT INTO ratings VALUES(3,4,0);
INSERT INTO ratings VALUES(3,5,0);
INSERT INTO ratings VALUES(4,1,1);
INSERT INTO ratings VALUES(4,2,1);
INSERT INTO ratings VALUES(4,3,1);
INSERT INTO ratings VALUES(4,4,0);
INSERT INTO ratings VALUES(4,5,0);

this query:

SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = 1
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> 1
GROUP BY r1.user_id
ORDER BY r1.user_id;

produces:

user_id     likes_match  dislikes_match  total_match
----------  -----------  --------------  -----------
3           1            2               3          
4           3            2               5
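
To check the query end to end, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The `similar_users` function and the `:ref` parameter for the reference user are my additions, not part of the answer, and the table is created without `WITHOUT ROWID` for brevity:

```python
import sqlite3

# Sample rows from the question: (user_id, item_id, rating).
ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]

# Same self-join as the answer, with the hard-coded 1 replaced by :ref.
QUERY = """
SELECT r1.user_id,
       sum(r1.rating) AS likes_match,
       sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match,
       count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = :ref
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> :ref
GROUP BY r1.user_id
ORDER BY r1.user_id
"""


def similar_users(conn, reference_user):
    """Return (user_id, likes_match, dislikes_match, total_match) rows."""
    return conn.execute(QUERY, {"ref": reference_user}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER,"
    " PRIMARY KEY(user_id, item_id))"
)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ROWS)

print(similar_users(conn, 1))  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```

Because this is an inner join on equal ratings, a user who shares no matching rating with the reference user does not appear in the result at all, rather than appearing with zeros.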

【Discussion】:

【Solution 2】:

You may need several subqueries to get the desired result; see the code below:

-- `have` is assumed to be the ratings table (user_id, item_id, rating)
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
from (
    -- per user and match type: count shared likes and shared dislikes
    select res.user_id,
           case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
           case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
    from (
        -- one row per item on which user 1 and another user gave the same rating;
        -- rating = 1 marks a shared like, rating = 0 a shared dislike
        select b.user_id as user_id,
               case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
        from have a
        inner join have b
           on a.item_id = b.item_id
          and a.user_id = 1
          and b.user_id <> 1
          and a.rating = b.rating
    ) as res
    group by res.user_id, res.rating
) as res1
group by res1.user_id
;
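
This answer does not show its output. Below is a minimal sketch for checking it, assuming `have` is a table with the same three columns as in the question, loaded into an in-memory SQLite database through Python's standard `sqlite3` module. The `USERS` dict and the `run_nested_query` helper are illustrative names, and I have only reasoned about this on SQLite, not on other engines:

```python
import sqlite3

# Rating vectors from the question, items 1..5 in order.
USERS = {1: [1, 1, 1, 0, 0], 3: [1, 0, 0, 0, 0], 4: [1, 1, 1, 0, 0]}
ROWS = [(u, i, r) for u, vec in USERS.items() for i, r in enumerate(vec, start=1)]

# The nested query from this answer, whitespace condensed only.
NESTED_QUERY = """
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
from (
    select res.user_id,
           case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
           case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
    from (
        select b.user_id as user_id,
               case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
        from have a
        inner join have b
           on a.item_id = b.item_id
          and a.user_id = 1
          and b.user_id <> 1
          and a.rating = b.rating
    ) as res
    group by res.user_id, res.rating
) as res1
group by res1.user_id
"""


def run_nested_query(rows):
    """Load rows into a fresh `have` table and run the nested query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have(user_id INTEGER, item_id INTEGER, rating INTEGER)")
    conn.executemany("INSERT INTO have VALUES (?, ?, ?)", rows)
    # The query has no ORDER BY, so sort here for a stable comparison.
    return sorted(conn.execute(NESTED_QUERY).fetchall())


print(run_nested_query(ROWS))  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```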

【Discussion】: