【Question title】: SQL counting likes/dislikes for a recommendation system, collaborative filtering (user-based)
【Posted】: 2019-05-06 16:47:57
【Question】:

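The idea is that users leave likes or dislikes on different items, and I need to count how their ratings match those of a selected user (USER_ID = 1) to determine how similar they are.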

RATING Column:
1 = like,
0 = dislike

Full table:

+---------+---------+--------+
| USER_ID | ITEM_ID | RATING |
+---------+---------+--------+
|       1 |       1 |      1 |
|       1 |       2 |      1 |
|       1 |       3 |      1 |
|       1 |       4 |      0 |
|       1 |       5 |      0 |
|       3 |       1 |      1 |
|       3 |       2 |      0 |
|       3 |       3 |      0 |
|       3 |       4 |      0 |
|       3 |       5 |      0 |
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+

Example: user_1's vector of ratings is [1,1,1,0,0]. user_3 gives the same rating as user_1 on items 1, 4 and 5 (ratings [1,0,0]), so total_match with user_1 = 3.

Match calculation:

user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3
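The match counts above are plain arithmetic over two rating vectors. Below is a minimal Python sketch of that calculation, assuming (as in the sample data) that every user rated every item and that each vector lists the items in the same order; the names `ratings` and `match_counts` are illustrative only and do not come from the question.

```python
# Sample ratings from the question, one vector per user, items 1..5 in order.
# Assumption: every user has rated every item (true for the sample data).
ratings = {
    1: [1, 1, 1, 0, 0],
    3: [1, 0, 0, 0, 0],
    4: [1, 1, 1, 0, 0],
}


def match_counts(target, other):
    """Return (likes_match, dislikes_match, total_match) for two rating vectors."""
    pairs = list(zip(target, other))
    likes_match = sum(1 for a, b in pairs if a == b == 1)     # both liked
    dislikes_match = sum(1 for a, b in pairs if a == b == 0)  # both disliked
    return likes_match, dislikes_match, likes_match + dislikes_match


# Compare every other user against user 1.
results = {
    user: match_counts(ratings[1], vector)
    for user, vector in ratings.items()
    if user != 1
}
print(results)  # {3: (1, 2, 3), 4: (3, 2, 5)}
```

This reproduces the expected result table: user 3 matches on 1 like and 2 dislikes, user 4 on all 5 items.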

How can I write an SQL query that returns the following result:

+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
|       3 |           1 |              2 |           3 |
|       4 |           3 |              2 |           5 |
+---------+-------------+----------------+-------------+

Any ideas?

【Question comments】:

  • Are you familiar with the concept of a self-join?
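A self-join joins a table to itself under two aliases. The following is a minimal illustration using Python's built-in `sqlite3` module and an in-memory copy of the sample data; the table name `ratings` and the aliases `me` and `other` are assumptions chosen for the example. It lists the ungrouped rows on which another user agrees with user 1, before any counting is done.

```python
import sqlite3

# In-memory copy of the question's sample data (table name "ratings" is assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
     (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
     (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0)],
)

# Self-join: "me" is restricted to user 1, "other" to everyone else.
# A row survives only when both rated the same item the same way.
matched = conn.execute(
    """
    SELECT other.user_id, other.item_id, other.rating
    FROM ratings AS other
    JOIN ratings AS me
      ON me.user_id = 1
     AND me.item_id = other.item_id
     AND me.rating = other.rating
    WHERE other.user_id <> 1
    ORDER BY other.user_id, other.item_id
    """
).fetchall()

for row in matched:
    print(row)  # (user_id, item_id, rating) of each agreeing rating
```

Counting these rows per `user_id` gives total_match; counting the rows with rating 1 and rating 0 separately gives likes_match and dislikes_match.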

Tags: sql vector recommendation-engine collaborative-filtering


【Solution 1】:

(This uses SQLite, but it should need few, if any, changes to work on other databases):

Given the following table:

CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
                   , PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
INSERT INTO ratings VALUES(1,1,1);
INSERT INTO ratings VALUES(1,2,1);
INSERT INTO ratings VALUES(1,3,1);
INSERT INTO ratings VALUES(1,4,0);
INSERT INTO ratings VALUES(1,5,0);
INSERT INTO ratings VALUES(3,1,1);
INSERT INTO ratings VALUES(3,2,0);
INSERT INTO ratings VALUES(3,3,0);
INSERT INTO ratings VALUES(3,4,0);
INSERT INTO ratings VALUES(3,5,0);
INSERT INTO ratings VALUES(4,1,1);
INSERT INTO ratings VALUES(4,2,1);
INSERT INTO ratings VALUES(4,3,1);
INSERT INTO ratings VALUES(4,4,0);
INSERT INTO ratings VALUES(4,5,0);

this query:

SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = 1
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> 1
GROUP BY r1.user_id
ORDER BY r1.user_id;

produces:

user_id     likes_match  dislikes_match  total_match
----------  -----------  --------------  -----------
3           1            2               3          
4           3            2               5      
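The query above hard-codes user 1. Here is a minimal sketch, assuming Python's standard `sqlite3` module, that runs the same query with the target user bound as a named parameter (`:target`) so that any user can be compared; the helper name `rating_matches` is hypothetical, and `WITHOUT ROWID` is left out because it does not affect the result.

```python
import sqlite3

# Solution 1's query; the hard-coded user 1 is replaced by the named
# parameter :target (an adaptation, not part of the original answer).
QUERY = """
SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = :target
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> :target
GROUP BY r1.user_id
ORDER BY r1.user_id
"""

SAMPLE = [(1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
          (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
          (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0)]


def rating_matches(conn, target_user):
    """Hypothetical helper: (user_id, likes, dislikes, total) rows vs. target_user."""
    return conn.execute(QUERY, {"target": target_user}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER, "
             "PRIMARY KEY(user_id, item_id))")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", SAMPLE)

print(rating_matches(conn, 1))  # [(3, 1, 2, 3), (4, 3, 2, 5)]
print(rating_matches(conn, 3))  # [(1, 1, 2, 3), (4, 1, 2, 3)]
```

Because this is an inner join, a user who agrees with the target on no item at all is absent from the result rather than listed with zeros.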

【Comments】:

    【Solution 2】:

    You may need several nested subqueries to get the desired result; see the code below:

    -- "have" is the ratings table (user_id, item_id, rating)
    select res1.user_id,
           sum(res1.likes_match1) as likes_match,
           sum(res1.dislikes_match1) as dislikes_match,
           sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
      from (
            -- middle level: per user, count matched likes and matched dislikes separately
            select res.user_id,
                   case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
                   case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
              from (
                    -- inner level: one row per item on which user b agrees with user 1;
                    -- rating = 1 if both liked the item, 0 if both disliked it
                    select b.user_id as user_id,
                           case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
                      from have a
                     inner join have b
                        on a.item_id = b.item_id
                       and a.user_id = 1
                       and b.user_id <> 1
                       and a.rating = b.rating
                   ) as res
             group by res.user_id, res.rating
           ) as res1
     group by res1.user_id
    ;
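To check that Solution 2 returns the same numbers as Solution 1, here is a minimal sketch that runs it against the sample data with Python's `sqlite3` module. It keeps the answer's table name `have`; the only change to the query is an added `ORDER BY` so that the row order is deterministic.

```python
import sqlite3

# Solution 2's query, unchanged except for the final ORDER BY.
SOLUTION_2 = """
select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
  from (select res.user_id,
               case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
               case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
          from (select b.user_id as user_id,
                       case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
                  from have a
                 inner join have b
                    on a.item_id = b.item_id
                   and a.user_id = 1
                   and b.user_id <> 1
                   and a.rating = b.rating) as res
         group by res.user_id, res.rating) as res1
 group by res1.user_id
 order by res1.user_id
"""

# Load the question's sample data into a table named "have".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have(user_id INTEGER, item_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
     (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
     (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0)],
)

rows = conn.execute(SOLUTION_2).fetchall()
print(rows)  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```

Both answers give the same result; Solution 1 reaches it with a single level of grouping, which is shorter and easier to maintain.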
    

    【Comments】:
