【Title】: MySQL inclusion/exclusion of posts
【Posted】: 2010-10-11 23:24:56
【Question】:

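This post took a long time to write because I tried to be as clear as possible, so please bear with me if it is still unclear.

Basically, I have a posts table in the database, and users can attach privacy settings to each post.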

ID | owner_id | post | other_info | privacy_level (int value)

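From there, users can set their privacy details: viewable by everyone (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For the privacy level that names specific people (4), the query references the table "post_privacy_includes_for" in a subquery to check whether the user (or a filter the user belongs to) appears in a row of that table.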

ID | post_id | user_id | list_id

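In addition, users can stop certain people from seeing their posts within a larger group by excluding them (for example, make a post viewable by everyone but hide it from a stalker). For this there is a second reference table, "post_privacy_exclude_from", which has the same layout as "post_privacy_includes_for".

My problem is that this does not scale. At all. Right now there are about 1-2 million posts, most of them set to be viewable by everyone. For every post on a page, the query has to check whether a row excludes that post from being shown to the user, and this is very slow on a page that can hold 100-200 posts. It can take 2-4 seconds, especially once additional constraints are added to the query.

It also produces very large, complicated queries that are just... awkward.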

SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
         AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             ( SELECT i.id
              FROM PostPrivacyIncludeFor i
              WHERE i.user_id = ?
                AND i.thought_id = t.id)
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             (SELECT i2.id
              FROM PostPrivacyIncludeFor i2
              WHERE i2.thought_id = t.id
                AND EXISTS
                  (SELECT r.id
                   FROM FriendFilterIds r
                   WHERE r.list_id = i2.list_id
                     AND r.friend_id = ?))
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 1
           AND EXISTS
             (SELECT G.id
              FROM Following G
              WHERE follower_id = t.owner_id
                AND following_id = ?
                AND friend = 1)
           OR t.privacy_level = 1
           AND t.owner_id = ?)
       OR (NOT EXISTS
             (SELECT e.id
              FROM PostPrivacyExcludeFrom e
              WHERE e.thought_id = t.id
                AND e.user_id = ?
                AND NOT EXISTS
                  (SELECT e2.id
                   FROM PostPrivacyExcludeFrom e2
                   WHERE e2.thought_id = t.id
                     AND EXISTS
                       (SELECT l.id
                        FROM FriendFilterIds l
                        WHERE l.list_id = e2.list_id
                          AND l.friend_id = ?)))
           AND t.privacy_level IN (0, 1, 4))
  AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100

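(A mock query, similar to the one I currently use in Doctrine ORM. It is a mess, but you get the idea.)

I guess my question is: how would you approach this to optimize it? Is there a better way to set up my database? I am willing to throw out my current approach entirely, but I do not know what to replace it with.

Thanks, everyone.

Update: fixed the query to reflect the values I defined for the privacy levels above (I forgot to update it when I simplified the values).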

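【Comments】:

  • You should probably add some line breaks and indentation to the query, because it is very hard to read.
  • What does privacy_level = 7 mean?
  • Sorry, I have updated the query to match the values in the example (the privacy values are different in the real application).

Tags: php mysql optimization doctrine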


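【Answer 1】:

Your query is too long for me to give a definitive solution, but the approach I would take is to reduce the subqueries to plain data lookups by converting them to joins, and then build the logic into the WHERE clause and the column list of the SELECT statement: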

select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on G.follower_id = t.owner_id and G.following_id = ? and G.friend = 1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ? 

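(This probably needs expanding: I could not follow the logic of the last clause.)

If you can get this simple select to run fast and return all the information you need, then all that is left is to build up the logic in the select list and the WHERE clause.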
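
As a rough sketch of that last step, the visibility rules might be folded into the WHERE clause along these lines. It assumes every ? is bound to the viewing user's id and that each join matches at most one row per post (otherwise posts would come back duplicated). It also leaves out the list-based filters and exclusions (FriendFilterIds), and the rule that owners always see their own posts is an assumption rather than something stated in the question.

```sql
-- Sketch only: every ? is the viewing user's id.
select t.*
from posts t
left join PostPrivacyIncludeFor i
       on i.thought_id = t.id and i.user_id = ?
left join Following G
       on G.follower_id = t.owner_id and G.following_id = ? and G.friend = 1
left join PostPrivacyExcludeFrom e
       on e.thought_id = t.id and e.user_id = ?
where t.owner_id = ?                                         -- own posts (assumed always visible)
   or (e.id is null                                          -- viewer is not excluded
       and (t.privacy_level = 0                              -- everyone
            or (t.privacy_level = 1 and G.id is not null)    -- friends
            or (t.privacy_level = 4 and i.id is not null)))  -- named people
order by t.created_at
limit 100;
```

Running EXPLAIN on a query like this shows whether each join is resolved through an index or falls back to a scan.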

【Comments】:

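    【Answer 2】:

    Here is a quick simplification that does not re-engineer your original design too much.

    With this solution, your web page can simply call the following stored procedure to get the filtered list of posts for a given user over a specified period.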

    call list_user_filtered_posts( <user_id>, <day_interval> );
    

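    The full script can be found here: http://pastie.org/1212812

    I have not tested all of this thoroughly, and you may find that this solution does not perform well enough for your needs, but it may help you fine-tune or modify your existing design.

    Tables

    I removed your post_privacy_exclude_from table and added a user_stalkers table, which works much like the inverse of user_friends. I kept the original post_privacy_includes_for table as per your design, since it lets a user restrict a specific post to a subset of people.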

    drop table if exists users;
    create table users
    (
    user_id int unsigned not null auto_increment primary key,
    username varbinary(32) unique not null
    )
    engine=innodb;
    
    
    drop table if exists user_friends;
    create table user_friends
    (
    user_id int unsigned not null,
    friend_user_id int unsigned not null,
    primary key (user_id, friend_user_id)
    )
    engine=innodb;
    
    
    drop table if exists user_stalkers;
    create table user_stalkers
    (
    user_id int unsigned not null,
    stalker_user_id int unsigned not null,
    primary key (user_id, stalker_user_id)
    )
    engine=innodb;
    
    
    drop table if exists posts;
    create table posts
    (
    post_id int unsigned not null auto_increment primary key,
    user_id int unsigned not null,
    privacy_level tinyint unsigned not null default 0,
    post_date datetime not null,
    key user_idx(user_id),
    key post_date_user_idx(post_date, user_id)
    )
    engine=innodb;
    
    
    drop table if exists post_privacy_includes_for;
    create table post_privacy_includes_for
    (
    post_id int unsigned not null,
    user_id int unsigned not null,
    primary key (post_id, user_id)
    )
    engine=innodb;
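
    To illustrate how user_stalkers stands in for the old exclusion table, here is a small sketch. The meaning of the columns is inferred from the way the stored procedure joins the table (user_id is the author, stalker_user_id is the viewer being blocked), and the ids 4 and 1 are example values only. The statements run inside a transaction that is rolled back, so the table is left unchanged.

    ```sql
    start transaction;

    -- author 4 hides all of their posts from viewer 1
    insert ignore into user_stalkers (user_id, stalker_user_id) values (4, 1);

    -- authors whose posts must be hidden from viewer 1
    select us.user_id
    from user_stalkers us
    where us.stalker_user_id = 1;

    rollback;
    ```

    Because one row blocks a viewer from all of an author's posts, this design trades the per-post exclusions of the original schema for a much smaller lookup table.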
    

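    Stored procedure

    The stored procedure is relatively simple: it first selects all posts in the specified period and then filters posts out according to your original requirements. I have not performance-tested this stored procedure with large volumes, but since the initial selection is relatively small it should perform well enough, and it should also simplify your application/middle-tier code.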

    drop procedure if exists list_user_filtered_posts;
    
    delimiter #
    
    create procedure list_user_filtered_posts
    (
    in p_user_id int unsigned,
    in p_day_interval tinyint unsigned
    )
    proc_main:begin
    
     drop temporary table if exists tmp_posts;
     drop temporary table if exists tmp_priv_posts;
    
     -- select ALL posts in the required date range (or whatever selection criteria you require)
    
     create temporary table tmp_posts engine=memory 
     select 
      p.post_id, p.user_id, p.privacy_level, 0 as deleted 
     from 
      posts p
     where
      p.post_date between now() - interval p_day_interval day and now()  
     order by 
      p.user_id;
    
     -- purge stalker posts (0,1,3,4)
    
     update tmp_posts 
     inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
     set
      tmp_posts.deleted = 1
     where
      tmp_posts.user_id != p_user_id;
    
     -- purge other users private posts (3)
    
     update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;
    
     -- purge friend only posts (1) i.e where p_user_id is not a friend of the poster
    
     /*
      requires another temp table due to mysql temp table problem/bug
      http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
     */
    
     -- the private posts (1) this user can see
    
     create temporary table tmp_priv_posts engine=memory 
     select
      tp.post_id
     from
      tmp_posts tp
     inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
     where
      tp.user_id != p_user_id and tp.privacy_level = 1;
    
     -- remove private posts this user cant see
    
     update tmp_posts 
     left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
     set 
      tmp_posts.deleted = 1
     where 
      tpp.post_id is null and tmp_posts.privacy_level = 1
      and tmp_posts.user_id != p_user_id; -- keep the caller's own friends-only posts
    
     -- purge filtered (4)
    
     truncate table tmp_priv_posts; -- reuse tmp table
    
     insert into tmp_priv_posts
     select
      tp.post_id
     from
      tmp_posts tp
     inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
     where
      tp.user_id != p_user_id and tp.privacy_level = 4;
    
     -- remove private posts this user cant see
    
     update tmp_posts 
     left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
     set 
      tmp_posts.deleted = 1
     where 
      tpp.post_id is null and tmp_posts.privacy_level = 4
      and tmp_posts.user_id != p_user_id; -- keep the caller's own filtered posts
    
     drop temporary table if exists tmp_priv_posts;
    
     -- output filtered posts (display ALL of these on web page)
    
     select 
      p.* 
     from 
      posts p
     inner join tmp_posts tp on p.post_id = tp.post_id
     where
      tp.deleted = 0
     order by
      p.post_id desc;
    
     -- clean up
    
     drop temporary table if exists tmp_posts;
    
    end proc_main #
    
    delimiter ;
    

    Test data

    Some basic test data.

    insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');
    
    insert into user_friends values 
    (1,2),(1,3),(1,5),
    (2,1),(2,3),(2,4),
    (3,1),(3,2),
    (4,5),
    (5,1),(5,4);
    
    insert into user_stalkers values (4,1);
    
    insert into posts (user_id, privacy_level, post_date) values
    
    -- public (0)
    
    (1,0,now() - interval 8 day),
    (1,0,now() - interval 8 day),
    (2,0,now() - interval 7 day),
    (2,0,now() - interval 7 day),
    (3,0,now() - interval 6 day),
    (4,0,now() - interval 6 day),
    (5,0,now() - interval 5 day),
    
    -- friends only (1)
    
    (1,1,now() - interval 5 day),
    (2,1,now() - interval 4 day),
    (4,1,now() - interval 4 day),
    (5,1,now() - interval 3 day),
    
    -- private (3)
    
    (1,3,now() - interval 3 day),
    (2,3,now() - interval 2 day),
    (4,3,now() - interval 2 day),
    
    -- filtered (4)
    
    (1,4,now() - interval 1 day),
    (4,4,now() - interval 1 day),
    (5,4,now());
    
    insert into post_privacy_includes_for values (15,4), (16,1), (17,6);
    

    Testing

    As I mentioned before, I have not tested this thoroughly, but on the face of it, it seems to work.

    select * from posts;
    
    call list_user_filtered_posts(1,14);
    call list_user_filtered_posts(6,14);
    
    call list_user_filtered_posts(1,7);
    call list_user_filtered_posts(6,7);
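
    For a cross-check, the same visibility rules can be written as a single query and compared with the procedure's output. This is only a sketch: the viewer id (6) and the 14-day window are hard-coded to mirror the call list_user_filtered_posts(6,14) above, and it assumes that authors always see their own posts and that only the privacy levels 0, 1, 3 and 4 are in use.

    ```sql
    select p.post_id
    from posts p
    where p.post_date between now() - interval 14 day and now()
      and (p.user_id = 6                          -- viewer's own posts
        or (not exists (select 1                  -- viewer is not blocked by the author
                        from user_stalkers us
                        where us.user_id = p.user_id
                          and us.stalker_user_id = 6)
            and (p.privacy_level = 0              -- public
              or (p.privacy_level = 1             -- friends only
                  and exists (select 1
                              from user_friends uf
                              where uf.user_id = p.user_id
                                and uf.friend_user_id = 6))
              or (p.privacy_level = 4             -- filtered
                  and exists (select 1
                              from post_privacy_includes_for ppif
                              where ppif.post_id = p.post_id
                                and ppif.user_id = 6)))))
    order by p.post_id desc;
    ```

    With the test data loaded into freshly created tables, both this query and the procedure call should return posts 17, 7, 6, 5, 4, 3, 2 and 1: the seven public posts plus post 17, which names user 6 in post_privacy_includes_for.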
    

    Hope you find some of this useful.

    【Comments】:
