【Question Title】: Rails ordering query optimization
【Posted】: 2021-11-09 03:54:52
【Question】:

I have a model Activity that has many ActivitySecondaryUser records. I am trying to optimize this query:

2.6.3 :015 > Activity.left_joins(:activity_secondary_users).where("activity_secondary_users.user_id = :id OR (primary_user_id = :id AND activity_type != '#{Activity::MENTION}')", id: 10000).order(created_at: :desc).limit(10).explain
  Activity Load (812.7ms)  SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1  [["LIMIT", 10]]
 => EXPLAIN for: SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1 [["LIMIT", 10]]
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=1000.87..19659.54 rows=10 width=138) (actual time=79.769..737.253 rows=10 loops=1)
  Buffers: shared hit=2013672
  ->  Gather Merge  (cost=1000.87..202514.52 rows=108 width=138) (actual time=79.768..737.245 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2013672
        ->  Nested Loop Left Join  (cost=0.84..201502.03 rows=45 width=138) (actual time=36.208..351.256 rows=5 loops=3)
              Filter: ((activity_secondary_users.user_id = 10000) OR ((activities.primary_user_id = 10000) AND ((activities.activity_type)::text <> 'mention'::text)))
              Rows Removed by Filter: 181610
              Buffers: shared hit=2013672
              ->  Parallel Index Scan using index_activities_on_created_at on activities  (cost=0.42..28991.70 rows=370715 width=138) (actual time=0.027..52.295 rows=181615 loops=3)
                    Buffers: shared hit=137766
              ->  Index Scan using index_activity_secondary_users_on_activity_id on activity_secondary_users  (cost=0.42..0.45 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=544845)
                    Index Cond: (activity_id = activities.id)
                    Buffers: shared hit=1875906
Planning Time: 0.216 ms
Execution Time: 737.288 ms

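Indexes:

  • Activity: created_at, primary_user_id
  • ActivitySecondaryUser: activity_id

I have tried adding other indexes and changing the sort attribute, but nothing seems to make it faster. The table has fewer than 1 million records, and the query takes more than 500 ms on average. Any suggestions on how to optimize the query?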

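【Comments】:

  • Do you see the same difference when you run each query several times? I believe there can be some overhead the first time a query runs, because the query plan is built and then cached.
  • @LesNightingill Overall, the asc query does perform better. I did find that for higher ids, both the asc and desc queries are very slow (sometimes more than 400 ms).
  • Please show EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN.
  • @jjanes Added to the question.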

Tags: ruby-on-rails postgresql optimization activerecord rails-activerecord


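【Solution 1】:

I would try adding a second index in descending order. By default, indexes are created in ascending order; if you have a lot of data and often want to read it in descending order, a dedicated index may be worth having.

The migration would look something like this: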

def change
  add_index(:activities, :created_at, order: {created_at: :desc})
end

The Rails documentation for the above is here: https://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/add_index

It includes a note for anyone on an older version of MySQL: "MySQL only supports index order from 8.0.1 onwards (earlier versions accepted the syntax but ignored it)."

【Comments】:

  • I have tried changing the sort order of some of the indexes, but performance did not improve.


【Solution 2】:

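The user you are looking for, 10000, appears to no longer be active. The query has to walk through all of the data, 544845 rows of activities, starting from the most recent, before it finds 10 references to that user.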
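
To make that reading of the plan concrete, here is a small arithmetic check in plain Ruby. Every number is copied from the EXPLAIN (ANALYZE, BUFFERS) output in the question; the variable names are only illustrative, and the reading of loops=3 as "leader plus two workers" is an assumption based on "Workers Launched: 2".

```ruby
# Figures copied from the EXPLAIN (ANALYZE, BUFFERS) output in the question.
rows_per_process   = 181_615    # Parallel Index Scan on activities: rows per loop
processes          = 3          # loops=3 (assumed: leader plus 2 launched workers)
inner_index_probes = 544_845    # loops on index_activity_secondary_users_on_activity_id
outer_buffer_hits  = 137_766    # Buffers on the activities index scan
inner_buffer_hits  = 1_875_906  # Buffers on the activity_secondary_users index scan
total_buffer_hits  = 2_013_672  # Buffers reported at the top of the plan

# Every activities row read in created_at order triggers one probe of the join index.
activities_scanned = rows_per_process * processes

# Average page accesses per probe, and the share of all page accesses spent probing.
hits_per_probe = inner_buffer_hits.fdiv(inner_index_probes)
inner_share    = inner_buffer_hits.fdiv(total_buffer_hits)

puts "activities rows scanned: #{activities_scanned}"
puts "buffer hits per probe:   #{hits_per_probe.round(2)}"
puts "share spent on probes:   #{(inner_share * 100).round(1)}%"
```

The product matches the loops=544845 reported on the inner index scan, and most of the buffer accesses come from probing activity_secondary_users once per scanned activity, not from reading activities itself.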

This is likely to be a hard query to optimize, because the ORed branches of the WHERE are on one table while the ORDER BY is on another.
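
One common way around an OR that spans two tables is to give each branch its own subquery and combine the results with UNION. The following is only a sketch of that idea, not something proposed or measured in this thread. The constant name RECENT_ACTIVITIES_SQL and the placeholder names are hypothetical, and it assumes indexes that the question does not list, for example on activity_secondary_users(user_id) and on activities(primary_user_id, created_at). Taking the newest :limit rows from each branch is enough, because any row in the overall top :limit must also be in the top :limit of its own branch.

```ruby
# Hypothetical constant name; table and column names come from the question.
# Each branch of the original OR becomes its own subquery, so the planner can
# start each one from a filter on a single table. Only the small combined set
# of candidate ids is sorted at the end.
RECENT_ACTIVITIES_SQL = <<~SQL
  SELECT activities.*
  FROM activities
  WHERE activities.id IN (
    SELECT id FROM (
      (SELECT a.id
         FROM activities a
         JOIN activity_secondary_users s ON s.activity_id = a.id
        WHERE s.user_id = :id
        ORDER BY a.created_at DESC
        LIMIT :limit)
      UNION
      (SELECT a.id
         FROM activities a
        WHERE a.primary_user_id = :id
          AND a.activity_type != :mention
        ORDER BY a.created_at DESC
        LIMIT :limit)
    ) AS candidate_ids
  )
  ORDER BY activities.created_at DESC
  LIMIT :limit
SQL

# Inside the Rails app this could be run with named bind values, for example:
#   Activity.find_by_sql(
#     [RECENT_ACTIVITIES_SQL, { id: 10000, mention: Activity::MENTION, limit: 10 }]
#   )
```

Note one behavioural difference: the original LEFT OUTER JOIN can return the same activity more than once when several joined rows match, whereas this version returns each activity at most once. Whether it is actually faster would need to be confirmed with EXPLAIN (ANALYZE, BUFFERS) on the real data.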

Could you simply detect inactive users and refuse to run this kind of query for them?

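【Comments】:

  • For active users, the query still takes more than 200 ms. Is there a way to re-architect the database to achieve a similar result? The goal is to find both outgoing and incoming activities for a user.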