【Posted at】: 2016-02-05 17:50:07
【Question】:
I have a home-grown version control system (not written by me) with the following data structure:
create_table "activities", :force => true do |t|
  t.string   "source"
  t.datetime "created_at", :null => false
  t.datetime "updated_at", :null => false
  t.integer  "head_revision_id"
end
add_index "activities", ["head_revision_id"], :name => "index_activities_on_head_revision_id"
add_index "activities", ["source"], :name => "index_activities_on_source"
create_table "activity_revisions", :force => true do |t|
  t.integer  "activity_id"
  t.string   "activity_type"
  t.string   "title"
  t.text     "content"
  t.text     "comment"
  t.integer  "modified_by_id"
  t.datetime "created_at", :null => false
  t.datetime "updated_at", :null => false
end
add_index "activity_revisions", ["activity_id"], :name => "index_activity_revisions_on_activity_id"
add_index "activity_revisions", ["activity_type"], :name => "index_activity_revisions_on_activity_type"
add_index "activity_revisions", ["title"], :name => "index_activity_revisions_on_title"
The application displays a list of activities from newest to oldest, paginated (will_paginate) at 20 per page. This is the query used to generate the list:
Activity.where(conditions)
  .joins(:head_revision)
  .includes(:head_revision)
  .order('activities.id DESC')
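conditions varies depending on the values passed from the search form. For the initial list display, conditions is blank.
On the surface the query is simple, but on a large data set it runs very slowly. We currently have about 102,000 activity records and 512,000 activity_revision records. On our production server the query takes nearly 2 seconds just to return the count. In the development environment it is far worse.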
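For context on why the count matters here: a paginated list generally needs the total row count to work out how many pages exist, so a COUNT query runs in addition to the query for the page itself. Below is a minimal sketch of that arithmetic, assuming 20 rows per page and the roughly 102,000 activities mentioned above. `page_window` is a hypothetical helper written for illustration; it is not will_paginate's API.

```ruby
# Hypothetical helper (not part of will_paginate): the page arithmetic
# a paginated list needs once the total row count is known.
PER_PAGE = 20

def page_window(total_entries, page, per_page = PER_PAGE)
  # Round up so a partially filled last page still counts as a page.
  total_pages = (total_entries.to_f / per_page).ceil
  # Rows to skip before the first row of the requested page.
  offset = (page - 1) * per_page
  { total_pages: total_pages, offset: offset, limit: per_page }
end

first_page = page_window(102_000, 1)
last_page  = page_window(102_000, first_page[:total_pages])
```

The point of the sketch is only that `total_entries` has to come from somewhere, which is why the slow count shows up on every list view.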
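I feel the data model itself is at fault, and I'm hoping someone can point me to a better approach.
Edit: EXPLAIN run on the basic query with no conditions: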
mysql> explain SELECT * FROM `activities` INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
| 1 | SIMPLE | activities | ALL | index_activities_on_head_revision_id | NULL | NULL | NULL | 106590 | |
| 1 | SIMPLE | activity_revisions | eq_ref | PRIMARY | PRIMARY | 4 | cms_production.activities.head_revision_id | 1 | |
+----+-------------+--------------------+--------+--------------------------------------+---------+---------+--------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
And for the count(*) query:
mysql> explain SELECT count(*) FROM `activities` INNER JOIN `activity_revisions` ON `activity_revisions`.`id` = `activities`.`head_revision_id`;
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
| 1 | SIMPLE | activities | index | index_activities_on_head_revision_id | index_activities_on_head_revision_id | 5 | NULL | 106590 | Using index |
| 1 | SIMPLE | activity_revisions | eq_ref | PRIMARY | PRIMARY | 4 | cms_production.activities.head_revision_id | 1 | Using index |
+----+-------------+--------------------+--------+--------------------------------------+--------------------------------------+---------+--------------------------------------------+--------+-------------+
2 rows in set (0.00 sec)
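As I read the two plans, both visit every row of activities (a full table scan in the first, a full scan of index_activities_on_head_revision_id in the second) and then do one primary-key lookup into activity_revisions per activity. The following is a rough in-memory sketch of that access pattern, not of MySQL's actual implementation; the Struct names and sample rows are made up for illustration.

```ruby
# Illustrative in-memory model of the join; the sample data is invented.
Activity = Struct.new(:id, :head_revision_id)
Revision = Struct.new(:id, :activity_id, :title)

activities = [Activity.new(1, 11), Activity.new(2, 13)]
revisions = [
  Revision.new(10, 1, "first draft"),
  Revision.new(11, 1, "final"),
  Revision.new(13, 2, "only version")
]

# Stand-in for the PRIMARY key on activity_revisions: id => row.
revisions_by_id = revisions.each_with_object({}) { |r, h| h[r.id] = r }

# Outer loop visits every activity (type ALL / index in EXPLAIN);
# the inner step is one keyed lookup per activity (type eq_ref, rows = 1).
joined = activities.map do |a|
  head = revisions_by_id[a.head_revision_id]
  head ? [a.id, head.title] : nil # an inner join drops unmatched rows
end.compact

# Newest first, mirroring ORDER BY activities.id DESC.
joined = joined.sort_by { |activity_id, _title| -activity_id }
```

The work grows with the number of activities rather than with the number of revisions, which matches the rows estimate of 106590 on the activities line.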
【Discussion】:
-
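Update: although I still feel the data model is inherently poor, I have determined that the huge timing difference between live and development was caused by a poorly tuned Percona installation in development. Setting
innodb_buffer_pool_size=7GB greatly improved query performance.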
Tags: mysql ruby-on-rails activerecord