【发布时间】:2018-12-01 03:49:08
【问题描述】:
我有以下疑问:
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
分析表有 60M 行,事务表有 3M 行。
当我对此查询运行 EXPLAIN 时,我得到:
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| # id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| '1' | 'SIMPLE' | 'analytics' | 'ref' | 'analytics_user_id | analytics_source' | 'analytics_user_id' | '5' | 'const' | '337662' | 'Using where; Using temporary; Using filesort' |
| '1' | 'SIMPLE' | 'transactions' | 'ref' | 'tran_analytics' | 'tran_analytics' | '5' | 'dijishop2.analytics.id' | '1' | NULL | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
我不知道如何优化这个查询,因为它已经很基础了。运行此查询大约需要 70 秒。
以下是存在的索引:
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'analytics' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_user_id' | '1' | 'user_id' | 'A' | '130583' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_product_id' | '1' | 'product_id' | 'A' | '490812' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_affil_user_id' | '1' | 'affil_user_id' | 'A' | '55222' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_source' | '1' | 'source' | 'A' | '24604' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_country_name' | '1' | 'country_name' | 'A' | '39510' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '2' | 'user_id' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '3' | 'source' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'transactions' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '2436151' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_user_id' | '1' | 'user_id' | 'A' | '56654' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'transaction_id' | '1' | 'transaction_id' | 'A' | '2436151' | '191' | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_analytics' | '1' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_status' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '2' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
在按照建议添加任何额外索引之前简化了两个表的架构,因为它并没有改善这种情况。
CREATE TABLE `analytics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`affil_user_id` int(11) DEFAULT NULL,
`product_id` int(11) DEFAULT NULL,
`medium` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`terms` varchar(1024) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`is_browser` tinyint(1) DEFAULT NULL,
`is_mobile` tinyint(1) DEFAULT NULL,
`is_robot` tinyint(1) DEFAULT NULL,
`browser` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`mobile` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`robot` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`platform` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrer` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`domain` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`ip` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`continent_code` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`country_name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `analytics_user_id` (`user_id`),
KEY `analytics_product_id` (`product_id`),
KEY `analytics_affil_user_id` (`affil_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=64821325 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transaction_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`pay_key` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sender_email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`amount` decimal(10,2) DEFAULT NULL,
`currency` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`analytics` int(11) DEFAULT NULL,
`ip_address` varchar(46) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`session_id` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`eu_vat_applied` int(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `tran_user_id` (`user_id`),
KEY `transaction_id` (`transaction_id`(191)),
KEY `tran_analytics` (`analytics`),
KEY `tran_status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10019356 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
如果以上无法进一步优化。任何关于汇总表的实施建议都会很棒。我们在 AWS 上使用 LAMP 堆栈。上述查询在 RDS (m1.large) 上运行。
【问题讨论】:
-
你的声望分数很高,所以你不是新手。您现在应该知道,您应该在查询中包含每个表的
SHOW CREATE TABLE,以便我们可以看到您的表中有哪些数据类型、索引和约束。帮助我们帮助您! -
抱歉比尔,它们是巨大的表格(很多列)。在我尝试 Gordon 的建议后会明白的。
-
我建议使用
SHOW CREATE TABLE的原因是,如果有人想在沙箱实例上试用您的表,他们必须通过猜测您的列和索引来煞费苦心地重新创建表。可以从你的 SHOW INDEXES 中拼凑出与你的故事相似的东西,但这需要太多的工作,我不能确定它是否正确。我不会花时间做那件事。祝你好运! -
如果省略
GROUP BY子句,查询性能会怎样? (我知道它不会产生您想要的结果;关键是要弄清楚GROUP BY ... LIMIT...是否占用了很多时间。) -
你能解释一下你想要什么更好一点吗?
COUNT(a.id)在查询中执行a LEFT JOIN b有点奇怪。它计算来自b的匹配行,并为a中的每一行计算1,而b中没有匹配的行。那是你要的吗?对我来说,这听起来像是很难向用户解释的事情。COUNT操作的完美性至关重要,因为您稍后会将它用于GROUP BY ... LIMIT ...操作。
标签: mysql sql query-optimization amazon-rds sql-optimization