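【Posted】: 2016-03-19 08:54:20
【Question】:
I am not very familiar with MySQL execution plans, so I need help understanding how to operate on a subset of data in MySQL, if that is possible. I have two tables:
Table users: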
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | auto_increment |
| msisdn | bigint(20) | NO | UNI | NULL | |
| activation_date | datetime | NO | | NULL | |
| msisdn_type | varchar(32) | NO | | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
Table log_archive:
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| msisdn | bigint(11) | NO | MUL | NULL | |
| msisdn_type | varchar(32) | NO | | NULL | |
| date | date | NO | | NULL | |
| action | varchar(32) | NO | | NULL | |
+-------------+--------------+------+-----+---------+-------+
In the users table msisdn is unique, but in log_archive it is not.
Here is a PHP script that generates test data for both tables:
Test data generation script helper
I need to select:
1) All distinct records by msisdn from table log_archive;
2) By earliest date per msisdn for one specific action only;
3) For a specific date range from table log_archive;
4) And to join activation_date from users table with msisdn from both tables.
Let me give an example. Suppose this is sample data from the log_archive table (with activation_date joined in from users):
+--------------+------------+---------------------+----------------+
| msisdn | date | activation_date | action |
+--------------+------------+---------------------+----------------+
| 977129764170 | 2016-02-11 | 2014-10-07 00:00:00 | all_services |
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start |
| 977129764170 | 2015-05-08 | 2014-10-07 00:00:00 | widget |
| 986629508626 | 2015-07-12 | 2016-02-05 00:00:00 | app_start |
| 986629508626 | 2015-03-02 | 2016-02-05 00:00:00 | number_connect |
| 986629508626 | 2015-05-08 | 2016-02-05 00:00:00 | widget |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start |
| 933563888440 | 2016-02-20 | 2014-10-06 00:00:00 | all_services |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start |
| 933563888440 | 2015-04-26 | 2014-10-06 00:00:00 | number_connect |
| 933563888440 | 2015-10-17 | 2014-10-06 00:00:00 | all_services |
| 943730853721 | 2015-06-19 | 2015-05-01 00:00:00 | widget |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start |
| 943730853721 | 2016-02-09 | 2015-05-01 00:00:00 | app_start |
+--------------+------------+---------------------+----------------+
The distinct msisdns here are 977129764170, 986629508626, 933563888440, and 943730853721.
The earliest date for each distinct msisdn where the action column equals 'app_start' is:
977129764170 is 2015-09-05
986629508626 is 2015-01-08
933563888440 is 2015-03-12
943730853721 is 2015-12-08
I need to write SQL that gives me this output:
+--------------+------------+---------------------+----------------+
| msisdn | date | activation_date | action |
+--------------+------------+---------------------+----------------+
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start |
+--------------+------------+---------------------+----------------+
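So, for each distinct msisdn, I need to select the earliest date on which the app_start action occurred, and join activation_date from the users table on that msisdn. The search should also be limited to a specific range of values in the date column.
I tried this SQL, without success: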
SELECT DISTINCT(log_archive.msisdn) as msisdn, DATE(log_archive.date) AS actionDate, users.activation_date
FROM log_archive
INNER JOIN users on log_archive.msisdn = users.msisdn
WHERE log_archive.action = 'app_start' && log_archive.date BETWEEN '2015-01-08' AND '2016-03-15'
ORDER BY actionDate ASC;
Even though I used DISTINCT, I get the same msisdn more than once.
Do I need to use a subquery?
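A likely reason for the repeated msisdn values is that DISTINCT de-duplicates whole result rows, not the single column written inside the parentheses. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down log_archive table with three rows taken from the sample data, and `AND` in place of `&&`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT);
INSERT INTO log_archive VALUES
  (943730853721, '2015-12-08', 'app_start'),
  (943730853721, '2016-02-09', 'app_start'),
  (977129764170, '2015-09-05', 'app_start');
""")

# DISTINCT applies to the (msisdn, date) pair as a whole, so two rows with
# the same msisdn but different dates are both kept.
distinct_rows = conn.execute(
    "SELECT DISTINCT msisdn, date FROM log_archive "
    "WHERE action = 'app_start' ORDER BY date"
).fetchall()

for row in distinct_rows:
    print(row)
```

Both rows for 943730853721 survive because their dates differ, which matches the behaviour described in the question.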
【Comments】:
-
Remove DISTINCT and use GROUP BY log_archive.msisdn
-
@BerndBuffen I can't do that, because I would not get the earliest record where log_archive.action = 'app_start' occurred. Please read the part where I explain the desired output.
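For reference, one way to combine the GROUP BY suggestion with the "earliest date" requirement is to aggregate with MIN(date) after filtering on the action. This is a minimal sketch, not a tested MySQL solution: it runs against SQLite through Python's sqlite3 module, uses cut-down versions of the two tables, loads only some rows from the sample data, and introduces the table aliases `l` and `u` for brevity.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (msisdn INTEGER PRIMARY KEY, activation_date TEXT);
CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT);
INSERT INTO users VALUES
  (977129764170, '2014-10-07 00:00:00'),
  (986629508626, '2016-02-05 00:00:00'),
  (933563888440, '2014-10-06 00:00:00'),
  (943730853721, '2015-05-01 00:00:00');
INSERT INTO log_archive VALUES
  (977129764170, '2016-02-11', 'all_services'),
  (977129764170, '2015-09-05', 'app_start'),
  (986629508626, '2015-07-12', 'app_start'),
  (986629508626, '2015-01-08', 'app_start'),
  (933563888440, '2015-03-12', 'app_start'),
  (943730853721, '2015-06-19', 'widget'),
  (943730853721, '2015-12-08', 'app_start'),
  (943730853721, '2016-02-09', 'app_start');
""")

# Filter to app_start rows inside the date range first, then keep the
# smallest date per msisdn. activation_date is listed in GROUP BY so the
# query stays valid under strict grouping rules.
query = """
SELECT l.msisdn, MIN(l.date) AS actionDate, u.activation_date
FROM log_archive AS l
INNER JOIN users AS u ON u.msisdn = l.msisdn
WHERE l.action = 'app_start'
  AND l.date BETWEEN '2015-01-08' AND '2016-03-15'
GROUP BY l.msisdn, u.activation_date
ORDER BY actionDate ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the date range is applied before MIN, so the result is the earliest app_start inside the range, not the earliest app_start overall. If the overall earliest date is what is wanted, the range condition would have to move to a HAVING clause or an outer query.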