【Question title】: Group by IDs and TIMESTAMPDIFF one column in same table
【Posted】: 2015-01-09 12:54:48
【Question】:

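I am trying to find out "how many unique messages have been sent to a person on a specific boat within a time frame, and what is the minimum number of days between these texts", and to display it, including the count.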

The person is represented by "id", the boat by "id2", and the message by "text".

CREATE TABLE `stacktable` (
`timestamp` DATETIME NOT NULL,
`id` VARCHAR(15) NOT NULL,
`id2` VARCHAR(3) NULL DEFAULT NULL,
`text` VARCHAR(255) NULL DEFAULT NULL,
`id3` INT(10) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id3`)
);

insert into stacktable (timestamp,id,id2,text) VALUES
('2015-01-01 00:00:01',1,10,'ABC'),
('2015-01-01 00:00:01',2,11,'ABC'),
('2015-01-01 00:00:01',3,12,'ABC'),
('2015-01-01 00:00:02',3,12,'ABC'),
('2015-01-01 00:00:02',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'ABC'),
('2015-01-04 00:00:01',1,10,'BCD'),
('2015-01-04 00:00:01',2,11,'ABC'),
('2015-01-04 00:00:01',2,11,'BCD'),
('2015-01-04 00:00:01',3,12,'ABC'),
('2015-01-04 00:00:01',3,12,'BCD'),
('2015-01-04 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',2,11,'BCD'),
('2015-01-07 00:00:01',3,12,'BCD'),
('2015-01-07 00:00:01',3,13,'CDE'),
('2015-01-07 00:00:01',3,13,'DEF'),
('2015-01-08 00:00:01',3,12,'ABC'),
('2015-01-08 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:01',4,14,'EFG'),
('2015-01-09 00:00:02',4,15,'FGH'),
('2015-01-10 00:00:01',4,14,'EFG'),
('2015-01-10 00:00:01',4,14,'FGH'),
('2015-01-10 00:00:01',4,15,'FGH'),
('2015-01-11 00:00:01',4,14,'EFG'),
('2015-01-15 00:00:01',4,14,'EFG');

To illustrate what I am trying to achieve:

select * from stacktable where id = 1

timestamp           id id2 text id3     
2015-01-01 00:00:01 1  10  ABC  1    First entry for id+id2+text (ABC)
2015-01-01 00:00:02 1  10  ABC  5    Second entry for same keys id+id2+text 1 second later
2015-01-04 00:00:01 1  10  ABC  6    Third entry for same keys id+id2+text 2 days later
2015-01-04 00:00:01 1  10  BCD  7    First entry for id+id2+text (BCD)

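I only want to count records that "have the same id, id2 and text within 2 days of each other", but also show the "minimum date difference between hits".

My desired output is: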

id id2 text count(*) mindiffdatebetweenhits
-------------------------------------------
1  10  ABC  3        0                      count id3s 1,5 and 6, minimumdaydiff is between id3 1 and 5 = 0 days
3  12  ABC  3        0                      count id3s 3,4 and 10, minimumdaydiff is between id3 3 and 4 = 0 days
4  14  EFG  4        1                      count id3s 18,19,21 and 24, minimumdaydiff is equal between all hits = 1 day
4  15  FGH  2        0                      count id3s 20 and 23, minimumdaydiff is between id3 20 and 23 = 0 days

How can I get the desired output?

【Comments】:

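  • Are you sure that would be your output? The time difference between id3 = 6 and id3 = 1 is more than 2 days.
  • I want it to count 3 for id=1 (id3 1, 5, 6). This is because of id3 1 and id3 5
  • Records 3 and 4 are dated 2015-01-01, and record 10 is dated 2015-01-04. So the gap is more than 2 days, yet you want to count 3 records. That seems inconsistent with "within 2 days since the last hit".
  • Sorry. It should be within 2 days of the "last" hit on id+id2+text: don't keep checking the time difference from the first hit, but from the most recent one. Change it by 1 second and you are correct, that would be 3 days.
  • id3 = 3 is 2015-01-01 00:00:01; id3 = 4 is 2015-01-01 00:00:02, 1 second later; id3 = 10 is 2015-01-04 00:00:01, 2 days later. SELECT TIMESTAMPDIFF(day, '2015-01-01 00:00:02', '2015-01-04 00:00:01') from dual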
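
The point the comments settle on is that TIMESTAMPDIFF with the DAY unit counts only complete days, so which earlier hit is used as the starting point changes the result. A minimal sketch using the two timestamps discussed above (the column aliases are made up for illustration):

```sql
SELECT
    -- Measured from the most recent hit (id3 = 4): 2 days 23:59:59 apart,
    -- so only 2 complete days have elapsed.
    TIMESTAMPDIFF(DAY, '2015-01-01 00:00:02', '2015-01-04 00:00:01') AS from_latest_hit,
    -- Measured from the first hit (id3 = 3): exactly 3 days apart.
    TIMESTAMPDIFF(DAY, '2015-01-01 00:00:01', '2015-01-04 00:00:01') AS from_first_hit;
```

The first expression evaluates to 2 and the second to 3, which is why id3 = 10 still belongs to the same run as id3 = 3 and 4 under the "within 2 days of the last hit" rule.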

Tags: mysql sql group-by datediff


【Answer 1】:

This should do it, assuming that sequences consisting of only one row are to be discarded:

select id, id2, text, seq, count(id) as total, min(diff) as mindiff
from (
      select t1.row, t2.row row2, t1.id, t1.id2, t1.text, t1.id3, 
             TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) as diff,
             IF (TIMESTAMPDIFF(DAY, t1.timestamp, t2.timestamp) > 2, @seq * (1 and @seq := @seq +1), @seq) as seq
      from (select (@row := @row + 1) as row, id, id2, text, id3, timestamp
            from (select   id, id2, text, id3, timestamp
                  from     stacktable
                  order by id, id2, text) sorted, 
                  (select @row := 0) setup) t1
            left join (select (@row2 := @row2 + 1) as row, id, id2, text, id3, timestamp
                       from (select id, id2, text, id3, timestamp
                             from stacktable
                             order by id, id2, text) sorted, 
                             (select @row2 := 0) setup) t2
            on  (t1.id = t2.id and t1.id2 = t2.id2 and t1.text=t2.text and t1.row = t2.row - 1),
            (select @seq := 1) setup_sequence
     ) t3
group by id, id2, text, seq
having total > 1

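To make it easier to read: the query uses the same subquery twice, as t1 and t2, and all that subquery does is sort and number the rows of the table: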

select (@row := @row + 1) as row, id, id2, text, id3, timestamp
from (select   id, id2, text, id3, timestamp
      from     stacktable
      order by id, id2, text) sorted, 
     (select @row := 0) setup

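Fiddle. Note that the sequence counter is not actually unique across all sequences. This is not a bug: it is only unique among sequences that share the same id, id2, text.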

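The sequence counter update is a bit tricky: @seq * (1 and @seq := @seq +1). It relies on the first @seq being read for the multiplication before the assignment updates it. I am not sure this is deterministic or consistent across engines. However, the query could also be changed to avoid it, by joining each record of t1 with the previous record rather than the next one (in t2). (Not tried.)

【Comments】: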
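
For readers who can use window functions, the same idea (compare each row with the previous one in its id, id2, text group and start a new sequence when the gap exceeds 2 days) can be written without user variables. This is only a sketch under the assumption of MySQL 8.0 or later, not the answer's own method; the CTE names gaps and runs and the column run_no are made up for illustration.

```sql
WITH gaps AS (
    -- Whole days since the previous hit with the same id, id2, text
    -- (NULL for the first hit of each group).
    SELECT id, id2, `text`, id3, `timestamp`,
           TIMESTAMPDIFF(
               DAY,
               LAG(`timestamp`) OVER (PARTITION BY id, id2, `text`
                                      ORDER BY `timestamp`, id3),
               `timestamp`) AS diff
    FROM stacktable
),
runs AS (
    -- A new run starts at the first hit or whenever the gap exceeds 2 days;
    -- the running sum of those start flags numbers the runs within a group.
    SELECT id, id2, `text`, diff,
           SUM(CASE WHEN diff IS NULL OR diff > 2 THEN 1 ELSE 0 END)
               OVER (PARTITION BY id, id2, `text`
                     ORDER BY `timestamp`, id3) AS run_no
    FROM gaps
)
SELECT id, id2, `text`,
       COUNT(*) AS total,
       -- Ignore the gap that opened the run; only gaps inside it count.
       MIN(CASE WHEN diff <= 2 THEN diff END) AS mindiff
FROM runs
GROUP BY id, id2, `text`, run_no
HAVING COUNT(*) > 1          -- discard one-row sequences, as above
ORDER BY id, id2, `text`;
```

Worked through by hand against the sample data, this should give the four rows from the question: (1, 10, ABC, 3, 0), (3, 12, ABC, 3, 0), (4, 14, EFG, 4, 1) and (4, 15, FGH, 2, 0). Like the sequence counter above, run_no is only unique within one id, id2, text group.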

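  • Which indexes would you recommend for the above query? All the from (select ...) subqueries become rather heavy on a table of about 2 million records.
  • Try a composite index on id, id2, text (in that order).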
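
The suggested composite index could be created roughly as follows. The index name is made up for this example, and whether the optimizer actually uses the index for the sorted subqueries depends on the MySQL version and the data, so the EXPLAIN output is worth checking rather than assuming.

```sql
-- Composite index in the suggested column order: id, id2, text.
-- idx_stacktable_id_id2_text is a hypothetical name.
CREATE INDEX idx_stacktable_id_id2_text
    ON stacktable (id, id2, `text`);

-- Inspect the plan of the sorted subquery used twice in the answer.
EXPLAIN
SELECT id, id2, `text`, id3, `timestamp`
FROM stacktable
ORDER BY id, id2, `text`;
```

If an older setup rejects the index because the key is too long for the table's character set, indexing a prefix of the column, for example `text`(100), is one possible workaround.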