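【Posted】:2021-02-04 16:00:17
【Question】:
I am trying to recreate an address as of a point in time, based on the current address and change log records. We have a customer address table that stores the address in fields line1 through line4. Here is an approximation of the table and data: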
create table qtemp.customers (
id int,
line1 char(40),
line2 char(40),
line3 char(40),
line4 char(40));
insert into qtemp.customers (id, line1, line2, line3, line4)
with cust_temp(id, line1, line2, line3, line4) as(
select 1, 'Line1', 'Line2', 'Line3', 'Line4'
from sysibm.sysdummy1
union all
select id+1, line1, line2, line3, line4 from cust_temp where id<15000)
select * from cust_temp;
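Then we have a table that logs changes, including changes to individual address lines. The change types I'm interested in start with "Line" followed by a number. They are mixed in with other changes. Again, a rough approximation of the table: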
create table qtemp.changes (
seq int,
dt int,
cid int,
change_type char(40),
change char(40));
insert into qtemp.changes (seq, dt, cid, change_type, change)
with changes_temp(seq, dt, cid, change_type, change) as(
select 1, 1, 1, 'not a real change', 'just a bogus line' from sysibm.sysdummy1
union all
select seq+1,
dt + int(rand() + 0.005), --about 175 changes per day on average
int(rand() * 15000 + 1),
case int(rand() * 13) --a little less than 3000 changes to address lines
when 1 then 'Line ' || int(rand() * 4 + 1)
else trim(TRANSLATE ( CHAR(BIGINT(RAND() * 50 )), 'abcdefghij', '1234567890' )) || ' Some other change'
end,
TRANSLATE ( CHAR(BIGINT(RAND() * 10000000000 )), 'abcdefghij', '1234567890' )
from changes_temp where seq < 35000)
select * from changes_temp;
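My solution is to select only the 'Line%' records, transpose them into the corresponding line1 through line4 columns, and then use window functions to fill in the nulls.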
with
changes_filtered as (
select * from changes
where change_type like 'Line%'),
--only show the last change for any particular customer id, date and line
changes_latest as (
select a.*
from changes_filtered a
left join changes_filtered b on a.cid = b.cid and a.dt = b.dt and a.change_type = b.change_type and a.seq<b.seq
where b.seq is null),
changes_pivoted as (
select cid, dt,
max(case when change_type = 'Line 1' then change end) line1,
max(case when change_type = 'Line 2' then change end) line2,
max(case when change_type = 'Line 3' then change end) line3,
max(case when change_type = 'Line 4' then change end) line4
from changes_latest
group by cid, dt
union all
select id, 99999, line1, line2, line3, line4 from customers
where id in (select cid from changes_filtered)
),
changes_filled as (
select cid, dt,
first_value(line1) ignore nulls over(partition by cid order by dt rows between current row and unbounded following) line1,
first_value(line2) ignore nulls over(partition by cid order by dt rows between current row and unbounded following) line2,
first_value(line3) ignore nulls over(partition by cid order by dt rows between current row and unbounded following) line3,
first_value(line4) ignore nulls over(partition by cid order by dt rows between current row and unbounded following) line4
from changes_pivoted
)
select * from changes_filled order by cid, dt;
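To make the null-filling step easier to follow, here is a minimal sketch of the same window expression run against a hypothetical three-row sample for a single customer. The sample CTE, its values, and the line1_filled alias are made up for illustration; dt 99999 stands in for the current-address row, as in the full query.

```sql
-- Hypothetical sample: one customer, one address line.
-- dt 10 has no change to line1, dt 20 has one, dt 99999 is the current address.
with sample (cid, dt, line1) as (
  select 1, 10, cast(null as char(40)) from sysibm.sysdummy1
  union all
  select 1, 20, cast('Changed value' as char(40)) from sysibm.sysdummy1
  union all
  select 1, 99999, cast('Line1' as char(40)) from sysibm.sysdummy1
)
select cid, dt,
       -- first non-null line1 at or after the current row, within the customer
       first_value(line1) ignore nulls
         over (partition by cid order by dt
               rows between current row and unbounded following) as line1_filled
from sample
order by cid, dt;
```

For this sample, the row at dt 10 picks up 'Changed value' from dt 20, the row at dt 20 keeps its own value, and the dt 99999 row keeps 'Line1'.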
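However, when I try to run the full query, I immediately get the following error:
[SQL0666] SQL query exceeds specified time limit or storage limit. Cause . . . . . : A database query was about to be started whose estimated run time of 2147352576 exceeds the specified limit of 600.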
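Note the word "estimated". This is a preemptive strike: the query is rejected before it runs. The 600-second limit is set through a system value. If I override it with CHGQRYA, the query runs in 150 ms, so the estimated run time is completely bogus. When I look at Visual Explain, the cumulative time grows exponentially with each OLAP step. The first is estimated at 1134 s, the second at 4M s, the third at 1400M s, and the fourth at 50000000M s.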
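For reference, the per-job override mentioned above can be sketched as follows. This assumes the command is issued from the same SQL session through the QSYS2.QCMDEXC procedure; the *NOMAX value is an assumption, since the exact parameters used are not shown here, and a numeric limit in seconds could be given instead.

```sql
-- Assumed form of the override: lift the query time limit for the current job only.
-- CHGQRYA is a CL command, so it is passed to QSYS2.QCMDEXC as a string.
call qsys2.qcmdexc('CHGQRYA QRYTIMLMT(*NOMAX)');

-- To return to the limit taken from the system value afterwards:
call qsys2.qcmdexc('CHGQRYA QRYTIMLMT(*SYSVAL)');
```

This only affects the job that runs it and leaves the system value itself unchanged.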
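I found this IBM document, ODBC Query Timeout Property: SQL0666 Estimated Query Processing Time Exceeds Limit, which states:
The estimate can be very poor if proper indexes are not in place (and the query might perform poorly).
But 150 ms versus 1.6 million years is not even wrong. The query performs fine, but the estimate is not even in this galaxy.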
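EDIT: I guess the question I'm asking is whether there is a way around this without changing the system value (QQRYTIMLMT) at runtime and without building dedicated indexes for this query.
EDIT2: Creating indexes had no effect. I tried creating indexes of my own as well as the ones suggested by the index advisor, and saw no difference in the estimated run time. I have a case open with IBM.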
CREATE INDEX QTEMP/CHANGES_IDX
ON QTEMP/CHANGES (CID ASC, DT ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CHANGES_IDX2
ON QTEMP/CHANGES (CHANGE_TYPE ASC, DT ASC, CID ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CUSTOMERS_IDX
ON QTEMP/CUSTOMERS (ID ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CHANGES_IDX3
ON QTEMP/CHANGES (CID ASC, CHANGE_TYPE ASC) UNIT ANY KEEP IN MEMORY NO;
【Comments】:
Tags: ibm-midrange db2-400