【Question title】: update table from another table multiple rows
【Posted】: 2017-05-10 14:45:30
【Question description】:

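My context is PostgreSQL 8.3.

I need to speed up this query, as both tables have millions of records.

For each row in table Calls, there are two rows in table Trunks. For each call_id, I want to copy the value from trunks.trunk to calls.orig_trunk when trunk_id is the lowest trunk_id of the two rows, and copy the value from trunks.trunk to calls.dest_trunk when trunk_id is the highest trunk_id of the two rows.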

Initial content of table Calls:

Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1       | 5145551212    |    null    |   null
2       | 8883331212    |    null    |   null
3       | 4164541212    |    null    |   null

Table Trunks:

Call_ID | trunk_id | trunk
--------|----------|-------
1       | 1        |  116
1       | 2        |  9
2       | 3        |  168
2       | 4        |  3
3       | 5        |  124
3       | 6        |  9 

Final content of table Calls:

Call_ID | dialed_number | orig_trunk| dest_trunk
--------|---------------|-----------|----------
1       | 5145551212    |    116    |   9
2       | 8883331212    |    168    |   3
3       | 4164541212    |    124    |   9

I have created indexes for every column.

update calls set orig_trunk = t2.trunk
from ( select call_id,trunk_id,trunk from trunks
     order by trunk_id ASC ) as t2
where (calls.call_id=t2.call_id );

update calls set dest_trunk = t2.trunk
from ( select call_id,trunk_id,trunk from trunks
     order by trunk_id DESC ) as t2
where (calls.call_id=t2.call_id );

Any ideas?

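【Comments】:

  • It would help if you provided a SQL Fiddle with a data sample. That makes it easier for people to try things out and compare execution. If you haven't done so already, one way to speed things up is to add indexes to the id columns.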
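A setup script along these lines would reproduce the sample data from the question. It is only a sketch: the column types are assumptions, since the question shows values but not the table definitions.

```sql
-- Column types are assumed; the question does not give the DDL.
create table calls (
    call_id       integer primary key,
    dialed_number varchar(20),
    orig_trunk    integer,
    dest_trunk    integer
);

create table trunks (
    call_id  integer,
    trunk_id integer primary key,
    trunk    integer
);

-- Sample rows copied from the tables in the question.
insert into calls (call_id, dialed_number) values
    (1, '5145551212'),
    (2, '8883331212'),
    (3, '4164541212');

insert into trunks (call_id, trunk_id, trunk) values
    (1, 1, 116),
    (1, 2, 9),
    (2, 3, 168),
    (2, 4, 3),
    (3, 5, 124),
    (3, 6, 9);
```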

Tags: sql postgresql multiple-records


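【Solution 1】:

From the posted example, it looks like many unnecessary updates are being performed. Here is an example of a query that returns the results you are looking for: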

select distinct c.call_id, c.dialed_number
      ,first_value(t.trunk) over w as orig_trunk
      ,last_value(t.trunk)  over w as dest_trunk
  from calls c
  join trunks t on (t.call_id = c.call_id)
  window w as (partition by c.call_id
               order by trunk_id
               range between unbounded preceding
                         and unbounded following
              )

There are other ways to do this without analytic functions, for example:

select x.call_id
      ,x.dialed_number
      ,t1.trunk as orig_trunk
      ,t2.trunk as dest_trunk
  from (select c.call_id, c.dialed_number
              ,min(t.trunk_id) as orig_trunk_id
              ,max(t.trunk_id) as dest_trunk_id
          from calls c
          join trunks t on (t.call_id = c.call_id)
          group by c.call_id, c.dialed_number
        ) x
  join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
  join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
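Window functions were only added in PostgreSQL 8.4, so the first query will not run on the 8.3 mentioned in the question, while the min/max join will. Another 8.3-compatible variant, sketched here as an alternative rather than as part of the original answer, uses PostgreSQL's `DISTINCT ON` to pick the row with the lowest and the highest trunk_id per call:

```sql
select c.call_id
      ,c.dialed_number
      ,o.trunk as orig_trunk
      ,d.trunk as dest_trunk
  from calls c
  -- first row per call_id when sorted by trunk_id ascending = lowest trunk_id
  join (select distinct on (call_id) call_id, trunk
          from trunks
         order by call_id, trunk_id asc
       ) o on (o.call_id = c.call_id)
  -- first row per call_id when sorted by trunk_id descending = highest trunk_id
  join (select distinct on (call_id) call_id, trunk
          from trunks
         order by call_id, trunk_id desc
       ) d on (d.call_id = c.call_id);
```

Unlike the min/max join, this version does not rely on trunk_id being unique across the whole trunks table, only on it ordering the rows within each call.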

Experiment to see which approach works best in your situation. You will probably want the join columns to be indexed.
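One possible way to run that experiment is sketched below. The index names are made up, and the composite index is only a suggestion; the question states that single-column indexes already exist, so skip whatever is already in place.

```sql
-- Hypothetical index names. The composite index may help the
-- per-call min/max lookup; whether it does is something to measure.
create index trunks_call_id_trunk_id_idx on trunks (call_id, trunk_id);
create index trunks_trunk_id_idx on trunks (trunk_id);
create index calls_call_id_idx on calls (call_id);

-- Refresh planner statistics after bulk loads or index changes.
analyze calls;
analyze trunks;

-- Compare candidate queries on a small call_id range first.
explain analyze
select c.call_id
      ,min(t.trunk_id) as orig_trunk_id
      ,max(t.trunk_id) as dest_trunk_id
  from calls c
  join trunks t on (t.call_id = c.call_id)
 where c.call_id between 1 and 3
 group by c.call_id;
```

Running `explain analyze` on each variant over the same range gives actual timings and shows whether the planner uses the indexes.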

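What to do with the result set depends on the nature of the application. Is this a one-off? Then why not just create a new table from the result set: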

CREATE TABLE trunk_summary AS
  SELECT ...

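Is the data constantly changing? Is it accessed frequently? Would simply creating a view be enough? Or perhaps an update has to be performed based on the result set. Maybe one range can be updated at a time. It really depends, but this might be a start.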
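If a view turns out to be enough, it could be defined roughly as follows. This is a sketch: the view name `trunk_summary_v` is hypothetical, and the query assumes trunk_id is unique in trunks, as it is in the sample data.

```sql
-- Hypothetical view name; wraps the min/max join shown earlier.
create view trunk_summary_v as
select x.call_id
      ,x.dialed_number
      ,t1.trunk as orig_trunk
      ,t2.trunk as dest_trunk
  from (select c.call_id, c.dialed_number
              ,min(t.trunk_id) as orig_trunk_id
              ,max(t.trunk_id) as dest_trunk_id
          from calls c
          join trunks t on (t.call_id = c.call_id)
         group by c.call_id, c.dialed_number
       ) x
  join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
  join trunks t2 on (t2.trunk_id = x.dest_trunk_id);

-- Example lookup for a single call.
select * from trunk_summary_v where call_id = 2;
```

A plain view recomputes the join on every access, so it suits data that changes often and is read selectively; for frequent full scans, a table built with CREATE TABLE ... AS may be the better trade-off.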

【Comments】:

    【Solution 2】:

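    Here is the final code, with the test condition left in as comments. The subqueries are very efficient and fast. However, testing showed that partitioning the table had a greater impact on execution time than the efficiency of the subqueries. On a table with 1 million rows, the update takes 80 seconds. On a table with 12 million rows, the update takes 580 seconds.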

    update calls1900
       set orig_trunk = a.orig_trunk
          ,dest_trunk = a.dest_trunk
      from (select x.call_id
                  ,t1.trunk as orig_trunk
                  ,t2.trunk as dest_trunk
              from (select calls1900.call_id
                          ,min(t.trunk_id) as orig_trunk_id
                          ,max(t.trunk_id) as dest_trunk_id
                      from calls1900
                      join trunks t on (t.call_id = calls1900.call_id)
                      -- where calls1900.call_id between 43798930 and 43798950
                     group by calls1900.call_id
                   ) x
              join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
              join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
           ) a
     where (calls1900.call_id = a.call_id); -- and (calls1900.call_id between 43798930 and 43798950)
    

    【Comments】:
