How to reduce query execution time for a table with huge data: answers

【Question title】: How to reduce query execution time for table with huge data
【Posted】: 2013-10-08 19:24:32
【Question description】:

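I am running this query in production (Oracle) and it takes more than 3 minutes. Is there any way to reduce the execution time? The svc_order and event tables each contain nearly 1 million records.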

select 0 test_section, count(1) count, 'DD' test_section_value  
from svc_order so, event e  
where so.svc_order_id = e.svc_order_id  
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')  
  and e.event_type = 230 and e.event_level = 'O'  
  and e.current_sched_date between 
      to_date( '09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')  
  and (((so.sots_ta = 'N') and (so.action_type = 0)) 
       or  ((so.sots_ta is null) and (so.action_type = 0)) 
       or  ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL'
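The three OR branches in the query are easier to reason about as a truth table over (sots_ta, action_type). The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, on the assumption that SQLite's three-valued NULL logic for =, IS NULL, AND and OR behaves the same way for this predicate. The sample values 'Y' and 1 are hypothetical. It shows that exactly three combinations pass, and that the two sots_ta = 'N' branches can be merged without changing the result.

```python
import sqlite3
from itertools import product

# Hypothetical sample domains: 'N', some other flag 'Y', and NULL for sots_ta;
# 0, some other value 1, and NULL for action_type.
TA_VALUES = ["N", "Y", None]
AT_VALUES = [0, 1, None]

# The question's predicate, with the "so." alias dropped.
ORIGINAL = """
    ((sots_ta = 'N') AND (action_type = 0))
 OR ((sots_ta IS NULL) AND (action_type = 0))
 OR ((sots_ta = 'N') AND (action_type IS NULL))
"""

# The same predicate with the two sots_ta = 'N' branches merged.
FACTORED = """
    (sots_ta = 'N' AND (action_type IS NULL OR action_type = 0))
 OR (sots_ta IS NULL AND action_type = 0)
"""


def passing_pairs(predicate: str) -> set:
    """Return every (sots_ta, action_type) pair that the WHERE clause keeps."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE svc_order (sots_ta TEXT, action_type INTEGER)")
    con.executemany(
        "INSERT INTO svc_order VALUES (?, ?)", product(TA_VALUES, AT_VALUES)
    )
    rows = con.execute(
        f"SELECT sots_ta, action_type FROM svc_order WHERE {predicate}"
    ).fetchall()
    con.close()
    return set(rows)


print(sorted(passing_pairs(ORIGINAL), key=repr))  # [('N', 0), ('N', None), (None, 0)]
print(passing_pairs(ORIGINAL) == passing_pairs(FACTORED))  # True
```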

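【Question discussion】:

"I can't create indexes." Why not? And 50 to 60k records is certainly not a huge amount of data.
What does Java have to do with it?
You want to drive a nail, but you can't use a hammer. That's fine (we all face absurd restrictions in real life), but you should be more explicit about the circumstances that lead you to this, so that we can give useful advice. E.g. "no iron tools are allowed because we work in a strong magnetic field".
@FlorinGhita: Not needed.
Oops, I didn't notice that he isn't actually selecting anything from the tables. That matters.

Tags: sql database performance oracle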

【Solution 1】:

Since you say you can't create indexes, I expect the query is doing full table scans on the tables. Try a parallel hint.

select /*+ full(so) parallel(so, 4) */ 0 test_section, count(1) count, 'DD' test_section_value  
from svc_order so, event e  
where so.svc_order_id = e.svc_order_id  
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')  
  and e.event_type = 230 and e.event_level = 'O'  
  and e.current_sched_date between 
      to_date( '09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')  
  and (((so.sots_ta = 'N') and (so.action_type = 0)) 
       or  ((so.sots_ta is null) and (so.action_type = 0)) 
       or  ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL'

【Discussion】:

This solution drastically reduced the query execution time. We also have one additional index. Can you help make it even better?

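【Solution 2】:

You can at least avoid the triple AND/OR list by using COALESCE() (Oracle also offers NVL()). Note: this is not equivalent when both sots_ta and action_type are NULL: the rewrite keeps such rows, while the original predicate excludes them.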

SELECT 0 test_section, count(1) count, 'DD' test_section_value
FROM svc_order so 
JOIN event e  ON so.svc_order_id = e.svc_order_id
WHERE e.event_type = 230 and e.event_level = 'O'  
  AND so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
  AND e.current_sched_date >= to_date('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
  AND e.current_sched_date  < to_date('09/30/2013 00:00:00', 'MM/DD/YYYY HH24:MI:SS') 
  AND  COALESCE(so.sots_ta, 'N') = 'N'
  AND  COALESCE(so.action_type, 0) = 0   
  AND so.company_code = 'LL'
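A quick way to see what the COALESCE rewrite changes is to evaluate both predicates over every combination of sample values. This sketch again uses SQLite (through Python's sqlite3 module) as a stand-in for Oracle, assuming COALESCE behaves there the way Oracle's COALESCE or NVL does for these two columns; the values 'Y' and 1 are hypothetical.

```python
import sqlite3
from itertools import product

TA_VALUES = ["N", "Y", None]  # hypothetical sample values for sots_ta
AT_VALUES = [0, 1, None]      # hypothetical sample values for action_type

# Original three-branch predicate (alias dropped).
ORIGINAL = """
    ((sots_ta = 'N') AND (action_type = 0))
 OR ((sots_ta IS NULL) AND (action_type = 0))
 OR ((sots_ta = 'N') AND (action_type IS NULL))
"""

# The COALESCE rewrite from this answer.
COALESCED = "COALESCE(sots_ta, 'N') = 'N' AND COALESCE(action_type, 0) = 0"


def passing_pairs(predicate: str) -> set:
    """Return every (sots_ta, action_type) pair that the WHERE clause keeps."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE svc_order (sots_ta TEXT, action_type INTEGER)")
    con.executemany(
        "INSERT INTO svc_order VALUES (?, ?)", product(TA_VALUES, AT_VALUES)
    )
    rows = con.execute(
        f"SELECT sots_ta, action_type FROM svc_order WHERE {predicate}"
    ).fetchall()
    con.close()
    return set(rows)


extra = passing_pairs(COALESCED) - passing_pairs(ORIGINAL)    # kept only by the rewrite
missing = passing_pairs(ORIGINAL) - passing_pairs(COALESCED)  # kept only by the original
print(extra, missing)  # {(None, None)} set()
```

The only difference is the row where both columns are NULL, which the rewrite keeps and the original drops.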

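I replaced the BETWEEN with a plain t >= low AND t < high test, because I don't like the semantics of BETWEEN, and I replaced the comma-separated FROM list with a JOIN, because I prefer explicit joins.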
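Whether a half-open range matches the original BETWEEN depends on which exclusive upper bound is used. The sketch below uses Python datetime values as a hypothetical stand-in for the column. An exclusive bound of 09/30/2013 keeps exactly the same whole-second values as BETWEEN ... 23:59:59 (whole seconds are all an Oracle DATE can hold), and also covers fractional seconds, which would only matter if current_sched_date were a TIMESTAMP. An exclusive bound of 10/01/2013 additionally pulls in all of September 30.

```python
from datetime import datetime

LOW = datetime(2010, 9, 1)
HIGH_INCLUSIVE = datetime(2013, 9, 29, 23, 59, 59)  # upper end of the question's BETWEEN


def in_between(t: datetime) -> bool:
    """Closed range, as in: col BETWEEN low AND high."""
    return LOW <= t <= HIGH_INCLUSIVE


def in_half_open(t: datetime, exclusive_high: datetime) -> bool:
    """Half-open range, as in: col >= low AND col < high."""
    return LOW <= t < exclusive_high


NEXT_DAY = datetime(2013, 9, 30)   # day after the last included day
MONTH_END = datetime(2013, 10, 1)  # first day of the following month

samples = [
    datetime(2013, 9, 29, 23, 59, 59),          # last whole second of Sept 29
    datetime(2013, 9, 29, 23, 59, 59, 500000),  # only representable in a TIMESTAMP
    datetime(2013, 9, 30, 12, 0, 0),            # midday on Sept 30
]
for t in samples:
    print(t, in_between(t), in_half_open(t, NEXT_DAY), in_half_open(t, MONTH_END))
```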

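【Discussion】:

Thanks, but I don't see much difference in execution time.
No wonder. It's basically the same query. We can't really help you when you don't answer the questions we asked in the comments on your question.

【Solution 3】:

We can't have additional indexes, but the tables must at least have a complete primary key, right? That should at least give you an index (clustered, non-clustered, whatever). Look at it and try to use it.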

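If the table is a heap and we want to work with it as it is, then we should reduce the number of rows from each table separately by applying the corresponding WHERE filters, and then combine those result sets. In your query, the only result column that depends on the base tables is count(1); the other two columns are constants. Since a JOIN (Cartesian product, etc.) makes the database engine look for indexes, use INTERSECT instead, which I think should work better in your case.

Some other changes you could make: avoid using TO_DATE, or any kind of function, on the right-hand side of the WHERE-condition columns; prepare the values in local variables and use those variables in the query. You should also check whether using >= gives a better performance gain than BETWEEN.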
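Swapping the join for INTERSECT is not only a performance change: INTERSECT returns distinct svc_order_id values, so COUNT over it counts orders, while COUNT(1) over the join counts one row per matching (order, event) pair. The two only agree when every order has at most one qualifying event. The sketch below shows this with hypothetical rows in SQLite (via Python's sqlite3), used here as a stand-in for Oracle.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE svc_order (svc_order_id INTEGER PRIMARY KEY, company_code TEXT);
    CREATE TABLE event (event_id INTEGER PRIMARY KEY, svc_order_id INTEGER, event_type INTEGER);
    INSERT INTO svc_order VALUES (1, 'LL'), (2, 'LL'), (3, 'XX');
    -- Order 1 has two qualifying events, order 2 has one, order 3 fails the company_code filter.
    INSERT INTO event VALUES (10, 1, 230), (11, 1, 230), (12, 2, 230), (13, 3, 230);
    """
)

# COUNT over the join: one row per matching (order, event) pair.
join_count = con.execute(
    """
    SELECT COUNT(*) FROM svc_order so
    JOIN event e ON so.svc_order_id = e.svc_order_id
    WHERE so.company_code = 'LL' AND e.event_type = 230
    """
).fetchone()[0]

# Same join, but counting each order only once.
distinct_join_count = con.execute(
    """
    SELECT COUNT(DISTINCT so.svc_order_id) FROM svc_order so
    JOIN event e ON so.svc_order_id = e.svc_order_id
    WHERE so.company_code = 'LL' AND e.event_type = 230
    """
).fetchone()[0]

# Filter each table on its own, then INTERSECT the key sets (duplicates removed).
intersect_count = con.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT svc_order_id FROM svc_order WHERE company_code = 'LL'
        INTERSECT
        SELECT svc_order_id FROM event WHERE event_type = 230
    )
    """
).fetchone()[0]

print(join_count, distinct_join_count, intersect_count)  # 3 2 2
```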

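I have modified the query and also merged a redundant WHERE condition. Keep in mind that even if this change works for you now, that doesn't mean it will keep working. As soon as your tables accumulate more data matching these WHERE conditions, this will turn back into a slow query. So this may help in the short term, but in the long run you have to consider alternative options: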

1) For example, indexed views (materialized views, in Oracle) on top of these tables
2) Create copies of the same tables under different names and keep the new and original tables in sync using insert/update/delete triggers.
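The second option, a copy of the table kept in sync by triggers, can be sketched in SQLite through Python's sqlite3 module. This is a minimal illustration under assumptions: the table svc_order_copy, its index and the trigger names are hypothetical, the column list is trimmed, and the point of the copy is presumably that it can carry indexes the original table cannot. In Oracle these would be FOR EACH ROW triggers referencing :NEW and :OLD; SQLite triggers are always row-level and use NEW and OLD without the colon.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE svc_order (svc_order_id INTEGER PRIMARY KEY, company_code TEXT, sots_ta TEXT);

    -- Hypothetical shadow copy; unlike the original, it can carry its own indexes.
    CREATE TABLE svc_order_copy (svc_order_id INTEGER PRIMARY KEY, company_code TEXT, sots_ta TEXT);
    CREATE INDEX svc_order_copy_cc ON svc_order_copy (company_code, sots_ta);

    CREATE TRIGGER svc_order_ai AFTER INSERT ON svc_order BEGIN
        INSERT INTO svc_order_copy VALUES (NEW.svc_order_id, NEW.company_code, NEW.sots_ta);
    END;

    CREATE TRIGGER svc_order_au AFTER UPDATE ON svc_order BEGIN
        UPDATE svc_order_copy
           SET svc_order_id = NEW.svc_order_id,
               company_code = NEW.company_code,
               sots_ta      = NEW.sots_ta
         WHERE svc_order_id = OLD.svc_order_id;
    END;

    CREATE TRIGGER svc_order_ad AFTER DELETE ON svc_order BEGIN
        DELETE FROM svc_order_copy WHERE svc_order_id = OLD.svc_order_id;
    END;
    """
)


def snapshot(table: str) -> list:
    """All rows of a table in primary-key order (the table name is trusted here)."""
    return con.execute(f"SELECT * FROM {table} ORDER BY svc_order_id").fetchall()


# Ordinary DML against the original table; the triggers mirror it into the copy.
con.execute("INSERT INTO svc_order VALUES (1, 'LL', 'N')")
con.execute("INSERT INTO svc_order VALUES (2, 'LL', NULL)")
con.execute("UPDATE svc_order SET sots_ta = 'Y' WHERE svc_order_id = 1")
con.execute("DELETE FROM svc_order WHERE svc_order_id = 2")

print(snapshot("svc_order_copy"))  # [(1, 'LL', 'Y')]
```

Keep in mind that every insert, update and delete on the original table now also pays for the trigger and for maintaining the copy's index.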




    -- Note: INTERSECT returns distinct svc_order_id values, so this counts
    -- orders, not joined (order, event) rows as the original COUNT(1) does.
    SELECT COUNT(1) AS count, 'DD' test_section_value, 0 test_section
    FROM
    (
        SELECT  so.svc_order_id
        FROM    svc_order so
        WHERE   so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
                AND so.company_code = 'LL'
                -- The sots_ta/action_type filter belongs in this branch: the alias "so"
                -- is not visible in the event branch (Oracle would raise ORA-00904).
                AND ( 
                        (( so.sots_ta = 'N' ) AND ( so.action_type IS NULL OR so.action_type = 0))
                        OR 
                        (( so.sots_ta IS NULL ) AND ( so.action_type = 0 )) 
                        --or ((so.sots_ta = 'N') and (so.action_type is null))  (merged into the first branch)
                    )

        INTERSECT

        SELECT  e.svc_order_id
        FROM    event e
        WHERE   e.event_type = 230
                AND e.event_level = 'O'
                AND e.current_sched_date BETWEEN
                    to_date('09/01/2010 00:00:00','MM/DD/YYYY HH24:MI:SS')
                    AND to_date('09/29/2013 23:59:59','MM/DD/YYYY HH24:MI:SS')
    ) qry1

【Discussion】:

【Solution 4】:

First, make sure the statistics are up to date.

begin
    dbms_stats.gather_table_stats('[schema]', 'svc_order');
    dbms_stats.gather_table_stats('[schema]', 'event');
end;
/

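This query is a very simple join between two small tables, but with complicated predicates. You almost certainly don't want to significantly rewrite the whole query in search of some magic syntax that makes everything run fast. Yes, in rare cases BETWEEN doesn't work well, or moving predicates into an inline view helps, or replacing a join with INTERSECT might help. But that sounds like cargo-cult programming to me. Ask yourself: why would these changes make any difference? If these kinds of changes always improved performance, why wouldn't Oracle simply transform the query internally?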

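In general, you should try to give the optimizer better information so that it makes better decisions. Often this is as simple as gathering statistics with the default settings. Some predicates are too complex for that, and for those you should try dynamic sampling, e.g. /*+ dynamic_sampling(6) */. Or maybe add some histograms. Or maybe add an expression statistic like this: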

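SELECT 
    DBMS_STATS.CREATE_EXTENDED_STATS(null, 'SVC_ORDER',
        '(((sots_ta = ''N'') and (action_type = 0)) 
        or  ((sots_ta is null) and (action_type = 0)) 
        or  ((sots_ta = ''N'') and (action_type is null)))'
    ) 
FROM DUAL;
--Quotes inside the string literal are doubled, and the columns are unqualified
--because the extension is defined on SVC_ORDER itself.
--Depending on the Oracle version, a boolean condition may be rejected as an extension;
--a column group such as '(SOTS_TA, ACTION_TYPE)' is the usual alternative for correlated columns.
--Don't forget to re-gather statistics after this.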

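The optimizer is probably underestimating the number of rows and using a nested loops join instead of a hash join. After you give it more information, ideally it will start using a hash join on its own. But at some point, after you have tried the above and possibly many other features, you can simply tell it which join to use. That would be @Florin Ghita's suggestion, /*+use_hash(so e)*/.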
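The difference between the two join methods is easiest to see in a toy implementation. The Python sketch below is a textbook illustration, not how Oracle implements either join, and the rows are hypothetical. Without a usable index on the inner table, a nested loop compares every pair of rows, so its cost grows with the product of the two row counts; a hash join reads each input once. If the optimizer underestimates the row counts, that product looks small and nested loops look cheap.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows: orders are (svc_order_id, company_code), events are (event_id, svc_order_id).
ORDERS = [(1, "LL"), (2, "LL"), (3, "XX")]
EVENTS = [(10, 1), (11, 1), (12, 2), (13, 4)]


def nested_loop_join(outer, inner, outer_key, inner_key):
    """Compare every outer row with every inner row: len(outer) * len(inner) comparisons."""
    return [(o, i) for o in outer for i in inner if outer_key(o) == inner_key(i)]


def hash_join(build, probe, build_key, probe_key):
    """Hash the build input once, then look up each probe row: about len(build) + len(probe) steps."""
    buckets = defaultdict(list)
    for b in build:
        buckets[build_key(b)].append(b)
    return [(b, p) for p in probe for b in buckets.get(probe_key(p), [])]


order_id = itemgetter(0)        # join key on the order side
event_order_id = itemgetter(1)  # join key on the event side

nl = nested_loop_join(ORDERS, EVENTS, order_id, event_order_id)
hj = hash_join(ORDERS, EVENTS, order_id, event_order_id)
print(sorted(nl) == sorted(hj), len(nl))  # True 3
```

Both produce the same (order, event) pairs; only the amount of work differs, which is why the optimizer's row estimates decide which one it picks.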

【Discussion】: