【Title】:SQL - What is the performance impact of having multiple CASE statements in SELECT - Teradata
【Posted】:2012-08-14 07:34:12
【Question】:

I have a query that needs a bunch of CASE statements in the SELECT. This was not the original design, but part of a compromise.

The query looks like this:

SELECT
  CONT.TABLE.FINC_ACCT_NM,
  CONT.TABLE.FINC_ACCT_ID,
  CONT.TABLE.CURR_END_OF_PERD_ACTL_VAL,
  CONT.TABLE.PREV_END_OF_PERD_ACTL_VAL,
  CONT.TABLE.VARNC_PLAN_VAL,
  CONT.TABLE.OUTLOOK_BDGT_PLAN_VAL,
  CONT.TABLE.PERD_END_RPT_DT,
  CONT.TABLE.PLAN_VERS_NM,
  CONT.TABLE.FRMT_ACTL_CD,
  CONT.TABLE.FRMT_PLAN_CD,
  CONT.TABLE.RPT_PERD_TYPE_CD,
  CASE
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Net Interest Income'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Non Interest Income'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Non-Interest Expense'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Total Marketing Expense'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Total Operating Expense'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Pre-Provision Earnings (before tax)'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Net Charge-offs'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Other'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN '  Allowance Build (Release)'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Provision Expense'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Pretax Income'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Tax Expense'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'NIAT'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'EPS'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Ending Loans - HFI'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'avg' THEN 'Average Loans - HFI'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'avg' THEN 'Average Earning Assets'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Ending Deposits'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'avg' THEN 'Average Deposits'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'NIM on Loans'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Revenue Margin'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'AC579' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Charge off rate'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Efficiency ratio'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'ROA'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'ROE'
    WHEN CONT.TABLE.FINC_ACCT_ID = 'XXXX' AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Return on Allocated Capital (ROAC)'
    ELSE CONT.TABLE.FINC_ACCT_NM
  END
FROM
  CONT.TABLE
WHERE
  (
    (
      CONT.TABLE.PERD_END_RPT_DT = (
        SELECT Max(Perd_END_RPT_DT)
        FROM CONT.TABLE
        WHERE VERS_NM = 'Actual'
          AND RPT_PERD_TYPE_CD = 'Q'
          AND DATA_VLDTN_IND = 'Y'
      )
      AND RPT_PERD_TYPE_CD = 'Q'
      AND DATA_VLDTN_IND = 'Y'
    )
    OR
    (
      CONT.TABLE.PERD_END_RPT_DT = (
        SELECT Max(Perd_END_RPT_DT)
        FROM CONT.TABLE
        WHERE VERS_NM = 'Actual'
          AND RPT_PERD_TYPE_CD = 'M'
          AND DATA_VLDTN_IND = 'Y'
      )
      AND RPT_PERD_TYPE_CD = 'M'
      AND DATA_VLDTN_IND = 'Y'
    )
  )
  AND CONT.TABLE.DATA_VLDTN_IND = 'Y'
  AND CONT.TABLE.FINC_ACCT_ID IN ('AC0006470', 'AC8000199', 'AC8002145', 'AC0006586', 'AC8000094')
  AND CONT.TABLE.DEPT_ID = 'OR80637'

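My question is: what would the performance impact be of changing all of these CASE statements to a direct column reference?

In other words: if I replaced the whole CASE expression with a plain column name and removed every CASE branch from the query, would performance change significantly, and why?

I am testing this to determine whether performance is affected, but I am just as interested in the WHY (the technical details behind it).

Thanks for your help!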

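【Comments】:

  • Why not use a JOIN? In any case, the way to "find out" is to run the different queries and look at the execution plans. (This should be done before asking such an open-ended question on SO; a little more information would make the question either unnecessary or more focused and interesting.)
  • Why not change the select clause to SELECT CONT.TABLE.FINC_ACCT_NM and see how it compares? If there is little or no difference, you will know you need to look elsewhere.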
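
The comparison suggested in the comments can be sketched roughly as follows. In Teradata, prefixing a request with EXPLAIN prints the optimizer's plan without running the query, so the two plans can be read side by side. The column list and filter are trimmed for illustration, the alias DISPLAY_NM is hypothetical, and CONT.TABLE is the placeholder name used in the question (it must be replaced with the real table name before this will run).

```sql
-- Variant A: plain column reference, no CASE expression.
EXPLAIN
SELECT CONT.TABLE.FINC_ACCT_ID,
       CONT.TABLE.FINC_ACCT_NM
FROM CONT.TABLE
WHERE CONT.TABLE.DEPT_ID = 'OR80637';

-- Variant B: same rows, but the name is derived through a CASE expression.
-- DISPLAY_NM is an illustrative alias, not a column from the question.
EXPLAIN
SELECT CONT.TABLE.FINC_ACCT_ID,
       CASE
         WHEN CONT.TABLE.FINC_ACCT_ID = 'AC579'
          AND CONT.TABLE.BAL_TYPE_CD = 'EOP' THEN 'Charge off rate'
         ELSE CONT.TABLE.FINC_ACCT_NM
       END AS DISPLAY_NM
FROM CONT.TABLE
WHERE CONT.TABLE.DEPT_ID = 'OR80637';
```

If both plans show the same retrieve steps over the table, any remaining difference comes from per-row expression evaluation rather than from how the data is read.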

Tags: sql performance teradata


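【Answer 1】:

The CASE statement will have much less impact than the joins in the WHERE clause.

The primary driver of SQL performance is I/O, that is, reading data from disk. I think of it as being about two orders of magnitude more important than the per-row processing. This is only a heuristic, not based on specific tests of any database.

You are doing a self-join (the subqueries read the same table again), which requires either a lot of work to read the table or a lot of work to process an index.

The CASE statement, on the other hand, turns into very primitive hardware operations: equality comparisons, jumps, and the like. The data already sits in the memory closest to the processor, so it is processed quickly. You are not doing anything fancy in the CASE statement (such as a LIKE or a subquery). I would guess that the query would be just as fast if you removed most of the branches from the statement.

If you have performance problems, put an index on (VERS_NM, RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Perd_END_RPT_DT). This four-part index should let you get the maximum date without issuing I/O requests against the original table.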
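
A minimal sketch of that index in Teradata syntax, where the column list comes before ON. The index name MAX_PERD_IDX is hypothetical, and CONT.TABLE is again the placeholder from the question. Whether the optimizer actually answers the MAX subquery from the index alone is an assumption to verify with EXPLAIN, since it depends on the table's statistics and primary index.

```sql
-- Hypothetical secondary index covering the columns used by the MAX subqueries.
CREATE INDEX MAX_PERD_IDX
  (VERS_NM, RPT_PERD_TYPE_CD, DATA_VLDTN_IND, PERD_END_RPT_DT)
ON CONT.TABLE;

-- Check the plan of one subquery to see whether the index is used.
EXPLAIN
SELECT MAX(PERD_END_RPT_DT)
FROM CONT.TABLE
WHERE VERS_NM = 'Actual'
  AND RPT_PERD_TYPE_CD = 'Q'
  AND DATA_VLDTN_IND = 'Y';
```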

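【Comments】:

    【Answer 2】:

    EDIT: You can actually refactor the two subqueries into a JOIN, which may well be faster anyway. It also removes a lot of duplication!

    This does not really concern the query's performance (@Gordon has already covered that well), but that giant CASE statement looks like a maintenance nightmare. A better way to handle it might be to turn it into a table: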

    CREATE TABLE ACCT_DISPLAY_NAME (
        FINC_ACCT_ID CHAR(10),
        BAL_TYPE_CD  CHAR(3),
        DISPLAY_NAME VARCHAR(100)
    );
    
    -- Teradata places the column list before ON.
    CREATE INDEX ACCT_DISPLAY_INDEX (
        FINC_ACCT_ID,
        BAL_TYPE_CD
    ) ON ACCT_DISPLAY_NAME;
    
    INSERT INTO ACCT_DISPLAY_NAME VALUES
    ('AC99800'  , 'EOP', '  Net Interest Income'               ),
    ('AC12993'  , 'EOP', '  Non Interest Income'               ),
    ('AC667999' , 'EOP', 'Non-Interest Expense'                ),
    ('AC996587' , 'EOP', '  Total Marketing Expense'           ),
    ('AC659986' , 'EOP', '  Total Operating Expense'           ),
    ('AC69678'  , 'EOP', 'Pre-Provision Earnings (before tax)' ),
    ('AC09994'  , 'EOP', '  Net Charge-offs'                   ),
    ('AC20977'  , 'EOP', '  Other'                             ),
    ('AC19979'  , 'EOP', '  Allowance Build (Release)'         ),
    ('AC7094'   , 'EOP', 'Provision Expense'                   ),
    ('AC6997'   , 'EOP', 'Pretax Income'                       ),
    ('AC0994'   , 'EOP', 'Tax Expense'                         ),
    ('AC9999'   , 'EOP', 'NIAT'                                ),
    ('AC7990'   , 'EOP', 'EPS'                                 ),
    ('AC9995'   , 'EOP', 'Ending Loans - HFI'                  ),
    ('AC9995'   , 'avg', 'Average Loans - HFI'                 ),
    ('AC2991'   , 'avg', 'Average Earning Assets'              ),
    ('AC2999'   , 'EOP', 'Ending Deposits'                     ),
    ('AC9999'   , 'avg', 'Average Deposits'                    ),
    ('AC0379'   , 'EOP', 'NIM on Loans'                        ),
    ('AC6999'   , 'EOP', 'Revenue Margin'                      ),
    ('AC579'    , 'EOP', 'Charge off rate'                     ),
    ('AC5899'   , 'EOP', 'Efficiency ratio'                    ),
    ('AC629'    , 'EOP', 'ROA'                                 ),
    ('AC359'    , 'EOP', 'ROE'                                 ),
    ('AC619'    , 'EOP', 'Return on Allocated Capital (ROAC)'  );
    

    Then write a LEFT JOIN against it (because your CASE has an ELSE), something like:

    SELECT T.FINC_ACCT_NM,
           T.FINC_ACCT_ID,
           T.CURR_END_OF_PERD_ACTL_VAL,
           T.PREV_END_OF_PERD_ACTL_VAL,
           T.VARNC_PLAN_VAL,
           T.OUTLOOK_BDGT_PLAN_VAL,
           T.PERD_END_RPT_DT,
           T.PLAN_VERS_NM,
           T.FRMT_ACTL_CD,
           T.FRMT_PLAN_CD,
           T.RPT_PERD_TYPE_CD,
           COALESCE(N.DISPLAY_NAME, T.FINC_ACCT_NM)
    
    FROM CONT.TABLE T
    JOIN (
        SELECT RPT_PERD_TYPE_CD, DATA_VLDTN_IND, Max(Perd_END_RPT_DT) AS PERD_END_RPT_DT
        FROM CONT.TABLE
        WHERE VERS_NM='Actual'
          AND DATA_VLDTN_IND='Y'
        GROUP BY RPT_PERD_TYPE_CD, DATA_VLDTN_IND
    ) AS MAX_DATES
      ON T.RPT_PERD_TYPE_CD = MAX_DATES.RPT_PERD_TYPE_CD
     AND T.DATA_VLDTN_IND   = MAX_DATES.DATA_VLDTN_IND 
     AND T.PERD_END_RPT_DT  = MAX_DATES.PERD_END_RPT_DT 
    
    LEFT JOIN ACCT_DISPLAY_NAME N
      ON T.FINC_ACCT_ID = N.FINC_ACCT_ID
     AND T.BAL_TYPE_CD  = N.BAL_TYPE_CD
    
    WHERE T.DEPT_ID = 'OR80637'
    
      AND T.RPT_PERD_TYPE_CD IN ('Q', 'M')
    
      AND T.FINC_ACCT_ID IN (
        'AC0006470',
        'AC8000199',
        'AC8002145',
        'AC0006586',
        'AC8000094'
      )
    

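    【Comments】:

    • This has the added benefit that any other process applying the same logic through the table picks up changes automatically when new records are added, instead of someone having to search through 1000 stored procedures for the ones that might need changing. That matters all the more because this looks like data that changes fairly often.
    • @Ben, I just edited the question, and actually refactored most of it after noticing some duplication.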