【Question title】: SQL Oracle script optimization
【Posted】: 2026-01-14 21:20:03
【Question】:

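I have a table TRANSACTIONS containing nearly 30 million transactions (13 columns). How can I optimize the following code? I tried a self join, but it does not seem very effective.

Logic: if receiver_2 exists, I want the last transaction per sender-receiver_2 pair, otherwise per sender-receiver pair, plus some statistics computed over the preceding 10/30/90 days.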

SELECT T.* FROM
(SELECT T.*, row_number() over (partition by T.SENDER, (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END) order by T.DATE_ACCEPT desc) as seqnum 
FROM 
(
SELECT T.*
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) CNT_10
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_30
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_90 
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
             AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_10
      ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_30
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_90
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) MAX_AMOUNT_10
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_30
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_90
FROM TRANSACTIONS T
) T ) T
WHERE T.SEQNUM = 1

I have also created an index on (SENDER, DATE_ACCEPT).

Query plan

TABLE EXAMPLE

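【Comments】:

  • Please add the query plan obtained with the index in place.
  • There may be a copy-paste error in your SQL: your subqueries always test WHEN T.RECEIVER_2 IS NULL THEN, but WHEN T2.RECEIVER_2 IS NULL THEN should probably be used as well.
  • Could you provide the desired result (based on your sample data)? Please provide table data as text, not as a screenshot; see meta.*.com/q/285551

Tags: sql oracle optimization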


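【Answer 1】:

Are you familiar with the windowing clause of analytic functions?

I don't understand the logic of your query, but I think it can be written without any self join. Take a look at this query; it may be a starting point: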

SELECT 
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS CNT_10,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS CNT_30,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS CNT_90,
    AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS AVG_30,
    AVG(NVL2(T.RECEIVER_2, T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS AVG_90
FROM TRANSACTIONS T

Note that RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW is equivalent to RANGE INTERVAL '10' DAY PRECEDING.

Another note: when I ran your query against the sample data, I got

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID_TRAN|SENDER|RECEIVER|RECEIVER_2|AMOUNT|AMOUNT_2|DATE_ACCEPT        |CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|SEQNUM|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1      |00010 |22222   |1112      |3000  |1000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|1      |00010 |22222   |2114      |3000  |2000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|2      |01236 |45872   |          |4000  |        |01.04.2021 22:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|3      |45872 |00010   |          |5000  |        |17.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This result does not look meaningful.

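【Comments】:

  • I tried using analytic functions, but the average was computed incorrectly; I will try following your example. The sample table is only a fragment meant to show what the data looks like, not something to check the calculations against.
  • "Calculate some statistics (10/30/90 days)" is not very precise. But window functions are designed for exactly this kind of calculation.
  • This is a good use of the windowing_clause (not sure who downvoted or why). Unfortunately it fails with ORA-30487 when computing a count distinct. However, if the transactions table has a unique ID_TRAN, @SunMoon, count distinct may be overkill.
  • What the OP's example actually implements is not AND CURRENT ROW but AND CURRENT ROW EXCLUDE TIES (because he only considers transactions with a lower date, not the last transaction itself). That is why nulls appear in the results for the sample data. I am not sure whether this is intentional or just how it happened to be implemented. EXCLUDE TIES requires 21c. @SunMoon see also the related note in my answer.
  • @MarmiteBomber, using RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '0.001' SECOND PRECEDING should also work.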
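
Combining the windowing approach with the suggestion in the last comment gives roughly the following. This is only a sketch under stated assumptions: DATE_ACCEPT is taken to be a DATE (one-second granularity, hence an upper bound of INTERVAL '1' SECOND rather than 0.001 seconds), ID_TRAN is taken to be unique so that a plain COUNT is enough, and the grouping key is NVL(RECEIVER_2, RECEIVER) as in the answer. Note that the lower bound of a RANGE window is inclusive, whereas the original subqueries use a strict ">" comparison. Only the 10-day window is shown; the 30- and 90-day columns would follow the same pattern.

```sql
-- Sketch only: assumes DATE_ACCEPT is a DATE and ID_TRAN is unique.
SELECT *
FROM (
  SELECT t.*,
         -- Rows from 10 days back up to, but not including, the current timestamp
         COUNT(t.id_tran) OVER (
           PARTITION BY t.sender, NVL(t.receiver_2, t.receiver)
           ORDER BY t.date_accept
           RANGE BETWEEN INTERVAL '10' DAY PRECEDING
                     AND INTERVAL '1' SECOND PRECEDING) AS cnt_10,
         AVG(NVL2(t.receiver_2, t.amount_2, t.amount)) OVER (
           PARTITION BY t.sender, NVL(t.receiver_2, t.receiver)
           ORDER BY t.date_accept
           RANGE BETWEEN INTERVAL '10' DAY PRECEDING
                     AND INTERVAL '1' SECOND PRECEDING) AS avg_amount_10,
         MAX(NVL2(t.receiver_2, t.amount_2, t.amount)) OVER (
           PARTITION BY t.sender, NVL(t.receiver_2, t.receiver)
           ORDER BY t.date_accept
           RANGE BETWEEN INTERVAL '10' DAY PRECEDING
                     AND INTERVAL '1' SECOND PRECEDING) AS max_amount_10,
         -- Latest transaction per sender/receiver pair gets seqnum = 1
         ROW_NUMBER() OVER (
           PARTITION BY t.sender, NVL(t.receiver_2, t.receiver)
           ORDER BY t.date_accept DESC) AS seqnum
  FROM transactions t
)
WHERE seqnum = 1;
```

Because the upper bound ends before the current timestamp, rows sharing the latest DATE_ACCEPT are excluded from the window, which mirrors the "<" comparison in the question.
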
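【Answer 2】:

The main problem in the query is the CASE expression in the predicate. It prevents any index from being used. So you need a virtual column: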

ALTER TABLE Transactions ADD rec AS (
     CASE WHEN RECEIVER_2 IS NULL 
     THEN RECEIVER ELSE RECEIVER_2 END
);

The second step is to create an index that includes this column:

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept)

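However, the index may still not be used because of the way the query is written. Replace the CASE expressions with the newly created column rec, and rewrite the greatest-per-group logic as a self join. I have added a simplified SQL example to illustrate how.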

select t.*,
    (
           select count(DISTINCT T2.id_tran)
           from transactions T2
           where T2.date_accept > T.date_accept - 10
                 AND T2.date_accept < T.date_accept
                 AND T2.rec = T.rec
                 AND T2.sender = T.sender
    ) CNT_10
from (
    select sender, rec, max(date_accept) as date_accept
    from transactions
    group by sender, rec
) tmax 
join transactions t on t.sender = tmax.sender and
                       t.rec = tmax.rec and
                       t.date_accept = tmax.date_accept

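If you want the statistics subqueries to be really fast, you can also include the other columns they use in the index (drop the previous index first, since the name is reused):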

CREATE INDEX ix_transactions_sender_rec 
    ON Transactions(sender, rec, date_accept, id_tran, amount)

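【Comments】:

  • The problem is that receiver_2 can take the place of receiver. I need to keep them separate because the totals will differ significantly. Receiver is the head of the organization, receiver_2 is an employee.
  • @SunMoon I understand that you need to keep them separate. My suggestion adds a virtual column, which simplifies the query and enables index usage. I am not suggesting that you get rid of the receiver_2 or receiver attributes.
  • I see. Thanks, I will try it.
【Answer 3】: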

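An index on (SENDER, DATE_ACCEPT) may help.

You can simplify and speed up the query by using a LATERAL JOIN with conditional aggregation.
It allows more than one COUNT/AVG/MAX to be calculated in the same subquery.

For example: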

SELECT T.*, LT.*
FROM (
  SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
  FROM (
    SELECT SENDER
    , RECEIVER, RECEIVER_2
    , DATE_ACCEPT
    , AMOUNT, AMOUNT_2 
    , ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT DESC) AS RN
    FROM TRANSACTIONS
  ) TRANS
  WHERE RN = 1
) T
CROSS JOIN LATERAL (
  SELECT 
    COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_10
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_30
  , COUNT(DISTINCT
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN T2.ID_TRAN
    END) AS CNT_90
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_10
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_30
  , NVL(AVG(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS AVG_AMOUNT_90
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_10
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_30
  , NVL(MAX(
    CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 
          AND T2.DATE_ACCEPT < T.DATE_ACCEPT
    THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END
    END), 0) AS MAX_AMOUNT_90
  FROM TRANSACTIONS T2
  WHERE T2.SENDER = T.SENDER 
    AND T2.DATE_ACCEPT > T.DATE_ACCEPT - 90
    AND NVL(T2.RECEIVER_2, T2.RECEIVER) = NVL(T.RECEIVER_2, T.RECEIVER)
) LT;
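
+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+
|SENDER|RECEIVER|RECEIVER_2|DATE_ACCEPT|AMOUNT|AMOUNT_2|CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|
+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+
|1     |2       |3         |30-MAR-21  |10    |20      |1     |2     |3     |11.2         |21.65        |45.5         |11.2         |32.1         |93.2         |
+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+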

Demo on dbfiddle here

【Comments】:

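    【Answer 4】:

    Your receiver / receiver_2 logic only makes the query confusing; it is not the cause of the performance problem.

    You will run into the same problem with the simple model of SENDER, RECEIVER, AMOUNT and DATE_ACCEPT that I use in my examples; adapt it to your purposes.

    First you should understand what causes the problem.

    You are joining a large transaction table with its own history, which produces a huge intermediate result, and only then aggregating it to calculate the aggregate measures.

    The key idea is to aggregate first and join to the transaction table only as a second step.

    The query below first calculates max_date_accept for each sender/receiver pair, and in the next step uses it to calculate the aggregate measures over the history window (for example a 10-day window; adjust as needed).

    Note that I replicated your logic of ignoring the last transaction in the calculation by adding the predicate DATE_ACCEPT < max_date_accept.

    If there is only one transaction in the calculated interval, this will lead to a NULL result, which may not be what you want.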

    with trans as (
    select 
     ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
     max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
    from TRANSACTIONS t
    )
    select 
      SENDER, RECEIVER,
      count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
      avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
      max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
    from trans  
    group by SENDER, RECEIVER; 
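
One way the query above might be adapted to the receiver_2 logic and all three windows is sketched below. It is not the answerer's code: the names rec, amt and days_back are illustrative, the grouping key NVL(RECEIVER_2, RECEIVER) is an assumption taken from the other answers, and DATE_ACCEPT is assumed to be a DATE so that subtracting two values yields a number of days.

```sql
-- Sketch only: rec, amt and days_back are illustrative names, not real columns.
-- Assumes DATE_ACCEPT is a DATE, so date subtraction returns a number of days.
WITH trans AS (
  SELECT id_tran,
         sender,
         NVL(receiver_2, receiver)          AS rec,
         NVL2(receiver_2, amount_2, amount) AS amt,
         -- Distance in days from the latest transaction of the same group
         MAX(date_accept) OVER (PARTITION BY sender, NVL(receiver_2, receiver))
           - date_accept                    AS days_back
  FROM transactions
)
SELECT sender,
       rec,
       -- days_back > 0 skips the latest transaction itself, as in the question
       COUNT(DISTINCT CASE WHEN days_back > 0 AND days_back < 10 THEN id_tran END) AS cnt_10,
       COUNT(DISTINCT CASE WHEN days_back > 0 AND days_back < 30 THEN id_tran END) AS cnt_30,
       COUNT(DISTINCT CASE WHEN days_back > 0 AND days_back < 90 THEN id_tran END) AS cnt_90,
       AVG(CASE WHEN days_back > 0 AND days_back < 10 THEN amt END) AS avg_amount_10,
       AVG(CASE WHEN days_back > 0 AND days_back < 30 THEN amt END) AS avg_amount_30,
       AVG(CASE WHEN days_back > 0 AND days_back < 90 THEN amt END) AS avg_amount_90,
       MAX(CASE WHEN days_back > 0 AND days_back < 10 THEN amt END) AS max_amount_10,
       MAX(CASE WHEN days_back > 0 AND days_back < 30 THEN amt END) AS max_amount_30,
       MAX(CASE WHEN days_back > 0 AND days_back < 90 THEN amt END) AS max_amount_90
FROM trans
GROUP BY sender, rec;
```

Keeping the window conditions inside the CASE expressions rather than in a WHERE clause means that groups with no earlier transactions still appear, with counts of 0 and NULL averages and maxima.
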
    

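    Perhaps this result is already what you want, but if you really need the complete set of column values from the transaction table for the most recent transaction of each group, simply join the aggregated result to the transaction table: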

    with trans as (
    select 
     ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
     max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
    from TRANSACTIONS t
    ),
    agg as (
    select 
      SENDER, RECEIVER,
      count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
      avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
      max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
    from trans  
    group by SENDER, RECEIVER),
    trans2 as (
    select
     t.*,
     row_number() over (partition by SENDER, RECEIVER order by DATE_ACCEPT desc) as seqnum
    from TRANSACTIONS t)
    select
     trans2.*,
     agg.CNT_10, agg.AVG_AMOUNT_10, agg.MAX_AMOUNT_10
    from trans2
    join agg on trans2.SENDER = agg.SENDER and trans2.RECEIVER = agg.RECEIVER
    where seqnum = 1; 
    

    Performance note: check the query's execution plan.

    You should see only TABLE ACCESS FULL and HASH JOIN. Queries of this type often have problems if they use NESTED LOOPS, FILTER, or INDEX ACCESS joins.
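
A minimal sketch of how the plan could be inspected, assuming a session with access to the standard DBMS_XPLAN package; the query shown is only a stand-in for the full statement. EXPLAIN PLAN reports the optimizer's estimated plan, while DISPLAY_CURSOR reports the plan actually used by the last statement executed in the session.

```sql
-- Estimated plan: replace the SELECT with the statement under test
EXPLAIN PLAN FOR
SELECT sender, receiver, MAX(date_accept) AS max_date_accept
FROM transactions
GROUP BY sender, receiver;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Actual plan with row counts: run the statement with the hint,
-- then query DISPLAY_CURSOR in the same session right afterwards
SELECT /*+ GATHER_PLAN_STATISTICS */ sender, receiver, MAX(date_accept) AS max_date_accept
FROM transactions
GROUP BY sender, receiver;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

In the output, the Operation column is where TABLE ACCESS FULL, HASH JOIN, NESTED LOOPS and FILTER steps show up.
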

    【Comments】: