[Posted]: 2021-06-22 04:22:05
[Question]:
I have a query that uses ROW_NUMBER(). It contains something like this:
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
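The query runs very fast. However, as soon as I add any reference to the "rnum" column, the query slows down. So ROW_NUMBER() by itself doesn't seem to be the problem, but when I actually use "rnum" in the query, it crawls along for about 30 seconds.
Any ideas?
For reference, here is the query: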
WITH aquire AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT d.trans_id, d.source, 'AquireMedia' AS provider,
d.trans_time AS publish_date, '/research/get_news.php?id=' || d.trans_id AS story_link,
i.name AS industry_name, s.sector_name, d.headline AS subject, NULL AS teaser,
NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(d.trans_id,'AQUIREMEDIA') AS tickers,
ROW_NUMBER() OVER (PARTITION BY d.trans_id ORDER BY d.trans_time DESC) as rtnum
FROM story_descriptions_3m d, story_tickers_3m t, uber_master_mv m, industry i, ind_sector ix, sectors s, comp_ind c
WHERE d.trans_id = t.trans_id
AND t.m_ticker = m.m_ticker
AND t.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1),
partner AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT CAST(n.story_id AS VARCHAR2(20)) trans_id, n.provider AS source, 'Partner News' AS provider,
n.story_date AS publish_date, n.link AS story_link, i.name AS industry_name, s.sector_name, n.title AS subject,
CAST(substr(n.teaser,1,4000) AS VARCHAR2(4000)) AS teaser, NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(n.story_id,'OTHER') AS tickers,
ROW_NUMBER() OVER (PARTITION BY n.story_id ORDER BY n.story_date DESC) as rtnum
FROM news_stories_3m n, news_stories_lookup_3m t, comp_ind c, uber_master_mv m, industry i, ind_sector ix, sectors s
WHERE t.story_id = n.story_id
AND t.ticker = m.ticker
AND m.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1)
SELECT trans_id, source, provider,
TO_CHAR(publish_date,'MM/DD/YYYY HH24:MI:SS') AS publish_date,
UNIX_TIMESTAMP(publish_date) AS timestamp,
story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers,
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM aquire WHERE rtnum <= 5
UNION ALL
SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM partner WHERE rtnum <= 5))
WHERE rnum BETWEEN 1 AND 1 * 5;
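To make the shape of this query easier to follow, here is a scaled-down sketch of the same pattern, run in SQLite (3.25 or later, for window functions) rather than Oracle. The `aquire_src` and `partner_src` tables and their rows are hypothetical stand-ins for the real joins. Each source is deduplicated per story with `ROW_NUMBER() OVER (PARTITION BY ...)` and `rtnum = 1`. The two sources are then combined with UNION ALL and ranked globally with `ROW_NUMBER() OVER (ORDER BY publish_date DESC)`, and the first page is kept with `rnum BETWEEN 1 AND 5`.

```python
import sqlite3

# Hypothetical toy stand-ins for the two news sources in the question.
# SQLite >= 3.25 is needed for ROW_NUMBER() OVER (...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aquire_src  (trans_id TEXT, publish_date TEXT, subject TEXT);
CREATE TABLE partner_src (trans_id TEXT, publish_date TEXT, subject TEXT);
""")
# Duplicate trans_ids mimic the fan-out produced by joining to ticker tables.
conn.executemany("INSERT INTO aquire_src VALUES (?, ?, ?)", [
    ("a1", "2021-06-20 10:00", "A1 old"),
    ("a1", "2021-06-21 09:00", "A1 new"),
    ("a2", "2021-06-19 08:00", "A2"),
    ("a3", "2021-06-22 01:00", "A3"),
])
conn.executemany("INSERT INTO partner_src VALUES (?, ?, ?)", [
    ("p1", "2021-06-21 12:00", "P1"),
    ("p1", "2021-06-18 12:00", "P1 old"),
    ("p2", "2021-06-20 15:00", "P2"),
    ("p3", "2021-06-17 11:00", "P3"),
])

QUERY = """
WITH aquire AS (
  SELECT trans_id, publish_date, subject FROM (
    SELECT trans_id, publish_date, subject,
           ROW_NUMBER() OVER (PARTITION BY trans_id
                              ORDER BY publish_date DESC) AS rtnum
    FROM aquire_src)
  WHERE rtnum = 1),                 -- keep only the newest row per story
partner AS (
  SELECT trans_id, publish_date, subject FROM (
    SELECT trans_id, publish_date, subject,
           ROW_NUMBER() OVER (PARTITION BY trans_id
                              ORDER BY publish_date DESC) AS rtnum
    FROM partner_src)
  WHERE rtnum = 1)
SELECT trans_id, publish_date, subject FROM (
  SELECT trans_id, publish_date, subject,
         ROW_NUMBER() OVER (ORDER BY publish_date DESC) AS rnum
  FROM (SELECT * FROM aquire UNION ALL SELECT * FROM partner))
WHERE rnum BETWEEN 1 AND 5          -- first "page" of 5 stories
ORDER BY rnum                       -- makes the output order deterministic
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Two details of the original query are visible in this reduced form. First, the outer SELECT has no ORDER BY, so Oracle may return the five rows in any order; the sketch adds `ORDER BY rnum`. Second, because each CTE already keeps only `rtnum = 1`, the later `WHERE rtnum <= 5` filters are always true.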
[Discussion]:
-
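The optimizer is very good at making trivial changes that can save a lot of time. If you define rnum as row_number(...) (or anything else) in a subquery but never reference it in the main query, the optimizer simply ignores rnum in the subquery. You are not imagining things; what you noticed is the optimizer doing its job. Now perhaps you will change your question to: "Assuming I really do need rnum, is there a way to make the query faster?" The answer is "possibly, but it depends on what your query is doing", and you haven't told us that. -
It's a fairly long query. I can post it; maybe someone will have an idea.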
-
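In its current form your question is almost too broad. If you have a performance problem, you should run EXPLAIN on the query above and find the bottleneck. Once you've done that, Stack Overflow is a good place to ask for suggestions on improving the query's performance. -
Performance is fine as long as I don't reference the "rnum" column. But if I reference "rnum", whether in a condition or just by returning it as a column, the query is very slow.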
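In Oracle, "running EXPLAIN" usually means `EXPLAIN PLAN FOR <query>` followed by `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)`, and comparing the plans with and without `rnum` would show where the extra time goes. The sketch below only illustrates the same idea in SQLite, using `EXPLAIN QUERY PLAN` from Python on a hypothetical `stories` table. SQLite's optimizer and plan format are entirely different from Oracle's, so treat this as a demonstration of the workflow, not of what Oracle will report.

```python
import sqlite3

# Hypothetical table; SQLite stands in for Oracle purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (trans_id TEXT, publish_date TEXT)")
conn.execute("CREATE INDEX stories_date_idx ON stories (publish_date)")

query = """
SELECT trans_id FROM (
  SELECT trans_id, ROW_NUMBER() OVER (ORDER BY publish_date DESC) AS rnum
  FROM stories)
WHERE rnum BETWEEN 1 AND 5
"""

# EXPLAIN QUERY PLAN returns one row per plan step; the last column is a
# human-readable description (table scans, index use, temp B-trees, ...).
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])
```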
-
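If you want to "return it as a column", then no further optimization is possible: Oracle already has all the information, and it optimizes better than we can. If you reference rnum in a condition in the main query (for example, something like where rnum = 1), then the query could be rewritten some other way, without row_number(), to get the same result with better performance; but exactly how to do that depends on what the query has to do, and again, we don't know what that is.
Tags: oracle row-number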
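One common rewrite of a "first N rows" filter replaces the ROW_NUMBER() filter with a sort plus a row limit. The sketch below shows the two forms returning the same rows. It runs in SQLite and assumes a hypothetical `stories` table that already holds one row per story. Oracle 12c and later express the limit as `FETCH FIRST 5 ROWS ONLY`; older releases typically filter on ROWNUM around an ordered subquery. Whether this is actually faster in Oracle depends on the plan it produces, so it should be checked rather than assumed.

```python
import sqlite3

# Hypothetical table of already-deduplicated stories (one row per trans_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (trans_id TEXT PRIMARY KEY, publish_date TEXT)")
conn.executemany("INSERT INTO stories VALUES (?, ?)", [
    ("s1", "2021-06-15"), ("s2", "2021-06-22"), ("s3", "2021-06-18"),
    ("s4", "2021-06-20"), ("s5", "2021-06-16"), ("s6", "2021-06-21"),
    ("s7", "2021-06-19"),
])

# Top-N via ROW_NUMBER(), the pattern used in the question.
with_row_number = [r[0] for r in conn.execute("""
    SELECT trans_id FROM (
      SELECT trans_id, ROW_NUMBER() OVER (ORDER BY publish_date DESC) AS rnum
      FROM stories)
    WHERE rnum <= 5
    ORDER BY rnum
""")]

# Same result without ROW_NUMBER(): sort, then stop after 5 rows.
# (Oracle 12c+ would use FETCH FIRST 5 ROWS ONLY instead of LIMIT 5.)
with_limit = [r[0] for r in conn.execute("""
    SELECT trans_id FROM stories
    ORDER BY publish_date DESC
    LIMIT 5
""")]

print(with_row_number)
print(with_limit)
```

If publish_date can contain ties, the two forms may pick different rows among the tied ones unless the ORDER BY also includes a unique tiebreaker such as trans_id.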