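【Posted】: 2011-06-17 09:04:43
【Question】:
I am trying to optimize a query that contains a cross join. I have a large query, which I then cross join with a derived table.
Would turning the derived table into a view speed up the query? Or even capturing this information in a permanent table?
Here is my query: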
SELECT VIEWER_ID,
       QUESTION_ID,
       ANSWER_ID,
       sum(ANSWER_SCORE) AS ANSWER_SCORE_SUMMED
FROM (SELECT cr.COMMUNICATIONS_ID AS ANSWER_ID,
             cr.CONSUMER_ID as VIEWER_ID,
             nc.PARENT_COMMUNICATIONS_ID AS QUESTION_ID,
             case when cr.CONSUMER_ID= nc.SENDER_CONSUMER_ID then 3*((24/(((UNIX_TIMESTAMP(NOW())-UNIX_TIMESTAMP(cal.LAST_MOD_TIME)+3600)/3600))*(ces.EXPERT_SCORE * cirm.CONSUMER_RATING) + (12.5 * scs.SIMILARITY)* (1 - EXP(-0.5 * (cal.TIPS_AMOUNT / ATV.AVG_TIPS)) + .15)))
                  else ((24/(((UNIX_TIMESTAMP(NOW())-UNIX_TIMESTAMP(cal.LAST_MOD_TIME)+3600)/3600))*(ces.EXPERT_SCORE * cirm.CONSUMER_RATING) + (12.5 * scs.SIMILARITY)* (1 - EXP(-0.5 * (cal.TIPS_AMOUNT / ATV.AVG_TIPS)) + .15)))
             end as ANSWER_SCORE
      FROM (SELECT 234 AS CONSUMER_ID,
                   ACTION_LOG_ID,
                   COMMUNICATIONS_ID
            FROM consumer_action_log
            WHERE COMM_TYPE_ID=4) AS cr
      JOIN network_communications AS nc
        ON cr.COMMUNICATIONS_ID=nc.COMMUNICATIONS_ID
      JOIN consumer_action_log AS cal
        ON cr.ACTION_LOG_ID=cal.ACTION_LOG_ID
      JOIN communication_interest_mapping AS cim
        ON nc.PARENT_COMMUNICATIONS_ID=cim.COMMUNICATION_ID
      JOIN consumer_interest_rating_mapping AS cirm
        ON cr.CONSUMER_ID=cirm.CONSUMER_ID
       AND cim.CONSUMER_INTEREST_EXPERT_ID=cirm.CONSUMER_INTEREST_ID
      JOIN consumer_expert_score AS ces
        ON nc.SENDER_CONSUMER_ID=ces.CONSUMER_ID
       AND cim.CONSUMER_INTEREST_EXPERT_ID=ces.CONSUMER_EXPERT_ID
      JOIN survey_customer_similarity AS scs
        ON (cr.CONSUMER_ID=scs.CONSUMER_ID_2
            AND cal.SENDER_CONSUMER_ID=scs.CONSUMER_ID_1)
        OR (cr.CONSUMER_ID=scs.CONSUMER_ID_1
            AND cal.SENDER_CONSUMER_ID=scs.CONSUMER_ID_2)
      CROSS JOIN
      (
        SELECT AVG(cal.TIPS_AMOUNT) AS AVG_TIPS
        FROM CONSUMER_ACTION_LOG AS cal
        JOIN (SELECT 234 AS CONSUMER_ID,
                     ACTION_LOG_ID,
                     COMMUNICATIONS_ID
              FROM consumer_action_log
              WHERE COMM_TYPE_ID=4) AS cr
          ON cal.SENDER_CONSUMER_ID=cr.consumer_id
      ) ATV) AS ASM
GROUP BY ANSWER_ID
ORDER BY ANSWER_SCORE_SUMMED DESC;
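It is a long query, so there is no need to read all of it. The point is simply that it contains a cross join. I am new to SQL, but I have been told that cross joins slow things down.
【Comments】: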
-
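Cross joins slow things down because they return a great many records. Suppose your cross join is a table with 100 records joined to a dataset with 1,000 records: the result set will have 100,000 records, which obviously takes more time than returning the original 1,000. But if you need the data, you need the data.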
-
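Agreed. Another reason is that a cross join can prevent the RDBMS engine from optimizing the query as a whole. I have seen Oracle's CBO struggle with cross joins. (Usually a cross join also means there is a design problem ;)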
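The row-count arithmetic in the first comment (100 x 1,000 = 100,000) can be checked with a small sketch. It also covers the shape used in the question, where the cross-joined derived table is an aggregate with no GROUP BY and therefore yields exactly one row. Everything here is an assumption made for illustration: the tables `small_t` and `big_t` and their columns are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for MySQL, so this shows row counts only and says nothing about how MySQL would plan or time the real query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: 100 rows and 1,000 rows.
cur.execute("CREATE TABLE small_t (id INTEGER)")
cur.execute("CREATE TABLE big_t (id INTEGER, tips REAL)")
cur.executemany("INSERT INTO small_t VALUES (?)", [(i,) for i in range(100)])
cur.executemany(
    "INSERT INTO big_t VALUES (?, ?)",
    [(i, float(i % 10)) for i in range(1000)],
)

# Case 1: cross join of two multi-row tables multiplies the row counts.
multiplied = cur.execute(
    "SELECT COUNT(*) FROM big_t CROSS JOIN small_t"
).fetchone()[0]

# Case 2: cross join with a single-row aggregate (AVG without GROUP BY),
# the same shape as the ATV derived table in the question.
single_row = cur.execute(
    "SELECT COUNT(*) "
    "FROM big_t CROSS JOIN (SELECT AVG(tips) AS avg_tips FROM big_t) AS atv"
).fetchone()[0]

print(multiplied, single_row)
conn.close()
```

In the second case the cross join only attaches the average to every row and the row count is unchanged, so the blow-up described in the first comment applies when both sides have many rows. Whether the derived table in the question is expensive for other reasons (for example, how often it is evaluated) would need to be checked with `EXPLAIN` on the actual MySQL server.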
Tags: sql mysql join cross-join sql-optimization