【Posted】: 2015-02-15 02:26:03
【Question】:
I created two tables on HBase through Phoenix.
One is ORIGIN_LOG and the other is ORIGIN_LOG_INDEX.
In ORIGIN_LOG the key is info_key. In ORIGIN_LOG_INDEX the key is (log_t, zone).
We store log_t, zone and info_key in ORIGIN_LOG_INDEX, so that we can quickly look up info_key by log_t and zone. With the info_key we can then fetch the detailed log record from ORIGIN_LOG, because info_key is the key of ORIGIN_LOG.
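To make the intended access pattern concrete, here is a rough sketch of the two-step lookup written as two separate queries. The key values in the second statement ('<info_key_1>', '<info_key_2>') are hypothetical placeholders for whatever the first statement returns, not real data, and the column list is shortened for readability.

```sql
-- Step 1: range lookup on the index table by its key (log_t, zone)
SELECT DISTINCT "info_key"
FROM "ORIGIN_LOG_INDEX"
WHERE "log_t" >= '1423956600'
  AND "log_t" <  '1423956601'
  AND "zone"  =  '18';

-- Step 2: point lookups on the main table by its key (info_key).
-- The literals below are placeholders for the keys returned by step 1.
SELECT "log_t", "app_ver", "device_id", "user_id"
FROM "ORIGIN_LOG"
WHERE "info_key" IN ('<info_key_1>', '<info_key_2>');
```

The subquery form below is meant to do both steps in a single statement.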
But when we run EXPLAIN on the SQL below, we find that it does a full scan over ORIGIN_LOG.
explain select "log_t", "app_ver", "device_id", "mobage_uid", "param1","param2","param3", "param4" , "param5", "user_id", "a_typ", "a_tar", "a_rst" from "ORIGIN_LOG" where "info_key" in (select distinct "info_key" from "ORIGIN_LOG_INDEX" where "log_t">='1423956600' and "log_t"<'1423956601' and "zone" ='18')
CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER ORIGIN_LOG
CLIENT MERGE SORT |
| SKIP-SCAN-JOIN TABLE 0 |
| CLIENT 2-CHUNK PARALLEL 2-WAY SKIP SCAN ON 2 RANGES OVER ORIGIN_LOG_INDEX [0,'1423956600','18'] - [1,'1423956601','18'] |
| SERVER FILTER BY FIRST KEY ONLY |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [info_key] |
| CLIENT MERGE SORT |
| DYNAMIC SERVER FILTER BY info_key IN ($5.$7) |
If we query ORIGIN_LOG alone, with the conditions on log_t and zone, as follows:
select "log_t", "app_ver", "device_id", "mobage_uid", "param1","param2","param3", "param4" , "param5", "user_id", "a_typ", "a_tar", "a_rst" from "ORIGIN_LOG" where "log_t">='1423956600' and "log_t"<'1423956601' and "zone" ='18';
we also get a full scan:
CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER ORIGIN_LOG |
| SERVER FILTER BY (log_t >= '1423956600' AND log_t < '1423956601' AND zone = '18') |
| CLIENT MERGE SORT |
So what is the difference between the two SQL statements, and which one performs better?
Thanks.
BR
【Discussion】: