【Question title】: Hive join select very slow
【Posted】: 2020-09-22 10:49:38
【Question】:

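Hello, I have two tables, user_info and ip_location; one has 50,000 rows and the other 100,000. I need to look up the location of each IP in the user table by converting the IP to an integer and comparing it against the intervals in ip_location.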
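
The IP-to-integer conversion described above is plain base-256 arithmetic on the four octets, the same calculation the queries below spell out in SQL. A minimal Python sketch of it follows; the function name `ip_to_int` is hypothetical and does not appear in the original post.

```python
def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its integer value."""
    a, b, c, d = (int(part) for part in ip.split("."))
    # Each octet is one base-256 digit, most significant first.
    return a * 256**3 + b * 256**2 + c * 256 + d


print(ip_to_int("1.2.3.4"))  # 16909060
```

Note that each of the four terms uses a different octet, which is what the split index in the SQL versions has to reflect.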

My Hive version is 3.0.0, and this version has no indexes.

ip_location: [image not preserved]

This operation is very fast in PostgreSQL:

set search_path=res;
select * from(
select ip,
(split_part(ip,'.',1)::bigint*256*256*256
+split_part(ip,'.',2)::bigint*256*256
+split_part(ip,'.',3)::bigint*256
+split_part(ip,'.',4)::bigint)::int8 as ipvalue
from user_info) t1
left join ip_location t2 on 
ipv4_val_begin=(select max(ipv4_val_begin) from ip_location where ipv4_val_begin <= ipvalue);
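
The join condition above picks, for each ipvalue, the ip_location row with the largest ipv4_val_begin that does not exceed it. As a rough illustration of that lookup only (not of how PostgreSQL executes it), here is a Python sketch using a binary search over sorted start values. The sample rows and the `lookup` helper are hypothetical and purely illustrative.

```python
from bisect import bisect_right

# Hypothetical sample rows (ipv4_val_begin, location_country), sorted by begin value.
# These values are illustrative only, not taken from the real ip_location table.
ranges = [(0, "A"), (100, "B"), (200, "C")]
begins = [begin for begin, _ in ranges]


def lookup(ipvalue: int):
    """Return the row with the largest ipv4_val_begin <= ipvalue, or None."""
    i = bisect_right(begins, ipvalue) - 1
    return ranges[i] if i >= 0 else None


print(lookup(150))  # (100, 'B')
```

With the start values sorted, each lookup takes logarithmic time, whereas a join on a pure inequality has no equality key to partition on.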

But I could not find an equivalent for this syntax in Hive:

select ip,
t2.location_country,
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[1] as bigint)*256*256
+cast(split(ip,"\\.")[2] as bigint)*256
+cast(split(ip,"\\.")[3] as bigint) as ipvalue
from source.v_dm_vip_user t1
left join res.ip_location t2 on
ipv4_val_begin=(select max(ipv4_val_begin) from res.ip_location where ipv4_val_begin <= ipvalue);

Error: [image not preserved]

After changing it to the following SQL, the query succeeds, but it is very slow and takes about one day:

select ip,
t2.location_country,
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[1] as bigint)*256*256
+cast(split(ip,"\\.")[2] as bigint)*256
+cast(split(ip,"\\.")[3] as bigint) as ipvalue
from source.v_dm_vip_user t1
left join res.ip_location t2 on
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[1] as bigint)*256*256
+cast(split(ip,"\\.")[2] as bigint)*256
+cast(split(ip,"\\.")[3] as bigint) > ipv4_val_begin
and
cast(split(ip,"\\.")[0] as bigint)*256*256*256
+cast(split(ip,"\\.")[1] as bigint)*256*256
+cast(split(ip,"\\.")[2] as bigint)*256
+cast(split(ip,"\\.")[3] as bigint) < ipv4_val_end;

Is there a better, faster SQL query? I have tried many times without success. Thanks.

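【Comments】:

  • Do you have a Hive version that supports the WITH clause?
  • The WITH clause does not speed it up; it still takes about a day, while Postgres takes only 20 seconds.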

Tags: sql hadoop hive hiveql


【Answer 1】:

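I tried views and row group indexes, but they did not speed things up much. How can an IP address range lookup like this be made faster in Hive? Hive on Spark is slow as well.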

【Comments】:
