【Question title】: Graceful HiveQL query
【Posted】: 2018-03-07 01:16:18
【Question】:

I have a file like this:

232404812.913232|1248|ip:tcp:jxta
232404812.913238|66|ip:udp:data
232404812.913615|98|ip:udp:l2tp:ppp:ip:tcp

I ran the following HiveQL commands:

CREATE EXTERNAL TABLE b_packet (timestamp string, packet_length int, protocol string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|" 
LOCATION 's3://b-file/input/'; 

CREATE EXTERNAL TABLE b_packet_out (protocol string, cnt int) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" 
LOCATION 's3://b-file/output/1/'; 

INSERT OVERWRITE TABLE b_packet_out SELECT 'overall', 
COUNT(*) FROM b_packet GROUP BY protocol; 

INSERT INTO TABLE b_packet_out SELECT 'tcp', 
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:tcp'; 

INSERT INTO TABLE b_packet_out SELECT 'udp', 
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:udp'; 

INSERT INTO TABLE b_packet_out SELECT 'icmp', 
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:icmp'; 

This leaves the following in the output table:

hive> select * from b_packet_out;
OK
udp 2241
overall 10000
icmp    64
tcp 7633

Is there a more elegant way to write this in HiveQL, so that I get the same output with fewer lines?

【Comments】:

    Tags: sql hadoop hive hiveql


    【Solution 1】:
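    -- rlike takes a regular expression, so the '^' anchor works here (like would need 'ip:tcp%')
    select
    count(*) as overall,
    sum(if(protocol rlike '^ip:tcp', 1, 0)) as tcp,
    sum(if(protocol rlike '^ip:udp', 1, 0)) as udp,
    sum(if(protocol rlike '^ip:icmp', 1, 0)) as icmp
    from b_packet;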
    

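    This produces the same counts in a single pass over the data, returned as four columns of one row rather than as four rows.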
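
    The single-pass query above returns one row, whereas b_packet_out expects (protocol, cnt) rows. A minimal sketch of one way to reshape it, assuming Hive's built-in map() and explode() used through LATERAL VIEW; the aliases agg and t and the cast to int are illustrative choices, not part of the original answer:

    ```sql
    -- Aggregate once, then turn the four columns into four (protocol, cnt) rows.
    INSERT OVERWRITE TABLE b_packet_out
    SELECT t.protocol, CAST(t.cnt AS INT) AS cnt   -- b_packet_out.cnt is declared int
    FROM (
      SELECT
        count(*)                                 AS overall,
        sum(if(protocol rlike '^ip:tcp', 1, 0))  AS tcp,
        sum(if(protocol rlike '^ip:udp', 1, 0))  AS udp,
        sum(if(protocol rlike '^ip:icmp', 1, 0)) AS icmp
      FROM b_packet
    ) agg
    -- explode() on a map emits one row per key/value pair
    LATERAL VIEW explode(map(
      'overall', agg.overall,
      'tcp',     agg.tcp,
      'udp',     agg.udp,
      'icmp',    agg.icmp
    )) t AS protocol, cnt;
    ```

    The inner query still scans b_packet only once; the explode step works on the single aggregated row.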

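    If you have more protocols, you could also write SELECT split(protocol, ':')[1], count(*) FROM b_packet GROUP BY split(protocol, ':')[1], but that will not give you the overall count.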
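
    If the overall count is needed alongside the per-protocol counts, one option is to append it with UNION ALL. This is only a sketch and not part of the original answer: the subquery alias p and the column name proto are made up for illustration, it assumes a Hive version that accepts a top-level UNION ALL (0.13 or later), and it reads b_packet twice, so it gives up the single-pass property.

    ```sql
    -- One row per second-level protocol (tcp, udp, icmp, ...)
    SELECT p.proto AS protocol, count(*) AS cnt
    FROM (
      SELECT split(protocol, ':')[1] AS proto   -- 'ip:tcp:jxta' -> 'tcp'
      FROM b_packet
    ) p
    GROUP BY p.proto
    UNION ALL
    -- Plus a single row for the total number of packets
    SELECT 'overall' AS protocol, count(*) AS cnt
    FROM b_packet;
    ```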

    【Comments】:

      【Solution 2】:

      Here is a different solution, but it makes multiple passes over the data and does not really save you any lines of code:

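      -- Note: GROUPING__ID numbering changed in Hive 2.3.0; from that release the grand-total
      -- row of a single-key grouping reports 1, not 0, so check this test on your version.
      SELECT          CASE WHEN GROUPING__ID = 0 THEN 'overall' ELSE 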
                              CASE WHEN protocol LIKE 'ip:tcp%' THEN 'tcp'
                                   WHEN protocol LIKE 'ip:udp%' THEN 'udp'
                                   WHEN protocol LIKE 'ip:icmp%' THEN 'icmp'   END  END    AS protocol 
                      , COUNT(1)                                                           AS cnt 
      FROM            b_packet  
      GROUP BY        CASE WHEN protocol LIKE 'ip:tcp%' THEN 'tcp'
                           WHEN protocol LIKE 'ip:udp%' THEN 'udp'
                           WHEN protocol LIKE 'ip:icmp%' THEN 'icmp'   END
      GROUPING SETS   (
                          (CASE WHEN protocol LIKE 'ip:tcp%' THEN 'tcp'
                                WHEN protocol LIKE 'ip:udp%' THEN 'udp'
                                WHEN protocol LIKE 'ip:icmp%' THEN 'icmp'  END)
                          , () 
                      ) 
      
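
      The CASE expression above is repeated three times. A possible variation, sketched here under assumptions rather than taken from the answer, computes it once in a subquery and uses WITH ROLLUP for the total. The alias p, the column name proto, and the 'other' bucket are hypothetical; 'other' is there so that a NULL key can only come from the rollup row, which avoids depending on GROUPING__ID.

      ```sql
      SELECT   coalesce(p.proto, 'overall') AS protocol   -- NULL key = rollup (grand total) row
             , count(1)                     AS cnt
      FROM (
        -- Classify each packet exactly once
        SELECT CASE WHEN protocol LIKE 'ip:tcp%'  THEN 'tcp'
                    WHEN protocol LIKE 'ip:udp%'  THEN 'udp'
                    WHEN protocol LIKE 'ip:icmp%' THEN 'icmp'
                    ELSE 'other'                           -- assumption: label for everything else
               END AS proto
        FROM b_packet
      ) p
      GROUP BY p.proto WITH ROLLUP;
      ```

      Unlike the query above, packets that match none of the three prefixes show up in an 'other' row instead of a NULL row.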

      【Comments】:
