【Posted】:2018-03-07 01:16:18
【Question】:
I have a file like this:
232404812.913232|1248|ip:tcp:jxta
232404812.913238|66|ip:udp:data
232404812.913615|98|ip:udp:l2tp:ppp:ip:tcp
I ran the following HiveQL commands:
CREATE EXTERNAL TABLE b_packet (timestamp string, packet_length int, protocol string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
LOCATION 's3://b-file/input/';
CREATE EXTERNAL TABLE b_packet_out (protocol string, cnt int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION 's3://b-file/output/1/';
INSERT OVERWRITE TABLE b_packet_out SELECT 'overall',
COUNT(*) FROM b_packet;
INSERT INTO TABLE b_packet_out SELECT 'tcp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:tcp';
INSERT INTO TABLE b_packet_out SELECT 'udp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:udp';
INSERT INTO TABLE b_packet_out SELECT 'icmp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:icmp';
This gives me the following in the output table:
hive> select * from b_packet_out;
OK
udp 2241
overall 10000
icmp 64
tcp 7633
Is there a more elegant way to write the HiveQL query, so that I get the same output with fewer lines?
【Discussion】:
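One possible direction, offered as an untested sketch rather than a definitive answer: tag each packet with every label it belongs to, then count per label in a single INSERT. This assumes your Hive version supports LATERAL VIEW with explode(); the aliases `tags` and `tag` are names made up for this illustration, while the tables and columns are the ones from the question.

```sql
-- Sketch: one pass over b_packet instead of four separate INSERTs.
-- Every row gets the label 'overall' plus at most one protocol label.
-- A CASE without ELSE yields NULL, and those NULL labels are filtered out.
INSERT OVERWRITE TABLE b_packet_out
SELECT tag, COUNT(*)
FROM b_packet
LATERAL VIEW explode(array(
  'overall',
  CASE WHEN protocol REGEXP '^ip:tcp'  THEN 'tcp'  END,
  CASE WHEN protocol REGEXP '^ip:udp'  THEN 'udp'  END,
  CASE WHEN protocol REGEXP '^ip:icmp' THEN 'icmp' END
)) tags AS tag
WHERE tag IS NOT NULL
GROUP BY tag;
```

One behavioral difference to keep in mind: a label with no matching packets produces no row at all here, whereas the separate INSERT statements would write a row with a count of 0.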