Hive-03-2 Common Functions

An overview of commonly used Hive functions

Viewing function details
   desc function extended from_unixtime;

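Usage of the split, coalesce, and collect_list functions in Hive (with examples)

split converts a string into an array. The second argument is the delimiter, which is treated as a regular expression.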

split('a,b,c,d' , ',') ==> ["a","b","c","d"]
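
Because the delimiter passed to split is a regular expression, delimiters that are regex metacharacters need some care. A minimal sketch, using made-up literal values:

```sql
-- Access one element by 0-based index, and count the elements.
select split('a,b,c,d', ',')[0];        -- a
select size(split('a,b,c,d', ','));     -- 4

-- '.' matches any character in a regex, so wrap it in a character class.
select split('192.168.0.1', '[.]');     -- ["192","168","0","1"]
```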

collect_list returns all values of the column as an array, without deduplication: select collect_list(id) from table_name;
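
To see the "no deduplication" behavior, collect_list can be compared with collect_set on a few inline rows. This is only a sketch: the alias t and the sample values are made up, and the order of elements inside the arrays is not guaranteed.

```sql
-- Inline sample rows; the alias t and the values are illustrative only.
select collect_list(id) as all_ids,       -- keeps duplicates, e.g. [1,1,2]
       collect_set(id)  as distinct_ids   -- removes duplicates, e.g. [1,2]
from (
  select 1 as id
  union all
  select 1 as id
  union all
  select 2 as id
) t;
```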

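Handling NULL values

NVL(expr1, expr2)

If expr1 is NULL, NVL returns expr2; otherwise it returns expr1.

The purpose of this function is to convert a NULL into an actual value. The expressions can be numeric, string, or date values, but expr1 and expr2 must have the same data type.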

hive> select nvl(1,0);
　　1
hive> select nvl(null,"hello");
　　hello

COALESCE(T v1, T v2, …) returns the first non-NULL value among its arguments; if all of them are NULL, it returns NULL.
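
A short sketch of how coalesce relates to nvl, using literal values only:

```sql
-- coalesce accepts any number of arguments and returns the first non-NULL one.
select coalesce(null, null, 'c');    -- c
select coalesce(null, 'b', 'c');     -- b

-- With two arguments it gives the same result as nvl.
select nvl(null, 'hello'), coalesce(null, 'hello');   -- hello  hello
```

coalesce is the more general choice when there are several fallback values to try in order.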

1. Date and time functions

Converting the time zone of MongoDB timestamps: from UTC to GMT+8, a difference of 8 hours

date_format(from_utc_timestamp( CONCAT_WS(' ',substring(updatetime,1,10),substring(updatetime,12,8) ) ,'GMT+8'),'yyyy-MM-dd HH:mm:ss') updatetime

select date_format(from_utc_timestamp(create_time,"UTC"),'yyyy-MM-dd HH:mm:ss') as local_time

select date_format(from_utc_timestamp(create_time,"GMT+8"),'yyyy-MM-dd HH:mm:ss') as local_time
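
The conversion can be tried on a literal value. The sketch below assumes the MongoDB field arrives as an ISO-8601 string such as '2020-12-18T09:42:30.000Z'; that sample value is made up for illustration.

```sql
-- substring(..., 1, 10) takes the date part, substring(..., 12, 8) the time part.
select concat_ws(' ', substring('2020-12-18T09:42:30.000Z', 1, 10),
                      substring('2020-12-18T09:42:30.000Z', 12, 8));
-- 2020-12-18 09:42:30

-- Shift the UTC time to GMT+8 (8 hours ahead) and format it.
select date_format(from_utc_timestamp('2020-12-18 09:42:30', 'GMT+8'),
                   'yyyy-MM-dd HH:mm:ss');
-- 2020-12-18 17:42:30
```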


Timestamps: a value in seconds (s) has 10 digits; a value in milliseconds (ms) has 13 digits.
1. Timestamp to a formatted date string
    from_unixtime(bigint unixtime, string format) converts a timestamp in seconds into the given format (format can be "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd HH", "yyyy-MM-dd HH:mm", and so on)
    select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss'); <==> select from_unixtime(unix_timestamp())
    date_format(from_unixtime(cast(h.updatetime as bigint)),'yyyy-MM-dd HH:mm:ss')
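
Since from_unixtime expects seconds, a 13-digit millisecond value has to be divided by 1000 first. A minimal sketch; the literal is a sample value, and the column updatetime_ms is hypothetical. The formatted result depends on the session time zone, so no output is shown.

```sql
-- 1608284550000 is a sample 13-digit (millisecond) timestamp.
-- Dividing by 1000 yields a double, so cast back to bigint before formatting.
select from_unixtime(cast(1608284550000 / 1000 as bigint), 'yyyy-MM-dd HH:mm:ss');

-- The same idea for a hypothetical bigint column updatetime_ms in a hypothetical table h:
-- select from_unixtime(cast(h.updatetime_ms / 1000 as bigint)) from h;
```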

  Today:     select date_format(current_timestamp,'yyyy-MM-dd')
  Yesterday: select date_sub(current_date,1);


2. Date to timestamp
select unix_timestamp();  -- returns the current UNIX timestamp: 10 digits, in seconds
select unix_timestamp('2020-12-18 12','yyyy-MM-dd HH');
select unix_timestamp('2020-12-18 09:42:30'); <==> unix_timestamp('2020-12-18 09:42:30','yyyy-MM-dd HH:mm:ss');

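to_date(string timestamp) returns the date part of a timestamp string. The argument must be a STRING, TIMESTAMP, or DATE; passing a LONG (bigint) fails with an error mentioning "STRING/TIMESTAMP/DATEWRITABLE types, got LONG".
select to_date('2020-09-10 10:31:31');   -- 2020-09-10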

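year(string date) returns the year part of a date/time string
month(string date) returns the month part of a date/time string
day(string date) returns the day of the month of a date/time string
hour(string str) returns the hour; str should include a time part, for example in yyyy-MM-dd HH:mm:ss format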
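
The four extraction functions applied to the same sample timestamp string used above:

```sql
select year('2020-09-10 10:31:31');    -- 2020
select month('2020-09-10 10:31:31');   -- 9
select day('2020-09-10 10:31:31');     -- 10
select hour('2020-09-10 10:31:31');    -- 10
```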
-----------------------------------------------------------------------------------------------------------------
1) date_format (formats a date according to a pattern)
    select date_format('2020-06-14 12:20:15','yyyy-MM-dd HH:mm:ss');
        2020-06-14 12:20:15
    select date_format('2020-06-14 12:20:15','yyyy-MM-dd HH');
        2020-06-14 12

2) date_add (adds or subtracts days)
    date_add(string startdate, int days) adds days to startdate
    date_sub(string startdate, int days) subtracts days from startdate
      select date_add('2020-06-14',-1);  -- equivalent to: select date_sub('2020-06-14',1);
        2020-06-13
      select date_add('2020-06-14',2);
        2020-06-16
3) next_day
    (1) Get the first Monday after the given day
    select next_day('2020-06-10','MO'); -- equivalent to: select next_day('2020-06-10','Monday')
        2020-06-15
        Note: the English day names are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
    (2) Get the Monday of the week that contains the given day
    select date_add(next_day('2020-06-14','MO'),-7);
        2020-06-08
4) last_day (returns the last day of the month)
    select last_day('2020-06-14');
        2020-06-30
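
These functions combine naturally to compute period boundaries. The sketch below derives the Monday and Sunday of the week containing a sample date, and the first day of its month. Note that add_months is a Hive built-in that is not covered in these notes; it is used here as an assumption about the available Hive version.

```sql
-- 2020-06-10 is a Wednesday (sample date).
select date_add(next_day('2020-06-10', 'MO'), -7) as week_monday,   -- 2020-06-08
       date_add(next_day('2020-06-10', 'MO'), -1) as week_sunday;   -- 2020-06-14

-- First day of the month: go to the last day of the previous month, then add one day.
select date_add(last_day(add_months('2020-06-14', -1)), 1);         -- 2020-06-01
```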

2. Concatenating fields

1) concat
concat returns NULL if any one of its arguments is NULL.
hive> select concat('a','b','c');
　　  abc
hive> select concat('a','b',null);
　　  NULL
hive> select concat(payid,' ',carlogid)  -- separates the two fields with ' ' =>
　　 01a893092b914703b75941b713767ebf 408693
hive> select order_id,concat(order_status,'=',operate_time)...
　　 1101 1001=2020-01-01 11:20:30
　　 1102 1002=2020-02-02 12:30:40

concat(sum(total_amount_pur),'&&',sum(total_amount_sig),'&&',sum(total_amount_jump))
　　0.0&&16665.0&&0.0
concat(substr(summary_time,9,2),'',substr(summary_time,6,2),'',substr(summary_time,1,4),'_',concat(market_id),'_' ,concat(mid))
09022019_108_0
12022019_108_0
21022019_108_0
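
Because a single NULL makes the whole concat result NULL, nullable fields are often wrapped in nvl first. A minimal sketch, where cast(null as string) stands in for a nullable column:

```sql
-- One NULL argument turns the whole result into NULL.
select concat('a', 'b', cast(null as string));             -- NULL

-- Replacing NULL with an empty string keeps the other parts.
select concat('a', 'b', nvl(cast(null as string), ''));    -- ab
```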


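2) concat_ws
concat_ws skips NULL arguments when joining strings, so it does not return NULL as long as at least one string is non-NULL. concat_ws requires a separator, passed as the first argument.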
hive> select concat_ws('-','a','b','c');
　　　 a-b-c

hive> select concat_ws('-','a','b',null);
　　　 a-b
CONCAT_WS("/",r.province,r.city,a.area) channel_address  => 北京/北京市/朝阳区 (the fields must be of type string)
 
concat_ws("_" ,substr(summary_time,9,2),substr(summary_time,6,2),substr(summary_time,1,4),concat(market_id),concat(mid))
09_03_2019_108_0
13_03_2019_108_0
21_03_2019_108_0

concat_ws('',array('a', 'b', 'c')) 
　　abc
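
Since concat_ws accepts an array of strings, it pairs well with collect_list to merge the values of a group into one delimited string. A sketch on inline sample rows; the values are made up, and the order of the collected elements is not guaranteed.

```sql
-- Inline sample rows; non-string columns would need cast(... as string) first.
select channel_type,
       concat_ws(',', collect_list(dt)) as dts   -- e.g. 2015-01-01,2015-01-02 for A
from (
  select 'A' as channel_type, '2015-01-01' as dt
  union all
  select 'A' as channel_type, '2015-01-02' as dt
  union all
  select 'B' as channel_type, '2015-01-03' as dt
) t
group by channel_type;
```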

3. Substring functions: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)
        substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: returns the part of string A from position start to the end, or the part of A that starts at position start and has length len. Positions are 1-based.
select substring('abcde',3);   select substr('abcde',3);     -- returns cde
select substring('abcde',3,2); select substr('abcde',3,2);   -- returns cd

substring(h.id, 10, 24)  => 59409d1d2cdcc90b91c62be5, where h.id is ObjectId(59409d1d2cdcc90b91c62be5)
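
The ObjectId case can be reproduced on a literal. The second query is an alternative sketch using regexp_extract, which does not depend on the length of the prefix; it assumes the id consists of 24 lowercase hex characters.

```sql
-- 'ObjectId(' is 9 characters long, so the 24-character id starts at position 10.
select substring('ObjectId(59409d1d2cdcc90b91c62be5)', 10, 24);
-- 59409d1d2cdcc90b91c62be5

-- Alternative: capture the first run of 24 hex characters.
select regexp_extract('ObjectId(59409d1d2cdcc90b91c62be5)', '([0-9a-f]{24})', 1);
-- 59409d1d2cdcc90b91c62be5
```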

4. Cumulative sums

1. Requirements analysis

Difference between group by and sum() over (partition by ...):

select  channel_type ,sum(num) from  record group by channel_type;
channel_type   　　  _c1
A                    30
B                    15
C                    19

select channel_type,dt,sum(num) over (partition by channel_type) from  record;  
select channel_type,dt,num, sum(num) over (partition by channel_type order by dt) from record;

channel_type    dt    sum_window_0
A    2015-01-02    30
A    2015-01-01    30
A    2015-01-05    30
A    2015-01-04    30
A    2015-01-02    30
A    2015-01-03    30
B    2015-01-03    15
B    2015-01-01    15
B    2015-01-02    15
B    2015-01-02    15
C    2015-02-01    19
C    2015-01-30    19
C    2015-01-30    19
C    2015-02-02    19


channel_type    dt    num    sum_window_0
A    2015-01-01    8    8
A    2015-01-02    4    17
A    2015-01-02    5    17
A    2015-01-03    2    19
A    2015-01-04    5    24
A    2015-01-05    6    30
B    2015-01-01    1    1
B    2015-01-02    9    13
B    2015-01-02    3    13
B    2015-01-03    2    15
C    2015-01-30    8    15
C    2015-01-30    7    15
C    2015-02-01    1    16
C    2015-02-02    3    19
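
In the second result, both rows of A for 2015-01-02 show 17 because, when order by is present and no frame is given, the window covers all rows up to and including every row with the same dt value. Spelling out the frame makes the difference visible. This is a sketch against the same record table; the output column aliases are made up, and for sum_rows the order among rows with an equal dt is not deterministic.

```sql
select channel_type, dt, num,
       -- default frame with order by: rows sharing the same dt are summed together
       sum(num) over (partition by channel_type order by dt) as sum_range,
       -- explicit row-based frame: a strict running total, one row at a time
       sum(num) over (partition by channel_type order by dt
                      rows between unbounded preceding and current row) as sum_rows,
       -- whole-partition total, same value on every row of the partition
       sum(num) over (partition by channel_type
                      rows between unbounded preceding and unbounded following) as sum_all
from record;
```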
