How do I get millisecond precision in Hive? Answers

【Question title】: How do I get millisecond precision in Hive?
【Posted】: 2013-09-11 09:33:53
【Question】:

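The documentation says that timestamps support the following conversion:

• Floating point numeric types: interpreted as UNIX timestamp in seconds with decimal precision

First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert it to a numeric type that retains the milliseconds? That is what I want.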

More generally, I need to do comparisons between timestamps, such as

select maxts - mints as latency from mytable

where maxts and mints are timestamp columns. Currently, this gives me a NullPointerException with Hive 0.11.0. I am able to run the query if I do something like this instead:

select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable

but this only gives second precision, not millisecond precision.

Any help is appreciated. Let me know if you need more information.

【Comments】:

Tags: hadoop timestamp hive hiveql

【Solution 1】:

If you want to work with milliseconds, don't use the unix_timestamp functions, because they treat dates as whole seconds since the epoch.

hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.

Instead, cast the JDBC-compliant timestamp to a double.
For example:

Given some tab-delimited data:

cat /user/hive/ts/data.txt :
a   2013-01-01 12:00:00.423   2013-01-01 12:00:00.433
b   2013-01-01 12:00:00.423   2013-01-01 12:00:00.733

CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';

Then you can query the difference in milliseconds between the start time (st) and the end time (et) as follows:

select 
  txt, 
  cast(
    round(
      cast((e-s) as double) * 1000
    ) as int
  ) latency 
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;
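
The same cast can also be written inline, without the subquery. The following is a minimal, untested sketch. It assumes, as the quoted documentation states, that casting a timestamp to double yields seconds since the epoch with a fractional part. The first statement uses the ts table from above. The second applies the same pattern to the mytable, maxts and mints names from the question. The alias epoch_millis is made up for illustration.

```sql
-- Latency in milliseconds, computed inline on the example table.
-- round() is applied before the integer cast because the double difference
-- may land slightly below the exact value (e.g. 9.9999 instead of 10).
select
  txt,
  cast(round((cast(et as double) - cast(st as double)) * 1000) as int) as latency
from ts;

-- The same pattern for the question's table. epoch_millis (a hypothetical alias)
-- keeps the milliseconds of a single timestamp as a whole number.
-- bigint is used because epoch milliseconds do not fit in a 32-bit int.
select
  cast(round(cast(maxts as double) * 1000) as bigint) as epoch_millis,
  cast(round((cast(maxts as double) - cast(mints as double)) * 1000) as int) as latency
from mytable;
```

If the casts behave as documented, the two sample rows should give latencies of 10 and 310.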

【Comments】:

Thanks. Saved my day :)