【Question title】: HiveQL join query - NVL not working in where clause
【Posted】: 2017-10-06 18:26:27
【Question】:

I have a HiveQL query like this:

create table JOINED as select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

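However, this query does not select rows where TABLEA.key=TABLEB.key and any of the following holds:

  1. TABLEA.attr=NULL and TABLEB.attr=NULL, or
  2. TABLEA.attr=0 and TABLEB.attr=NULL, or
  3. TABLEA.attr=NULL and TABLEB.attr=0.

None of these cases is selected. Why does this happen? Am I misunderstanding how NVL() is used?

If attr is NULL, I want it to default to 0. What is the correct query?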
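A small reproduction makes the three cases concrete. The sketch below is an assumption, not the asker's actual schema: only the column names key and attr come from the question, attr is BIGINT as stated in the comments, and the sample rows are hypothetical.

```sql
-- Hypothetical sample tables; only the names key/attr come from the question.
create table TABLEA (key int, attr bigint);
create table TABLEB (key int, attr bigint);

-- NULLs are inserted via select ... union all rather than a VALUES clause,
-- because some Hive versions reject NULL literals in VALUES.
insert into table TABLEA
select t.key, t.attr from (
  select 1 as key, cast(null as bigint) as attr union all  -- case 1
  select 2 as key, cast(0 as bigint)    as attr union all  -- case 2
  select 3 as key, cast(null as bigint) as attr union all  -- case 3
  select 4 as key, cast(5 as bigint)    as attr            -- plain match
) t;

insert into table TABLEB
select t.key, t.attr from (
  select 1 as key, cast(null as bigint) as attr union all  -- case 1
  select 2 as key, cast(null as bigint) as attr union all  -- case 2
  select 3 as key, cast(0 as bigint)    as attr union all  -- case 3
  select 4 as key, cast(5 as bigint)    as attr            -- plain match
) t;

-- The query from the question, without the create table wrapper.
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
```

By the documented meaning of NVL, all four keys should come back, since every pair compares as 0=0 or 5=5. If only key 4 is returned, the rows containing a NULL are being dropped before NVL is evaluated.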

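【Comments】:

  • Have you tried COALESCE?
  • What is the data type of the ATTR column?
  • Yes, I tried COALESCE as well. It didn't help.
  • The data type is BIGINT.
  • If you run a select on just one of the tables, what do nvl and coalesce return for the rows where you believe the attr column is NULL?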
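The check asked for in the last comment could look roughly like this. It is a sketch only; the output column aliases (attr_is_null, attr_nvl, attr_coalesce) are made up for illustration.

```sql
-- Inspect a single table, with no join involved, to confirm that attr is
-- really NULL and that nvl/coalesce substitute 0 as expected.
select key,
       attr,
       attr is null      as attr_is_null,   -- hypothetical alias
       nvl(attr, 0)      as attr_nvl,       -- hypothetical alias
       coalesce(attr, 0) as attr_coalesce   -- hypothetical alias
from TABLEA;
```

If attr_is_null is true and both replacement columns show 0 for the affected rows, the functions themselves behave correctly and the problem lies in how the join query is planned.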

Tags: null hive hiveql nvl


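【Solution 1】:

Thanks, I have just filed a bug report:
Incorrect results for INNER JOIN ON clause / WHERE involving NVL / COALESCE

If you check the execution plan, you will see that for both tables we get a wrong predicate, attr is not null. That filter runs before NVL is evaluated in the join keys, so every row with a NULL attr is discarded, which covers all three cases in the question.
Selecting columns from both tables (e.g. select TABLEA.*,TABLEB.key) seems to avoid the problem.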

explain
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:tablea 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:tablea 
          TableScan
            alias: tablea
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tableb
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
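Written out in full, the workaround mentioned above might look like the sketch below, followed by an alternative that spells out the NULL handling without NVL. Both are assumptions rather than confirmed fixes: the alias b_key is hypothetical (added so the column name key is not duplicated if the query is wrapped in create table ... as), and neither form has been verified against the affected Hive version.

```sql
-- Workaround from the answer: also select a column from TABLEB.
-- Run with explain first and look at the Filter Operator of each TableScan;
-- the predicate should mention only "key is not null".
explain
select TABLEA.*, TABLEB.key as b_key   -- b_key is a hypothetical alias
from TABLEA join TABLEB on TABLEA.key = TABLEB.key
where nvl(TABLEA.attr, 0) = nvl(TABLEB.attr, 0);

-- Alternative sketch: the same condition expressed without NVL.
-- nvl(a, 0) = nvl(b, 0) holds exactly when one of these four branches holds.
select TABLEA.*
from TABLEA join TABLEB on TABLEA.key = TABLEB.key
where TABLEA.attr = TABLEB.attr
   or (TABLEA.attr is null and TABLEB.attr is null)
   or (TABLEA.attr is null and TABLEB.attr = 0)
   or (TABLEA.attr = 0 and TABLEB.attr is null);
```

Whichever form is used, rerunning explain is the way to confirm that the spurious attr is not null predicate has disappeared from the plan.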

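【Comments】:

  • Thanks for the reply. However, the temporary workaround of selecting columns from both tables doesn't seem to fix my problem. Thanks a lot anyway!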