【Question title】: Result-set inconsistency between hive and hive-llap
【Posted】: 2020-07-30 17:51:45
【Question description】:

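We are running Hive 3.1.x clusters on HDI 4.0; one is an LLAP cluster and the other is plain Hive.

We created a managed table with 272409 rows on both clusters.

Before the merge, on both clusters: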

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-26 23:42:17.0  |
+---------------------+------------+---------------------+------------------------+------------------------+
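
The query behind these result tables is not shown in the post. A minimal sketch of a query that would produce the same column layout, assuming a hypothetical table `orders` with a key column `order_id` and a timestamp column `last_modified_date` (none of these names come from the original):

```sql
-- Hypothetical names: orders, order_id, last_modified_date.
-- Only the output column names are taken from the result tables in the post.
SELECT order_created_date,
       COUNT(order_id)          AS col_count,
       COUNT(DISTINCT order_id) AS col_distinct_count,
       MIN(last_modified_date)  AS min_lmd,
       MAX(last_modified_date)  AS max_lmd
FROM orders
GROUP BY order_created_date;
```

On a table whose key is unique, `col_count` and `col_distinct_count` should be equal, which is what makes a mismatch between them easy to spot.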

Based on the delta, we perform a merge operation that updates 17 rows.

After the merge on the hive-llap cluster (before compaction):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272392              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+

After the merge on the hive-llap cluster (after compaction):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+

After the merge on the hive-only cluster (deltas not compacted):

+---------------------+------------+---------------------+------------------------+------------------------+
| order_created_date  | col_count  | col_distinct_count  |        min_lmd         |        max_lmd         |
+---------------------+------------+---------------------+------------------------+------------------------+
| 20200615            | 272409     | 272409              | 2020-06-15 00:00:12.0  | 2020-07-27 22:52:34.0  |
+---------------------+------------+---------------------+------------------------+------------------------+

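This is the inconsistency we observed: before compaction, the hive-llap cluster reports a col_distinct_count of 272392, which is 17 fewer than the 272409 rows in the table and matches the number of rows updated by the merge.

However, after compacting the table on hive-llap, the inconsistency disappears and both clusters return the same result.

We thought it might be due to either caching or an LLAP issue, so we restarted the hive-server2 process, which clears the cache. The issue persisted.

We also created a dummy table with the same schema on the hive-only cluster and pointed its location to that of the LLAP table; queries against it produced the expected result.

We even queried the table from Spark using the **Qubole spark-acid reader** (a direct reader for Hive managed tables), which also produced the expected result.

This is strange and peculiar. Can anyone help here?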

【Question comments】:

  • This appears to be an LLAP IO issue: once LLAP IO is disabled, the result set is consistent.

Tags: hive azure-hdinsight qubole spark-hive


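【Solution 1】:

We ran into a similar problem on an HDInsight Hive LLAP cluster. Setting hive.llap.io.enabled to false resolved it.

【Comments】:

  • Yes, there seems to be a problem with the elevator IO model used in Hive-LLAP: plain ORC reads work fine, but problems appear when reading encoded data and during merges. However, running compaction immediately after the merge completes also mitigates the issue.
  • The issue is related to the map-side aggregation done in the mappers. If we set hive.map.aggr=false, we do not need to disable LLAP IO. In most cases, disabling LLAP IO rather than hive.map.aggr would be counterproductive.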
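
The three workarounds mentioned in this thread could be applied roughly as follows. This is a sketch rather than a verified fix: the table name `orders` is hypothetical, and which option is appropriate depends on the workload.

```sql
-- Option 1 (Solution 1): bypass the LLAP IO layer for the current session.
SET hive.llap.io.enabled=false;

-- Option 2 (from the comment above): keep LLAP IO enabled and
-- disable map-side aggregation instead.
SET hive.map.aggr=false;

-- Option 3: request a major compaction right after the merge.
-- `orders` is a hypothetical table name; a partitioned table also needs
-- a PARTITION (...) clause naming the partition to compact.
ALTER TABLE orders COMPACT 'major';

-- Compaction runs asynchronously; this lists queued and completed requests.
SHOW COMPACTIONS;
```

Options 1 and 2 are alternatives, not a pair: per the comment above, setting `hive.map.aggr=false` is meant to avoid having to turn off LLAP IO at all.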
【Solution 2】:

Qubole does not support Hive LLAP yet. (However, we at Qubole are evaluating whether to support it in the future.)

【Comments】:
