【发布时间】:2019-04-05 02:22:51
【问题描述】:
我们最近从 MapReduce 迁移到 TEZ,以便在 EMR 上执行 Hive 查询。我们看到了针对确切的 hive 查询启动非常不同数量的映射器的情况。请参阅下面的地图 3 阶段。在第一次运行时,它请求了 305 个资源,而在另一次运行时,它请求了 4534 个映射器。 (请忽略 KILLED 状态,因为我手动终止了查询。)为什么会发生这种情况?我们如何将其更改为基于基础数据大小?
运行 1
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container KILLED 5 0 0 5 0 0
Map 3 container KILLED 305 0 0 305 0 0
Map 5 container KILLED 16 0 0 16 0 0
Map 6 container KILLED 1 0 0 1 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/06 [>>--------------------------] 0% ELAPSED TIME: 14.16 s
----------------------------------------------------------------------------------------------
运行 2
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 5 5 0 0 0 0
Map 3 container KILLED 4534 0 0 4534 0 0
Map 5 .......... container SUCCEEDED 325 325 0 0 0 0
Map 6 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 container KILLED 333 0 0 333 0 0
Reducer 4 container KILLED 796 0 0 796 0 0
----------------------------------------------------------------------------------------------
VERTICES: 03/06 [=>>-------------------------] 5% ELAPSED TIME: 527.16 s
----------------------------------------------------------------------------------------------
【问题讨论】:
标签: hive amazon-emr apache-tez