Avishek's answer covers the default values. I'll walk through how to compute good values, using an example:
Example: 6 nodes, each with 16 cores and 64 GB of RAM
Each executor is a JVM instance, so a node can run multiple executors.
Let's start by choosing the number of cores per executor:
Number of cores = number of concurrent tasks an executor can run
One might think that higher concurrency means better performance. However, experience has shown that Spark jobs perform well with 5 cores per executor; going above 5 cores tends to degrade performance (a commonly cited reason is reduced HDFS client throughput).
Note that 1 core and 1 GB of RAM per node are needed for the OS and Hadoop daemons.
Now, calculate the number of executors:
As discussed above, 15 cores are available on each node and we are planning for 5 cores per executor.
Thus, executors per node = 15/5 = 3
Total executors = 3 * 6 = 18
Of these, 1 executor's worth of resources is needed for the YARN ApplicationMaster (AM).
Thus, final executor count = 18 - 1 = 17 executors.
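The executor-count calculation above is just a few lines of arithmetic. A minimal sketch using the example cluster's numbers:

```python
nodes = 6
cores_per_node = 16
cores_per_executor = 5

usable_cores = cores_per_node - 1                        # 1 core reserved for OS/Hadoop daemons
executors_per_node = usable_cores // cores_per_executor  # 15 // 5 = 3
num_executors = executors_per_node * nodes - 1           # 18 - 1 (YARN AM) = 17
print(num_executors)  # 17
```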
Memory per executor:
Executors per node = 3
RAM available per node = 63 GB (1 GB is reserved for the OS and Hadoop daemons)
Memory per executor = 63/3 = 21 GB
Spark also needs some memory overhead per executor, equal to max(384 MB, 7% of executor memory).
Here, 7% of 21 GB = 1.47 GB.
Since 1.47 GB > 384 MB, subtract 1.47 GB from 21 GB:
21 - 1.47 ≈ 19 GB
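The memory step can be sketched the same way; the `max(384 MB, 7%)` overhead rule is applied before settling on the heap size:

```python
ram_per_node_gb = 64
executors_per_node = 3

usable_ram_gb = ram_per_node_gb - 1                       # 1 GB reserved for OS/Hadoop daemons
mem_per_executor_gb = usable_ram_gb / executors_per_node  # 63 / 3 = 21
overhead_gb = max(0.384, 0.07 * mem_per_executor_gb)      # max(384 MB, 7% of 21 GB) = 1.47 GB
executor_memory_gb = mem_per_executor_gb - overhead_gb    # 21 - 1.47 = 19.53, round down to 19
```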
Final numbers:
Executors - 17, Cores - 5, Executor Memory - 19 GB
Notes:
1. Sometimes one may want to allocate less than 19 GB per executor. With less memory per executor, the number of executors goes up and the number of cores per executor goes down. As discussed above, 5 cores per executor is the sweet spot; fewer cores will still give good results, just don't go above 5.
2. Memory per executor should stay below 40 GB, otherwise GC overhead becomes considerable.
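Putting the whole recipe together, here is a small helper that takes the cluster shape and returns the three numbers to pass to spark-submit (`--num-executors`, `--executor-cores`, `--executor-memory`). The function name and default parameters are my own; they simply encode the steps above:

```python
import math

def tune_executors(nodes, cores_per_node, ram_gb_per_node,
                   cores_per_executor=5, reserved_cores=1, reserved_ram_gb=1):
    """Suggest (num_executors, executor_cores, executor_memory_gb)
    for a YARN cluster, following the recipe in this answer."""
    # Reserve resources on each node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_ram_gb = ram_gb_per_node - reserved_ram_gb

    executors_per_node = usable_cores // cores_per_executor
    # One executor-sized slot goes to the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1

    raw_mem_gb = usable_ram_gb / executors_per_node
    # Off-heap overhead: max(384 MB, 7% of the executor memory).
    overhead_gb = max(0.384, 0.07 * raw_mem_gb)
    executor_memory_gb = math.floor(raw_mem_gb - overhead_gb)

    return num_executors, cores_per_executor, executor_memory_gb
```

For the example cluster, `tune_executors(6, 16, 64)` reproduces the final numbers: 17 executors, 5 cores, 19 GB.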