Optaplanner: Randomly very low "average calculate count per second" when solving in parallel

【Question title】: Optaplanner: Randomly very low "average calculate count per second" when solving in parallel
【Posted】: 2017-08-16 19:32:02
【Question】:

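I am using Optaplanner to solve a comparatively small optimization problem. My use case requires many such optimizations, which is why I started running them in parallel. The parallelism is based on Java 8 parallel streams. They do not let me control the actual number of threads used, but I believe it is derived from the number of available CPUs.

For most solver runs this seems to work fine, but I noticed that I sometimes get an invalid solution from a single run, and that this is not reproducible when the same problem is run on its own.

After inspecting the logs, I noticed that the "average calculate count per second" was very low for the invalid solution while being fine for the other runs. In fact, the invalid solution was just the (naively built) initial solution: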

[rkJoinPool.commonPool-worker-6] (DefaultSolver.java:203)       Solving started: time spent (0), best score (-5hard/-2medium/168soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-6] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-2medium/233soft).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-5hard/-1medium/579soft), environment mode (REPRODUCIBLE), random (JDK with seed 0). 
[rkJoinPool.commonPool-worker-4] (DefaultConstructionHeuristicPhase.java:158) Construction Heuristic phase (0) ended: step total (0), time spent (1), best score (-5hard/-1medium/617soft).
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:203)       Solving started: time spent (1), best score (-6hard/-3medium/137soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
[rkJoinPool.commonPool-worker-7] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (42), time spent (704), best score (0hard/0medium/808soft).
[rkJoinPool.commonPool-worker-4] (DefaultLocalSearchPhase.java:152) Local Search phase (1) ended: step total (22), time spent (218), best score (0hard/0medium/1033soft). 
[rkJoinPool.commonPool-worker-5] (DefaultSolver.java:238)       Solving ended: time spent (210), best score (-6hard/-3medium/137soft), average calculate count per second (4), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-7] (DefaultSolver.java:238)       Solving ended: time spent (746), best score (0hard/0medium/808soft), average calculate count per second (25256), environment mode (REPRODUCIBLE).
[rkJoinPool.commonPool-worker-4] (DefaultSolver.java:238)       Solving ended: time spent (219), best score (0hard/0medium/1033soft), average calculate count per second (30461), environment mode (REPRODUCIBLE).

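Notice how threads 4 and 7 produce good results at an average calculate count per second (ACCS) of 25-30k, whereas thread 5 produces an invalid result at an ACCS of only 4 (given the 200 ms termination timeout, I assume that only a single step was actually taken).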
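
As a rough cross-check of that guess, the logged figure can be turned back into an approximate number of score calculations. The sketch below assumes the log value is computed as count * 1000 / elapsed milliseconds with integer division; that formula is an assumption, not something stated in the log, and the class and method names are made up for illustration.

```java
// Hypothetical helper; the formula below is an assumption about how the logged value is derived.
final class CalcSpeedEstimate {

    // Average score calculations per second, using integer division.
    static long averagePerSecond(long calculationCount, long timeMillisSpent) {
        return calculationCount * 1000L / timeMillisSpent;
    }

    public static void main(String[] args) {
        // Worker 5: 210 ms spent, logged value 4.
        System.out.println(averagePerSecond(1, 210));      // prints 4
        System.out.println(averagePerSecond(2, 210));      // prints 9
        // Worker 7: 746 ms spent, logged value 25256.
        System.out.println(averagePerSecond(18841, 746));  // prints 25256
    }
}
```

Under that assumption, a value of 4 after 210 ms corresponds to a single score calculation, while the healthy run on worker 7 performed roughly 19,000.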

The following configuration was used. It was determined with the benchmarker (albeit in a single-threaded setup):

<termination>
    <millisecondsSpentLimit>2000</millisecondsSpentLimit>
    <unimprovedMillisecondsSpentLimit>200</unimprovedMillisecondsSpentLimit>
</termination>
<constructionHeuristic>
    <constructionHeuristicType>FIRST_FIT</constructionHeuristicType>
</constructionHeuristic>
<localSearch>
    <localSearchType>HILL_CLIMBING</localSearchType>
</localSearch>

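I assume this problem is related to the fact that multiple solvers run in parallel while using a time-based termination criterion. Is the termination time based on wall-clock time or on actual CPU time?

Is it a bad idea to use a time-based termination criterion when running in parallel? It seems to be the best way to use all available computing power, though. What could cause a single solver, seemingly at random, to perform so few steps?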

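【Comments】:

What are you trying to do? Multi-bet solving? Hill climbing is usually inferior to Tabu Search, Late Acceptance, etc.
I am trying to build a soccer lineup from an available squad. This is part of a simulation in which this is done repeatedly for different teams and match-ups. I will have to look into the mentioned metaheuristics again; I think they simply did not work well in the benchmark.
TS and LA should work. SA (Simulated Annealing) won't without extra configuration.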

Tags: multithreading optimization optaplanner

【Answer 1】:

millisecondsSpentLimit and unimprovedMillisecondsSpentLimit are based on wall-clock time, not on actual CPU time.
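
To illustrate the difference, here is a minimal Java sketch (not OptaPlanner code; the class and method names are made up) that measures both clocks around a Thread.sleep. The sleep stands in for a thread that is not being scheduled: wall-clock time keeps advancing while almost no CPU time is consumed. A solver thread starved in the same way could plausibly reach a wall-clock limit after very few score calculations.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo class; it only compares the two clocks and does no solving.
final class WallVsCpuTime {

    // Returns {wallMillis, cpuMillis} for the current thread; cpuMillis is -1 if unsupported.
    static long[] measureSleep(long sleepMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long wallStart = System.nanoTime();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;
        try {
            Thread.sleep(sleepMillis); // stands in for time spent waiting for a CPU
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sleeping", e);
        }
        long cpuNanos = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1_000_000L;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] { wallNanos / 1_000_000L, cpuNanos / 1_000_000L };
    }

    public static void main(String[] args) {
        long[] result = measureSleep(200);
        System.out.println("wall ms: " + result[0] + ", cpu ms: " + result[1]);
    }
}
```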

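AFAIK, parallel streams do not limit the number of threads to the number of CPUs, because those jobs might block on IO (which is not the case for Solver.solve() calls). I prefer to use an ExecutorService with a thread pool size of Math.max(1, Runtime.getRuntime().availableProcessors() - 2).

【Comments】: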
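
A minimal sketch of that suggestion, under stated assumptions: the class and method names are made up, and the jobs are placeholder Callables where each real job would build its own Solver and call Solver.solve() on one problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; the jobs stand in for independent solver runs.
final class BoundedSolverPool {

    // Pool size from the answer: leave two processors free, but never go below one thread.
    static int poolSize() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 2);
    }

    // Runs all jobs on a fixed-size pool and returns the results in submission order.
    static <T> List<T> runAll(List<Callable<T>> jobs) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize());
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : executor.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A job failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int problemId = i;
            jobs.add(() -> problemId * problemId); // placeholder for one solver run
        }
        System.out.println(runAll(jobs));
    }
}
```

With a fixed pool, at most poolSize() jobs run at once, so each running solver is less likely to compete for a CPU while its wall-clock termination timer is ticking.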

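I will try this. To be honest, I only used stream-based parallelism because it was the easiest to implement. As an alternative, would it be better to use a step-based termination criterion, as described here? docs.optaplanner.org/7.1.0.Final/optaplanner-docs/html_single/…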
Yes. Although scoreCalculationLimit would probably be fairer between TS and LA (because TS takes slow steps, while LA takes fast steps).
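I re-ran (and extended) my benchmarks, focusing in particular on finding a good configuration for each metaheuristic. In the end, Tabu Search gave more reliable results for my problem (although hill climbing and the others were not much worse). I ended up not going with scoreCalculationLimit and instead just used an unimprovedMillisecondsSpentLimit of 500 ms. With that configuration, I have not seen the problem again. On the topic of streams and the number of threads, I followed the advice here: stackoverflow.com/a/22269778/2574340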
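
For readers who want to keep parallel streams but bound the number of threads, one commonly used approach is to start the stream from inside a dedicated ForkJoinPool. The sketch below shows that approach in isolation. It is not confirmed here to be what the linked answer recommends, the class and method names are made up, and it relies on stream tasks running in the pool that invoked the terminal operation, which is implementation behavior rather than a documented guarantee.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; squaring a number stands in for one solver run.
final class BoundedParallelStream {

    static List<Integer> squaresInPool(int parallelism, int count) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The terminal operation is invoked from a task running in the dedicated pool.
            Callable<List<Integer>> task = () -> IntStream.range(0, count)
                    .parallel()
                    .map(i -> i * i)
                    .boxed()
                    .collect(Collectors.toList());
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the stream", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Stream task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInPool(2, 5)); // prints [0, 1, 4, 9, 16]
    }
}
```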