When to terminate iterative deepening with alpha beta pruning and transposition tables? (Answers)

【Question title】: When to terminate iterative deepening with alpha beta pruning and transposition tables?
【Posted】: 2015-10-12 21:20:55
【Question】:

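How do I know when I can stop increasing the depth of an iterative deepening algorithm that uses negamax alpha beta pruning and transposition tables? The following pseudocode is taken from a wiki page: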

function negamax(node, depth, α, β, color)
 alphaOrig := α

 // Transposition Table Lookup; node is the lookup key for ttEntry
 ttEntry := TranspositionTableLookup( node )
 if ttEntry is valid and ttEntry.depth ≥ depth
     if ttEntry.Flag = EXACT
         return ttEntry.Value
     else if ttEntry.Flag = LOWERBOUND
         α := max( α, ttEntry.Value)
     else if ttEntry.Flag = UPPERBOUND
         β := min( β, ttEntry.Value)
     endif
     if α ≥ β
         return ttEntry.Value
 endif

 if depth = 0 or node is a terminal node
     return color * the heuristic value of node

 bestValue := -∞
 childNodes := GenerateMoves(node)
 childNodes := OrderMoves(childNodes)
 foreach child in childNodes
     val := -negamax(child, depth - 1, -β, -α, -color)
     bestValue := max( bestValue, val )
     α := max( α, val )
     if α ≥ β
         break

 // Transposition Table Store; node is the lookup key for ttEntry
 ttEntry.Value := bestValue
 if bestValue ≤ alphaOrig
     ttEntry.Flag := UPPERBOUND
 else if bestValue ≥ β
     ttEntry.Flag := LOWERBOUND
 else
     ttEntry.Flag := EXACT
 endif
 ttEntry.depth := depth 
 TranspositionTableStore( node, ttEntry )

 return bestValue
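
For readers who want to run the pseudocode, here is a minimal Python sketch of the same structure. Everything game-specific is an assumption added for illustration and is not part of the original question: the game is a toy subtraction game (one pile, a move takes 1 to 3 stones, whoever takes the last stone wins), the heuristic is simply 0 for "unknown", and the transposition table is a plain dict keyed by the pile size. Values are always from the point of view of the side to move, so the `color` parameter of the pseudocode is not needed here.

```python
import math

EXACT, LOWERBOUND, UPPERBOUND = 0, 1, 2

def negamax(stones, depth, alpha, beta, table):
    """Value of a pile of `stones` for the side to move, searched `depth` plies.

    Toy game (an assumption for this sketch): take 1..3 stones per move,
    the player who takes the last stone wins.
    """
    alpha_orig = alpha

    # Transposition table lookup: only trust entries searched at least as deep.
    entry = table.get(stones)
    if entry is not None and entry[0] >= depth:
        _, flag, value = entry
        if flag == EXACT:
            return value
        if flag == LOWERBOUND:
            alpha = max(alpha, value)
        else:  # UPPERBOUND
            beta = min(beta, value)
        if alpha >= beta:
            return value

    if stones == 0:
        return -1  # the opponent took the last stone, so the side to move lost
    if depth == 0:
        return 0   # toy heuristic: outcome unknown at the search horizon

    best = -math.inf
    for take in (1, 2, 3):
        if take > stones:
            break
        val = -negamax(stones - take, depth - 1, -beta, -alpha, table)
        best = max(best, val)
        alpha = max(alpha, val)
        if alpha >= beta:
            break  # beta cutoff

    # Transposition table store, with the bound type of the result.
    if best <= alpha_orig:
        flag = UPPERBOUND
    elif best >= beta:
        flag = LOWERBOUND
    else:
        flag = EXACT
    table[stones] = (depth, flag, best)
    return best
```

In this toy game a pile that is a multiple of 4 is lost for the side to move, so `negamax(4, 10, -math.inf, math.inf, {})` evaluates to -1 and the same call with 5 stones evaluates to 1. A search that is too shallow to reach the end of the game, such as depth 2 on 8 stones, only sees the heuristic value 0.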

Here is the iterative deepening call:

while(depth < ?)
{
    depth++;
    rootNegamaxValue := negamax( rootNode, depth, -∞, +∞, 1)
}

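Of course, when I know the total number of moves in a game, I can use depth < numberOfMovesLeft as an upper bound. But if this information is not given, how do I know that another negamax call will not give a better result than the previous run? What do I need to change in the algorithm?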

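【Comments on the question】:

return color * the heuristic value of node is presumably what is usually called the "evaluation function"?
Yes, exactly! (It depends on the player.)
My gut feeling is that when you increase the depth and re-evaluate the game tree, the limits from the previous run should be invalidated. That would mean the transposition table is basically worthless, because it is based on different (wrong) terminal evaluations. (This is called the "horizon effect" of evaluation functions, IIRC.)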

Tags: algorithm artificial-intelligence alpha-beta-pruning

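【Answer 1】:

The short answer is: when you run out of time (the transposition tables are irrelevant to the answer/question).

Here I assume that your evaluation function is reasonable (it gives a good approximation of the position).

The main idea behind combining iterative deepening with alpha beta is the following. Say you have 15 seconds to come up with the best move. How deep can you search? I do not know, and nobody else does either. You can try searching to depth = 8, only to find that the search finishes in 1 second (so you leave 14 seconds unused). By trial and error you find that depth = 10 gives you a result in 13 seconds, so you decide to use it all the time. But then something goes badly wrong (your alpha beta does not prune well enough, or some positions take too long to evaluate) and your result is not ready within 15 seconds. So you either make a random move or lose the game.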

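To keep this from happening, it is nice to always have a good result ready. So you do the following: find the best result for depth=1 and store it. Find the best result for depth=2 and overwrite it. And so on. From time to time, check how much time is left, and if you are really close to the time limit, return your best move so far.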
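
The procedure above can be sketched as follows. This is only an illustration under stated assumptions: the toy subtraction game (take 1 to 3 stones, taking the last stone wins), the `OutOfTime` exception, the function name `best_move_within` and the `max_depth` cap are all hypothetical and not from the original answer. The point is the control flow: each completed depth overwrites the stored move, and an iteration that is interrupted by the deadline is discarded.

```python
import math
import time

class OutOfTime(Exception):
    """Raised inside the search once the deadline has passed."""

def negamax(stones, depth, alpha, beta, deadline):
    """Plain alpha-beta negamax on the toy game, aborting at the deadline."""
    if time.monotonic() >= deadline:
        raise OutOfTime
    if stones == 0:
        return -1  # side to move has lost
    if depth == 0:
        return 0   # toy heuristic: unknown
    best = -math.inf
    for take in (1, 2, 3):
        if take > stones:
            break
        val = -negamax(stones - take, depth - 1, -beta, -alpha, deadline)
        best = max(best, val)
        alpha = max(alpha, val)
        if alpha >= beta:
            break
    return best

def best_move_within(stones, seconds, max_depth=64):
    """Return (move, completed_depth) found before the time limit.

    Assumes stones >= 1. `max_depth` is a safety cap for this sketch.
    """
    deadline = time.monotonic() + seconds
    best_move, completed_depth = 1, 0  # taking one stone is always legal
    for depth in range(1, max_depth + 1):
        try:
            move, value = None, -math.inf
            for take in (1, 2, 3):
                if take > stones:
                    break
                val = -negamax(stones - take, depth - 1,
                               -math.inf, -value, deadline)
                if val > value:
                    move, value = take, val
        except OutOfTime:
            break  # discard the unfinished iteration
        # Only a fully completed iteration overwrites the stored move.
        best_move, completed_depth = move, depth
    return best_move, completed_depth
```

With a generous limit, `best_move_within(6, 5.0, max_depth=8)` settles on taking 2 stones (leaving the opponent a pile of 4). With a limit of 0 seconds the very first iteration is aborted and the fallback move is returned with a completed depth of 0.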

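Now you do not need to worry about time: your method will return the best result found so far. With all this recalculation of the same subtrees at increasing depths, you waste only about half of your resources (that is if you examine the whole tree, which in alpha-beta you most probably do not). An additional advantage is that on each depth iteration you can now reorder the moves from best to worst, which makes the pruning more aggressive.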
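
The "about half" figure can be checked with a small count. The sketch below assumes an idealized uniform tree with branching factor `b` and no pruning at all, which is a simplification and not something the answer states; the function names are made up for this example.

```python
def nodes_full(b, d):
    """Nodes visited by one full-width search to depth d (root included)."""
    return sum(b ** i for i in range(d + 1))

def nodes_iterative(b, d):
    """Nodes visited by iterative deepening over depths 1, 2, ..., d."""
    return sum(nodes_full(b, k) for k in range(1, d + 1))

def overhead_ratio(b, d):
    """How many times more nodes iterative deepening visits than one search."""
    return nodes_iterative(b, d) / nodes_full(b, d)
```

For `b = 2` the ratio approaches 2 as the depth grows, so roughly half of the total work is repetition, which matches the estimate above. For larger branching factors the ratio approaches `b / (b - 1)`, so the repeated shallow searches cost proportionally less.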

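【Comments】:

Thanks for the very detailed answer. For me, however, it only explains the concept of the iterative deepening algorithm. What I really want to know is whether there is a condition under which I can be absolutely sure that I can stop searching and return something like "mate in four"?
@ZzetT If you are looking for "mate in four", you do not need iterative deepening. Minimax/negamax at 8 plies will find it for you. When you are just looking for the best move, you need ID to prune more aggressively and thereby reach greater depths faster. Can you be absolutely sure of the result? In theory, with an absolutely correct evaluation function, yes (but then you do not need to search: just check all states 1 ply ahead and pick the best one). In practice you will almost never have a 100% correct evaluation function, so you can never be sure.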