Why is Alpha/Beta pruning having no effect on my MiniMax algorithm?

【Question title】: Why is Alpha/Beta pruning having no effect on my MiniMax algorithm?
【Posted】: 2017-03-23 23:00:33
【Question description】:

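First of all, I apologise for the slightly inaccurate title; I just didn't want it to be 30 words long. The alpha/beta pruning I implemented greatly reduced the number of evaluations when I applied it to my Tic-Tac-Toe game, see below.

Each pair of evaluation counts was measured with the same game state as input.

The problem arose when I tried to add the pruning to the checkers-playing neural network I have been working on. That was the goal of the whole project; I only started the Tic-Tac-Toe game to experiment with MiniMax + Alpha/Beta, since I had never worked with these algorithms before.

Here is the same kind of experiment with the NN.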

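Now for the code (the checkers version; let me know if you want to see the Tic-Tac-Toe version, although the two are almost identical).

I will paste the beginning of the two methods only once, since it is exactly the same in both, but I will show both signatures because they differ slightly.

A few notes to make the code clearer.

Board is the object that keeps track of the pieces, the available moves, whose turn it is, whether the game has been won or drawn, and so on.

Move is the object that contains all the information related to a move. When I make the clone on the first line of the method, I simply clone the given board, and the constructor applies the given move to it.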

private double miniMax(Board b, Move m, int depth) {

and

private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {

The start of both methods:

    Testboard clone = new Testboard(b, m);
    // Making a clone of the board in order to
    // avoid making changes to the original one

    if (clone.isGameOver()) {

        if (clone.getLoser() == null) 
            // It's a draw, evaluation = 0
            return 0;   

        if (clone.getLoser() == Color.BLACK)
            // White (Max) won, evaluation = 1
            return 1;

        // Black (Min) won, evaluation = -1
        return -1;  
    } 

    if (depth == 0) 
        // Reached the end of the search, returning current Evaluation of the board
        return getEvaluation(clone);

The regular MiniMax continuation:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        double max = -1;
        for (Move move : clone.getMoves()) {
            // For each child node (available move),
            // its minimax value is calculated
            double score = miniMax(clone, move, depth-1);
            // Only the highest score is stored
            if (score > max)
                max = score;
        }
        // And is returned
        return max;
    } 

    // It's black's turn (Min player)
    double min = 1;
    for (Move move : clone.getMoves()) {
        // For each child node (available move),
        // its minimax value is calculated
        double score = miniMax(clone, move, depth-1);
        // Only the lowest score is stored
        if (score < min)
            min = score;
    }
    // And is returned
    return min;
}

The MiniMax with Alpha/Beta pruning continuation:

    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {

        // It's white's turn (Maxing player)
        for (Move move : clone.getMoves()) {

            // For each child node (available move),
            // its minimax value is calculated
            double score = alphaBeta(clone, move, depth-1, alpha, beta);

            if (score > alpha)
                // If this score is greater than alpha
                // It is assigned to alpha as the new highest score
                alpha = score;
            if (alpha >= beta)
                // The cycle is interrupted early if the value of alpha equals or is greater than beta
                break;
        }
        // The alpha value is returned
        return alpha;
    } 

    // It's black's turn (Min player)
    for (Move move : clone.getMoves()) {

        // For each child node (available move),
        // its minimax value is calculated
        double score = alphaBeta(clone, move, depth-1, alpha, beta);

        if (score < beta)
            // If this score is lower than beta
            // It is assigned to beta as the new lowest score
            beta = score;
        if (alpha >= beta)
            // The cycle is interrupted early if the value of alpha equals or is greater than beta
            break;
    }
    // The beta value is returned
    return beta;
}

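Honestly, I am stuck, and I am not sure what else I can do to figure out what is going on. I have tried MiniMax+A/B on several different neural networks, including randomly generated ones, but I have never seen any improvement in the number of evaluations. I hope someone here can shed some light on this. Thank you!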
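One way to sanity-check the pruning logic independently of the checkers engine and the network is to run both searches on a tiny hand-built tree and compare the leaf-evaluation counts. The following is a minimal sketch under stated assumptions: `PruningCheck`, `Node`, `leaves` and `sampleTree` are hypothetical names that do not appear in the post, and the root call is assumed to start with alpha = negative infinity and beta = positive infinity (the post does not show how `alphaBeta` is first called).

```java
// Hypothetical stand-alone harness; none of these names come from the original post.
final class PruningCheck {

    /** Minimal game-tree node: a leaf carries a value, an inner node carries children. */
    static final class Node {
        final double value;
        final Node[] children;
        Node(double value) { this.value = value; this.children = new Node[0]; }
        Node(Node... children) { this.value = 0; this.children = children; }
    }

    /** Number of leaf evaluations performed since the last reset. */
    static int evaluations;

    /** Plain minimax: visits every leaf. */
    static double miniMax(Node n, boolean maxToMove) {
        if (n.children.length == 0) { evaluations++; return n.value; }
        double best = maxToMove ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Node child : n.children) {
            double score = miniMax(child, !maxToMove);
            best = maxToMove ? Math.max(best, score) : Math.min(best, score);
        }
        return best;
    }

    /** Same cut-off rule as the alphaBeta method in the question (returns alpha or beta). */
    static double alphaBeta(Node n, boolean maxToMove, double alpha, double beta) {
        if (n.children.length == 0) { evaluations++; return n.value; }
        for (Node child : n.children) {
            double score = alphaBeta(child, !maxToMove, alpha, beta);
            if (maxToMove) {
                if (score > alpha) alpha = score;
            } else {
                if (score < beta) beta = score;
            }
            if (alpha >= beta) break; // the remaining siblings cannot change the result
        }
        return maxToMove ? alpha : beta;
    }

    /** Builds an inner node whose children are leaves with the given values. */
    static Node leaves(double... values) {
        Node[] kids = new Node[values.length];
        for (int i = 0; i < values.length; i++) kids[i] = new Node(values[i]);
        return new Node(kids);
    }

    /** Two-ply example tree: Max moves at the root, Min replies. */
    static Node sampleTree() {
        return new Node(leaves(3, 12, 8), leaves(2, 4, 6), leaves(14, 5, 2));
    }

    static int countMiniMax(Node root) {
        evaluations = 0;
        miniMax(root, true);
        return evaluations;
    }

    static int countAlphaBeta(Node root) {
        evaluations = 0;
        alphaBeta(root, true, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
        return evaluations;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println("minimax evaluations:    " + countMiniMax(root));
        System.out.println("alpha/beta evaluations: " + countAlphaBeta(root));
    }
}
```

On this particular tree both searches return 3, with 9 leaf evaluations for plain minimax and 7 with pruning. If a harness like this shows a reduction while the real engine does not, the difference is more likely to lie in the leaf values or the initial alpha/beta window than in the loop itself.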

【Question discussion】:

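One reason could be move ordering, although I doubt it is the only one. If you try the good moves first, you prune more.
@maraca Hello, thanks for the reply. The moves are ordered randomly. I have run many experiments, and every time the number of evaluations is exactly the same for minimax with and without alpha/beta. There must be a more serious bug in the code. Your suggestion might sometimes lead to identical evaluation counts, but I feel that would not always be the case.
What is your evaluation depth? What is the steady-state evaluation function you use to measure how good a board is? Checkers trees can be very deep; Tic-Tac-Toe trees, not so much. Your evaluation function for checkers is essentially giving every board 0; hence, no pruning will occur.
You seem to evaluate only wins and losses; there is no real evaluation function. So if there are no wins or losses within the depth you calculate, you will look at all the moves, because every leaf is a draw (from the point of view of your evaluation function).
We are saying pretty much the same thing. If the board always evaluates to 0 or to random numbers, you probably have to look at all the moves. You could try taking just the difference in the number of checkers as the evaluation function, with +100 and -100 for win/loss, and you should get pruning.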
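The move-ordering remark at the top of this discussion can be illustrated with a small experiment. The sketch below is illustrative only: `OrderingCheck` and its two arrays are hypothetical, the two-ply search is a flattened version of the cut-off rule from the question, and the "best first" order is arranged by hand rather than produced by a real move-ordering heuristic.

```java
// Hypothetical illustration; the class and data are not from the original post.
final class OrderingCheck {

    /** Number of leaf evaluations performed by the last search. */
    static int evaluations;

    /**
     * Two-ply alpha/beta: Max picks a row, then Min picks a leaf within that row.
     * Returns the value of the root for the Max player.
     */
    static double alphaBeta(double[][] replies) {
        evaluations = 0;
        double alpha = Double.NEGATIVE_INFINITY;
        for (double[] row : replies) {
            double rowBeta = Double.POSITIVE_INFINITY; // best value for Min in this row so far
            for (double leaf : row) {
                evaluations++;
                if (leaf < rowBeta) rowBeta = leaf;
                if (alpha >= rowBeta) break; // Max already has a move at least this good
            }
            if (rowBeta > alpha) alpha = rowBeta;
        }
        return alpha;
    }

    /** Runs the search and reports how many leaves were evaluated. */
    static int count(double[][] replies) {
        alphaBeta(replies);
        return evaluations;
    }

    // Same nine leaves in both trees; only the order of moves and replies differs.
    static final double[][] WORST_FIRST = { {2, 4, 6}, {14, 5, 2}, {3, 12, 8} };
    static final double[][] BEST_FIRST  = { {3, 12, 8}, {2, 4, 6}, {2, 5, 14} };

    public static void main(String[] args) {
        System.out.println("worst first: " + count(WORST_FIRST) + " evaluations");
        System.out.println("best first:  " + count(BEST_FIRST) + " evaluations");
    }
}
```

Both orders give the same root value of 3, but the first evaluates all 9 leaves while the second evaluates only 5. In a real engine the ordering would have to come from a cheap heuristic applied to the moves before searching them, which is an assumption beyond what the thread describes.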

Tags: java algorithm minimax alpha-beta-pruning

【Solution 1】:

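Thanks to @maraca for helping me figure this out. Since he only replied in the comments, I will answer my own question.

There was nothing wrong with the code I posted. The problem was the evaluation function I was using once the search reached its depth limit.

I was using a still-untrained neural network that was essentially spitting out random values. That forced MiniMax+A/B to go through every node, because there was no consistency in the answers, and consistency, as it turns out, is what pruning needs in order to happen.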
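For comparison, the deterministic evaluation suggested in the discussion (piece difference, with +100 and -100 for a win or a loss) could look roughly like the sketch below. It is only an assumption about how such a function might be written: `MaterialEvaluation` and its parameters are hypothetical, the real `Board` API is not shown in the post, and a side is treated as having lost only when it has no pieces left, which ignores the no-legal-moves case.

```java
// Hypothetical sketch of the piece-difference evaluation suggested in the comments.
final class MaterialEvaluation {

    static final double WIN = 100;   // White (Max) has won
    static final double LOSS = -100; // Black (Min) has won

    /**
     * Scores a position from White's (Max's) point of view.
     * whitePieces and blackPieces are assumed piece counts taken from the board.
     */
    static double evaluate(int whitePieces, int blackPieces) {
        if (blackPieces == 0) return WIN;
        if (whitePieces == 0) return LOSS;
        // Positive when White is ahead in material, negative when Black is.
        return whitePieces - blackPieces;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(7, 5));
    }
}
```

If a scale like this were used with the `miniMax` method from the question, its starting values of -1 and 1 for `max` and `min` would need to be widened to match, since they assume scores between -1 and 1.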
