Track best move from Minimax

【Question title】: Track best move from Minimax
【Posted】: 2013-03-11 15:05:23
【Question】:

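I know this kind of question has been asked before, but I haven't been able to resolve my doubt. I have a simple Othello engine (it actually plays quite well) that uses the class below to get the best move: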

import java.util.*;
import java.util.concurrent.*;

public class MinimaxOthello implements Runnable
{
  private CountDownLatch doneSignal;    
  private int maxDepth;
  private int calls;    
  private OthelloMove bestFound;
  private OthelloBoard board;
  private static float INFINITY = Float.MAX_VALUE/1000;    
  private boolean solve = false;
  private Comparator<OthelloMove> comparator = Collections.reverseOrder(new MoveComparator());

public MinimaxOthello (OthelloBoard board, int maxDepth, CountDownLatch doneSignal, boolean solve)
{
    this.board = board;        
    this.bestFound = new OthelloMove();
    bestFound.setPlayer(board.getCurrentPlayer());
    this.maxDepth = maxDepth; 
    this.doneSignal = doneSignal;                
    this.solve = solve;
}

public OthelloMove getBestFound()
{       
    return this.bestFound;
}
public void run()
{        
    float val = minimax(board, bestFound, -INFINITY, INFINITY, 0);
    System.out.println("calls: " + calls);
    System.out.println("eval: " + val);
    System.out.println();
    doneSignal.countDown();        
}

private float minimax(OthelloBoard board, OthelloMove best, float alpha, float beta, int depth)
{
    calls++;             
    OthelloMove garbage = new OthelloMove();             
    int currentPlayer = board.getCurrentPlayer();

    if (board.checkEnd())
    {                        
        int bd = board.countDiscs(OthelloBoard.BLACK);
        int wd = board.countDiscs(OthelloBoard.WHITE);

        if ((bd > wd) && currentPlayer == OthelloBoard.BLACK)
        {                
            return INFINITY/10;
        }
        else if ((bd < wd) && currentPlayer == OthelloBoard.BLACK)
        {                
            return -INFINITY/10;
        }
        else if ((bd > wd) && currentPlayer == OthelloBoard.WHITE)
        {                
            return -INFINITY/10;
        }
        else if ((bd < wd) && currentPlayer == OthelloBoard.WHITE)
        {                
            return INFINITY/10;
        }
        else 
        {                
            return 0.0f;
        }
    }
    if (!solve)
    {
        if (depth == maxDepth)
            return OthelloHeuristics.eval(currentPlayer, board);
    }

    ArrayList<OthelloMove> moves = board.getAllMoves(currentPlayer);
    if (moves.size() > 1)
    {
        OthelloHeuristics.scoreMoves(moves);        
        Collections.sort(moves, comparator);
    }

    for (OthelloMove mv : moves)
    {                                    
        board.makeMove(mv);            
        float score = - minimax(board, garbage, -beta,  -alpha, depth + 1);           
        board.undoMove(mv);             

        if(score > alpha)
        {  
            alpha = score;                
            best.setFlipSquares(mv.getFlipSquares());
            best.setIdx(mv.getIdx());        
            best.setPlayer(mv.getPlayer());                              
        }

        if (alpha >= beta)
            break;                

    }            
    return alpha;
 }  
}

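I have a bestFound instance variable, and my question is: why is it necessary to create

OthelloMove garbage = new OthelloMove();

and then pass it along? The code works, but it looks strange to me!

Is there a "better" way to get the best move or the principal variation? I'm really no recursion expert, and this is hard to debug and visualize. Thanks!

PS: You can clone it at https://github.com/fernandotenorio/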

【Question discussion】:

Tags: java recursion minimax alpha-beta-pruning

【Solution 1】:

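It looks like you can drop the best parameter from minimax, which removes the need for garbage, and then replace best with this.bestFound. Only set the properties of bestFound when depth is 0.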
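A minimal sketch of that idea, not the asker's engine: it assumes a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins) in place of Othello, and the names RootBestMoveDemo, negamax and the integer move encoding are hypothetical. The only point it illustrates is that the best move is written when depth is 0, so no per-call garbage object is needed.

```java
public class RootBestMoveDemo {
    // Assumption: a move is just "how many stones to take"; -1 means "none yet".
    private int bestFound = -1;

    int getBestFound() {
        return bestFound;
    }

    // Negamax with alpha-beta; the score is from the side to move's point of view.
    int negamax(int stones, int alpha, int beta, int depth) {
        if (stones == 0) {
            // The previous player took the last stone, so the side to move has lost.
            return -1;
        }
        for (int take = 1; take <= Math.min(3, stones); take++) {
            int score = -negamax(stones - take, -beta, -alpha, depth + 1);
            if (score > alpha) {
                alpha = score;
                if (depth == 0) {
                    // Deeper calls never touch bestFound.
                    bestFound = take;
                }
            }
            if (alpha >= beta) {
                break;
            }
        }
        return alpha;
    }

    public static void main(String[] args) {
        RootBestMoveDemo search = new RootBestMoveDemo();
        int value = search.negamax(5, -2, 2, 0);
        System.out.println("value=" + value + " best=" + search.getBestFound());
    }
}
```

With a pile of 5 the side to move wins by taking 1 stone, which leaves the opponent a multiple of 4.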

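You can get the principal variation by making this.bestFound an initially empty list. Before the moves loop, create a new move. In the if (score > alpha) section, set its properties the same way as now. Immediately after the loop, push the move onto the list. The principal variation will then be the reverse of the list.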

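If it matters, you can make the following changes to make the class friendlier to multithreading:

Rather than storing the bestFound list as an instance variable, make it a local variable in run and add it as a parameter to minimax.
Make Board.makeMove not modify the board, but instead return a new board instance with the move applied. You can do this by cloning the board and applying the move code to the clone rather than mutating this. Then pass the cloned board to the next call to minimax.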
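The second change can be sketched roughly as follows. This is an illustration under assumptions, not the real OthelloBoard: the Board class, its withMove method and the flat int array are hypothetical, and disc flipping is left out to keep the example short.

```java
public class ImmutableBoardDemo {
    // Hypothetical stand-in for the board: 0 = empty, 1 and 2 = the two players.
    static final class Board {
        private final int[] squares;

        Board(int[] squares) {
            // Defensive copy so callers cannot mutate the board afterwards.
            this.squares = squares.clone();
        }

        // Returns a new board with the move applied; this board is left untouched.
        Board withMove(int idx, int player) {
            int[] next = squares.clone();
            next[idx] = player;
            return new Board(next);
        }

        int get(int idx) {
            return squares[idx];
        }
    }

    public static void main(String[] args) {
        Board start = new Board(new int[4]);
        Board next = start.withMove(2, 1);
        System.out.println(start.get(2) + " " + next.get(2));
    }
}
```

Because no search call mutates a shared board, there is no undoMove step and each thread can explore its own copy. The price is one array copy per move.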

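【Discussion】:

I'll try it later, thanks! What about the principal variation? Do I need more structure/code to get the PV, or will a simple stack do the trick?
One reason not to do this is that it makes the class not thread-safe. In theory, you might want to fork multiple threads to follow different starting moves. That won't work if they all share the bestFound variable. I mention this because I see that the class implements Runnable.
The best move works, but the PV line doesn't. What are we missing?
My earlier code could push multiple moves onto the list for the same turn. The change I made to the answer should fix that.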

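【Solution 2】:

The second parameter of minimax is used to return the best move.

The business with garbage is there to keep the best move of each turn separate. With the code you've provided, this doesn't matter. But if you wanted to generate a sequence of moves from the current board to the end of the game, you would need them to be separate move objects.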
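To illustrate why separate per-call objects matter for a move sequence, here is a hedged sketch that gives every call its own line and splices the child's line behind the improving move. It uses the same hypothetical take-away game rather than Othello, and PrincipalVariationDemo and the List<Integer> move encoding are assumptions. The childPv list plays the role that garbage plays in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class PrincipalVariationDemo {
    // Fills pv with the best line found from this position and returns its score.
    static int negamax(int stones, int alpha, int beta, List<Integer> pv) {
        pv.clear();
        if (stones == 0) {
            return -1; // the side to move has lost
        }
        // One scratch list per call: the child's line must not overwrite ours.
        List<Integer> childPv = new ArrayList<>();
        for (int take = 1; take <= Math.min(3, stones); take++) {
            int score = -negamax(stones - take, -beta, -alpha, childPv);
            if (score > alpha) {
                alpha = score;
                // New best line = this move followed by the child's best line.
                pv.clear();
                pv.add(take);
                pv.addAll(childPv);
            }
            if (alpha >= beta) {
                break;
            }
        }
        return alpha;
    }

    public static void main(String[] args) {
        List<Integer> pv = new ArrayList<>();
        int value = negamax(5, -2, 2, pv);
        System.out.println("value=" + value + " pv=" + pv);
    }
}
```

For a pile of 5 this yields the line take 1, take 1, take 3. Lines recorded inside nodes that were cut off by pruning can be incomplete, so only the line returned at the root should be treated as the principal variation.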

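Using a separate best-move object for each turn lets you do a number of tricks with threads. First, you might want to limit the Othello AI's thinking time. Tracking the best move separately at each level means you always have the best move found so far available. It also means you can cache the best move for a board and look it up in future minimax searches.

Second, you might want to search for the best move in parallel, which is easy to do when each minimax call is independent.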
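One possible shape for the parallel case, sketched under assumptions: each root move is searched in its own task with java.util.concurrent, again on the hypothetical take-away game, and ParallelRootDemo and bestTake are made-up names. The sketch assumes stones is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRootDemo {
    // Plain negamax with alpha-beta; it touches no shared state, so it is safe to run in parallel.
    static int negamax(int stones, int alpha, int beta) {
        if (stones == 0) {
            return -1;
        }
        for (int take = 1; take <= Math.min(3, stones); take++) {
            int score = -negamax(stones - take, -beta, -alpha);
            if (score > alpha) {
                alpha = score;
            }
            if (alpha >= beta) {
                break;
            }
        }
        return alpha;
    }

    // Searches every root move in its own task and returns the best number of stones to take.
    static int bestTake(int stones) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int take = 1; take <= Math.min(3, stones); take++) {
                final int remaining = stones - take;
                Callable<Integer> task = () -> -negamax(remaining, -2, 2);
                results.add(pool.submit(task));
            }
            int best = -1;
            int bestScore = Integer.MIN_VALUE;
            for (int i = 0; i < results.size(); i++) {
                int score = results.get(i).get(); // blocks until that task is done
                if (score > bestScore) {
                    bestScore = score;
                    best = i + 1;
                }
            }
            return best;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("best=" + bestTake(6));
    }
}
```

Each task searches with a full window, so the root moves no longer share alpha-beta bounds. That is the usual trade-off of splitting the search at the root.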

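【Discussion】:

So I'm not returning the best move??
The return value of minimax is the score of the best move. The second parameter returns the actual best move by reference. But in the code you've provided, nothing reads from the second parameter after minimax is called. So you are returning the best move, but not looking at it. Like I said, I assume you do read it from somewhere else in your code.
Yes, that's why I use the thread signal: when the minimax thread finishes, I call minmax.getBestFound() and play the move on the board.
OK, I see what you're doing now. I've edited my answer regarding the use of garbage.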