I'm stuck with an alpha-beta pruning algorithm implementation

【Question title】: I'm stuck with an alpha-beta pruning algorithm implementation
【Posted】: 2012-09-21 20:36:44
【Question】:

I'm trying to implement a game AI for the Nine Men's Morris game.

So far, my board is represented like this:

    public class board 
    {
          public      node []gNode      = null;
          ... // so the table has 24 nodes, for 9 men morris game:
          gNode = new node[24];
          ...
          int evaluateBoard(); // evaluates the current board (tokens)
    }

OK, each node is represented as follows:

    public class node 
    {
     node() // constructor
     {     ...       }

     // setting current node's neighbours (maximum 4 neighbours)
     void setNeighbours(int left, int right, int top, int bottom)
     {      ...      }

     short      gOccupiedByTeam = renderer.TEAM_NOTEAM; // whether this node is occupied by a token (and which team that token belongs to)
     short    []gNeighbourId    = null; // info about this node neighbours (can be max. 4 in a 9Men's morris game)
     short      gInternalID     = -1;   // board's IDs (from 0..23)
     short      gTokenID        = -1;   // this node can be occupied by a token.  (from 0 .. 8) -see below the token class.
     short      gNodeScore      = -1;   // a dummy node score.
     vector3    gLocation       = null; // 3d coordinates for this node.


    }

A token looks like this:

public class token 
{
   token(vector3 startpos, short nodeId) // Constructor.
   {     ...     }


   public   physx       gPhysX      = null;  // 3d coordinates , velocity , accel. for this Token.
   public boolean       bIsAlive    = false; // is this token alive ? (or eliminated?)
   public boolean       bFormsMill  = false; // does it form a Mill?

   public short         gNodeID     = -1; // "link" this token with a gNodeID (when placing a token on current board). See above the node class. This represents a link ID to that node.
   public short         gTokenMill1 = -1; // used when this token forms a mill (with gTokenMill1  token!)
   public short         gTokenMill2 = -1; // same.

}

Here is my alpha-beta pruning implementation, which is where I'm stuck:

public int getBestMove(board board, int depth, int alpha, int beta, boolean bIsPlayer)
{
    // if depth reached, return current's board's Evaluation (a score).
    if (depth == 0) return board.evaluateBoard(bIsPlayer);

    // is it Player's turn ? (max?)
    if (bIsPlayer)
    {
        // QUESTIONS:
        // Retrieve all possible "boards" below (all new possible token moves).
        // 1. Should I generate a new board with the 1st possible move (for player token1), then a second new board with the 2nd possible move, still for token1, and so on until token1 has no possible moves left?
        //   (remembering that a token can move to at most 4 available spots, which are its neighbours)
        //
        // 2. The problem is that if I generate, say, 4 new boards per token as above, it will "eat" a lot of memory for all 18 tokens at a recursion depth of 5, for example, right?
        // 3. How do I fix point 2?


        ArrayList<board> possible_boards = board.getAllPossibleBoards();

        // 4. OK, some possible boards were generated. Do I loop through them, starting with the first one, and call this function recursively? Is that right?
        for(board iterator: possible_boards)
        {
            alpha = Math.max(alpha, getBestMove(iterator, depth - 1, alpha, beta, !bIsPlayer));

            if (beta < alpha)
            {

                break;
            }
        }

        // 5. How do I return the best move to the main calling function? (Which token's move is the best among all of these boards' moves?)
        return alpha;
    }
    else
    {
        ArrayList<board> possible_boards = board.getAllPossibleBoards();

        for(board iterator: possible_boards)
        {

            beta = Math.min(beta, getBestMove(iterator, depth - 1, alpha, beta, !bIsPlayer));


            if (beta < alpha)
            {
                break;
            }


        }

        return beta;
    }


}

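OK, this is my function so far. I don't even know whether I'm on the right track.

What is wrong with my function?
Please answer my questions above (1 to 5 in the getBestMove() function).

Thank you in advance, and please excuse my language mistakes (my English is not very good).

Many thanks to saeedn for the reply!

I thought nobody would answer me :). It really helps me sort out my thinking.

So CheckWinner(bool) checks whether the current player has a very strong advantage (for example winning, or a very good move at this depth such as blocking the opponent), and if so returns a BIG score for the current player. This is all because neither the player nor the opponent will try to win (big score) on every turn, right?

Otherwise, if depth == 0, return the evaluation (score) of the currently selected board (int evaluateBoard()). OK.

After that, I have to generate a single board (with a single possible token move):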

   while( board.generateNextPossibleBoard(nextBoard) ) // board generated and stored in "nextBoard". Also check if it is a valid board or no more boards to generate further.

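OK, now that there is a newly generated board, recurse, and if a better board is found (one with a better SCORE), save the current board to selectedBoard. If not, cut off and return (do not check the tree any further).

Thanks again very much!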

【Question discussion】:

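So just to be sure, this is a minimax algorithm, right?
Yes, minimax with alpha-beta pruning. Thanks for the quick reply. I'm confused about which parameters to pass to getBestMove(): a node? A token? Here I'm passing a board! And is the way I generate the next possible moves (boards) correct so far?
Generate all possible boards with your alpha-beta tree. Then use the "EvaluateBoard" function on the end nodes (boards) to score them (you decide how to assign the score based on how the pieces are placed for the AI). Then pick the best board of all (according to its score). The next move is then the one that leads towards that best board.
Thanks Francis. Oh, so it looks like what I did above is right. But I'm not sure whether I should generate one board for token1, then another for token1's second move, and so on until token1 has no moves left, and after that continue with boards for token2 just as I did for token1??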

Tags: java algorithm alpha-beta-pruning

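【Solution 1】:

In general your code is OK, but there are a few points to keep in mind.

First, check whether the node (here, the board) is a terminal node (someone has won the game) before checking whether depth equals 0. If someone has won in that state, you may want to return a large value (when the max player wins) or a small value (when the min player wins), for example MAXINT and MININT respectively.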

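To avoid high memory consumption, do not generate all possible boards up front. Generate one board and make the recursive call on it, then generate the next board and search it, and so on. That way each stack frame holds only one state in memory. This is essential for searches with a high branching factor!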
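
One way to keep a single state per stack frame is to mutate one board in place and undo each move once the recursive call returns. The sketch below shows that make/undo pattern on a deliberately tiny stand-in game (one pile of stones, a move takes 1 to 3, whoever takes the last stone wins) rather than on Nine Men's Morris. The class `OneStateSearch` and its methods are illustrative assumptions, not part of the original code.

```java
final class OneStateSearch {
    private int pile; // the only game state kept in memory (hypothetical stand-in for a board)

    OneStateSearch(int pile) { this.pile = pile; }

    int pile() { return pile; }

    /** Returns +1 if the max player wins with best play, -1 if the min player does. */
    int search(int alpha, int beta, boolean maxToMove) {
        if (pile == 0) {
            // Terminal state: the previous mover took the last stone and won.
            return maxToMove ? -1 : 1;
        }
        int maxTake = Math.min(3, pile);
        for (int take = 1; take <= maxTake; take++) {
            pile -= take;                             // make the move in place
            int v = search(alpha, beta, !maxToMove);  // search below it
            pile += take;                             // undo before trying the next move
            if (maxToMove) alpha = Math.max(alpha, v);
            else beta = Math.min(beta, v);
            if (beta <= alpha) break;                 // cutoff: the remaining moves cannot matter
        }
        return maxToMove ? alpha : beta;
    }

    public static void main(String[] args) {
        OneStateSearch game = new OneStateSearch(5);
        int value = game.search(Integer.MIN_VALUE, Integer.MAX_VALUE, true);
        System.out.println("value = " + value + ", pile restored to " + game.pile());
    }
}
```

For a real board the same idea would mean moving a token, recursing, and moving it back, so no copies of the 24-node array are needed; whether that is simpler than a `generateNextPossibleBoard`-style iterator depends on how easy moves (including mill captures) are to reverse.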

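Finally, you should record the board whenever the max player's score is updated (that is, where you update alpha).

See my pseudocode for more clarification: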

if ( board.checkWinner(bIsPlayer) ) return board.evaluateBoard(bIsPlayer);

// if depth reached, return current's board's Evaluation (a score).
if (depth == 0) return board.evaluateBoard(bIsPlayer);

board chosenBoard;    
if (bIsPlayer)
{
    // You should implement this method, or write your board generation code here
    // returns false if no more boards could be generated
    board nextBoard;
    while( board.generateNextPossibleBoard(nextBoard) )
    {
        int v = getBestMove(nextBoard, depth - 1, alpha, beta, !bIsPlayer);

        if ( v > alpha )
        {
            alpha = v;
            chosenBoard = nextBoard;  // return this chosenBoard by reference ;)
        }

        if (beta < alpha)
        {
            break;
        }
    }

    return alpha;
}
else
{
    // The same for beta except you don't need to update chosenBoard :)
}
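
To make the pseudocode concrete, here is a minimal runnable sketch with both branches filled in and a root routine that remembers which child produced the best score. It searches a small hard-coded tree instead of real boards; `Node`, `sampleTree`, `bestChildIndex` and the leaf scores are assumptions made up for illustration, and `node.score` merely stands in for `board.evaluateBoard(...)`. It cuts off on `beta <= alpha`; the `beta < alpha` test used above is also correct and simply prunes a little less.

```java
import java.util.List;

final class AlphaBetaSketch {
    /** Hypothetical tree node standing in for a board; leaves carry a score. */
    static final class Node {
        final int score;
        final List<Node> children;
        private Node(int score, List<Node> children) { this.score = score; this.children = children; }
        static Node leaf(int score) { return new Node(score, List.of()); }
        static Node inner(Node... children) { return new Node(0, List.of(children)); }
    }

    static int leavesEvaluated = 0; // only here to make the pruning visible

    static int alphaBeta(Node node, int depth, int alpha, int beta, boolean maxToMove) {
        // Terminal check first, then the depth limit.
        if (node.children.isEmpty() || depth == 0) {
            leavesEvaluated++;
            return node.score;
        }
        for (Node child : node.children) {
            int v = alphaBeta(child, depth - 1, alpha, beta, !maxToMove);
            if (maxToMove) alpha = Math.max(alpha, v);
            else beta = Math.min(beta, v);  // the min branch mirrors the max branch
            if (beta <= alpha) break;       // cutoff
        }
        return maxToMove ? alpha : beta;
    }

    /** Root call for the max player: index of the best child, or -1 if there is none. */
    static int bestChildIndex(Node root, int depth) {
        int alpha = Integer.MIN_VALUE, beta = Integer.MAX_VALUE, best = -1;
        for (int i = 0; i < root.children.size(); i++) {
            int v = alphaBeta(root.children.get(i), depth - 1, alpha, beta, false);
            if (v > alpha) { alpha = v; best = i; } // record the move where alpha improves
        }
        return best;
    }

    static Node sampleTree() {
        return Node.inner(
            Node.inner(Node.leaf(3), Node.leaf(12), Node.leaf(8)),
            Node.inner(Node.leaf(2), Node.leaf(4), Node.leaf(6)),
            Node.inner(Node.leaf(14), Node.leaf(5), Node.leaf(2)));
    }

    public static void main(String[] args) {
        System.out.println("best child: " + bestChildIndex(sampleTree(), 2));
        System.out.println("leaves evaluated: " + leavesEvaluated);
    }
}
```

Keeping the best-move bookkeeping in a separate root routine means the recursive function only ever returns a score, which avoids threading a "chosen board" reference through every level of the recursion.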

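【Discussion】:

I'm not entirely sure, but shouldn't board.evaluateBoard(bIsPlayer); be board.evaluateBoard(root_color)?
@FolkertvanHeusden That depends on how evaluateBoard is implemented. Based on the OP's code, it should be board.evaluateBoard(bIsPlayer).
If you do number_of_queens[current_player] - number_of_queens[opponent_of_current_player], then it should be bIsPlayer? Sorry for nagging so much, but I want to know this with 100% certainty; then I'll adapt the Wikipedia page so that others can know for sure as well. Thanks, by the way :-)
@FolkertvanHeusden I don't know the target game! My suggestion was about the general idea of an alpha-beta implementation. Here you can encapsulate the board evaluation procedure in a function. If you think the evaluateBoard input parameter is wrong, please mention it in a comment on the original post so the author is notified and can answer your question :)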
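
The two conventions raised in these comments can be compared side by side. The sketch below assumes a hypothetical material-count evaluation (`evalFor`, pieces of one player minus pieces of the other) and a tiny hand-built tree; none of these names come from the original code. With explicit max and min branches, as in the answer, leaves are scored from one fixed player's point of view. In the negamax formulation, leaves are scored for the side to move and the value is negated on the way up. Both give the same root value, so which argument is right depends on which formulation the search uses.

```java
import java.util.List;

final class EvalPerspective {
    /** Hypothetical node: pieces[0] and pieces[1] are the piece counts of players 0 and 1. */
    static final class Node {
        final int[] pieces;
        final List<Node> kids;
        private Node(int[] pieces, List<Node> kids) { this.pieces = pieces; this.kids = kids; }
        static Node leaf(int p0, int p1) { return new Node(new int[] {p0, p1}, List.of()); }
        static Node inner(Node... kids) { return new Node(new int[] {0, 0}, List.of(kids)); }
    }

    /** Material difference from the given player's point of view. */
    static int evalFor(Node n, int player) {
        return n.pieces[player] - n.pieces[1 - player];
    }

    /** Explicit max/min branches: every leaf is scored for the fixed root player. */
    static int minimax(Node n, int toMove, int root) {
        if (n.kids.isEmpty()) return evalFor(n, root);
        boolean maximizing = (toMove == root);
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (Node k : n.kids) {
            int v = minimax(k, 1 - toMove, root);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    /** Negamax: every leaf is scored for the side to move; values are negated going up. */
    static int negamax(Node n, int toMove) {
        if (n.kids.isEmpty()) return evalFor(n, toMove);
        int best = Integer.MIN_VALUE;
        for (Node k : n.kids) {
            best = Math.max(best, -negamax(k, 1 - toMove));
        }
        return best;
    }

    static Node sampleTree() {
        return Node.inner(
            Node.inner(Node.leaf(5, 4), Node.leaf(5, 3)),
            Node.inner(Node.leaf(4, 4), Node.leaf(6, 3)));
    }

    public static void main(String[] args) {
        System.out.println("minimax: " + minimax(sampleTree(), 0, 0));
        System.out.println("negamax: " + negamax(sampleTree(), 0));
    }
}
```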