TicTacToe Alpha Beta Pruning

【Question title】: TicTacToe Alpha Beta Pruning
【Posted】: 2021-04-15 15:11:42
【Question description】:

Edit, March 30, 2021: the question really was badly worded, so I rephrased it.

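I implemented an alpha-beta pruning algorithm in Python, and I would like to know whether it is normal that it does not take the fastest route to victory (sometimes it wins in 2 moves when it could have won in 1).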

import math
from collections import Counter
from copy import copy, deepcopy

""" Board Class Definition """
class Board:
    """ constructor """
    def __init__(self):
        # init data
        self.data = [ "." for i in range(9) ]
    
    
    """ copy constructor equivalent """
    @staticmethod
    def copy(board):
        return deepcopy(board)
    
    
    """ play at given coordinates """
    def play_at(self, position, color):
        # check if you can play
        if self.data[position] == ".":
            # make the move
            self.data[position] = color
            return True
        
        # did not play
        return False
    
    
    """ get coordinates of empty pieces on the board """
    def get_playable_coord(self):
        # define coordinates of empty tiles
        return [ i for i in range(9) if self.data[i] == "." ]
    
    
    """ board is full """
    def is_full(self):
        # define tile counter
        c = Counter( [ self.data[i] for i in range(9) ] )
        return ( c["x"] + c["o"] == 9 )
    
    
    """ get winner of the board """
    def get_winner(self):
        # straight lines to check
        straightLines = [ (0, 1, 2) , (3, 4, 5) , (6, 7, 8) , (0, 3, 6) , (1, 4, 7) , (2, 5, 8) , (0, 4, 8) , (2, 4, 6) ]
        
        # check straight lines - 8 in total
        for i in range(8):
            # get counter of line of tiles
            c = Counter( [ self.data[j] for j in straightLines[i] ] )
            
            # different scenarios
            if c["x"] == 3:
                return "x"
            
            elif c["o"] == 3:
                return "o"
        
        # if board is full, game is a draw
        if self.is_full():
            return "draw"
        
        # return None by default
        return None
    
    
    """ get heuristic value of board - for "x" if 'reverse' == False """
    def get_heuristic_value(self, reverse):
        # init variable
        value = 0
        
        # straight lines to check
        straightLines = [ (0, 1, 2) , (3, 4, 5) , (6, 7, 8) , (0, 3, 6) , (1, 4, 7) , (2, 5, 8) , (0, 4, 8) , (2, 4, 6) ]
        
        # check straight lines - 8 in total
        for i in range(8):
            # get counter of line of tiles
            c = Counter( [ self.data[j] for j in straightLines[i] ] )
            
            # different scenarios
            if c["x"] == 3:
                value += 100
            
            elif c["x"] == 2 and c["."] == 1:
                value += 10
            
            elif c["x"] == 1 and c["."] == 2:
                value += 1
            
            elif c["o"] == 3:
                value -= 100
            
            elif c["o"] == 2 and c["."] == 1:
                value -= 10
            
            elif c["o"] == 1 and c["."] == 2:
                value -= 1
        
        # return heuristic value
        if reverse:
            return -value
        else:
            return value



""" Model Class Definition """
class Model:
    """ constructor """
    def __init__(self, color):
        # define parameters
        self.color = color
        self.other = self.get_opponent(color)
        
        # define board
        self.board = Board()
        
        # define winner
        self.winner = None
        
        # 'x' plays first
        if self.other == "x":
            self.make_ai_move()
    
    
    """ get opponent """
    def get_opponent(self, player):
        if player == "x":
            return "o"
        return "x"
    
    
    """ player makes a move in given position """
    def make_player_move(self, pos):
        if self.winner is None:
            # get result of board method
            res = self.board.play_at(pos, self.color)
            
            # check end of game <?>
            self.winner = self.board.get_winner()
            
            if res and self.winner is None:
                # make AI move
                self.make_ai_move()
    
    
    """ AI makes a move by using alphabeta pruning on all child nodes """
    def make_ai_move(self):
        # init variables
        best, bestValue = None, - math.inf
        
        for i in self.board.get_playable_coord():
            # copy board as child
            copie = Board.copy(self.board)
            copie.play_at(i, self.other)
            
            # use alpha beta && (potentially) register play
            value = self.alphabeta(copie, 10, - math.inf, math.inf, False)
            if value > bestValue:
                best, bestValue = i, value
        
        # play at best coordinates
        self.board.play_at(best, self.other)
        
        # check end of game <?>
        self.winner = self.board.get_winner()
    
    
    """ alpha beta function (minimax optimization) """
    def alphabeta(self, node, depth, alpha, beta, maximizingPlayer):
        # ending condition
        if depth == 0 or node.get_winner() is not None:
            return node.get_heuristic_value(self.other == "o")
        
        # recursive part initialization
        if maximizingPlayer:
            value = - math.inf
            for pos in node.get_playable_coord():
                # copy board as child
                child = Board.copy(node)
                child.play_at(pos, self.other)
                value = max(value, self.alphabeta(child, depth-1, alpha, beta, False))
                
                # update alpha
                alpha = max(alpha, value)
                if alpha >= beta:
                    break
            return value
        
        else:
            value = math.inf
            for pos in node.get_playable_coord():
                # copy board as child
                child = Board.copy(node)
                child.play_at(pos, self.color)
                value = min(value, self.alphabeta(child, depth-1, alpha, beta, True))
                
                # update beta
                beta = min(beta, value)
                if beta <= alpha:
                    break
            return value

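My conclusion on this question:

Alpha-beta pruning is a depth-first search algorithm, not a breadth-first one, so I think it is normal that it picks the first winning route it finds, whatever its depth, rather than looking for the fastest one. ...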
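
A likely contributing factor is that the evaluation gives a win the same kind of score whether it happens one move or three moves from now, so the strict `>` comparison at the root simply keeps whichever winning move was scanned first. A common remedy is to fold the remaining depth into the terminal score. The following is a minimal sketch of that idea and not a patch of the code above: `WIN_SCORE`, `winner`, `search` and `best_move` are hypothetical names, the board is assumed to be a 9-character string of 'x', 'o' and '.', and no heuristic is used at the depth limit.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
WIN_SCORE = 100  # hypothetical base value for a decided game


def winner(board):
    """Return 'x' or 'o' if a line is complete, 'draw' if full, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if "." not in board else None


def search(board, depth, alpha, beta, to_move, ai):
    """Alpha-beta value of `board` from the point of view of player `ai`."""
    result = winner(board)
    if result == "draw":
        return 0
    if result is not None:
        # Adding the remaining depth makes earlier wins score higher
        # and earlier losses score lower.
        score = WIN_SCORE + depth
        return score if result == ai else -score
    if depth == 0:
        return 0  # assumption: no heuristic at the depth limit

    other = "o" if to_move == "x" else "x"
    maximizing = to_move == ai
    value = -math.inf if maximizing else math.inf
    for pos in range(9):
        if board[pos] != ".":
            continue
        child = board[:pos] + to_move + board[pos + 1:]
        child_value = search(child, depth - 1, alpha, beta, other, ai)
        if maximizing:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:
            break  # remaining siblings cannot change the result
    return value


def best_move(board, ai):
    """Pick the move with the highest depth-aware alpha-beta value."""
    other = "o" if ai == "x" else "x"
    best, best_value = None, -math.inf
    for pos in range(9):
        if board[pos] == ".":
            child = board[:pos] + ai + board[pos + 1:]
            value = search(child, 9, -math.inf, math.inf, other, ai)
            if value > best_value:
                best, best_value = pos, value
    return best


# 'x' can win at once on square 8, or set up a fork on square 3.
print(best_move("xoo.x....", "x"))
```

On the board `"xoo.x...."`, square 3 is scanned first and also leads to a forced win, but only two plies later, so its value is lower than the immediate win on square 8.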

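【Question discussion】:

Shouldn't the initial alpha/beta call use True for maximizing_player?
Yes, it should if you start from the current board state, but what I do is compute the score of each possible AI move and then pick the best of them
Have you tried printing the scores it outputs? Do you ever get a winning or a drawing score?
I added some score examples to the post
On an empty board you should get a score of 0 (a draw), because if both players play correctly, the game always ends in a draw.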

Tags: python minimax alpha-beta-pruning

【Solution 1】:

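I know this is not an answer to the question, but I would like to suggest a possibly simpler approach for an AI tic-tac-toe player, which consists of computing whether each position is winning or losing. This requires considering every valid position that can occur at any point in the game, but since the field is 3x3, the number of valid positions is less than 3^9 = 19683 (each cell is either 'x', 'o' or ' '). This is not a tight bound, because many of these positions are invalid under the rules of the game. I suggest you start from here, because the algorithm you mention is mostly used in harder games where a full search is not feasible.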
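
To get a feel for how loose the 3^9 bound is, the positions that can actually occur can be enumerated directly. This is a minimal sketch under a few assumptions: 'x' always moves first, play stops as soon as a line is completed or the board is full, and the board is a 9-character string using '.' for an empty cell as in the question's code. `has_winner` and `reachable_positions` are hypothetical helper names.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def has_winner(board):
    """True if either player has completed a line."""
    return any(board[a] != "." and board[a] == board[b] == board[c]
               for a, b, c in LINES)


def reachable_positions():
    """Collect every position reachable from the empty board, 'x' moving first."""
    start = "." * 9
    seen = {start}
    stack = [(start, "x")]
    while stack:
        board, to_move = stack.pop()
        if has_winner(board) or "." not in board:
            continue  # the game is over, so no further moves are generated
        next_player = "o" if to_move == "x" else "x"
        for pos in range(9):
            if board[pos] == ".":
                child = board[:pos] + to_move + board[pos + 1:]
                if child not in seen:
                    seen.add(child)
                    stack.append((child, next_player))
    return seen


positions = reachable_positions()
print(len(positions), "reachable positions out of", 3 ** 9, "cell assignments")
```

The set is small enough to keep in memory, which is what makes a full precomputation practical for this game.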

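So you just compute the win/lose label of every position once when the program starts, and then make each decision in O(1). This is acceptable for a 3x3 field, but probably not for much more.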

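The general approach is described here: https://cp-algorithms.com/game_theory/games_on_graphs.html. In short, you build a tree of possible moves, mark the leaves as winning or losing, and work your way up by considering all child transitions (e.g. if every transition leads to a winning position for the opposing player, the position is losing).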

If you know Russian, here is a link to the original page: http://e-maxx.ru/algo/games_on_graphs
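
The labelling rule from the linked article can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the article or from any repository: a "draw" label is added because tic-tac-toe can end in a draw, 'x' is assumed to move first, the board is a 9-character string with '.' for an empty cell, and `has_line`, `player_to_move` and `solve` are hypothetical names.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
# A child's label is from the opponent's point of view, so it is flipped.
FLIP = {"win": "lose", "draw": "draw", "lose": "win"}


def has_line(board, player):
    """True if `player` occupies a complete line."""
    return any(board[a] == board[b] == board[c] == player
               for a, b, c in LINES)


def player_to_move(board):
    """'x' moves first, so 'x' is to move when the piece counts are equal."""
    return "x" if board.count("x") == board.count("o") else "o"


@lru_cache(maxsize=None)
def solve(board):
    """Return (label, move) for the player to move on `board`.

    label is 'win', 'draw' or 'lose'; move is a square index or None
    when the game is already over.
    """
    mover = player_to_move(board)
    opponent = "o" if mover == "x" else "x"
    if has_line(board, opponent):
        return "lose", None  # the previous move completed a line
    if "." not in board:
        return "draw", None

    first_move = {"win": None, "draw": None, "lose": None}
    for pos in range(9):
        if board[pos] != ".":
            continue
        child = board[:pos] + mover + board[pos + 1:]
        label = FLIP[solve(child)[0]]
        if first_move[label] is None:
            first_move[label] = pos

    # Winning if some move reaches a position that is lost for the opponent;
    # losing only if every move reaches a position that is won for the opponent.
    for label in ("win", "draw", "lose"):
        if first_move[label] is not None:
            return label, first_move[label]


print(solve("." * 9))        # label and move for the empty board
print(solve("xx.oo...."))    # 'x' to move with an immediate win available
```

Because every move is examined and results are cached, a single call on the empty board labels every position reachable from it, and later calls to `solve` for those positions are answered from the cache.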

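P.S. I also played around with this game at some point in the past and implemented this approach. If you want to take a look, here is my repo: https://github.com/yuuurchyk/cpp_tic_tac_toe. Fair warning: it is written in C++ and the code is a bit ugly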

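【Discussion】:

"you build a tree of possible moves, mark the leaves as winning or losing, and work your way up by considering all child transitions (e.g. if every transition leads to a winning position for the opposing player, the position is losing)" Maybe I got it wrong from the start, but isn't that the minimax principle (and, by extension, alpha-beta pruning)?
Thanks for the links, I'll see whether I can adapt them
@Tartempion I guess it is a bit different, because the graph approach does not introduce any heuristic score for the nodes. As for pruning, I am not familiar with that approach, but as far as I can tell it does not explore the whole sample space. So I guess it has to be run more than once (I may be wrong here). For a game as simple as tic-tac-toe, traversing the whole tree once and for all is probably acceptable. You could even try shipping the precomputed game tree and have a fully trained agent from the start (of course, this should be profiled).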