【Question title】: TicTacToe Alpha Beta Pruning
【Posted】: 2021-04-15 15:11:42
【Question】:

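Edit, 30 March 2021: the question was really badly worded, so I have rephrased it.

I implemented an Alpha-Beta Pruning algorithm in Python, and I would like to know whether it is normal that it does not take the fastest route to victory (sometimes it wins in 2 moves when it could have won in 1).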

import math
from collections import Counter
from copy import copy, deepcopy

""" Board Class Definition """
class Board:
    """ constructor """
    def __init__(self):
        # init data
        self.data = [ "." for i in range(9) ]
    
    
    """ copy constructor equivalent """
    @staticmethod
    def copy(board):
        return deepcopy(board)
    
    
    """ play at given coordinates """
    def play_at(self, position, color):
        # check if you can play
        if self.data[position] == ".":
            # make the move
            self.data[position] = color
            return True
        
        # did not play
        return False
    
    
    """ get coordinates of empty pieces on the board """
    def get_playable_coord(self):
        # define coordinates of empty tiles
        return [ i for i in range(9) if self.data[i] == "." ]
    
    
    """ board is full """
    def is_full(self):
        # define tile counter
        c = Counter( [ self.data[i] for i in range(9) ] )
        return ( c["x"] + c["o"] == 9 )
    
    
    """ get winner of the board """
    def get_winner(self):
        # straight lines to check
        straightLines = [ (0, 1, 2) , (3, 4, 5) , (6, 7, 8) , (0, 3, 6) , (1, 4, 7) , (2, 5, 8) , (0, 4, 8) , (2, 4, 6) ]
        
        # check straight lines - 8 in total
        for i in range(8):
            # get counter of line of tiles
            c = Counter( [ self.data[j] for j in straightLines[i] ] )
            
            # different scenarii
            if c["x"] == 3:
                return "x"
            
            elif c["o"] == 3:
                return "o"
        
        # if board is full, game is a draw
        if self.is_full():
            return "draw"
        
        # return None by default
        return None
    
    
    """ get heuristic value of board - for "x" if 'reverse' == False """
    def get_heuristic_value(self, reverse):
        # init variable
        value = 0
        
        # straight lines to check
        straightLines = [ (0, 1, 2) , (3, 4, 5) , (6, 7, 8) , (0, 3, 6) , (1, 4, 7) , (2, 5, 8) , (0, 4, 8) , (2, 4, 6) ]
        
        # check straight lines - 8 in total
        for i in range(8):
            # get counter of line of tiles
            c = Counter( [ self.data[j] for j in straightLines[i] ] )
            
            # different scenarii
            if c["x"] == 3:
                value += 100
            
            elif c["x"] == 2 and c["."] == 1:
                value += 10
            
            elif c["x"] == 1 and c["."] == 2:
                value += 1
            
            elif c["o"] == 3:
                value -= 100
            
            elif c["o"] == 2 and c["."] == 1:
                value -= 10
            
            elif c["o"] == 1 and c["."] == 2:
                value -= 1
        
        # return heuristic value
        if reverse:
            return -value
        else:
            return value



""" Model Class Definition """
class Model:
    """ constructor """
    def __init__(self, color):
        # define parameters
        self.color = color
        self.other = self.get_opponent(color)
        
        # define board
        self.board = Board()
        
        # define winner
        self.winner = None
        
        # 'x' plays first
        if self.other == "x":
            self.make_ai_move()
    
    
    """ get opponent """
    def get_opponent(self, player):
        if player == "x":
            return "o"
        return "x"
    
    
    """ player makes a move in given position """
    def make_player_move(self, pos):
        if self.winner is None:
            # get result of board method
            res = self.board.play_at(pos, self.color)
            
            # check end of game <?>
            self.winner = self.board.get_winner()
            
            if res and self.winner is None:
                # make AI move
                self.make_ai_move()
    
    
    """ AI makes a move by using alphabeta pruning on all child nodes """
    def make_ai_move(self):
        # init variables
        best, bestValue = None, - math.inf
        
        for i in self.board.get_playable_coord():
            # copy board as child
            copie = Board.copy(self.board)
            copie.play_at(i, self.other)
            
            # use alpha beta && (potentially) register play
            value = self.alphabeta(copie, 10, - math.inf, math.inf, False)
            if value > bestValue:
                best, bestValue = i, value
        
        # play at best coordinates
        self.board.play_at(best, self.other)
        
        # check end of game <?>
        self.winner = self.board.get_winner()
    
    
    """ alpha beta function (minimax optimization) """
    def alphabeta(self, node, depth, alpha, beta, maximizingPlayer):
        # ending condition
        if depth == 0 or node.get_winner() is not None:
            return node.get_heuristic_value(self.other == "o")
        
        # recursive part initialization
        if maximizingPlayer:
            value = - math.inf
            for pos in node.get_playable_coord():
                # copy board as child
                child = Board.copy(node)
                child.play_at(pos, self.other)
                value = max(value, self.alphabeta(child, depth-1, alpha, beta, False))
                
                # update alpha
                alpha = max(alpha, value)
                if alpha >= beta:
                    break
            return value
        
        else:
            value = math.inf
            for pos in node.get_playable_coord():
                # copy board as child
                child = Board.copy(node)
                child.play_at(pos, self.color)
                value = min(value, self.alphabeta(child, depth-1, alpha, beta, True))
                
                # update beta
                beta = min(beta, value)
                if beta <= alpha:
                    break
            return value

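My conclusion on this problem:

Alpha-Beta Pruning is a depth-first search algorithm, not a breadth-first one, so I think it simply picks the first winning route it finds, whatever its depth, rather than looking for the fastest one. ...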
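
One common way to make such a search prefer quicker wins is to let the terminal score depend on how many plies it took to reach the end of the game. This is not something the code above does. The heuristic above scores a finished board the same way whether it was reached in 1 move or in 3. Below is a minimal, self-contained sketch of the idea. It does not reuse the `Board` and `Model` classes, and `WIN_SCORE`, `winner`, `best_move` and the `ply` parameter are hypothetical names introduced only for illustration.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
WIN_SCORE = 100  # assumed constant; anything larger than the maximum ply (9) works


def winner(cells):
    """Return "x", "o", "draw", or None for an unfinished game."""
    for a, b, c in LINES:
        if cells[a] != "." and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return "draw" if "." not in cells else None


def alphabeta(cells, ai, to_move, ply, alpha, beta):
    """Score `cells` for `ai`; wins reached at a smaller ply score higher."""
    result = winner(cells)
    if result == "draw":
        return 0
    if result is not None:
        # Subtracting the ply makes a win in 1 move beat a win in 3 moves,
        # and makes a late loss preferable to an early one.
        return (WIN_SCORE - ply) if result == ai else (ply - WIN_SCORE)

    nxt = "o" if to_move == "x" else "x"
    maximizing = to_move == ai
    value = -math.inf if maximizing else math.inf
    for pos in [i for i in range(9) if cells[i] == "."]:
        cells[pos] = to_move                      # make the move in place
        score = alphabeta(cells, ai, nxt, ply + 1, alpha, beta)
        cells[pos] = "."                          # undo it
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return value


def best_move(cells, ai):
    """Return the empty position with the highest depth-aware score."""
    opponent = "o" if ai == "x" else "x"
    best, best_value = None, -math.inf
    for pos in [i for i in range(9) if cells[i] == "."]:
        cells[pos] = ai
        value = alphabeta(cells, ai, opponent, 1, -math.inf, math.inf)
        cells[pos] = "."
        if value > best_value:
            best, best_value = pos, value
    return best
```

On the board `o....oxx.` with "x" to move, positions 1, 2 and 4 all lead to a forced win two plies later, while position 8 wins on the spot. With the ply-adjusted score, `best_move(list("o....oxx."), "x")` selects 8. With a flat win score and a strict `>` comparison, the first winning move in iteration order would be kept instead.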

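【Comments】:

  • Shouldn't the initial alpha/beta call pass True for maximizing_player?
  • Yes, it should if you start from the current board state, but what I do is compute a score for each possible AI move and then pick the best one.
  • Have you tried printing the scores it outputs? Do you ever get a score for a win or a draw?
  • I added some score examples to the post.
  • On an empty board you should get a score of 0 (even), because if both players play correctly the game always ends in a draw.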

Tags: python minimax alpha-beta-pruning


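【Solution 1】:

I know this does not answer the question, but I would like to suggest a possibly simpler approach to an AI tic-tac-toe player, which consists of computing whether each position is winning or losing. This requires considering every valid position that can occur at any point in the game, but since the field is 3x3, the number of valid positions is less than 3^9 = 19683 (each cell is either 'x', 'o' or ' '). This is not a tight bound, because many of those positions are invalid under the rules of the game. I suggest you start here, since the algorithm you describe is mostly used in harder games where a full search is not feasible.

So you just compute the win/lose label for every position once, when the program starts, and then make each decision in O(1). This is acceptable for a 3x3 field, but probably not for anything much larger.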

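The general approach is described here: https://cp-algorithms.com/game_theory/games_on_graphs.html. In short, you build a tree of possible moves, mark the leaves as winning or losing, and work your way up by considering all child transitions (e.g. if every transition leads to a winning position for the opposite player, the position is losing).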
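
As a rough illustration of that labeling, here is a minimal Python sketch. It is not the code from the linked article or from the repository mentioned in this answer. The board is assumed to be a 9-character string of "x", "o" and ".", and `line_winner`, `outcome` and `pick_move` are hypothetical names. It also adds a draw label (0) next to win (1) and loss (-1), since tic-tac-toe games can end in a draw.

```python
from functools import lru_cache

LINES = ((0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6))


def line_winner(board):
    """Return "x" or "o" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def outcome(board, to_move):
    """Label a position for the side to move: 1 = win, 0 = draw, -1 = loss."""
    if line_winner(board) is not None:
        # The previous player just completed a line, so the side to move lost.
        return -1
    if "." not in board:
        return 0
    other = "o" if to_move == "x" else "x"
    # A position is winning if some move leads to a position that is losing
    # for the opponent, and losing if every move leads to a winning one.
    return max(-outcome(board[:i] + to_move + board[i + 1:], other)
               for i in range(9) if board[i] == ".")


def pick_move(board, to_move):
    """Choose the move whose resulting position is worst for the opponent."""
    other = "o" if to_move == "x" else "x"
    moves = [i for i in range(9) if board[i] == "."]
    return min(moves,
               key=lambda i: outcome(board[:i] + to_move + board[i + 1:], other))
```

Calling `outcome("." * 9, "x")` once fills the cache with every position reachable from the empty board. After that, each call made by `pick_move` is a cache lookup. Note that these labels carry no notion of how fast a win is, so this sketch on its own does not prefer the quickest win either.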

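If you read Russian, here is a link to the original page: http://e-maxx.ru/algo/games_on_graphs

P.S. I also played around with this game at some point in the past and implemented this approach. If you want to take a look, here is my repo: https://github.com/yuuurchyk/cpp_tic_tac_toe. Fair warning: it is written in C++, and the code is a bit ugly.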

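【Comments】:

  • "You build a tree of possible moves, mark the leaves as winning or losing, and work your way up by considering all child transitions (e.g. if every transition leads to a winning position for the opposite player, the position is losing)" Maybe I have had it wrong from the start, but isn't that the minimax principle (and, by extension, alpha-beta pruning)?
  • Thanks for the links, I will see whether I can adapt them.
  • @Tartempion I guess it is a bit different, because the graph approach does not introduce any heuristic score for the nodes. As for pruning, I am not familiar with that approach, but it seems to me that it does not explore the whole sample space, so I guess it has to be run more than once (I might be wrong here). For a game as simple as tic-tac-toe, traversing the whole tree once and for all is probably acceptable. You could even try shipping the precomputed game tree and have a fully trained agent from the start (of course, this should be profiled).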