Laurea Magistrale (Master's Degree) in COMPUTER SCIENCE
Course of ARTIFICIAL INTELLIGENCE
Stefano Ferilli
These slides were prepared for teaching purposes. They contain original material owned by the Università degli Studi di Bari and/or figures owned by other authors, companies and organizations, whose references are reported. All or part of this material may be photocopied for personal or teaching use, but it may not be distributed for commercial use. Any other use requires specific authorization by the Università degli Studi di Bari and by the other authors involved.
Search / 2
Search for Games – min-max and alpha-beta algorithms
● The effectiveness of graph-search-based methods in defining a problem-solving agent's behavior may decrease when:
● The perception process does not provide the necessary information on the state of the environment
– Noise, problems with sensors, etc.
● Actions do not always have the "modeled" effect
– Imprecise model, imprecise effectors
● Other agents exist in the environment that try to hinder the main agent
● Problems:
● The external world may change while the agent is enacting its plan
● The agent may be required to act before it can complete its search up to a goal state
● Even when the agent has enough time, there may not be enough storage to complete the search
● Need to develop alternative search methods
Single Agent
● Attaining the goal through rewards
● Assumption of search strategies in the state space: the agent has a single short-term task, described by a goal state/condition
● In many real-world problems defining the goal in advance may be hard, but it may be possible to express satisfaction/dissatisfaction with task execution, giving the agent positive/negative rewards
– The agent must maximize the amount of reward received
– Problem: tasks not defined in advance do not terminate, and the reward might be infinite (making it difficult to define how to maximize it). The model establishes that the agent prefers immediate rewards to (distant) future rewards
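A standard way to make "prefers immediate rewards" precise — not spelled out on the slide, but the usual convention in reinforcement learning — is to discount future rewards geometrically, which also keeps the infinite sum finite:

```latex
% Discounted return: gamma weights near-term rewards more than distant ones,
% and gamma < 1 makes the infinite sum of bounded rewards converge.
R \;=\; \sum_{t=0}^{\infty} \gamma^{t}\, r_{t}, \qquad 0 \le \gamma < 1
```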
Single Agent
● Assumption: the environment is observable
– There are observable states, characterized by rewards, and transitions specified from state to state
– The regime is stationary (stable)
● If no learning ability is provided for, the theoretical solution is completely specified
– The problem may be reduced to a search problem, using tentative and explorative methods
● If learning is possible, methods may include reinforcement learning, etc.
Multiagent Systems
● The agent tries to solve its task/problem, possibly learning
– The other agents in the environment do the same
● The theoretical solution for completely specified problems, not involving learning, is given by Game Theory
Games
● A suitable domain for exploring machine intelligence
● They consist of a structured task in which success or failure is easily measured
● They require much knowledge, to reduce the combinatorial explosion of the solution space
● They can be analyzed using production systems
Games
● An overview
– Problem Solving + more competing agents in the domain → Game Playing
– Problem Solving + more complexity (states as conjunctions of facts, operators as links between condition-facts and effect-facts) → Planning
Game Theory
● Von Neumann & Morgenstern (1944): "Theory of Games and Economic Behaviour"
– Decision Theory: analyzes individual behavior whose actions have a direct effect (world of bets & puzzles)
– Game Theory: analyzes individual behavior whose actions have an effect that depends on the others' choices (world of multi-player games)
Games in AI and beyond
● "Games are not chosen because they are clear and simple, but because they provide maximum complexity with minimum initial structures" [M. Minsky, 1968]
● Scientific motivations
● Mathematics: graph theory and complexity
● Computer Science: databases, parallel computing, etc.
● Economics: game theory, cognitive/experimental economics
● Psychology: confidence, risk, etc.
Types of Games
● Classification 1: choice conditions
– Games with "perfect" information
● Game states are completely explicit for the agents
– Games with "imperfect" information
● Game states are only partially made explicit
● Classification 2: choice effects
– Deterministic games
● States are determined solely by the agents' actions
– Stochastic games
● States are determined also by external factors (e.g., dice)
● Examples:
– Deterministic games with perfect information: Chess, Go, Checkers, Othello, Forza 4 (Connect Four)
– Deterministic games with imperfect information: MasterMind (game or puzzle?)
– Stochastic games with perfect information: Backgammon, Monopoly
– Stochastic games with imperfect information: Scrabble, Risiko, Bridge, Poker, … (card games)
Types of Games
● Other classifications
● Number of players
– Multi-agent systems
● Play-turn policy
– Diachronic (defined/indefinite turns) vs. synchronous
● Environment
– Discrete vs. continuous
– Static vs. dynamic
– Episodic vs. sequential
● Zero-sum games
● Note: humans act in a continuous, dynamic, sequential environment, with synchronous choices and imperfect information
Games as Search Problems
● Theoretical solution: how to choose the best move in a game with a limited search space
● Extension to more complex games, in which exhaustive exploration is impossible
● Search optimization techniques
● What if there are many opponents?
Games as Search Problems
● States = game configurations
● Initial state = initial configuration
● Successor function = legal moves and resulting states
● Termination test = function determining the end of the game
● Utility function (pay-off) = numeric value that weights the terminal states of the game
● E.g.: +1 | –1 | 0, a score, …
Generate-and-Test Strategy
● The process enacted by a player to win ~ a sequence of generation and test procedures
● The test is carried out after some (variable) amount of work by the generator, statically or dynamically:
– The Generator produces complete solutions that the Tester evaluates, OR
– The Generator produces individual moves that the Tester evaluates, choosing the most promising one
Generate-and-Test Strategy
● Improving effectiveness
● Improve the generation procedure so that only useful moves are generated
● Improve the test procedure so that the best moves are recognized and attempted first
● How to deal with the presence of two opponents?
Adversarial Games
● Why?
● Simple and formalizable rules
● Deterministic: two players, alternate turns, zero-sum, perfect information (accessible environment)
● The presence of an opponent makes the problem contingent, and more difficult than the search problems seen so far
● Complexity and real-time constraints: it is only possible to try to make the best move in the available time
● Slightly more similar to real problems
Games as Search Problems
● Problem solving (1)
● Can a game be analyzed as a search problem, even if multi-agent?
– Example: chess
● X = all states of the chessboard
● x0 = initial state of the game
● succ(x) = legal moves for a state
● T(x) = checkmate
● g = number of moves
– There's something wrong!...
● g is not decisive
● succ(x) can be controlled only for half of the moves, and often it is not reversible
● T(x) is insufficient to define termination
– Need for a utility function on termination
● E.g.: win = +1, draw = 0, loss = –1
– Agent's goal: defining a strategy to reach a terminal state with utility +1
Games as Search Problems
● Problem solving (2)
● To cast a game with perfect information as a classical search, consider that:
– There is an opponent to be simulated
– The opponent minimizes our utility
● The search tree is developed on 2 players:
– MAX (the computer/us)
– MIN (the opponent)
● Goal: reaching a terminal state of the tree which maximizes utility
– If the opponent plays first:
● It becomes MAX
● We become MIN, with the goal of minimizing utility
Search with Two Opponents
● Representation
● The search for a solution in multi-player games can be cast as a search process that can still be described using search trees
– Nodes = game situations
– Branches = moves that produce the different configurations
● The competition factor must be considered in the special semantics adopted on such trees
– Each level of the tree corresponds to a decision up to one of the two players
– So each level, in turn, indicates the move of either opponent
[Figure: game tree with levels alternating between Player A and Player B]
Search with Two Opponents
● Example
● Game: starting from a single stack of objects, build several stacks made up of pairwise different numbers of objects, according to the following rules:
– Two players: MAX and MIN. MIN plays first
– One move is available to each player: splitting a stack into two non-equal stacks
– The first player splits the initial stack
– The game proceeds by splitting any of the stacks generated so far
● The game ends when there are only stacks of 1 or 2 objects
● The first player that cannot play is the loser
Search with Two Opponents
● Example (7 objects in the initial stack)
● DB = an unordered sequence of numbers (representing the number of objects in the stacks) + an indication of who makes the next move
(7, MIN)
(6,1, MAX) (5,2, MAX) (4,3, MAX)
(5,1,1, MIN) (4,2,1, MIN) (3,2,2, MIN) (3,3,1, MIN)
(4,1,1,1, MAX) (3,2,1,1, MAX) (2,2,2,1, MAX)
(3,1,1,1,1, MIN) (2,2,1,1,1, MIN)
(2,1,1,1,1,1, MAX)
Winning strategies for MAX
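The example game can be checked mechanically. The sketch below (function names are mine) searches the full game tree and confirms that the player who moves first on a single stack of 7 loses under perfect play — i.e., MAX has a winning strategy when MIN plays first, as the slide claims:

```python
from functools import lru_cache

def successors(stacks):
    """Positions reachable by splitting one stack into two unequal stacks."""
    succs = set()
    for i, n in enumerate(stacks):
        rest = stacks[:i] + stacks[i + 1:]
        for a in range(1, (n - 1) // 2 + 1):   # a < n - a, so the parts are unequal
            succs.add(tuple(sorted(rest + (a, n - a))))
    return succs

@lru_cache(maxsize=None)
def player_to_move_wins(stacks):
    """True iff the player about to move can force a win.
    The first player that cannot play is the loser (the slide's rule)."""
    moves = successors(stacks)
    if not moves:
        return False                    # no legal split available: this player loses
    return any(not player_to_move_wins(s) for s in moves)

print(player_to_move_wins((7,)))   # False: the first player (MIN) loses, so MAX wins
```

Note that stacks of 1 or 2 objects generate no successors (a stack of 2 can only be split into 1+1, which is not allowed), so the recursion bottoms out exactly at the slide's terminal positions.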
Search with Two Opponents
● In building the search tree (game tree), the iterated process of selecting the first "good" move may lead to the solution strategy
● As usual, one may use:
– Breadth-first search
– Depth-first search
– Heuristic methods
● With suitable termination conditions
– Time factors, memory limitations, depth of the tree, etc.
Search with Two Opponents
● MIN-MAX depth-first search procedure
● Assumption: search for a winning strategy for MAX, who plays first
● The procedure is based on evaluating the evaluation function e at many levels (stages) of the search
– Search for the first best move for MAX, based on maximization of the static evaluation function e
– Next stage: MIN chooses the move
● It will try to oppose MAX by choosing a move that minimizes the evaluation function's value
● A MAX node that is the parent of MIN nodes is assigned a backpropagated value equal to the maximum value of e over those MIN nodes
● A MIN node that is the parent of subsequent MAX nodes is assigned a backpropagated value equal to the minimum value of e over its successor nodes
Search with Two Opponents
● MIN-MAX depth-first search procedure
– Upon termination of the search, an estimation of the "best" first move can be extracted from the search graph:
● Apply the static evaluation function to the leaf nodes of the search graph
● Backpropagate the values, level by level, from the leaves up to the root, until the process is completed
● The validity of this procedure rests on the assumption that the values propagated upward to the successors of the starting node are more reliable measures of the actual goodness of those positions than the values computed by applying the static evaluation function directly
● Evaluation function
● An estimation of the expected utility starting from a given position in the game
● The quality of the evaluation function is crucial. It must:
– Be consistent with the utility if applied to terminal states of the game (induce the same ordering)
– Be efficiently computable
– Reflect the actual chances of winning
● e(A) > e(B) if from A there are more chances of winning than from B
MIN-MAX Algorithm
● Designed to determine the optimal strategy for MAX
● Suggests to MAX the best move to make
● Assumption: MIN makes the move most favorable to itself
● The path is not relevant, only the first move!
MIN-MAX Algorithm
● Leaves (end-game conditions) are labeled with +1 and –1
● MIN tries to reach –1 (minimizer)
● MAX tries to reach +1 (maximizer)
MIN-MAX Algorithm
● High-level description (recursive procedure)
● Determine whether the level is a minimization or a maximization level
● If the search bound is reached:
– Compute the static value of the current position for the proper player
– Exit with the result
● Otherwise, use MIN-MAX on the children of the current node:
– IF minimization level: choose the minimum of the results
– ELSE (maximization level): choose the maximum of the results
MIN-MAX Algorithm
● Based on two predicates
● gen_moves(POSition, Player)
– Generator of plausible moves
● Provides a list of nodes representing the applicable moves that Player may make
● static(POSition, Player)
– Static evaluation function
● Returns a number that represents the goodness of POS from Player's perspective
– Large values -> good position for the player who is going to move (indicated by even or odd depth)
– Its caller tries to maximize its own score
● When the value is backpropagated to the higher level it is 'negated' (to reflect the opponent's perspective)
● By inverting values at alternate levels (the so-called negamax formulation), MINIMAX may be kept very simple
MIN-MAX Algorithm
● Additional function
● deep_enough(POSition, Depth)
– Evaluates the termination factors
– Returns TRUE if the search must be stopped at the current level, FALSE otherwise
– May take just (Depth) in its simplest implementation
● The algorithm returns two objects
● Value – the backpropagated value of the chosen path
● Path – the path itself
MIN-MAX Algorithm
● Note: this version neatly separates the search-tree generation and position evaluation processes
● Position evaluation starts only after generation is completed
– Since MIN-MAX is a depth-first process, a path is completely explored, the static evaluation function is applied to the last step of the path, and then the values are backpropagated
● Inefficient!!!
– Search can be reduced by evaluating the leaf nodes and computing the backpropagated values during tree generation
MIN-MAX Algorithm
● Annotated pseudocode (step-by-step callouts in square brackets)
– MIN-MAX(Pos, Depth, Player)
● IF deep_enough(Pos, Depth) THEN
– return Value ← static(Pos, Player); Path ← NIL
[There is no path from this node: its value is determined by the static evaluation function]
● ELSE
– Successors ← gen_moves(Pos, Player)   [generate another stage of the tree]
● IF Successors is empty THEN
– return Value ← static(Pos, Player); Path ← NIL
[There are no moves to make: the same structure as for deep_enough = TRUE]
● ELSE
– Best_value ← minimum possible value for static()
[This variable will be updated to record the best score that can be obtained from the elements in Successors]
– For each element Succ in Successors do   [examine each element in turn, recording the best one]
● Succ_res ← MIN-MAX(Succ, Depth+1, opponent(Player))   [recursive call that will examine Succ]
● New_value ← –value(Succ_res)   [reflects the goodness of the position from the opponent's perspective]
● IF New_value > Best_value THEN   [found a better successor than those examined so far: store it]
– Best_value ← New_value
– Best_path ← Succ & path(Succ_res)
[The best path known so far goes from the current node to Succ, and then continues along the path starting from Succ, as determined by the recursive call to MIN-MAX]
– return Value ← Best_value; Path ← Best_path
[Now that all successors have been examined, both the value of Pos and the path to take from it are known]
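The pseudocode above can be rendered in Python. This is a sketch in the same negamax style; `gen_moves`, `static`, `deep_enough` and `opponent` are stand-in parameters to be supplied by a concrete game:

```python
NEG_INF = float("-inf")

def minimax(pos, depth, player, gen_moves, static, deep_enough, opponent):
    """Negamax-style MIN-MAX: returns (value, path) from player's perspective.

    gen_moves(pos, player)  -> list of successor positions
    static(pos, player)     -> heuristic value of pos for player
    deep_enough(pos, depth) -> True when the search must stop here
    opponent(player)        -> the other player
    """
    if deep_enough(pos, depth):
        return static(pos, player), []          # no path from this node
    successors = gen_moves(pos, player)
    if not successors:                          # no moves to make
        return static(pos, player), []
    best_value, best_path = NEG_INF, []
    for succ in successors:
        succ_value, succ_path = minimax(succ, depth + 1, opponent(player),
                                        gen_moves, static, deep_enough, opponent)
        new_value = -succ_value                 # negate: the opponent's good is our bad
        if new_value > best_value:              # better successor found: store it
            best_value, best_path = new_value, [succ] + succ_path
    return best_value, best_path
```

The returned pair mirrors the slide's (Value, Path): the caller only needs the first element of Path to make its move.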
MIN-MAX Algorithm
● Recursive version (depth-first visit)
● function MINIMAX-Decision(state) returns an action
– v ← Max-Value(state)
– return the action in Successors(state) with value v
● function Max-Value(state) returns a utility value
– if Terminal-Test(state) then return Utility(state)
– v ← –∞
– for a, s in Successors(state) do v ← Max(v, Min-Value(s))
– return v
● function Min-Value(state) returns a utility value
– if Terminal-Test(state) then return Utility(state)
– v ← +∞
– for a, s in Successors(state) do v ← Min(v, Max-Value(s))
– return v
● To label s, Min-Value and Max-Value explore the whole tree under state s
MIN-MAX Algorithm
● Properties
● Complete – if the tree is finite
● Optimal – against an opponent playing at its best
● Space complexity O(bm)
– Depth-first
● Time complexity O(b^m)
– Very inefficient (exponential) if the whole tree is to be developed
● b = branching factor, m = number of levels
● Need to prune the tree!
● E.g., chess
– b ≈ 35, m ≈ 100 for reasonable matches
– 35^100 nodes: impractical!
MIN-MAX Algorithm
● 3 approaches to reducing the tree
● Heuristic evaluation function, used in intermediate (not necessarily terminal) states to stop tree expansion
● Ignoring portions of the tree that do not affect the final choice (α-β cuts)
● (Parallel multi-processor architectures)
Revised MIN-MAX Algorithm
● Solution to inefficiency (Shannon, 1949)
● Apply MIN-MAX up to a given depth
– Look ahead only for a while, and evaluate moves up to a non-terminal node considered as successful
● Use a heuristic state evaluation function eval(n)
– Based on the available information, it estimates how likely it is that the move to state n subsequently leads to a solution
– eval(n) = –1: surely winning for MIN
– eval(n) = +1: surely winning for MAX
– eval(n) = 0: about equal chances
– Other intermediate values for eval(n)
– Various features of the state can be evaluated
● Weighted linear function: e(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
– or even non-linear combinations of features
● Trade-off between search and evaluation function
MIN-MAX with Imperfect Decision
● In more complex cases, eval(s) is needed
● Strategy: look-ahead of d levels
– Expand the search tree for a given number of levels d
● Compliant with the available time and space
– Evaluate the obtained states and backpropagate the results using the MAX and MIN rule
● MIN-MAX algorithm as before, but…
– if TerminationTest(state) then return Utility(state)
– if CutTest(state, d) then return eval(state)
MIN-MAX Algorithm
● Example: Tic-Tac-Toe
– Consider MAX = X, MIN = O
– Carry out breadth-first search up to depth 2
– Static evaluation function e(p) for position p:
● IF p is not a winning position for either player:
e(p) = (no. of rows, columns or diagonals still open for MAX) – (no. of rows, columns or diagonals still open for MIN)
● IF p is winning for MAX: e(p) = +∞
● IF p is winning for MIN: e(p) = –∞
– E.g.: for the position p in the figure: e(p) = 6 – 4 = 2
● Use of symmetries: symmetric states are considered equal
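The static evaluation function can be sketched in Python. The sample position is an assumption on my part (the figure with the example board did not survive conversion): X in the center and O on a side square, which yields the slide's value 6 – 4 = 2:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def e(board):
    """Static evaluation: open lines for X (MAX) minus open lines for O (MIN).
    board is a 9-char string over 'X', 'O', ' ' (cells 0..8, row-major)."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells == ["X"] * 3:
            return float("inf")             # p is winning for MAX
        if cells == ["O"] * 3:
            return float("-inf")            # p is winning for MIN
    open_x = sum(1 for line in LINES if all(board[i] != "O" for i in line))
    open_o = sum(1 for line in LINES if all(board[i] != "X" for i in line))
    return open_x - open_o

# Assumed example position: X in the center, O on the top edge.
# 6 lines are still open for X, 4 for O.
print(e(" O  X    "))   # 2
```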
[Figure: Stage 1 – breadth-first expansion to depth 2 from the start node; leaf positions with static values e(p) (3, 1, 0, –1, –2, 2, …) and backpropagated values at the start node]
[Figure: Stage 2 – expansion continued from the move chosen at Stage 1; leaf values include –∞ for positions where MIN can win]
[Figure: Stage 3 – final expansion from the start node]
Revised MIN-MAX Algorithm
● Example: Chess
– Sum the values of the pieces owned by a player and normalize the result in [–1, +1]
– Features
● Value of each piece
– pawn 1, knight or bishop 3, rook 5, queen 9, …
● Good layout of pawns
● Protection of the king
– E.g., weighted linear function
● e(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
– w1 = 9, f1(s) = (no. white queens) – (no. black queens)
– etc.
– Might be refined by considering relative positions
● Is the king defended?
● Does the pawn protect another piece?
● ...
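The weighted linear function can be instantiated in Python. This is a material-only sketch: the weights follow the piece values on the slide, while the position encoding and function names are mine:

```python
# Hypothetical material-only instance of e(s) = w1*f1(s) + ... + wn*fn(s):
# each feature f_i(s) is (no. of white pieces of one type) - (no. of black ones).
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(pieces):
    """pieces: dict mapping a piece letter to (white_count, black_count)."""
    return sum(w * (white - black)
               for piece, w in WEIGHTS.items()
               for white, black in [pieces.get(piece, (0, 0))])

# White has an extra pawn, Black an extra knight: e(s) = 1*(+1) + 3*(-1) = -2
position = {"Q": (1, 1), "R": (2, 2), "B": (2, 2), "N": (2, 3), "P": (8, 7)}
print(material_eval(position))   # -2
```

A real evaluation function would add positional features (king safety, pawn structure) as further weighted terms, as the slide suggests.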
Revised MIN-MAX Algorithm II
● To evaluate a node n in a game tree:
– 1. Put n in the list L of not-yet-expanded nodes: L = (n)
– 2. Let x be the first node in L. If x = n and it has an assigned value, return that value
– 3. Else, if x has an assigned value Vx, let p be the parent of x and Vp its temporary value
● If p is a MIN node, Vp ← min(Vp, Vx); else Vp ← max(Vp, Vx)
● Remove x from L and go to 2
– 4. If x is not assigned any value:
● If it is a terminal node, or we decide not to further expand the tree, assign it a value using the evaluation function eval(x). Leave x in L, because its ancestors have to be updated, and go to 2
● If it is not a terminal node, assign Vx ← –∞ if x is a MAX node, or Vx ← +∞ if x is a MIN node. Add the children of x to L and go to 2
Revised MIN-MAX Algorithm II
● How to decide that the tree is not to be further expanded?
● Note: if eval(n) were perfect there would be no such problem. Only the root's children would need to be expanded to decide what to do
● Always expand up to a given depth p
– Feasible and simple (even computationally)
– Problems:
● Tactically more complex moves (with values of eval(n) that change more widely) should be evaluated at greater depth, up to quiescence (values of eval(n) changing very slowly)
● Horizon effect: not-particularly-useful moves may delay essential moves beyond the maximum tree depth p, so that they are not taken into account
– Solution
● Sometimes it is appropriate to make a secondary search, focused on the chosen move
Problems with MIN-MAX
● Non-quiescent states
● Exploration up to a given level may show a very advantageous situation
● At the next move, the black queen is taken
● Solution: quiescence search
● Apply the evaluation only to quiescent states, in which the evaluation function is not subject to sudden changes
Problems with MIN-MAX
● Horizon effect
● It may happen that diversion moves, only aimed at pushing the problem beyond the horizon, are privileged
– The bishop in a7, which can be taken in 3 moves by the rook, is lost
– Placing the white king in check with the pawn in e5 and then with the one in f4... temporarily avoids the problem, but it is a useless sacrifice of pawns
α-β Cuts
● So, computers that play simply search trees according to certain mathematical properties
● Thus, they also consider moves and nodes that will never occur
● Need to reduce the search space
– Most popular technique: α-β cuts
α-β Cuts
● Example
● As soon as one discovers that the move toward c is a winning one, one does not care to expand nodes b and d
● The nodes under b will never affect the choice
α-β Cuts
● Pruning technique to reduce the search space in MIN-MAX algorithms
● Idea
– The player will never select a node n in the tree for his move if a better choice m is available at the parent node's level or at any previous choice point
● If this conclusion is reached, n can be dropped
– Let:
● α be the value of the best choice found on the path of MAX (the largest)
● β be the value of the best choice found on the path of MIN (the smallest)
– Update α and β, and prune when finding worse values
α-β Cuts
● General principles
● A Branch & Bound technique, used to drop paths whose partial solutions are worse than known solutions
– Need to define the bound (best path explored so far) on which to decide whether a path is to be abandoned
– MIN-MAX search keeps two bounds (one per player)
● α value: lower bound for the value assigned to a maximizing node
● β value: upper bound for the value assigned to a minimizing node
α-β Procedure
● Search is reduced by keeping track of the bounds on backpropagated values
● Bounds (α and β values) are updated at each step during the search
– The α value of a MAX node is set equal to the currently largest backpropagated final value of its successors
– The β value of a MIN node is set equal to the currently smallest backpropagated final value of its successors
● BUT
– α values of MAX nodes (including the start node) can never decrease
– β values of MIN nodes can never increase
α-β Procedure
● Rules to stop the search
● Search can be stopped:
● α-cut:
– Under any MIN node n whose β is less than or equal to the α value of any of its predecessor MAX nodes
● The final backpropagated value of n may be set to its β
● β-cut:
– Under any MAX node n whose α is greater than or equal to the β value of any of its predecessor MIN nodes
● The final backpropagated value of n may be set to its α
α-β Cuts
● α-β strategy
● Generate the tree depth-first, left-to-right
● Propagate (estimated) values from the leaves up
– Example: (see figure)
α-β Cuts
● Terminology
● α-values: (temporary) values in MAX nodes
● β-values: (temporary) values in MIN nodes
α-β Algorithm
● α-β principle
– Stop the generation of a node's descendants IF
● an α-value is greater than or equal to the β-value of a predecessor MIN node
● a β-value is less than or equal to the α-value of a predecessor MAX node
α-β Algorithm
● MIN-MAX with α-β
– [Figure: example tree – 11 evaluations avoided!]
α-β Cuts
[Figure: tic-tac-toe example – the start node has Alpha = –1; node C obtains Beta = –1, so Beta ≤ Alpha and the remaining successors of C are cut]
α-β Algorithm
● [Russell-Norvig] (see MIN-MAX)
α-β Pruning
● Implementation
● Proceed depth-first up to the desired level
● While backpropagating the values, decide whether exploration of the sub-tree can be abandoned
– MaxValue and MinValue are called with two reference values, α (initially –∞) and β (initially +∞):
● They respectively represent the best option for MAX and for MIN up to that moment
– Cuts happen when, during backpropagation:
● v ≥ β for MAX nodes (β-cut)
● v ≤ α for MIN nodes (α-cut)
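The implementation just described can be sketched in Python. This is a sketch, not a definitive version: `children` and `evaluate` are hypothetical stand-ins for the game's successor and static evaluation functions:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited MIN-MAX with alpha-beta pruning.

    alpha: best option for MAX so far (initially -inf)
    beta:  best option for MIN so far (initially +inf)
    """
    succs = children(node)
    if depth == 0 or not succs:
        return evaluate(node)
    if maximizing:
        v = -math.inf
        for child in succs:
            v = max(v, alphabeta(child, depth - 1, alpha, beta, False,
                                 children, evaluate))
            if v >= beta:              # beta-cut: MIN above will never allow this
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for child in succs:
            v = min(v, alphabeta(child, depth - 1, alpha, beta, True,
                                 children, evaluate))
            if v <= alpha:             # alpha-cut: MAX above already has better
                return v
            beta = min(beta, v)
    return v
```

Since pruning never changes the backpropagated value of the root, the result is identical to plain MIN-MAX, only cheaper to compute.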
α-β Pruning
● Effectiveness
● Depends on node ordering
– Best ordering: O(b^(m/2))
– Worst ordering: O(b^m)
– Average ordering: O(b^(3m/4))
● E.g.: chess (average case)
– Opening stages (b ≈ 35, let m = 10)
● MIN-MAX: ca. 2,700,000,000,000,000 nodes
● α-β: ca. 380,000,000,000 nodes
– Middle stages (b ≈ 18, let m = 10)
● MIN-MAX: ca. 3,500,000,000,000 nodes
● α-β: ca. 2,600,000,000 nodes
Node Ordering
● A good computer (10^6 moves/sec) chooses a move in 4 minutes!
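The slide's node counts can be checked with a few lines of Python (an illustration of the O(b^m) vs. O(b^(3m/4)) estimates; the printed values are rounded):

```python
# Node-count estimates: plain MIN-MAX visits about b**m nodes, while
# alpha-beta with average node ordering visits about b**(3*m/4).
b_open, b_mid, m = 35, 18, 10

print(f"{b_open ** m:.1e}")            # ~2.8e+15  (MIN-MAX, opening)
print(f"{b_open ** (0.75 * m):.1e}")   # ~3.8e+11  (alpha-beta, opening)
print(f"{b_mid ** m:.1e}")             # ~3.6e+12  (MIN-MAX, middle game)
print(f"{b_mid ** (0.75 * m):.1e}")    # ~2.6e+09  (alpha-beta, middle game)
```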
α-β Pruning
● Pruning effectiveness
– If the worst nodes are always evaluated first, the progressively evaluated nodes are always in the current line of search and there is no cut
– Best case: the best nodes are evaluated first
● The remaining nodes are always pruned
– (clearly, this is only theoretical)
● The number of nodes is reduced from b^d to about b^(d/2)
– The branching factor is reduced to its square root, i.e., it is possible to look ahead twice as deep in the same time
● Beginner player → Expert player
● In the average case, with a random distribution of node values, the number of nodes becomes about b^(3d/4)
● So a good sorting of a node's children is important
● Note: all cited results are for an "ideal" game tree with fixed depth and branching factor for all branches and nodes
● Repeated states, list of closed nodes (see graph search)
α-β Pruning
● Best case: at each level, the best node is on the left
α-β Pruning
● A perfect example!
Further Readings
● N.J. Nilsson: “Artificial Intelligence”, McGraw- Hill, 2002 (Ch. 12)
● S.J. Russell, P. Norvig: “Artificial Intelligence: A Modern Approach”, Prentice Hall, 1998 (Ch. 8)