Laurea Magistrale (Master's Degree) in COMPUTER SCIENCE

Academic year: 2021

Course: ARTIFICIAL INTELLIGENCE
Stefano Ferilli

These slides were prepared for teaching purposes. They contain original material owned by the Università degli Studi di Bari and/or figures owned by other authors, companies, and organizations, whose references are reported. All or part of the material may be photocopied for personal or teaching use, but may not be distributed for commercial use. Any other use requires specific authorization from the Università degli Studi di Bari and from the other authors involved.

Search / 2

Search for Games – min-max and alpha-beta algorithms

● Effectiveness of graph search-based methods in defining a problem solving agent’s behavior may decrease when

Perception process does not provide necessary information on the state of the environment

– Noise, problems with sensors, etc.

Actions don’t always have the “modeled” effect

– Imprecise model, imprecise effectors

Other agents exist in the environment that try to hinder the main agent

● Problems:

External world may change while the agent is enacting its plan

The agent may be required to act before it can complete its search up to a goal state

Even when the agent has enough time, there is not enough storage to complete the search

● Need to develop alternate search methods

Single Agent

● Attaining goal through rewards

Assumption of search strategies in the state space:

the agent has a single short-term task, described by a goal state/condition

In many real-world problems defining the goal in advance may be hard, but it may be possible to express satisfaction/dissatisfaction with task execution, giving the agent positive/negative rewards

– The agent must maximize the quantity of reward received

– Problem: tasks not defined in advance do not terminate, and the reward might be infinite (difficult to define how to maximize it). The model establishes that the agent prefers immediate rewards to (distant) future rewards

Single Agent

Assumption: environment is observable

– There are observable states characterized by rewards and transitions specified from state to state

– Regime is stationary (stable)

If learning ability is not provided for, theoretical solution is completely specified

– The problem may be reduced to a search problem using tentative and explorative methods

If it is possible to learn, methods may include reinforcement learning, etc.

Multiagent Systems

The agent tries to solve its task/problem, possibly learning

– The other agents in the environment do the same

The theoretical solution for completely specified problems, not involving learning, is given by Game Theory


Games

● A suitable domain to explore machine intelligence

Consist of a structured task in which success or failure are easily measured

Require much knowledge, to reduce combinatorial explosion of the solution space

● Can be analyzed using production systems

Games

● An overview

– Problem Solving + additional competing agents in the domain → Game Playing

– Problem Solving + additional complexity, i.e., states (conjunctions of facts) and operators (links between condition-facts and effect-facts) → Planning

Game Theory

Von Neumann & Morgenstern (1944)

Decision Theory: analyzes individual behavior whose actions have a direct effect

– World of bets & puzzles

Game Theory: analyzes individual behavior whose actions have an effect that depends on the others' choices

– World of multi-player games ("Theory of Games and Economic Behaviour")

Games in AI and beyond

● “Games are not chosen because they are clear and simple, but because they provide

maximum complexity with minimum initial structures”

[M. Minsky, 1968]

● Scientific motivations

Mathematics: graph theory and complexity

Computer Science: databases, parallel computing, etc.

Economics: game theory, cognitive/experimental economics

Psychology: confidence, risk, etc.

Types of Games

Classification 1: choice conditions

– Games with "perfect" information: game states are completely explicit for the agents

– Games with "imperfect" information: game states are only partially made explicit

Classification 2: choice effects

– Deterministic games: states are determined solely based on the agents' actions

– Stochastic games: states are determined also based on external factors (e.g., dice)

Crossing the two classifications:

– Deterministic, perfect information: Chess, Go, Checkers, Othello, Connect Four (Forza 4)

– Deterministic, imperfect information: MasterMind (game or puzzle?)

– Stochastic, perfect information: Backgammon, Monopoly

– Stochastic, imperfect information: Scrabble, Risiko; Bridge, Poker, … (card games)

Types of Games

● Other classifications

Number of players

– Multi-agent systems

Play turn policy

– Diachronic: defined/indefinite turns

– Synchronous

Environment

– Discrete vs Continuous

– Static vs Dynamic

– Episodic vs Sequential

Zero-sum Games

Humans act in a continuous, dynamic, sequential environment, with synchronous choices and imperfect information


Games as Search Problems

● Theoretical solution: how to choose the best move in a game with limited search space

Extension to more complex games,

in which exhaustive exploration is impossible

Search optimization techniques

What if there are many opponents?

Games as Search Problems

● States = game configurations

● Initial state = initial configuration

● Successor function = legal moves and resulting states

● Termination test = function determining the end of the game

● Utility function (pay-off): numeric value that weights terminal states of the game

E.g.: 1 | –1 | 0, score, …
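As a concrete illustration, the five components above can be collected into a small Python structure. This is only a sketch; the toy game and all names here are illustrative, not from the slides:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GameProblem:
    """A game cast as a search problem (components named as in the text)."""
    initial: int                              # initial configuration
    successors: Callable[[int], List[int]]    # legal moves -> resulting states
    is_terminal: Callable[[int], bool]        # termination test
    utility: Callable[[int], int]             # pay-off on terminal states

# Toy counting game: add 1 or 2 to a counter; landing exactly on 3 wins (+1),
# overshooting loses (-1).
toy = GameProblem(
    initial=0,
    successors=lambda n: [] if n >= 3 else [n + 1, n + 2],
    is_terminal=lambda n: n >= 3,
    utility=lambda n: 1 if n == 3 else -1,
)
```

Any game with these four ingredients can be fed to the search procedures developed in the rest of the section.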

Generate-and-Test Strategy

● Process enacted by a player to win ~ a sequence of generation and test procedures

Test carried out after some (variable) amount of work by the generator, statically or dynamically

– Either the GENERATOR produces complete solutions that the Tester evaluates,

– or it generates individual moves that the Tester evaluates, choosing the most promising one

Generate-and-Test Strategy

● Improving effectiveness

Improving generation procedure so that only useful moves are generated

Improving test procedure so that best moves are recognized and attempted first

● How to deal with the presence of two opponents?

Adversarial Games

● Why?

Simple and formalizable rules

Deterministic: two players, alternate turns, zero-sum, perfect information (accessible environment)

Presence of an opponent makes the problem contingent → more difficult than the search problems seen so far

Complexity and real-time constraints: it is only possible to try making the best move in the available time

● Slightly more similar to real problems

Games as Search Problems

● Problem solving (1)

Can a game be analyzed as a search problem, even if multi-agent?

– Example: chess

X = all states of the chessboard

x₀ = initial state of the game

succ(x) = legal moves for a state

T(x) = checkmate

g = number of moves

– There's something wrong!...

g is not decisive

succ(x) can be controlled only for half the moves, and often it is not reversible

T(x) is insufficient to define termination

– Need for a utility function on termination

E.g.: win = +1, draw = 0, lose = –1

– Agent's goal: defining a strategy to attain the terminal states with utility +1


Games as Search Problems

● Problem solving (2)

To cast a game with perfect information as a classical search, consider that

– There is an opponent to be simulated

– The opponent minimizes our utility

Search tree developed on 2 players:

– MAX (the computer/us)

– MIN (the opponent)

Goal: reaching a terminal state of the tree which maximizes utility

– If the opponent plays first:

It becomes MAX

We become MIN, with the goal of minimizing utility

Search with Two Opponents

● Representation

The search for a solution in multi-player games can be cast as a search process that can still be described using search trees

– Nodes = game situations

– Branches = moves that produce the different configurations

The competition factor must be considered in the special semantics adopted on such trees

– Level of the tree → decision up to one of the two players

– So, each level, in turn, indicates the move of one of the two opponents

Search with Two Opponents

[Figure: game tree whose levels alternate between Player A and Player B]

Search with Two Opponents

● Example

Game: starting from a single stack of objects, build more stacks by repeatedly splitting, according to the following rules

– Two players: MAX and MIN; MIN plays first

– One move available to each player: splitting a stack into two unequal stacks

– The first player splits the initial stack

– The game proceeds by splitting any of the stacks generated so far

– The game ends when only stacks of 1 or 2 objects remain

– The first player that cannot play loses

Search with Two Opponents

● Example (7 objects in the initial stack)

DB = an unordered sequence of numbers (the stack sizes) + an indication of who makes the next move

(7, MIN)

Level 1: (6,1, MAX)   (5,2, MAX)   (4,3, MAX)

Level 2: (5,1,1, MIN)   (4,2,1, MIN)   (3,2,2, MIN)   (3,3,1, MIN)

Level 3: (4,1,1,1, MAX)   (3,2,1,1, MAX)   (2,2,2,1, MAX)

Level 4: (3,1,1,1,1, MIN)   (2,2,1,1,1, MIN)

Level 5: (2,1,1,1,1,1, MAX)

[The winning strategies for MAX are highlighted in the original figure]
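The successor function of this stack-splitting game is easy to write down. The following sketch (function and variable names are ours, not the slides') reproduces the first level of the tree above:

```python
def successors(stacks):
    """Legal moves: split one stack into two unequal, non-empty stacks.
    A state is a tuple of stack sizes, kept sorted in decreasing order."""
    result = set()
    for i, n in enumerate(stacks):
        rest = stacks[:i] + stacks[i + 1:]
        # a and n - a must differ, so a ranges over 1 .. ceil(n/2) - 1
        for a in range(1, (n + 1) // 2):
            result.add(tuple(sorted(rest + (a, n - a), reverse=True)))
    return sorted(result, reverse=True)

# First level of the 7-object game: (7) -> (6,1), (5,2), (4,3)
print(successors((7,)))   # [(6, 1), (5, 2), (4, 3)]
```

A player with no legal move has lost, and `successors(s) == []` detects exactly that condition.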

Search with Two Opponents

● In building the search tree (game tree), the iterated process of selecting the first "good" move may lead to the solution strategy

As usual, one may use

– Breadth-first search

– Depth-first search

– Heuristic methods

With suitable termination conditions

– Time factors, memory limitations, depth of the tree, etc.


Search with Two Opponents

● MIN-MAX depth-first search procedure

Assumption: search a winning strategy for MAX, who plays first

Procedure based on evaluating the evaluation function e at many levels (stages) of the search

– Search for the first best move for MAX based on maximization of the static evaluation function e

– Next stage: MIN chooses the move

It will try and oppose MAX by choosing a move that minimizes the evaluation function’s value

The MAX node parent of MIN nodes is assigned a backpropagated value equal to the maximum value of e for the MIN nodes

The MIN node, parent of subsequent MAXs, is assigned a backpropagated value equal to the minimum value of e for its successor nodes

Search with Two Opponents

● MIN-MAX depth-first search procedure

– Upon termination of the search, an estimate of the "best" first move can be extracted from the search graph

Apply the static evaluation function to the leaf nodes in the search graph

Go on backpropagating, level-wise, from the leaves up to the root, until the process is completed

The validity of this procedure rests on the assumption that the values propagated upward for the successors of the starting node are more reliable measures of the actual goodness of those positions than the values computed by the static evaluation function directly

● Evaluation function

An estimation of the expected utility starting from a given position in the game

Quality of evaluation function is crucial

– Must be consistent with utility if applied to terminal states of the game (it must induce the same ordering)

– Must be efficiently computable

– Must reflect the actual likelihoods of winning

e(A) > e(B) if from A there are more chances of winning than from B

MIN-MAX Algorithm

● Designed to

determine the optimal strategy for MAX

Suggest to MAX the best move to make

● Assumption: MIN makes the most favorable move for itself

Path is not relevant, only the first move!

MIN-MAX Algorithm

● Leaves (end game conditions) labeled with 1 and –1

MIN tries to reach –1

– Minimizer

MAX tries to reach +1

– Maximizer

MIN-MAX Algorithm

● High-level description (recursive procedure)

Determine if level is minimization or maximization

If search bound reached

– Compute static value of current position associated to the proper player

– Exit with result

Use MIN-MAX on children of current node

– IF minimization level: choose the minimum of the results

– ELSE (maximization level): choose the maximum of the results


MIN-MAX Algorithm

● Based on two predicates

gen_moves(POSition, Player)

– Generator of plausible moves

Provides a list of nodes representing applicable moves that Player may make

static(POSition, Player)

– Static evaluation function

Returns a number that represents the goodness of POS from Player’s perspective

– Large values -> good position for the player who is going to move (indicated by even or odd depth)

– Its caller tries to maximize its own score

When a value is backpropagated to the higher level it is 'negated' (to reflect the opponent's perspective)

By inverting values at alternate levels, MINIMAX can be kept very simple

MIN-MAX Algorithm

● Additional function

deep_enough(POSition, Depth)

– Evaluates termination factors

– Returns TRUE if search must be stopped at current level, or FALSE otherwise

– May consider just Depth in its simplest implementation

● Returns two objects

Value

– The backpropagated value of the chosen path

Path

– The path itself

MIN-MAX Algorithm

● Note: Neatly separates the search tree generation and position evaluation processes

Position evaluation starts only after generation is completed

– Since MIN-MAX is a depth-first process, a path is completely explored and the static evaluation function is applied to the last step of the path, so that values can be backpropagated

Inefficient!!!

– Search can be reduced by evaluating the leaf nodes and computing the backpropagated values during tree generation

MIN-MAX Algorithm

– MIN-MAX(Pos, Depth, Player)

IF deep_enough(Pos, Depth) THEN
  // there is no path from this node: its value is given by the static evaluation function
  return Value ← static(Pos, Player); Path ← NIL
ELSE
  Successors ← gen_moves(Pos, Player)   // generate another stage of the tree
  IF Successors is empty THEN           // there are no moves to make
    return Value ← static(Pos, Player); Path ← NIL   // same structure as for deep_enough = TRUE
  ELSE
    Best_value ← minimum possible value for static()
      // will be updated to record the best score obtainable from the elements of Successors
    For all elements Succ in Successors do   // examine each element in turn, recording the best one
      Succ_res ← MIN-MAX(Succ, Depth + 1, opponent(Player))   // recursive call that examines Succ
      New_value ← –value(Succ_res)   // reflects the goodness of the position from the opponent's perspective
      IF New_value > Best_value THEN   // found a better successor than those examined so far: store it
        Best_value ← New_value
        Best_path ← Succ & path(Succ_res)
          // best path known so far: from the current node to Succ, then along the path returned by the recursive call
    return Value ← Best_value; Path ← Best_path
      // all successors examined: both the value of Pos and the path to take from it are known


MIN-MAX Algorithm

Recursive version (depth-first visit)

function MINIMAX-Decision(state) returns an action
  v ← Max-Value(state)
  return the action in Successors(state) with value v

function Max-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← –∞
  for a, s in Successors(state) do v ← Max(v, Min-Value(s))
  return v

function Min-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← +∞
  for a, s in Successors(state) do v ← Min(v, Max-Value(s))
  return v

To label s, Min-Value and Max-Value explore the whole tree under state s

MIN-MAX Algorithm

● Properties

Complete – if the tree is finite

Optimal – against an opponent playing at its best

Space complexity O(bm)

– Depth-first

Time complexity O(b^m)

– Very inefficient (exponential) if the whole tree is to be developed

(b = branching factor, m = number of levels)

● Need to prune the tree!

E.g., chess

– b ≈ 35, m ≈ 100 for reasonable matches

→ 35^100 nodes: impractical!


MIN-MAX Algorithm

● 3 approaches to reducing the tree

Heuristic evaluation function, used in intermediate (not necessarily terminal) states to stop tree expansion

Ignoring portions of the tree that do not affect the final choice (α-β cuts)

(Parallel multi-processor architectures)

Revised MIN-MAX Algorithm

● Solution to inefficiency (Shannon, 1949)

Apply MIN-MAX up to a given depth

– Look-ahead only for a while and evaluate moves up to a non-terminal node considered as successful

Use a heuristic state evaluation function eval(n)

– Based on the available information, estimates how likely it is that the move to state n subsequently leads to a solution

– eval(n) = –1: surely winning for MIN

– eval(n) = +1: surely winning for MAX

– eval(n) = 0: about the same probabilities

– Other intermediate values for eval(n)

– Various features of the state can be evaluated

Weighted linear function e(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)

– or even non-linear combinations of features

Trade-off between search and evaluation function

MIN-MAX with Imperfect Decision

● In more complex cases, eval(s) is needed

Strategy: look-ahead of d levels

– Expand search tree for a given number of levels d

Compliant with available time and space

– Evaluate the obtained states and backpropagate the results using the MAX and MIN rule

MIN-MAX Algorithm as before, but…

if TerminationTest(state) then return utility(state)

if CutTest(state, d) then return eval(state)

MIN-MAX Algorithm

Example: Tic-Tac-Toe

– Consider MAX → X, MIN → O

– Carry out a breadth-first search up to depth 2

– Static evaluation function e(p) for position p:

IF p is not a winning position for any player:
e(p) = (no. of rows, columns or diagonals still open for MAX) – (no. of rows, columns or diagonals still open for MIN)

IF p is winning for MAX: e(p) = +∞

IF p is winning for MIN: e(p) = –∞

– E.g.: for the position p shown on the slide, e(p) = 6 – 4 = 2
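This static evaluation is straightforward to implement. The board below (X in the centre, O in a corner) is our own example rather than the one pictured on the slide, and gives e(p) = 5 – 4 = 1:

```python
import math

# The 8 lines of the board: 3 rows, 3 columns, 2 diagonals.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
         [[(r, c) for r in range(3)] for c in range(3)] +
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

def e(board):
    """(lines still open for X) - (lines still open for O);
    +inf / -inf if the position is already won."""
    for player, won in (('X', math.inf), ('O', -math.inf)):
        if any(all(board[r][c] == player for r, c in line) for line in LINES):
            return won
    def open_for(player):
        other = 'O' if player == 'X' else 'X'
        return sum(1 for line in LINES
                   if all(board[r][c] != other for r, c in line))
    return open_for('X') - open_for('O')

p = [['O', None, None],
     [None, 'X', None],
     [None, None, None]]
print(e(p))   # 1  (5 lines open for X, 4 open for O)
```

A line is "open" for a player exactly when it contains none of the opponent's marks, which is what `open_for` counts.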

Use of symmetries: symmetric states are considered equal

[Figure: Stage 1 – search tree of depth 2 from the empty board; static values e(p) at the leaves and backpropagated values up to the start node]

[Figure: Stage 2 – search tree of depth 2 from the chosen position; backpropagated values at the start node]

[Figure: Stage 3 – search tree of depth 2 from the later position; some leaves are won/lost positions (e(p) = ±∞)]

Revised MIN-MAX Algorithm

Example: Chess

– Sum the values of the pieces owned by a player and normalize the result in [–1, +1]

– Features

Value of each piece

– pawn 1, knight or bishop 3, rook 5, queen 9, …

Good layout of pawns

Protection of the king

– E.g., weighted linear function

e(s) = w₁f₁(s) + w₂f₂(s) + … + wₙfₙ(s)

– w₁ = 9, f₁(s) = (no. white queens) – (no. black queens)

– etc.

– Might be refined by considering relative positions

Is the king defended?

Does the pawn protect another piece?

...
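A minimal material-count version of such a weighted linear function might read as follows; the piece names and counts are illustrative, not a fixed API:

```python
# Piece values as in the slides: pawn 1, knight/bishop 3, rook 5, queen 9.
WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(white, black, weights=WEIGHTS):
    """Weighted linear e(s) = sum_i w_i * f_i(s), where each feature
    f_i = (# white pieces of type i) - (# black pieces of type i)."""
    return sum(w * (white.get(p, 0) - black.get(p, 0))
               for p, w in weights.items())

# White is up one pawn:
print(material_eval({'queen': 1, 'pawn': 8}, {'queen': 1, 'pawn': 7}))   # 1
```

The refinements listed above (king safety, pawn structure) would enter as additional features fᵢ with their own weights.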

Revised MIN-MAX Algorithm II

To evaluate a node n in a game tree

– 1. Put n in the list L of not-yet-expanded nodes: L = (n)

– 2. Let x be the first node in L. If x = n and it has an assigned value, return that value

– 3. Else, if x has an assigned value Vx, let p be the parent of x and Vp its temporary value

If p is a MIN node, Vp ← min(Vp, Vx), else Vp ← max(Vp, Vx)

Remove x from L and go to 2

– 4. If x is not assigned any value

If it is a terminal node, or we decide not to further expand the tree, assign it a value using the evaluation function eval(x). Leave x in L because its ancestors have to be updated, and go to 2

If it is not a terminal node, assign Vx ← –∞ if x is a MAX node, or Vx ← +∞ if x is a MIN node. Add the children of x to L and go to 2

Revised MIN-MAX Algorithm II

How to decide that the tree is not to be further expanded?

Note: if eval(n) were perfect there would be no such problem: only the root's children would need to be expanded to decide what to do

Always expand up to a given depth p

– Feasible and simple (even computationally)

– Problems:

Tactically more complex moves (with values that change more widely for eval(n)) should be evaluated with more depth up to quiescence (values of eval(n) changing very slowly)

Horizon effect: non-particularly-useful moves may delay essential moves beyond the maximum tree depth p, so that they are not taken into account

– Solution

Sometimes it is appropriate to make a secondary search, focused on the chosen move

Problems with MIN-MAX

● Non-quiescent states

Exploration up to a level may show a very advantageous situation

At the next move, the black queen is taken

● Solution:

quiescence search

Applying evaluation to quiescent states in which the evaluation function is not subject to sudden changes

Problems with MIN-MAX

● Horizon effect

It may happen that diversion moves, aimed only at pushing the problem beyond the horizon, are privileged

– The bishop in a7, which can be taken in 3 moves by the rook, is doomed

– Putting the white king in check with the pawn in e5, and then with the one in f4, temporarily avoids the problem, but it is a useless sacrifice of pawns


α-β cuts

● Computers that play simply search in trees according to certain mathematical properties

Thus, they also consider moves and nodes that will never occur

Need to reduce the search space

– Most popular technique: α-β cut

α-β cuts

● Example

As soon as one discovers that the move toward c is a winning one, there is no need to expand nodes b and d

Nodes under b will never affect the choice

α-β cuts

Pruning technique to reduce the search space in MIN-MAX algorithms

Idea

– The player will never select a node n in the tree for his move if a better choice m is available at the parent node's level or at any previous choice point

– If this conclusion is reached, n can be dropped

– Let

α be the value of the best choice found on the path of MAX (the largest)

β be the value of the best choice found on the path of MIN (the smallest)

– Update α and β, and prune when finding worse values

α-β cuts

● General principles

Branch & Bound technique used to drop paths whose partial solutions are worse than known solutions

– Need to define the bound (best path explored so far) used to decide whether a path is to be abandoned

– MIN-MAX considers two bounds (one per player)

α value: lower bound for the value assigned to a maximizing node

β value: upper bound for the value assigned to a minimizing node

α-β Procedure

● Search reduced by keeping track of the bounds on backpropagated values

Bounds (α and β values) updated at each step during search

– Value α for a MAX node is set equal to the current largest backpropagated final value of its successors

– Value β for a MIN node is set equal to the current smallest backpropagated final value of its successors

BUT

α values of MAX nodes (including the start node) can never decrease

β values of MIN nodes can never increase

α-β Procedure

● Rules to stop search

Search can be stopped

α-cut:

– Under any MIN node n having β less than or equal to the α of any of its predecessor MAX nodes

The final backpropagated value of n may be set to its β

β-cut:

– Under any MAX node n having α greater than or equal to the β of any of its predecessor MIN nodes

The final backpropagated value of n may be set to its α


α-β cuts

● α-β strategy

Generate tree depth-first, left-to-right

Propagate (estimated) values from leaves up

– Example:

α-β cuts

● Terminology

α-values: (temporary) values in MAX nodes

β-values: (temporary) values in MIN nodes

α-β Algorithm

α-β principle

– Stop generation of a node's descendants IF

an α-value is greater than or equal to the β-value of a descendent node, or

a β-value is less than or equal to the α-value of a descendent node

α-β Algorithm

● MIN-MAX with α-β: in the example on the slide, 11 evaluations are avoided!

α-β cuts

[Figure: tic-tac-toe example of α-β pruning – at the start node Alpha = –1; after node A is evaluated to –1, node C under B gets Beta = –1, so B's remaining successors are cut]

α-β Algorithm

● [Russell-Norvig] (see MIN-MAX)

α-β Pruning

● Implementation

Proceed in depth up to the desired level

While backpropagating the values, decide if exploration of the sub-tree can be abandoned

– MaxValue and MinValue are called with two reference values, α (initially –∞) and β (initially +∞), which respectively represent the best option found so far for MAX and for MIN

– Cuts happen when, during backpropagation:

v ≥ β for MAX nodes (β-cut)

v ≤ α for MIN nodes (α-cut)

α-β Pruning

● Effectiveness

Depends on node ordering

– Best ordering: O(b^(m/2))

– Worst ordering: O(b^m)

– Average ordering: O(b^(3m/4))

E.g.: chess (average case)

– Opening stages (b ≈ 35, let m = 10)

MIN-MAX: ca. 2.7 × 10^15 nodes

α-β: ca. 3.8 × 10^11 nodes

– Middle stages (b ≈ 18, let m = 10)

MIN-MAX: ca. 3.5 × 10^12 nodes

α-β: ca. 2.6 × 10^9 nodes

Node Ordering

A good computer (10^6 moves/sec) chooses a move in 4 minutes!

α-β Pruning

● Pruning effectiveness

– Worst case: always evaluating the worst nodes first, the progressively evaluated nodes are always in the current line of search, and there is no cut

– Best case: the best nodes are evaluated first

The remaining nodes are always pruned (clearly, this is only theoretical)

The number of nodes is reduced from b^d to about b^(d/2)

– The branching factor is reduced to its square root, i.e., it is possible to look ahead twice as deep in the same time

Beginner player → Expert player

In the average case, with a random distribution of node values, the number of nodes becomes about b^(3d/4)

So, a good ordering of a node's children is important

Note: all cited results are for an "ideal" game tree with fixed depth and branching factor for all branches and nodes

Repeated states, list of closed nodes (see graph search)
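The effect of ordering can be seen empirically with a small counting experiment over a nested-list game tree (a toy example of ours, not from the slides):

```python
import math

def ab(tree, alpha=-math.inf, beta=math.inf, is_max=True, counter=None):
    """Alpha-beta over a nested-list tree; counter[0] counts leaves evaluated."""
    if isinstance(tree, (int, float)):
        if counter is not None:
            counter[0] += 1
        return tree
    best = -math.inf if is_max else math.inf
    for sub in tree:
        v = ab(sub, alpha, beta, not is_max, counter)
        if is_max:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if alpha >= beta:       # cut: remaining siblings cannot matter
            break
    return best

# Same leaves, two orderings of the root's MIN children:
good = [[3, 3, 3], [2, 2, 2], [1, 1, 1]]   # best child first
bad  = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]   # best child last
cg, cb = [0], [0]
assert ab(good, counter=cg) == ab(bad, counter=cb) == 3
print(cg[0], cb[0])   # 5 9  -- better ordering evaluates fewer leaves
```

Both orderings return the same value, 3, but the well-ordered tree evaluates 5 leaves instead of 9, illustrating why children should be sorted before expansion.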

α-β Pruning

● Best case: at each level, the best node is on the left (see the example figure on the slide)


Further Readings

● N.J. Nilsson: "Artificial Intelligence", McGraw-Hill, 2002 (Ch. 12)

● S.J. Russell, P. Norvig: "Artificial Intelligence: A Modern Approach", Prentice Hall, 1998 (Ch. 8); Ed. 2005, vol. 1, Ch. 6
