Multi-Agent Planning:
Models, Algorithms, Complexity
Ronen I. Brafman
Computer Science Department Ben-Gurion University
Joint work with Raz Nissim (BGU) and Carmel Domshlak (Technion)
Part II:
Multi-Agent Distributed Search,
Privacy, and Selfish Agents
Perspectives We Study
Agents as Abstractions
Agents as Components
Privacy Preserving Agents with Common Goal
Goal: a group of agents seeks to achieve a common goal, but they do not want to share private information
Example: Companies collaborating on a project, different departments in one company
Example: Agents with complex models that are difficult to transmit, or proprietary
Result: fully distributed, privacy-preserving algorithms
Selfish Agents with Shared Task
Setting: a group of utility-maximizing agents with a common task
Example: a group of contractors required for building a house
Result: distributed, strategy-proof, socially efficient, privacy preserving mechanism
Also: stable coalitions and stable plans
In lay terms: a distributed protocol that leads to an optimal solution, preserves privacy, and gives agents no incentive to lie or deviate from it.
Core Model: MA-STRIPS
All variants we study are based on MA-STRIPS
MA-STRIPS = STRIPS + an agent associated with each action
STRIPS Model:
(States, Actions, Initial State, Goal Condition)
MA-STRIPS Model:
(States, Actions_1, …, Actions_m, Initial State, Goal Condition)
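The tuple above can be sketched as a data structure. This is a minimal illustration; the class and field names (Action, MAStrips) are my own assumptions, not the talk's code.

```python
from dataclasses import dataclass

# A minimal sketch of the MA-STRIPS tuple above. All names here
# (Action, MAStrips) are illustrative assumptions, not the talk's code.

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before execution
    effects: frozenset        # facts that hold after execution

@dataclass
class MAStrips:
    actions: dict             # agent name -> tuple of Actions (the partition A_1..A_m)
    initial_state: frozenset  # facts true in the initial state
    goal: frozenset           # facts required in a goal state

    def agent_of(self, action):
        """MA-STRIPS associates exactly one agent with each action."""
        return next(ag for ag, acts in self.actions.items() if action in acts)
```

States are left implicit here: as in STRIPS, a state is just a set of facts over the problem's variables.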
Logistics Example
Each truck and each airplane is an agent
Truck actions: load into, unload from, move truck
Load(p?,Ital-Truck,l?), Move(Ital-Truck,l1?,l2?)
Plane actions: load into, unload from, fly plane
Notice:
the move action involves the truck only
the load action involves multiple agents
If the truck loads a package at the airport, the plane cannot load it
Private and Public Variables
Variable v is private to A_i iff it does not appear in the description of any action of another agent
Example: at(Ital-Truck, VRN)
All other variables are public
Example: at(p1, TLV)
Private and Public Actions
Action a ∈ A_i is private to A_i iff its description contains only private variables
Example: move(Isra-Truck, BS, TLV)
Private actions neither affect nor are affected by the actions of other agents
All other actions are public
Example: Load(p1,Ital-Truck, BresciaPO)
A public action may have a private precondition: at(Ital-Truck, BresciaPO)
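The definitions above can be checked mechanically. In this sketch an action is reduced to just the set of variables appearing in its description; the specific actions and variable strings are illustrative, loosely following the logistics example.

```python
# Sketch of the private/public split above: a variable is private to an
# agent iff it appears only in that agent's action descriptions, and an
# action is private iff it mentions only private variables. Actions are
# reduced to the set of variables in their descriptions; names are illustrative.

def private_variables(actions_by_agent, agent):
    mine = set().union(*actions_by_agent[agent])
    others = set().union(*(v for ag, acts in actions_by_agent.items()
                           if ag != agent for v in acts))
    return mine - others

def is_private_action(action_vars, agent, actions_by_agent):
    return action_vars <= private_variables(actions_by_agent, agent)

move   = {"at(Ital-Truck,BR)", "at(Ital-Truck,VRN)"}                # truck only
unload = {"at(Ital-Truck,VRN)", "in(p1,Ital-Truck)", "at(p1,VRN)"}  # mentions p1's location
load_p = {"at(Plane,VRN)", "at(p1,VRN)", "in(p1,Plane)"}            # plane also mentions it

actions = {"Ital-Truck": [move, unload], "Plane": [load_p]}
```

Here move is private to Ital-Truck, while unload is public because at(p1,VRN) also appears in the plane's load action.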
MAP as CSP+Planning
CSP makes sure that the agents' public plans are consistent
Local planning extends public plan of each agent into a full plan
If the coordination is simple, local planning requires planning on much smaller domains with fewer variables
Each such problem should be exponentially easier to solve
Main complexity factor: difficulty of solving the global CSP
Public plans:
Plane: L(3,TLV), U(4,VRN), L(5,VRN), U(6,TLV)
Isra-Truck: L(1,BS), U(2,TLV), L(9,TLV), U(10,BS)
Ital-Truck: L(1,BR), U(2,VRN), L(5,VRN), U(6,BR)
Full plans:
Plane: L(3,TLV), fly(TLV,VRN), U(4,VRN), L(5,VRN), fly(VRN,TLV), U(6,TLV)
Isra-Truck: L(1,BS), Move(BS,TLV), U(2,TLV), L(9,TLV), Move(TLV,BS), U(10,BS)
Ital-Truck: L(1,BR), Move(BR,VRN), U(2,VRN), L(5,VRN), Move(VRN,BR), U(6,BR)
Practical Difficulties
CSP method does not scale up well when delta > 3
Solution: adapt state-of-the-art heuristic forward search algorithms to MAP
Not all problems have a natural distributed structure; this is a problem when trying to solve general single-agent problems
Answer 1: No one said this method works on all domains
Answer 2: Try to discover such structure, if possible
If You’re Using Forward Search, Why Not Centralized?
Why not send the models to one agent, who will perform the planning?
Less robust (prone to failures)
Problematic if agent models are complex
Generative action model implemented as a simulator
Requires electing a trusted center and having it do all the work
Main Issue: Requires Giving Up Private Information
Privacy
Agents do not always want to share all their information with the other agents
Ad-hoc teams of agents
Different companies
Agents seeking to maximize their utility will lie when beneficial
In MAP: private variables and their values and private actions may contain confidential business data, such as inventory levels or local processes
Public actions/variables are its interface to the world
Private actions and variables are just that – private
Multi-Agent Forward Search
A multi-agent version of forward search
Can be used for optimal planning (MAD-A*)
Can be used for satisficing search (MAD-X)
The algorithm is inherently distributed
Has a built-in pruning effect
Does not require sharing private information
MAD-A*
Each agent performs A* search with the following changes:
Maintain own open and closed list
Expand a node using its own actions only
Send this node to other agents who have an operator applicable to this node
Run a distributed optimal solution detection procedure
Method outperforms all current MA planning algorithms
In fact, it was able to solve 6 previously unsolvable single-agent problems (most likely thanks to the pruning effect)
MAD-A*
Each agent performs the following:
Initialize open and closed list to empty
A designated initial agent initializes its open list with the initial state
Retrieve first node n from open list
If n is a solution, perform distributed optimality check
Otherwise, expand n using own actions only
Compute the h-value of each child n’ and add it to the open list
If n’ was obtained by applying a public action
Send n’ to all agents to which n’ is relevant
Message contains n’, g and h values, and creating action
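The per-agent loop above can be sketched in code. This is a single-agent skeleton only: the network layer is stubbed with a plain outbox list, there is no distributed optimality check, and all names (MADAgent, expand_once, the action tuple format) are illustrative assumptions, not the MAD-A* implementation.

```python
import heapq
from itertools import count

# Skeleton of one agent's MAD-A* loop as described above; names and the
# action representation are illustrative, and messaging is stubbed.

class MADAgent:
    def __init__(self, heuristic, actions):
        self.h = heuristic
        self.actions = actions      # list of (name, is_public, cost, apply_fn)
        self.open = []              # own open list of (f, g, tie, state)
        self.closed = set()
        self.outbox = []            # (state, g, creating_action) messages to send
        self._tie = count()         # tie-breaker so states are never compared

    def push(self, state, g):
        heapq.heappush(self.open, (g + self.h(state), g, next(self._tie), state))

    def expand_once(self):
        """Pop the best node and expand it with this agent's own actions only."""
        _, g, _, state = heapq.heappop(self.open)
        if state in self.closed:
            return
        self.closed.add(state)
        for name, is_public, cost, apply_fn in self.actions:
            child = apply_fn(state)
            if child is None:       # action not applicable in this state
                continue
            self.push(child, g + cost)
            if is_public:           # only states created by a public action are sent
                self.outbox.append((child, g + cost, name))

    def receive(self, state, g):
        """States arriving from other agents join the open list like own children."""
        self.push(state, g)
```

In a full implementation the outbox messages would go only to agents with an applicable operator, and a goal state would trigger the distributed optimality check.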
Super-linear Speedup?
Notice that in some cases we get super-linear speedup
This implies that this is more than simple distribution
Where does the extra power come from?
Pruning!
Pruning Effect of MAD-A*
Recall: agents send other agents a state only if their last action was public
We exclude plans in which there are two adjacent private actions that belong to two different agents!
Each legal plan has a permutation that satisfies this property and is a legal plan with the same outcome
a_pr1 a_pr2 a_pr2 a_pb2 … a_pb5 a_pb1  =  a_pr2 a_pr2 a_pb2 … a_pb5 a_pr1 a_pb1
MA-FD
A general, scalable framework for solving the MA planning problem
Enables developing new algorithms, heuristics, and extensions to richer planning formalisms, for optimal and satisficing planning
Based on Fast Downward [Helmert, 2006]
FD is rich with implementations of many heuristics and algorithms
Uses FD’s translator and preprocessor, adjusted to multi-agent planning
Standard input + number of agents, their names, and their IP addresses
No shared memory
Agents communicate via TCP/IP
This allows usage on multi-core machines, networked computers/robots, cloud
All heuristics available for FD are immediately accessible
Maintaining Privacy
DisCSP – immediate
Agents only coordinate on their public actions
Forward search
Agents need only expand using their own operators
Agents need only know about public actions to determine state relevance
Heuristic function is now computed locally based on the agent’s own model and the public actions of the others
Different agents will have different heuristic values
Not a problem!
MAD-A* remains optimal, provided all heuristics are admissible
Selfish Agents
Goal: getting a group of selfish agents with private information to perform cost-optimal planning
Main contributions:
A centralized VCG mechanism for optimal planning
A privacy preserving, distributed mechanism which is ex-post faithful
Ex-post faithful = it is in the best interest of each agent to truthfully follow all aspects of the algorithm, regardless of the agents’ private values, provided all other agents follow the algorithm
Selfish Agents
Selfish agents maximize their own utility function
Will not perform computations or take actions that lower their utility
May lie or deviate from the agreed protocol if profitable
We assume:
P_i is the payment given to agent i for its participation
Cost_i(π) is the cost of i’s actions in plan π
u_i(π, P) = P_i - Cost_i(π)
Vickrey-Clarke-Groves Mechanisms
A general technique in mechanism design
Agents report private information to a trusted center
Center computes solution and a payment to be made to the agents
Payments are computed based on the solutions to the marginal problems in which agent i does not participate.
Guarantees
Efficient: outputs a solution with maximal sum of utilities
This is called maximizing social welfare
Strategy-proof: truth-telling is a weakly dominant strategy
VCG Mechanism for Optimal Multi-Agent Planning
Agents report their private actions and all action costs to a center
Given agents’ reports, the center finds an optimal solution π* as well as an optimal solution π-i* for each marginal planning problem Π-i
Each agent receives the following payment:
Payments are the social cost/benefit of the agent – the impact its participation has on others’ utilities
Main drawback: centralized, and requires revealing information to the center
P_i = Σ_{j≠i} (Cost_j(π-i*) - Cost_j(π*))
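The payment rule above translates directly into code. In this sketch a plan is represented only by its per-agent costs; the function name and dictionary representation are my own assumptions.

```python
# Sketch of the VCG payment above:
#   P_i = sum over j != i of (Cost_j(pi*_{-i}) - Cost_j(pi*))
# Plan costs are per-agent dicts; an agent absent from a plan has cost 0.

def vcg_payment(i, marginal_costs, optimal_costs):
    """marginal_costs: Cost_j(pi*_{-i}); optimal_costs: Cost_j(pi*)."""
    agents = set(marginal_costs) | set(optimal_costs)
    return sum(marginal_costs.get(j, 0) - optimal_costs.get(j, 0)
               for j in agents if j != i)
```

For instance, if removing agent a forces b's cost from 3 up to 4, a's payment is 1: a is paid exactly the extra cost its absence would impose on the others.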
Example – Logistics Domain
3 agents: A1,A2,A3 2 packages p1,p2, 2 locations L,R
All agents and packages at L
Package destination is R
Load/Unload costs as follows:
P1: A1 = 1, A2 = 2, A3 = 2
P2: A1 = 2, A2 = 1, A3 = 2
Move costs 1 for all agents
Optimal solution cost = 6
A1 handles P1, A2 handles P2
Optimal cost of Π-A1, Π-A2 = 8
Replacing A1 or A2 by A3 adds 2
P1 = P2 = 8 - 3 = 5
Optimal solution cost to Π-A3 = 6
P3 = 6 - 6 = 0
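The numbers above can be checked directly under the stated costs. The per-agent plan costs below are transcribed from the example (handling a package = load + move + unload), not computed by a planner.

```python
# Checking the logistics payments above. Per-agent plan costs are
# transcribed from the example, not produced by a planner.

cost_opt   = {"A1": 3, "A2": 3, "A3": 0}  # A1 takes P1, A2 takes P2: 1+1+1 each
cost_no_A1 = {"A2": 3, "A3": 5}           # A3 replaces A1 on P1: 2+1+2
cost_no_A2 = {"A1": 3, "A3": 5}           # A3 replaces A2 on P2: 2+1+2
cost_no_A3 = {"A1": 3, "A2": 3}           # A3 was idle anyway

def payment(i, marginal, optimal):
    agents = set(marginal) | set(optimal)
    return sum(marginal.get(j, 0) - optimal.get(j, 0) for j in agents if j != i)

P1 = payment("A1", cost_no_A1, cost_opt)
P2 = payment("A2", cost_no_A2, cost_opt)
P3 = payment("A3", cost_no_A3, cost_opt)
```

This reproduces P1 = P2 = 5 and P3 = 0 from the slide.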
Distributed Mechanism
Goal: preserve privacy while maintaining VCG’s strong guarantees, and supporting a distributed implementation
Solution:
The agents compute the solution as well as the payments
We use MAD-A* to maintain privacy
Agents have new opportunities to manipulate the mechanism beyond lying about costs, such as not performing all steps
We prove that they have no motivation to deviate individually
Selfish MAD-A*
Run MAD-A* on Π with all agents to find π*
For all agents i
Run MAD-A* on Π-i with all agents except i to find π-i*
Agents j≠i compute P_ij and send it to the bank
The bank computes P_i = Σ_{j≠i} P_ij
The bank pays P_i to every agent i and broadcasts π* to all agents
P_ij is the extra cost j incurs if i is removed:
P_ij = Cost_j(π-i*) - Cost_j(π*)
Optimizing Selfish MAD-A*
Selfish MAD-A* requires running m+1 sequential search runs, duplicating search effort
We can improve this by executing a multi-goal search on one (distributed) search space
States are additionally identified by the agent participating in their formation
A goal state is treated as a candidate solution for Π-i if i does not participate in the plan leading to it
States irrelevant to the remaining unsolved problems are pruned
Experimental Results
Discussion
The person who “orders” the plan from the team has to pay more than the cost of the optimal plan
In cases where some agent has very high value, this can be substantial
Makes sense either if agents are replaceable
E.g., there are many electricians and plumbers
Or if plan optimality is very important
E.g., a government wants to build a road quickly
A well known issue in mechanism design that arises in routing, too
In planning and routing we seek a least cost path from source to sink
Planning Games
Planning games formalize and study situations where the goal is not an optimal plan, but rather a stable plan
Intuitively, a plan is stable if the agents that perform it do not have motivation to quit the team
There are different ways of formalizing them
The plan must also specify payment to each agent
Combine ideas from planning and coalitional game theory
Coalitional game theory studies stable coalitions
The planning model is MA-STRIPS
The algorithms extend our CSP formulation to handle stability
Summary
Using the MA-STRIPS model, we investigated planning problems in which agents are not fully cooperative
May want to maintain private information
May have own utility function
MAD-A* provides a very efficient, privacy-preserving method for solving these problems
This method forms the basis for practical planning algorithms in both contexts
Future Work
How private is private?
Agents do not explicitly report their private information, but some of it may be deduced by others from the messages sent during the algorithm
Handling more complex models
Interacting actions
Uncertainty
Task allocation – can be captured by giving different agents similar actions
Applications
More realistic models for certain settings
How do we get this to work when some agents are indispensable?
How do we get this to work when some agents are replaceable?
References
Nissim & Brafman: Cost-optimal planning by self-interested agents. AAAI 2013
Nissim, Apsel & Brafman: Tunneling and Decomposition-Based State Reduction for Optimal Planning. ECAI 2012
Nissim & Brafman: Multi-Agent A* for Parallel and Distributed Systems. AAMAS 2012
Brafman & Domshlak: On the complexity of planning for agent teams and its implications for single agent planning. AIJ 198: 52-71 (2013)
Brafman, Domshlak, Engel & Tennenholtz: Transferrable Utility Planning Games. AAAI 2010
Nissim, Brafman & Domshlak: A general, fully distributed multi- agent planning algorithm. AAMAS 2010
Brafman, Domshlak, Engel & Tennenholtz: Planning Games. IJCAI 2009