DISI, Università di Trento, Italy – roberto.sebastiani@unitn.it http://disi.unitn.it/rseba/DIDATTICA/fai_2020/
Teaching assistant: Mauro Dragoni – dragoni@fbk.eu http://www.maurodragoni.com/teaching/fai/
M.S. Course “Artificial Intelligence Systems”, academic year 2020-2021
Last update: Friday 18th December, 2020, 16:39
Copyright notice: Most examples and images displayed in the slides of this course are taken from [Russell & Norvig, “Artificial Intelligence, a Modern Approach”, 3rd ed., Pearson], explicitly including figures from the above-mentioned book, whose copyright is retained by the authors.
Some other material (text, figures, examples) is authored by (in alphabetical order): Pieter Abbeel, Bonnie J. Dorr, Anca Dragan, Dan Klein, Nikita Kitaev, Tom Lenaerts, Michela Milano, Dana Nau, Maria Simi, who retain its copyright.
These slides cannot be displayed in public without the permission of the author.
1. Bayesian Networks
2. Constructing Bayesian Networks
3. Exact Inference with Bayesian Networks
Bayesian Networks
Bayesian Networks (aka Belief Networks):
- allow for a compact specification of full joint distributions
- represent explicit conditional dependencies among variables: an arc from X to Y means that X has a direct influence on Y
Syntax: a directed acyclic graph (DAG):
- each node represents a random variable (discrete or continuous)
- directed arcs connect pairs of nodes: X → Y (X is a parent of Y)
- a conditional distribution P(X_i | Parents(X_i)) for each node X_i
- the conditional distribution is represented as a conditional probability table (CPT): a distribution over X_i for each combination of parent values
Topology encodes conditional independence assertions:
- Toothache and Catch are conditionally independent given Cavity
- Toothache and Catch depend on Cavity
- Weather is independent from the others
- No arc ⇐⇒ independence
(© S. Russell & P. Norvig, AIMA)
Example (from Judea Pearl, UCLA)
“The burglary alarm very likely goes off on a burglary and occasionally on an earthquake. John and Mary are neighbors who agreed to call when the alarm goes off. Their reliability is different ...”
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects “causal” knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call
CPTs:
- the alarm is set off by a burglary in 94% of cases
- the alarm is set off by an earthquake in 29% of cases
- a false alarm goes off in 0.1% of cases
A CPT for a Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values; each row requires one number p for P(X_i = true) (since P(X_i = false) = 1 − P(X_i = true)).
⟹ If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, whereas a full joint distribution requires 2^n − 1 numbers: linear vs. exponential!
Ex: for the burglary example: 1 + 1 + 4 + 2 + 2 = 10 numbers vs. 2^5 − 1 = 31
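The counting argument above is easy to check in code. The sketch below hard-codes the number of parents of each node of the burglary network; the variable names are illustrative:

```python
# Independent numbers needed by a Bayesian network over Boolean variables:
# a node with k parents contributes 2^k CPT rows (one number per row).
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
           "JohnCalls": 1, "MaryCalls": 1}

bn_numbers = sum(2 ** k for k in parents.values())   # 1 + 1 + 4 + 2 + 2
joint_numbers = 2 ** len(parents) - 1                # full joint table
print(bn_numbers, joint_numbers)  # 10 31
```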
Global semantics: the full joint distribution is defined as the product of the local conditional distributions:
P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i))
if X_i has no parent, then its prior probability P(X_i)
Intuition: order X_1, ..., X_n s.t. parents(X_i) ≺ X_i for each i:
P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i−1})   // chain rule
                 = ∏_{i=1}^{n} P(X_i | parents(X_i))        // conditional independence
⟹ A Bayesian network is a distributed representation of the full joint distribution
Global Semantics: Example
P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i))
Ex: “Prob. that both John and Mary call, the alarm goes off, but there is neither a burglary nor an earthquake”
P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j | m∧a∧¬b∧¬e) P(m | a∧¬b∧¬e) P(a | ¬b∧¬e) P(¬b | ¬e) P(¬e)
= P(j|a) P(m|a) P(a|¬b∧¬e) P(¬b) P(¬e)
= 0.9 · 0.7 · 0.001 · 0.999 · 0.998 ≈ 0.00063
(© S. Russell & P. Norvig, AIMA)
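The product above can be reproduced programmatically. The sketch below hard-codes the burglary network's CPT values (those of the AIMA example) and multiplies the five local probabilities; the function name is illustrative:

```python
# P(b, e, a, j, m) as the product of the local conditional probabilities
# of the burglary network (CPT values from the AIMA example).
def joint(b, e, a, j, m):
    p_b = 0.001 if b else 0.999                       # P(Burglary)
    p_e = 0.002 if e else 0.998                       # P(Earthquake)
    a_true = {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001}[(b, e)]
    p_a = a_true if a else 1.0 - a_true               # P(Alarm | B, E)
    p_j = (0.90 if a else 0.05) if j else (0.10 if a else 0.95)  # P(J | A)
    p_m = (0.70 if a else 0.01) if m else (0.30 if a else 0.99)  # P(M | A)
    return p_b * p_e * p_a * p_j * p_m

# "John and Mary call, the alarm goes off, no burglary, no earthquake":
print(round(joint(b=False, e=False, a=True, j=True, m=True), 6))  # 0.000628
```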
Compute:
- The probability that John calls and Mary does not, and the alarm is not set off, with a burglar entering during an earthquake
- The probability that John calls and Mary does not, given a burglar entering the house
- The probability of an earthquake given the fact that John has called
- ...
Local semantics: each node is conditionally independent of its nondescendants given its parents
Theorem: Local semantics holds iff global semantics holds:
P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i))
(© S. Russell & P. Norvig, AIMA)
Markov blanket: each node X is conditionally independent of all other nodes given its Markov blanket, i.e. its parents U_1, .., U_m, its children Y_1, .., Y_n, and its children's other parents Z_1j, ..., Z_nj:
P(X | U_1, .., U_m, Y_1, .., Y_n, Z_1j, ..., Z_nj), for each X
(© S. Russell & P. Norvig, AIMA)
Verify numerically the two previous examples:
- Local Semantics
- Markov Blanket
Constructing Bayesian Networks
Given a set of random variables:
1. Choose an ordering {X_1, ..., X_n}
- in principle, any ordering will work (but some may cause blowups)
- general rule: follow causality, X ≺ Y if X ∈ causes(Y)
2. For i = 1 to n do:
2.1 add X_i to the network
2.2 as Parents(X_i), choose a subset of {X_1, ..., X_{i−1}} s.t. P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i−1})
⟹ Guarantees the global semantics by construction: P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i))
Ex: 1 + 2 + 4 + 2 + 4 = 13 numbers needed (rather than 10)
Can be much worse: ex, try {M, J, E, B, A} (see AIMA)
Much better with causal orderings, ex: try either of
{B, E, A, J, M}, {E, B, A, J, M}, {B, E, A, M, J}, {E, B, A, M, J},
i.e. {B, E} ≺ A ≺ {J, M} (both B and E cause A, and A causes both M and J)
(© S. Russell & P. Norvig, AIMA)
Noisy-OR: add an independent failure probability q_i for each cause U_i
⟹ P(¬X | U_1 ... U_j, ¬U_{j+1} ... ¬U_k) = ∏_{i=1}^{j} q_i
⟹ the number of parameters is linear in the number of parents!
Ex: q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1:
(© S. Russell & P. Norvig, AIMA)
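The noisy-OR rule above can be sketched in a few lines, using the slide's q values; `p_fever` is an illustrative name:

```python
# Noisy-OR: P(¬Fever | active causes) = product of the q_i of the active causes,
# so P(Fever | active causes) = 1 - that product (q values from the slide).
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active_causes):
    p_no_fever = 1.0
    for cause in active_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

print(p_fever([]))                                    # 0.0 (no spontaneous fever)
print(round(p_fever(["Cold", "Flu"]), 3))             # 0.88  = 1 - 0.6*0.2
print(round(p_fever(["Cold", "Flu", "Malaria"]), 3))  # 0.988 = 1 - 0.6*0.2*0.1
```

The 2^3-row CPT for Fever is thus determined by just the 3 parameters q_i.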
1. Consider the probabilistic Wumpus World of the previous chapter:
(a) Describe it as a Bayesian network
Exact Inference with Bayesian Networks
E/e: the set of evidence variables {E_1, ..., E_m} and of evidence values {e_1, ..., e_m}
Y/y: the set of unknown variables (aka hidden variables) {Y_1, ..., Y_l} and of unknown values {y_1, ..., y_l}
⟹ X = {X} ∪ E ∪ Y, X being the query variable
A typical query asks for the posterior probability distribution:
P(X | E=e) (also written P(X | e))
Ex: P(Burglary | JohnCalls = true, MaryCalls = true)
- query: Burglary
- evidence variables: E = {JohnCalls, MaryCalls}
- hidden variables: Y = {Earthquake, Alarm}
Inference by Enumeration
We defined a procedure for the task as:
P(X | e) = α P(X, e) = α ∑_y P(X, e, y)
⟹ P(X, e, y) can be rewritten as a product of prior and conditional probabilities according to the Bayesian network; then apply factorization and simplify algebraically when possible
Ex:
P(B | j, m) = α ∑_e ∑_a P(B, e, a, j, m)
            = α ∑_e ∑_a P(B) P(e) P(a|B, e) P(j|a) P(m|a)
            = α P(B) ∑_e P(e) ∑_a P(a|B, e) P(j|a) P(m|a)
⟹ P(b | j, m) = α P(b) ∑_e P(e) ∑_a P(a|b, e) P(j|a) P(m|a)
Recursive depth-first enumeration: O(n) space, O(2^n) time with n propositional variables
Enumeration is inefficient: repeated computation
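The derivation above can be implemented with two nested sums over the hidden variables. The sketch below hard-codes the burglary network's CPTs (AIMA values) and reproduces AIMA's posterior P(b | j, m) ≈ 0.284; names are illustrative:

```python
# Enumeration for P(B | j, m) = alpha P(B) sum_e P(e) sum_a P(a|B,e) P(j|a) P(m|a)
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                       # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(MaryCalls=true | A)

def unnormalized(b):
    total = 0.0
    for e in (True, False):
        inner = 0.0
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            inner += p_a * P_J[a] * P_M[a]            # j and m observed true
        total += P_E[e] * inner
    return P_B[b] * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())                    # normalization constant
print(round(alpha * scores[True], 3))  # 0.284
```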
(© S. Russell & P. Norvig, AIMA)
Repeated computation: P(j|a) P(m|a) and P(j|¬a) P(m|¬a) are recomputed for each value of e
1. Consider the probabilistic Wumpus World of the previous chapter:
(a) Describe it as a Bayesian network
(b) Compute the query P(P_{1,3} | b*, p*) via enumeration
(c) Compare the result with that of the example in Ch. 13
Inference by Variable Elimination
Variable elimination:
- carry out summations right-to-left (i.e., bottom-up in the tree)
- store intermediate results (factors) to avoid recomputation
Ex: P(B | j, m)
= α P(B) ∑_e P(e) ∑_a P(a|B, e) P(j|a) P(m|a)     [factors: B, E, A, J, M]
= α P(B) ∑_e P(e) ∑_a P(a|B, e) P(j|a) × f_M(A)
= α P(B) ∑_e P(e) ∑_a P(a|B, e) × f_J(A) × f_M(A)
= α P(B) ∑_e P(e) ∑_a f_A(A, B, E) × f_J(A) × f_M(A)
= α P(B) ∑_e P(e) × f_AJM(B, E)     (sum out A)
= α P(B) ∑_e f_E(E) × f_AJM(B, E)
= α P(B) × f_EAJM(B)                (sum out E)
= α × f_B(B) × f_EAJM(B)
where “×” is the pointwise product (see later),
f_M(A) ≝ ⟨P(m|a), P(m|¬a)⟩, f_J(A) ≝ ⟨P(j|a), P(j|¬a)⟩, ...,
and f_A...(·) denotes a summation over the values of A, ...
(© S. Russell & P. Norvig, AIMA)
Sum of factors (element-wise, over factors with the same argument variables):
⟨a_1, ..., a_n⟩ + ⟨b_1, ..., b_n⟩ = ⟨a_1 + b_1, ..., a_n + b_n⟩
Pointwise product: multiply the array elements for the same variable values
Ex: f_J(A) × f_M(A) = ⟨P(j|a), P(j|¬a)⟩ × ⟨P(m|a), P(m|¬a)⟩ = ⟨P(j|a) P(m|a), P(j|¬a) P(m|¬a)⟩
General case:
f_3(X_1, ..., X_j, Y_1, ..., Y_k, Z_1, ..., Z_l) = f_1(X_1, ..., X_j, Y_1, ..., Y_k) × f_2(Y_1, ..., Y_k, Z_1, ..., Z_l)
- arguments: the union of the arguments of f_1 and f_2
- values: f_3(x, y, z) = f_1(x, y) · f_2(y, z)
- matrix sizes: f_1: 2^{j+k}, f_2: 2^{k+l}, f_3: 2^{j+k+l}
Summing out a variable, e.g. f(B, C) = ∑_a f_3(A, B, C) = f_3(a, B, C) + f_3(¬a, B, C):
⟨0.06, 0.24, 0.42, 0.28⟩ + ⟨0.18, 0.72, 0.06, 0.04⟩ = ⟨0.24, 0.96, 0.48, 0.32⟩
(© S. Russell & P. Norvig, AIMA)
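Summing out a variable from such a dict-based factor just accumulates the entries that agree on the remaining variables; a sketch using the slide's numbers (key order is illustrative):

```python
# f3(A, B, C) with the slide's values; keys are (A, B, C) value tuples.
f3 = {
    (True,  True,  True): 0.06, (True,  True,  False): 0.24,
    (True,  False, True): 0.42, (True,  False, False): 0.28,
    (False, True,  True): 0.18, (False, True,  False): 0.72,
    (False, False, True): 0.06, (False, False, False): 0.04,
}

# Sum out A: f(B, C) = f3(a, B, C) + f3(¬a, B, C)
f_BC = {}
for (a, b, c), v in f3.items():
    f_BC[(b, c)] = f_BC.get((b, c), 0.0) + v

print([round(f_BC[k], 2) for k in [(True, True), (True, False),
                                   (False, True), (False, False)]])
# [0.24, 0.96, 0.48, 0.32] -- matches the slide
```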
(© S. Russell & P. Norvig, AIMA)
Efficiency depends on the variable ordering ORDER(...)
Efficiency improvements:
- factor out of the summations the factors that do not depend on the sum variable
- remove irrelevant variables
Factor Out Constant Factors
If f_1, ..., f_i do not depend on X, then move them out of ∑_x (...):
∑_x f_1 × ··· × f_k = f_1 × ··· × f_i × ∑_x (f_{i+1} × ··· × f_k) = f_1 × ··· × f_i × f_X
Ex: ∑_a f_1(A, B) × f_2(B, C) = f_2(B, C) × ∑_a f_1(A, B)
Ex: P(JohnCalls | Burglary = true):
P(J|b) = α ∑_e ∑_a ∑_m P(J, m, b, e, a)
       = α ∑_e ∑_a ∑_m P(b) P(e) P(a|b, e) P(J|a) P(m|a)
       = α P(b) ∑_e P(e) ∑_a P(a|b, e) P(J|a) ∑_m P(m|a)
Remove Irrelevant Variables
Sometimes we have summations like ∑_y P(y | ...): since ∑_y P(y | ...) = 1, they can be dropped
Ex: P(JohnCalls | Burglary = true):
P(J|b) = ... = α P(b) ∑_e P(e) ∑_a P(a|b, e) P(J|a) ∑_m P(m|a)    [∑_m P(m|a) = 1]
       = α P(b) ∑_e P(e) ∑_a P(a|b, e) P(J|a)
Theorem: For query X and evidence E, Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)
Ex: X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}
⟹ MaryCalls is irrelevant
Related to backward-chaining with Horn clauses
(© S. Russell & P. Norvig, AIMA)
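The claim that ∑_m P(m|a) = 1 makes MaryCalls drop out can be checked numerically. The sketch below computes the unnormalized sum for P(J = true | b) with and without summing over M, using the burglary network's CPT values (names are illustrative):

```python
# Unnormalized P(J=true | b), with and without summing over MaryCalls.
P_E = {True: 0.002, False: 0.998}
P_A = {True: 0.95, False: 0.94}     # P(Alarm=true | b, E=e), Burglary observed true
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls=true | A)
P_M = {True: 0.70, False: 0.01}     # P(MaryCalls=true | A)

def score(include_m):
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[e] if a else 1.0 - P_A[e]
            term = P_E[e] * p_a * P_J[a]
            if include_m:   # sum_m P(m|a) = P(m|a) + P(¬m|a) = 1
                term *= sum(P_M[a] if m else 1.0 - P_M[a] for m in (True, False))
            total += term
    return total

print(abs(score(True) - score(False)) < 1e-12)  # True: MaryCalls is irrelevant
```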