Learning automata …and generalisations!
Gabriele Puppis
Learning scenarios

• NNs learn complex functions, e.g. f: ℝ^k → {0,1} representing pictures of dogs, from simple data, e.g. x̄ ∈ ℝ^k.
• Automata learn simple functions, e.g. f: Σ* → {0,1} representing a regular language L ⊆ Σ*, from complex data, e.g. w ∈ Σ*.
Learning scenarios

• Passive learning: Learner is given data & label pairs.
• Active learning: Learner asks queries and receives answers; more efficient, but sometimes less practical.
A bit of history on automata learning

‘56 Moore: Gedanken-experiments on sequential machines
‘67 Gold: passive learning in the limit
‘87 Angluin: active learning with queries
’93… Pitt et al.: PAC-learning, cryptographic hardness
’95… Maler et al.: learning regular ω-languages
’96… Vilar et al.: learning word transformations
’00… Beimel et al.: learning weighted and multiplicity automata
’10… Lemay et al.: learning tree transformations
’12… Howar et al.: learning languages over infinite alphabets
’14… Maier et al.: learning timed languages
’15… Balle et al.: spectral techniques for learning
Learning as a game

1) Teacher has a secret regular language L0, e.g. represented by a DFA A0. Learner initially only knows the underlying alphabet Σ.
2) Learner chooses a query:
   a) either a membership query “Does w ∈ L0 ?”
   b) or an equivalence query “Is L0 = L(A) ?” (or “Are A0 and A equivalent?”)
3) Teacher answers accordingly:
   a) yes if w ∈ L0, no otherwise
   b) yes if L0 = L(A) (the game ends and Learner wins); otherwise gives a shortest counter-example w ∈ (L0 - L(A)) ∪ (L(A) - L0) (the game continues from 2)

Observations
• membership queries alone are not sufficient for Learner to win
• equivalence queries alone, instead, are sufficient to win (why? how quickly?)
Theorem [Angluin ’87]: Learner has a strategy to win in a number of rounds that is polynomial in |A0|, in the form of an algorithm (the L* algorithm).
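The game can be sketched in code. Below is a minimal, illustrative Teacher holding a secret DFA A0 (all names, such as `Teacher` and `delta`, are assumptions for this sketch, not from the lecture): it answers membership queries exactly, and approximates the equivalence oracle by a brute-force search for a shortest counter-example up to a bounded length.

```python
from itertools import product

class Teacher:
    """Holds a secret DFA A0 and answers the two kinds of queries."""

    def __init__(self, delta, init, finals, alphabet):
        self.delta = delta          # dict: (state, letter) -> state
        self.init = init
        self.finals = finals
        self.alphabet = alphabet

    def _run(self, word):
        q = self.init
        for a in word:
            q = self.delta[q, a]
        return q in self.finals

    def membership(self, word):
        # "Does w belong to L0?"
        return self._run(word)

    def equivalence(self, accepts, max_len=8):
        # Compare the hypothesis `accepts` with A0 on all words up to
        # length max_len; return a shortest counter-example, or None.
        # (A real equivalence oracle would decide this exactly.)
        for n in range(max_len + 1):
            for w in product(self.alphabet, repeat=n):
                w = "".join(w)
                if accepts(w) != self._run(w):
                    return w
        return None

# secret language L0 = words over {a, b} with an even number of a's
teacher = Teacher(
    delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    init=0, finals={0}, alphabet="ab",
)
print(teacher.membership("aba"))            # "aba" has two a's, so True
print(teacher.equivalence(lambda w: True))  # shortest disagreement: "a"
```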
MAT (Minimally Adequate Teacher). Sometimes answering an equivalence query is not practical: if Teacher knew A0, he could pass this information directly to Learner, so why bother querying? Equivalence queries can, however, be approximated by a series of membership queries: as long as membership in A matches membership in A0, Learner assumes he made the correct guess. This latter setting is often called black-box learning.
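The approximation just described can be sketched as random sampling of membership queries (a minimal sketch; the names `approx_equivalence`, `hypothesis` and `membership` are illustrative, and the sampling scheme is one possible choice, not the lecture's):

```python
import random

def approx_equivalence(hypothesis, membership, alphabet,
                       samples=2000, max_len=10, seed=0):
    """Approximate "Is L0 = L(A)?" by comparing the hypothesis with the
    membership oracle on randomly sampled words."""
    rng = random.Random(seed)
    for _ in range(samples):
        n = rng.randint(0, max_len)
        w = "".join(rng.choice(alphabet) for _ in range(n))
        if hypothesis(w) != membership(w):
            return w          # evidence that the guess is wrong
    return None               # no disagreement found: assume the guess is correct

# toy check: L0 = words containing "ab"; the hypothesis misses "ab" itself,
# and sampling almost surely uncovers that single disagreement
L0 = lambda w: "ab" in w
hyp = lambda w: "ab" in w and w != "ab"
print(approx_equivalence(hyp, L0, "ab"))
```

Unlike a genuine equivalence oracle, this test can wrongly answer "equivalent"; bounding that error probability is exactly the PAC-learning view mentioned in the history above.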
(Figure: a coffee machine reacting to a sequence of 5¢ and 10¢ coins before serving coffee, as an example of a black-box system to learn.)
Applications:
• Control theory: system identification, diagram inference
• Verification: model learning (in contrast to model checking)
• Language theory: grammar inference, regular extrapolation
• Security: protocol state fuzzing
Learning as a killer app of the Myhill-Nerode theorem

Myhill-Nerode equivalence: u ∼L0 v if ∀ t ∈ Σ*: u t ∈ L0 ⟺ v t ∈ L0

Properties:
• ∼L0 refines =L0 (the partition of Σ* into L0 and Σ* - L0)
• ∼L0 is right-invariant (i.e. u ∼L0 v → u a ∼L0 v a)
• ∼L0 has finite index iff L0 is regular

Relativization to a test set T: u ≈L0,T v if ∀ t ∈ T: u t ∈ L0 ⟺ v t ∈ L0
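The relativized equivalence is easy to compute: each word gets a "signature" of membership bits, one per test word. A minimal sketch, assuming as secret language the words over {a, b} with an even number of a's (the names `member` and `row` are illustrative):

```python
def member(w):
    # illustrative L0: words with an even number of a's
    return w.count("a") % 2 == 0

def row(u, T):
    # the signature of u w.r.t. the test set T:
    # u ≈_{L0,T} v  iff  row(u, T) == row(v, T)
    return tuple(member(u + t) for t in T)

T = [""]
for u in ["", "a", "b", "aa", "ab", "ba", "aba"]:
    print(u or "ε", row(u, T))
```

For this L0 the single test word ε already separates the two Myhill-Nerode classes; in general, a larger T gives a finer (but still finite-width) approximation of ∼L0.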
Learner constructs automata from pairs (S, T), where S ⊆ Σ* is T-minimal and T-complete:
• S ⊆ Σ* is T-minimal if ∀ s ≠ s’ ∈ S: s ≉L0,T s’
• S ⊆ Σ* is T-complete if ∀ s ∈ S, ∀ a ∈ Σ, ∃ sa ∈ S: sa ≈L0,T s a

Learner-strategy:

    S = T = {ε}                   // S is T-minimal, possibly not T-complete
    loop
        while S is not T-complete
            let s ∈ S and a ∈ Σ such that ∀ s’ ∈ S ∃ t ∈ T: Membership(s a t) ≠ Membership(s’ t)
            S = S ∪ {s a}         // this will loop at most index(∼L0) times
        A = DFA with state set S, transitions δ(s, a) = sa,
            initial state ε, final states s’ s.t. Membership(s’)
        if Equivalence(A)         // this surely happens when |S| = index(∼L0)
            return A
        else
            let w be the counter-example of the equivalence query
            T = T ∪ {suffixes of w}   // S becomes T-incomplete and will grow at the next iteration…

Here sa denotes the unique word in S such that sa ≈L0,T s a (S T-complete → sa exists; S T-minimal → sa is unique).

Why does every counter-example make progress? Proof by contradiction. Assume:
• w = a1 … an is the counter-example
• ti = ai+1 … an are the suffixes of w
• si is the state reached by A after reading the prefix a1 … ai of w
• S is (T ∪ {t0, …, tn})-complete
Verify by induction on i that w ∈ L0 iff si ti ∈ L0. Conclude that w ∈ L0 iff w ∈ L(A), contradicting the choice of w as a counter-example.
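The Learner-strategy above can be sketched end-to-end in Python. This is an illustrative implementation, not the lecture's code: the membership and equivalence oracles are simulated from a secret language (words containing "aa"), the T-completion step closes S under one-letter extensions until no new signature appears, and equivalence is checked by bounded brute force.

```python
from itertools import product

ALPHABET = "ab"

def member(w):                     # secret L0: words containing "aa"
    return "aa" in w

def equiv(accepts, max_len=8):
    # brute-force equivalence oracle: shortest disagreement, or None
    for n in range(max_len + 1):
        for t in product(ALPHABET, repeat=n):
            w = "".join(t)
            if accepts(w) != member(w):
                return w
    return None

def row(u, T):
    return tuple(member(u + t) for t in T)

def learn():
    S, T = [""], [""]
    while True:
        # make S T-complete: add one-letter extensions with new signatures
        changed = True
        while changed:
            changed = False
            rows = {row(s, T) for s in S}
            for s in list(S):
                for a in ALPHABET:
                    r = row(s + a, T)
                    if r not in rows:
                        S.append(s + a)     # grows at most index(∼L0) times
                        rows.add(r)
                        changed = True
        # build the candidate DFA: one state per signature
        rep = {row(s, T): s for s in S}     # a representative per class
        delta = {(s, a): rep[row(s + a, T)] for s in S for a in ALPHABET}
        final = {s for s in S if member(s)}
        def accepts(w, delta=delta, final=final):
            q = ""
            for a in w:
                q = delta[q, a]
            return q in final
        w = equiv(accepts)
        if w is None:
            return S, delta, final          # Learner wins
        for i in range(len(w) + 1):         # add all suffixes of w to T
            if w[i:] not in T:
                T.append(w[i:])

S, delta, final = learn()
print(len(S))   # → 3, the index of ∼L0 for this language
```

On this language the first candidate (a single rejecting state) is refuted by the counter-example "aa"; adding its suffixes to T splits S into the three Myhill-Nerode classes, and the second candidate is the minimal DFA.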
An alternative view - Hankel matrices

Myhill-Nerode equivalence: u ∼L0 v if ∀ t ∈ Σ*: u t ∈ L0 ⟺ v t ∈ L0
Relativization to a test set T: u ≈L0,T v if ∀ t ∈ T: u t ∈ L0 ⟺ v t ∈ L0

Hankel matrix of L0: H ∈ {0,1}^(Σ* × Σ*) with H(s, t) = Membership(s t) (infinite, highly redundant, but nicely structured)

           ε   a   b   aa  ab  …  aba  …
    ε      1   0   0   1   0   …  0    …
    a      0   1   0   0   0   …  0    …
    b      0   1   0   0   0   …  0    …
    aa     1   0   0   1   0   …  0    …
    ab     0   0   0   0   0   …  0    …
    …      …   …   …   …   …   …  …    …
    aba    0   0   0   0   0   …  0    …
    …      …   …   …   …   …   …  …    …

In this view: a row = a ∼L0-class, a column = a test word t, a submatrix = a pair (S, T).
Running the Learner-strategy on the Hankel matrix:

• S = {ε}, T = {ε}
• expand S to make it T-complete… S = {ε, a}, T = {ε}
• build candidate automaton (two states, ε and a)…
• expand T by the counter-example w = aba: S = {ε, a}, T = {ε, a, ba, aba}
• expand S to make it T-complete… S = {ε, a, ab}, T = {ε, a, ba, aba}
• build candidate automaton (three states, ε, a and ab)…
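In this view, the Learner only ever materialises a finite S × T submatrix of H. A minimal sketch of that observation table, again over an illustrative secret language (words with an even number of a's; the names `submatrix` and `is_T_complete` are assumptions of this sketch):

```python
def member(w):
    # illustrative L0: words with an even number of a's
    return w.count("a") % 2 == 0

def submatrix(S, T):
    # the S x T block of the Hankel matrix: H(s, t) = Membership(s t)
    return {s: [int(member(s + t)) for t in T] for s in S}

def is_T_complete(S, T, alphabet="ab"):
    # every one-letter extension of a row of S must repeat a row of S
    rows = {tuple(r) for r in submatrix(S, T).values()}
    return all(tuple(submatrix([s + a], T)[s + a]) in rows
               for s in S for a in alphabet)

S, T = ["", "a"], ["", "a"]
for s, r in submatrix(S, T).items():
    print(repr(s), r)
print(is_T_complete(S, T))
```

Growing S adds rows, growing T adds columns; the strategy stops when the submatrix has as many distinct rows as the full (infinite) matrix.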
From word languages to word functions

Automata can be used to represent not only languages, but also functions on words f: Σ* → Γ*, e.g. f(abaab) = abcaacb.

There are many variants of automata with outputs. The simplest one is the sequential transducer, i.e. A = (Σ, Γ, Q, q0, δ) with δ: Q × Σ → Q × Γ* (e.g. δ(q, a) = (q’, ac)).

(Figure: a sequential transducer with transitions labelled a/a, b/b, b/bc, a/ac, …)

Note: sequential transducers can only compute total, monotone functions ( f is monotone if whenever w is a prefix of w’, then f(w) is a prefix of f(w’) ).
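Evaluating a sequential transducer is a one-pass fold over the input: each transition consumes one input letter and emits an output word. A minimal sketch with an illustrative transducer (not the one in the figure):

```python
def run_transducer(delta, q0, word):
    """delta maps (state, input letter) to (next state, output word)."""
    q, out = q0, ""
    for a in word:
        q, o = delta[(q, a)]
        out += o
    return out

# illustrative one-state transducer: copies a's, appends 'c' after each 'b'
delta = {
    (0, "a"): (0, "a"),
    (0, "b"): (0, "bc"),
}
print(run_transducer(delta, 0, "abb"))   # → abcbc
```

Monotonicity is visible in the evaluation itself: the output for a prefix of the input is a prefix of the output for the whole input, since letters only ever append to `out`.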
Learning monotone word functions

1) Teacher has a secret function f0: Σ* → Γ*, computed by a sequential transducer A0. Learner initially only knows the input alphabet Σ.
2) Learner chooses a query:
   a) either an evaluation query “What is the value of f0(w) ?”
   b) or an equivalence query “Is f0 computed by the sequential transducer A ?”
3) Teacher answers accordingly:
   a) gives the value of f0(w)
   b) yes if f0 is computed by A (this requires an algorithm for testing equivalence); otherwise gives a shortest counter-example w such that f0(w) ≠ A(w)