Learning automata …and generalisations!
Gabriele Puppis
Learning scenarios

• NNs learn complex functions, e.g. f: ℝ^k → {0,1} representing pictures of dogs, from simple data, e.g. x̄ ∈ ℝ^k.
• Automata learn simple functions, e.g. f: Σ* → {0,1} representing a regular language L ⊆ Σ*, from complex data, e.g. w ∈ Σ*.
Learning scenarios

• Passive learning: Learner is given data & label pairs.
• Active learning: Learner asks queries and receives answers; more efficient, but sometimes less practical.
A bit of history on automata learning

‘56 Moore: Gedanken-experiments on sequential machines
‘67 Gold: passive learning in the limit
‘87 Angluin: active learning with queries
’93… Pitt et al.: PAC-learning, cryptographic hardness
’95… Maler et al.: learning regular ω-languages
’96… Vilar et al.: learning word transformations
’00… Beimel et al.: learning weighted and multiplicity automata
’10… Lemay et al.: learning tree transformations
’12… Howar et al.: learning languages over infinite alphabets
’14… Maier et al.: learning timed languages
’15… Balle et al.: spectral techniques for learning
Learning as a game

1) Teacher has a secret regular language L0, e.g. represented by a DFA A0. Learner initially only knows the underlying alphabet Σ.
2) Learner chooses a query:
   a) either a membership query “Does w ∈ L0 ?”
   b) or an equivalence query “Is L0 = L(A) ?” (or “Are A0 and A equivalent?”)
3) Teacher answers accordingly:
   a) yes if w ∈ L0, no otherwise
   b) yes if L0 = L(A) (the game ends and Learner wins); otherwise gives a shortest counter-example w ∈ (L0 - L(A)) ∪ (L(A) - L0) (the game continues from 2)

Observations
• membership queries alone are not sufficient for Learner to win
• equivalence queries alone, instead, are sufficient to win (why? how quickly?)
Theorem [Angluin ’87]: Learner has a strategy to win in a number of rounds that is polynomial in |A0|, in the form of an algorithm (the L* algorithm).
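The game can be sketched in code. Below is a minimal, illustrative Teacher holding a secret DFA A0 (all names, such as `Teacher` and `delta`, are assumptions for this sketch, not from the lecture): it answers membership queries exactly, and approximates the equivalence oracle by a brute-force search for a shortest counter-example up to a bounded length.

```python
from itertools import product

class Teacher:
    """Holds a secret DFA A0 and answers the two kinds of queries."""

    def __init__(self, delta, init, finals, alphabet):
        self.delta = delta          # dict: (state, letter) -> state
        self.init = init
        self.finals = finals
        self.alphabet = alphabet

    def _run(self, word):
        q = self.init
        for a in word:
            q = self.delta[q, a]
        return q in self.finals

    def membership(self, word):
        # "Does w belong to L0?"
        return self._run(word)

    def equivalence(self, accepts, max_len=8):
        # Compare the hypothesis `accepts` with A0 on all words up to
        # length max_len; return a shortest counter-example, or None.
        # (A real equivalence oracle would decide this exactly.)
        for n in range(max_len + 1):
            for w in product(self.alphabet, repeat=n):
                w = "".join(w)
                if accepts(w) != self._run(w):
                    return w
        return None

# secret language L0 = words over {a, b} with an even number of a's
teacher = Teacher(
    delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    init=0, finals={0}, alphabet="ab",
)
print(teacher.membership("aba"))            # "aba" has two a's, so True
print(teacher.equivalence(lambda w: True))  # shortest disagreement: "a"
```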
MAT (Minimally Adequate Teacher). Sometimes answering an equivalence query is not practical: if Teacher knew A0, he could pass this information directly to Learner, so why bother querying? Equivalence queries can, however, be approximated by a series of membership queries: as long as membership in A matches membership in A0, Learner assumes he made the correct guess. This latter setting is often called black-box learning.
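The approximation just described can be sketched as random sampling of membership queries (a minimal sketch; the names `approx_equivalence`, `hypothesis` and `membership` are illustrative, and the sampling scheme is one possible choice, not the lecture's):

```python
import random

def approx_equivalence(hypothesis, membership, alphabet,
                       samples=2000, max_len=10, seed=0):
    """Approximate "Is L0 = L(A)?" by comparing the hypothesis with the
    membership oracle on randomly sampled words."""
    rng = random.Random(seed)
    for _ in range(samples):
        n = rng.randint(0, max_len)
        w = "".join(rng.choice(alphabet) for _ in range(n))
        if hypothesis(w) != membership(w):
            return w          # evidence that the guess is wrong
    return None               # no disagreement found: assume the guess is correct

# toy check: L0 = words containing "ab"; the hypothesis misses "ab" itself,
# and sampling almost surely uncovers that single disagreement
L0 = lambda w: "ab" in w
hyp = lambda w: "ab" in w and w != "ab"
print(approx_equivalence(hyp, L0, "ab"))
```

Unlike a genuine equivalence oracle, this test can wrongly answer "equivalent"; bounding that error probability is exactly the PAC-learning view mentioned in the history above.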
(Figure: a coffee machine reacting to a sequence of 5¢ and 10¢ coins before serving coffee, as an example of a black-box system to learn.)
Applications:
• Control theory: system identification, diagram inference
• Verification: model learning (in contrast to model checking)
• Language theory: grammar inference, regular extrapolation
• Security: protocol state fuzzing
Learning as a killer app of the Myhill-Nerode theorem

Myhill-Nerode equivalence: u ∼L0 v if ∀ t ∈ Σ*: u t ∈ L0 ⟺ v t ∈ L0

Properties:
• ∼L0 refines =L0 (the partition of Σ* into L0 and Σ* - L0)
• ∼L0 is right-invariant (i.e. u ∼L0 v → u a ∼L0 v a)
• ∼L0 has finite index iff L0 is regular

Relativization to a test set T: u ≈L0,T v if ∀ t ∈ T: u t ∈ L0 ⟺ v t ∈ L0
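The relativized equivalence is easy to compute: each word gets a "signature" of membership bits, one per test word. A minimal sketch, assuming as secret language the words over {a, b} with an even number of a's (the names `member` and `row` are illustrative):

```python
def member(w):
    # illustrative L0: words with an even number of a's
    return w.count("a") % 2 == 0

def row(u, T):
    # the signature of u w.r.t. the test set T:
    # u ≈_{L0,T} v  iff  row(u, T) == row(v, T)
    return tuple(member(u + t) for t in T)

T = [""]
for u in ["", "a", "b", "aa", "ab", "ba", "aba"]:
    print(u or "ε", row(u, T))
```

For this L0 the single test word ε already separates the two Myhill-Nerode classes; in general, a larger T gives a finer (but still finite-width) approximation of ∼L0.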
Learner constructs automata from pairs (S, T), where S ⊆ Σ* is T-minimal and T-complete:
• S ⊆ Σ* is T-minimal if ∀ s ≠ s’ ∈ S: s ≉L0,T s’
• S ⊆ Σ* is T-complete if ∀ s ∈ S, ∀ a ∈ Σ, ∃ sa ∈ S: sa ≈L0,T s a

Learner-strategy:

    S = T = {ε}                   // S is T-minimal, possibly not T-complete
    loop
        while S is not T-complete
            let s ∈ S and a ∈ Σ such that ∀ s’ ∈ S ∃ t ∈ T: Membership(s a t) ≠ Membership(s’ t)
            S = S ∪ {s a}         // this will loop at most index(∼L0) times
        A = DFA with state set S, transitions δ(s, a) = sa,
            initial state ε, final states s’ s.t. Membership(s’)
        if Equivalence(A)         // this surely happens when |S| = index(∼L0)
            return A
        else
            let w be the counter-example of the equivalence query
            T = T ∪ {suffixes of w}   // S becomes T-incomplete and will grow at the next iteration…

Here sa denotes the unique word in S such that sa ≈L0,T s a (S T-complete → sa exists; S T-minimal → sa is unique).

Why does every counter-example make progress? Proof by contradiction. Assume:
• w = a1 … an is the counter-example
• ti = ai+1 … an are the suffixes of w
• si is the state reached by A after reading the prefix a1 … ai of w
• S is (T ∪ {t0, …, tn})-complete
Verify by induction on i that w ∈ L0 iff si ti ∈ L0. Conclude that w ∈ L0 iff w ∈ L(A), contradicting the choice of w as a counter-example.
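The Learner-strategy above can be sketched end-to-end in Python. This is an illustrative implementation, not the lecture's code: the membership and equivalence oracles are simulated from a secret language (words containing "aa"), the T-completion step closes S under one-letter extensions until no new signature appears, and equivalence is checked by bounded brute force.

```python
from itertools import product

ALPHABET = "ab"

def member(w):                     # secret L0: words containing "aa"
    return "aa" in w

def equiv(accepts, max_len=8):
    # brute-force equivalence oracle: shortest disagreement, or None
    for n in range(max_len + 1):
        for t in product(ALPHABET, repeat=n):
            w = "".join(t)
            if accepts(w) != member(w):
                return w
    return None

def row(u, T):
    return tuple(member(u + t) for t in T)

def learn():
    S, T = [""], [""]
    while True:
        # make S T-complete: add one-letter extensions with new signatures
        changed = True
        while changed:
            changed = False
            rows = {row(s, T) for s in S}
            for s in list(S):
                for a in ALPHABET:
                    r = row(s + a, T)
                    if r not in rows:
                        S.append(s + a)     # grows at most index(∼L0) times
                        rows.add(r)
                        changed = True
        # build the candidate DFA: one state per signature
        rep = {row(s, T): s for s in S}     # a representative per class
        delta = {(s, a): rep[row(s + a, T)] for s in S for a in ALPHABET}
        final = {s for s in S if member(s)}
        def accepts(w, delta=delta, final=final):
            q = ""
            for a in w:
                q = delta[q, a]
            return q in final
        w = equiv(accepts)
        if w is None:
            return S, delta, final          # Learner wins
        for i in range(len(w) + 1):         # add all suffixes of w to T
            if w[i:] not in T:
                T.append(w[i:])

S, delta, final = learn()
print(len(S))   # → 3, the index of ∼L0 for this language
```

On this language the first candidate (a single rejecting state) is refuted by the counter-example "aa"; adding its suffixes to T splits S into the three Myhill-Nerode classes, and the second candidate is the minimal DFA.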
An alternative view - Hankel matrices

Myhill-Nerode equivalence: u ∼L0 v if ∀ t ∈ Σ*: u t ∈ L0 ⟺ v t ∈ L0
Relativization to a test set T: u ≈L0,T v if ∀ t ∈ T: u t ∈ L0 ⟺ v t ∈ L0

Hankel matrix of L0: H ∈ {0,1}^(Σ* × Σ*) with H(s, t) = Membership(s t) (infinite, highly redundant, but nicely structured)

           ε   a   b   aa  ab  …  aba  …
    ε      1   0   0   1   0   …  0    …
    a      0   1   0   0   0   …  0    …
    b      0   1   0   0   0   …  0    …
    aa     1   0   0   1   0   …  0    …
    ab     0   0   0   0   0   …  0    …
    …      …   …   …   …   …   …  …    …
    aba    0   0   0   0   0   …  0    …
    …      …   …   …   …   …   …  …    …

In this view: a row = a ∼L0-class, a column = a test word t, a submatrix = a pair (S, T).
Running the Learner-strategy on the Hankel matrix:

• S = {ε}, T = {ε}
• expand S to make it T-complete… S = {ε, a}, T = {ε}
• build candidate automaton (two states, ε and a)…
• expand T by the counter-example w = aba: S = {ε, a}, T = {ε, a, ba, aba}
• expand S to make it T-complete… S = {ε, a, ab}, T = {ε, a, ba, aba}
• build candidate automaton (three states, ε, a and ab)…
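In this view, the Learner only ever materialises a finite S × T submatrix of H. A minimal sketch of that observation table, again over an illustrative secret language (words with an even number of a's; the names `submatrix` and `is_T_complete` are assumptions of this sketch):

```python
def member(w):
    # illustrative L0: words with an even number of a's
    return w.count("a") % 2 == 0

def submatrix(S, T):
    # the S x T block of the Hankel matrix: H(s, t) = Membership(s t)
    return {s: [int(member(s + t)) for t in T] for s in S}

def is_T_complete(S, T, alphabet="ab"):
    # every one-letter extension of a row of S must repeat a row of S
    rows = {tuple(r) for r in submatrix(S, T).values()}
    return all(tuple(submatrix([s + a], T)[s + a]) in rows
               for s in S for a in alphabet)

S, T = ["", "a"], ["", "a"]
for s, r in submatrix(S, T).items():
    print(repr(s), r)
print(is_T_complete(S, T))
```

Growing S adds rows, growing T adds columns; the strategy stops when the submatrix has as many distinct rows as the full (infinite) matrix.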
From word languages to word functions

Automata can be used to represent not only languages, but also functions on words f: Σ* → Γ*, e.g. f(abaab) = abcaacb.

There are many variants of automata with outputs. The simplest one is the sequential transducer, i.e. A = (Σ, Γ, Q, q0, δ) with δ: Q × Σ → Q × Γ* (e.g. δ(q, a) = (q’, ac)).

(Figure: a sequential transducer with transitions labelled a/a, b/b, b/bc, a/ac, …)

Note: sequential transducers can only compute total, monotone functions ( f is monotone if whenever w is a prefix of w’, then f(w) is a prefix of f(w’) ).
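Evaluating a sequential transducer is a one-pass fold over the input: each transition consumes one input letter and emits an output word. A minimal sketch with an illustrative transducer (not the one in the figure):

```python
def run_transducer(delta, q0, word):
    """delta maps (state, input letter) to (next state, output word)."""
    q, out = q0, ""
    for a in word:
        q, o = delta[(q, a)]
        out += o
    return out

# illustrative one-state transducer: copies a's, appends 'c' after each 'b'
delta = {
    (0, "a"): (0, "a"),
    (0, "b"): (0, "bc"),
}
print(run_transducer(delta, 0, "abb"))   # → abcbc
```

Monotonicity is visible in the evaluation itself: the output for a prefix of the input is a prefix of the output for the whole input, since letters only ever append to `out`.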
Learning monotone word functions

1) Teacher has a secret function f0: Σ* → Γ*, computed by a sequential transducer A0. Learner initially only knows the input alphabet Σ.
2) Learner chooses a query:
   a) either an evaluation query “What is the value of f0(w) ?”
   b) or an equivalence query “Is f0 computed by the sequential transducer A ?”
3) Teacher answers accordingly:
   a) gives the value of f0(w)
   b) yes if f0 is computed by A (this requires an algorithm for testing equivalence); otherwise gives a shortest counter-example w such that f0(w) ≠ A(w)