Multistrategy Reasoning
Inferential Theory of Learning – Induction: Inductive Logic Programming
Laurea Magistrale in Computer Science
Course in ARTIFICIAL INTELLIGENCE
Stefano Ferilli
These slides were prepared for teaching purposes. They contain original material owned by the Università degli Studi di Bari and/or figures owned by other authors, companies and organizations, whose references are reported. All or part of the material may be photocopied for personal or teaching use, but may not be distributed for commercial use. Any other use requires specific authorization from the Università degli Studi di Bari and the other authors involved.
Inferential Theory of Learning
● ITL (Michalski)
● Characterize any learning process as a goal-driven search in a knowledge space
– Defined by the knowledge representation language and the available search operators
– Operators are instantiations of knowledge transmutations that a learner can perform
– Each transmutation takes some input information and/or background knowledge,
and changes various aspects of knowledge
● Some generate new knowledge
● Some only manipulate knowledge
– Transmutations can exploit any type of inference
Inferential Theory of Learning
● Learning process = sequence of knowledge transmutations that transform the initial
learner's knowledge to knowledge satisfying the learner's goal
● Generally, a system of goals
● Learning =
Inferencing + Memorizing
Inferential Theory of Learning
● Any learning process characterized as a transformation:
● Given
– Input Knowledge (I)
– Goal (G)
– Background Knowledge (BK)
– Transmutations (T)
● Determine
– Output knowledge (O), that satisfies goal G,
by applying transmutations from the set T to input I and the background knowledge BK
Types of Inference
● P ∪ B ⊨ C
● P: premise
● B: background knowledge
● C: consequence
Multistrategy Reasoning
● No focus on learning as the main goal
● Multistrategy Learning (MSL) [Michalski]
-> induction as the central strategy
● Strategies
● Deduction
● Induction
● Abduction
● Abstraction
● Argumentation
● Probability
● Analogy
● Similarity
Multistrategy Reasoning
● Cooperation of strategies
● Learning
– Induction
● Knowledge expansion
– Deduction
● Efficiency
– Abstraction
● Noise handling
– Probability
– Abduction
– Argumentation
● Knowledge transfer
– Analogy
Inductive Logic Programming (ILP)
● Merger of 2 areas
● Logic Programming
– Representation Formalism
● First-Order Logic
● Inductive Learning
– Learning of concept descriptions from examples
● Inference from specific (observations) to general (hypotheses)
● Provides a formal framework and algorithms for inductive learning of relational descriptions expressed as logic programs
● Later extended to all kinds of learning in logics
Inductive Logic Programming
● Advantages
● Powerful representation formalism
– Same formalism for data and hypotheses
– High expressive power
– Human-readable!
● Possibility of expressing background domain knowledge
● Powerful reasoning mechanisms available
– Various kinds of reasoning in FOL
FOL Representations
● When?
● Problem cannot be effectively
represented in attribute-value (or propositional) representations
– Complex schemes
● Variable number of objects in the examples
● Relationships among objects are important
● Example
● Positive examples (figures omitted)
● Negative examples (figures omitted)
FOL Representations
● Attribute-value language
– Determine a fixed number of objects k
● But the number of objects may vary in real applications
– Exponential number of equivalent representations of examples and rules
● IF includes(Configuration,Object) AND shape(Object,circle) THEN class(Configuration,positive)
– IF object1 = circle THEN Class = positive
– IF object2 = circle THEN Class = positive
– ...
– n-ary relationships might be represented extensionally
● Each possible n-tuple → one boolean feature
– Each n-ary relationship generates k^n features
● in(circle1,square) = true
● in(circle2,circle3) = true
● in(circle1,circle2) = false
● ...
– Many more false than true values
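The k^n blow-up for a single binary relation can be counted directly; a minimal sketch (the object names are illustrative, echoing the geometric-shapes example):

```python
from itertools import product

# One boolean feature per possible pair of objects for the binary relation in/2.
objects = ["circle1", "circle2", "circle3", "square"]        # k = 4 objects
features = [("in",) + pair for pair in product(objects, repeat=2)]

print(len(features))   # k^n = 4^2 = 16 boolean features
```

With a ternary relation over the same 4 objects the count would already be 4^3 = 64, most of them false in any given example.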
FOL Representations
● Geometric shapes example
– Shape
● shape/2 relationship
● 3 constants for shapes: square, circle, triangle
– Position
● include/2 relationship
FOL Representations
● Another example
– Eastbound vs Westbound Trains (Michalski)
Learning from Interpretations
● Representation: Direct Relevance
● Example
– pos(obj1) :- contains(obj1,o1), contains(obj1,o2), triangle(o1), points(o1,down), circle(o2).
– pos(obj2) :- contains(obj2,o3), triangle(o3), points(o3,down).
– :- pos(obj3), contains(obj3,o4), circle(o4).
● Background
– polygon(X) :- triangle(X).
– polygon(X) :- square(X).
● Hypothesis ?
Learning from Interpretations
● Representation: Direct Relevance
● Example
– pos(obj1) :- contains(obj1,o1), contains(obj1,o2), triangle(o1), points(o1,down), circle(o2).
– pos(obj2) :- contains(obj2,o3), triangle(o3), points(o3,down).
– :- pos(obj3), contains(obj3,o4), circle(o4).
● Background
– polygon(X) :- triangle(X).
– polygon(X) :- square(X).
● Hypothesis
– pos(X) :- contains(X,Y), triangle(Y), points(Y,down).
Learning from Interpretations
● Representation = Direct Relevance
● Example = Interpretation (set of facts)
● Contains a complete description of the example
– CWA for the example
– All information intuitively belonging to the example represented in the example itself, not in the BK
● Background = domain knowledge
– BK referred to domain-related information, not related to specific examples
● Limitations
– Learning of relationships between examples
– Redundancy
– Recursive clauses
● Advantage: More efficiency
Learning from Entailment
● 1 example = 1 fact
● Examples
– pos(1).
– pos(2).
– :- pos(3).
● Background
– contains(1,o1). contains(1,o2).
contains(2,o3). contains(3,o4).
triangle(o1). triangle(o3).
points(o1,down). points(o3,down).
circle(o3). circle(o4).
● Hypothesis
– pos(X) :- contains(X,Y), triangle(Y), points(Y,down).
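To make the covering relationship concrete, here is a minimal sketch (illustrative only, not an ILP system) that tests whether the hypothesis pos(X) :- contains(X,Y), triangle(Y), points(Y,down) covers each example, given the background facts above:

```python
# Background facts as ground atoms; coverage of pos(e) holds iff some binding
# of Y satisfies the clause body: contains(e,Y), triangle(Y), points(Y,down).
background = {
    ("contains", 1, "o1"), ("contains", 1, "o2"),
    ("contains", 2, "o3"), ("contains", 3, "o4"),
    ("triangle", "o1"), ("triangle", "o3"),
    ("points", "o1", "down"), ("points", "o3", "down"),
    ("circle", "o3"), ("circle", "o4"),
}

def covers(example):
    """True iff `example` contains a downward-pointing triangle."""
    for fact in background:
        if fact[0] == "contains" and fact[1] == example:
            y = fact[2]
            if ("triangle", y) in background and ("points", y, "down") in background:
                return True
    return False

print([e for e in (1, 2, 3) if covers(e)])   # → [1, 2]
```

The two positive examples (1 and 2) are covered, the negative one (3) is not, so the hypothesis is complete and consistent on this data.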
Propositional vs FOL Learning
● Representation
● Structure of the Search Space
● “More general than”
● Refinement operators (navigation in the space)
● Recursion
● In principle, any propositional learning algorithm may be lifted to FOL
– CN2 (sequential covering + top-down generation of rules)
● FOIL (Quinlan, 1991)
– C4.5 (decision tree induction)
● Tilde (Blockeel & De Raedt, 1998)
– AQ (sequential covering + seed-guided top-down generation of rules)
● Progol (Muggleton, 1995)
Dimensions of ILP
● Empirical ILP
– Learns from a predefined set of examples (batch)
● + : noise, effectiveness; – : determining examples
● Incremental ILP
– Processes examples one by one
● + : easy example selection, efficiency; – : noise
● Interactive ILP
– Generates questions for an oracle (the user)
● + : user-friendly; – : efficiency, noise
● Multiple Predicate Learning
– Learns many predicates
● + : more powerful; – : more complex, less efficient
● Predicate Invention
– Extends the vocabulary with new predicates
● + : larger class; – : more complex
History in Short
● Prehistory
– LGG, RLGG (Plotkin, 1970, 1971)
● First enthusiasm
– Vere 1975, 1977; Summers 1977; Kodratoff and Jouanaud 1979; MIS [Shapiro 1981, 1983]
● Dark ages (1983-86)
● Renaissance (1987-96), ILP (1991-...)
– MARVIN [Sammut & Banerji 1986]; DUCE [Muggleton 1987]; Helft 1987
– QuMAS [Mozetic 1987]; LINUS [Lavrac et al. 1991]
– CIGOL [Muggleton & Buntine 1988]
– ML-SMART [Bergadano, Giordana & Saitta 1988]
– CLINT [De Raedt & Bruynooghe 1988]
– BLIP, MOBAL [Morik, Wrobel et al. 1988, 1989]
– GOLEM [Muggleton & Feng 1990]
– FOIL [Quinlan 1990], mFOIL [Dzeroski 1991], FOCL [Brunk & Pazzani 1991], FFOIL [Quinlan 1996]
– CLAUDIEN [De Raedt & Bruynooghe 1993]; PROGOL [Muggleton et al. 1995]
● RDM (1997-...)
– TILDE [Blockeel & De Raedt 1997]; WARMR [Dehaspe 1997]
– MIDOS [Wrobel 1997]
– ...
Induction in Logics
● Given a set of sentences representing our
“beliefs” about the world
● Consider a subset as data D, and the rest as background theory G
● Assume the data are not explained by the theory:
– G ⊭ D
● We say that F is an inductive conclusion
– G ∪ D |< F
● If and only if:
– G ∪ D ∪ {F} is consistent: the hypothesis is consistent with G and D
– G ∪ {F} ⊨ D: the hypothesis explains the data
Inductive Learning Paradigm
● Can be defined in logical terms
● Consider:
– LO : the language of observations
– LB : the language of background knowledge
– LH : the language of hypotheses
● The problem of induction becomes
– Given
● O ⊆ LO : a consistent set of examples or observations
● B ⊆ LB : a consistent background knowledge
– Find a hypothesis H ∈ LH
– Such that B ∪ H ⊨ O
● The background knowledge and the hypothesis logically entail the observations
Predictive ILP
● Objective: Inductive Learning
– Given
● A set of observations O
– Positive examples E+
– Negative examples E–
● Background knowledge B
● Language of hypotheses LH
● Covering relationship
– Find
● A hypothesis H ∈ LH such that, given B, H covers all positive examples and no negative example
– ∀e ∈ E+ : B ∪ H ⊨ e (completeness)
– ∀e ∈ E– : B ∪ H ⊭ e (consistency)
● In ILP, examples E = E+ ∪ E– are ground facts, B and H are (sets of) definite clauses
Predictive ILP
● Properties
● Completeness
– A hypothesis H is complete with respect to the
background knowledge B and the examples E if it covers all positive examples
● Consistency
– A hypothesis H is consistent with respect to the
background knowledge B and the examples E if it covers
no negative example
Inductive Logic Programming
● Example: Logic Programming
● Learning a definition for “member/2”
– Examples
● member(a, [a,b,c]).
● member(b, [a,b,c]).
● member(3, [5,4,3,2,1]).
● :- member(b, [1,2,3]).
● :- member(3, [a,b,c]).
– Hypotheses
● member(X, [X|Y]).
● member(X, [Y|Z]) :- member(X,Z).
Inductive Logic Programming
● Example: Logic Programming (using background knowledge)
● Learning sort
– Input
● E+ = { sort([2,1,3],[1,2,3]) }
● E- = { sort([2,1],[1]), sort([3,1,2],[2,1,3]) }
● B = definitions of permutation/2 and sorted/1
– Output
● sort(X, Y) :- permutation(X,Y), sorted(Y).
Inductive Logic Programming
● Example (using background knowledge)
● Learning quicksort
– Examples
● qsort([b,c,a], [a,b,c]).
● qsort([], []).
● qsort([5,3], [3,5]).
● :- qsort([5,3], [5,3]).
● :- qsort([1,3], [3]).
– Background knowledge
● partition(L, A, B) :- ... .
● append(A, B, C) :- ... .
– Hypotheses
● qsort([], []).
● qsort([X], [X]).
● qsort(X, Y) :- partition(X, A, B), qsort(A, AS), qsort(B, BS), append(AS, BS, Y).
Inductive Logic Programming
● Example: Knowledge Discovery
● Learning family relationships
– Input
● E+ = { daughter(mary, ann), daughter(eve,tom) }
● E- = { daughter(tom,ann), daughter(eve,ann) }
● B = { mother(ann,mary), mother(ann,tom), father(tom,eve), father(tom,ian), female(ann), female(mary), female(eve), male(pat), male(tom), parent(X,Y) :- mother(X,Y), parent(X,Y) :- father(X,Y) }
– Output
● daughter(X,Y) :- female(X), parent(Y,X).
– or
● daughter(X,Y) :- female(X), mother(Y,X).
● daughter(X,Y) :- female(X), father(Y,X).
Inductive Logic Programming
● Example
● Learning constraints
– Examples
● parent(jack, mary).
● parent(mary, bob).
● father(jack, mary).
● mother(mary, bob).
● male(jack).
● male(bob).
● female(mary).
– Constraints
● :- male(X), female(X).
● male(X) :- father(X, Y).
● father(X,Y) ; mother(X,Y) :- parent(X,Y).
– Not a Horn Clause!
Inductive Concept Learning
● Given a
● Universe of objects (or observations) U
● A concept C may be formalized as a subset of objects in U
● C ⊆ U
– Example
● U = set of all patients in a hospital
● C ⊆ U = set of all patients suffering from a particular disease
● Learning a concept C
● Learning a way to be able to recognize objects in C [Bratko, 1989]
● Being able to tell whether x ∈ C for all x ∈ U
Successful Applications
● Mutagenesis
● Some chemical compounds cause DNA mutations
– Active, Inactive
● Part of the dataset could not be captured by propositional techniques
● Application of ILP
– Obtained good accuracy (~80%)
– Suggested to biologists connections with known chemical sub-structures
Successful Applications
● Discovery of Pharmacophores in molecules
● 3D sub-structures responsible for their medicinal activity
● Molecules described by listing for each atom:
chemical element, 3D coordinates, ...
● Example
– Description of molecules
● atm(m1,a1,o,2,3.430400,-3.116000,0.048900).
● atm(m1,a2,c,2,6.033400,-1.776000,0.679500).
● atm(m1,a3,o,2,7.026500,-2.042500,0.023200).
● ...
● bond(m1,a2,a3,2).
● bond(m1,a5,a6,1).
● bond(m1,a2,a4,1).
● bond(m1,a6,a7,du).
● ...
Successful Applications
● Discovery of Pharmacophores in molecules
● Example (cont.)
– Background knowledge
● ...
● hacc(M,A):- atm(M,A,o,2,_,_,_).
● hacc(M,A):- atm(M,A,o,3,_,_,_).
● hacc(M,A):- atm(M,A,s,2,_,_,_).
● hacc(M,A):- atm(M,A,n,ar,_,_,_).
● zincsite(M,A):- atm(M,A,du,_,_,_,_).
● hdonor(M,A) :- atm(M,A,h,_,_,_,_), not(carbon_bond(M,A)), !.
● ...
– Hypothesis
● active(A) :- zincsite(A,B), hacc(A,C), hacc(A,D), hacc(A,E), dist(A,C,B,4.891,0.750), dist(A,C,D,3.753,0.750), dist(A,C,E,3.114,0.750), dist(A,D,B,8.475,0.750), dist(A,D,E,2.133,0.750), dist(A,E,B,7.899,0.750).
Classification Rules
● Example (InTheLEx)
● class_elsevier(A) :-
numero_pagine(A,[13,26]), pagina_1(A,B), frame(B,C),
height_medium_small(C), pos_center(C), pos_upper(C), tipo_misto(C),
on_top(C,D), pos_center(D), pos_middle(D), tipo_testo(D),
larghezza_pagina(B,[544.0,612.0]), altezza_pagina(B,[743.0,842.0]).
Representation Language
● Quick Recall
● First-Order Language = set of formulas defined on an alphabet made up of different classes of symbols
– Variables, constants, functions, relationships
● Terms: denote objects
● Atoms: simple claims
● Literals (positive or negative)
– Compatible if same predicate symbol and sign
Representation Language
● Quick Recall
● Clauses
– Factor: a subset of a clause
● Reducing a clause: finding a factor equivalent to the clause
● Reduced clause: a clause that cannot be further reduced
– Horn clause: at most one positive literal
● 1 positive literal → program (definite) clause
● 0 positive literals → goal
● Logic program (or definite program, or program): set of definite Horn clauses
● Datalog: no function symbols of arity > 0
– Only constants
ILP as Search
● Concept Learning ~ search problem [Mitchell, 1982]
● ILP:
● Search space (hypothesis space) determined by the language of logic programs L
– Clauses in the form T :- Q where
● T is an atom p(X1, ..., Xn)
● Q is a conjunction of literals L1, ..., Lm
● States in the search space = concept descriptions
● Aim: finding one or more states (hypotheses) that satisfy some criteria
ILP as Search
● An ILP algorithm can be described in terms of
● Search space structure
– Space of clauses
● Search strategy
– Uniform search
● Depth-first, breadth-first, iterative deepening
– Heuristic search
● Best-first, hill-climbing, beam-search
● Search heuristics
– To drive search
– To terminate search (quality criterion)
ILP as Search
● Search in the hypothesis space
● States: hypotheses from LH
● Goal: finding a complete and consistent hypothesis
● Naive generate-and-test algorithm:
● ∀H ∈ LH: if H complete and consistent THEN return H
– Computationally expensive
● Restricting the search
● Language bias
– Reduce the hypothesis space
● Search bias
– Reduce the search in the hypothesis space
Hypotheses Space
● Structure
● Based on a generality relationship
– G more general than S iff covers(S) ⊆ covers(G)
● Pruning of the search space
● If H does not cover an example e, then no specialization of H will cover e
– Used with positive examples
● If H covers an example e,
then all generalizations of H will cover e
– Used with negative examples
● Version Space
● Above properties define the space of possible solutions
Orderings and Ordered Sets
● Definitions
● Given
– a set of clauses S
– clauses C, D, E ∈ S,
a binary relationship ≥ defined on S is
– A quasi-ordering (and S is a quasi-ordered set) if it is
● reflexive (C ≥ C)
● transitive (C ≥ D ∧ D ≥ E ⇒ C ≥ E)
– A partial ordering on S (and S is a partially ordered set) if it is a quasi-ordering and it is also
● anti-symmetric (C ≥ D ∧ D ≥ C ⇒ C = D)
Generalization Models
● The set of clauses can be ordered
● Generalization relationships
– Implication
– θ-subsumption
– θOI-subsumption
Implication
● A formula is more general than another if whenever the former is true, the latter is also true
– Classical concept of generality used in logics
● Definition (Implication between clauses)
– Given two clauses C and D, C implies D (or C is a generalization under implication of D),
● denoted C ⊨ D, or C ≥I D,
iff any model for C is also a model for D ({C} ⊨ D)
– C and D are equivalent under implication (C ≡ D) iff C ⊨ D and D ⊨ C
● Theorem (Undecidability)
– There exists no procedure to decide whether C ⊨ D for any two clauses C and D
Implication
● Properties
● Implication is a quasi-ordering
– Given C, D, E clauses:
● C ⊨ C (Reflexivity)
● If C ⊨ D and D ⊨ E, then C ⊨ E (Transitivity)
● It is not a partial ordering
– Anti-symmetry does not hold
● Two clauses may be equivalent under implication even if they are not renamings
● Example
– C = p(X), p(Y) :- p(f(X)), p(f(f(Y)))
– D = p(Z) :- p(f(f(Z)))
– C ⊨ D and D ⊨ C, so C ≡ D, but they are not renamings
Generalization Under Implication
● A clause C is a generalization under implication of a set of clauses S = {D1, ..., Dn} iff ∀i = 1, ..., n : C ⊨ Di
● A generalization under implication C of S is a least general generalization under implication (lggI) of S iff, for any C' generalization under implication of S: C' ⊨ C
● The lggI of a finite set of clauses
– Is not computable
● Implication is undecidable
– Is not unique
● Implication is a quasi-ordering
θ-subsumption
● Definition
● Given two clauses C, D
– C θ-subsumes D (or C is a generalization under θ-subsumption of D, D ≤θ C) iff there exists a substitution θ such that Cθ ⊆ D
– C is equivalent to D under θ-subsumption (C ~ D) iff C ≤θ D and D ≤θ C
● Example
● Clauses
– C = :- p(f(Y)), p(X)
– D = q(Y) :- p(f(Y))
– E = q(a) :- p(f(a)), r(b)
● Relations
– C ≤θ C
– D ≤θ C, θ = {X/f(Y)}
– E ≤θ D, θ = {Y/a}
– E ≤θ C
● Properties
● Quasi-ordering
– Reflexive
– Transitive
● Not a partial ordering
– Not anti-symmetric
● Two clauses may be equivalent without being renamings
θ-subsumption
● Example
● Clauses
– C = :- p(X), p(Y)
– D = :- p(Z)
● Relations
– C ≤θ D, θ = {Z/X}
– D ≤θ C, θ = {X/Z, Y/Z}
– C, D not renamings
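The example above can be checked mechanically. Here is a brute-force sketch of the θ-subsumption test for function-free clauses (literals as tuples, variables as uppercase strings; negation could be folded into the predicate symbol). This is an illustration, not an efficient implementation:

```python
from itertools import product

def theta_subsumes(c, d):
    """True iff some substitution theta over C's variables gives Ctheta ⊆ D."""
    is_var = lambda t: t[:1].isupper()
    vars_c = sorted({t for lit in c for t in lit[1:] if is_var(t)})
    terms_d = sorted({t for lit in d for t in lit[1:]})
    d_set = set(d)
    # Try every mapping from C's variables to terms occurring in D.
    for values in product(terms_d, repeat=len(vars_c)):
        theta = dict(zip(vars_c, values))
        if all((lit[0],) + tuple(theta.get(t, t) for t in lit[1:]) in d_set
               for lit in c):
            return True
    return False

# C = :- p(X), p(Y)   and   D = :- p(Z): each θ-subsumes the other,
# yet they are not renamings of one another.
C = [("p", "X"), ("p", "Y")]
D = [("p", "Z")]
print(theta_subsumes(C, D), theta_subsumes(D, C))   # → True True
```

The exhaustive enumeration of substitutions is exactly why the test is NP-complete in general.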
θ-subsumption
● Theorem (Decidability)
● The θ-subsumption relationship is decidable
– Given two clauses C, D there exists a procedure to decide whether C ≤θ D
● However, in general the θ-subsumption test between clauses is NP-complete
– In some cases, SLD-resolution is inefficient
– Example
● C = h(X1) :- p(X1,X2), p(X2,X3), ..., p(Xn–1,Xn), q(Xn).
● D = h(c1) :- p(c1,c1), p(c1,c2), ..., p(c1,cn), p(c2,c1), p(c2,c2), ..., p(c2,cn), ..., p(cn,c1), p(cn,c2), ..., p(cn,cn).
θ-subsumption and Implication
● θ-subsumption is weaker than Implication
● Given two clauses C and D, if C is not self-resolving and D is not a tautology, then C ⊨ D and C θ-subsumes D are equivalent [Gottlob 1987]
– Self-resolving = can be resolved with a variant of itself
Generalization under θ-subsumption
● Algebraic properties
● Given the set of clauses C, the quasi-ordered set (C/~, ≤θ) is a lattice, where:
– lub = least general generalization (lgg)
– glb = most general specialization (mgs)
● Such a lattice admits infinite unbounded strictly ascending and descending chains
– Made up of syntactically different but equivalent clauses
– If encountered when searching for a generalization or specialization, they would cause non-termination
Generalization under θ-subsumption
● Least General Generalization (lgg)
● [Plotkin, 1970]
● Terms t1, t2
– lgg(t, t) = t
– lgg( f(s1, ..., sn), f(t1, ..., tn) ) = f( lgg(s1, t1), ..., lgg(sn, tn) )
– lgg( f(s1, ..., sn), g(t1, ..., tm) ) = V, where f ≠ g and V is a variable
– lgg(s, t) = V, where s ≠ t and at least one between s and t is a variable
● Atoms a1, a2
– lgg( p(s1, ..., sn), p(t1, ..., tn) ) = p( lgg(s1, t1), ..., lgg(sn, tn) )
– lgg( p(s1, ..., sm), q(t1, ..., tn) ) is undefined
Generalization under θ-subsumption
● Least General Generalization (lgg)
● Literals l1, l2
– Positive literals
● Use lgg(l1, l2) for atoms
– Negative literals ( l1 = ¬a1, l2 = ¬a2 )
● lgg(l1, l2) = lgg(¬a1, ¬a2) = ¬lgg(a1, a2)
– Literals with opposite sign
● Undefined
● Clauses C1 = {h1, ..., hn}, C2 = {k1, ..., km}
– lgg(C1, C2) = { lij = lgg(hi, kj) | hi ∈ C1, kj ∈ C2, lgg(hi, kj) is defined }
Generalization under θ-subsumption
● Least General Generalization (lgg)
● Examples
– Terms
● lgg( [a, b, c], [a, c, d] ) = [a, X, Y]
● lgg( f(a, a), f(b, b) ) = f( lgg(a, b), lgg(a, b) ) = f(V, V)
– The same variable must be used for all occurrences of the same lgg (see lgg(a, b) = V above)
– Atoms & Literals
● lgg( parent(ann, mary), parent(ann, tom) ) = parent(ann, X)
● lgg( parent(ann, mary), ¬parent(ann, tom) ) = undefined
● lgg( parent(ann, X), daughter(mary, ann) ) = undefined
– Clauses
● C1 = daughter(mary, ann) :– female(mary), parent(ann, mary).
● C2 = daughter(eve, tom) :– female(eve), parent(tom, eve).
● lgg(C1, C2) = daughter(X, Y) :– female(X), parent(Y, X).
– where X = lgg(mary, eve) and Y = lgg(ann, tom)
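Plotkin's rules above translate almost directly into code. A minimal sketch (terms and atoms as tuples, variables as uppercase strings, negative literals encoded with a "not_" prefix; illustrative only):

```python
def lgg(s, t, table=None):
    """lgg of two terms/atoms. `table` maps each pair of differing subterms to
    one shared variable, so lgg(f(a,a), f(b,b)) = f(V0,V0), not f(V0,V1)."""
    if table is None:
        table = {}
    if s == t:
        return s
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:                 # differing pair -> one fresh variable
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]

def lgg_clauses(c1, c2):
    """Pairwise lggs of compatible literals (same predicate symbol and sign),
    sharing one variable table across the whole clause."""
    table, out = {}, []
    for h in c1:
        for k in c2:
            if h[0] == k[0] and len(h) == len(k):
                out.append(lgg(h, k, table))
    return out

# The daughter/2 example: clauses as sets of literals (head + negated body).
c1 = [("daughter", "mary", "ann"), ("not_female", "mary"), ("not_parent", "ann", "mary")]
c2 = [("daughter", "eve", "tom"), ("not_female", "eve"), ("not_parent", "tom", "eve")]
result = lgg_clauses(c1, c2)
print(result)
# → [('daughter', 'V0', 'V1'), ('not_female', 'V0'), ('not_parent', 'V1', 'V0')]
```

Read back as a clause, the result is exactly daughter(X, Y) :– female(X), parent(Y, X) with X = V0 = lgg(mary, eve) and Y = V1 = lgg(ann, tom).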
Object Identity (OI)
● Assumption
● In a clause, terms denoted by different symbols must be distinct
– Example. Under OI assumption, the clause C = p(X) :– q(X, Y), q(Y, a). is an abbreviation for the clause C_OI = p(X) :– q(X, Y), q(Y, a) || [X ≠ Y], [X ≠ a], [Y ≠ a].
● Intrinsic in human reasoning
– Examples
● “X and Y are brothers if they have a parent in common”
● “A bicycle is an object made up of, among other things, a wheel X and a wheel Y”
Object Identity and Clauses
● θ-subsumption under Object Identity (θOI-subsumption)
● Given two Datalog clauses C, D, D θ-subsumes C under Object Identity
– D θOI-subsumes C, D ≥OI C,
iff there exists a substitution θ such that D_OI θ ⊆ C_OI
– Note
● Clauses C = p(X, X) ∨ p(X, Y) ∨ p(Y, Z) and D = p(X, X) are equivalent under θ-subsumption
– D is a subset of C; C may become a subset of D by the substitution {Y/X, Z/X}
● But C uses more components than D!
θOI-subsumption
● Example
● Consider two Datalog clauses
– C = p(X) :– q(X, a).
– D = p(X’) :– q(X’, Y’).
● By definition, D θOI-subsumes C iff there exists a substitution θ such that D_OI θ ⊆ C_OI
– C_OI = p(X) :– q(X, a) || X ≠ a.
– D_OI = p(X’) :– q(X’, Y’) || X’ ≠ Y’.
● Taking θ = {X’/X, Y’/a} we have D_OI θ ⊆ C_OI :
– D_OI θ = { p(X), q(X, a), (X ≠ a) } = C_OI
θOI-subsumption
● Properties
● θOI-subsumption is weaker than θ-subsumption
– C ≥OI D ⇒ C ≥θ D
● Under OI only renamings can be equivalent
– Under θ-subsumption and implication two clauses may be equivalent without being renamings
● Redundancy test has computational complexity O(n^2)
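The effect of the OI bias can be illustrated with a brute-force sketch: clauses as lists of literal tuples, variables as uppercase strings, and, unlike plain θ-subsumption, only injective substitutions are allowed, so distinct terms stay distinct (a simplified illustration, not the actual systems' test):

```python
from itertools import permutations

def oi_subsumes(c, d):
    """True iff Ctheta ⊆ D for some *injective* substitution theta
    (distinct variables of C must be mapped to distinct terms)."""
    is_var = lambda t: t[:1].isupper()
    vars_c = sorted({t for lit in c for t in lit[1:] if is_var(t)})
    terms_d = sorted({t for lit in d for t in lit[1:]})
    d_set = set(d)
    for values in permutations(terms_d, len(vars_c)):   # injective assignments only
        theta = dict(zip(vars_c, values))
        if all((lit[0],) + tuple(theta.get(t, t) for t in lit[1:]) in d_set
               for lit in c):
            return True
    return False

# C = :- p(X), p(Y) no longer subsumes D = :- p(Z) under OI:
# X and Y would both have to become Z, which injectivity forbids.
C = [("p", "X"), ("p", "Y")]
D = [("p", "Z")]
print(oi_subsumes(C, D), oi_subsumes(D, C))   # → False True
```

This is precisely the pair of clauses that are equivalent under plain θ-subsumption; under OI the equivalence disappears, leaving only renamings equivalent.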
θOI-subsumption
● (Lh, ≤OI): space of Datalog clauses ordered by θOI-subsumption
● Properties
● Quasi-ordered set
– The generalization relationship induced by θOI-subsumption is reflexive and transitive
– Not a lattice
● Unlike θ-subsumption
● Each element in the space is an equivalence class
Generalization under OI
● Example
● E1 :
●
blocks(obj1) :–
part_of(obj1, p1), part_of(obj1, p2), on(p1, p2), cube(p1), cube(p2), small(p1), big(p2), black(p1), stripes(p2).
● E2:
●
blocks(obj2) :–
part_of(obj2, p3), part_of(obj2, p4), on(p3, p4), cube(p3), cube(p4), small(p3), big(p4), black(p4), stripes(p3).
Generalization under OI
● Example
● Least general generalization under θ-subsumption
– G : blocks(X) :– part_of(X, X1), part_of(X, X2), part_of(X, X3), part_of(X, X4), on(X1, X2), cube(X1), cube(X2), cube(X3), cube(X4), small(X1), big(X2), black(X3), stripes(X4).
● Least general generalizations under θOI-subsumption
– G1 : blocks(X) :– part_of(X, X1), part_of(X, X2), on(X1, X2), cube(X1), cube(X2), small(X1), big(X2).
– G2 : blocks(X) :– part_of(X, X1), part_of(X, X2), cube(X1), cube(X2), black(X3), stripes(X4).
● What about intuition?
Generalization under OI
● Example
● Portion of (H/~, ≤θ)
– Lattice
● Portion of (H/~, ≤OI)
– Quasi-ordered set
Refinement Operators
● Mappings from L (set of clauses in the language) to 2^L
● Properties
● Ideality
– Local finiteness
● Computation terminates
– Properness
● No loops on the same refinement
– Completeness
● Can reach any refinement
● Optimality
– Single path to any refinement
Refinement Operators
● Ideal refinement operators do not exist under - subsumption
● Infinite unbound strictly ascending and descending chains
● Ideal refinement operators do exist under OI
● Equivalence as alphabetic variance prevents infinite unbound strictly ascending and descending chains
Generic ILP Algorithm
ILP Systems
● Foil [Quinlan]
● Learns a single concept h(X)
● To learn a clause: hill-climbing search
– Start with the most general clause h(X) :– true
– Repeat
● Add the best literal to the clause
– May be a unification: X = c or X = Y
● Many clauses can be learned using the same procedure
– Remove examples covered by a clause before learning the next one
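The sequential-covering + hill-climbing scheme can be sketched abstractly. In the sketch below, `candidate_literals`, `covers`, and `score` are assumed problem-specific components supplied by the caller (this is the general schema, not FOIL's actual implementation, which uses an information-gain heuristic):

```python
def learn_rules(positives, negatives, candidate_literals, covers, score):
    """Sequential covering: greedily grow one clause at a time until
    it excludes all negatives, then drop the positives it covers."""
    rules, pos = [], list(positives)
    while pos:                                    # until every positive is covered
        body = []                                 # most general clause: empty body
        while any(covers(body, n) for n in negatives):
            best = max(candidate_literals(body),  # hill climbing on one literal
                       key=lambda lit: score(body + [lit], pos, negatives))
            body.append(best)
        rules.append(body)
        pos = [p for p in pos if not covers(body, p)]   # covering step
    return rules

# Toy propositional instantiation (illustrative only): examples are dicts
# of boolean attributes, a "literal" is an attribute that must be true.
def covers(body, ex): return all(ex[a] for a in body)
def candidate_literals(body): return [a for a in ("red", "round", "small") if a not in body]
def score(body, pos, neg):
    return sum(covers(body, p) for p in pos) - sum(covers(body, n) for n in neg)

P = [{"red": 1, "round": 1, "small": 0}, {"red": 1, "round": 1, "small": 1}]
N = [{"red": 1, "round": 0, "small": 1}, {"red": 0, "round": 1, "small": 1}]
print(learn_rules(P, N, candidate_literals, covers, score))   # → [['red', 'round']]
```

In FOIL the same loop runs over first-order literals (including unifications such as X = c), which is where the lifting from propositional to relational learning happens.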
ILP Systems
● Golem [Muggleton & Feng]
● Based on the rlgg operator
– Relative least general generalization
● To induce a clause: bottom-up search
– Given two positive examples
● Find their rlgg
● Repeat
– Generalize it using another example
● Until the quality of the clause no longer improves
● Much dependent on the choice of examples
● Possibility of trying several pairs of examples
ILP Systems
● Progol [Muggleton]
● Top-down approach, but with choice of a seed
● To find a clause
– Start with a positive example e
– Generate a search space He containing only hypotheses that cover e
● First generate the most specific clause c that covers e
● He includes all clauses more general than c
– Carry out an exhaustive top-down search in He looking for the clause that maximizes compaction
● Compaction = size(covered examples) – size(clause)
– In principle, does not allow covering negative examples
● May be relaxed to deal with noise
– Repeat the clause search process until a better one is found
● Currently implemented in Aleph [Srinivasan]
ILP Systems
● Claudien [De Raedt & Bruynooghe]
● “Clausal Discovery Engine”
– Discovers patterns in the data
– Each pattern is a clause
● Example
– Language bias
● {parent(X,Y), father(X,Y), mother(X,Y)} :– { parent(X,Y), father(X,Y), mother(X,Y), male(X), male(Y), female(X), female(Y) }
– Clauses discovered
● parent(X,Y) :– father(X,Y).
● parent(X,Y) :– mother(X,Y).
● :– father(X,Y), mother(X,Y).
● :– male(X), female(X).
● mother(X,Y) :– parent(X,Y), female(X).
● ...
● Exhaustive top-down search
ILP Systems
● Warmr [Dehaspe]
● Induces first-order association rules
● APRIORI-like algorithm
● Looks for frequent patterns
– “Frequent itemsets” in APRIORI
– Pattern = conjunction of literals
– Search space ordered by θ-subsumption
● Builds association rules from the patterns
– IF this pattern occurs, THEN that pattern occurs too
ILP Systems
● Decision Tree Induction in ILP
● Induces first-order logic decision trees (FOLDTs)
– Test in a node = first-order literal
– Many nodes may share variables
● Systems
– S-CART [Kramer 1996]
● Extension of CART
– Tilde [Blockeel & De Raedt 1998]
● Extension of C4.5
InTheLEx
● Incremental Theory Learner from Examples
● Fully & Inherently incremental
● Multi-conceptual
● Closed-loop
● Hierarchical theories
● Positive and Negative examples
● Total storage
● Multi-strategy
● Based on Object Identity
InTheLEx
● Architecture
Arch Problem
● Learn a definition for “arch” in terms of supports and “touch” relationships
– [Patrick Winston]
● Target clause:
arch(A, B, C) :-
  brick(B),
  left_of(B, C),
  brick(C),
  supports(B, A),
  supports(C, A),
  not(touches(B,C)).
(ASCII drawings of the eight block configurations, numbered 11–83, omitted)
Arch Problem
● Execution
---
> Theory file does not exist. Creating and initializing it!
>
> THEORY TUNING
--- Arch 1
arch(b11,b12,b13) :-
  brick(b11), brick(b12), brick(b13),
  left_of(b12,b13), supports(b12,b11), supports(b13,b11).
(drawing omitted)
---
> Processing Example No. 1 arch(b11,b12,b13): generalizing theory
> Theory generalized (added new clause)
>
> arch(X,Y,Z) :-
> brick(X),
> brick(Y),
> left_of(Y,Z),
> brick(Z),
> supports(Y,X),
> supports(Z,X).
---
Arch Problem
● ...
Arch 2
not(arch(b22,b23,b21)) :-
  brick(b22), brick(b23), brick(b21),
  left_of(b23,b21).
(drawing omitted)
---
> Processing Example No. 2 not(arch(b22,b23,b21)): Theory is correct !
---
Arch Problem
● ...
Arch 3
not(arch(b31,b32,b33)) :-
  brick(b31), brick(b32), brick(b33),
  left_of(b32,b33), supports(b32,b31), supports(b33,b31),
  touches(b32,b33), touches(b33,b32).
(drawing omitted)
---
> Processing Example No. 3 not(arch(b31,b32,b33)): specializing theory
>
> Top-concept clause specialized (added negative literal)
>
>
> arch(X,Y,Z) :-
> brick(X),
> brick(Y),
> left_of(Y,Z),
> brick(Z),
> supports(Y,X),
> supports(Z,X),
> not(touches(Y,Z)).
---
Arch Problem
● ...
Arch 4
arch(b41,b42,b43) :-
  brick(b42), brick(b43),
  left_of(b42,b43), supports(b42,b41), supports(b43,b41),
  wedge(b41).
(drawing omitted)
---
> Processing Example No. 4 arch(b41,b42,b43): generalizing theory .
> CLAUSE LENGTH: 7
> BOUNDS: 1.0e-003---1.0
> LGG_LITS: 0.0---7.0
>
> Theory generalized (lgg of a clause)
>
> arch(A, B, C) :-
> brick(B),
> left_of(B, C),
> brick(C),
> supports(B, A),
> supports(C, A),
> not(touches(B,C)).
---
Arch Problem
● ...
Arch 5
arch(b51,b52,b53) :-
  brick(b51), brick(b52), brick(b53),
  left_of(b52,b53), supports(b52,b51), supports(b53,b51).
(drawing omitted)
---
> Processing Example No. 5 arch(b51,b52,b53): Theory is correct !
---
Arch 6
not(arch(b61,b62,b63)) :-
  brick(b61), brick(b62), brick(b63),
  left_of(b62,b63), supports(b63,b61).
(drawing omitted)
---
> Processing Example No. 6 not(arch(b61,b62,b63)): Theory is correct !
---
Arch Problem
● ...
Arch 7
not(arch(b71,b72,b73)) :-
  brick(b71), brick(b72), brick(b73),
  left_of(b72,b73), supports(b72,b71).
(drawing omitted)
---
> Processing Example No. 7 not(arch(b71,b72,b73)): Theory is correct !
---
Arch 8
arch(b81,b82,b83) :-
  brick(b82), brick(b83),
  left_of(b82,b83), supports(b82,b81), supports(b83,b81),
  wedge(b81).
(drawing omitted)
---
> Processing Example No. 8 arch(b81,b82,b83): Theory is correct !
> End of processing
InTheLEx
● Induction
– Positive Example
● bicycle(b1) :- has_wheel(b1,w11), has_wheel(b1,w12), has_frame(b1,f1), has_fork(b1,r1), has_pedals(b1,p1), blue(f1), mounted_on(w11,f1), mounted_on(w12,r1), mounted_on(p1,f1).
– Rule
● bicycle(X) :- has_wheel(X,Y), has_wheel(X,Z), has_frame(X,W), has_fork(X,U), has_pedals(X,V), blue(W), mounted_on(Y,W), mounted_on(Z,U), mounted_on(V,W).
– Positive Example
● bicycle(b2) :- has_wheel(b2,w21), has_wheel(b2,w22), has_frame(b2,f2), has_fork(b2,r2), has_pedals(b2,p2), red(f2), mounted_on(w21,f2), mounted_on(w22,r2), mounted_on(p2,f2).
– Rule
● bicycle(X) :- has_wheel(X,Y), has_wheel(X,Z), has_frame(X,W), has_fork(X,U), mounted_on(Y,W), mounted_on(Z,U).
InTheLEx
● Induction
– Negative example
● ¬bicycle(m3) :- has_wheel(m3,w31), has_wheel(m3,w32), has_frame(m3,f3), has_fork(m3,r3), black(f3), mounted_on(w31,f3), mounted_on(w32,r3), has_engine(m3), mounted_on(m3,f3).
– Rule
● bicycle(X) :- has_wheel(X,Y), has_wheel(X,Z), has_frame(X,W), has_fork(X,U), has_pedals(X,V), mounted_on(Y,W), mounted_on(Z,U), mounted_on(V,W).
– Negative example
● ¬bicycle(m4) :- has_wheel(m4,w41), has_wheel(m4,w42), has_frame(m4,f4), has_fork(m4,r4), has_pedals(m4,p4), white(f4), mounted_on(w41,f4), mounted_on(w42,r4), mounted_on(p4,f4), has_engine(m4), mounted_on(e4,f4).
– Rule
● bicycle(X) :- has_wheel(X,Y), has_wheel(X,Z), has_frame(X,W), has_fork(X,U), has_pedals(X,V), mounted_on(Y,W), mounted_on(Z,U), mounted_on(V,W), ¬has_engine(X).
InTheLEx
● Deduction
– Example
● bicycle(b) :- has_saddle(b,s), has_pedals(b,p), has_frame(b,f), has_fork(b,k), part_of(b,w1), circular(w1), has_rim(w1), has_tire(w1), part_of(b,w2), circular(w2), has_rim(w2), has_tire(w2).
– Background Knowledge
● wheel(X) :- circular(X), has_rim(X), has_tire(X).
– “Saturated” example
● bicycle(b) :- has_saddle(b,s), has_pedals(b,p), has_frame(b,f), has_fork(b,k), part_of(b,w1), circular(w1), has_rim(w1), has_tire(w1), part_of(b,w2), circular(w2), has_rim(w2), has_tire(w2), wheel(w1), wheel(w2).
InTheLEx
● Abduction
– Rule
● bicycle(X) :- has_saddle(X,T), has_wheel(X,Y), circular(Y), has_wheel(X,Z), circular(Z), has_frame(X,W), has_fork(X,U), has_pedals(X,V).
– Example
● bicycle(b) :- has_saddle(b,s), has_frame(b,f), has_fork(b,k), part_of(b,w1), wheel(w1), part_of(b,w2), wheel(w2), circular(w2).
– Abductive Theory
● :- circular(X), square(X).
● :- has_saddle(X,Y), has_pedals(X,W), has_engine(X).
– Abductive Hypotheses
● has_pedals(b,s), circular(w1), ¬has_engine(b).
InTheLEx
● Abstraction
– Example
● bicycle(b) :- has_saddle(b,s), has_pedals(b,p), has_frame(b,f), part_of(b,w1), circular(w1), size(w1,26), has_rim(w1), has_tire(w1), part_of(b,w2), circular(w2), size(w2,26), has_rim(w2), has_tire(w2).
– Abstraction Theory
● wheel(X) :- circular(X), has_rim(X), has_tire(X).
● small(X) :- size(X,Y), Y < 20.
● medium(X) :- size(X,Y), Y >= 20, Y =< 26.
● large(X) :- size(X,Y), Y > 26.
– “Abstract” example
● bicycle(b) :- has_saddle(b,s), has_pedals(b,p), has_frame(b,f), part_of(b,w1), part_of(b,w2), wheel(w1), medium(w1), wheel(w2), medium(w2).
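The size bands of the abstraction theory map numeric sizes to qualitative symbols; in procedural terms (a sketch of the three rules, not InTheLEx's implementation):

```python
def size_class(n):
    """Mirrors the abstraction theory: small if size < 20,
    medium if 20 <= size <= 26, large if size > 26."""
    if n < 20:
        return "small"
    if n <= 26:
        return "medium"
    return "large"

print([size_class(v) for v in (18, 26, 29)])   # → ['small', 'medium', 'large']
```

Both wheels in the example have size 26, which is why they become medium(w1), medium(w2) in the abstract example.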
References
● Carnap R., Jeffrey R.C.
– “Studies in Inductive Logic and Probability”, Berkeley, University of California Press, 1971
● Plotkin G.D.
– “A note on inductive generalization”, Machine Intelligence (Meltzer & Michie eds.), Edinburgh University Press, 1970
– “A further note on inductive generalization”, Machine Intelligence (Meltzer & Michie eds.), Edinburgh University Press, 1971
● Mitchell, T.M.
– “Generalization as Search”, Artificial Intelligence, 1982
● Vere, S.A.
– “Induction of concepts in predicate calculus”, 4th IJCAI, Tbilisi, USSR, 281-287, 1975
Further Readings
– E.Y. Shapiro: “Algorithmic program debugging”, MIT Press (1983)
– L. De Raedt: “Interactive Theory Revision: An Inductive Logic Programming Approach”, Academic Press (1992)