
Machine Learning / 1

Techniques for Automatic Knowledge Acquisition – Concept formation: ordering the hypotheses space for search – Candidate elimination algorithm and Version Space

Laurea in INFORMATICA MAGISTRALE

Course: ARTIFICIAL INTELLIGENCE – Stefano Ferilli

These slides were prepared for teaching purposes. They contain original material owned by the Università degli Studi di Bari and/or figures owned by other authors, companies and organizations, whose references are reported. All or part of the material may be photocopied for personal or teaching use, but it may not be distributed for commercial use. Any other use requires specific authorization from the Università degli Studi di Bari and the other authors involved.

Knowledge Acquisition

Case 1

Expert interacts with knowledge engineer

[Diagram: the Expert interacts with a Knowledge Engineer, who builds the KB System (Inference Engine + Knowledge Base)]

Knowledge Acquisition

Case 2

Expert interacts directly with system

– Use of an editing program with dialogue capabilities and knowledge about the structure of the knowledge base

[Diagram: the Expert interacts with an “intelligent” editing program, which updates the KB System (Inference Engine + Knowledge Base)]

Knowledge Acquisition Bottleneck

Problem in Decision Support Systems

Experts not willing to share their knowledge

Experts unable to formalize their knowledge

Experts unaware of details of their own knowledge

Solution: Machine Learning (ML)

Learning

– Knowledge Acquisition

Machine

– Carried out by computers

Knowledge Acquisition

Case 3

Expert replaced by a program that induces knowledge from large quantities of data

– Expert builds knowledge by analyzing world facts and typical “cases” in the literature

[Diagram: Data feed an Induction program, which builds the KB System (Inference Engine + Knowledge Base)]

Knowledge Acquisition

Case 3

Expert

– Builds knowledge by analyzing world facts and typical “cases” in the literature

Replaced by a program that induces knowledge from large quantities of data

– Developed in Machine Learning research areas

– May find, using statistical routines and conceptual analysis:

relationships among data

regularities in the data

rules

Classification

Prediction

...


Knowledge Acquisition

Case 4

Knowledge acquisition directly from textbooks

– Use of programs for understanding texts in natural language

[Diagram: Textbooks feed a Text understanding program, which builds the KB System (Inference Engine + Knowledge Base)]

Machine Learning

Motivations

Knowledge Acquisition Bottleneck

– More convenient, cheaper (and sometimes more reliable) to acquire knowledge automatically than to encode it manually

Advances in the area of algorithms and theoretical and experimental models

– More powerful machines

Relevant industrial applications

Increasing availability of on-line data

– Wikipedia, Wiktionary, Flickr, Twitter, &c.

– Blogs

– Social Networks

– ...

Learning

Extraction of deep knowledge

Not superficial

from a kind of situation that has occurred many times

Experience may modify the status of an organism in such a way that the new status makes the system capable of working better in a subsequent situation

One of the features of intelligent systems

Adaptivity

Learning

Classical view

A system

– Natural or artificial

learns if, using its experience, it changes something in its behavior/operation so as to improve its own performance in

– Solving a problem

– Attaining a goal

– Carrying out a task

Learning

Reference cognitive theories

Behaviorism

– Mental processes based on reactions to external stimuli

– Building a theory of them is considered impossible

Cognitivism

– Interaction of mind and brain

– Study of psychological aspects of learning mechanisms

Constructivism

– Dynamic mental processes which create new knowledge

Explain learning mechanisms

Machine Learning

Reproduction on a computer of intelligent learning capabilities

Emulation

Simulation

“That may be exactly what’s needed for anybody who wants to go into this field [AI], namely, blind optimism with no reasonable basis for it.”

Raj Reddy (Turing Award, 1994)


Machine Learning

A program

Learns from experience E

With respect to a given set of tasks T

And with a performance measure P if

Its performance on tasks T

Measured through P

Improves with experience E

Any learning program must identify and define

The class of tasks

The performance measure to improve

The source of experience

Machine Learning

Successful applications

Word recognition

– Lee, 1989; Waibel, 1989

Autonomous drive

– Pomerleau, 1989

Classification of new astronomic structures

– Fayyad et al., 1995

World-class Backgammon

– Tesauro, 1992; 1995

...

Go game

– DeepMind, 2016

Learning Agent

Experience gained through the agent’s perceptions is used to improve the system’s ability to act in the future

Experience → new status → performance

Learning takes place

As a result of the interaction between agent and world

From the agent’s self-observation of the way in which it takes decisions

Learning Agent

Components

Learning Element

– Learns and improves behavior

Performance Element

– The agent itself knows what to do and can evaluate and improve what it does

Problem Generator

– Suggests alternative actions to explore and carry out

Critical evaluation element

– Provides feedback about how the agent is behaving

Model of Machine Learning System in AI

[Simon, 1981]

Environment provides information to the Learning element, that uses it to improve the base of explicit knowledge

Knowledge base expresses explicit knowledge in declarative form

Indeed, it is separate from the learning module

Performance element uses the knowledge to carry out at best its task

Additionally, the information acquired in the various attempts to carry out the task is used as feedback to the learning element

[Diagram: Environment → Learning element → Knowledge Base → Element for improving performance]

Environment

Typical assumption:

The system

– Is directly immersed in the environment from which, through its sensors, it receives inputs in the most appropriate form

Representation language, grain size

or

– Interacts through an expert that plays the role of the system’s trainer and chooses what and how to represent inputs

Flow

Environment → Feature Extraction → Space of Features → Learning System → Rules or classification models, theories, relationships, analogies, etc.


Knowledge Base

AI considers explicit forms of knowledge

Production rules, logic clauses, etc.

The various forms of knowledge representation must be evaluated based on

Expressiveness

Possibility of making inferences

Modifiability

Extensibility

Power, Understandability, Etc.

Update of knowledge guaranteeing conflicts are avoided

Possibility of evaluating and changing a representation by adding new terms and operators

Performance Evaluation

Parameters

Complexity

– Related to task

Classification or prediction

Single concept vs multi-concept

Conjunctive or disjunctive concepts

Polymorphism

...

– Related to integration

Possibility of adding / integrating new rules in a set of existing rules

– Related to incrementality

Possibility of adding new observations or new classes / concepts during learning

Performance Evaluation

Parameters

Feedback

– A learning system must be able to evaluate the hypotheses proposed by the learning element

E.g., in classification systems, using the percentage of correct classifications on a set of training examples and a set of test examples

In other cases the utility / goodness / significance of the produced classifications can be evaluated

Transparency

– Possibility for the learning element to access the internal actions of the performance element

Useful to

Understand/evaluate/weigh differently the rules/theories/models in the knowledge base

Compare alternate paths that might in the long term turn out to be better

Learning Programs

Tasks

Categorization

Modeling

Clustering

Classification

Prediction

Aims of Learning

In most cases the task is classification/prediction

Supervised learning problem

– Given, from the trainer, a set of training data as positive and negative examples of a given concept/class/category (belonging or not belonging)

– Find a classifier to predict whether future data belong or not to a given concept/class/category

Unsupervised learning problem

– Given, from the environment or a trainer, a set of observations (training data) and a number K of concepts/classes/categories to identify

– Find a classifier that defines how observations best group into the K concepts/classes/categories and formulates the model to predict whether future observations belong or not to a given concept/class/category
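To make the two settings concrete, here is a minimal sketch (not part of the original slides): the toy data, the labels and the scikit-learn estimators are assumptions chosen only for illustration.

```python
# Illustrative sketch: supervised vs unsupervised learning on toy data.
# Assumes scikit-learn is installed; data and labels are made up.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]   # observations
y = [0, 0, 1, 1]                                        # labels provided by a trainer

# Supervised: learn a classifier from labeled examples, then predict new data
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[0.15, 0.15], [0.85, 0.85]]))        # -> [0 1]

# Unsupervised: only the number K of classes is given; find the best grouping
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                                       # cluster assigned to each observation
print(km.predict([[0.15, 0.15]]))                       # cluster of a new observation
```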

Machine Learning

Tasks, Goals, Systems

Classification and Prediction

– Goal: improve predictive accuracy of the system

Performance = recognition rate

One-step systems

Problem solving

– Goal: increase efficiency of the system that learns to solve problems

Multi-step inference systems

Knowledge discovery

(knowledge acquisition and concept formation)

– Goal: increase performance in understanding, controlling and monitoring the environment


Machine Learning

What do machines learn?

For Classification and Prediction tasks:

– Mathematical models

Neural networks

Genetic algorithms

Bayesian learning

– Logic models

Production rules

Decision trees

Logic theories

For classification, prediction, etc.

Machine Learning

Classic model

Machine Learning algorithms

– Autonomously discover relationships among variables of the system

Input, output, hidden

starting from observations of the system

– Inspired by several disciplines

Inferential statistics

Mathematics

Physics

Neurosciences

Theoretical computer science

...

Machine Learning

Classic model

Task: classification

[Diagram: a System maps input variables x = (x1, ..., xN) to output variables y = (y1, ..., yM) through hidden variables h = (h1, ..., hK); a Performance evaluator provides Feedback to the system]

Trainable Pattern Classifiers

Traditional view of learning from examples

Basic model : a device with d input lines and an output line representing the class to which the pattern belongs (a signal that may assume R distinct values)

– The performance measure provides feedback on the learning function, adapting its parameters to improve its performance over time

Modeled as a black box that computes a classification function

– The trainer trains the system by providing pre-classified data (examples)

– The system can infer the model (mathematical function) and make predictions on new data

Trainable Pattern Classifiers

Some examples

Task | Input data | Output
Weather forecast | Weather measures | Forecasts
Recognition of handwritten characters | Optical signals | Name of the character
Medical diagnosis | Symptoms | Disease
Speech recognition | Acoustic waveforms | Name of the word
Speaker recognition | Acoustic waveforms | Name of the speaker
Image classification | Optical signals | Image category

[Diagram: a classifier device with d input lines x1, x2, ..., xd (the pattern to classify) and one output line i0 = 1 or 2 or 3 or … R (the classification outcome)]

[N.J. Nilsson, “Mathematical Foundations of Learning Machines”, (1st ed. 1965), new ed. 1990, Morgan Kaufmann]

Machine Learning of Mathematical Models from Observations

(UNsupervised)

Black box model also applicable to systems that learn in an unsupervised way

To infer the classification model

The system starts from unclassified data

The trainer must provide the number of classes and define a utility measure

The system can identify the best partitioning of the data space, determine a mathematical function for classification, and subsequently classify new observations


Machine Learning

Some methods and techniques

Neural Modelling (1955-1965) (1980-...)

– Gradual change of probabilities that neuron-type elements transmit a signal

– Based on mathematical-statistical models

Techniques from Decision Theory (1955-1965)

– Estimation and updating of parameters in a classification and prediction model, starting from a set of training examples

Conceptual learning (1962-1983)

– Use of logics and more complex forms of knowledge representation to obtain high-level descriptions of concepts

Intensive knowledge learning (1976-...)

– Use of “deep” knowledge. Exploration of different learning strategies

– Learning with integration of different strategies

Machine Learning in Artificial Intelligence

Learning aims at

Acquiring knowledge in declarative form

Developing “abilities” through education and practice

Organizing knowledge in more effective and efficient representations

Discovering new facts/theories through observation and experimentation

Lessons Learned thanks to Artificial Intelligence

[Charniak E., McDermott D. “Artificial Intelligence”, Addison Wesley, 1984]

An organism may learn if it already possesses much knowledge

Learning starts with organized knowledge that grows and becomes more and more organized

Nothing can be learned unless strong hints about what one is to learn are already available

Induction

The most powerful inference to automatically acquire knowledge

Goal: formulate general laws of the world starting from partial observations of the world itself

What do machines learn?

– Typical situation

Learning concept descriptions from examples

Learning classification functions from examples

– Or

Generating rules that make accurate predictions on future observations

Inductive Method

Given a set F of facts, and, possibly, a set of inference rules having general validity

Find an inductive hypothesis that is true on all facts F and is true in the future on all and only those facts that are similar to those in F

Example

– Facts

Socrates, Aristotle and Plato were philosophers

Socrates, Aristotle and Plato all died

– Inference rules

Philosophers are men

– Possible hypotheses

All philosophers will die

All men will die

Inductive Method

Used both in numerical/statistical methods and in symbolic/conceptual methods

– Stages

Self-organizing systems

Change to adapt to the environment (Rosenblatt, 1957; Friedberg, 1958; Fogel, Owens & Walsh, 1966)

A famous paper by Minsky & Papert (1969) highlighted their limits. Research on adaptive systems became a branch of research on linear systems theory

Knowledge-based systems

To improve their performance, they must own specific knowledge of the domain in which they operate (Winston, 1970; Buchanan & Mitchell, 1978; Lenat, 1976)

Systems for automatic knowledge acquisition

Research focuses on how to use several forms of learning to build, refine and maintain knowledge bases in expert systems

(see cognitive theories)


Deduction vs Induction

Deductive Arguments

If all premises are true, the conclusion is necessarily true

All information that is implicit in the conclusion is already included in the premises

Inductive Arguments

If all premises are true, the conclusion is probably true

The conclusion includes information that is not included, not even implicitly, in the premises

[Diagram: Facts, observations, measures, symptoms on one side and theories, hypotheses, rules, algorithms on the other; inductive inference goes from facts to theories, deductive inference from theories to facts]

Stages of Learning

Training

Initial learning process

– Use of a large (or significant) number of typical instances, whose classification is known

Training set

Testing

Validation of the learned classification functions

– Use of additional instances whose classification is known

Test set

Tuning

Further adaptation (in incremental approaches)

– Use of further instances whose classification is known, which become available after a model has been learned

Partitioning of Examples

Examples must be suitably split in different sets

Training set

– on which learning takes place

Test set

– containing examples not included in the training set, on which to check what was learned

Validation set

– Similar role to the test set, but aimed at searching for the best parameters for the learning algorithm

– Used when learning depends on empirical parameters

Tuning set

– Used in incremental approaches
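As an illustration of how the examples might be partitioned in practice, the following sketch is a plain Python split into training, validation and test sets; the 60/20/20 proportions and the function name are assumptions, not prescriptions from the slides.

```python
# Illustrative sketch: splitting labeled examples into training, validation
# and test sets (60/20/20); proportions and names are assumptions.
import random

def split_examples(examples, seed=42, train=0.6, valid=0.2):
    data = list(examples)
    random.Random(seed).shuffle(data)                   # shuffle a copy, reproducibly
    n_train = int(len(data) * train)
    n_valid = int(len(data) * valid)
    training_set = data[:n_train]                       # used to learn the model
    validation_set = data[n_train:n_train + n_valid]    # used to tune parameters
    test_set = data[n_train + n_valid:]                 # used to check what was learned
    return training_set, validation_set, test_set

# In an incremental approach, examples that arrive after learning would instead
# feed a tuning set used to further adapt the model.
```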

Example-driven

Inductive Concept Learning

Model

– Given

A language for describing examples

A language for describing concepts

Intensional class descriptions

A set of positive and negative examples of the concept to learn

A function that checks if an example belongs to a concept (matching or coverage)

Background knowledge (including a preference criterion for hypotheses, known constraints among variables, and generalization and specialization rules)

– Determine

A hypothesis in the hypotheses space that is consistent and complete with respect to the examples and fulfils the constraints set by the background knowledge

Matching

Inductive inference requires the use of a matching procedure that may be based on a covering relation between a generalized description φ and an instance f, of the kind

φ covers f iff φ(f) = true

Computationally feasible problem at the propositional level

May be an NP-complete problem in first-order logic

Terminology

A hypothesis h covers a positive instance x if it correctly classifies that example, i.e.

h(x) = c(x) = True

An example satisfies a hypothesis if it is compliant with its conditions

Independently of its being labeled as a positive or negative example for the target concept

A hypothesis h is consistent with

an example <x, c(x)> if h(x) = c(x)

the training set D if it is consistent with each example x ∈ D
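A tiny sketch of this terminology in Python (the representation of instances as dictionaries and the concept used are illustrative assumptions):

```python
# Illustrative sketch of the terminology above; hypotheses and concepts are
# modeled as boolean functions over instances (dictionaries), an assumption.

def satisfies(x, h):
    """x satisfies h if the instance meets h's conditions, whatever its label."""
    return h(x)

def consistent(h, D, c):
    """h is consistent with the training set D if h(x) = c(x) for every x in D."""
    return all(h(x) == c(x) for x in D)

c = lambda x: x["color"] == "red"                               # target concept
h = lambda x: x["color"] == "red" and x["shape"] == "square"    # candidate hypothesis
D = [{"color": "red", "shape": "square"}, {"color": "blue", "shape": "square"}]
print(consistent(h, D, c))   # True: h agrees with c on both examples
```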


Supervised Learning Requirements/Properties

Correctness

Completeness

– All positive examples for a class are covered

I.e., they MUST satisfy the class description

Consistency

– No negative example for a class is covered

I.e., no negative example satisfies the class description

Supervised Learning

Need for a trade-off between simplicity and predictiveness

Disjunctive concepts

– Polymorphism or overfitting?

[Figure: regions of positive (+) and negative (–) examples, illustrating a disjunctive concept]

Learning Process Fundamental Elements

How to represent

Instances-facts (input)

Hypotheses-concepts (output)

Choose a hypothesis

Define a criterion to order them

Define good search procedures in the knowledge/hypotheses space

Define effective and efficient matching procedures

Representation

The role of the language used to describe facts and hypotheses is relevant

Questions to devise a suitable description

Is a set of object attributes (feature vectors, attribute-value pairs) sufficient for the learning task?

Are there relationships among objects?

How many and what attributes and/or relationships is it necessary and sufficient to include in the representation?

Can the inductive assertion (concept) be expressed as the value of an attribute, as a predicate or as the argument of a predicate?

Representation

Input

0-order logic languages

– Feature vectors

(v1, v2, ..., vn), e.g. [shape, color, size, area] -> [triangle, red, medium, 0.75]

– (Attribute-Value) pairs

(shape = triangle) ∧ (size = medium) ∧ (color = red) ∧ (area = 0.75)

Some problems require more sophisticated formalisms

– Not just attributes, but also relationships between objects

– It is convenient to use more powerful notations that allow describing the objects with their properties and expressing relationships between objects

Sets of relational literals

Representation

Output

To represent the acquired knowledge, many “symbolic” learning systems, often called “conceptual”, express their output as conjunctive concepts

– These systems usually make categorizations and, consequently, predictions

Concept =

Intensional definition of a class (category)

– Represented extensionally by all the instances (observations) in that class


Description Languages

Unstructured domains

Input

– Instance vector or <attribute,value> pairs

Example

– e1 = <color,red>, <shape,square>, <size,small>, <side_length,2>, <texture,diagonal>

– NO NEED TO EXPRESS RELATIONSHIPS BETWEEN OBJECTS AND CONCEPTS

Description Languages

Unstructured domains

Output

– Hypotheses/concepts

Propositional calculus is sufficient to represent descriptions that are combinations of values of the attributes

– Example

The concept underlying many red geometrical shapes can be expressed as

φ = [color = red] ∧ [shape = square ∨ triangle]

In conceptual learning, some representations for the outputs can be reduced to these ones

– Decision trees

– Production rules

Description Languages

Structured domains

Input

– Instance: parts, attributes, relationships

– Example

e1 = square(a) ∧ triangle(b) ∧ large(a) ∧ small(b) ∧ white(a) ∧ red(b) ∧ ontable(a) ∧ on(b,a)

– NEED TO EXPRESS RELATIONSHIPS AMONG OBJECTS

[Figure: a small red triangle b on top of a large white square a lying on the table]

Description Languages

Structured domains

Output

– Hypotheses

1st order calculus needed to express descriptions that are combinations of attribute values and relationships among parts

– Example

φ = red(Y) ∧ triangle(Y) ∧ square(X) ∧ on(Y,X)

A first example of logic language used in learning is VS [Michalski, 1980]

– A first-order language with multi-valued logics, in variable number

Number of Hypotheses Consistent with a Given Number of Examples

Consider

A universe including M objects

A training set of m objects

k classes

– Concepts identify classes

The number of concepts consistent with the training set is k^(M–m)

Examples

– Chess end-game

1.8 million positions

2 classes (win, lose) in 2 moves

2^900,000 ≈ 10^270,000 possible hypotheses

– Using reduced languages

39 binary attributes per task

551 distinct descriptions

2 classes

Again, 2^280 ≈ 10^84 consistent hypotheses...
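A quick order-of-magnitude check of the figures quoted above (an added illustration, done with base-10 logarithms):

```python
# Quick order-of-magnitude check of the figures above (illustrative).
import math
print(900_000 * math.log10(2))   # ~270927  =>  2**900000 is about 10**270000
print(280 * math.log10(2))       # ~84.3    =>  2**280    is about 10**84
```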

Inductive Learning as Search

Learning = goal-driven process to improve the knowledge of the learning system, by exploring the experience and a-priori knowledge of the system

– May be cast as a search in the space of knowledge and inductive hypotheses

– May exploit any kind of inference

– Must take into account the system’s background knowledge and terminates after several “learning loops” if the new knowledge satisfies the learning goal

Learning = Inference + Storage

[Diagram: Input, Goals and Background Knowledge feed an Inferential mechanism; the New knowledge produced is stored in Memory]


Inductive Learning as Search

Learning can be cast as a problem of Heuristic search in the knowledge space

Search space = Hypotheses space

Operators = Generalization rules + abduction + abstraction + ...

Final state = A hypothesis that is consistent and complete with respect to the available observations

Inductive Learning as Search

Ordering hypotheses

“More general than...” criterion

– Extensional case

Given two hypotheses φ and ψ

ψ is more general than φ: φ ≤ ψ (φ implies ψ)

Example

<color = red> is more general than

<color = red> ∧ <shape = square>

– Intensional case

Example

G1 = { (large, red, circle), (large, ?, ?) }

G2 = { (?, ?, circle), (large, ?, ?) }

G3 = { (?, ?, circle), (large, blue, ?) }

[Diagram: the hypotheses ordered from general to specific: G2 is more general than both G1 and G3]

Ordering Hypotheses

Intensional case with first-order languages

Implication

– j<   j  

–  more general than j if and only if j implies 

q-subsumption (Plotkin)

– j = red(a)  small(a)  on(b,a)

–  = red(X)  on(Y,X)

– q = {a/X, b/Y}

– q  j

Generalized subsumption (Buntine)

q

OI

-subsumption
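As an illustration of θ-subsumption, the following sketch brute-forces a substitution θ such that ψθ ⊆ φ for function-free clauses; the tuple encoding of literals and the helper names are assumptions made for this example.

```python
# Illustrative brute-force theta-subsumption check for function-free clauses.
# Literals are tuples (predicate, arg1, ...); arguments starting with an
# uppercase letter are variables, the others constants (an encoding assumption).
from itertools import product

def is_var(term):
    return term[0].isupper()

def substitute(theta, literal):
    return (literal[0],) + tuple(theta.get(a, a) for a in literal[1:])

def theta_subsumes(psi, phi):
    """Return a substitution theta with psi·theta ⊆ phi, or None if none exists."""
    variables = sorted({a for lit in psi for a in lit[1:] if is_var(a)})
    constants = sorted({a for lit in phi for a in lit[1:]})
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(substitute(theta, lit) in phi for lit in psi):
            return theta
    return None

phi = {("red", "a"), ("small", "a"), ("on", "b", "a")}
psi = {("red", "X"), ("on", "Y", "X")}
print(theta_subsumes(psi, phi))   # {'X': 'a', 'Y': 'b'}, i.e. theta = {a/X, b/Y}
```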

Inductive Learning as Search

Search for the “best” hypothesis in the space of hypotheses

Several exploration strategies

– Bottom-up or observation-driven

Initially select one or more examples, formulate a hypothesis covering them and then generalize (or sometimes specialize) it in order to explain the remaining examples

– Top-down or hypotheses-driven

Initially formulate a very general hypothesis and then specialize (or sometimes generalize) it in order to adapt it as much as possible to the observed data

– Bi-directional

Inductive Learning as Search

...but what is the “best” hypothesis?

Since induction is a search in the space of descriptions, the goal of the search must be specified

In addition to completeness and consistency, descriptions of a class of objects in the context of a fixed set of other classes of objects are required to be

– Discriminative

A discriminative description defines all properties of the objects in that class necessary to distinguish them from objects in other classes

– Characterizing

A characterizing description defines facts and properties that are true for all objects in a class

Inductive Learning as Search

Discrimination

Characterization

MSC (Maximally Specific Characterization)

– The most detailed description that is true for all known objects belonging to the (conjunctive) class

[Figure: regions of positive (+) and negative (–) examples, with the MSC drawn as the tightest boundary around the positive examples]


Learning Logic Formulas

Class description =

Logic formulas + class they belong to

<F_i, class>

– (F_i often ground)

– Predicates in F_i = attributes of classes

– F_i consistent with examples

True on examples

False on counterexamples

Extension of F_i: set of instances for which F_i is true

Learning Logic Formulas

Example

Class “wait at restaurant”

∀X wait(X) ↔
noClients(X,some) ∨
(noClients(X,many) ∧ ¬hungry(X) ∧ type(X,fastfood)) ∨
(noClients(X,many) ∧ ¬hungry(X) ∧ type(X,thai) ∧ fri-sat(X)) ∨
(noClients(X,many) ∧ ¬hungry(X) ∧ type(X,burger))

Instance

alternative(x1) ∧ ¬bar(x1) ∧ ¬fri-sat(x1) ∧ hungry(x1) ∧ … CLASSIFICATION: wait(x1)

Learning Logic Formulas

Generality ordering

Given hypotheses H and H’

with extensions e(H) and e(H’)

– H is more specific than H’ (H’ is more general than H)

H is a specialization of H’ (H’ is a generalization of H) if and only if e(H) ⊆ e(H’)

Example

– cube(X) ∧ red(X) is more specific than cube(X)

– cube(X) is more specific than cube(X) ∨ pyramid(X)

Learning Logic Formulas

Generalization and Specialization to modify formulas

Generalization operators

– Dropping condition

From cube(X) ∧ red(X) to cube(X)

– Replacing with more general predicates

From cube(X) to geometricShape(X)

If cube(X) is_a geometricShape(X)

Specialization operators (dual)

– Adding conditions

– Replacing with more specific predicates
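A minimal sketch of the two generalization operators on this representation (the literal encoding, the tiny is_a hierarchy and the function names are illustrative assumptions):

```python
# Illustrative sketch of the two generalization operators; the literal encoding
# and the small is_a hierarchy below are assumptions made only for this example.
IS_A = {"cube": "geometricShape", "pyramid": "geometricShape"}   # child -> parent

def drop_condition(hypothesis, literal):
    """Dropping condition: e.g. cube(X) ∧ red(X)  ->  cube(X)."""
    return [lit for lit in hypothesis if lit != literal]

def climb_hierarchy(hypothesis):
    """Replace predicates with more general ones, e.g. cube(X) -> geometricShape(X)."""
    return [(IS_A.get(pred, pred), *args) for (pred, *args) in hypothesis]

h = [("cube", "X"), ("red", "X")]
print(drop_condition(h, ("red", "X")))   # [('cube', 'X')]
print(climb_hierarchy(h))                # [('geometricShape', 'X'), ('red', 'X')]
```

The specialization operators are the duals of these (adding a conjunct, or moving down the is_a hierarchy).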

Inductive Learning as Search

Generalization

Positive examples

– Ex+1: “a white triangle”

(shape = triangle) ∧ (color = white)

– Ex+2: “a white square”

(shape = square) ∧ (color = white)

To produce a hypothesis one may generalize

– “object X is white”

H : (color = white)

or, if the system knows that triangle and square are both specializations of polygon, another hypothesis can be formulated

H : (color = white) ∧ (shape = polygon)

Inductive Learning as Search

Specialization

If negative examples (or counterexamples) are available, generalization on positive examples may yield too general concept descriptions, that cover negative examples (over-generalization)

– In such a case, one may specialize

Negative example

H : (color = white) ∧ (shape = polygon)

– Ex–1: “a white pentagon”

(shape = pentagon) ∧ (color = white)

– A more specific hypothesis must be produced, e.g.

(shape = polygon) ∧ (color = white) ∧ (no_sides(X) < 5)


Inductive Learning as Search

How to reduce the number of consistent hypotheses?

Bias

– Any reason to prefer one among different hypotheses that are consistent with the training set

[Mitchell, Utgoff, Haussler, Rendell]

– Examples

Considering only hypotheses that can be expressed in a given language

Preferring hypotheses that have a shorter, simpler description

Occam’s razor

Preferring the first consistent hypothesis found

Assuming that the relative frequency of objects of class C in the training set is approximately the same as that of the objects of class C in the universe

Concept Learning

Inferring a boolean function from examples

Given < P, N, C, L >

– P set of positive examples (instances) of the target concept

– N set of negative examples of the target concept

– C set of concepts (depending on the conceptual bias)

– L available language (depending on the logic-linguistic bias)

Define a relationship r

r acceptable iff definable in terms of the concepts in C in the language L

r characteristic iff satisfied by all positive examples

r discriminant iff not satisfied by any negative example

r admissible iff characteristic and discriminant

Inductive bias: the kind of assumptions needed about the nature of the target function (e.g., Occam’s Razor)

Concept Learning

Positive and negative examples (instances) of the target concept “enjoy sport”

Task:

Learn to predict the value of “enjoy sport” for an arbitrary day

Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
Sunny | Warm | Normal | Strong | Warm | Same | Yes
Sunny | Warm | High | Strong | Warm | Same | Yes
Rainy | Cold | High | Strong | Warm | Change | No
Sunny | Warm | High | Strong | Cool | Change | Yes

Concept Learning

Representing the hypotheses

Many possible representations

A hypothesis h is a conjunction of constraints on the attributes of instances

– Each constraint may be:

A specific value (“Water=Warm”)

Don’t care (“Water=?”)

No allowed value (“Water=Ø”)

Example: h = <?, Cold, High, ?, ?, ?>   (Sky, Temp, Humid, Wind, Water, Forecast)

Formalization

Given

Instances X:

– Possible days described by the attributes

Sky, Temp, Humidity, Wind, Water, Forecast

Training examples D: <x_i, c(x_i)>

Target concept c:

– EnjoySport : X → {0,1}

c(x)=1 if EnjoySport=yes

c(x)=0 if EnjoySport=no

Find

Hypothesis H:

– Expressed as a conjunction of constraints on attributes

Instances X

x1 = <Sunny, Warm, High, Strong, Cool, Same>

x2 = <Sunny, Warm, High, Light, Warm, Same>

Hypotheses H

h1 = <Sunny, ?, ?, Strong, ?, ?>

h2 = <Sunny, ?, ?, ?, ?, ?>

h3 = <Sunny, ?, ?, ?, Cool, ?>

[Diagram: general-to-specific ordering of the hypotheses: h2 (most general) is more general than both h1 and h3]
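The “more general than or equal” test on such attribute hypotheses can be sketched as follows (a simplification that ignores the empty constraint Ø; the function name is an assumption):

```python
# Illustrative test of the general-to-specific ordering on attribute hypotheses
# ('?' = don't care); it ignores the empty constraint Ø for simplicity.
def more_general_or_equal(h_gen, h_spec):
    """True if every instance covered by h_spec is also covered by h_gen."""
    return all(g == "?" or g == s for g, s in zip(h_gen, h_spec))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cool", "?")
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h2, h3))   # True: h2 is more general than h3
print(more_general_or_equal(h1, h3))   # False: h1 and h3 are not comparable
```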


One may take advantage of the general-to-specific ordering:

Starting with the most specific hypothesis possible and generalizing it each time it fails to cover an observed training instance

– Bottom-up search

Finding a maximally specific hypothesis

Trace on the EnjoySport training data (instances X and hypotheses H, from specific to general):

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same> (+)  →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same> (+)  →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change> (–)  →  h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change> (+)  →  h4 = <Sunny, Warm, ?, Strong, ?, ?>


Let h ← <Ø, Ø, Ø, Ø, Ø, Ø> be the hypothesis

– Consider the first example in D: <Sunny, Warm, Normal, Strong, Warm, Same>, +

– The maximally specific hypothesis is h ← <Sunny, Warm, Normal, Strong, Warm, Same>

– Consider the second example in D: <Sunny, Warm, High, Strong, Warm, Same>, +

– The maximally specific hypothesis is h ← <Sunny, Warm, ?, Strong, Warm, Same>

– The third example is ignored because it is negative: <Rainy, Cold, High, Strong, Warm, Change>, -

– Consider the fourth example in D: <Sunny, Warm, High, Strong, Cool, Change>, +

– The maximally specific hypothesis is h ← <Sunny, Warm, ?, Strong, ?, ?>

Find-S Algorithm

Initialize h as the most specific hypothesis in H

For each positive example x

For each attribute a_i in x

– If the constraint a_i in h is satisfied by x then

Do nothing

– else

Replace a_i in h with the next more general constraint satisfied by x

Return hypothesis h as the result
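A compact, runnable sketch of Find-S on this representation (the encoding of Ø as '0' and of "don't care" as '?' is an implementation choice, not part of the slides), reproducing the EnjoySport trace above:

```python
# Runnable sketch of Find-S ('0' encodes the empty constraint Ø, '?' don't care).
def find_s(examples):
    h = ["0"] * len(examples[0][0])        # most specific hypothesis
    for x, label in examples:
        if label != "Yes":                 # negative examples are ignored
            continue
        for i, a in enumerate(x):
            if h[i] == "0":                # first positive example: copy its values
                h[i] = a
            elif h[i] != a:                # constraint violated: generalize to '?'
                h[i] = "?"
    return h

D = [  # EnjoySport training data from the slides
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(D))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```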

Negative Examples

Why does no revision occur on negative examples?

Basic assumptions

– The concept to be learned (target) c is in H

– There are no errors in the training data

h is the most specific hypothesis in H

– Thus, c ≥_g h (c is at least as general as h)

But

c will never be satisfied by a negative example

– so, neither will h

Negative Examples

Use

No valid reason to prefer the most specific consistent hypothesis

– How to find the set of all hypotheses consistent with the training set? (Version Space)

– For the sake of simplicity, assume the Version Space has a single most specific hypothesis

I.e., the one returned by Find-S

– A hypothesis in the Version Space is more general than or equal to the one returned by Find-S; moreover, it must be consistent with all negative examples

Candidate-Elimination Algorithm

Intuition

Start with an initial candidate Version Space equivalent to the whole Hypotheses Space

– No example presented yet

Then

Use positive examples to remove too specific hypotheses,

and negative examples to remove too general hypotheses

Candidate-Elimination Algorithm

Example

[r,s] to denote cards

– r value: 1..13

1..10: numbers

J, Q, K: figures

– s suit

Red

Hearts H

Diamonds D

Black

Flowers F

Clubs C


Example

Cards described by unary relationships

– num, figure, red, black

G

– ∀n ∀z : n ≤ 10 → num([n,z])

– ∀n ∀z : n > 10 → figure([n,z])

– ∀n ∀z : (z=C ∨ z=F) → black([n,z])

– ∀n ∀z : (z=D ∨ z=H) → red([n,z])

D : “winning” and “non-winning” cards

– win([4,F])

– win([7,F])

– win([2,C])

– non-win([5,H])

– non-win([J,D])

Simple inductive conclusion:

“All black, non-figure cards win”

∀x num(x) ∧ black(x) → win(x)
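A quick check (added as an illustration, with ad-hoc helper predicates) that the induced rule separates the positive and negative cards listed above:

```python
# Illustrative check that the induced rule separates the cards listed above;
# num() and black() are ad-hoc helper predicates, not the slides' definitions.
def num(card):   return card[0] not in ("J", "Q", "K")   # non-figure card
def black(card): return card[1] in ("C", "F")            # black suit

win_hypothesis = lambda card: num(card) and black(card)

positives = [(4, "F"), (7, "F"), (2, "C")]
negatives = [(5, "H"), ("J", "D")]
print(all(win_hypothesis(c) for c in positives))       # True: all positives covered
print(not any(win_hypothesis(c) for c in negatives))   # True: no negative covered
```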

Example

Start building a graph to represent the space of hypotheses that are consistent with the examples

– For the first positive instance (4,F) the graph contains the hypotheses (-,-) (most general), (Number,-), (-,Black), (-,F), (Number,Black), (Number,F), (4,-), (4,Black), (4,F) (most specific)

– Additional examples help to reduce the space

– Next positive example (7,F); the updated graph contains (-,-), (Number,-), (-,Black), (-,F), (Number,Black), (Number,F)

– Next negative instance (5,H); the graph becomes (Number,F), (Number,Black), (-,F), (-,Black)

– Next positive example (2,C) reduces the graph to (-,Black), (Number,Black)

– Finally, (J,C) as a negative example yields the concept node (Number,Black)

Version Space

Version space for a concept formation problem

The set of all admissible (characteristic and discriminant) relationships for the problem

– Example

Let (4,F) be the only positive instance

There do not exist negative instances

It is possible to build a graph of the versions

Directed Acyclic Graph

Nodes = elements in the version space

Arc from p to q IFF

p is less general than q

– There is no node r more general than p and less general than q

Version Space

A version space is well-structured if and only if each chain of relationships has a minimum and a maximum element

A relationship is maximally specific and minimum element in the version space if and only if no other less general relationship exists in the space

Similarly, a maximum element denotes a maximally general relationship

Given a well-structured space V

Specific Boundary Set S: the set of minimum elements

General Boundary Set G: the set of maximum elements


Version Space

A representation in which:

Each node is associated to a model

– Write operations are available to connect a node to a model

– Read operations are available to produce a model of a node

Links between nodes denote specialization and generalization relationships between the corresponding models

There is a specialization tree and a generalization tree

– A node in the generalization tree is connected to a model which covers everything

– A node in the specialization tree is connected to a model which covers a single case

Candidate Elimination Algorithm

[Mitchell, 1982]

During the search, it considers only a subset of the space of hypotheses

– The one consisting of all hypotheses consistent and complete with the observed examples (Version Space)

Only the top and bottom “borders” of such a space are stored

– S = set of most_specific consistent and complete hypotheses

– G = set of most_general consistent and complete hypotheses

A generalization H is in the version space (S,G) IFF

– H is more_specific than (or equal to) some element in G

– H is more_general than (or equal to) some element in S

Candidate Elimination Algorithm

Boundary sets are updated when a new positive or negative instance is available

Positive example x

– pg(x,S,G) = { g ∈ G | g(x) }

Update the general boundary set to pg(x,S,G) by removing the elements that do not cover the new example:

– ps(x,S,G) = { r | pup(x,S,G,r) }

Positive updating (pup): update the specific boundary set by including a relationship if and only if:

It is an element of the old specific boundary set or a generalization thereof

It is a specialization of the new general boundary set

It covers the new instance

There is no more specific relationship with the above properties

Candidate Elimination Algorithm

– Analogously/symmetrically,

Negative example x

– ns(x,S,G) = { s ∈ S | ¬s(x) }

Boundary sets updated by pruning the old (S,G) to remove the elements that cover the negative instance

– ng(x,S,G) = { r | nup(x,S,G,r) }

Negative updating (nup): include a relationship in the general boundary set ng(x,S,G) if and only if:

It is in the old general boundary set or a specialization thereof

It is a generalization of an element in the new specific boundary set

It does not cover the new instance

There is no more general relationship with the above properties

Candidate Elimination Algorithm

Basic theorems

Given a concept formation problem <P,N,C,L> with a well- structured version space V and Boundary Sets (S,G)

– Boundary Set Theorem

A relationship r is in V if and only if it is connected downwards to an element in S and upwards to an element in G

– Candidate Elimination Theorem

ps(x,S,G) and pg(x,S,G) are the Boundary Sets for the updated version space <P ∪ {x}, N, C, L>

ns(x,S,G) and ng(x,S,G) are the Boundary Sets for the new problem <P, N ∪ {x}, C, L>


Version Space

Candidate Elimination Algorithm

Summary

– Initialize G as the set of maximally general hypotheses and S as the set of maximally specific hypotheses

– While there are examples

Consider the next example

If negative (e–)

Keep in S the generalizations that do not cover e–

Specialize the hypotheses in G that cover e–, so as to exclude e– and keep them more general than other hypotheses in S

Remove from G the hypotheses that are more specific than others in G

If positive (e+)

Keep in G the generalizations that cover e+

Generalize the hypotheses in S that do not cover e+, so as to cover e+ and keep them more specific than other hypotheses in G

Remove from S the hypotheses that are more general than others in S
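A compact sketch of this procedure for conjunctive attribute hypotheses, run on the EnjoySport data used earlier; it keeps S as a single hypothesis and uses made-up attribute domains, so it is an illustrative simplification of the pup/nup formulation above, not the slides' implementation:

```python
# Illustrative sketch of Candidate Elimination for conjunctive attribute
# hypotheses ('?' = any value, '0' = empty constraint); S is kept as a single
# hypothesis and the attribute domains below are assumptions for the example.
def covers(h, x):
    return all(c == "?" or c == a for c, a in zip(h, x))

def more_general_or_equal(h1, h2):
    """Every instance covered by h2 is also covered by h1 ('0' covers nothing)."""
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s that covers the positive example x."""
    return tuple(a if c == "0" else (c if c == a else "?") for c, a in zip(s, x))

def specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    S = ("0",) * len(domains)                       # maximally specific boundary
    G = [("?",) * len(domains)]                     # maximally general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]      # drop hypotheses not covering x
            S = generalize(S, x)                    # generalize S to cover x
        else:
            G = [sp for g in G
                 for sp in ([g] if not covers(g, x) else specializations(g, x, domains))
                 if more_general_or_equal(sp, S)]   # stay above the specific boundary
            G = [g for g in G                       # keep only maximally general ones
                 if not any(h != g and more_general_or_equal(h, g) for h in G)]
    return S, G

domains = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]
D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
     (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
     (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True)]
S, G = candidate_elimination(D, domains)
print(S)   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```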

Candidate Elimination Algorithm

Advantages

Incremental

Bi-directional search

Storage optimization

Disadvantages

Needs a complete set of examples

Does not produce disjunctive concepts

References

Carnap R., Jeffrey R.C.

– “Studies in Inductive Logic and Probability”, Berkeley, University of California Press, 1971

Mitchell, T.M.

– “Version spaces: an approach to concept learning”, 5th IJCAI, MIT, Cambridge, MA, 305-310, 1977

– “Generalization as Search”, Artificial Intelligence, 1982

Hirsh H.

– “Incremental version space merging: a general framework for concept learning”, Kluwer, 1990

– “Generalizing version spaces”, Machine Learning, 1994

Vere, S.A.

– “Induction of concepts in predicate calculus”, 4th IJCAI, Tbilisi, USSR, 281-287, 1975

References

T.M. Mitchell

– “Machine Learning”, McGraw Hill, 1997

J.C. Carbonell, R.S. Michalski, T.M. Mitchell

– “Machine Learning: a historical and methodological analysis”, in Readings from AI Magazine vol. 1-5, R. Engelmore ed., AAAI, Menlo Park
