1
Process Mining Process Mining
Basics – Representation – Tasks – WoMan Basics – Representation – Tasks – WoMan
Laurea in INFORMATICA Laurea in INFORMATICA
MAGISTRALE MAGISTRALE
Corso di
INTELLIGENZA ARTIFICIALE Stefano Ferilli
Questi lucidi sono stati preparati per uso didattico. Essi contengono materiale originale di proprietà dell'Università degli Studi di Bari e/o figure di proprietà di altri autori, società e organizzazioni di cui e' riportato il riferimento. Tutto o parte del materiale può essere fotocopiato per uso personale o didattico ma non può essere distribuito per uso commerciale. Qualunque altro uso richiede una specifica autorizzazione da parte dell'Università degli Studi di Bari e degli altri autori coinvolti.
“Veder fare è Saper fare”
[If you look at someone doing something, you learn how to do that]
Proverb
Introduction
●
Increased complexity of most human activities due to advances in science and technology applied to everyday life
●
Loop: New possibilities generate, as a side-effect, new needs, that in turn call for more and more complex solutions
●
Result: contemporary society pervaded by procedures that are characterized by a complex interaction of different and inter-related tasks
Motivations
●
Proper definition, handling and management of such interactions crucial for reaching the given objectives
●
Need for specific techniques and tools that allow to model processes and to enact them
●
The availability of suitable process models can determine the success or failure of companies
●
Standardized processes important for correctly carrying out activities in an organization
●
Often processes are already in operation
– Need to understand and formalize them in a model that can support their analysis, replication and enforcement
Problems
●
Producing the models is complex, costly and error-prone
●
Interest in automatically learning them from examples of actual procedures
●
Need of adapting and fine-tuning existing models, when available, in dynamic environments
●
Gap that often causes models not to perfectly fit the practice
●
Formal models of activities are typically produced by supervisors and managers
●
Most information for properly setting up these models comes from the practitioners that day by day carry out and improve the procedures
Ojectives
●
Studying processes and techniques for modeling them
●
Building tools that can (at least partially) automatically learn process models
●
Companies lacking a process management can obtain an initial model from examples of current practice, to be later refined and fine-tuned by experts
●
Companies endowed with a process model can
study it, compare it to the current practice, or use it
to drive their practices
Tasks
●
Mining
●
Discovery
●
Analysis
●
Supervision/Conformance checking
●
Prediction
●
Activity
●
Process
●
Simulation
Definitions
●
●
Process Management techniques are useful in domains where a production process must be monitored in order to check its compliance with a desired behavior
●
When a Process Model is available, new process enactments can be supervised automatically
●
Process Mining approaches can solve these problems by learning process models automatically
– Declarative approaches learn models expressed as a set of constraints (instead of monolithic graphs)
Definitions
●
Business processes
●
Separate, semantically identifiable actions that modify the state of the process.
●
Process
●
A sequence of actions performed by (any kind of) agents, possibly working concurrently.
Definitions
●
Events
●
Identifiable, instantaneous actions
– including decisions upon the next activity to be performed that characterize the dynamic behavior of a process
●
Given the time granularity of interest:
– shorter activities represented as single events
– activities spanning some significant period of time represented by the interval between two or more events
●
Overlapping activities of a process represented by a sequence of events
Definitions
●
Workflow
(or Workflow schema, or Process model)
●
A (formal) specification of how a set of actions can be composed in a given environment to result in valid processes for that environment
– Allowed compositional schemes for execution
●
sequential
●
parallel
●
conditional
●
iterative
Definitions
●
Case (or Workflow instance)
●
A particular execution of actions in a specific order as specified by a given workflow.
– Many cases can be handled by the same workflow process definition
●
By abuse of terminology, usually referred to as a
‘sequence’ of actions
– one can recognize direct adjacency (‘sequential’) relationships between specific pairs of steps only
– the overall composition may (and often does) involve split/joint routes
●
Described by a trace
Definitions
●
A case is in essence a tuple
●
c = (S
c,f
c,≤
c)
– S
ca set of steps (time points)
– f
c
: S
c
⇒ T function assigning one from a set of possible tasks (activities) T to a step
– ≤
ca partial ordering on S
c●
Case trace
– List of events associated to steps (time points)
●
Can be produced by the software supporting an organization's activity
●
Log (execution logs or event streams)
– Collection of traces, interleaving their elements
Definitions
●
Important distinction
●
Task
– a generic piece of work, defined to be executed for many cases of the same type
●
Task enabling (Work item)
– a concrete piece of work, i.e., a task which is enabled for a specific case (possibly depending on a trigger)
●
Task execution (Activity)
– the actual execution of a work item by a resource
– Activities are transactions
●
Must fulfill the ACID (Atomicity Consistency Isolation Durability) properties
–
bias on how the proper task size is to be determined
Definitions
●
It is often useful to identify
●
(boolean) conditions which correspond to causal dependencies between tasks
– Preconditions
●
should hold before the task is executed – Postconditions
●
should hold after execution of the task
●
Additional external conditions (Triggers)
– must be satisfied in the current state of a case to enable the execution of a task for that specific case
●
automatic, if the task is triggered just because it is enabled, or
●
determined by some kind of user interaction, or by a message notifying an external event, or by a clock reaching a pre- determined time
Definitions
●
Resource
●
The agent (human or artifact) that can carry out a specific task
– Resources with similar characteristics are grouped into classes
●
Role
–
A resource class based on the functional requirements of its members.
●
Organizational unit
–
Classification based on the structure of the organization
●
E.g.: team, branch or department – A resource may be a member of many classes
Workflow
●
A workflow ‘lives’ in a 3-dimensional space
– Axes are:
●
Case
–
As an independent axis, all cases are handled individually, and do not directly affect each other
●
Indirect dependences could be determined by their sharing resources and data
●
Process
–
the tasks and the routing along these tasks
●
Resource
– Work item = a point in the Case-Process plane
●
Activity = a point in the space outside this plane
●
Workflow management determines which points in the space are ‘valid’, when and how
Definitions
●
Process discovery (or Process Mining)
●
The area of research aimed at inferring workflow models from specific examples of process executions
●
Goal : using event data collected from a process execution to infer a formal model of the behavior of the process
– Both positive and negative examples would be required to construct an accurate and minimal model
●
in practice, cases are collected from actual process executions
–learning must work on positive examples only
Process Mining
●
Given
●
a set of tasks T and
●
a set of cases C ⊆ T*,
●
the aim is
●
discovering a workflow model that fulfils the following requirements:
– Completeness
●
can generate (explain, cover) all event sequences in C – Irredundancy
●
generates as few event sequences of T* \ C as possible – Minimality
●
is as simple and compact as possible
Process Mining
●
About requirements
●
Accuracy = completeness and irredundancy
– Typically in contrast with minimality
●
A more compact model is usually more general, and thus tends to cover more cases
●
Hard to capture irredundancy and minimality
– E.g., minimality may be referred to the number of edges or to number of nodes
Process Mining
●
Additional desiderata
●
Ability to capture concurrent behavior
●
Ability to deal with noise
– presence of wrong process executions in the training examples
●
Incrementality
●
Further complexity introduced by the presence in the model of
●
synchronization among tasks
●
invisible tasks
●
duplicate tasks
Process Models
●
4 classes, distinguished based on the combination of
●
the task flow being strictly sequential or not
●
the possibility of having repeated tasks in the same case
– corresponding to loops or cycles in the graph) or not (where the mapping between events and tasks is injective):
●
Class 1: sequential, unique tasks
●
Class 2: sequential, repeated tasks
●
Class 3: concurrent, unique tasks
●
Class 4: concurrent, repeated tasks
Process Mining
●
A more focused research on process discovery started in the 90s [van der Aalst]
●
Main motivations
– needs of enterprises willing to formalize, verify and enforce workflows for improving the management and efficiency/effectiveness of their production, by exploiting the learned models in WorkFlow Management Systems (WFMSs for short)
●
Main areas of interest
– how to suitably represent processes and cases
– how to learn process models from specific cases collected by process management software in logfiles
Representation
●
Generally speaking, a process can be modeled as a directed graph
●
nodes are associated to states or tasks/activities
●
edges represent the potential flow of control among activities
– Start (respectively, stop) states have no incoming (respectively, outgoing) edges
– Edges can be labeled with probabilities and/or conditions
(boolean functions independent from each other) on the
state of the process, which determine whether they will
be traversed or not
Process Model Representation
●
Several models proposed in the literature for representing processes
●
borrowed from different branches of Computer Science
●
Each has its pros and cons
Representation – Early formalisms
●
Finite State Machine (FSM) model
●
a nondeterministic, transition-labeled state machine
– nodes are associated to states
– edges of the automaton (transitions between states) represent the activities (input tokens)
●
However, while in a process graph there is only one node for each different activity, the same token may appear many times in an automaton
Representation – Early formalisms
●
Hidden Markov Models (HMMs)
●
A discrete, nth-order Markov model of a system is a restricted, probabilistic process representation
– Finite number of states defined for the process
– Markov property
●
At any point in time, the probability of the process being in some state depends only on the last n states
– State transition probabilities do not change over time
– Initial state of the process defined probabilistically
●
States = activity nodes
– Activities = output symbols
– Trace = set of output symbol sequences
●
Inducing a model from a trace = Inducing an HMM from a set of output symbol sequences
–
Several Machine Learning approaches available for this
Representation – Early formalisms
●
A serious drawback of the above
representations is that they are not suitable to model concurrency
●
If full-featured workflows are to be described, different solutions are needed
Petri Nets
●
Classical representation of distributed systems
●
can serve to specify processes and workflows by determining which routings of tasks represent valid cases
●
Desirable candidate for formal representation of processes in general, and of workflows in particular
●
mix an intuitive graphical representation with a formal specification
●
support for formal validation and exploitation from WFMSs
Petri Nets
●
Example
Petri Nets
●
Directed bipartite graph N = (P ∪ T, A), where
●
set of nodes partitioned into
– places (graphically denoted by circles) P
– transitions (denoted by squares) T
●
arcs (denoted by arrows) in A ⊆ P × T ∪ T × P
– Connect either a place to a transition or a transition to a place
●
Transitions represent tasks
●
Execution assumed to be atomic
●
Places and arcs express conditions and causal dependencies among tasks
Petri Nets
●
Places may contain tokens (denoted by dots)
●
Indicate that they are active and may enable the execution of subsequent tasks
– A transition is enabled (representing a work item),
●
i.e. it may take place, if and only if
each of its input places contains at least the number of tokens reported in the corresponding input arc, or at least one token if no such number is present
●
A specific configuration of tokens along the net is called a marking
Petri Nets
●
Task execution is non-deterministic
●
Given an enabled transition, one cannot say when, nor even if, it will take place
●
A transition that actually takes place (fires) represents an activity
●
The execution
– consumes the specified number of tokens from each of its input places
– produces in each of its output places the number of tokens specified on the corresponding output arc, or one token if no such number is present
Petri Nets
●
AND-split
– A transition with many output places, starts a parallel execution
●
AND-join
– A transition with many input places, reduces different parallel executions to a single one
●
(implicit) OR-split
– A place with many outcoming arcs, determines a choice about which transition is to be executed next
●
OR-join
– A place with many incoming arcs, represents a state that can be reached in different alternative ways
Petri Nets
●
In a correct net
●
each AND-split is ‘closed’ by an AND-join
– representing a parallel scheme
●
each OR-split is ‘closed’ by a corresponding OR- join
– representing a conditional scheme
●
a split of one kind closed by a join of the other kind is not permitted
Petri Nets Extensions
●
High-level Petri nets include 3 useful extensions
●
Color (to model data types)
●
Time
●
Hierarchy (to structure large models)
●
Triggers can be modeled in Petri nets by tokens in additional input places of the task
●
However, for the formal study one may safely
abstract from triggers and attributes
Petri Nets Restrictions
●
Workflow Nets
●
Purposely developed to express the control flow in a process
– Single starting status (source place)
– Single terminating status (sink place)
●
Do not allow ‘dangling’ nodes
●
Sound WF-nets
●
No deadlocks or live locks
– Termination guaranteed
●
No ‘dangling’ tokens
– All tokens in the net must be consumed upon termination
●
Any task must be on a path from source to sink
Petri/WF-nets
●
Shortcomings
Model Representation
●
Other more specialized formalisms also exist
●
The one exploited in the WFMS ADONIS
– distinguishes several types of nodes
●
Represent Begin, End, Activities, Decisions, Splits, Joins – connected by arcs
●
Each Activity node is associated to a task
●
Edges can be labelled with probabilities and/or conditions
Declarative Approach
●
Do not completely specify the process flow
●
Graph representation
●
Express it by means of first-order logic formulae that describe sets of states and transitions
●
E.g., using a restriction of Linear Temporal Logic (LTL)
●
Express constraints that must be fulfilled during process execution
●
Imposes only a (minimal) set of constraints that must be satisfied when executing the process activities
Definitions
●
A trace element is typically a record
●
(T,E,W,P,A,O)
– T = event timestamp
– E = type of the event
– W = name of the workflow the process refers to
– P = unique identifier for each process execution
– A = name of the activity
– O = progressive number of occurrence of A in P
●
E.g., some consider runs in which a partial order among activities can be specified
●
Additional useful information
– R = resource carrying out the activity
Traces
●
Pure sequences of task names often exploited
●
T1, T2, T4, T3, T5, T9, T6, T3, T5, T10, T8, T11, T12, T2, T4, T7, T3, T5, T8, T11 ,T13
– Assumption/limitation : the activities are instantaneous and no two activities can start at the same time
– Main challenge in inducing a process graph: identifying dependency relationships between activities
– No way for distinguishing concurrency
●
Concurrent activities produce a serialized stream that may have nondeterministic orderings of events
– Even authors that adopt this representation admit that it is used just for simplification purposes
●
although they say “without loss of generality”
– and that to handle time span and parallelism of tasks a more refined representation is needed, involving the specification of task start and end events
●
Indeed, [vanderAalst04] proposes to distinguish an even wider
range of types of task-associated events, as in the EMiT system
Process Mining
●
Several aspects come into play in a process and should be addressed by any complete modeling activity:
●
behavioral aspects
– The core of all work in process mining concerned the behavioral aspect only, and focused on learning the structure of the corresponding graph made up of activity nodes and edges only
●
relationships between the artifacts produced by the project
●
roles and responsibilities of the agents in the process
Approaches
●
Approach inspired to grammar inference
– focused on software development
●
The data describing the behavior of a process are viewed as sentences in a regular language, whose grammar is the formal model of the process.
●
Statistics concerning the number, frequency and regularity of event sequences are exploited to discover from the data FSM models of behavioral patterns involving sequencing, selection, and iteration
– More powerful representations, such as Petri nets, are better suited for prescribing a process, but make more complex the discovery problem
●
Wide literary production by van der Aalst and colleagues
●
Initially more focused on workflow mining
●
More recently interested in other related topics
– Analysis
●
properties, metrics, verification, evaluation, comparison – Representation
●
formalisms and languages, tools, visualization – Exploitation of workflow models
– management, enactment, application fields, simulation
Little Thumb
●
Works on plain sequences of tasks
●
Infers causal dependencies based on several statistics about the frequency and mutual ordering of task execution
●
Model built in 3 stages
– extraction of statistics on tasks,
– construction of a Dependency/Frequency graph among tasks
– transformation of the DF-graph into a WF-net by recognizing the type of splits/joins and adding suitable places consequently
Little Thumb
●
Relations
●
A>B
– if and only if there is a trace line in W in which event A is directly followed by B
●
A→B (dependency)
– if and only if A>B and not B>A
●
Used to connect events in the a-algorithm
●
A#B (non-parallel) – parallelism unlikely
– if and only if not A>B and not B>A
●
Used to detect the kinds of splits and joins
●
A||B (parallel) – potential parallelism
– if and only if both A>B and B>A
Little Thumb
●
Drawbacks
●
Tests have pointed out that it becomes less and less accurate as long as the number of parallel processes and/or nested loops increases
– Not surprising, since several parallel processes are linearized in the log, and all possible reorderings thereof should be identified by the statistical analysis, but such an analysis is based on pairs of tasks only
●
Nested loops can be hardly identified
– They imply multiple causal relationships, that are not
handled by the technique
SWF-nets and α-algorithms
●
Aim:
●
characterizing classes of learnable process models
●
providing methods for doing this
●
Proposed solution:
●
defined the class of ‘sound Structured WF-nets’
(SWF-nets)
●
α
+-algorithm
– extension of a previous α-algorithm needed to handle short loops of length 1 or 2
– provably able to mine all of them
●
Strong assumptions and limitations posed on the workflow properties
Genetic Approach
●
Based on standard Machine Learning techniques
●
proposed to tackle the limitations of the α- algorithms family
●
Can learn any Petri net without duplicate tasks and without more than one place with the same input and output tasks
●
New models formalism, called Causal Matrix
– Associates each task to the sets of its input and output ones according to the available traces
●
Computationally demanding
●
Noise handled by a post-processing step that prunes low-frequency arcs
Declarative Approach
●
Specifically concerned with declarative process specification
●
Very important when dealing with particularly complex models and domains
Declarative Approach
●
Declarative Process Model Learner (DPML)
– DecMiner plug-in of the ProM tool
●
Very high accuracy on unseen traces, also in the presence of noise, but runtimes in the order of hours
– Incremental version IPM (Incremental Process Miner)
– Probabilistic version
●
Improves accuracy and runtime
●
Issues
– Exploits both positive and negative examples
●
Not standard in process mining
●
Simplifies the problem, but unrealistic in some domains
●
Experts required to specify examples, background knowledge and language bias
– The user may/must specify a background knowledge and a language bias
Extra Features
●
Noise and transition probabilities usually handled by adding a counter of occurrences to edges
– fundamental for real-world domains and data
●
Some works envisage the possibility of mining/inducing simple boolean conditions for edges
– Use of a propositional learner to obtain decision tree classifiers
●
determine whether a given activity is to be taken or not according to the content of feature vectors describing the status of the execution when the decision is to be carried out
–
Details are typically not provided
Process Mining Approaches
●
Shortcomings
●
Optional tasks
– Require hidden transitions in Petri nets
●
Duplicate tasks
– Usually one node per task generated by learners
●
[18] : in real workflow projects up to three or four nodes can be associated with the same activity within one process model
●
[2] : using labeled Petri net would solve the problem but also complicate matters enormously
●
Short loops
●
Nested loops
●
Large parallelism
Experiments
●
1
●
●
●
2
Experiments
●
3
●
●
●
●
4: different small loops
Experiments
●
5: different small loops
●
●
●
●
6: two nested loops
Experiments
●
7: two nested loops
●
●
●
●
8: two nested loops
Experiments
●
9: many nested loops
●
●
●
●
10: loop that crosses two OR-splits
Experiments
●
11: high parallelism
Experiments
●
Experimental setting
●
1000 training cases
– a 5% of noise, and affected by a 10% of noise, respectively. Addition of noise consisted in one of the
●
following interventions:
●
1. deletion of the head of a sequence of events;
●
2. deletion of the tail of a sequence of events;
●
3. deletion of the body of a sequence of events;
●
4. swap of two randomly selected events in a sequence.
●
Experiments
●
Experimental results
●
Id : model identifier
●
size : number of tasks/transitions
●
∧ , : number of AND and OR splits/joins, respectively ∨ : number of AND and OR splits/joins, respectively
●
#l (s,r,n) : number of loops, of which
–
s = short (2 tasks), r = recursions (1 task), n = nested – X/Y/Z = 0/5/10% noise
●
y = correct model
●
n = different model
●
e = formally wrong model
●
? = did not finish
●
a
Tool
●
ProM
●
https://svn.win.tue.nl/trac/prom/wiki/ProM66
Current Research & Open Problems
●
Use of ontologies
●
Improved semantics
●
Activity recognition
●
Low-level data available
●
Activity boundaries
– Start of activities might be identified only later
●
Different approaches
– Sub-symbolic
– Ontology-based
●
Model composition
Beyond the state-of-the-art
●
Petri Nets not satisfactory
●
No native support for Multi-perspective
– Agents, Time, Data
– Context
●
Focus on the flow of activities
– Problems with process mining systems for duplicate or hidden transitions
●
No tight integration with guards
– Conditions for determining if and when model components can be activated
– May prevent undesired activities
●
Support to irredundancy
WoMan
●
Framework/System for Workflow Management
●
Mining
●
Supervision
●
Prediction
– Activity
– Process
●
Simulation
●
Modularization
●
Analysis
WoMan
●
Features
●
Full incrementality
●
Declarative (First-Order Horn Clause Logic)
– Allows to describe any kind of context
●
Relationships
– Embedded ILP learner for conditions
●
Multi-perspective
●
Strict Adherence to Experience
●
Noise Handling
●
Efficient/Effective
●
Complex processes
Sample Domain
●
Manufacturing a motorcycle
v
v
WoMan Formalism Input
●
Trace/Log entries
●
entry(T,E,W,P,A,O[,R]).
– T : timestamp
– E : event (begin/end_process/activity)
●
context_description – W : workflow name
– P : unique case identifier
– A : activity
●
Context description: conjunction of First-Order Logic atoms – O : progressive occurrence of activity A in case P
– R : resource (agent) that carries out activity A
●
Optional
Sample Case (Trace)
entry(201509280700, begin_process, motorcycle, 4, start, 1).
entry(201509280700, begin_activity, motorcycle, 4, build_ engine, 1).
entry(201509280705, begin_activity, motorcycle, 4, build_frame, 1).
entry(201509280745, end_activity, motorcycle, 4, build_frame, 1).
entry(201509280747, begin_activity, motorcycle, 4, paint_frame, 1).
entry(201509280752, end_activity, motorcycle, 4, paint_frame, 1).
entry(201509280820, end_activity, motorcycle, 4, build_engine, 1).
entry(201509280822, begin_activity, motorcycle, 4, test_engine, 1).
entry(201509280828, end_activity, motorcycle, 4, test_engine, 1).
entry(201509280830, begin_activity, motorcycle, 4, install_engine, 1).
entry(201509280831, begin_activity, motorcycle, 4, install_wheel, 1).
entry(201509280832, begin_activity, motorcycle, 4, install_wheel, 2).
entry(201509280833, end_activity, motorcycle, 4, install_wheel, 1).
entry(201509280834, end_activity, motorcycle, 4, install_wheel, 2).
entry(201509280835, begin_activity, motorcycle, 4, install_seat, 1).
entry(201509280850, end _activity, motorcycle, 4, install_engine, 1).
entry(201509280851, begin_activity, motorcycle, 4, put_fuel, 1).
entry(201509280853, end_activity, motorcycle, 4, install_seat, 1).
entry(201509280855, end_activity, motorcycle, 4, put_fuel, 1).
entry(201509280858, begin_activity, motorcycle, 4, test_engine, 2).
entry(201509280905, end_activity, motorcycle, 4, test_engine, 2).
entry(201509280908, begin_activity, motorcycle, 4, install_radio, 1).
entry(201509280915, end_activity, motorcycle, 4, install_radio, 1).
entry(201509280917, begin_activity, motorcycle, 4, install_plate, 1).
entry(201509280920, end_activity, motorcycle, 4, install_plate, 1).
entry(201509280920, end_process, motorcycle, 4, stop, 1).
Time
build_frame paint_frame
install_seat
install_wheel install_wheel
07:05
07:00
07:45 07:47 07:52
08:31 08:33
08:32 08:34 08:35 08:53
Sample Case (Schema)
●
28/09/2015: Activity timeline
build_engine
07:00 08:20
test_engine 08:22 08:28
install_engine
08:30 08:50
put_fuel 08:51 08:55
test_engine 08:58 09:05
install_radio 09:08 09:15
install_plate 09:17 09:20
Time
build_frame paint_frame
install_seat
install_wheel install_wheel
07:05
07:00
07:45 07:47 07:52
08:31 08:33
08:32 08:34 08:35 08:53
Sample Case (Schema)
●
28/09/2015: Concurrence
build_engine
07:00 08:20
test_engine 08:22 08:28
install_engine
08:30 08:50
put_fuel 08:51 08:55
test_engine 08:58 09:05
install_radio 09:08 09:15
install_plate 09:17 09:20
Time
build_frame paint_frame
install_seat
install_wheel install_wheel
07:05
07:00
07:45 07:47 07:52
08:31 08:33
08:32 08:34 08:35 08:53
Sample Case (Schema)
●
28/09/2015: Association of steps to activities
build_engine
07:00 08:20
test_engine 08:22 08:28
install_engine
08:30 08:50
put_fuel 08:51 08:55
test_engine 08:58 09:05
install_radio 09:08 09:15
install_plate 09:17 09:20 s8
s1
s2 s3
s4
s5
s7 s9
s10 s11 s12
s6
Time
build_frame paint_frame
install_seat
install_wheel install_wheel
07:05
07:00
07:45 07:47 07:52
08:31 08:33
08:32 08:34 08:35 08:53
Sample Case (Schema)
●
28/09/2015: sequence and parallelism of steps
build_engine
07:00 08:20
test_engine 08:22 08:28
install_engine
08:30 08:50
put_fuel 08:51 08:55
test_engine 08:58 09:05
install_radio 09:08 09:15
install_plate 09:17 09:20
next relationship
s8 s1
s2 s3
s4
s5
s7 s9
s10 s11 s12
s6
Time
Sample Case (Schema)
●
28/09/2015: abstract representation of steps
next relationship
s8 s1
s2 s3
s4
s5
s7 s9
s10 s11 s12
s6
Process Model Representation
●
New formalism, more powerful than Petri Nets
●
Invisible or duplicate tasks smoothly handled
●
Agents and their roles
●
Sequential information
●
Complex pre-and post-conditions
– Can express inter-relationships among the involved entities, their properties, and the context
– Tight integration with the activity flow part of the model
●
Fine-grained control of process enactment
– Take into account the history of process enactment
– Consider interactions among the involved entities and their properties, possibly at different time points
Model: Activity Flow
●
Flow of activities
●
task(t,C) :
– task t occurred in training cases C
●
transition(I,O,p,C) :
– allowed connections between activities
– p : I Þ O occurred in training cases C
●
enabled if all input tasks in I = [t'
1,...,t'
n] are active
●
if fired, after stopping the execution of all tasks in I (in any order), the execution of all output tasks in O = [t''
1,...,t''
m] is started (in any order)
–
I and O are multisets (parallel instances of the same task)
●
C is a multiset
– Allows to compute frequency of elements, useful to
handle noise
task(stop,[1,2,3,4,5]).
task(reject,[2]).
task(install_plate,[1,2,3,4,5]).
task(install_gps,[1,5]).
task(install_radio,[1,4,5]).
task(put_fuel,[1,2,3,4,5]).
task(install_seat,[1,2,3,5]).
task(install_wheel,[1,1,2,2,3,3,4,4,5,5]).
task(install_engine,[1,2,3,4,5]).
task(paint_frame,[1,1,2,2,2,3,4,4,5,5]).
task(fix_engine,[2,3,3,5]).
task(test_engine, [1,1,2,2,2,3,3,3,3,4,4,5,5,5]).
task(build_frame,[1,2,3,4,5]).
task(build_engine,[1,2,3,4,5]).
task(start,[1,2,3,4,5]).
Model: Activity Flow (Example)
p1: {start} Þ {build_engine,build_frame}, [1,2,3,4,5]
p2: {build_engine} Þ {test_engine}, [1,2,3,4,5]
p3: {test_engine} Þ {fix_engine}, [3,3,5]
p4: {fix_engine} Þ {test_engine}, [3,3,5]
p5: {build_frame} Þ {paint_frame}, [1,2,3,4,5]
p6: {test_engine,paint_frame} Þ
{install_engine,install_wheel,install_wheel}, [1,2,3,4,5]
p7: {install_wheel,install_wheel} Þ {install_seat},
[1,2,3,4,5]
p8: {install_engine} Þ {put_fuel}, [1,2,3,4,5]
p9: {install_seat,put_fuel} Þ {test_engine}, [1,2,3,4,5]
p10: {test_engine} Þ {install_radio,install_gps}, [1,5]
p11: {install_radio,install_gps} Þ {install_plate}, [1,5]
p12: {install_plate} Þ {stop}, [1,3,4,5]
p13: {test_engine} Þ {reject}, [2]
p14: {reject} Þ {stop}, [2]
p15: {test_engine} Þ {install_plate}, [3]
p16: {test_engine} Þ {install_radio}, [4]
p17: {install_radio} Þ {install_plate}, [4]
Model: Activity Flow
●
More powerful than Petri Nets
●
Invisible tasks
●
Duplicate tasks
●
Complex configurations
Model: Activity Flow Translation from/to Petri Nets
●
AND-OR split p1: {1} Þ {a,c}
p2: {1} Þ {a,d}
p3: {1} Þ {b,c}
p4: {1} Þ {b,d}
(a c) (a d) (b c) (b d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧
= (a (c d)) (b (c d)) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
= (a b) (c d) ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
●
OR-AND join p1: {1,3} Þ {a}
p2: {1,4} Þ {a}
p3: {2,3} Þ {a}
p4: {2,4} Þ {a}
(1 3) (1 4) (2 3) (2 ^ 4) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
= (1 (3 4)) (2 (3 4)) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
= (1 2) (3 4) ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
Model: Activity Flow Translation from/to Petri Nets
●
AND-OR Split: a more complex case
– p1: {1} Þ {a,b,c}
p2: {1} Þ {a,e}
p3: {1} Þ {a,f}
– (a c f ) (a d) (b f ) = ∧ ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧
– = . . . = a (b e f ) (c e f ) ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
●
Some cases are too complex
●
Require invisible or duplicate transitions
– Cannot be learned by current process mining systems
●
WoMan formalism (and associated mining and management techniques) more powerful than the curent state-of-the-art
Model: Activity Flow Beyond Petri Nets
●
Impossible translation
●
p1: {1} Þ {a, c, f}, p2: {1} Þ {a, d}, p3: {1} Þ {b, f}
– (a c f ) (a d) (b f) = . . . = ∧ ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧
= (a b) (c d b) (f d b) (a f) ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ (c d f) (f d f)
∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
●
Solution: duplicating task f into f’ and f’’
– (a c f ) (a d) (b f ) = . . . = ∧ ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =
– = (a b) (c d b) (f d b) (a f ) ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ (c d f ) (f d f ) ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =
Model: Activity Flow Beyond Petri Nets
●
Other problematic cases
●
1
– p1: I I ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∪ I Þ O O ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∪ I
– p2: I I ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∪ I Þ O O ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∪ I
●
(O O) (O O) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧
= ((O O) O ) ((O O) O) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
= (O O ) (O O ) (O O) (O O) ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . =′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ∧ ′) ⊕ (a ∧ d) ⊕ (b ∧ f′′) = . . . = ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) = ∧ ⊕ (a ∧ d) ⊕ (b ∧ c) ⊕ (b ∧ d) =
●
2
– p1: {t2,t3,t4,t5} Þ {t}
– p2: {t1,t2,t3} Þ {t}
Model: Activity Flow Translation to Petri Nets
●
Problematic cases involving invisible tasks
–
p’: { start } Þ { t
1}, [1]
–
p’’: { t
1} Þ { t
2, t
3}, [1]
–
p’’’: { t
2
, t
3
} Þ { stop }, [1]
–
p’: { t
1} Þ { t
2, t
4}, [1]
–
p’’: { t
1} Þ { t
2}, [2]
–
p’’’: { t
2} Þ { t
3}, [1, 2]
–
p’’’’: { t
3, t
4} Þ { t
5}, [1]
–
p’’’’’: { t
3} Þ { t
5}, [2]
Model: Agents
●
task_agent(t,R) :
– task t can be carried out by an agent matching one of the roles in R
●
R : simple set of roles, intensional description, reference to hierarchies, ...
●
transition_agent([R'
1,...,R'
n],[R''
1,...,R''
m],p,C,q) :
– transition p : [t'
1,...,t'
n] Þ [t''
1,...,t''
m], may occur provided that each task t'
iis carried out by an agent matching one of the roles in R'
i
, and that each task t''
j
is carried out by an agent matching one of the roles in R''
j;
●
Several combinations can be allowed, numbered by progressive q, each encountered in cases C
Model: Agents
●
Example
●
Tasks
●
task_agent(test_engine,[mechanic,driver]).
–
Activity test_engine can be carried out by a mechanic or a driver
●
Transitions
●
transition_agent([mechanic], [electrician,electrician], p10, [1], 1).
●
transition_agent([driver], [electrician,electrician], p10, [5], 2).
–
In transition p16, the input activity test_engine must be carried out by a mechanic in cases that are compliant to sample case #1, or by a driver in cases that are compliant to sample case #5.
–
In all cases, the other two activities,
install_radio and install_gps,must be both carried out by an electrician
●
transition([test_engine],[install_radio,install_gps],p10,[1,5]).
Model: Task Producer/Consumer
●
●
transition_provider(t’,[t
1, ..., t
n])
– The i-th input task for transition t’ was provided by transition t
i
●
Example:
●
transition([a,b,c,d],[h,k],21,_).
●
transition_provider(21,[3,1,4,3]).
–
In some examples, when applying transition 21,
●
input tasks a and d were the output of transition 3;
●
E.g.: transition([u,v],[a,b,d],3,_).
●
input task b was the output of transition 1;
●
E.g.: transition([s],[b,d],1,_).
●
input task c was the output of transition 4;
●
E.g.: transition([i,j,k],[c,e],4,_).
●
transition_provider(21,[3,3,6,1]).
–
In other examples, this other combination occurred
Model: Time Constraints
●
Mined on the entire training set
– task_time(t,[b’,b’’],[e’,e’’],a,d): referred to task t
– transition_time(p,[b’,b’’],[e’,e’],g,a): referred to transition p
– task_in_transition_time(t,p,[b’,b’’],[e’,e’’],a,d): referred to task t, when run in transition p
●
Constraints:
– must begin at a time ib [b’,b’’] and end at a time ie ∈ [b’,b’’] and end at a time ie ∈ ∈ [b’,b’’] and end at a time ie ∈ [e’,e’’]
●
Begin and end times relative to the start of process execution
–Computed as the difference between the begin of the
process and the event they refer to
– a, d : average and standard deviation for duration
– g: average time gap (transitions only)
●
measured between the end of the last task in I and the activation of the first task in O
Model: Step Constraints
●
Step : progressive number of activity execution
– Total ordering based on begin_of_activity event
●
Predicates expressing temporal constraints from an event sequence perspective:
– task_step(t,[b’,b’’],[e’,e’’],a,d): task t
– transition_step(p,[b’,b’’],[e’,e’’],g,a,d): transition p
– task_in_transition_step(t,p,[b’,b’’],[e’,e’’],a,d): task t, when run in transition p
●
Constraints
– must begin at a step ib [b’,b’’] and end at a step ie ∈ [b’,b’’] and end at a time ie ∈ ∈ [b’,b’’] and end at a time ie ∈ [e’,e’’]
– a, d : average and standard deviation for number of steps
– g: average step gap (transitions only)
●
measured between the end of the last task in I and the activation
of the first task in O
Model: Conditions
●
Motivations
●
State-of-the-art:
use of decision trees for task conditions
– No specific contribution to, nor strict integration with, the process model representation
– Neglects very important aspects in real-world, complex domains
●
Unable to represent and handle relationships
–Based on propositional approaches
●
Conditions on other elements of a process
–Not just tasks
Conditions
●
Expressed as rules
●
Pre-conditions
– Must be true before a task/transition is applied
●
Post-conditions
– Must be true after a task/transition is applied
●
Head:
●
Tasks
●
Transitions
●
Tasks in transition
●
Body
●
Condition referred to elements of the process, to the context and to the history of process
Conditions for Tasks (Example)
●
Pre:
test_engine(X) :- after(Y,X,[1,8]), activity(Y,build_engine), available(X,T), testing tool(T), status(X,T,S), working(S).
●
Post:
test_engine(X) :-
after(Y,X,[1,8]), activity(Y,A), build_engine(A), available(X,T), testing tool(T), status(X,T,S), working(S), test_outcome(X,O), fail(O).
test_engine(X) :-
after(Y,X,[1,8]), activity(Y,A), build_engine(A), available(X,T), testing_tool(T), status(X,T,S), working(S), test_outcome(X,O), success(O).
Conditions for Transitions (Example)
●
Pre:
p13(X) :- after(Y,X,[1,1]), activity(Y,A1), test_engine(A1), test_outcome(Y,O1), fail(O1), after(Z,Y,[3,7]), activity(Z,A2), test_engine(A2), test_outcome(Z,O2), fail(O2), after(Z,W,[1,1]), activity(W,A3), fix_engine(A3).
p15(X) :- after(Y,X,[1,1]), activity(Y,A), test_engine(A), test_outcome(Y,O), success(O), after(Z,Y,[6,10]), activity(Z,S), start(S), required_option(S,R), none(R).
p16(X) :- after(Y,X,[1,1]), activity(Y,A), test_engine(A), test outcome(Y ,O), success(O), after(Z,Y,[6,10]), activity(Z,S), start(S), required option(S,R), radio(R).
p10(X) :- after(Y,X,[1,1]), activity(Y,A), test_engine(A), test_outcome(Y,O), success(O), after(Z,Y,[6,10]), activity(Z,S), start(S), required option(S,R), full(R).
●
Post:
p1(X) :-
after(Y,X,[1,1], [3,43]), act_start(Y), context(X,C), good_weather(C), available(C,B), ball(B), status(C,B,S), inflated(S), after(X,Z,[5,5], [487,501]), activity(Y,U,V), act_bed(U), agent steve(V).
Conditions for Tasks in Transitions (Example)
●
Pre:
test_engine_p9(X) :- after(Y,X,[4,8]), activity(Y,A), build_engine(A),
available(X,T), testing_tool(T), status(X,T ,S), working(S), fuel_level(X,L), sufficient(L).
●
Post:
test_engine_p9(X) :- after(Y,X,[4,4]), activity(Y,A), build_engine(A), available(X,T), testing_tool(T), status(X,T,S), working(S), fuel_level(X,L), sufficient(L), test_outcome(X,O), fail(O), after(X,Z,[1,1]), activity(Y,fix_engine).
test_engine_p9(X) :- after(Y,X,[4,8]), activity(Y,A), build_engine(A), available(X,T), testing_tool(T), status(X,T,S), working(S), fuel_level(X,L), sufficient(L), test_outcome(X,O), success(O), after(X,Z,[1,2]), activity(Y,install plate).
Conformance Checking
●
Supervision module WEST (Workflow Enactment Supervisor and Trainer)
●
Checks compliance of new cases with a given model; Returns suitable
– warnings (deviations from model)
●
unexpected tasks or transitions,
●
conditions not fulfilled,
●
unexpected resources running a given activity,
●
deviations from the time/step intervals
●
...
each associated to a weight that quantifies a severity degree
– errors (case trace syntactically wrong)
Activity Prediction
●
Given a process model and the current (partial) status of a new process execution,
guess which will be the next activity that will take place in the execution
– Might allow the (human or automatic) supervisor to take suitable actions to ease the next activities
– Most useful in domains involving much more variability
●
Predicting the process evolutions is not just a trivial consequence of conformance checking -> more difficult
●
Problems
– How well a model can provide hints about what will happen next?
– How to assess the quality of a model?
●
(it was learned automatically exactly because the correct model is not available)
Activity Prediction
●
Prediction module SNAP (Suggest Next Activity to be Performed)
●
Returns a list of possible next activities
– Based on statistics computed on the set of the statuses
– Ranked by likelihood based on heuristics
●
Activities expected in more statuses are more likely to be carried out next
●
Activities expected in a status supported by more training cases are more likely to be carried out next
●
Activities expected in statuses that raised less warnings are more likely to be carried out next
Evaluation
▷ Performance evaluated on several datasets of different complexity
●
Ambient Intelligence (Aruba, GPItaly)
●
much more variability and subjectivity
●
no correct underlying model
●
Chess Playing
●
correct model not available
Evaluation
Aruba CASAS benchmark repository
Home activities of an elderly person
220 days Case = Day Process = daily routines Transitions = terminating some activitities and starting new ones
GPItaly GiraffPlus Project Movements of an elderly person in the various rooms 253 days
Case = Day Process = typical movements of people in the homeTasks = rooms; Transitions = leaving a room and entering another
Chess
Italian Chess Federation website
400 reports of top matches Case = Match Processes = match outcomes (white wins, black wins or draw) Task = occupation of a square by a piece Transitions = moves: each move of a player/
resource (black or white)
▷ Datasets
▷ Different complexity
●
Ambient Intelligence (Aruba, GPItaly)
●
much more variability and subjectivity
●
no correct underlying model
●
Chess Playing
●
correct model not available
Dataset/Model Statistics
Cases Events Activities Tasks Transitions
overall avg overall avg overall avg overall avg
Aruba 220 13788 62,67 6674 30,34 10 0,05 92 0,42
GPITaly 253 185844 369,47 92669 366,28 8 0,03 79 0,41 White 158 36768 232,71 18226 115,35 681 4,31 4083 25,84 Black 87 21142 243,01 10484 120,51 663 7,62 3006 34,55
Draw 155 32422 209,17 16056 103,59 658 4,25 3434 22,15
▷Extreme reliability: 97-99% of the times next activity present in the ranking and always in the top 10%
●
always first in the ranking for chess processes
▷Effective on models of very different complexity
●
Ambient Intelligence domain : worth spending effort to prepare the enviroment in order to facilitate that activity
●
Chess domain : first tool to make the machine able to play autonomously
Model Statistics
▷More cases for the Ambient Intelligence datasets
●
Aruba cases feature many short loops and some concurrency (up to 2 activities), optional and duplicated activities
●
Same for GPItaly, except for concurrency
▷Chess datasets more different tasks and transitions (many rare or even unique)
●
Very high concurrency
●
Each game starts with 32 concurrent activitites
●
Progressively decreases as long as the game proceeds
●
Short and nested loops, optional and duplicated tasks
Experimental Procedure
1)10-fold cross-validation procedure on each dataset
2)WILD (learning functionality of WoMan) to learn reference models 3)On each event in the test sets:
1)WEST checks compliance of the new event and suitably updates the set of statuses
2)SNAP uses the resulting set of statuses to predict the next activity Performance statistics
▷ Pred : ratio of cases in which SNAP returned a prediction
▷ Recall : ratio of cases in which correct activity is present in the ranking
▷ Tasks : average length of the ranking (the lower, the better)
▷ Quality : product of the previous metrics ranging between 0 and 1
●
Indication of the overall activity prediction performance
▷ Differences with respect to previous (pre-extension) outcomes
●
“+” improvent, “-” degradation, “=” equal performance
●
Folds : number of folds in previous (pre-extension) k-fold cross validation
Prediction statistics
Pred Recall Rank Tasks Quality
Aruba 0.88 0.97 0.86 6.3 0.78
GPITaly 1.0 0.99 0.98 8.2 0.97
White 0.53 0.98 1.0 11.09 0.51
Black 0.55 0.98 1.0 10.9 0.5
Draw 0.65 0.98 1.0 10.6 0.64
Chess 0.58 0.98 1.0 10.9 0.55