A method for activities of daily living recognition based on Bayesian belief networks

Academic year: 2021

Polo territoriale di Como
Master of Science in Computer Engineering

A Method for Activities of Daily Living Recognition based on Bayesian Belief Networks

Supervisor: Prof. Sara Comai

Assistant Supervisor: PhD. Fabio Veronese

Master Graduation Thesis by: Mattia Benzoni
Student Id. number: 840750


Sommario

According to data from the World Population Prospects, the world population is aging. The challenge is to improve the quality of life of elderly people, guaranteeing them greater independence while at the same time allowing their families to keep watch over them. A possible solution is provided by the development of new technologies that make it possible to build automatic monitoring systems for people. In this thesis work we present a method to perform Activities of Daily Living (ADL) recognition based on Bayesian Belief Networks, using the data provided by a Smart Home. To relate the sensors to the ADLs we defined a set of tags that represent the sensors at a high level. Some of these labels were associated with each sensor according to its semantics. We also defined five temporal tags to represent the time slots of the day (morning, noon, afternoon, evening, night). We use Bayesian networks to represent the relation between the labels and each ADL. We proposed two approaches for this method: a supervised one and an a-priori one. In the first case, the network is built using published data available online; in the second case, the network is designed by hand. First, we standardized the three datasets used, removing errors and making the recognized activities uniform. We then labeled the datasets, associating the temporal tags and the tags of the active sensors. Finally, we used the networks obtained during training to test this method, classifying the data of a new dataset generated by a simulator.

The validation of the results was performed using Precision, Recall and Accuracy. As expected, the a-priori approach classifies the data better than the supervised one, since by designing the network manually we can define a more general representation of the activities. Conversely, by building the network automatically from the data, the relations between the activities and the labels are affected by errors and peculiarities contained in the data themselves.


Abstract

According to data from the World Population Prospects, the world's inhabitants are getting older. What is needed is a way to improve the quality of independent living of elderly people and their families. A solution is offered by Information and Communication Technology (ICT), which makes it possible to build automatic health monitoring systems. In this thesis work we present a method to perform Activities of Daily Living (ADLs) recognition based on Bayesian Belief Networks, using the data provided by a Smart Home.

We have defined a set of tags to relate the sensors to the ADLs, independently from the physical implementation of the sensor network. These labels provide a high-level representation of each sensor. A few of these tags are associated with each sensor, according to its semantics. We have also defined five temporal tags to represent the time slots of the day (Morning, Noon, Afternoon, Evening, Night). Bayesian Networks can be used to represent the relation between these tags and each ADL.

We have proposed a supervised and an a-priori approach of the method: in the first, the network is trained using public datasets; in the second, it is designed manually. First of all, we standardized the three datasets used for the training, removing noise and making the recognized activities uniform. Then, we labeled the datasets with the temporal tags and the tags of the active sensors.

The validation of the method has been performed by training the networks on these datasets and by classifying a new dataset generated by a simulator. The results were evaluated using Precision, Recall and Accuracy. As expected, the a-priori approach performs better than the supervised one, since by designing the network manually we obtain a more general representation of the activities. Conversely, by creating the network from the data, the relation between activities and tags is less general, since it is influenced by noise and peculiarities of the training datasets. Concluding, we believe that the a-priori approach would work well to perform activity recognition on different people using the same network, while a supervised one would work better when the training of the network and the classification are performed on data of the same person.


Ringraziamenti

First of all, I would like to thank Prof. Sara Comai for her guidance, advice and support during this work. I thank her for giving me the opportunity to work on such an interesting project.

I sincerely thank my assistant supervisor Fabio Veronese, always ready to understand my difficulties, to solve any problem encountered and to offer me valuable advice.

Thanks to all the members of the ATG group in Como, in particular Fabio Salice: it has been a pleasure to meet you and work with you.

A thought also goes to all my study and project colleagues of these five years, with whom I have had the good fortune to exchange ideas.

I would also like to thank all my friends, in particular Francesca and Pepè, who have been a great source of motivation and encouragement.

I want to dedicate this thesis to my family: my father Elio, my mother Lorenza and my sister Valentina. Thank you for supporting my choices and for always believing in me.

A special thanks to Silvia, for always being by my side during these years, especially in the most difficult moments. It is only thanks to your help that I have been able to reach this goal.


Contents

Sommario
Abstract
Ringraziamenti
1 Introduction
  1.1 BRIDGe Project
  1.2 Thesis contribution
  1.3 Thesis organization
2 State of Art
  2.1 Smart Home
  2.2 Activity Recognition
    2.2.1 Supervised Learning
    2.2.2 Unsupervised Learning
3 Bayesian Network
  3.1 Bayesian Network Structure
    3.1.1 Reasoning With Bayesian Network
  3.2 Bayesian Networks Applications
4 BBNs for Activity Recognition
  4.1 Available Datasets
  4.2 Filtering
  4.3 Labeling
  4.4 BBN Implementation
    4.4.2 Manual Structure
  4.5 Data Clustering
  4.6 Classification
5 Experimental results
  5.1 Validation Metrics
  5.2 Results
    5.2.1 ARAS Dataset
    5.2.2 Generated Dataset
    5.2.3 Expert Network
6 Conclusion
  6.1 Future Work


List of Figures

2.1 An example of Smart Home configuration
2.2 Aras HouseA configuration
3.1 Example of Bayesian Belief Network
3.2 Cancer Example of BBN
3.3 Types of reasoning with Bayesian Belief Networks
3.4 Credit card fraud example
4.1 Kasteren house - House A
4.2 Kasteren house - House B
4.3 Kasteren house - House C
5.1 Test 1 ARAS - Confusion Matrix
5.2 Test 1 ARAS - Precision
5.3 Test 1 ARAS - Recall
5.4 Test 1 ARAS - Accuracy
5.5 Test2 ARAS - Confusion matrix
5.6 Test2 ARAS - Precision
5.7 Test2 - Recall
5.8 Test2 ARAS - Accuracy
5.9 Test3 ARAS - Confusion Matrix
5.10 Test3 ARAS - Precision
5.11 Test3 ARAS - Recall
5.12 Test4 ARAS - Confusion Matrix
5.13 Test4 ARAS - Precision
5.14 Test4 ARAS - Recall
5.16 Test1 Generated dataset - Confusion Matrix
5.17 Test1 Generated dataset - Precision
5.18 Test1 Generated dataset - Recall
5.19 Test1 Generated dataset - Accuracy
5.20 Test1 Manual - Confusion Matrix
5.21 Test1 Manual - Precision
5.22 Test1 Manual - Recall
5.23 Test1 Manual - Accuracy
5.24 Test2 Manual - Confusion Matrix
5.25 Test2 Manual - Precision
5.26 Test2 Manual - Recall


List of Tables

4.1 Example of BRIDGe dataset
4.2 Mapping between activities
4.3 Dataset labeled example
4.4 Mapping between Sensors and Labels
4.5 Mapping between temporal tags and time intervals
4.6 Example of activities instance
4.7 Example of CPT inferred from data
4.8 Example of manually defined CPT
5.1 Confusion matrix showing the true positives (TP), total of ground truth labels (TT) and total of inferred labels (TI) for each class.


ADL  Activity of Daily Living
AAL  Ambient Assisted Living
ARAS Activity Recognition with Ambient Sensing (project developed by the Department of Computer Engineering of Bogazici University in Istanbul, Turkey)
BBN  Bayesian Belief Network
CPT  Conditional Probability Table
CPD  Conditional Probability Distribution
CRF  Conditional Random Field
DT   Decision Tree
FKL  Fisher Kernel Learning
HA   Home Automation
KNN  K-Nearest Neighbor
HMM  Hidden Markov Model
ICT  Information and Communication Technology
RF   Random Forest
RNN  Recurrent Neural Network
SVM  Support Vector Machine
SE   Smart Environment
SH   Smart Home
TDNN Time Delay Neural Network
TWNN Time Windowed Neural Network


1 Introduction

Every country is experiencing growth in the number and proportion of older people in its population. Between 2015 and 2030, the number of people aged 60 or older is expected to grow by 56 percent, and to double by 2050. Likewise, the number of people aged 80 or over is growing even faster than the number of older people overall. Aging is becoming one of the most significant social transformations of the twenty-first century, with implications for nearly all sectors of society, including the demand for services such as housing, transportation and social protection, as well as family structures and intergenerational ties.

The number of people aged between 20 and 64 years per person aged 65 years or more declines as the population ages. This means that there will be more elderly people but fewer young people able to take care of them and, at the same time, assistance facilities for the elderly will not be able to cope with the demand. On the other hand, families want an economically viable alternative to nursing homes to assist their relatives. Technological advances in Information and Communication Technology (ICT) could provide efficient and accurate care and reduce its cost, making it possible to build inexpensive automatic health monitoring systems.

The elderly would also maintain their independence, allowing them to live longer in their own homes. Independent living can be supported by transforming the living place of the older adult into a smart environment, equipping the residence with appropriate technologies (sensors, actuators, etc.). This results in a higher quality of life for the elderly and delays the transition to care facilities. BRIDGe, developed by Politecnico di Milano, is an Ambient Assisted Living solution aimed at increasing the autonomy and independence of the person. A common method for assessing the cognitive and physical wellbeing of the elderly is Activities of Daily Living (ADL), i.e. activities performed daily, such as cooking, toileting and showering. Using Home Automation data collected by a Smart Environment (SE), it is possible to study ADLs in order to monitor the behavior of patients/users.

1.1 BRIDGe Project

BRIDGe (Behavior dRift compensation for autonomous InDependent livinG) [1], developed by the ATG (Assistive Technologies Group) of Politecnico di Milano, provides support services for a person living independently at home and connections to his or her social environment. This project makes it possible to develop an Ambient Assisted Living (AAL) environment that provides support to a broad variety of target people (elderly, mildly cognitively or physically impaired people), according to the user's needs. The aim of BRIDGe is to create a system between the inhabitants and a social environment aimed at reassuring both inhabitants and their families, giving the inhabitant the opportunity to live autonomously and safely, knowing that she/he is monitored during the day. BRIDGe, based on a set of home services that include a wireless sensor-actuator network and a flexible communication system, makes it possible to recognize Activities of Daily Living (ADLs) in order to monitor the inhabitant's behavior over time (Behavioral Drift Detection).

1.2 Thesis contribution

In this thesis, we use a method based on Bayesian Belief Networks to perform Activities of Daily Living recognition. Using this method we decide which activity is taking place based on the sensor data provided. First, we parsed the available datasets to make their structure uniform, and we labeled the data with tags based on the active sensors. Tagging the data makes it possible to relate activities with the sensors, independently from the physical implementation of the sensor network. Then, we use and compare two approaches of the method: a supervised approach, in which the network is trained using the labeled data, and an a-priori approach, in which the network is created manually. We deal with issues such as ambiguity and generalization of the activities, since activities can be performed at the same time and the same activity can be accomplished in different ways, depending on the person performing it.
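The tagging step just described can be sketched as follows. The sensor identifiers, the sensor-to-tag mapping and the time-slot boundaries below are hypothetical illustrations, not the actual vocabulary used in this work:

```python
from datetime import datetime

# Hypothetical mapping from raw sensor ids to high-level semantic tags.
SENSOR_TAGS = {
    "kitchen_pir": ["kitchen", "movement"],
    "fridge_contact": ["kitchen", "food"],
    "bed_pressure": ["bedroom", "rest"],
}

# Hypothetical hour ranges for the five temporal tags; the 23-30 range
# folds the hours after midnight into the Night slot.
TIME_SLOTS = [(6, 11, "Morning"), (11, 14, "Noon"), (14, 18, "Afternoon"),
              (18, 23, "Evening"), (23, 30, "Night")]

def temporal_tag(ts: datetime) -> str:
    """Return the temporal tag for a timestamp."""
    hour = ts.hour if ts.hour >= 6 else ts.hour + 24  # fold 0-5 past midnight
    for start, end, tag in TIME_SLOTS:
        if start <= hour < end:
            return tag
    return "Night"

def label_event(sensor_id: str, ts: datetime) -> list:
    """Replace a raw sensor event with its semantic tags plus a temporal tag."""
    return SENSOR_TAGS.get(sensor_id, []) + [temporal_tag(ts)]
```

In this way the downstream classifier only ever sees tags such as "kitchen" or "Morning", never the physical sensor id, which is what decouples the method from the sensor network implementation.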

Moreover, we will try to use this method for transfer learning, in order to use the knowledge extracted from data collected in one home to make predictions on data of the same type collected in a different home.

1.3 Thesis organization

The next chapters are organized as follows:

• In Chapter 2, we introduce previous works concerning the topics of this thesis work. In particular we present Smart Homes and Activity Recognition.

• In Chapter 3, we discuss Bayesian Belief Networks, the method that we use to perform activity recognition.

• In Chapter 4, we describe the characteristics of publicly released datasets and how we have managed them in order to use them. Moreover, we discuss how we build the network, describing two methods: training on a real dataset and manual design.

• In Chapter 5, we evaluate the characteristics of the model using a real dataset not used for the training. We report and describe the results of the various tests carried out.

• In Chapter 6, we report conclusions about the used method and possible developments of the presented work.


2 State of Art

In this chapter, we provide an overview of the related work in the area of activity recognition. In Section 2.1, we define the concept of Smart Home (SH), in particular as a Health Care System, and we describe the different sensing modalities that are used to perform activity recognition. In Section 2.2, we describe the main algorithms used in the literature to solve the Activity Recognition problem.

2.1 Smart Home

A Smart Home (SH) is an environment equipped with highly advanced automation systems, such that all the devices are interconnected to form a network, can communicate with each other and with the user, and create an interactive space [2]. Smart Homes perform both monitoring and control of home activities for the convenience and comfort of the users. The aim of a SH is to anticipate and respond to the needs of the occupants and promote their comfort, convenience, security and entertainment. What makes a house smart are the technologies that it contains and its ability to respond and modify itself according to the changing needs of its users. The goal of equipping the home environment with technology is not just to automate all the tasks that are performed by the residents, but also to provide tools and services that empower and enable people themselves to address their social, rational, and emotional needs. Figure 2.1 shows an example of a Smart Home, developed in the context of the Sweet-Home project [3], equipped with several kinds of sensors (microphone, temperature, infrared, etc.).


Figure 2.1: An example of Smart Home configuration

Smart Homes can contribute to improving home life. The improvement is due to the increase of services offered within the house, such as remote control, improvement of memory and recall with specific reminders, and simulation of occupancy when the house is empty. In [3], the information coming from the user (e.g., a voice order) or from sensors is transmitted to the intelligent controller, which interprets it and reacts by modifying the environment based on the user's needs. Smart Homes also have the potential to manage and reduce energy demand, tuning the energy system and allowing the user to constantly know their consumption. In [4] the authors demonstrate how to use cheap and simple sensing technology to automatically reduce energy consumption. Their approach uses wireless motion sensors and door sensors to infer when occupants are away, active, or sleeping, and turns off house systems as much as possible without sacrificing occupant comfort. Thanks to this ability to solve problems in many fields, we can observe an increased interest in Smart Homes.

Among all the possible applications of a Smart Home, we focus our attention on the possibility of using SHs as Health Monitoring Systems, to monitor the health status of the resident. ADLs (Activities of Daily Living) are a way to describe the functional status of a person: healthy individuals need to be able to complete all the activities, such as eating, cooking, drinking, etc. Automating the recognition of activities is an important step toward monitoring the functional health of a person. A smart home is an ideal environment in which to perform continuous monitoring of the health status of its resident during daily living.

It is possible to distinguish two types of health systems: Alarm Systems and Long-Term Monitoring Systems. In the first case, the system installed in the apartment is used to detect critical situations such as falls and other dangers. The simplest alarm systems are buttons that send an alarm when pressed. However, to use this type of system, the user must be able to recognize a dangerous situation and, in addition, must have the physical ability to push the button. In most of the studied cases, detection is done using different kinds of sensors, or cameras and microphones, for example to detect an excessively long stay in a room. On the other hand, long-term monitoring systems maintain information about the user over time and can therefore be used to track changes and detect anomalies in the health status of a person. These systems recognize ADLs from sensor data, and how every activity is performed, to determine possible health disorders.

The type of sensors is an important aspect in the design of an activity recognition system. The two main categories are body and environment sensors. The most used environment sensors are: cameras, microphones, contact switches for open-close states, pressure mats to detect sitting, mercury contacts for objects' movement, passive infrared sensors to detect motion, float sensors to measure toilet flushes, temperature sensors, humidity sensors and accelerometers. The most commonly used body sensors are: accelerometers/gyroscopes, blood glucose and pressure sensors, ECG sensors, pulse oximetry, and humidity and temperature sensors.

Three main factors must be considered when choosing the sensors to be installed: what the sensor measures, how intrusive the inhabitant perceives the sensor to be, and the ease of installation.

In our case, the data used for Activity Recognition are collected from homes equipped only with unobtrusive sensors, such as motion sensors, pressure sensors and contact sensors, which are perceived as less intrusive than other types of sensors, such as cameras and microphones.

2.2 Activity Recognition

During the day, people perform common activities, called Activities of Daily Living (ADLs). Urwyler et al. [5] define ADLs as "Activities or tasks that people undertake routinely in their everyday life. ADLs are the essential activities a person needs to perform to be able to live independently, such as bathing, toileting, eating and sleeping". Automating the recognition of Activities of Daily Living is an important step to monitor a smart home resident and improve his/her independence. Activity Recognition is the process that maps a sequence of state-change sensor events to a corresponding human activity [6]. To do this, we need to consider some issues that make it difficult:


• Ambiguity: a single action primitive can occur during different activities. For example, the action primitive of opening the refrigerator door can occur during the cooking activity and during the activity of getting a drink. Similarly, the action of sitting at the table can occur during the eating activity and during the reading activity. A single action primitive can thus be ambiguous with respect to the recognition of activities, and even a sequence of observed action primitives can still be ambiguous.

• Generalization: activities can be performed in a large number of ways according to the person's habits. This means that it is hard to formulate a generalized description for an activity, which makes it difficult to determine which sequence of action primitives corresponds to which activity.

These issues make activity recognition a very challenging task. Data Mining and Machine Learning techniques can be used to build activity models, which are then used as the basis for recognition. In the following we report the most diffused techniques used for activity recognition, distinguishing between supervised and unsupervised learning [7].

2.2.1 Supervised Learning

Supervised Learning [8] [9] [10] [11] is the task of fitting a model that, given a set of samples, each composed of a pair of input (predictors) and output (response), relates the response to the predictors. The aim is to build a model that accurately predicts the response for future observations. Supervised Learning can be used to make predictions or to understand the relation between inputs and outputs. However, this approach is expensive, since it requires a huge amount of labeled data.

Two classes of models exist: generative and discriminative. The former models the joint probability distribution of the samples (sensor data) and labels (activities); common generative methods are the Hidden Markov Model (HMM) and the Naive Bayes classifier. The latter models the conditional probability of the labels given the samples; common algorithms include Conditional Random Fields (CRFs), Support Vector Machines (SVMs), Random Forests (RFs) and Decision Trees (DTs).
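As a concrete instance of a generative model, a Naive Bayes activity classifier over binary sensor features can be sketched as follows. The activities and feature layout are hypothetical, and Laplace smoothing is used to avoid zero probabilities:

```python
import math
from collections import defaultdict

def train_naive_bayes(samples):
    """samples: list of (binary_feature_tuple, activity_label).
    Returns class priors and per-class Bernoulli parameters
    (with add-one Laplace smoothing)."""
    counts = defaultdict(int)                    # activity -> sample count
    on = defaultdict(lambda: defaultdict(int))   # activity -> feature -> "on" count
    n_features = len(samples[0][0])
    for x, y in samples:
        counts[y] += 1
        for i, v in enumerate(x):
            on[y][i] += v
    priors = {y: c / len(samples) for y, c in counts.items()}
    theta = {y: [(on[y][i] + 1) / (counts[y] + 2) for i in range(n_features)]
             for y in counts}
    return priors, theta

def classify(priors, theta, x):
    """Pick the activity maximizing log P(y) + sum_i log P(x_i | y)."""
    def score(y):
        s = math.log(priors[y])
        for i, v in enumerate(x):
            p = theta[y][i]
            s += math.log(p if v else 1 - p)
        return s
    return max(priors, key=score)
```

The "naive" independence assumption (each sensor feature independent given the activity) is exactly what a Bayesian Belief Network relaxes by letting arbitrary dependencies be encoded in the graph structure.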

Can Tunca et al. [8], in the ARAS (Activity Recognition with Ambient Sensing) project, collected and publicly released data from two smart houses, instrumented using Wireless Sensor Networks (WSNs), and used the data to perform Activity Recognition. Data were collected for 30 days from two real houses, each equipped with 20 sensors of different types and inhabited by two people of about 30 years old. Figure 2.2 shows the house setting.

Figure 2.2: Aras HouseA configuration

The authors evaluated five different methods for Activity Recognition: KNN (K-Nearest Neighbor), DT (Decision Tree), HMM (Hidden Markov Model), MLP (Multilayer Perceptron) and TDNN (Time Delay Neural Network). Classifiers were trained and tested with different portions of the datasets using a leave-one-day-out cross-validation methodology. To measure the performance they used the Precision, Recall, F-measure and Accuracy metrics, calculating the values for each class separately and then averaging them over all classes.
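Per-class metrics averaged over all classes, as in the evaluation above, can be computed directly from a confusion matrix; a minimal stdlib sketch (the matrix layout is an assumption of this example):

```python
def per_class_metrics(cm, classes):
    """cm[t][p] = number of samples with true class t predicted as p.
    Returns macro-averaged precision and recall, plus overall accuracy."""
    precisions, recalls = [], []
    total = correct = 0
    for c in classes:
        tp = cm[c][c]
        pred_c = sum(cm[t][c] for t in classes)   # total inferred as c
        true_c = sum(cm[c][p] for p in classes)   # total ground truth c
        precisions.append(tp / pred_c if pred_c else 0.0)
        recalls.append(tp / true_c if true_c else 0.0)
        correct += tp
        total += true_c
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, correct / total
```

Macro-averaging weights every class equally, so rare activities influence the score as much as frequent ones, which is why it is preferred over plain accuracy in activity recognition.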

On average, HMM and TDNN performed better than the other three classifiers. The main limitation was that activities not performed every day tend to be confused with the frequent ones. In the same project, Ersoy et al. [9] compared two methods for Activity Recognition, the Hidden Markov Model (HMM) and the Time Windowed Neural Network (TWNN), on five different real-world datasets (two from ARAS [8] and three from the Kasteren project [12]). The analysis was done by grouping the data into slices of 60 seconds and dividing the set into training and test portions, performing a leave-one-day-out cross-validation.


Since common metrics, such as Accuracy and F-measure, do not consider the specific needs of human behavior analysis, such as recognizing the duration, start time and frequency of the activities, they used a new methodology to evaluate how well a method recognizes the activities' beginning, ending and duration. For this new evaluation they used sensitivity and specificity analysis based on time-slice performance. The results show that HMM outperforms TWNN: TWNN makes errors during the recognition, splitting long activities performed by the users or recognizing activities not performed by them.

Christian Debes et al. [11] evaluated different methods for Activity Recognition: SVM, Random Forests (RF), HMM, and Fisher Kernel Learning (FKL) on three datasets. The first dataset, from Kasteren [13], contained data collected for 28 days in an apartment equipped with 14 sensors and inhabited by one 26-year-old person. The second dataset was from the CASAS project, containing data recorded over a period of two months using 34 sensors, with two people in a house. The third one contains data collected for one week in two independent households, equipped with 16 sensors. All the aforementioned methods need a training phase and a test phase, thus requiring a large dataset. The authors evaluated the classification using the Time Slice accuracy and Average Class accuracy metrics. The study concluded that hybrid methods implementing kernel metric distances (such as FKL and RF) are superior to traditional generative methods such as HMM and its variants. Specifically, FKL showed the best performance on all the datasets with an average accuracy of 70%. HMM, conversely, was the worst with an average accuracy of 53%.

Narayanan C. Krishnan [10] proposed a sliding-window-based approach to perform Activity Recognition in an online fashion. Every sensor event is classified based on the information encoded in a sliding window of preceding sensor events. Three approaches are discussed: Explicit Segmentation, Time Based Windowing (SWTW) and Sensor Event Based Windowing (SWMI). The first tries to divide the data into segments so that each one corresponds to an activity. The second divides the sensor data into segments of the same duration. The third divides the sequence into windows containing an equal number of sensor events. In the second and third case, each event in the window is weighted with respect to the last event, using SWTW and SWMI respectively, to reduce the influence of events distant in time or from different functional areas, compared to the last event. The datasets were collected from three smart apartments over six months, each inhabited by one person. The sequence of sensor events is divided into windows of equal numbers of events and then, one by one, transformed into a vector of features, in order to capture its information. Each vector is tagged with the label (activity) of the last sensor event in the window. This set of vectors is used as the training set of the classifier used to learn the activity models.

The performance of the baseline method gets worse as the number of events in the window increases. SWTW improves classification accuracy w.r.t. the baseline by considering a window of events within 64 seconds from the last. SWMI outperforms the baseline algorithm only on two datasets; however, the best performance is obtained by combining SWTW and SWMI.
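The time-based windowing idea discussed above can be sketched as follows; the 60-second slice width mirrors the slicing used in the surveyed analyses, while the (timestamp, sensor_id) event format is a hypothetical simplification:

```python
def time_windows(events, width=60):
    """Group (timestamp_seconds, sensor_id) events into fixed-width time
    slices. Returns a sorted list of (window_start, sensors_active_in_window)."""
    windows = {}
    for ts, sensor in events:
        start = (ts // width) * width       # slice this event falls into
        windows.setdefault(start, set()).add(sensor)
    return sorted(windows.items())
```

Each resulting window can then be turned into a feature vector (e.g., one binary entry per sensor) and labeled with the activity of its last event, as in the approach described above.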

2.2.2 Unsupervised Learning

Unsupervised Learning [6] [14] aims to infer a hidden structure from unlabeled data. In contrast to Supervised Learning, we only have input measurement vectors but no corresponding supervising output, so it is not possible to fit a model for prediction in the usual way. The goal is to discover hidden relations among the input data. Common approaches to unsupervised learning include clustering methods (KNN, Hierarchical Clustering) and Recurrent Neural Networks (RNNs). In activity recognition, unsupervised methods are used to group input data into clusters that represent activities. Yongjin Kwon et al. [15] propose a method for human activity recognition from smartphone sensor data, even when the number of activities is unknown. D. Trabelsi et al. [16] use an unsupervised approach for human activity recognition from raw acceleration data measured using inertial wearable sensors. They combine an HMM-based model with the acceleration data acquired by the sensors; their method is based on a variant of HMM denoted MHMMR.

Unsupervised methods are also used in real time on streaming data to recognize common patterns and to segment the data into classes on which to apply activity recognition. Cook et al. [6] discuss an unsupervised method to identify patterns in the observed data. They use an Activity Discovery algorithm to identify patterns in sensor data and then apply activity recognition on the identified segments.
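The clustering step used by such unsupervised approaches can be sketched with a minimal k-means over numeric sensor-activation vectors; the data, distance choice and fixed iteration count are assumptions of this example, not a specific published algorithm:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means; points are equal-length numeric tuples.
    Returns a list assigning each point to a cluster index 0..k-1."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centroids[c])))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

In an activity discovery setting, each resulting cluster would be inspected (or matched against known patterns) to decide which activity it represents.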

This chapter gave an overview of related works on activity recognition and all the elements that make up such systems. We have discussed various methods for activity recognition used in previous works. As we have seen, there are several methods, both supervised and unsupervised, that have been broadly used with promising results. The most common supervised methods are HMM, CRF, SVM and RF; regarding unsupervised methods, the most common are clustering and neural networks. In this thesis we use a method called Bayesian Belief Network. This method has been used in many fields, such as bioinformatics, medical diagnosis, speech recognition, video recognition and activity prediction. In our case, we use it to perform activity recognition on data collected in a smart environment. We use the network to represent the relation between the information provided by the sensors and the activities performed. We train the network using a supervised approach and an a-priori one. We validate the results obtained using the most common metrics used in the literature: Precision, Recall and Accuracy. We test our method on data not used for the training.


3 Bayesian Network

Bayesian Networks (BNs) are Directed Acyclic Graphs (DAGs) where nodes represent variables (discrete or continuous) and arcs represent direct connections among them. Bayesian Networks are often also referred to as Bayesian Belief Networks (BBNs), or just Bayes Nets. BNs model both the causal dependencies among variables and the quantitative strength of the connections. In this chapter, we describe Bayesian Networks, the theory behind them and how to interpret the information encoded in a network. Moreover, we explain how to model a problem with a Bayesian Network and the types of reasoning that can be performed.

3.1 Bayesian Network Structure

A Bayesian Network [17] is a type of Probabilistic Graphical Model (PGM), which can simultaneously represent a multitude of relationships between the variables of a system. The graph contains nodes, representing a set of variables X = {X1, ..., Xi, ..., Xn} from the domain, and directed arcs that connect the nodes. The arcs represent relationships between pairs of nodes, of the type Xi → Xj, where Xi, Xj ∈ X. A node is a parent of a child if there is an arc from the former to the latter; it is an ancestor of another node if it appears earlier in the chain. Whereas traditional statistical models make a distinction between independent and dependent variables, Bayesian Networks do not. This allows the researcher to carry out omni-directional inference, reasoning from cause to effect (simulation) or from effect to cause (diagnosis), all within the same model.


One of the main properties of BNs is their capability of encoding the notion of

Figure 3.1: Example of Bayesian Belief Network

causality. The arcs represent conditional dependencies between the nodes they connect. For instance, the arc from A to B in Figure 3.1 implies that node A causes node B. One of the consequences of the notion of causality is that the directed graph in a BN must be acyclic (you cannot return to a node simply by following directed arcs), as a node cannot cause itself. Nodes can be either continuous or discrete variables. Common types of discrete nodes include Boolean nodes, ordered values and integral values; each variable must take exactly one value at a time.

A Bayesian Network should specify the conditional probability distribution (CPD) for each node, to quantify the relationships between connected nodes. If the variables are discrete, the CPD is represented as a Conditional Probability Table (CPT). For each node we need to look at all the possible combinations of values of its parent nodes; each such combination is called an instantiation of the parent set. For each distinct instantiation of parent node values, we need to specify the probability that the child will take each of its values. For example, if we consider Fig. 3.1, the parents of NodeE are NodeA and NodeD and they take the possible joint values <T,T>, <T,F>, <F,T>, <F,F>. The Conditional Probability Table specifies in order the probability of NodeE for each of these cases, i.e. <0.05, 0.02, 0.03, 0.001>. Root nodes also have an associated CPT containing only one row, representing their prior probabilities. In our example (Fig. 3.1), NodeA and NodeD are root nodes.
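As an illustration, the CPT of NodeE and the priors of the root nodes could be represented as simple lookup tables. This is a minimal sketch: the conditional probabilities are the ones from the example above, while the two prior values are invented for illustration.

```python
# CPT for NodeE (Fig. 3.1): P(NodeE = True | NodeA, NodeD), indexed by the
# instantiation of its parents. Values are the ones of the example above.
cpt_node_e = {
    (True, True): 0.05,    # NodeA = T, NodeD = T
    (True, False): 0.02,   # NodeA = T, NodeD = F
    (False, True): 0.03,   # NodeA = F, NodeD = T
    (False, False): 0.001, # NodeA = F, NodeD = F
}

# Root nodes have a single-row CPT: their prior probability of being True.
# These two values are illustrative assumptions, not from the example.
prior_node_a = 0.3
prior_node_d = 0.6

def p_node_e(a: bool, d: bool) -> float:
    """Look up P(NodeE = True) for a given instantiation of the parents."""
    return cpt_node_e[(a, d)]
```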

Bayesian networks require the assumption of the Markov property: there are no direct dependencies in the system being modeled which are not already explicitly shown via arcs. In Figure 3.1, for example, there is no way for NodeA to influence NodeC except through NodeE.

Conditional Probability Table Estimation Given a Bayesian Network structure, the Conditional Probability Tables (CPTs) are typically estimated from the observed


frequencies in a training dataset. The Maximum Likelihood Estimate (MLE) method uses the counts from the dataset to estimate the distribution. For the root nodes, MLE estimates the probability of each node X_i as the ratio between the number of times that X_i appears in the dataset and the size of the dataset:

P_MLE(X_i) = Count(X_i) / N    (3.1)

where N is the size of the dataset. To estimate the conditional probability distribution P(X_i | X_{i-1}), we can expand it with its definition:

P_MLE(X_i | X_{i-1}) = P(X_{i-1}, X_i) / P(X_{i-1})
                     = (Count(<X_{i-1}, X_i>) / N) / (Count(X_{i-1}) / N)
                     = Count(<X_{i-1}, X_i>) / Count(X_{i-1})    (3.2)

A serious problem arises when computing probabilities with MLE, because MLE assigns a zero probability to elements that have not been observed in the corpus. This means that it will assign a zero probability to any sequence containing a previously unseen element. One way to address this problem is to incorporate some prior knowledge, adding a small number λ to every count before estimating the probability. This essentially gives a small probability of success even to those events never observed in the dataset. This technique is called Laplace Estimation.
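The MLE estimate with Laplace smoothing can be sketched in a few lines; the function name and the sensor-state example below are illustrative, not part of any library or of the original implementation.

```python
from collections import Counter

def mle_with_laplace(observations, domain, lam=1.0):
    """Estimate P(x) for each value in `domain` from observed counts.
    With lam=0 this is plain MLE; lam>0 adds Laplace (additive) smoothing,
    so values never seen in the data still get a small non-zero probability."""
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x] + lam) / (n + lam * len(domain)) for x in domain}

# "broken" was never observed, yet it receives a small probability
probs = mle_with_laplace(["on", "on", "off"], domain=["on", "off", "broken"])
```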

Structure Learning The network structure can be built from human knowledge or it can be machine-learned from data. The first case requires a good understanding of the problem to represent; it is commonly used for small examples or to model simple real situations that do not need a complex network. The second case is the one commonly used in real situations, since it allows modeling complex networks with the possibility to discover the dependencies among the available data. Two families of methods are available for learning from data: constraint-based methods and score-based methods.

Constraint-based algorithms are based on the work of Pearl on causal graphical models and his Inductive Causation [18] algorithm, which provides a framework for learning the DAG of a BN using Conditional Independence (CI) tests. A commonly used test, proposed by Cheng et al. [19], is the Mutual Information (MI) test. They


used the MI of two nodes X_i, X_j, i.e.:

MI(X_i, X_j) = Σ_{x_i, x_j} P(x_i, x_j) log [ P(x_i, x_j) / (P(x_i) P(x_j)) ]    (3.3)

as a value to determine whether the two nodes are related or not. Links are added or deleted according to the results of the statistical tests: when MI(X_i, X_j) is smaller than a threshold, the two nodes are considered independent.
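Equation 3.3 can be computed empirically from observed pairs of values. The following is a minimal sketch with hypothetical data, chosen so that one pair of variables is perfectly dependent (MI = ln 2) and the other looks independent (MI = 0).

```python
import math
from collections import Counter

def mutual_information(pairs):
    """MI(Xi, Xj) over a list of observed (xi, xj) pairs (Eq. 3.3),
    computed from the empirical joint and marginal distributions."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

dependent = [(0, 0), (1, 1)] * 50                 # Xj always copies Xi
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # all combinations equally likely
```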

Score-based methods rely on a metric that measures the quality of candidate networks with respect to the observed data: each candidate BN is assigned a score, typically one that measures how well that BN describes the dataset. Assuming a structure G, its score is Score(G, D) = Pr(G|D), and algorithms of this kind attempt to maximize this score. The formula can be rewritten in a more convenient way using Bayes' law:

Score(G, D) = Pr(G|D) = Pr(D|G) Pr(G) / Pr(D)    (3.4)

Trying to assign a score to each possible candidate network poses a considerable problem, since the space of all possible structures is at least exponential in the number of variables. The application of heuristic optimization techniques can help to solve the problem. The Bayesian Information Criterion (BIC) and posterior probabilities are typical choices of scoring functions.
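As a sketch of score-based learning, the following computes a BIC-style score (log-likelihood under MLE parameters minus a complexity penalty) for a candidate structure over discrete data. The data and the two structures are hypothetical, and a real implementation would search the space of structures rather than compare two by hand.

```python
import math
from collections import Counter

def bic_score(data, parents_of):
    """BIC score of a discrete BN structure: log-likelihood of the data under
    MLE parameters minus (d/2) * log N, where d is the number of free
    parameters. `data` is a list of dicts {variable: value}; `parents_of`
    maps each variable to a tuple of its parents. Higher is better."""
    n = len(data)
    log_lik = 0.0
    n_params = 0
    for var, parents in parents_of.items():
        values = {row[var] for row in data}
        joint = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        for (pa, _), count in joint.items():
            log_lik += count * math.log(count / parent_counts[pa])
        n_params += len(parent_counts) * (len(values) - 1)
    return log_lik - 0.5 * n_params * math.log(n)

# Hypothetical data where B mostly copies A: the structure with the edge
# A -> B should score higher than the empty structure.
data = ([{"A": 0, "B": 0}] * 45 + [{"A": 1, "B": 1}] * 45 +
        [{"A": 0, "B": 1}] * 5 + [{"A": 1, "B": 0}] * 5)
with_edge = bic_score(data, {"A": (), "B": ("A",)})
no_edge = bic_score(data, {"A": (), "B": ()})
```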

3.1.1 Reasoning With Bayesian Network

After introducing how to create the network structure and how to define the probability of each node, we focus on the usage of a Bayesian Network for reasoning and inference. Since Bayesian Networks provide full representations of probability distributions over their variables, we can reason in any direction of the graph. With Bayesian Networks, it is possible to perform three types of reasoning: diagnostic reasoning, predictive reasoning and intercausal reasoning. In the following we use the Lung Cancer problem as an example (Figure 3.2) to explain the three types of inference.


Figure 3.2: Cancer Example of BBN

Diagnostic Reasoning occurs in the direction opposite to the network arcs, from a child node to its parent nodes. For example (Fig. 3.3), a doctor observes Dyspnoea (evidence) and then updates his belief about Cancer and about whether the patient is a Smoker (result).

Predictive Reasoning occurs following the direction of the network arcs, from new information about causes (parent nodes) to new beliefs about effects (child nodes). In the example in Figure 3.3, the patient may tell his physician that he is a smoker; even before any symptoms have been assessed, the physician knows this will increase the chances of the patient having cancer.

Intercausal Reasoning occurs when there are exactly two possible causes of a particular effect. Initially, according to the model, these two causes are independent of each other. Even though the two causes are initially independent, with knowledge of the effect the presence of one explanatory cause renders an alternative cause less likely. In other words, the alternative cause has been explained away.

A Bayesian Belief Network represents a complete probabilistic model of all the variables in a specific domain. Therefore, it contains all the information required to answer any probabilistic question about any variable in that domain, given the available observations or evidence. Once we have constructed a Bayesian network, we usually need to determine various probabilities of interest from the model.

Considering a BN consisting of N random variables X_1, X_2, …, X_N, the general form of the joint probability distribution of the Bayesian Network can be represented as in Eq. 3.5:

P(X) = P(x_1, x_2, …, x_N) = ∏_{i=1}^{N} P(x_i | x_1, …, x_{i-1})    (3.5)


Figure 3.3: Types of reasoning with Bayesian Belief Networks

Since each variable is conditional only on the values of its parent nodes, this reduces to:

P(x_1, x_2, …, x_n) = ∏_i P(x_i | Parents(X_i))    (3.6)

provided Parents(X_i) ⊆ {X_1, …, X_{i-1}}.

For example, in the problem concerning Credit Card Fraud detection (shown in Figure 3.4), we want to know the probability of fraud given observations of the other variables. This probability is not stored directly in the model, and hence needs to be computed. In general, the computation of a probability of interest given a model is known as probabilistic inference. Because a Bayesian Network for X determines a joint probability distribution for X, we can use the Bayesian network to compute any probability of interest. For example, from the Bayesian Network in Figure 3.4 about Credit Card Fraud, the probability of fraud given observations of the other


Figure 3.4: Credit card fraud example

variables can be computed as follows:

p(f | a, s, g, j) = p(f, a, s, g, j) / p(a, s, g, j) = p(f, a, s, g, j) / Σ_{f'} p(f', a, s, g, j)    (3.7)

When all variables are discrete, we can exploit the conditional independences encoded in a Bayesian network to make this computation more efficient. In our example, given the conditional independences, Equation 3.7 becomes:

p(f | a, s, g, j) = p(f) p(a) p(s) p(g|f) p(j|f, a, s) / Σ_{f'} p(f') p(a) p(s) p(g|f') p(j|f', a, s)
                  = p(f) p(g|f) p(j|f, a, s) / Σ_{f'} p(f') p(g|f') p(j|f', a, s)    (3.8)

The effect of the available evidence on any variable (hypothesis) may be ascertained by marginalizing the joint probability of the entire network to obtain the posterior probability of that variable. We will use the notation Bel(X) for the posterior probability distribution over a variable X:

Bel(X) = Σ_{y_i} Pr(X, y_i) = Σ_{y_i} Pr(X | y_i) Pr(y_i)    (3.9)
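The computation of Equation 3.8 by enumeration over the fraud variable can be sketched as follows. All the probability values below are invented for illustration and are not taken from the original Credit Card Fraud example.

```python
# Illustrative (assumed) parameters of a tiny fraud model:
p_f = {True: 0.01, False: 0.99}           # prior P(fraud)
p_g_given_f = {True: 0.10, False: 0.001}  # P(gas purchase | fraud)
p_j_given = {                             # P(jewelry purchase | fraud, age, sex)
    (True, "30-50", "female"): 0.05,
    (False, "30-50", "female"): 0.002,
}

def posterior_fraud(age, sex):
    """p(f | a, s, g, j) by enumeration over f (Eq. 3.8), with the evidence
    g = j = True. Terms that do not depend on f, such as p(a) and p(s),
    cancel out of the ratio, exactly as in the simplification of Eq. 3.8."""
    unnorm = {f: p_f[f] * p_g_given_f[f] * p_j_given[(f, age, sex)]
              for f in (True, False)}
    return unnorm[True] / sum(unnorm.values())

belief = posterior_fraud("30-50", "female")
```

With these assumed numbers, observing both purchases drives the posterior belief in fraud far above its 1% prior, which is the behavior the marginalization in Eq. 3.9 formalizes.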

There is a fundamental assumption that there is a useful underlying structure to the problem being modeled that can be captured with a BN, i.e., that not every node is connected to every other node. If such domain structure exists, a BN gives a more compact representation than simply describing the probability of every joint instantiation of all variables. Sparse Bayesian networks (those with relatively few arcs, which means few parents for each node) represent probability distributions in a computationally tractable way.


3.2 Bayesian Networks Applications

Over the last years, Bayesian Networks have become a popular representation for encoding uncertain expert knowledge. BNs have been used with good results in many fields, such as bioinformatics, medical diagnosis, speech recognition, video recognition and activity prediction [20–23]. In the biomedical field, Ronald Jensen et al. [24] developed an approach using Bayesian Networks to predict protein-protein interactions genome-wide in yeast. Bayesian networks have also been used for activity recognition from video: Anant Madabhushi et al. [22] track the movement of the head over consecutive frames and match the sequences captured from a CCD camera against stored models of actions, in order to perform activity recognition.

In Ambient Assisted Living, Bayesian Networks have been used to predict the next activity performed by the user from the current one. In the CRAFT project [23], Cook D. et al. propose an activity prediction approach that uses three features of the current activity (activity location, time of day, day of week) to predict the following activity. The CRAFT method is divided into two steps: in the first step, the three features associated with the next activity are predicted starting from the features of the current activity; in the second step, the next activity is predicted from the features obtained in the previous step. The features used for prediction are obtained from the information received by the sensors in the smart home. In this case the system performs only a prediction of the next activity, in order to operate also as a reminder prompting individuals to initiate important activities. However, not every home is equipped with a network of sensors or other devices that allow activity recognition to be performed.


BBNs for Activity Recognition

Activity recognition is a challenging task: activities can be performed in different ways, the start and end times of the activities can be unknown and there can be noise in the observed sensor data. Probabilistic models, such as BBNs, provide a way to deal with the uncertainty caused by these issues: they are able to make decisions about the activities performed by the user, based on the data provided by the sensors. In the next sections, we describe the datasets used to train the network and how we processed them to remove noise and make their structures homogeneous. Moreover, we explain how the data are used to train the network and how the network is used to perform activity recognition.

4.1 Available Datasets

In this section, we present the datasets that we used to train the network. Data are collected from a number of sensing ”nodes” installed in a Smart Home that observe the actions performed by a person. Many research groups collected and annotated data from their own test houses, equipped with sensors and inhabited by one or more persons, to record data and perform experiments. The data consist of the stream of sensor activations and the ground truth of the performed activities. We have decided to use several datasets to verify whether the classification becomes more precise and whether it can be generalized across data collected from different people. The datasets used are the following:


BRIDGe DATASET

In the context of the BRIDGe project we used a collection of data obtained in a real setting with a single inhabitant. The collection contains the sensor data of 30 days and the ids of the activities performed. All the sensors used in this project (passive infrared, pressure sensors, door and window status sensors) are binary sensors: they know only two states, 0 (off, low, closed, false) and 1 (on, high, open, true). Each file contains the sensors' state, second by second, for one day, together with the ids of the activities carried out by the inhabitant.

A second dataset was created using SHARON, a simulator of Activities of Daily Living (ADL) developed in the BRIDGe context [25]. Also in this case the data are structured as an array that contains the state of the sensors and the id of the activity performed. A fragment of data is shown in Table 4.1: in the upper part we can see that the 19th sensor, corresponding to a bed pressure sensor, is active and the inhabitant is sleeping (activity with id 7); in the lower part we can see that the 4th and 6th sensors (corresponding to couch and chair pressure sensors) are active and the inhabitant is watching TV (activity with id 8).

ARAS

In the context of the Activity Recognition with Ambient Sensing (ARAS) project, data were collected in two houses, each with two residents (two males, both aged 25), for one month each. The installation comprised 20 boolean sensors and 27 possible ADL labels. Activity labels are annotated for both residents thanks to devices placed in many strategic points of the house. Figure 2.2 depicts the configuration of the house. The dataset is available at the website of the project [26].

KASTEREN

Van Kasteren T.L.M. collected data from three houses; the data are freely available on the project's website [27]. The first house was inhabited by a 26-year-old man and was equipped with 14 sensors; the collection lasted 25 days, during which the resident annotated 16 ADLs. The second house was inhabited by a 28-year-old person for 13 days and contained 22 motion sensors. The third is a two-floor house inhabited by a 57-year-old person. This is the dataset that we used during the project: it contains data recorded from 21 sensors and labeled with 26 activities.


Second   Sensors Status                            Activities ID
9265     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0  7 0
9266     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0  7 0
9267     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0  7 0
...      ...                                       ...
34269    0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  8 0
34270    0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  8 0

Table 4.1: Example of BRIDGe dataset

Figure 4.1: Kasteren house - House A

Figure 4.2: Kasteren house - House B

Since datasets are collected in different projects’ contexts, the data structure and the type of data can be different. In the following we explain how we have filtered the data to create a uniform dataset.


Figure 4.3: Kasteren house - House C

4.2 Filtering

To be used for classification, the data should be filtered to remove noise, due to sensor errors or false activations by the users, and organized to have the same structure. We also need a mapping between activities, since different projects recognize different sets of activities.

First of all, we take the Kasteren datasets and we change their structure to match the one of the BRIDGe project. The data that we use are divided into time slices, one for each second of the day, with the state of the sensors installed in the home and the id associated to the activity performed. It is assumed that two activities cannot be performed simultaneously. Since, as described in the previous section (Section 4.1), this dataset has a different structure, we parse it to make its format uniform.

Then we take the ARAS dataset and we split the data of each day into two parts, one for each person, since, as described in the previous section, the ARAS dataset was collected in a house with two people.

In a second phase we standardize the activities that are recognized. The BRIDGe project is able to recognize 13 activities: Other, Going Out, Breakfast, Lunch, Dinner, Cleaning, Sleeping, Toileting, Shower, Relax, Reading, Watching TV, Studying. However, other projects are able to recognize more activities. For example, the Kasteren project recognizes activities such as Brush Teeth, Take Medication, Get Snack, Put Items in Dishwasher; the ARAS project recognizes Napping, Using Internet, Laundry, Talking on the Phone, Listening to Music, Having Guest, Changing Clothes. We have therefore mapped all activities to a common set of main activities: Breakfast, Lunch, Dinner, Cleaning, Sleeping, Toileting, Shower, Relax, Reading, Watching TV, Studying, Going Out, Other. In addition, we have chosen three sub-activities: Preparing Breakfast, Preparing Lunch, Preparing Dinner. Table 4.2 depicts an example of activity mapping, between the ARAS activities and our main activities.

4.3 Labeling

Houses are spaces for daily living where the inhabitants perform several ADLs. A house is organized in rooms or areas devoted to specific activities: Sleeping is performed in the Bedroom, Having Shower in the Bathroom, Breakfast in the Dining room, etc. Moreover, appliances, furniture and tools are fundamental entities involved in some of the ADLs: the activity performed by a person seated on a chair or on a sofa is indeed different. There are also activities that change depending on the time of the day at which they are performed: for example, ”eating” can be Breakfast, Lunch or Dinner based on the time of the day in which it is performed. Time, space and tools are therefore core aspects to describe ADLs and identify them. In addition, although the areas and rooms of the houses are commonly the same, the physical implementation of each one varies depending on the specific case. Since we would like our model to be as general as possible, for every home and sensor configuration, using the sensors' data directly would not give good results. What we need is a way to relate the sensor data with the ADLs. We found the solution by defining a set of tags to describe the knowledge domain. For example, in a house that includes a fridge opening sensor we can associate the ”Food” and ”Kitchen” tags; for a pressure mat under the bed we associate the tags ”Bed” and ”Bedroom”. Adding tags to each sensor in this way, we can separate the physical structure of the sensor network from the information it provides. The tags that we have identified are the following: Living, Seat, TV, Kitchen, Food, Hall, Water, Shower, Toilet, Bedroom, Bed, Morning, Noon, Afternoon, Evening, Night. An example of association between tags and sensors, from the ARAS project, is reported in Table 4.4.

The labeling of the dataset is performed by a method that parses all the files of the dataset that have been filtered, as described in Sec. 4.2. Each file contains the status of the sensors, second by second, for one day. We read the active sensors of one second and we add the tags associated with each sensor, reading the mapping from a mapping file (as described in Table 4.4). To each line we also associate a temporal tag (Morning,


ARAS Activity          Activity Mapping
Other                  Other
Going Out              Going Out
Preparing Breakfast    Breakfast
Having Breakfast       Breakfast
Preparing Lunch        Lunch
Having Lunch           Lunch
Preparing Dinner       Dinner
Having Dinner          Dinner
Washing Dishes         Other
Having Snack           Other
Sleeping               Sleeping
Watching TV            Watching TV
Studying               Studying
Having Shower          Having Shower
Toileting              Toileting
Napping                Relax
Using Internet         Relax
Reading Book           Reading
Laundry                Cleaning
Shaving                Toileting
Brushing Teeth         Toileting
Talking on the Phone   Other
Listening to Music     Other
Cleaning               Cleaning
Having Conversation    Other
Having Guest           Other
Changing Clothes       Other

Table 4.2: Mapping between activities

Second   Sensors Status                            Activities ID  Tags
9267     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Night
9268     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Night
34269    0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  8 0           Living TV Seat Morning
34270    0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0  8 0           Living TV Seat Morning

Table 4.3: Example of labeled BRIDGe dataset


Sensor ID  Sensor Name          Tags
1          Wardrobe             Bedroom
2          Couch                Living, Seat
3          TV Receiver          Living, TV
4          Couch                Living, Seat
5          Couch                Living, Seat
6          Chair                Living, Seat
7          Chair                Living, Seat
8          Fridge               Kitchen, Food
9          Kitchen Drawer       Kitchen, Food
10         Wardrobe             Living
11         Bathroom Cabinet     Toilet
12         House Door           Hall
13         Bathroom Door        Toilet
14         Shower Cabinet Door  Toilet, Shower
15         Hall                 Hall
16         Kitchen              Kitchen
17         Tap                  Toilet, Water
18         Water Closet         Toilet
19         Kitchen              Kitchen, Food
20         Bed                  Bedroom, Bed

Table 4.4: Mapping between ARAS sensors and tags


Time Slot (Hours)   Temporal Tag
6:30 - 11:30        Morning
11:30 - 14:00       Noon
14:00 - 18:00       Afternoon
18:00 - 00:00       Evening
00:00 - 6:30        Night

Table 4.5: Mapping between temporal tags and time intervals

Noon, Afternoon, Evening, Night) to classify the time of the day. Table 4.5 shows the mapping between temporal tags and time intervals. An example of labeling is shown in Table 4.3.
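The temporal-tag assignment of Table 4.5 amounts to a lookup on the second of the day. A minimal sketch (the function name is illustrative, not part of the original implementation):

```python
def temporal_tag(second_of_day: int) -> str:
    """Map a second of the day (0..86399) to the temporal tag of Table 4.5."""
    morning_start = 6 * 3600 + 30 * 60   # 6:30
    noon_start = 11 * 3600 + 30 * 60     # 11:30
    if morning_start <= second_of_day < noon_start:
        return "Morning"
    if noon_start <= second_of_day < 14 * 3600:
        return "Noon"
    if 14 * 3600 <= second_of_day < 18 * 3600:
        return "Afternoon"
    if 18 * 3600 <= second_of_day < 24 * 3600:
        return "Evening"
    return "Night"  # 00:00 - 6:30
```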

4.4 BBN Implementation

We have explained in Chapter 3 that a Bayesian Network is implemented as a directed graph, consisting of nodes and the arcs that link them. BBNs do not distinguish between node types: in our case, the nodes are activities and tags, and the links are the relations between them. To learn the network's structure and the Conditional Probability Tables (CPTs) associated to each node, we have used and compared two methods:

1. In the first, we learn the structure directly from the data, using the techniques described in Section 3.1.

2. In the second, we implement the structure manually, following our knowledge and weighting the connections only with a high or a low probability.

4.4.1 Inference Structure

In the first case, the structure is learned directly from the datasets. We have implemented a method that creates the network's structure and the CPTs from the data.

• First of all, the method counts, for each activity, how many times each tag appears together with the activity. Since the dataset may contain errors and noise, to avoid linking together labels and activities that appear together only rarely, we have set a threshold on the relations: activities and tags whose count is smaller than the threshold are not linked in the network. With this first step we have created the network's structure.


• In a second step, we learn the Conditional Probability Table (CPT) of each node. We have used two different ways to estimate the probabilities of each node.

1. For root nodes, the CPT contains only two probabilities, the a priori probabilities of being ”True” or ”False”. In our case, the root nodes are the label nodes, and their probabilities are estimated as the ratio between the number of times each label appears in the dataset and the size of the dataset.

2. Instead, for the nodes that are not roots, in our case the activity nodes, the CPTs are estimated by taking from the network's structure the labels associated to the activity and then counting how many times each combination of these labels appears in the dataset with the activity id. In this case the method takes each labeled file and reads it line by line. For each activity instance, it calculates the duration of the activity and saves the associated labels, each taken once.

Table 4.6 shows an example of Sleeping and Toileting instances. Sleeping starts at second 10174 and ends at second 29881, with a duration of 19707 seconds; the associated labels, each taken only once, are Bed, Bedroom, Night, Morning. Toileting starts at second 29882 and ends at second 31821, with a duration of 1939 seconds; the associated labels, each taken only once, are Toilet, Morning.

The labels are converted into a numerical key by a method that builds a unique key for each possible combination of labels. The activity's id and the key are used as the row and column indexes of a matrix. For each instance of the activity, the corresponding cell of the matrix is incremented by the total duration of the activity instance.

When all the files have been parsed, by reading the matrix it is possible to create the conditional probability table of each activity.

In this way, we have learned the CPTs of all the activities in an automatic way. In the next section we describe the second method used to compute the CPTs.
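The duration-weighted counting described above can be sketched as follows. The instances are hypothetical, loosely mirroring the Sleeping and Toileting examples of Table 4.6, and the estimator shown normalizes the accumulated durations per label combination; the function names are illustrative.

```python
from collections import defaultdict

# Each training instance: (activity, duration in seconds, labels seen during it).
instances = [
    ("Sleeping", 19707, frozenset({"Bed", "Bedroom", "Night", "Morning"})),
    ("Toileting", 1939, frozenset({"Toilet", "Morning"})),
    ("Sleeping", 21000, frozenset({"Bed", "Bedroom", "Night"})),
]

# duration[combination][activity] accumulates seconds, playing the role of
# the activity-by-combination matrix described above.
duration = defaultdict(lambda: defaultdict(int))
for activity, secs, labels in instances:
    duration[labels][activity] += secs

def cpt_entry(activity, labels):
    """P(activity | label combination), estimated from accumulated durations."""
    combo = duration[frozenset(labels)]
    total = sum(combo.values())
    return combo[activity] / total if total else 0.0
```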

4.4.2 Manual Structure

In this case, we have implemented the relations between the nodes manually. We have created a file with the structure of the network, in which we have set for each


Second   Sensors Status                            Activities ID  Tags
10174    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Night
10175    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Night
10176    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Night
...      ...                                       ...           ...
29880    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Morning
29881    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  7 0           Bed Bedroom Morning
29882    0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0  11 0          Toilet Morning
29883    0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0  11 0          Toilet Morning
29884    0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0  11 0          Toilet Morning
...      ...                                       ...           ...
31820    0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0  11 0          Toilet Morning
31821    0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0  11 0          Toilet Morning

Table 4.6: Example of activity instances

activity the relation with each label or group of labels, together with the probability of the relation. The probability is either a high probability (0.99) or a low probability (0.01). After the creation of the file, we create the CPT of each activity. The method reads, from the file named Structure Manual, all the activity-label relations with the corresponding probability. After reading, we can create for each activity the file with its CPT. The method reads the structure of the net and for each activity creates the corresponding table with 2^N combinations, where N is the number of labels associated with the activity. For the combinations of labels that appear in the file, the probability that the activity is true given that combination is set to 0.99; for all the remaining combinations, the probability that the activity is true is set to 0.01. An example of a manually implemented CPT is shown in Table 4.8.
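The generation of a manual CPT with 2^N rows can be sketched as follows; the label set and the high-probability combinations below are illustrative assumptions, not the actual entries of the Structure Manual file.

```python
from itertools import product

def manual_cpt(labels, high_combinations, high=0.99, low=0.01):
    """Build a 2^N-row CPT for one activity. Combinations whose set of True
    labels is listed in `high_combinations` get the high probability; every
    other combination gets the low one."""
    cpt = {}
    for values in product([True, False], repeat=len(labels)):
        true_labels = frozenset(l for l, v in zip(labels, values) if v)
        cpt[values] = high if true_labels in high_combinations else low
    return cpt

# Hypothetical structure entry: Sleeping is likely when Bedroom+Bed+Night
# or Bedroom+Bed+Morning are active.
labels = ("Bedroom", "Bed", "Morning", "Noon", "Night")
cpt = manual_cpt(labels, {frozenset({"Bedroom", "Bed", "Night"}),
                          frozenset({"Bedroom", "Bed", "Morning"})})
```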

4.5 Data Clustering

Once the network has been created, we can perform recognition on new data. The data used to recognize the activities must first be clustered. The method used for the clustering was developed in the context of the BRIDGe project. Each file of the dataset is clustered independently. Clustering of the data is performed as follows:

• Labeled data are grouped into intervals of 20 minutes and merged together. The groups obtained are clustered by similarity.


Sleeping  Bedroom  Bed  Morning  Noon  Night  Probability
t         t        t    t        t     t      0.0651
t         t        t    t        t     f      0.0
t         t        t    t        f     t      0.0
t         t        t    t        f     f      0.0
t         t        t    f        t     t      0.0
t         t        t    f        t     f      0.0
t         t        t    f        f     t      0.0
t         t        t    f        f     f      0.0
t         t        f    t        t     t      0.0
t         t        f    t        t     f      0.0
t         t        f    t        f     t      0.0
t         t        f    t        f     f      0.0
t         t        f    f        t     t      0.0
t         t        f    f        t     f      0.0
t         t        f    f        f     t      0.0
t         t        f    f        f     f      0.0
t         f        t    t        t     t      0.7957
t         f        t    t        t     f      0.0662
t         f        t    t        f     t      0.0725
t         f        t    t        f     f      1.0
t         f        t    f        t     t      0.0
t         f        t    f        t     f      1.0
t         f        t    f        f     t      0.0
t         f        t    f        f     f      0.0005
t         f        f    t        t     t      0.0
t         f        f    t        t     f      1.0
t         f        f    t        f     t      0.0
t         f        f    t        f     f      1.0
t         f        f    f        t     t      0.0
t         f        f    f        t     f      1.0
t         f        f    f        f     t      0.0
t         f        f    f        f     f      1.0

Table 4.7: Example of CPT learned from data for the Sleeping activity


Sleeping  Bedroom  Bed  Morning  Noon  Night  Probability
t         t        t    t        t     t      0.9999
t         t        t    t        t     f      0.9999
t         t        t    t        f     t      0.9999
t         t        t    t        f     f      0.99
t         t        t    f        t     t      0.0
t         t        t    f        t     f      0.001
t         t        t    f        f     t      0.999
t         t        t    f        f     f      0.99
t         t        f    t        t     t      0.0
t         t        f    t        t     f      0.0
t         t        f    t        f     t      0.0
t         t        f    t        f     f      0.0
t         t        f    f        t     t      0.0
t         t        f    f        t     f      0.0
t         t        f    f        f     t      0.0
t         t        f    f        f     f      0.0
t         f        t    t        t     t      0.0
t         f        t    t        t     f      0.0
t         f        t    t        f     t      0.0
t         f        t    t        f     f      0.0
t         f        t    f        t     t      0.0
t         f        t    f        t     f      0.0
t         f        t    f        f     t      0.0
t         f        t    f        f     f      0.0
t         f        f    t        t     t      0.0
t         f        f    t        t     f      0.0
t         f        f    t        f     t      0.0
t         f        f    t        f     f      0.0
t         f        f    f        t     t      0.0
t         f        f    f        t     f      0.0
t         f        f    f        f     t      0.0
t         f        f    f        f     f      0.0

Table 4.8: Example of CPT with manual implementation for the Sleeping activity


• For each cluster, we compute the probability of each label it contains. The label probability is the ratio between the number of seconds in which the label is present in the cluster and the total number of seconds of the cluster.

Examples of clusters are:

Cluster26: {'Food': '1.0', 'Kitchen': '1.0', 'Noon': '1.0', 'Hall': '1.0'}
Cluster27: {'Seat': '0.02', 'Living': '0.96', 'TV': '0.94', 'Morning': '1.0'}
Cluster28: {'Evening': '1.0', 'Seat': '1.0', 'Living': '1.0'}
Cluster29: {'Food': '0.99375', 'Kitchen': '0.99375', 'Noon': '1.0', 'Hall': '1.0'}
Cluster30: {'Bed': '1.0', 'Evening': '1.0', 'Bedroom': '1.0'}
Cluster31: {'Toilet': '1.0', 'Water': '1.0', 'Morning': '1.0'}

In this case, cluster 31, for example, contains three labels with probability 1.0. This means that all the elements contained in cluster 31 are tagged with the labels Toilet, Water, Morning. The labels and probabilities contained in a cluster are used as evidence during classification. In the next section we explain how classification is performed by the Bayesian Network.
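The per-cluster label probability described above (seconds in which the label is present over the total seconds of the cluster) can be sketched as follows; the data are illustrative, roughly matching Cluster27 above.

```python
def cluster_label_probabilities(seconds):
    """Label probability = seconds in which the label is present in the
    cluster / total seconds of the cluster. `seconds` is a list of label
    sets, one per second of the cluster."""
    total = len(seconds)
    labels = set().union(*seconds)
    return {l: sum(1 for s in seconds if l in s) / total for l in labels}

# 100-second cluster: 'Morning' always present, 'TV' present for 94 seconds
seconds = [{"Morning", "TV"}] * 94 + [{"Morning"}] * 6
probs = cluster_label_probabilities(seconds)
```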

4.6 Classification

Classification is performed by the classifier, taking as input the files containing the clusters to be classified. As described in the previous section, each cluster contains a set of labels and the probability of each label. We use the label probabilities of each cluster as prior knowledge during classification. After reading a cluster, the classifier computes for each activity the probability that the cluster can be classified with that activity. Classification is divided into four steps:

1. First of all, we read each cluster one at a time and we extract the labels it contains and the corresponding probabilities. For each label, we modify the CPT file with the probability read from the cluster.

2. The classifier compares the labels in the cluster with the labels associated with the activity in the structure file. For the labels that are both in the cluster and in the activity structure, we set the probability equal to the one in the cluster. For the labels that are in the activity's structure but not in the cluster, we set the probability to the one in the label's CPT.


3. We take the probability of the activity corresponding to the label configuration. If a label in the activity structure is in the cluster, the label is set to True, otherwise it is set to False. Given the configuration, from the CPT of the activity we find the corresponding probability.

For example, if we consider the following cluster

Cluster1: {’Food’: ’1.0’, ’Kitchen’: ’1.0’, ’Noon’: ’1.0’}

and the activity Breakfast. Reading from the structure file, Breakfast is associated with the labels: Living, Seat, TV, Kitchen, Food, Hall, Morning, Noon. In this case, the CPTs of the three labels (Food, Kitchen, Noon) are modified by setting the probabilities found in the cluster, and we obtain the configuration ’FFFTTFFT’.

Once all the probabilities are retrieved, the algorithm computes the total probability of the activity.

4. After the classifier has computed the probability of each activity, the cluster is classified with the activity that has the maximum probability.

An example of classification can be as follow:

Cluster26: PreparingLunch 0.6461916912
Cluster27: WatchingTV 0.0074381553143058965
Cluster28: Dinner 0.7181160866319953
Cluster29: PreparingLunch 0.6381395369229376
Cluster30: Sleeping 0.46439812123848007
Cluster31: Toileting 0.4110190766266219

We set a threshold of 1%: if the probability of every activity is below this minimum value, the cluster is classified as Other activity. If, instead, the probability is zero for every activity, the cluster is classified as Unknown.
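The configuration string of step 3 and the decision rule of step 4 can be sketched as follows. This is an illustrative outline, not the thesis code: the activity probabilities are assumed to have already been computed by the Bayesian network, and all function names are hypothetical.

```python
THRESHOLD = 0.01  # the 1% minimum described above

def configuration(structure_labels, cluster_labels):
    # Step 3: a label present in the cluster maps to 'T', an absent one to 'F'
    return "".join("T" if l in cluster_labels else "F" for l in structure_labels)

def classify(activity_scores):
    # Step 4: pick the activity with the maximum probability,
    # falling back to Unknown (all zero) or Other (below threshold)
    if all(p == 0 for p in activity_scores.values()):
        return "Unknown"
    best, p = max(activity_scores.items(), key=lambda kv: kv[1])
    if p < THRESHOLD:
        return "Other"
    return best

# The Breakfast example from step 3:
breakfast_labels = ["Living", "Seat", "TV", "Kitchen", "Food", "Hall", "Morning", "Noon"]
cluster = {"Food", "Kitchen", "Noon"}
print(configuration(breakfast_labels, cluster))          # FFFTTFFT
print(classify({"Breakfast": 0.64, "Dinner": 0.02}))     # Breakfast
```

Note that the same threshold handles both fallback cases: clusters weakly explained by every activity become Other, while clusters with no support at all become Unknown.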


Experimental results

Our tests were performed using the datasets introduced in Section 4. In each test, we used one dataset to train our system. The evaluation of the method concerned the classification of the clusters and the comparison of the results with the ground-truth ADLs included in the dataset. The evaluation was carried out using Precision, Recall and Accuracy. In the following sections we analyze these performance metrics and discuss the classification results.

5.1

Validation Metrics

In order to verify the correctness and significance of the proposed method we performed a validation process. The expected outcome of the validation process is a quantified level of agreement between the data and the model's classification. The evaluation metrics can be calculated using the confusion matrix. An example of a confusion matrix is shown in Table 5.1. The rows show the ground-truth activities as provided by a human annotation, while the columns show the activities inferred by the model. The diagonal

                            Inferred
                     Activity 1  Activity 2  Activity 3
Ground  Activity 1      TP1         e12         e13       TT1
Truth   Activity 2      e21         TP2         e23       TT2
        Activity 3      e31         e32         TP3       TT3
                        TI1         TI2         TI3       Total

Table 5.1: Confusion matrix showing the true positives (TP), the total of ground-truth labels (TT) and the total of inferred labels (TI) for each class.

(52)

of the matrix contains the true positives (TPi) – all the elements of the i-th activity classified correctly – while the sum of a row gives the total of ground-truth instances of that activity (TT), and the sum of a column gives the total of instances classified as that activity (TI). Considering a single activity, we can define Precision as the fraction of classified instances that are correct (Equation 5.1), while Recall is defined as the fraction of instances of the activity that are correctly classified (Equation 5.2). Accuracy measures the ratio of correct predictions to the total number of cases evaluated (Equation 5.3). We calculate precision, recall and accuracy for each activity separately.

Precision = # true positives / (# true positives + # false positives)    (5.1)

Recall = # true positives / (# true positives + # false negatives)    (5.2)

Accuracy = (# true positives + # true negatives) / (# true positives + # false negatives + # true negatives + # false positives)    (5.3)
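Equations 5.1-5.3 can be computed per class directly from a confusion matrix like the one in Table 5.1. The sketch below uses an invented 3×3 matrix purely for illustration; the function is ours, not the thesis code:

```python
def per_class_metrics(cm, i):
    """Precision, Recall and Accuracy for class i of a confusion matrix
    where rows are ground truth and columns are inferred labels."""
    total = sum(sum(row) for row in cm)
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(len(cm))) - tp  # column total (TI) minus TP
    fn = sum(cm[i]) - tp                             # row total (TT) minus TP
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total
    return precision, recall, accuracy

cm = [[8, 1, 1],   # Activity 1: TP1 = 8, TT1 = 10
      [2, 6, 2],   # Activity 2: TP2 = 6, TT2 = 10
      [1, 3, 6]]   # Activity 3: TP3 = 6, TT3 = 10
p, r, a = per_class_metrics(cm, 0)
print(round(p, 3), round(r, 3), round(a, 3))  # 0.727 0.8 0.833
```

Since accuracy counts true negatives, it tends to be high even for poorly recognized classes, which is why precision and recall are reported alongside it.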

5.2

Results

To test the method we used different datasets, as described in Section 4. The results we obtained differ considerably depending on the data used to train the network. In the following, we describe the results obtained in each test.

5.2.1

ARAS Dataset

The first test was run by training the network with the ARAS dataset, containing data collected over 30 days from two people, and testing it on a generated dataset of 89 days. The network was trained to recognize only the main activities: Breakfast, Lunch, Dinner, Cleaning, Sleeping, Toileting, Shower, Relax, Reading, Watching TV, Studying, Going Out, Other. In the second test, we again used the entire dataset for training the network but, in addition to the main activities, we also recognized the three sub-activities defined in Section 4.2: Preparing Breakfast, Preparing Lunch, and Preparing Dinner.


An overview of the results of the first test is depicted in Figure 5.1. We can notice that only Breakfast, Sleeping and Toileting, with 86%, 100% and 58% Recall respectively, as shown in Figure 5.3, are well classified. However, as depicted in Figure 5.2, the precision of Breakfast and Toileting is very low, approximately 22% and 10%. Indeed, as we can see in Figure 5.1, Breakfast captures many instances of Lunch, Cleaning and Watching TV, whereas Toileting includes instances of Reading, Cleaning, Lunch and Going Out. The remaining activities are marginally recognized or in many cases incorrectly classified as other activities: Watching TV is confused with Breakfast or Lunch and only 17% of it is correctly classified. Since these activities are performed in similar ways and the available sensors do not allow us to know precisely when a person is watching TV, it is difficult to distinguish among them. Furthermore, Watching TV captures 77% of Dinner and 55% of Relax instances; for this reason its precision is approximately 1% and its accuracy is 75%, as shown in Figures 5.2 and 5.4. We can see that Lunch and Cleaning are both classified incorrectly as Breakfast, in 35% of cases. Lunch is probably confused because the same activity is performed in two different ways in the training dataset (where it happens late in the morning) and in the one used for classification (where it happens early in the morning). Indeed, inspecting the ARAS dataset we can see that the activity labeled as Breakfast is performed at Noon. A similar problem occurs with the Cleaning activity, because it is always performed in the Kitchen in the same way as Breakfast. Since the number of instances of Cleaning in the dataset is small compared to that of Breakfast, there is no way to correctly classify this activity. 53% of Having Shower is unrecognized and classified as Other, while 15% is confused with Toileting: only 8% is correctly classified.

The main cause of these results is the fact that the training dataset came from a different person with respect to the one classified. Moreover, the dataset used for training contains noise. One of our assumptions is that two activities cannot be performed at the same time. However, this is not completely true: in most cases two activities are perfectly separable, for example Breakfast and Toileting or Dinner and Shower, but in a few cases two activities are performed activating the same sensors, making them more difficult to distinguish, as in the case of Sleeping and Reading.

The second test was run by training the network to recognize, in addition to the main activities, also Preparing Breakfast, Preparing Lunch and Preparing Dinner. As depicted in Figure 5.5, activities are recognized, on average, better than in the

