Dipartimento di Informatica

Corso di Laurea Specialistica in Informatica

A neural network for the identification of tortoises' nesting phase

Master's Thesis in Computer Science (Tesi di Laurea Specialistica in Informatica)

Candidate: Rita Pucci

Supervisor: Prof. Alessio Micheli
Supervisor: Prof. Roberto Barbuti
Supervisor: Prof. Stefano Chessa
Co-examiner: Prof. Umberto Barcaro

11 October 2013

To Mum and Marina, and to Dad


Abstract

In recent years, environmental pollution has caused a global reduction of natural assets (plant and animal biodiversity). Many animal species are in danger, and tortoises are among the most threatened. Marine, terrestrial and pond species alike are classified as vulnerable or critically endangered in the Red List of Threatened Species provided by the International Union for Conservation of Nature (IUCN).

Our aim is to support the growth of tortoise populations, to reduce their possible extinction. A common approach used to protect and enhance wild tortoise populations consists in localizing the nesting sites, collecting the eggs, letting them hatch in captivity, and keeping the specimens in these conditions during the first year of their life. The female tortoise covers the nest after the deposition phase, which makes the exact identification of the nest location difficult. To allow a timely and easy localization of the nesting sites, the Tortoise system [1] has been proposed: a sensor-based system capable of localizing tortoises in the early stages of their egg deposition phase through movement signal analysis, and of transmitting their geographic coordinates to a remote control center in real time, through long-range wireless communication.

This thesis is part of this project: it studies the sensor data in order to find factors that discriminate the nesting movement signals from the other movement signals. This research carries out a data analysis to determine the best method to recognize the digging movements. In fact, the tortoise makes a characteristic movement during the nesting phase, which can be identified by a periodic signal similar to a square wave. Using machine learning we can discern digging signals from the others. We implemented a feed-forward neural network that can predict, with good probability, whether a signal is a nesting one. By using this computational intelligence mechanism we can monitor the life of the tortoise and increase the life expectancy of hatchlings.

Research on monitoring animal life is the main way to prevent animal extinction. The Charles Darwin Research Centre¹, through ecological monitoring, studies animal life and the relationship between animals and biodiversity. Tortoise protection is also promoted by the European Union through the LIFE+² programme, which is aimed at funding research activities targeted to maintain and increase biodiversity.

¹http://www.darwinfoundation.org/en/science-research/marine-research/
²http://ec.europa.eu/environment/life/funding/lifeplus.htm

Contents

1 Introduction
1.1 Ambient intelligence
1.2 LIFE+

2 Background
2.1 Localization system
2.2 Wireless sensor networks
2.2.1 Sensor Board Device
2.3 Normalization
2.4 Smoothing data and filtering
2.4.1 Moving average filter
2.5 Tortoise localization system
2.5.1 Correlation analysis
2.5.2 Machine Learning

3 Datasets
3.1 Data Collection
3.1.1 Registrations
3.1.2 Data Analysis
3.2 Datasets construction
3.2.1 Normalization Data
3.2.2 Normalization reductive
3.2.3 Filtering
3.2.4 Down sampling

4 Learning Methods and experimental results
4.1 Signal Processing: Correlation
4.1.1 Correlation on original signals
4.1.2 Correlation on Normalization [0,1] datasets
4.1.3 Correlation on Normalization by minimum subtraction datasets
4.1.4 Correlation on Mean Normalization datasets
4.1.5 Observations
4.2 Machine Learning: artificial neural network
4.2.1 Input data
4.2.2 Task and classification rules
4.2.3 MultiLayer Perceptron (MLP)
4.2.4 Convolutional Neural Networks (CNN)
4.2.5 CNN with weight sharing [2]
4.2.6 Final test module
4.3 Validation and final results

5 Conclusions

A Other Experimental results
A.1 MLP
A.2 CNN

List of Figures

1.1 Fences
2.1 MTS310
2.2 Sensor Board Architecture
2.3 Samples of signals
2.4 Samples of signals
2.5 Learning with teacher
2.6 ANN model: Perceptron
2.7 Backpropagation algorithm
2.8 Multilayer Perceptron
2.9 Input Delay Neural Network (IDNN)
2.10 IDNN Slide
2.11 CNN
3.1 Position of localizer
3.2 Position during the data collecting
3.3 SensorUse
3.4 Data recorder
3.5 Sample patterns of training set
3.6 Characteristic dig pattern
3.7 Original signals
3.8 Min Max normalization
3.9 Mean normalization
3.10 Reductive normalization
3.11 Outliers filtering
3.12 Noise filtering
3.13 Applying down sampling
4.1 Correlation
4.2 Target patterns dataset
4.3 Correlation with dataset normalized between [0,1]: an example with a walk signal
4.4 Correlation with dataset normalized between [0,1]: an example with a dig signal
4.5 Correlation with dataset normalized by subtracting the minimum: an example with a walk signal
4.6 Correlation with dataset normalized by subtracting the minimum: an example with a dig signal
4.7 Correlation with mean normalization dataset: an example with a walk signal
4.8 Correlation with mean normalization dataset: an example with a dig signal
4.9 Sliding window for IDNN
4.10 Neural Network Output meaning
4.11 Neural Network output
4.12 Selection value for average
4.13 Selection value over the threshold
4.14 Neural Network architecture
4.15 Sliding of window on signal
4.16 Output with Min Max normalization
4.17 Output with mean normalization
4.18 Output with reductive normalization
4.19 Characteristic pattern
4.20 CNN architecture graph
4.21 Example of sub-sequences on two different patterns
4.22 Output with Min Max normalization
4.23 Output with Mean normalization
4.24 Output with reductive normalization
4.25 Characteristic pattern
4.26 CNN with weight sharing graph
4.27 Output with min max normalization
4.28 Output with reductive normalization
4.29 MSE learning curve with MLP
4.30 Cross validation MLP (base model)
4.31 Histogram negative threshold
4.32 Histogram negative threshold
4.33 Compare sample pattern with real pattern
4.34 Cross validation CNN
4.35 MSE learning curve with CNN
4.36 MSE learning curve with CNN with weight sharing
4.37 Cross validation CNN with weight sharing (optimized model)
4.38 Validation MLP: threshold values
4.39 Validation CNN with weight sharing: threshold values
4.40 Validation CNN with weight sharing: threshold values
4.41 Test values
A.1 Models presented for mean normalization
A.2 Models presented for min max normalization
A.3 Models presented for reductive normalization
A.4 Models presented on CNN

List of Tables

3.1 Index form, walk phase
3.2 Index form, digging phase
3.3 Index form, eating phase
3.4 Range of values of datasets without transformation
3.5 Range of values of datasets transformed by mean normalization
3.6 Range of values of datasets transformed by reductive normalization
4.1 Means of correlation values with long signal normalized between [0,1]
4.2 Means of correlation values with long signal normalized by subtracting the minimum
4.3 Means of correlation values with long signal mean normalized
4.4 Memory occupation of input data
4.5 Memory occupation
4.6 Memory occupation


Acknowledgments

At the end of this thesis I wish to thank the many people who helped me, or who were simply there, during my university career. First of all I want to thank Prof. Micheli, who throughout this year of study and thesis development supported and encouraged me in improving the "Puccian world", making it a little more rigorous.

Thanks to Prof. Chessa and Prof. Barbuti for giving me the means to carry out this research, and good advice on how to continue it.

I want to thank my mother, who believed in my desire to study with the strength that always distinguishes her.

A special thanks to my sister, who supported me in everything and helped me write this thesis with her utmost commitment.

A huge thanks to Federico, who this year took on the worst of me every day without asking for anything in return.

And last, but not least, a big thank you to all those friends who were always close to me, each in their own way, each with their own points of view, each with their own advice and ideas: an "everyone" that made the difference.

Thanks to everyone for everything, and enjoy life.


1 Introduction

The first year of life of tortoise hatchlings is the most dangerous of their life, due to the numerous predators attacking both the eggs and the hatchlings. It is necessary to localize the nesting sites in order to protect them. The Tortoise@ project deals with this specific issue and has developed a system to monitor the environmental conditions of the tortoises¹. This system uses a wireless sensor board with an accelerometer sensor, environmental sensors, a geographic localization system (GPS) and a radio. The size and weight of the device are very limited because the sensor board is attached to the carapace of the tortoise. The computational intelligence mechanism collects the accelerometer signals and classifies the movements: a nesting movement or any other movement. The mechanism has to perform an event detection task; the localization module is developed to identify the tortoise's nests. The Tortoise@ project has been divided into four stages, which contribute to finding the nests.

• Environmental Monitoring Stage: during this stage the environment is monitored continuously to identify the best values of temperature and light for a nesting phase. The temperature indicated for the nesting phase is between 20 °C and 26 °C. During the period at Massa Marittima we could detect the best hours for deposition: from 8 to 11:30 AM and from 6 to 9 PM. Whenever the external conditions are appropriate for deposition, the system enters the Movement Monitoring Stage.

• Movement Monitoring Stage: the accelerometer is activated and the monitoring of the tortoise's movements begins. The system stores five minutes of movement signals. This time interval allows the system to establish whether the tortoise is moving or at rest. In this stage a first classification of the movement is made: if the signal is classified as a nesting signal, the system switches to the Extended Movement Monitoring Stage; otherwise it returns to the previous stage.

¹Prof. Roberto Barbuti, Prof. Alessio Micheli, Prof. Stefano Chessa and Prof. Giuseppe Anastasi submitted this invention to the "Camera di commercio Industria, Artigianato e Agricoltura di MILANO" on 21 December 2011.


• Extended Movement Monitoring Stage: this stage verifies whether the tortoise is actually in the nesting phase. Every ten minutes a signal of five minutes' duration is stored and classified. This period of time is the interval needed to obtain approximately six repetitions of the periodic values that characterize the right and left paw movements. The neural network makes three classifications every ten minutes and then decides whether it is a real nesting phase or not. The robustness and the universal approximation property of neural networks provide the flexibility needed to approximate arbitrary classification functions from experimental data, even without a theory of the pattern characteristics. This feature is useful when we analyse noisy signals recorded in a natural environment; in this situation the neural network has to classify without knowledge of animal species and situations. Given n positive classifications and a threshold k, if n > k the phase is classified positively (a minimal sketch of this decision rule follows the list below). If the classification is positive, the system starts a radio communication to an operator.

• Data Communication Stage: the system takes the precise position through the localization module and communicates it to the operator. Then the cycle restarts.

The aim of the thesis presented here is to collect the data and detect the type of event. The data collection took place in the Mediterranean tortoise protection center at Malfatto (Cecina), where the author spent a month in 2012. The observation of the animals' movements helps to understand the characteristics of the signals.
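As promised above, the k-of-n decision rule can be sketched in a few lines of Python; the function name and the example values are ours, not from the thesis:

    def nesting_decision(classifications, k):
        """k-of-n vote: each classification is +1 (nesting) or -1 (other)."""
        n_positive = sum(1 for c in classifications if c == 1)
        return n_positive > k

    # Three classifications every ten minutes, e.g. with threshold k = 1
    print(nesting_decision([1, 1, -1], k=1))   # True: two positives exceed k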

The breeding center consists of many fences, each of which groups together a single species to avoid hybridisation. The species of tortoises are Testudo hermanni hermanni, Testudo hermanni boettgeri, Testudo hermanni hermanni from Sardinia, Testudo hermanni hermanni from Libya, Testudo graeca and Testudo marginata. The dataset was composed of signals recorded from all species.

Each fence groups many individuals, both female and male, to guarantee the deposition of new eggs and to increase the population. The fences (see fig. 1.1) are big enough to allow the tortoises free movement, simulating natural life.

In order to identify a nesting phase it was necessary to recognise the characteristics of a nesting signal. We then focused our attention on the choice of the best computational intelligence mechanism. This choice has to consider the hardware limitations and the goals of the research.

Two methods were chosen in this thesis to identify the signal: correlation is used to analyse the data, and the neural network for the main identification study.

We identified a characteristic pattern that can be recognised in a signal. The pattern is searched for in the signal in order to classify it. We use it to carry out the training of the neural network, so that this mechanism can be used for classification.


Figure 1.1: Fences

We were able to show that this procedure works to identify the nesting location with good results. We then started a process of optimization of the neural network structure in order to accommodate the limitations of the hardware.

This thesis begins with a background analysis to introduce the methods used for this research and the specific problem (chapter 2). Sections 3.2.1 and 3.2.3 deal with the normalization and filtering applied to the signals. These methods are necessary to simplify the signal: because the signal is recorded by a sensor board located on an animal, it includes noise and outliers.

The initial data analysis describes only the trend of the signal, in order to emphasize those features that characterize and identify the nesting activity. This first data analysis allows us to develop methods (chapter 4) to identify signals.

The correlation method (section 4.1) uses a cross section to identify a particular nesting trend. We began the research in signal processing using the correlation between a nesting signal cross section and a complete tortoise signal, with the aim of understanding the best way to identify the characteristic trends. By measuring the similarity of two signal forms, correlation is a possible method to identify the nesting phase.

This process can be considered a preliminary stage of the study of the signals. It improves the knowledge of the problem; in fact, using correlation provides a different point of view and underlines some difficulties in classifying natural signals.


However, because the cross section of a nesting signal was generalized, it restricted the number of identified signals only to those which are very similar to the cross section itself. Therefore, the research was extended to the use of learning methods, in particular the Artificial Neural Network (chapters 4.2 and 4.3). This method tries to emulate the pattern recognition that characterizes the human neural network. For this second method we implemented a feed-forward neural network that can identify a nesting signal. This result is achieved after a training phase of the neural network with characteristic patterns.

1.1 Ambient intelligence

Wireless sensor network and neural network research are combined in this thesis. Developing a learning process on Wireless Sensor Networks (WSN) represents a challenge with respect to data processing and hardware limitations [3]. The Artificial Neural Network (ANN) is an important learning paradigm [4].

Research on wireless sensor networks and artificial intelligence together builds the interdisciplinary concept of ambient intelligence; J. Cook [5] describes one of its developing application areas as the monitoring of the entire life.

WSNs are attracting great interest in a number of application domains concerned with monitoring and control of physical phenomena.

Paradigms of Computational Intelligence (CI) have been successfully used in recent years to address various challenges such as data aggregation and fusion, energy-aware routing, task scheduling, security, optimal deployment and localization. CI provides adaptive mechanisms that exhibit intelligent behaviour in complex and dynamic environments like WSNs.

CI includes neural networks, evolutionary computation, fuzzy logic, swarm intelligence, and many more paradigms [6].

Our research investigates the use of a neural network to develop intelligent behaviour of the sensor board.

We decided to use the input delay neural network for signal recognition. An application of this model is speech recognition [7] [8]. We apply this method to recognise a characteristic pattern.

The memory limitation obliged us to use a Convolutional Neural Network (CNN) to reduce the memory occupation of the weight values. The CNN is used for face recognition [9] and for visual document analysis [10].

The computational intelligence method developed in this thesis is an ambient monitoring system to support the increase of the tortoise population.

The idea of a sensor network used for ambient monitoring appears in many works on ambient monitoring and ambient intelligence, such as that of Genovese [11], who uses a sensor network to monitor forests.

A sensor network for ambient intelligence is presented by Benini [12]; the authors summarize the evolution trends of wireless sensor network nodes.


Our research is part of an advanced project that offers many research areas. We built our research on event detection by the computational intelligence of the sensor. An application of event detection using sensors is also presented by Guralnik [13], who uses data-mining techniques to extract interesting patterns from time series data generated by sensors monitoring varying phenomena.

1.2 LIFE+

The LIFE programme is the EU's funding instrument for the environment. The general objective of LIFE is to contribute to the implementation, updating and development of EU environmental policy and legislation by co-financing pilot or demonstration projects with European added value. LIFE+ is the third financial phase of the LIFE programme; it has been operative since 2007 and is applied each year. LIFE+ Nature and Biodiversity is one of the main strands of the European Union's funding programme for the environment. This thesis is part of a project proposed within this programme. The project objective is the repopulation of Testudo hermanni tortoises. The LIFE T. hermanni project will be conducted in two Italian Natura 2000 sites, "Pineta Granducale dell'Uccellina" and "Monti dell'Uccellina", both part of the Maremma Regional Park (Parco Regionale della Maremma), in Tuscany. The project is proposed by Pisa University and Florence University and is presented to the programme with the following description, taken from the technical application forms submitted to the programme.

The Testudo hermanni tortoise had a wide distribution in Italy until the 1980s. Since then, the range of the species has decreased and now the species occurs in few, scattered populations across the Italian peninsula, Sardinia and Sicily. Two of the remaining Italian wild Hermann's tortoise populations are located in the two Natura 2000 sites "Pineta Granducale dell'Uccellina" and "Monti dell'Uccellina", both part of the Maremma Regional Park (Parco Regionale della Maremma), Tuscany. Our project aims at increasing the density of these populations in the two Natura 2000 sites, integrating molecular, morphometric, information and communication techniques and practical conservation actions. The main objectives of the project are: 1) to increase the density of the target tortoise populations through a reinforcement program; 2) to monitor the populations after the reinforcement; 3) to present the goals and results of the project to a generic and specialized public; 4) to export the methods and techniques used in the project to other stakeholders. The reinforcement program will be based on adult captive individuals, captive-reared individuals born from captive females, and captive-reared individuals born from local wild females. A monitoring device, developed by the University of Pisa and currently patent pending, will be attached to wild females in order to identify nesting spots for subsequent egg collection. Selection of individual tortoises for release and captive breeding will be based on morphometric and genetic analyses. All the released individuals will be marked with an RFID microchip and data will be recorded in a database. During population monitoring, software will be used to update the database with new information on tortoises using a smartphone. Long-term population persistence will be predicted using computational and mathematical models. The project methods and results will be made public to other Italian and European stakeholders that can apply these techniques to similar case studies. The tortoise reinforcement program is developed according to the IUCN guidelines for species reintroduction (see expected results). The project includes a feasibility study, a preparatory phase, a re-introduction phase and a follow-up phase.

This thesis provides the nest identification system using machine learning; it is the key component for developing the localization system.


2 Background

In this chapter we present the problem approached in this thesis. Then we describe methods and means that we have used to study the problem.

All of this is done underlining the characteristics most interesting for this project. First we describe the architecture of the proposed tortoise localization system: we detect the environmental conditions propitious for nesting and record signals with an accelerometer sensor on the sensor board. To understand the signal structure, we study the application of correlation. This method is used for a preliminary study of the signals; correlation is one of the methods available to identify a characteristic pattern in a signal.

After that, we study the recorded signals to understand the best method to be used for the system's learning. The proposed method is machine learning: an adaptive system that changes its parameters during a learning phase based on the inputs.

Then we implement the neural network and optimize it for the sensor board. We compare the results and the memory occupation of this method with those of correlation, in order to understand the neural network's performance.

2.1 Localization system

The localization system aims to identify the nest position in order to prevent tortoise extinction. As described in sec. 1.2, this system needs a method to identify the nesting phase. It must have tools at its disposal in order to record signals and communicate positions.

The tools are:

• a GPS to identify the tortoise’s position;

• a wireless communication system (such as the Universal Mobile Telecommunications System (UMTS));

• an analysis of data from sensors.


During deposition, a tortoise performs specific movements for the excavation of the nest. These movements allow us to identify the deposition phase. We aim to implement a neural network to obtain this result.

Since the deposition normally takes more than one hour, the operators involved in egg collection have time to reach the reported position and identify the animal, even in the presence of limited errors of the global positioning system.

The localization system includes:

• the steps of acquiring and processing the numerical data provided by sensors;

• a predictive system of classification;

• an output stage for returning the final response.

The nest identification is performed by the predictive system. This is "embedded" in a standalone device.

We developed the predictive system using machine learning methods. For this task we implemented a neural network: it takes as input a signal from the accelerometer and predicts the tortoise's activity.

The nest identification is repeated two times with a delay of 10 minutes, to verify the prediction's truthfulness. This repetition of the identification helps prevent intervention in the case of an abandoned nest.

This system proposes a new method to monitor the tortoises' life; it is necessary for the repopulation program.

2.2 Wireless sensor networks

WSN are networks of distributed autonomous devices that can sense or monitor physical or environmental conditions cooperatively.

The WSNs are composed of a set of nodes that:

• are small and non-invasive;
• communicate through wireless interfaces;
• have a set of transducers to acquire environmental data;
• have a microprocessor and a memory;
• can run simple software programs;
• are battery powered.

They are used in numerous applications such as environmental monitoring, habitat monitoring, prediction and detection of natural calamities, medical monitoring and structural health monitoring. We use a WSN. The network used for experimentation consists of two sensor boards and a base system. The base system is a computer that represents a sink node. The sink node receives the data communications of the nodes. Future WSNs will consist of a large number of small, disposable and autonomous sensor nodes that are generally deployed in an ad hoc manner in vast geographical areas for remote operations.

Sensor nodes are severely constrained in terms of:

• Storage resources: most sensor boards used in WSNs have only a limited memory available, which constrains the programs that can be developed;

• Computational capabilities;

• Power supply: nodes in most WSNs have limited energy, and communication tasks consume the maximum power available to the sensor nodes, so it is necessary to limit their use.

Normally the sensor nodes are grouped in clusters, and each cluster has a node that acts as the cluster head. All nodes forward the sensor data to the cluster head node, which in turn routes it to a sink node. However, very often the sensor network is rather small and consists of clusters with a single node. Real deployments of WSNs usually implement one of three general applications:

• Periodic reporting: at regular intervals the sensors sample their environment, store the sensory data and send it to the sink node. In these applications the data traffic and volume are predictable.

• Event detection: nodes sense the environment and evaluate the data immediately. If useful data is detected, it is transmitted to the sink node. In this case only a small amount of data has to be exchanged for route management, and only if the event is detected.

• Database-like storage: all sensory data is stored locally on the nodes. The sink searches for interesting data and retrieves it from the nodes directly.

CI is defined at each node as a computational model of intelligence capable of inputting numerical sensory data directly, processing it by the chosen paradigm and generating a response.

We can identify our problem as event detection. CI combines elements of learning, adaptation and evolution to create intelligent machines.

The sensor node is autonomous and programmed with computational intelligence paradigms that can recognise the searched-for situation. The study of these paradigms can facilitate the intelligent behaviour of the sensor node in a complex environment: it can exhibit the ability to learn or to adapt to the situation. The research in this thesis is based on the evaluation of two different paradigms of CI, which are explained in sec. 2.5.2. These two paradigms are correlation and the neural network.


Figure 2.1: MTS310

2.2.1 Sensor Board Device

The tortoise carapace is a highly complicated shield for the ventral and dorsal parts. It consists of modified bones such as the ribs and part of the pelvis. The bone of the shell consists of both skeletal and dermal bones.

The complete enclosure of the shell probably evolved by including dermal armor into the rib cage. The carapace size differs between different species of tortoise. At Massa Marittima there are many species of the more widespread terrestrial tortoises. The species studied are Testudo hermanni hermanni, Testudo hermanni boettgeri, Testudo hermanni graece, Testudo hermanni from Libya, Testudo hermanni from Sardinia, Testudo hermanni marginata and Testudo hermanni from Veneto. The plastron dimension of their shell ranges from a maximum of 19.4 cm with a carapace height of 11 cm for a T. hermanni boettgeri to a minimum of 10.4 cm with a carapace height of 7.6 cm for a T. hermanni hermanni. These data limit the dimensions of the device used for registration. The fact that the tortoise walks under plants and penetrates into narrow spaces requires a small device. Moreover, the device should also be light, in order not to impede the movements of the carapace itself. In addition, the device has to have a shape that does not interfere with the normal behaviour of the tortoise.

Sensor Board MTS310

The device used for the registrations is the MTS310, shown in figure 2.1. It is a flexible sensor board with a variety of sensing modalities, which can be exploited in developing sensor networks for a variety of applications, including vehicle detection, low-performance seismic sensing, movement, acoustic ranging, and robotics.

The sensors we used at Massa Marittima are the light sensor, the temperature sensor and the accelerometer. The light sensor is a simple CdSe photocell; the maximum sensitivity of the photocell is at the light wavelength of 690 nm. The thermistor in the MTS310 is a surface-mount component. It is configured in a simple voltage divider circuit with a nominal mid-scale reading at 25 °C.


Figure 2.2: Sensor Board Architecture

The Mote's ADC output can be converted to degrees Kelvin using the following approximation over 0-50 °C:

1/T(K) = a + b · ln(Rthr) + c · [ln(Rthr)]³

where:

Rthr = R1 · (ADC_FS − ADC) / ADC
a = 0.00130705
b = 0.000214381
c = 0.000000093
R1 = 10 kΩ
ADC_FS = 1023
ADC = output value from the Mote's ADC measurement.
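As a sanity check of this conversion, the following Python sketch applies the formula above to a raw reading; the function name and the example value are ours, not part of the MTS310 documentation:

    import math

    # Coefficients and circuit constants as listed above (MTS310 manual values)
    A = 0.00130705
    B = 0.000214381
    C = 0.000000093
    R1 = 10000          # ohms, divider resistor
    ADC_FS = 1023       # 10-bit full-scale ADC value

    def adc_to_celsius(adc):
        """Convert a raw thermistor ADC reading to degrees Celsius."""
        r_thr = R1 * (ADC_FS - adc) / adc          # thermistor resistance
        ln_r = math.log(r_thr)
        t_kelvin = 1.0 / (A + B * ln_r + C * ln_r ** 3)
        return t_kelvin - 273.15

    print(adc_to_celsius(512))   # mid-scale reading, roughly 25 °C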

The accelerometer is a MEMS surface micro-machined 2-axis sensor. It features very low current draw (< 1 mA) and 10-bit resolution. The sensor can be used for tilt detection, movement, vibration and/or seismic measurement.

This sensor board has to carry out the entire process of data acquisition and data recognition.

We show in fig. 2.2 the sensor board architecture. It includes the following components:

• Acquisition Module: the environment sensors used are temperature and light intensity; the movement sensor used is the accelerometer;

• Localization Module: a GPS will be present on the sensor board, to locate the monitored tortoise;

• Processing Module: the sensor board contains a micro-controller and (RAM+Flash) memory that allow it to store and process the received data;

• Communication Module: a long-range radio to communicate the tortoise's location in real time;

• Power Supply Unit: one or more batteries that feed all the system components.

We used the device to register cross-sections of nesting, walking and eating during the period at the Massa Marittima tortoise breeding farm.

I spent one month there to observe their life style. I followed a procedure designed to obtain varied and consistent signal data. A varied cross-section dataset supports the data processing needed to understand the characteristics of the signal. To register a signal we followed a procedure chosen according to the following observations:


• Place the device on the carapace: when the female starts to dig a possible nest, we place the device in a chosen position;

• Start the device software module to register data: we wait some seconds before starting the registration, because the placement of the device can interfere with the female tortoise's mood and she may stop digging;

• Stop the device: when the tortoise's nesting phase finishes, i.e. when she starts the deposition phase.

The recorded digging signals compose the positive signals. But it is necessary to record not only the positive signals but also the negative ones, in order to understand the differences.

2.3 Normalization

Real-world data can be extremely complicated to interpret without preprocessing. The first problem is the definition range of the values.

The data values can be so high as to provoke the saturation of the machine learning synaptic weights during the learning phase, blocking it². We therefore decided to transform the data signal to improve the effectiveness and the performance of the learning algorithms. There are many kinds of normalization; in this work we decided to apply two types.

• Min-Max Normalization: the attribute data is scaled to fit into a specific range. One linearly transforms the real data values so that the minimum and the maximum of the transformed data take certain values, in this case 0 and 1. It is obtained by subtracting the population minimum and then dividing by the difference between maximum and minimum:

x' = (x − x_min) / (x_max − x_min)

where x is the value to normalize, and x_min and x_max are respectively the minimum and the maximum of the data.

• Mean Normalization: the mean value is the average over the entire training set. The normalized value is close to zero, or small compared to its standard deviation. A positive normalized value represents a datum above the mean, while a negative one represents a datum below the mean. It is a dimensionless quantity obtained by subtracting the population mean from an individual current value and then dividing the difference by the population standard deviation:

x' = (x − x̄) / σ

where x̄ is the mean value of the data and σ is the standard deviation. The absolute value of x' represents the distance between the current value and the population mean in units of the standard deviation.

The two proposed normalizations map the signal into a range close to zero, scaling it into a new range. This allows the signal to be defined using floating-point numbers.

²To appreciate the practical significance of this rule, consider the extreme case where the input variables are consistently positive. In this situation, the synaptic weights of a neuron in the first hidden layer can only increase together or decrease together. [4]

In order to reduce the memory occupation of the localization system, we looked for a way to represent the signals using integer numbers. To that end a third transformation of the signal is presented: a simple remapping of the signal close to zero, without scaling, obtained by subtracting the minimum over the whole training set from each value.
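A minimal Python sketch of the three transformations described in this section, assuming the signals are held as NumPy arrays (the function names and the toy readings are ours):

    import numpy as np

    def min_max_normalize(x):
        """Scale the signal linearly into [0, 1]."""
        return (x - x.min()) / (x.max() - x.min())

    def mean_normalize(x):
        """Subtract the mean and divide by the standard deviation."""
        return (x - x.mean()) / x.std()

    def reductive_normalize(x):
        """Remap close to zero without scaling, preserving integer values."""
        return x - x.min()

    signal = np.array([480, 510, 495, 530, 470])   # toy accelerometer readings
    print(min_max_normalize(signal))
    print(mean_normalize(signal))
    print(reductive_normalize(signal))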

2.4 Smoothing data and filtering

Natural movement data presents noise and outliers. The movements are subject to environmental obstacles, and these variations of the original movements could affect the learning results. In this thesis we use the moving average filter to smooth this noise.

2.4.1 Moving average filter

The moving average filter is an outlier detection method in which a neighborhood of observations, called a filtering window, is used to assess the validity of a new observation on the basis of its relative distance from the closest neighborhood. The moving average filter is a type of Finite Impulse Response (FIR) filter that operates by averaging a number of points from the input signal to produce each point in the output signal. Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by "shifting forward", i.e. excluding the first number of the series and including the number following the original subset in the series. This creates a new subset of numbers, which is averaged. In a cumulative moving average, the data arrive in an ordered stream and one takes the average of all of the data up to the current point. Considering n data points, let i be the index of the ith output value, j the index inside the subset interval, and x_{i+j} the jth value after x_i:

y_i = (1/n) Σ_{j=0}^{n−1} x_{i+j}


This process has the effect of a low-pass filter, with the response of the smoothing given by the difference equation. As an alternative, the group of points from the input signal can be chosen symmetrically around the output point:

y_i = (1/n) Σ_{j=−n/2}^{n/2} x_{i+j}

This filter is very simple, yet it can remove much of the noise in a signal while keeping the shape of the original signal.

To understand why the moving average is the best solution, imagine we want to design a filter with a fixed edge sharpness. For example, let's assume we fix the edge sharpness by specifying that there are eleven points in the rise of the step response. This requires that the filter kernel have eleven points. ... Since the noise we are trying to reduce is random, none of the input points is special; each is just as noisy as its neighbor. Therefore, it is useless to give preferential treatment to any one of the input points by assigning it a larger coefficient in the filter kernel. The lowest noise is obtained when all the input samples are treated equally, i.e., the moving average filter. [14]
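A short sketch of the causal form of the filter, y_i = (1/n) Σ_{j=0}^{n−1} x_{i+j}, under the assumption that NumPy is available (names and data are ours):

    import numpy as np

    def moving_average(x, n):
        """Each output point is the mean of n consecutive input points."""
        kernel = np.ones(n) / n   # equal weights, as the quoted argument suggests
        return np.convolve(x, kernel, mode="valid")

    noisy = np.array([480.0, 510, 495, 530, 470, 505, 490, 520])
    print(moving_average(noisy, n=4))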

2.5 Tortoise localization system

The signals obtained represent the wave made by the carapace movement during the activities. These waves are shown in figures 2.3 and 2.4. The figures show that the dig wave is periodic and very similar to a square wave, whereas the walk wave is very unpredictable. In order to understand the signals registered by the device, we decided to apply two methods: correlation analysis and machine learning. The two methods are able to check for, or learn, the characteristic pattern that is present in the wave. In figures 2.4(a) and 2.4(b), the abscissa indicates the time instants, each instant being a quarter of a second; the ordinate indicates the accelerometer sensor values.

2.5.1 Correlation analysis

Correlation is a mathematical operation that is similar to convolution. It uses two signals to produce a third signal, called the cross-correlation of the two input signals. In signal analysis, cross-correlation is a measure of the similarity between two waveforms as a function of a time lag applied to one of them.

Having two discrete signals, x[n], composed of n elements, and z[m], composed of m elements, we can define the cross-correlation with the discrete-time signal y[k]:

y[k] = Σ_{i=0}^{n} x[i] · z[k + i]


(a) Signal nest digging. (b) Signal walking.

Figure 2.3: Samples of signals

y[k] is the correlation signal, composed of the correlation values. It is computed by shifting the target signal along the original signal.

Our problem is to discover a known waveform in a noisy signal. The waveform we are looking for, z[n], is commonly called the target signal. Each sample in y[k] is calculated by shifting the target signal along until the end of the moving signal is reached.


(a) Signal nest digging. (b) Signal walking.

Figure 2.4: Samples of signals

To move the target signal we apply a time lag of 1 step. Next, the indicated samples from the moving signal are multiplied by the corresponding samples in the target signal. The sum of these products then moves into the proper sample of the cross-correlation signal. The amplitude of each sample in the cross-correlation signal is a measure of how much the original signal resembles the target signal at that location. This means that the value of the cross-correlation is maximized when the target signal is aligned with the same features in the original signal. If there is noise on the received signal, there will also be noise on the cross-correlation signal. Using correlation to detect a known waveform is frequently called matched filtering. The filter kernel of the matched filter is the same as the target signal being detected, except that it has been flipped left-for-right.
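The sliding-and-summing computation described above can be sketched as follows; the toy signal and target are illustrative, not thesis data:

    import numpy as np

    def cross_correlate(x, z):
        """Slide the target z along the signal x, summing products at each lag k."""
        n, m = len(x), len(z)
        return np.array([np.dot(x[k:k + m], z) for k in range(n - m + 1)])

    signal = np.array([0.0, 1, 0, 2, 0, 1, 0, 2, 1, 0, 1])
    target = np.array([1.0, 0, 2])            # pattern we are looking for
    y = cross_correlate(signal, target)
    print(y.argmax())                         # lag where the match is strongest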


The correlation shows the tendency of two variables, x and y, to vary together, i.e. their covariance. To define the form of the relationship between two signals it is necessary to distinguish between entity and direction. We have a positive direction if, as x increases, y also increases, and a negative direction if in the same situation y decreases. Entity refers to the strength of the relation between the variables. To convey the relation between two variables we use the linear correlation coefficient. This is standardised and defined in [−1, 1]: −1 denotes perfect negative correlation, 1 perfect positive correlation, and 0 no correlation. Pearson's correlation coefficient is defined as:

r = c_xy / (σ_x σ_y)

from which can be deduced:

r = (n Σxy − Σx Σy) / √([n Σx² − (Σx)²] [n Σy² − (Σy)²])

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations; c_xy is the covariance of x and y, and σ_x and σ_y are the standard deviations of x and y respectively.

The correlation coefficient is used to compute the correlation value at each shift. We obtain an output value defined in the same range as the neural network output.
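A sketch of the same sliding scheme using Pearson's coefficient on each window, so that the output is confined to [−1, 1] (function name and data are ours):

    import numpy as np

    def sliding_pearson(x, z):
        """Pearson coefficient between the target z and each window of x."""
        m = len(z)
        return np.array([np.corrcoef(x[k:k + m], z)[0, 1]
                         for k in range(len(x) - m + 1)])

    signal = np.array([0.0, 1, 0, 2, 0, 1, 0, 2, 1, 0, 1])
    target = np.array([1.0, 0, 2])
    r = sliding_pearson(signal, target)   # values in [-1, 1]
    print(r.round(2))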

2.5.2 Machine Learning

Machine learning allows us to learn the wave form with its many possible small variations. This branch of artificial intelligence concerns the construction and study of systems able to learn from data. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed" [15]. Tom M. Mitchell provided a widely quoted, more formal definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [16]. This definition is notable for defining machine learning in fundamentally operational rather than cognitive terms, thus following Alan Turing's proposal in his paper "Computing Machinery and Intelligence" that the question "Can machines think?" be replaced with the question "Can machines do what we (as thinking entities) can do?" [15]. Generalization in this context is the ability of an algorithm to perform accurately on new, unseen examples after having trained on a learning data set. Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. We are given a set of example pairs; the aim is to find a function in the allowed class of functions that matches the examples. For classification, the samples are labeled with "1" or with "-1" to discern two different classes (the class labels are chosen according to the range of the output function). There are many areas where machine learning can be used to describe and resolve problems [3]. One of these is similarity learning, a type of supervised machine learning task in artificial intelligence. It is closely related to regression and classification, but the goal is to learn from examples a function that measures how similar or related two objects are.

Using a neural network we can enjoy advantages such as [17]:

• the generalization ability;
• usefulness in problems of pattern recognition;
• dynamic behaviour;
• the ability to reproduce or model the non-linear behaviour of a system;
• the ability to classify complex patterns;
• fault tolerance, due to the absorption by the network of insignificant deviations of the input values.

But there are also some disadvantages to using neural networks:

• for the purpose of learning it is necessary to provide the neural network with many more examples the more strongly non-linear the system to be modeled is;

• the learning process may lead to a sub-optimal solution;

• it is not possible to know a priori the best network topology (number of nodes, layers, connections between nodes and the type of functions implemented by the nodes in each layer) to assign to the neural network in order to best model our system.

A neural network involves searching a large space of possible hypotheses to determine which one best fits the input data, using the knowledge learned during the training phase.

Learning with a teacher

Learning with a teacher is a learning paradigm also referred to as supervised learning. The teacher has knowledge of the environment and represents it with a set of input-output example pairs.

The teacher provides a training vector and, thanks to its knowledge, is able to provide the neural network with the desired response, which is the optimum action the network can take. The adjustment of the neural network's parameters is based on the training vector and the error signal. The error signal is the difference between the desired response and the current output of the neural network. The adjustment is an iterative process to emulate the teacher. The teacher's knowledge is passed to the neural network by the training, as fully as possible.


Figure 2.5: Learning with teacher

When the maximum emulation is reached, the teacher is no longer needed and the neural network is completely independent. Training through the teacher's values is a loop, repeated in order to correct the error made by the neural network. When the neural network's output is calculated in an unknown environment, this calculation is not part of the loop. The neural network performance can be measured as the mean squared error over the training samples, defined as a function of the free parameters. This is like a multidimensional error performance surface, where the free parameters are the coordinates.

The true error surface is the mean over all possible input-output pairs. All the system operations made under the teacher's supervision are represented as a point on the error surface. The operating point has to move down toward a minimum of the error surface, so that the system performance increases over time and the system therefore learns from the teacher. A supervised system is able to carry out this procedure on the basis of the gradient of the error surface corresponding to the current behaviour of the system.

The gradient is a vector that at each point corresponds to the direction of steepest descent. Learning with a teacher uses an instantaneous estimate of the gradient. The use of such an estimate makes the operating point move on the error surface in the form of a random walk. Nevertheless, given an algorithm to minimize the cost function, an adequate set of input-output examples and enough time for training, a supervised learning system is usually able to perform tasks such as pattern classification and function approximation.

ANN [16]

An ANN is a mathematical model inspired by biological neural networks. These are essentially simple mathematical models defining a function F : X → Y or a distribution over Y given X. The basic components of an artificial neuron are:


Figure 2.6: ANN model: Perceptron

• A set of synaptic weights associated with the connections between the n inputs and the jth neuron: w_ji is the weight of the connection between the jth neuron and the input x_i, i = 1, ..., n;

• An aggregation function that sums the weighted inputs to compute the input to the activation function: net_j = 1·w_j0 + Σ_{i=1}^{n} x_i w_ji, where 1·w_j0 is the product of the bias and the weight associated with it. It is convenient to think of the bias as the weight of an input x_0 whose value is always equal to one, so that net_j = Σ_{i=0}^{n} x_i w_ji;

• An activation function σ that maps net_j to o_j = σ(net_j), where o_j is the output value of the neuron. Some examples of activation functions: the step, sigmoid, hyperbolic tangent and Gaussian functions.
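Putting the three components together, a single neuron can be sketched as below; the weights and inputs are arbitrary illustrative values:

    import numpy as np

    def neuron_output(x, w, sigma):
        """One artificial neuron: weighted sum with bias, then activation.

        w[0] is the bias weight for the constant input x_0 = 1;
        w[1:] are the weights w_ji of the n real inputs.
        """
        x0 = np.concatenate(([1.0], x))    # prepend the constant bias input
        net = np.dot(w, x0)                # net_j = sum_i x_i * w_ji
        return sigma(net)

    step = lambda net: 1.0 if net > 0 else -1.0
    sigmoid = lambda net: 1.0 / (1.0 + np.exp(-net))

    x = np.array([0.5, -0.2, 0.8])
    w = np.array([0.1, 0.4, -0.3, 0.7])    # bias weight first
    print(neuron_output(x, w, step), neuron_output(x, w, sigmoid))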

A neural network is a collection of formal neurons interconnected so that the output of each neuron functions as input to some sub-collection of neurons. In addition, a designated set of neurons receives external inputs, while another designated set of neurons is identified as the set of output elements. The network processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used for modeling complex relationships between inputs and outputs. Formally, the architecture of the network is specified by a directed graph G = <V, E> [17].

• Vertices V: the set of vertices V comprises:

– a set of source nodes, corresponding to the set of external inputs;
– a set of computational nodes, corresponding to the neurons in the network;
– some nodes designated as output nodes.

If the task is a classification task, the functionality of an output node includes an added threshold to produce a binary value, for networks of sigmoid neurons.

• Edges E: a directed edge (i, j) is present in the set of edges E if, and only if, i is a source node, j is a computation node, and i is connected to j, or i and j are computation nodes with the output of i serving as an input to j.

The neural network architecture that we use is acyclic. Network functionality is not affected by the choice of a particular mode of operation: for an acyclic network with fixed external inputs, the network will settle into a stable steady state in a finite number of steps (epochs), at which point the outputs can be read out.

The particularity of the ANN is that it can learn from samples and solve problems. The first type of ANN is based on a unit called the perceptron. It takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than the threshold and −1 otherwise:

y_j = o(x) = sgn(Σ_{i=0}^{n} w_ji x_i)

where sgn(y) = 1 if y > 0, and −1 otherwise; x_i is the ith input, y is the output function, w is the weight vector, and n is the number of inputs.

A single perceptron represents a boolean function. The most interesting perspective is the training rule: the precise learning problem is to determine a weight vector that causes the perceptron to produce the correct ±1 output for each of the given training examples. It is important to begin with random weights, then iteratively apply the perceptron to each example. This is repeated until the perceptron classifies all training examples correctly. The weight is modified at each step according to the training rule:

w_ji = w_ji + Δw_ji

Δw_ji = η (t − o) x_i

where:

t is the target output for the current training example;

o is the output generated by the perceptron, and η is the learning rate (which moderates the degree to which the weights are changed). This rule finds a successful weight vector when the problem is linearly separable; otherwise we have to use the delta rule. The delta rule uses gradient descent to search the hypothesis space of possible weight vectors. Introducing the concept of gradient descent provides the basis for the backpropagation algorithm, which is explained below. Consider the task of training a linear unit, whose output o is given by o(x) = Σ_{i=0}^{n} w_ji x_i. To derive a weight learning rule, we begin by specifying a measure of the training error of a hypothesis, relative to the training examples. A common way to define the error is:

E(w) ≡ (1/2) Σ_{d∈D} (t_d − o_d)²    (2.1)

where D is the set of training examples, t_d is the target output for training example d, and o_d is the output of the linear unit for training example d.

E is characterized as a function of w because the linear unit output o depends on this weight vector. Gradient descent search determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps. At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface. This process continues until the global minimum error is reached. It is therefore necessary to find the direction of steepest descent by computing the derivative of E with respect to each component of the vector w. This is called the gradient of E with respect to w, written ∇E(w). It is a vector whose components are the partial derivatives of E with respect to each of the weights.

When interpreted as a vector in weight space, the gradient specifies the direction that produces the steepest increase in E. To obtain the steepest decrease it is necessary to use the negative of the gradient. The training rule that uses gradient descent is then:

w = w + Δw

where:

Δw = −η ∇E(w)    (2.2)

In other words, the gradient descent method proceeds in three steps: random initialization of the weights, applying the unit to all training examples, and updating the weights.

The learning rate η determines the step size in the gradient descent search. We want to move in the direction that decreases E, hence the negative sign. Remembering that ∇E(w) is composed of the derivatives of E with respect to each of the w_ji, we can describe the training rule as a steepest descent achieved by altering each component w_ji of w in proportion to ∂E/∂w_ji. ∇E(w) is not as difficult to obtain as it seems; it can be obtained by differentiating E from (2.1) as:

∂E/∂w_ji = −Σ_{d∈D} (t_d − o_d) x_id

where x_id denotes the single input component x_i for training example d. Now we can rewrite (2.2) as follows:

Δw_ji = η Σ_{d∈D} (t_d − o_d) x_id

Given a sufficiently small learning rate, this algorithm will converge to a weight vector with minimum error.
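A minimal batch-gradient-descent sketch of this rule for a linear unit, on a toy linearly separable problem of our own invention:

    import numpy as np

    def train_linear_unit(X, t, eta=0.05, epochs=200):
        """Batch gradient descent: Δw_ji = η Σ_d (t_d − o_d) x_id.

        X is an (N, n+1) input matrix with a leading column of ones (bias);
        t is the (N,) vector of target outputs.
        """
        w = np.random.uniform(-0.05, 0.05, X.shape[1])  # small random initial weights
        for _ in range(epochs):
            o = X @ w                  # linear unit outputs for all examples
            w += eta * (t - o) @ X     # the update rule derived above
        return w

    X = np.array([[1, 0.0, 0.0], [1, 0.0, 1.0], [1, 1.0, 0.0], [1, 1.0, 1.0]])
    t = np.array([-1.0, -1.0, 1.0, 1.0])   # targets depend on the second input
    w = train_linear_unit(X, t)
    print(np.sign(X @ w))                  # thresholded predictions: -1 -1 1 1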


BackPropagation algorithm

In the next section we will introduce the Multilayer Perceptron, a model that has been applied to solve difficult and diverse problems by training it in a supervised manner. It is interesting to observe the training rules needed for the model. One training algorithm useful for this model is the error backpropagation algorithm.

It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs. It consists of two passes through the different layers of the network.

Figure 2.7: Backpropagation algorithm

In the forward pass, an input vector is applied to the sensory nodes, propagating its effect layer by layer through the whole network, starting at the first layer and continuing until the last layer. This produces the actual response of the network. In this pass the synaptic weights of the network are all fixed. In the backward pass, the synaptic weights are all adjusted in accordance with the error correction rule: the actual response of the network is subtracted from the target to produce the error signal, which is then propagated backward through the network, from the output neurons to the input neurons. In the next section we explain the algorithm in detail.
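A compact sketch of the two passes for a one-hidden-layer sigmoid network; this is a generic formulation of backpropagation, not the exact code used in the thesis:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, t, W1, W2, eta=0.1):
        """One forward + backward pass for a single training example."""
        # Forward pass: weights fixed, activations propagated layer by layer
        h = sigmoid(W1 @ x)                       # hidden layer outputs
        o = sigmoid(W2 @ h)                       # network outputs
        # Backward pass: error signal propagated from outputs to inputs
        delta_o = (t - o) * o * (1 - o)           # output-layer error term
        delta_h = (W2.T @ delta_o) * h * (1 - h)  # hidden-layer error term
        W2 += eta * np.outer(delta_o, h)          # weight corrections
        W1 += eta * np.outer(delta_h, x)
        return o

    rng = np.random.default_rng(0)
    W1, W2 = rng.uniform(-0.5, 0.5, (3, 2)), rng.uniform(-0.5, 0.5, (1, 3))
    print(backprop_step(np.array([0.2, 0.9]), np.array([1.0]), W1, W2))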

MLP

Universal Approximation, Cybenko-Hornik-Funahashi Theorem [18]

Let σ be any sigmoid function and let all inputs lie in [0, 1] or in another finite interval; let I_d be the d-dimensional cube [0, 1]^d. Then a sum of the form

F(x) = \sum_j v_j \, \sigma(bias_j + w_j \cdot x)

can approximate any continuous function f to any accuracy. It might, however, require an arbitrarily large number of hidden neurons.

Universal Logical Approximation is a weaker result: any logical expression can be converted to a neural network. However, the number of hidden neurons might be exponential in the number of inputs [16].
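As a small numerical illustration of the theorem's functional form (ours, under assumed parameter scales, not part of the thesis experiments), the following sketch fixes random hidden parameters w_j and bias_j and fits only the outer weights v_j by least squares, approximating the continuous function sin(2πx) on [0, 1] with a sum of 50 sigmoids:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
J = 50                                   # number of hidden sigmoid units
w = rng.normal(0.0, 10.0, J)             # random inner weights w_j (fixed)
b = rng.normal(0.0, 10.0, J)             # random biases bias_j (fixed)

x = np.linspace(0.0, 1.0, 200)           # inputs in the interval [0, 1]
H = sigmoid(b + np.outer(x, w))          # H[d, j] = sigma(bias_j + w_j * x_d)
f = np.sin(2.0 * np.pi * x)              # a continuous target function

v, *_ = np.linalg.lstsq(H, f, rcond=None)  # fit only the outer weights v_j
print("max |F(x) - f(x)| =", np.abs(H @ v - f).max())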


A single perceptron can express only linear decision surfaces, whereas the multilayer networks learned by the back-propagation algorithm are capable of expressing a rich variety of non-linear decision surfaces. The MLP is a feed-forward artificial neural network composed of three or more layers. A feed-forward neural network is an artificial neural network whose connections between units do not form a directed cycle: signals start from the input neurons and move on towards the output neurons. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one; it maps sets of input data onto a set of appropriate outputs. Except for the input nodes, each node is a neuron (or processing element) with a non-linear activation function. Earlier we discussed differentiable threshold units,

Figure 2.8: Multilayer Perceptron

explaining linear units and how the gradient descent learning rule is derived. The activation function particularly affects the neural network's results. A linear activation function can only represent linearly separable problems. During a preliminary analysis of the models we ran an experimentation phase to test the usability of the linear function for our problem: in some cases the network was unable to learn the pattern from the reduced values, and the performance reached was no better than that obtained with a sigmoid function.

We therefore need units whose output is a non-linear, differentiable function of their inputs. Such a neuron is the sigmoid unit: like the perceptron, it computes a linear combination of its inputs, but then applies a smooth threshold to the result:

net_j = \sum_{i=0}^{n} w_{ji} \, x_i, \qquad o_j = \sigma(net_j) = \frac{1}{1 + e^{-net_j}}

σ is called the logistic function and has output range [0, 1]. For gradient descent, its most useful property is that it is differentiable; in particular

\frac{d\sigma(net)}{d\,net} = \sigma(net)\,(1 - \sigma(net))

Another differentiable activation function is the hyperbolic tangent, used in the unit called tanh. The hyperbolic tangent is a nonlinear and differentiable function:

\sigma(net) = \frac{1 - e^{-2\,net}}{1 + e^{-2\,net}}

Its output ranges between -1 and 1, increasing with its input. Its derivative is easy to calculate:

\sigma'(net) = 1 - \sigma(net)^2
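The two activations and their derivatives can be written compactly; the following sketch (hypothetical helper functions of ours, not thesis code) also checks both derivative formulas numerically against a central finite difference:

import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

def logistic_prime(net):                  # sigma(net) * (1 - sigma(net))
    s = logistic(net)
    return s * (1.0 - s)

def tanh_unit(net):                       # (1 - e^{-2 net}) / (1 + e^{-2 net})
    return (1.0 - np.exp(-2.0 * net)) / (1.0 + np.exp(-2.0 * net))

def tanh_prime(net):                      # 1 - sigma(net)^2
    return 1.0 - tanh_unit(net) ** 2

# central finite-difference check of both derivative formulas
net, h = 0.7, 1e-6
print(abs(logistic_prime(net) - (logistic(net + h) - logistic(net - h)) / (2 * h)))
print(abs(tanh_prime(net) - (tanh_unit(net + h) - tanh_unit(net - h)) / (2 * h)))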

The gradient descent learning rule makes use of these derivatives. For the MLP we refer to a training rule that uses gradient descent, employing it on the more complex MLP structure. The back-propagation algorithm employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs; it requires supervised learning. The E we referred to in 2.1 must now be redefined as the sum of the errors over all network output units:

E = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \qquad (2.3)

where t_kd and o_kd are the target and output values associated with the k-th output unit and training example d.

If the number of output neurons is greater than one, the error function is 2.3. The learning problem faced by back-propagation is to search a large hypothesis space defined by all possible weight values for all the units in the network.

Each training example is a pair of the form ⟨x, t⟩, where x is the vector of network input values and t is the vector of target network output values.

As before, η is the learning rate. Back-propagation consists of two phases [16]. Given a feed-forward network, we initialize all the network's weights to random values within a chosen range.

It is important to choose the weight range carefully, because initial values which are too large can make learning fail.

For each ⟨x, t⟩ in the training examples do:

• propagate the inputs forward through the network;
• propagate the errors backward through the network.

In the first phase, we present an instance x to the neural network and compute the output o of every unit in the network.

In the second phase, during the backward propagation, we update the weights based on the output error. For each output unit k the error is calculated as:

\delta_k = o_k (1 - o_k)(t_k - o_k)

then we go back to the hidden units and calculate their error on the strength of the output errors, using:

\delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \, \delta_k


Now we have all the errors, and we can update each weight to its new value, calculated from the error and the gradient descent:

w_{ji} = w_{ji} + \eta \, \delta_j \, x_{ji}

where δ_j varies depending on the reference unit (either hidden or output) and x_ji is the i-th input to unit j.
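Putting the three formulas together, here is a minimal sketch of a single online back-propagation update for a network with one sigmoid hidden layer (shapes and names are our assumptions, not the thesis implementation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W_h, W_o, x, t, eta=0.1):
    """One online back-propagation update for a 1-hidden-layer network.

    W_h : (n_hidden, n_inputs)  hidden-layer weights
    W_o : (n_outputs, n_hidden) output-layer weights
    x   : (n_inputs,) input vector;  t : (n_outputs,) target vector
    """
    # forward pass: compute the output of every unit
    o_h = sigmoid(W_h @ x)                     # hidden activations
    o_k = sigmoid(W_o @ o_h)                   # network outputs
    # backward pass: delta_k = o_k (1 - o_k)(t_k - o_k)
    delta_k = o_k * (1.0 - o_k) * (t - o_k)
    # delta_h = o_h (1 - o_h) sum_k w_kh delta_k
    delta_h = o_h * (1.0 - o_h) * (W_o.T @ delta_k)
    # w_ji <- w_ji + eta * delta_j * x_ji
    W_o += eta * np.outer(delta_k, o_h)
    W_h += eta * np.outer(delta_h, x)
    return W_h, W_o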

The algorithm updates the weights incrementally, following the presentation of each training example, if we perform online training. If we perform batch training, we update all weights only after having computed the output of all examples; the weights are then modified with the average of the sum of the updates (gradients) over all examples. The back-propagation loop can be stopped when the desired error threshold is achieved or when enough epochs have been run. If too many epochs are run, it is important to control over-fitting; additional parameters can be used to control the weight growth and to regularize the search. One basic idea is that the weight update on the n-th iteration should depend partially on the update that occurred during the (n - 1)-th iteration. We can express this dependence with a variation of the weight-update equation:

\Delta w_{ji}(epoch) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(epoch - 1)

Here epoch denotes the current learning epoch. α is a constant called momentum, defined in [0, 1]. It has the effect of increasing the step size of the search in regions where the gradient does not change. The complexity of the hypothesis that back-propagation is able to reach increases with the number of weight-tuning iterations.
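A sketch of how the momentum term changes the descent loop (grad_fn is an assumed helper returning ∇E(w); this is illustrative, not thesis code):

import numpy as np

def descend_with_momentum(w, grad_fn, eta=0.1, alpha=0.9, epochs=200):
    """Gradient descent where each update keeps a fraction alpha of the
    previous one: delta_w(n) = -eta * grad E(w) + alpha * delta_w(n-1)."""
    dw = np.zeros_like(w)                      # delta_w(0) = 0
    for _ in range(epochs):
        dw = -eta * grad_fn(w) + alpha * dw    # momentum persists old steps
        w = w + dw
    return w

# toy usage on E(w) = (w - 3)^2, whose gradient is 2 (w - 3)
print(descend_with_momentum(np.array([0.0]), lambda w: 2.0 * (w - 3.0)))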

Given enough weight-tuning iterations, it will be able to create overly complex decision surfaces that fit the noise in the training data.

This is the over-fitting problem in back-propagation. One approach to it is weight decay, which consists in decreasing each weight by some small factor during each iteration. The weight-decay penalty term causes the weights to converge to smaller absolute values than they otherwise would. Large weights can hurt generalization in two different ways: excessively large weights leading into hidden units can cause the output function to be too rough, possibly with near discontinuities; put another way, large weights can cause excessive variance of the output [19]. Weight decay can thus avoid over-fitting on the training set.

It is also important to take into account the number of epochs used in the training phase. During the iterative training of a neural network, an epoch is a single pass through the entire training set, followed by testing on the verification set.

In back-propagation learning, an epoch corresponds to one weight update or training iteration. For each epoch, the back-propagation learning algorithm builds a different model: a network with a different set of weights.

E(w) = \sum_{p} (d_p - o(x_p))^2 + \lambda \, \|w\|^2

This is equivalent to modifying the definition of E to include a penalty term; the motivation of this approach is to keep the weight values small.
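In practice, the penalty term adds 2λw to the gradient of E, so every step also shrinks the weights towards zero; a one-line sketch under these assumptions (grad_data is an assumed gradient of the data term only):

import numpy as np

def weight_decay_step(w, grad_data, eta=0.1, lam=1e-3):
    """One step for E(w) = sum_p (d_p - o(x_p))^2 + lambda * ||w||^2.

    The penalty contributes 2 * lambda * w to the gradient, which
    decays every weight at each iteration."""
    return w - eta * (grad_data + 2.0 * lam * w)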


The momentum is a weight adjustment that allows a change to the weights to persist for a number of adjustment cycles. The magnitude of this persistence is controlled by the momentum factor: if it is set to 0, the equation reduces to the standard weight update; if it is increased from 0, increasingly greater persistence of previous adjustments is allowed in modifying the current adjustment.

This can speed up learning in some situations, by helping to smooth out unusual conditions in the training set.

In the next subsections we analyse different MLP structures on the basis of our task and of our hardware limitations.

Input Delay Neural Networks [20] [21]

A purely reactive neural network cannot capture the real situation of the tortoise. A dynamical network differs from a purely reactive one in that dynamical models have memory elements, which are the key to preserving the state of the network. This feature gives the network the capability to handle time-dependent problems. Temporal tasks and static tasks have fundamentally different characteristics: in a static task the N-vector input can be randomly permuted without affecting the classification, while a temporal task is order sensitive. Time is important in many cognitive tasks such as vision, speech, signal processing, and motor control, and a memoryless network model is useless for a temporal task. The network should have dynamic properties that make it responsive to time-varying signals.

Figure 2.9: IDNN

The simplest memory element is the unit time delay, whose transfer function is z^{-1}: it outputs its input delayed by one time step.

The simplest memory architecture is the tapped delay line, which consists of a series of unit time delays. Using a Time Delay Neural Network (TDNN), the network can make decisions based on this memory. The TDNN is the most popular neural network that uses ordinary time-delay loops to perform temporal processing: it is a multilayer feed-forward network with additional memory input nodes. This allows for temporal learning, making decisions based not only on the present state but also upon previous ones.

The TDNN model has the characteristics sought for our task. We are interested in a sub-class of TDNN, the IDNN, whose memory is made up of an input window on a temporal signal; the size of the window depends on the period that is necessary for classification.

Figure 2.10: IDNN sliding window

The IDNN concentrates the memory only on the inputs of the network: along with the present input pattern, the desired number of previous input patterns is fed into the IDNN at the input layer (Fig. 2.9). An advantage of the IDNN is that the network is less complex than the original TDNN, while preserving the same temporal processing capability. Our task is a temporal task, because each movement value from the accelerometer sensor depends on the values the sensor detected at the previous instants; this kind of artificial neural network can therefore understand the situation of the tortoise from the movement signal. Figure 2.10 shows the sliding structure of the IDNN: the network slides the input window along the signal, acquiring n new samples together with the previous m samples, so as to gradually study the history of the movement.

For each input window the network returns the output value for classification; the next input shifts the window n steps forward and the process is repeated.
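The windowing itself can be sketched as follows (window length m and shift n are assumed parameters; this is illustrative, not the thesis preprocessing code):

import numpy as np

def sliding_windows(signal, m, n):
    """Cut a 1-D movement signal into IDNN input windows.

    m : window length (the present sample plus m - 1 previous ones);
    n : shift between consecutive windows.
    Returns shape (n_windows, m); each row is one network input.
    """
    signal = np.asarray(signal)
    starts = range(0, len(signal) - m + 1, n)
    return np.stack([signal[s:s + m] for s in starts])

# usage: a toy accelerometer stream, windows of 64 samples shifted by 16
x = np.sin(np.linspace(0.0, 20.0, 1000))
print(sliding_windows(x, m=64, n=16).shape)   # (59, 64)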

Convolutional Neural Networks [22] [9]

CNNs are variants of the MLP inspired by biology. The MLP has some drawbacks when it works on real-world applications, mainly for the following reasons [22]:


Figure 2.11: CNN

• it offers little or no invariance to some types of distortions;
• the distribution of the input data is completely ignored.

The CNN was introduced by LeCun and Bengio, inspired by the locally sensitive and orientation-selective nerve cells found in the visual cortex of the cat.

They designed a network structure that implicitly extracts relevant features, by restricting the neural weights of one layer to a local receptive field in the previous layer.

The proposed CNN is shown in Figure 2.11.

The first convolutional layer contains n feature maps, each of which has a slightly smaller resolution than the original input. Each subsampling layer consists of neurons with a trainable coefficient for the local average and a trainable bias. The last subsampling layer is the output layer and contains only one feature map. This kind of neural network is particularly well suited to recognizing images such as handwritten characters or faces; nevertheless, it can also be applied to our situation, because the nesting pattern is a kind of structure formed by the trend of the characteristic signal.
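To make the two operations concrete for a 1-D movement signal, here is a sketch of one convolutional feature map (shared kernel weights plus squashing) followed by a subsampling layer with a trainable coefficient on the local average and a trainable bias; kernel values and sizes are illustrative assumptions of ours:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_feature_map(x, kernel, bias):
    """Valid 1-D convolution plus squashing: one feature map whose
    weights (the kernel) are shared across all signal positions."""
    k = len(kernel)
    net = np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])
    return sigmoid(net + bias)

def subsample(fmap, size, coeff, bias):
    """Subsampling unit: a trainable coefficient times the local average
    over non-overlapping blocks, plus a trainable bias."""
    trimmed = fmap[: len(fmap) // size * size].reshape(-1, size)
    return sigmoid(coeff * trimmed.mean(axis=1) + bias)

x = np.sin(np.linspace(0.0, 10.0, 100))        # toy movement signal
f = conv_feature_map(x, np.array([1.0, 0.0, -1.0]), bias=0.0)
s = subsample(f, size=2, coeff=1.0, bias=0.0)
print(f.shape, s.shape)                         # (98,) (49,)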


Datasets

3.1 Data Collection

The data collection phase lasted one month, from June 1st 2012 until July 1st 2012. During this month we observed the tortoises' life and the environmental conditions useful for nesting. For the registrations we used the mock-up described in Ch. 2.2.1, during the data collection phase in Massa Marittima. Tortoises are usually very curious, so the position of the device should not impede their movements, but it should also be high enough not to be damaged by other tortoises. We were able to decide the best position on the carapace by observing the tortoises: they attempt to tear the device off the carapace, endangering its functionality. During the nesting phase the most important movements are those of the back paws, hence the device has to be placed far enough down the shell to register the movements of the tortoise's paws; nevertheless, we preferred a high position on the carapace (as visible in Fig. 3.1), which can also prevent sexual coupling with males. Fig. 3.2 shows the natural position of the device chosen for the data collection. The chosen position, used for all registrations, is between the third and fourth neural bone of the carapace, with the radio antenna oriented towards the top. The orientation is important to understand the meaning of the accelerometer sensor data. We studied the movements that characterized

Figure 3.1: Position of localizer

