A neural approach for multi-reservoir system operation

Academic year: 2021


Master of Science in Environmental and Land Planning Engineering

A neural approach for multi-reservoir system operation

Supervisor: Prof. Giorgio Guariso

Master Graduation Thesis by: Matteo Sangiorgio, Student Id n. 836954


First and foremost, I would like to express my gratitude and appreciation to my supervisor, Professor Giorgio Guariso, for helping me with his valuable advice and experience.

Special thanks are also due to Dr. Marc Jeuland, who shared with me some of the results of his PhD dissertation. They have been the foundations on which this thesis is built.

A big thank-you goes to all the coursemates with whom I shared this experience. I cannot fail to thank all the friends with whom I have had the pleasure of spending many wonderful moments over these years. In particular, I thank Claudio and Alessio for the long evenings spent searching for interesting ideas for this thesis. A particularly heartfelt thank-you also goes to all my family members. Spending time with you is always a great pleasure.

I thank my grandmothers, because they know how to be an important point of reference. I thank my parents, for their unconditional support and for always being able to find the right words. I thank my brother, for his helpfulness and his assistance. Finally, last but not least, I wish to thank Linda, for being so special.

Matteo


Abstract

1 Introduction
  1.1 Problem statement and research context
  1.2 Proposed procedure
  1.3 Procedure testing on Nile River case study
  1.4 Dissertation outline

2 Background and related works
  2.1 Deterministic problem
    2.1.1 Linear programming
    2.1.2 Nonlinear programming
  2.2 Random problem
    2.2.1 Stochastic dynamic programming
    2.2.2 Fixed-class cost-to-go
    2.2.3 Fixed-class policy
  2.3 Pareto efficiency
  2.4 Artificial neural network
    2.4.1 Single-layer network
    2.4.2 Multi-layer network
    2.4.3 Network identification
    2.4.4 Neural networks applications
  2.5 Nile basin modeling

3 Problem setting
  3.1 Schematic representation of the system
  3.2 Variables of the model
    3.2.1 State variables
    3.2.2 Input variables
    3.2.3 Internal variables
    3.2.4 Output variables
  3.3 Formal definition of the problem
  3.4 Modeling of the physical system
    3.4.1 White Nile
    3.4.2 Blue Nile
    3.4.3 Atbara
    3.4.4 Main Nile
  3.5 Synthetic dataset generation

4 Optimization
  4.1 Structure of the proposed procedure
  4.2 Open-loop optimization
    4.2.1 Initial population definition
    4.2.2 Objective space description
    4.2.3 Parameters setting

5 Neural control laws
  5.3 Khasm el Girba reservoir
  5.4 Aswan High Dam

6 Results

7 Conclusion and future works
  7.1 Conclusion
  7.2 Future work

Bibliography


Figure 2.1: Comparison between a 2D pseudo-random sampling and a quasi-random sampling (Sobol sequence)
Figure 2.2: Flow chart of an iteration of a GA-based procedure
Figure 2.3: A generic bidimensional objective space
Figure 2.4: An L-layer neural network
Figure 2.5: Structure of the j-th artificial neuron of the l-th layer
Figure 2.6: Structure of Rosenblatt's perceptron
Figure 2.7: A 2-layer MLP
Figure 2.8: Rainfall regimes over the Nile basin
Figure 3.1: The four major tributary basins of the Nile River
Figure 3.2: Schematic representation of the Nile River system
Figure 3.3: The flow regime of the Nile showing the typical contribution from each of the three main tributary basins
Figure 3.4: Comparison between observed and synthetic dataset for the discharge of the White Nile at Khartoum
Figure 3.5: Comparison between observed and synthetic dataset for the inflow of the lower Atbara catchment between Khasm el Girba dam and the junction with the Main Nile
Figure 4.1: Schematic representation of the proposed procedure
Figure 4.2: Representation of the objective space and of the corresponding standardized objective space
Figure 4.3: Objective space for two dry scenarios
Figure 4.4: Objective space for two wet scenarios
Figure 5.1: Optimal release from Lake Tana for one of the 5-year synthetic scenarios
Figure 5.2: Error distributions for the Roseires control law
Figure 5.3: Comparison between the residual distributions for Roseires reservoir
Figure 5.4: Error distributions for the Khasm el Girba control law
Figure 5.5: Comparison between the residual distributions for Khasm el Girba reservoir
Figure 5.7: Comparison between the residual distributions for Aswan High Dam
Figure 6.1: Representation of the objective space with the ten considered management strategies
Figure 6.2: Comparison between the performances obtained for each objective
Figure 6.3: Releases from Roseires, Khasm el Girba and AHD for one of the scenarios
Figure 6.4: Objective space for one of the synthetic scenarios


Table 4.1: Monthly average of the inflows for the four considered synthetic scenarios
Table 4.2: Objectives improvement with respect to the trivial policy with the open-loop control sequence for the deterministic case
Table 5.1: Variables given as input to the control laws for each of the considered cases
Table 5.2: Indexes used for the evaluation of the neural control laws identified for Roseires
Table 5.3: Indexes used for the evaluation of the neural control laws identified for Khasm el Girba
Table 5.4: Indexes used for the evaluation of the neural control laws identified for AHD
Table 6.1: Distances from the solution obtained with the trivial policy for all the considered cases


AHD   Aswan High Dam
ANN   Artificial Neural Network
DM    Decision Maker
DP    Dynamic Programming
DSS   Decision Support System
EA    Evolutionary Algorithm
FPE   Final Prediction Error
FSA   Finite State Automaton
GA    Genetic Algorithm
IQR   Interquartile Range
LP    Linear Programming
MLP   Multi-Layer Perceptron
MO    Multi Objective
MP    Mathematical Programming
MSE   Mean Square Error
NLP   Nonlinear Programming
NN    Neural Network
NSDP  Neural Stochastic Dynamic Programming
OC    Optimal Control
PDF   Probability Density Function
PSO   Particle Swarm Optimization
SDP   Stochastic Dynamic Programming
SO    Single Objective
WGN   White Gaussian Noise


Multi-reservoir system management is complex because of the uncertainty on future events and the variety of purposes, usually conflicting, of the involved actors. An efficient management of these systems can prevent political crises and conflicts between the stakeholders and can often improve resource allocation with respect to the historical management, providing a win-win strategy. Bellman's Stochastic Dynamic Programming (SDP) is the most famous among the many approaches proposed to solve this Optimal Control (OC) problem. Unfortunately, SDP is affected by the curse of dimensionality: the computational effort increases exponentially with the complexity of the considered system (i.e., the number of reservoirs), and the problem rapidly becomes intractable. This thesis proposes a novel approach for the solution of the OC problem. The core idea is to use Artificial Neural Networks (ANNs) to design neural policies which closely approximate the optimal multi-reservoir management policy obtained by an open-loop approach. A well-trained ANN can adequately reproduce the relationship between input and output, providing an efficient control policy. The dataset used for the neural policy identification is composed of a large number of synthetic input and release series, generated under the assumption of deterministic knowledge of the future inflows. ANNs replicate the optimal management, but with a closed-loop control system that uses only the current information to take the decision in real time. For this reason, they are effective real-time management policies. The proposed methodology is applied to the Nile River basin with two principal objectives: minimization of the irrigation water deficit and maximization of the hydropower production.

Keywords: Water Reservoir Network; Multi-reservoir System Operation; Optimal Control; Artificial Neural Network; Stochastic Dynamic Programming; Multi Objective Optimization; Nile River Basin


The management of multi-reservoir systems is a complex activity, since future events are not known with certainty and since the actors involved have different, often conflicting, objectives. An efficient management of these systems helps prevent political crises and conflicts between the stakeholders and often leads to a better allocation of resources with respect to the historical management, increasing the satisfaction of all the parties involved. Many approaches have been proposed for the solution of this optimal control problem: the most famous is Bellman's well-known Stochastic Dynamic Programming (SDP). Unfortunately, SDP suffers from the curse of dimensionality: the computational cost grows exponentially with the complexity of the considered system (in our case, with the number of reservoirs), and the problem therefore rapidly becomes intractable. This thesis proposes an original approach for the solution of the optimal control problem, based on the idea of using Artificial Neural Networks (ANNs) to design a neural policy able to adequately approximate the optimal policy for the management of the reservoir network. A well-trained ANN can satisfactorily reproduce the input-output relationship, providing an efficient control policy. The dataset used for the identification of the neural policy is composed of a large number of synthetic series, generated by computing the optimal open-loop control sequence under the assumption of deterministic knowledge of the future inflows. The neural networks are able to replicate the optimal management computed in this way, but in closed loop, i.e., relying only on the current (and not future) information, and can therefore be employed in real time for the actual management of the network. The proposed methodology has been applied to the Nile River basin system, considering two main objectives: minimizing the irrigation water deficit and maximizing the hydropower production.

Keywords: Water Reservoir Networks; Multi-reservoir System Operation; Optimal Control; Artificial Neural Networks; Stochastic Dynamic Programming; Multi-objective Optimization; Nile River Basin

1 Introduction

Multi-reservoir systems operation is a classical issue in water resources management. The problem is complex because of the amount of infrastructure located in medium and large river basins and the variety of actors that play a role in the game, which have different objectives that often lead to conflict. A win-win solution of the conflict, provided that it exists, can only be achieved through cooperation between the stakeholders and a coordinated management of the system. Multi-reservoir system operation optimization is vital to all countries, especially those in which hydroelectric power provides a large part of the demand. Power production is not the only sector affected by reservoir management: agricultural district water supply, flood protection, river navigation, water pollution, erosion and sedimentation phenomena, and tourism are only the main examples of sectors that are strictly dependent on reservoir operation; many other cases can be identified by analyzing a specific situation.

This thesis work presents a novel approach to the optimal management of a multi-reservoir system. First, we modeled the system considering the major reservoirs, their interconnections and the requirements of some relevant stakeholders: agricultural districts and hydroelectricity producers. Then, we introduced an optimization technique to minimize the water deficit of the agricultural districts while satisfying the requirements for hydroelectric energy production. Finally, we validated the proposed approach using data from the Nile River basin.

1.1 Problem statement and research context

Classical approaches to the optimal management of a multi-reservoir network can be split into two main groups. The first group follows a parametric approach. This is in fact a classical Mathematical Programming (MP) problem, which studies the general problem of determining the minimum of a function, the objective function, possibly subject to some constraints. The term "parametric" indicates that the result of the optimization is a finite sequence of real numbers (parameters). The most natural way to solve the problem with this approach consists of searching, for each reservoir of the system, for the finite sequence of decision variables ut that optimizes a defined objective function which measures the satisfaction of the stakeholders. Even if decision variables are conceptually different from model parameters, they are the same from the optimization algorithm's perspective. The second group uses a functional approach in order to


determine the optimal control policies for each reservoir. In this case, we are dealing with an Optimal Control (OC) problem. A policy is a sequence of optimal control laws, functions which take as argument the information on the system at a certain instant and return the optimal decision variable. It is clear that this second problem is more complex than the first, because we are searching for a sequence of functions instead of a sequence of parameters. It is important to underline that these two approaches are not disconnected: the parametric approach is, in a certain sense, a particular case of the functional approach. On the other hand, it is also possible to transform a functional problem into a parametric one, but we must accept introducing an approximation, as will be discussed in the following paragraphs.

Multi-reservoir network operation can be modeled in two different ways, depending on the degree of knowledge of the system inputs. Inputs can be either control variables or disturbances. Controls are the argument of our optimization and thus we can set them. Disturbances are something that the Decision Maker (DM) cannot control but that in fact affect the system and thus influence the optimization process. Two kinds of disturbance can be identified (Soncini-Sessa et al., 2007): deterministic disturbances wt, so called because their value is known at time t, when the DM must decide the control; and random disturbances, indicated with εt+1, which occur during the time interval [t, t + 1) and thus are not known at time t.

If there are only deterministic disturbances, an open-loop sequence of optimal controls is sufficient to optimize the objective function. In this case, solving the problem with a functional approach is superfluous and a policy does not have any advantage with respect to a sequence of decision variables. There are many algorithms to solve such a problem: the most famous is Dynamic Programming (DP) (Bellman, 1957). Since in the real world it is necessary to consider disturbances, computing an open-loop control sequence has a limited operative usefulness.

When a DM needs a practical tool to support the decision making process, it is necessary to abandon the deterministic world and start considering real-world uncertainty and thus the random disturbances. In this case, a fixed value of the control variable is replaced with a more flexible object: a function that allows considering the current state and taking the best decision for the particular situation. The fact that the control law provides an optimal decision variable value for each of the states that can potentially occur is the great advantage of using a closed-loop policy instead of an open-loop control sequence. Given that, the solution of the optimal management of a system affected by random disturbances seems to be easy: we have to define an Optimal Control problem and compute the optimal policy. The algorithm to solve this problem was introduced in the early 1960s (Bellman et al., 1962) and is known as Stochastic Dynamic


Programming (SDP), which is in fact the stochastic version of the deterministic DP. Theoretically, this approach works perfectly and allows us to solve the functional problem, but there is an operative issue: the so-called "curse of dimensionality". The computational cost increases exponentially with the state, decision and disturbance dimensions, so SDP cannot be used with water systems where the number of reservoirs is greater than a few (2–3) units (Castelletti et al., 2010).
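The practical difference between an open-loop control sequence and a closed-loop policy can be shown on a toy single-reservoir mass balance. All numbers below (demand, reference storage, inflow statistics, feedback gain) are hypothetical illustrations, not values from the Nile model:

```python
import numpy as np

rng = np.random.default_rng(0)
DEMAND, S_REF = 40.0, 100.0  # hypothetical monthly demand and reference storage

def simulate(release_rule, inflows, s0=S_REF):
    """Run the mass balance s_{t+1} = s_t + a_t - r_t, accumulating squared deficits."""
    s, cost = s0, 0.0
    for t, a in enumerate(inflows):
        r = max(0.0, min(release_rule(t, s), s + a))  # release bounded by availability
        s = s + a - r
        cost += (DEMAND - r) ** 2
    return cost

inflows = np.clip(DEMAND + 15.0 * rng.standard_normal(600), 0.0, None)

# Open-loop: a release sequence fixed in advance, blind to the realized state.
cost_open = simulate(lambda t, s: DEMAND, inflows)

# Closed-loop: a control law reacting to the observed storage at time t.
cost_closed = simulate(lambda t, s: DEMAND + 0.3 * (s - S_REF), inflows)
```

The closed-loop rule adjusts the release to the storage actually observed, which is exactly the flexibility a control law provides when the inflows are random.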

As we will see in chapter 2, many approaches have been proposed to overcome this issue. All of them are based on the idea that the curse of dimensionality can be avoided on condition that one abandons the functional approach, which aims at determining the optimal policy in the space of all the feasible policies, and accepts dealing with a parametric approach, which searches for the best policy within a fixed class of functions. In fact, the parametric problem is easier to solve, but its solution deeply depends on the choice of the functional form.

1.2 Proposed procedure

The proposed methodology is based on the idea of generating a large number of synthetic samples (120,000 samples, each one representative of a month) for the random disturbances, with the same statistical properties as the available historical observations. In this way, the random disturbances are no longer unknown, since they are deterministically known for all the time steps of the horizon.
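A minimal sketch of this kind of synthetic generation, assuming a purely hypothetical observed record and a simple month-by-month Gaussian model (the actual generator used in this work is described in section 3.5):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the historical record: 30 years of monthly inflows (30 x 12).
# In the thesis, these statistics would come from the observed Nile series.
observed = np.abs(50.0 + 20.0 * rng.standard_normal((30, 12)))

mu = observed.mean(axis=0)     # monthly means
sigma = observed.std(axis=0)   # monthly standard deviations

# Draw synthetic years month by month, reproducing the seasonal statistics.
n_years = 100
synthetic = mu + sigma * rng.standard_normal((n_years, 12))
synthetic = np.clip(synthetic, 0.0, None)  # physical constraint: no negative inflow
```

Each synthetic year preserves the monthly mean and spread of the (here invented) record, which is what makes the generated disturbances statistically representative.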

Starting from this idea, the approach we propose in this thesis work splits the OC problem into two independent subproblems. First, we set up the MP problem related to the original OC problem, using as random disturbances the synthetic data previously computed. Solving the MP problem, we obtain the optimal open-loop control sequence, which is a vector whose elements represent the values of the decision variable for each month of the considered horizon. The result obtained after this first step of the procedure is a dataset composed of:

• the information on the system available at time t (It);

• the optimal decisions for the same interval (ut).

In the second step of the procedure, the obtained dataset is used to identify the optimal policy for each of the considered reservoirs. In order to define a control law, we have to address two interconnected issues: the structure of the function and the variables to provide as arguments. With regard to the form of the control law, we selected a NN. This choice is the core of the whole procedure, because it lets us overcome the complexity of the functional approach by using a parametric approach. The unknown of the problem is now the parametrization


of the neural architecture we adopted for the control law. The key point of the procedure is that we are able to obtain a parametric problem while using, at the same time, a general architecture, since it has been demonstrated that Neural Networks (NNs) are universal approximators (Hornik et al., 1989).
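The universal-approximation property is easy to observe empirically. The following sketch (a hypothetical toy, not the network identified in this thesis) trains a one-hidden-layer tanh network by full-batch gradient descent to reproduce a smooth nonlinear mapping:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth nonlinear mapping, here y = sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

# One hidden tanh layer: by Hornik's theorem, enough to approximate any
# continuous function on a compact set, given sufficiently many units.
n_hidden = 20
W1 = 0.5 * rng.standard_normal((1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.5 * rng.standard_normal((n_hidden, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)            # hidden layer
    y_hat = h @ W2 + b2                 # linear output layer
    err = y_hat - y
    # Backpropagation of the squared-error gradient, averaged over the batch.
    gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
    gh = err @ W2.T * (1 - h ** 2)
    gW1 = x.T @ gh / len(x); gb1 = gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
```

After training, the network's mean squared error on the target mapping is a small fraction of the target's variance, which is the empirical face of the approximation theorem.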

It is then necessary to select the arguments of each control law. Since we want to obtain a policy which can be used in a practical application, the input to the control laws has to be something which is known at time t, when the DM has to take the decision on the release for each of the considered reservoirs. In principle, one cannot know which is the best input set, and thus we will consider some different cases, based on the variables composing the information vector.

1.3 Procedure testing on Nile River case study

The final step of the thesis focuses on the evaluation of the performances obtained on the Nile River system. On the one hand, we want to understand how the performance varies as a function of the information provided to the policy. On the other, it is important to make a comparison between the neural policy and the open-loop control sequence obtained by solving the deterministic MP problem. Evaluating the performance in this way, we have to keep in mind that there is a sort of information asymmetry between the two cases. In the first case, the scenario is deterministically known, while the second is characterized by a complete uncertainty about future events, since the only information given as argument of the control law is the one observable at time t. The logical consequence is that it is nearly impossible for the neural control law to perform as well as the deterministic open-loop control sequence. That said, the interesting thing to investigate is how far apart the two solutions are. We do not have other control systems to compare with the neural policy, since a historical policy is not available. In order to fill this gap, we introduced linear and trivial policies which can help us understand how good the performance of the neural policy is.

To evaluate our methodology, we considered a model of the Nile River basin hydrological system. It is important to underline that we do not aim to model every detail of the Nile River basin and propose an operative solution to the management of this system. To do that, much more time and information (which often belong to private companies that do not share them) would be needed. We introduced a number of approximations in the modeling of the system components and did not consider some parts of the system, in order to obtain a simplified problem adapted to our purpose.

It is necessary to underline the following concept: nowadays there is no strong Nile River basin authority. Some attempts at creating a


partnership between the riparian countries have been made in the past (the most important is the Nile Basin Initiative), but the quarrels between the members and the wide range of interests and actors make it difficult to set up a coordinated and efficient water resources management. The considered part of the basin belongs to three different countries (Egypt, Sudan and Ethiopia), and the infrastructures located inside it are managed by public authorities and private corporations. Each of these actors has its own objectives, which cannot be aggregated as we will do in our modeling process: in fact, we are going to suppose that the only existing stakeholders are a hydropower production company and an ideal agricultural authority of the whole basin, which cooperate in order to find a win-win solution. The simplified system obtained as a result of these hypotheses does not represent the real system in a detailed way, but is useful to test the proposed methodology. Nevertheless, the analysis provides some interesting points, useful to understand some aspects of the real system.

1.4 Dissertation outline

The rest of this thesis is structured as follows. Chapter 2 describes the state of the art: an overview of the related works regarding multi-reservoir optimal operation. Chapter 3 provides a detailed and formal description of the problem. Modeling aspects are presented, and the attention is focused on two topics: the description of each component of the system and the generation of the synthetic series required as input. Chapter 4 shows the GA setting and the optimization process for the deterministic problem. Chapter 5 describes the process for the identification of the NN-based control laws. Chapter 6 is dedicated to the evaluation of the performance of the identified control laws and provides a comparison between the policy obtained with this approach and some alternative policies. In chapter 7, we report the conclusions and explain the future directions of research.

2 Background and related works

Modeling the problem of complex multi-reservoir system management as an optimization problem makes it possible to find an efficient solution, providing the optimal sequence of releases or an efficient policy. Multi-reservoir network operation is one of the classical topics in water resources management. It is both interesting from the theoretical point of view, since many classical and innovative optimization techniques have been used to solve it, and practically important for the DMs, because it provides them with a useful decision support tool. In order to efficiently manage a complex multi-reservoir system and to solve the disputes between the stakeholders, an optimization model has to be applied to determine the optimal releases from the reservoirs (Lin et al., 2016).

An optimization problem can be easily formalized in a general form as follows:

    min_u J(u)    (2.1)

subject to

    f(u) = 0,    g(u) ≥ 0    (2.2)

where:

• J(u) is the objective function vector. Each element of the objective function vector is a function which measures a different performance of the system. When this vector has only one element, it collapses to a scalar and we are dealing with a Single Objective (SO) problem. Otherwise, the problem is Multi Objective (MO), and the result of the optimization will be a set of Pareto-efficient alternatives. Here we are going to consider J(u) as a scalar. The MO problem will be discussed in section 2.3;

• f (u) and g (u) are vectors containing equality and inequality constraints respectively;

• u is the decision vector and its components are the decision variables: u = {ut}, with t = 0, 1, . . . , h − 1.

Note that in this work we are considering dynamic systems and thus we use the notation ut, which explicitly shows the time step, but this is not necessary, since the same method could also be applied to non-dynamic systems. A solution to the problem is feasible when the


values of the selected decisions satisfy all the constraints simultaneously. The goal of the procedure is to optimize the objective function by properly choosing the decision variables from the feasible set U, which is a vector whose generic element Ut represents the set of decision variables that respect the constraints at time t.
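As an illustration of the general form (2.1)-(2.2), a toy two-variable instance (invented for this sketch) can be solved numerically, here with SciPy's `minimize`; note that SciPy's `ineq` constraints follow the g(u) ≥ 0 convention:

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of problem (2.1)-(2.2) with decision vector u = (u0, u1):
# a quadratic objective, one equality and one inequality constraint.
J = lambda u: (u[0] - 1.0) ** 2 + (u[1] - 2.0) ** 2

constraints = [
    {"type": "eq", "fun": lambda u: u[0] + u[1] - 2.0},  # f(u) = u0 + u1 - 2 = 0
    {"type": "ineq", "fun": lambda u: u[0]},             # g(u) = u0 >= 0
]

res = minimize(J, x0=np.zeros(2), constraints=constraints)  # SLSQP is used here
# The feasible optimum of this toy instance is u = (0.5, 1.5).
```

A candidate u is feasible exactly when it satisfies both constraint functions at once, which is the feasible-set condition described above.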

Looking at problem 2.2, it seems that we excluded deterministic and random disturbances. In fact, deterministic disturbances are not a problem, because they are known and thus implicitly considered in this notation. On the other hand, introducing random disturbances requires a greater effort, since they affect the constraint system and the objective function, transforming them into something that is not deterministically defined but random. In this case, a change of perspective is needed, because the open-loop control sequence is no longer appropriate due to the presence of the uncertainty.

2.1 Deterministic problem

If we are dealing with a deterministic problem, all the exogenous variable values are known at each time step. If we fix a finite h-step-long horizon, we are searching for a value of the control variable ut at each time step.

2.1.1 Linear programming

When the functions J(·), f(·) and g(·) are linear, the problem is a Linear Programming (LP) problem. These problems are analytically solvable, for example using the Simplex method (Dantzig, 1963), which is the most famous among the algorithms for solving an LP problem. After the discovery of the Simplex method, two other algorithms were introduced: the Ellipsoid method (Khachiyan, 1979) and the Projection method (Karmarkar, 1984). The advantage of these two methods is that their computing time is polynomial, while the time required by the Simplex method is exponential in the worst case. Therefore, when the number of decision variables is high, the newer algorithms are more efficient than the Simplex. The result is that these three algorithms can solve problems with thousands of constraints and variables in very limited running time. LP, along with Dynamic Programming, is probably the most used method to solve the multi-reservoir operation problem. Some examples are given by Guariso et al. (1987) and Needham et al. (2000), who presented an operational management of a reservoir system to reduce the flood risk at downstream control points.
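A minimal, invented LP instance in this form can be solved with an off-the-shelf solver such as SciPy's `linprog`:

```python
from scipy.optimize import linprog

# Toy LP: maximize 3*x1 + 2*x2 subject to x1 + x2 <= 4, x1 <= 3, x1, x2 >= 0.
# linprog minimizes, so the objective coefficients are negated.
res = linprog(
    c=[-3.0, -2.0],
    A_ub=[[1.0, 1.0], [1.0, 0.0]],
    b_ub=[4.0, 3.0],
    bounds=[(0.0, None), (0.0, None)],
)
# The optimum of this toy instance sits at the vertex x = (3, 1), value 11.
```

Modern solvers handle instances with thousands of variables and constraints this way; the toy above only illustrates the problem shape.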

2.1.2 Nonlinear programming

Static approach. Unfortunately, a multi-reservoir operation problem can be reduced to an LP problem only in some specific cases


and by introducing some approximations. In the most general and frequent case, the objective function and the constraint system are not linear, and thus different methods to find the sequence of optimal controls have to be used. In this case, we are dealing with a Nonlinear Programming (NLP) problem.

The most natural way to solve the NLP problem is to treat it as static, in the same way we do when it is linear. In this case, it is necessary to use an algorithm that can deal with the nonlinearity of the problem instead of one of the methods proposed for LP problems. Nonlinear functions can have multiple local optima. This can be a problem because, if the solution found by the algorithm is not the global optimum, it translates into a sub-optimal control sequence. However, there are some particular situations where we can find the global optimum. For example, when J(·) is a convex function and the feasibility set is convex too, we are dealing with a Convex Programming problem (Rockafellar, 1970). In this case, the existence of only one stationary point, which is the global optimum, is guaranteed. Another specific case of NLP problem is Quadratic Programming (see Wolfe (1959) and Beale (1959)). The application of Quadratic Programming to control problems of water systems is shown in Van Overloop (2006). The advantage of this approach is the easy computation of the derivative of the objective function given as a quadratic form, which allows finding the global optimum.

The typical approach to solve a constrained NLP problem in its most general form (both the objective function and the constraints are nonlinear) is the family of algorithms based on gradient descent (or ascent). The idea behind these methods is that, starting from a randomly chosen parameter value, the procedure iteratively moves along the direction of maximum decrease of the objective function. Gradient-based methods are intuitive, and this is probably the main reason why they are so widespread. However, there are some issues: they may terminate in a local minimum, depending on the initialization; the algorithm may diverge if the step size is not appropriate for a certain problem; and the computation of the gradient is not always possible. Karaeren (2014) solved an NLP problem with a gradient-based technique in order to maximize the energy generation of a cascade hydropower system.
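The basic iteration these methods share can be sketched in a few lines; the quadratic objective below is a hypothetical example with a known minimum:

```python
import numpy as np

def gradient_descent(grad, u0, lr=0.1, n_iter=200):
    """Plain gradient descent: repeatedly step against the gradient of J."""
    u = np.asarray(u0, dtype=float)
    for _ in range(n_iter):
        u = u - lr * grad(u)
    return u

# J(u) = (u0 - 3)^2 + 2*(u1 + 1)^2 has its unique (global) minimum at (3, -1);
# its gradient is supplied analytically.
grad_J = lambda u: np.array([2.0 * (u[0] - 3.0), 4.0 * (u[1] + 1.0)])
u_star = gradient_descent(grad_J, u0=[0.0, 0.0])
```

On this convex toy, any starting point converges to the global minimum; on a nonconvex objective, the same loop can stall in whichever local minimum the initialization falls into, which is the weakness discussed above.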

Another possible approach is the so-called Penalty Function method (McCormick, 1983), which transforms a constrained problem into an unconstrained one by adding to the objective function a weighted penalty term that takes into account the violation of the constraints.
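A sketch of the idea on an invented one-dimensional problem, with the penalty weight progressively increased so that the unconstrained minima approach the constrained one:

```python
import numpy as np

def penalized(J, violation, mu):
    """Unconstrained surrogate J(u) + mu * violation(u)^2 of a constrained problem."""
    return lambda u: J(u) + mu * violation(u) ** 2

# Minimize J(u) = u^2 subject to u >= 1; the violation is max(0, 1 - u).
J = lambda u: u ** 2
viol = lambda u: max(0.0, 1.0 - u)

grid = np.linspace(-2.0, 2.0, 4001)    # crude 1-D grid search, enough to illustrate
u_opt = 0.0
for mu in (1.0, 10.0, 100.0, 1000.0):  # progressively stiffer penalty weights
    P = penalized(J, viol, mu)
    u_opt = grid[np.argmin([P(g) for g in grid])]
```

As mu grows, the minimizer of the penalized objective moves toward the constrained optimum u = 1: violating the constraint becomes ever more expensive than a small increase in J.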

The great computing power of modern processors enabled the creation of new families of algorithms: the heuristic optimization methods. The main characteristic of these new methods is that they are less formal and more pragmatic with respect to the others developed in the past.


A first intuitive example is given by the Random Optimization methods (Matyas, 1965). They investigate the decision variable (or parameter) space through thousands of random generations of the decision vector. The most intuitive method consists of sampling each decision variable from a uniform distribution (pseudo-random sequence). When the decision vector is composed of a high number of variables, sampling from a uniform distribution does not guarantee a homogeneous sampling of the multi-dimensional space. In order to achieve this goal, it is better to use a quasi-random, or sub-random, sequence instead of a pseudo-random one (figure 2.1). The most famous examples of quasi-random generation methods are Latin Hypercube sampling (McKay et al., 2000) and low-discrepancy methods (Sobol', 1967). Using a randomized search

(a) Pseudo-random sampling. (b) Quasi-random sampling.

Figure 2.1: Comparison between a 2D pseudo-random sampling and a quasi-random sampling (Sobol sequence). A set of 256 points sampled from a pseudo-random number source (left) is compared with the first 256 points of a Sobol sequence (right). It is evident that the Sobol sequence covers the space more evenly. The first ten samples are red, samples from 11 to 100 are blue and those from 101 to 256 are green.

method, there are no problems with local minima or divergence, and no analytical properties of the objective function or the constraint system are required. On the other hand, due to the random nature of the procedure, it is not possible to know in advance the size of the random dataset that allows obtaining a satisfying solution. Another weakness is that, since we are sampling a continuous space, the probability of finding a global optimum (but also a local optimum) is zero. For these reasons they can be used in cascade with other methods, in order to provide them with a set of initial conditions.
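A pure random-search loop is a few lines; the toy objective, bounds and sample count below are illustrative:

```python
# Random-search sketch: sample the decision space uniformly (pseudo-random)
# and keep the best point found so far.
import random

def J(u):                       # toy objective to minimize
    return (u[0] - 0.5) ** 2 + (u[1] + 0.25) ** 2

def random_search(bounds, n_samples, seed=0):
    rng = random.Random(seed)
    best_u, best_cost = None, float("inf")
    for _ in range(n_samples):
        u = [rng.uniform(lo, hi) for lo, hi in bounds]
        cost = J(u)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u, best_cost

best_u, best_cost = random_search([(-1.0, 1.0), (-1.0, 1.0)], 10000)
# best_cost shrinks as n_samples grows, but the probability of hitting the
# exact continuous optimum is zero, as noted in the text
```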

A second example of heuristic optimization methods are the Genetic Algorithms (GAs), which belong to the larger class of the Evolutionary Algorithms (EAs). GAs are commonly used to generate solutions


to optimization problems by relying on bio-inspired operators such as mutation, crossover and selection (Holland, 1975), see figure 2.2. At the beginning, the use of GA in determining the optimal reservoir


Figure 2.2: Flow chart of an iteration of a GA-based procedure. In one iteration of the genetic algorithm's evolution, it operates in three stages: selection, where it chooses a relatively fit subset of individuals for breeding; crossover, where it recombines pairs of breeders to create a new population; and mutation, where it potentially modifies portions of new chromosomes to help maintain the overall genetic diversity. Arrows in the diagram indicate transitions into the next genetic operation within one generation. Image from Numerical Optimization.

operating policy had not received significant attention from water resources engineers. Oliveira et al. (1997) were the first who explored the potential of GA in reservoir operation. Two years later, Wardlaw et al. (1999) applied GA to a deterministic finite-horizon multi-reservoir system operation and concluded that the approach can be easily applied to non-linear and complex systems. Since then, the application of this method has become widespread.
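The three operators of figure 2.2 can be sketched on a one-dimensional toy fitness; the population size, operator choices and all names are illustrative assumptions, not taken from the cited works:

```python
# Minimal GA sketch: selection (keep the fitter half), crossover (average two
# parents) and mutation (small Gaussian shift), maximizing a toy fitness.
import random

def fitness(x):                      # single peak at x = 0.7
    return -(x - 0.7) ** 2

def ga(pop_size=40, generations=60, mut_rate=0.2, seed=1):
    rng = random.Random(seed)
    pop = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        breeders = pop[: pop_size // 2]          # selection
        children = []
        while len(children) < pop_size - len(breeders):
            a, b = rng.sample(breeders, 2)
            child = 0.5 * (a + b)                # crossover
            if rng.random() < mut_rate:
                child += rng.gauss(0.0, 0.05)    # mutation
            children.append(child)
        pop = breeders + children                # breeders kept: elitism
    return max(pop, key=fitness)

best = ga()
```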

dynamic approach Next to LP, Dynamic Programming has been the most popular optimization technique applied to water resources planning and management in general, and to reservoir operations in particular (Yakowitz, 1982). It is important to remark that the approaches presented in paragraph 2.1.2, as suggested by its title, are in a certain sense static. On the other hand, a multi-reservoir operation problem is dynamic, because we are considering a dynamic system where a number of decisions on the control variable has to be taken at each time step. DP effectively exploits the sequential decision


structure of reservoir system optimization problems. Another difference between the static and the dynamic approach is that in the first case a discretization of state, control and deterministic disturbance is not required. On the other hand, in order to apply a dynamic approach, the system has to be transformed into a Finite State Automaton (FSA), whose evolution in time is usually represented with a weighted directed graph. Once the graph is obtained we are in fact dealing with a shortest path problem. In graph theory, the shortest path problem consists of finding a path between two nodes of the graph such that the sum of the weights of its constituent arcs is minimized. The directions are established since, at time t, it is not possible to go back to time t − 1, but we must move to the following time step.

If we suppose we have infinite computation power, the most intuitive procedure would consist of testing each possible path and selecting the best one (i.e. the one with the minimum, or maximum, sum of weights on the arcs). This problem-solving technique is called brute-force or exhaustive search. This approach is possible only if the considered system is really simple, but unfortunately situations like this rarely occur in real-world applications. There are mainly two approaches to overcome this issue. The first is based on the idea that, in order to find the best control sequence, it is necessary to find a way to restrict the number of evaluations needed. Otherwise we can use a heuristic method, which has the same properties explained in paragraph 2.1.2 and thus finds a suboptimal solution.

The most famous method which allows finding the exact solution of the shortest path problem while reducing the number of required evaluations is Dynamic Programming (Bellman, 1957). In this work we will not use Dynamic Programming and thus we are going to report only its fundamental concept. One of the key elements is the Bellman function, H∗_t(·), which allows computing the minimum cost for the transitions between a generic step t and the final instant of the considered horizon.

H∗_t(x_t) = min_{u_t} [ g_t(x_t, u_t) + H∗_{t+1}(x_{t+1}) ]   (2.3)

Equation 2.3 is known as the Bellman equation, and states that the optimal cost-to-go at time t, when the system is in the state x_t, can be computed by optimizing the sum of the step cost g_t(x_t, u_t), relative to the interval between t and t + 1, and the optimal cost-to-go H∗_{t+1}(x_{t+1}) from t + 1 to the end of the h-step horizon. As suggested by the structure of the equation, Bellman's algorithm decomposes the original problem into subproblems (the transition between t and t + 1 is considered as a single subproblem) that are solved sequentially over each stage. First, the optimal cost-to-go at the end of the horizon, H∗_h(x_h), is initialized with the penalty, which represents the cost of leaving the system in a certain situation. In fact, when we solve a finite horizon problem, it is necessary to consider that the real system will not end


at the time h − 1. The mathematical instrument that lets us take into account the future after h − 1 is a cost relative to the fictitious time step [h − 1, h), called penalty. Once the procedure has been initialized, it is possible to compute the value of the cost-to-go, from H∗_{h−1}(x_{h−1}) to H∗_0(x_0), with a backward procedure expressed by the deterministic Bellman equation 2.3. Knowing the Bellman function is a sufficient condition for knowing the optimal policy. The decision sequence can be computed with the following equation:

u∗_t = arg min_{u_t} [ g_t(x_t, u_t) + H∗_{t+1}(x_{t+1}) ]   (2.4)

where u∗_t is the optimal decision at time t, since it produces the optimal cost-to-go H∗_t(x_t).
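The backward recursion 2.3–2.4 can be sketched on a toy single-reservoir system with discretized storage; the state grid, inflow scenario, costs and dynamics below are all illustrative, not taken from the case study:

```python
# Backward recursion of equations 2.3-2.4 on a toy reservoir: storage levels
# 0..4, releases 0..2, a known deterministic inflow scenario, and a step cost
# penalizing squared deviation of the release from a demand of 1.

N_STATES, RELEASES = 5, (0, 1, 2)
inflows = [1, 2, 0, 1]                     # deterministic scenario, h = 4
demand = 1

def step_cost(release):
    return (release - demand) ** 2

def next_state(s, u, inflow):
    return max(0, min(N_STATES - 1, s - u + inflow))   # clipped mass balance

H = [0.0] * N_STATES                       # penalty on the final state: zero
policy = []
for t in reversed(range(len(inflows))):
    H_new, m_t = [], []
    for s in range(N_STATES):
        # Bellman equation: minimize step cost plus cost-to-go from t+1
        costs = {u: step_cost(u) + H[next_state(s, u, inflows[t])]
                 for u in RELEASES if u <= s + inflows[t]}   # feasible releases
        u_best = min(costs, key=costs.get)
        H_new.append(costs[u_best])
        m_t.append(u_best)
    H, policy = H_new, [m_t] + policy
# after the loop, H[s0] is the optimal cost-to-go from initial storage s0 and
# policy[t][s] is the optimal decision, as in equation 2.4
```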

DP is a really strong tool because, under some hypotheses, it allows finding the optimal solution for the considered graph. The generation of this graph requires a discretization of something that is continuous (i.e. the reservoir storage) and thus we find the optimal solution of an approximated problem. On the other hand, when using one of the methods described in paragraph 2.1.2 for Nonlinear Programming (NLP) problems, we find a suboptimal solution for the non-approximated problem.

The main limitation of DP is the so-called "curse of dimensionality". The number of required computations grows exponentially with the dimension of the state and control vectors. This issue has great importance since the algorithms become practically unsolvable, because of the unsustainable computational effort required, when the considered system is as complex as multi-reservoir networks are.

It is clear that there is a trade-off regarding the dimension of the discretization step: if we select a coarse grid, the computational cost will be limited but we will introduce a bad approximation; if we select a dense grid, the graph will give a good approximation of the system but it will entail a high computational cost.

Another limitation of DP is that it is not possible to explicitly consider a deterministic disturbance if it is represented as a dynamic process. This issue can be bypassed with a state enlargement, a procedure that combines the original state and the disturbance dynamics into a new enlarged state. Since this procedure increases the state dimension, it causes additional computational problems (Soncini-Sessa et al., 2007).

The first application of DP to water systems management is probably owed to Hall et al., 1961. Since then, the method has been systematically applied to reservoir management, particularly for hydropower production (see e.g. Hall et al., 1968). Many authors made various attempts to limit the effect of the curse of dimensionality for the deterministic case. Among them we mention the coarse grid approximation technique (Bellman et al., 1962), State Incremental Dynamic Programming (Larson, 1968) and Differential Dynamic Programming


(Jacobson et al., 1970). More recently, Kumar et al., 2003, proposed an interesting iterative DP method that tries to overcome the same problem.

When the problem is deterministic, finding a sequence of controls is sufficient to solve the problem efficiently. Even if we found a sequence of control laws instead, this would not produce a better management. However, many times the direct search of the control sequence is not feasible due to the huge number of decision variables. Just to have a practical sense of what we are speaking about, I propose a simple example: a four-reservoir network on a 10-year horizon with a monthly time step. In this situation we are dealing with a 480-decision-variable optimization problem. Bellman's approach is not feasible due to the state vector dimension. The problem is usually solved in a two-step procedure: first, the functional form of the control laws is fixed, with a few unknown parameters (e.g. a 3-parameter stationary control law for each reservoir), and then a MP problem is solved in order to find the best parametrization. The advantage of this procedure is that we reduce the argument of the optimization from 480 controls to 12 parameters. The drawback is that we do not find the best control sequence, but the optimal sequence for the fixed functional form. If we generate some synthetic series, this procedure allows us to obtain a policy that can be used in real applications, simply by solving a deterministic problem (Ahmed et al., 2005). An analysis of the described procedure, which will be recalled in subsection 2.2.3, shows that we transform a parametric problem into a functional one, searching for a policy instead of a control sequence, and then from functional to parametric again when we fix the control law shape and search for the best parametrization.
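A 3-parameter stationary control law of the kind mentioned above can be sketched as a piecewise-linear release rule; this particular form and the parametrization (s_low, s_high, u_max) are a common textbook choice used here for illustration, not the thesis's specific law:

```python
# Hypothetical 3-parameter control law u = m(s; theta): release nothing below
# storage s_low, ramp linearly up to u_max at s_high, and u_max above it.

def control_law(storage, theta):
    s_low, s_high, u_max = theta
    if storage <= s_low:
        return 0.0
    if storage >= s_high:
        return u_max
    return u_max * (storage - s_low) / (s_high - s_low)

theta = (10.0, 50.0, 8.0)
releases = [control_law(s, theta) for s in (5.0, 30.0, 80.0)]
# -> [0.0, 4.0, 8.0]; with one such law per reservoir, the four-reservoir
#    example has 12 parameters to optimize instead of 480 decision variables
```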

2.2 Random problem

2.2.1 Stochastic dynamic programming

Bellman also extended DP, which was initially defined only for the deterministic case, to the stochastic case (Bellman et al., 1962). The procedure is the same as for DP but, with the introduction of the random disturbance ε_{t+1} (it is referred to the time step [t, t + 1); the subscript t + 1 indicates that the disturbance value will be known only at the end of the considered interval), the step cost g_t(x_t, u_t, ε_{t+1}) is not deterministic but becomes a random variable. Therefore, it is necessary to introduce a statistical operator (i.e. the expected value operator) in order to deal with this uncertainty. The Bellman equation for the random case is:

H∗_t(x_t) = min_{u_t} E_{ε_{t+1}∼WN_t(·)} [ g_t(x_t, u_t, ε_{t+1}) + H∗_{t+1}(x_{t+1}) ]   (2.5)


As said, the only differences between the deterministic (2.3) and the stochastic (2.5) equation are the presence of the random disturbance ε_{t+1} and of the expected value operator. We can easily compute the random counterpart of equation 2.4:

m∗_t(x_t) = arg min_{u_t} E_{ε_{t+1}∼WN_t(·)} [ g_t(x_t, u_t, ε_{t+1}) + H∗_{t+1}(x_{t+1}) ]   (2.6)

where m∗_t(x_t) is the optimal control law, since it produces the optimal cost-to-go H∗_t(x_t). These differences are fundamental because, while with DP we obtain a sequence of control variables, here the result will be a sequence of control laws. In other words, we are now solving an Optimal Control problem, instead of a Mathematical Programming problem.

The curse of dimensionality, which we introduced for the deterministic case in paragraph 2.1.2, affects SDP too. Adding the random disturbance, which is assumed to be a white noise, the complexity of the procedure, and thus the running time, grows exponentially with the state and control dimensions, as happens for the deterministic case, and also with the random disturbance dimension.

The curse of dimensionality can be overcome on condition that one abandons the functional approach, which aims at determining the optimal policy in the space of all the feasible policies, and accepts to use a parametric approach, which limits itself to finding the best policy in a given class. Soncini-Sessa et al. (2007) identify two possible strategies, which will be analyzed in the following sections.

2.2.2 Fixed-class cost-to-go

The first defines the class of policies indirectly, determining the class to which the functions expressing the cost-to-go must belong. In fact, we know that these specify the policy, since the knowledge of the Bellman function is a sufficient condition to determine the related control law (see equation 2.6). It is therefore still an SDP-based approach, but it makes it possible to reduce the discretization grid drastically.

The problem with the classical SDP algorithm is that computing the optimal cost-to-go function H∗_t(x_t) requires a great effort at each time step. It is possible to suppose that the Bellman function H∗_·(·) has some kind of regularity in its structure. The practical implication of this hypothesis is that we can introduce an approximate Bellman function H̃(·, θ_·), which has a stationary form since it does not depend explicitly on t. The time dependence is implicit, because the parametrization θ_t is assumed to be periodic (e.g. the most natural choice seems to be selecting a different parametrization for each month, but it is possible to make an additional simplification and use only two different sets of parameters: one for the wet season and one for the dry season; the choice depends on the characteristics of the considered system).


Finding an adequate structure for H̃(·, θ_·) is not easy. It is more difficult, for example, than trying to reproduce the control law, as we will do in paragraph 2.2.3. The reason is that it has to represent something that is less intuitive than a relationship between the state and the decision on the release. Some interesting attempts to overcome this problem have been made using linear polynomials by Bellman et al. (1963) and Tsitsiklis et al. (1996). However, the most interesting architecture is that of the NNs, which are general structures with a great flexibility. This variation of SDP, which uses a neural structure to approximate the Bellman function, is called Neural Stochastic Dynamic Programming (NSDP) and was principally introduced by Bertsekas et al. (1995). More recent works on the same topic have been developed by De Rigo et al. (2001) and Castelletti et al. (2007).

Once we have fixed the structure of H̃(·, θ_·) and identified the best cyclostationary parametrization θ∗_t for each time step of the period, it is possible to derive the related policy, which will be periodic with the same period as the parametrization.

m̃∗_t(x_t) = arg min_{u_t} E_{ε_{t+1}∼WN_t(·)} [ g_t(x_t, u_t, ε_{t+1}) + H̃_{t+1}(x_{t+1}, θ∗_{t+1}) ]   (2.7)

As said, the policy obtained is suboptimal because of the introduction of the approximated Bellman function instead of the original one. Note that, working like this, we need some tricky procedure in order to identify the best parametrization, since we have no training and testing sets (and we cannot obtain them without solving the non-approximated problem with the classic SDP algorithm).

2.2.3 Fixed-class policy

In the last subsection, we saw how it is possible to abandon the functional approach and use a parametric one in order to obtain an approximated optimal cost-to-go function within a class of functions whose form is defined up to some parameters. Now we will follow the second strategy cited in the last part of subsection 2.2.1.

The concept behind this strategy is similar to that used in the last subsection (2.2.2), but probably more intuitive. The idea of performing the search in a parametric, rather than functional, space can be applied directly to the policy, as anticipated in paragraph 2.1.2 for the deterministic problem. It is important to underline that this approach is not based on SDP: we skip the computation of the cost-to-go function, whose values remain implicit in the minimization process for the total cost.

The optimal policy p∗ is the sequence of the optimal control laws m∗_t(·) for each time step t, p∗ = {m∗_t(·); t = 0, . . . , h − 1}. This policy is the unknown of our problem, but we can select reasonable


structures for the control laws m̃_·(·, θ_·). We can make a simplification, considering a cyclostationary or a stationary policy, or consider the general case of a time-varying policy and thus a different control law for each time step. Since the optimal policy is substituted by the one we have supposed, the obtained solution will be suboptimal. Once the supposed policy structure p̃ is fixed, the problem becomes a MP problem, because each control law is completely defined by its parametrization θ_t = {θ_{t,i}; i = 1, . . . , n_t}. It is now possible to define the vector of vectors Θ = {θ_t; t = 0, . . . , h − 1}, which is the unknown of the parametric problem 2.8.

Θ∗ = arg min_Θ E_{ε_{t+1}} [ Σ_{t=0}^{h−1} g_t(x_t, u_t, w_t, ε_{t+1}) + g_h(x_h) ]   (2.8a)

subject to the constraint system

x_{t+1} = f_t(x_t, u_t, w_t, ε_{t+1})   (2.8b)

u_t = m̃_t(x_t, θ_t) ∈ U_t(x_t)   (2.8c)

ε_{t+1} ∼ WN_t(·)   (2.8d)

w_0^{h−1} given scenario   (2.8e)

x_0 = x̄_0   (2.8f)

Θ = |θ_0, . . . , θ_{h−1}|   (2.8g)

any other constraint

where t = 0, . . . , h − 1. Once this Mathematical Programming problem is solved, the optimal value of the parameters is known and thus we have identified the policy.
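Evaluating a candidate Θ in problem 2.8 amounts to simulating the system under the parametrized policy and accumulating the step costs, which is what the MP solver does repeatedly. A minimal sketch, where the dynamics, the quadratic step cost and the linear control law are illustrative placeholders:

```python
# Simulate a toy one-reservoir system under a parametrized policy over a given
# inflow scenario and return the total cost (the objective of problem 2.8 for
# one scenario). theta = (gain, bias) parametrizes a hypothetical linear law.

def simulate_cost(theta, x0, inflow_scenario, demand=1.0):
    gain, bias = theta
    x, total = x0, 0.0
    for w in inflow_scenario:
        u = max(0.0, min(x + w, gain * x + bias))   # feasible release in U_t(x_t)
        x = x + w - u                               # state transition f_t
        total += (u - demand) ** 2                  # step cost g_t
    return total

scenario = [1.0, 0.5, 1.5, 1.0]
# compare two candidate parametrizations; a solver would search over theta
cost_a = simulate_cost((0.5, 0.0), 2.0, scenario)
cost_b = simulate_cost((0.2, 0.6), 2.0, scenario)
```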

The problem 2.8 admits the existence of a deterministic disturbance w_t, whose dynamics is described through a given scenario. Nothing prevents the argument of the control law m̃_t(·, θ_t) from being the pair (x_t, w_t), instead of the single state x_t. Technically, we say that the pair (x_t, w_t) is the sufficient statistic I_t.

In order to reduce the number of parameters, if the system is periodic with period T, we can suppose that the functions f_t(·), g_t(·), m̃_t(·, θ_t), which are time-varying in the most general case, have a cyclostationary structure and parametrization.

It is important to underline also another idea expressed by Soncini-Sessa et al. (2007): a deterministic scenario can also be used to represent, in a model-free way, the components of the system whose outputs are not influenced by the control.

Just to fix this fundamental concept, we will consider the example of the inflow from a catchment to a certain reservoir. If we want to provide a full model-based representation of the system, we have to consider a state vector composed of the storage of the reservoir, s_t, but also one or more variables which describe the catchment dynamics. The catchment model takes as input the disturbance and returns


as output the inflow at the reservoir. This part of the system is obviously not influenced by the reservoir state and thus by the control.

If a recorded historical time series of the inflow, i_1^h, is available, we can abandon the model-based representation of the catchment and provide a model-free description. The catchment output, in fact, can be represented by the historical time series of the inflow at the reservoir, which assumes the role of a deterministic disturbance scenario. Obviously, in this case, the control law must not depend on such disturbances (i_{t+1}), since they are not effectively known at time t; using them as arguments of the control law would be equal to supposing that at time t we perfectly know what will happen in the future. On the other hand, it is possible to consider the value i_t relative to the step [t − 1, t) as a control law argument. It could be useful because we are effectively taking into account the hidden state of the catchment. The only issue is that we have to know the initial value of the inflow i_0, which can be seen as the initial state of the catchment.

In order to set up the optimization problem that makes it possible to find the best parametrization for the considered case, there is another issue to take into account. In the case we are now considering, the random disturbance sequence (ε_1^h) has been replaced with a sequence of deterministic variables (i_1^h). For this reason, it does not make sense anymore to compute the expected value with respect to the random disturbances, as we did in equation 2.8a. However, given that by hypothesis the system that generates the disturbances is cycloergodic with period T, if the time horizon (t = 0, . . . , kT − 1) is sufficiently long, the expected value can be substituted by the mean value over this horizon and we can assume the penalty on the final state to be zero (Soncini-Sessa et al., 2007).

Now we are able to formalize the optimization problem 2.9.

Θ∗ = arg min_Θ (1/k) Σ_{j=0}^{k−1} [ Σ_{t=jT}^{(j+1)T−1} g_{t mod T}(x_t, u_t, w_t, i_{t+1}) ]   (2.9a)

subject to the constraint system

x_{t+1} = f_{t mod T}(x_t, u_t, w_t, i_{t+1})   (2.9b)

u_t = m̃_{t mod T}(x_t, w_t, i_t, θ_{t mod T}) ∈ U_{t mod T}(x_t)   (2.9c)

i_1^{kT} given scenario   (2.9d)

w_0^{kT−1} given scenario   (2.9e)

x_0 = x̄_0   (2.9f)

i_0 = ī_0   (2.9g)

Θ = |θ_0, . . . , θ_{T−1}|   (2.9h)


where t = 0, . . . , kT − 1 and the expression t mod T indicates the remainder of the division t/T (when the considered period is a 12-month interval, t mod T takes values from 0, January, to 11, December, and is thus the index that counts the month within each year).

Up to now, we did not consider the structure of the control laws m̃_t(·, θ_t), t = 0, . . . , h − 1, which in fact is not known. It is necessary to fix a certain functional form for the control laws. The procedure will then return the best policy among all the policies belonging to the fixed class. This means that the result will be largely affected by the initial choice, and thus if we make a gross mistake in the selection of the policy class we are going to obtain a bad result.

Baglietto et al. (2006) proposed to approximate the control law sequence with a NN sequence. This choice seems to be the best one, for the same reason explained in the previous paragraph. In fact, we transform the functional problem into a parametric one (a NLP problem), at the cost of searching for the best solution in a subset of the functional space containing all the possible functions. Using a neural structure, the class of functions within which we search for the best parametrization still remains really general.

It is not always necessary to have recourse to classes of functions which are as general as NNs. When the state dimension of the system is small, it can be easy for the analyst to intuit the form of efficient control laws (Soncini-Sessa et al., 2007).

To conclude this subsection, we will analyze the pros and cons of this approach, which is the one considered in this thesis, with respect to the one based on SDP. Abandoning the SDP framework gives three significant advantages:

• it does not require the system to be a FSA. The problem of dimensionality is thus completely overcome;

• it allows the design of a policy for a system affected by deterministic disturbances (readers will remember that this requires a state enlargement in order to capture the disturbance dynamics when we use SDP);

• the designed policy is more intuitive and comprehensible for a DM than a policy defined by equation 2.6.

This approach has also some disadvantages:

• we will never know the performance given by the optimal policy. We can only compare the performances obtained with different policies for a historical scenario with the effective performance of the system. This can be done only when the system exists. If we want to use the method on something hypothetical (e.g. to establish the best position of a reservoir) we can only


make comparisons between the performances obtained with different policies;

• when the number of parameters is high, it could become really difficult to solve the MP problem. Thus it is not completely correct to say that we will obtain the best policy among the ones of the selected class.

In this section we saw some approaches for the optimal management of a multi-reservoir system, when the system is affected by a random disturbance. Starting from the early 1980s, SDP has been the most commonly used technique and much research is based on it (see the reviews by Yakowitz (1982) and Yeh (1985) and the contributions by Piccardi et al. (1991), Castelletti et al. (2007), Goor et al. (2010) and Castelletti et al. (2010)).

There are also other works that propose a solution obtained by mixing different approaches, extending solutions of the deterministic case to the stochastic one, or using advanced techniques in order to improve the performance of classical methods. Reis et al. (2005) use a hybrid genetic algorithm and linear programming approach, while Nagesh Kumar et al. (2007) and Ostadrahimi et al. (2012) propose a Particle Swarm Optimization approach. To conclude, a general overview of algorithms to solve both deterministic and stochastic problems is provided by Labadie (2004) and, more recently, by Lin et al. (2016).

2.3 Pareto efficiency

The problems we described in sections 2.1 and 2.2 are Single Objective (SO) problems. When we are considering a multi-reservoir system, we are almost certain that there will be more than one objective.

Pareto efficiency, or Pareto optimality, makes it possible to deal with a Multi-Objective (MO) problem. Pareto efficiency is a state of allocation of resources in which it is impossible to make any one individual better off without making at least one individual worse off. A Pareto improvement is defined as a change to a different allocation that makes at least one individual better off without making any other individual worse off, given a certain initial allocation of goods among a set of individuals. An allocation is defined as "Pareto efficient" or "Pareto optimal" when no further Pareto improvements can be made.

When the problem is a MO problem, the goal is to find the Pareto front, instead of a single value as happens for a SO problem. Once the front is obtained, it is possible to carry out a posteriori a negotiation between the stakeholders and to select a compromise solution among all the efficient solutions which constitute the front.


If we assume to have a two-objective problem, J(u) = |J_1(u), J_2(u)|, where both objectives have to be minimized, the optimization problem assumes the following form:

min_{u∈U} |J_1(u), J_2(u)|   (2.10)

As said, the solution of problem 2.10 does not define a single optimal solution, at least when there is a trade-off between the two objectives, but a set of optimal solutions: the Pareto front.

To find the Pareto front it is possible to use some methods such as the Weighting Method, the Constraint Method and the Reference Point Method.

In this work we will use the Weighting Method. The strategy used by this method is based on deriving from the MO problem a parametric SO problem. This is done by computing a weighted sum of the two objectives, where the weights are λ and (1 − λ) for J_1 and J_2 respectively:

S(u, λ) = λ · J_1(u) + (1 − λ) · J_2(u),   λ ∈ [0, 1]   (2.11a)

min_{u∈U} S(u, λ)   (2.11b)

Solving the problem 2.11 for each possible value of the parameter λ, we obtain the Pareto front, which can be represented on the objective space (see figure 2.3).


Figure 2.3: A generic bi-dimensional objective space (J_1, J_2). Both the objectives have to be minimized. Point A represents the point obtained solving the problem taking into account only the objective J_1. B is computed considering only the objective J_2. U is the utopia point.

It is important to underline that when we solve the problem for λ = 0, we find the optimal value of J_2, without taking into account


J_1 (2.12b). Likewise, if λ = 1, we will obtain the optimal value of J_1, without considering the other objective (2.12a).

J∗_1 = min_{u∈U} J_1(u)   (2.12a)

J∗_2 = min_{u∈U} J_2(u)   (2.12b)

The point (J∗_1, J∗_2) is termed utopia. It could be useful, for the negotiation on the value of λ, to compute J∗_1 and J∗_2, since they represent the lower bounds of the objective ranges.
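The sweep over λ in equation 2.11, including the endpoint problems 2.12, can be sketched on two toy quadratic objectives; the objectives, the grid search used as the inner SO solver, and the λ step are illustrative:

```python
# Weighting-method sketch: for each lambda in [0, 1], minimize the scalarized
# objective S(u, lambda) over a candidate grid and record the efficient point.

def J1(u):
    return (u - 1.0) ** 2

def J2(u):
    return (u + 1.0) ** 2

def solve_weighted(lam, grid):
    # inner SO problem 2.11b, solved here by brute force on the grid
    return min(grid, key=lambda u: lam * J1(u) + (1 - lam) * J2(u))

grid = [i / 100.0 for i in range(-200, 201)]
front = []
for k in range(11):                       # lambda = 0, 0.1, ..., 1
    lam = k / 10.0
    u = solve_weighted(lam, grid)
    front.append((J1(u), J2(u)))
# lambda = 0 recovers the minimum of J2 alone (point B, equation 2.12b);
# lambda = 1 recovers the minimum of J1 alone (point A, equation 2.12a)
```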

When it is not possible to set up a negotiation between the stake-holders, the selected point is generally the one where the curvature of the Pareto front is maximum. The reason why this seems to be a good choice is that moving from a point with a high curvature, means that a little improvement of an objective causes a great worsening of the other objective.

2.4 Artificial neural network

An Artificial Neural Network (ANN) is a complex of mathematical operators which tries to emulate the operating principles of the human brain. The concept was first introduced by McCulloch et al. (1943) and is based on the idea of sharing the elaboration activity in parallel among many simple processing units, the neurons, which are linked by connections, called synapses.

From a technical point of view, an ANN is a non-linear regressor which expresses the functional relation between a vector of input variables, |I_{1,t}, I_{2,t}, . . . , I_{m,t}|, and one or more output variables, |y_{1,t}, y_{2,t}, . . . , y_{q,t}|.

Artificial neurons are organized in layers, as shown in figure 2.4. An ANN can have one (single-layer NN) or more layers (multi-layer NN). Every input is linked through the synapses to a set of neurons, each of which is constituted by an input function, usually a weighted sum, and an activation function. Each artificial neuron takes as inputs some variables from other neural units, with the relative weights, and a bias, and returns a single output that will be sent to other neurons through the synapses. Its structure is shown in figure 2.5. The computation of the output of the generic neuron (the j-th neuron of the l-th layer) is performed by two different operators: first the input function combines inputs and bias, then the activation function σ(·) transforms this combination of signals, ξ_j^l, into a single output o_j^l which will be sent to other neurons.
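The two-stage computation just described (input function, then activation) can be sketched for a single neuron; the logistic activation and all values are illustrative:

```python
# Forward pass of one artificial neuron: weighted sum of inputs plus bias
# (the input function), followed by a logistic activation.
import math

def neuron(inputs, weights, bias):
    xi = sum(i * g for i, g in zip(inputs, weights)) + bias   # input function
    return 1.0 / (1.0 + math.exp(-xi))                        # activation

out = neuron([1.0, -2.0], [0.5, 0.25], 0.0)
# xi = 0.5 - 0.5 = 0, so the logistic activation returns exactly 0.5
```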


Figure 2.4: An L-layer Neural Network. For simplicity's sake, the same number of neurons n has been assumed in every hidden layer. The considered neural structure takes as input m variables and returns q outputs. The first L − 1 layers are termed hidden layers; the L-th layer is the output layer and has the same number of neurons q as the dimension of the output vector y_t. Image from Soncini-Sessa et al., 2007.

Figure 2.5: Structure of the j-th artificial neuron of the l-th layer. The neuron receives as input the variables i_{·,j}^l of the l-th layer, each one characterized by its weight γ_{·,j}^l, and the bias β_j^l. First a sum of these terms is computed. The result is the argument of the activation function σ(·), which returns the output o_j^l relative to the considered neuron. Note that, since there are n inputs, this neuron does not belong to the first hidden layer. A neuron of the first hidden layer would have m inputs, following the structure of figure 2.4.


Typical choices for the activation function are:

• the Heaviside step function (or threshold function), if the output has to be in the interval {0, 1}:

σ(ξ_j) = { 0 if ξ_j < 0;  1 if ξ_j ≥ 0 }   (2.13)

or the sign function, which returns an output in {−1, 1}:

σ(ξ_j) = { −1 if ξ_j < 0;  1 if ξ_j ≥ 0 }   (2.14)

• the piecewise-linear function, for example:

\[
\sigma(\xi_j) =
\begin{cases}
0, & \text{if } \xi_j < -\frac{1}{2} \\
\xi_j, & \text{if } -\frac{1}{2} \leq \xi_j < \frac{1}{2} \\
1, & \text{if } \xi_j \geq \frac{1}{2}
\end{cases}
\tag{2.15}
\]

setting the parameters so as to obtain an output between 0 and 1, as in the formulation above, or in the interval [−1, 1];

• the sigmoid function, which can return a result in [0, 1] if we use the logistic function:

\[
\sigma(\xi_j) = \frac{1}{1 + e^{-\xi_j}} \tag{2.16}
\]

We can use the hyperbolic tangent if the output has to be in [−1, 1]:

\[
\sigma(\xi_j) = \tanh(\xi_j) = \frac{e^{\xi_j} - e^{-\xi_j}}{e^{\xi_j} + e^{-\xi_j}} \tag{2.17}
\]
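These activation functions are straightforward to implement. The following minimal Python sketch (the function names are our own) mirrors equations (2.13)–(2.17):

```python
import math

def heaviside(xi):
    """Threshold function (2.13): output in {0, 1}."""
    return 0.0 if xi < 0 else 1.0

def sign_act(xi):
    """Sign function (2.14): output in {-1, 1}."""
    return -1.0 if xi < 0 else 1.0

def piecewise_linear(xi):
    """Piecewise-linear function (2.15), saturating below -1/2 and above 1/2."""
    if xi < -0.5:
        return 0.0
    if xi < 0.5:
        return xi
    return 1.0

def logistic(xi):
    """Logistic sigmoid (2.16): output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xi))

def tanh_act(xi):
    """Hyperbolic tangent (2.17): output in (-1, 1)."""
    return math.tanh(xi)
```

For example, `logistic(0.0)` returns 0.5, the midpoint of the (0, 1) output range.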

The typologies of neural structure we are going to consider in this work are feedforward NNs. In these networks, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network, which are instead typical of recurrent NNs.

2.4.1 Single-layer network

When there is only one layer between inputs and outputs, we have a single-layer NN. In this architecture there are no hidden layers, and thus inputs and outputs of the network are separated only by the output layer. The most famous example of this typology is the perceptron (Rosenblatt, 1958). A perceptron is a single-layer NN whose neuron uses a weighted sum as the input function and a Heaviside step function (or a sign function) as the activation function (see figure 2.6). It is in fact a simple classifier, which returns 1 when the linear combination of the weights and the inputs is above zero, and 0 otherwise. In this case, the weights are the parameters that have to be identified in order to calibrate the network. There is a simple algorithm to do this, the delta rule: a procedure that computes the error between the network output and the sample output data, and uses it to adjust the weights. This idea is very simple but also very important, because it inspires modern Neural Network training techniques.

Figure 2.6: Structure of Rosenblatt’s perceptron. The neuron first computes a weighted sum of the input variables and the bias, then applies the activation function (in this case a sign function). The single output of the perceptron is the scalar value $y_t$.
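A minimal sketch of the delta rule in Python follows. The AND-gate training set, the learning rate and the number of epochs are our own illustrative choices; the thesis does not prescribe them.

```python
def perceptron_output(w, b, x):
    """Heaviside activation applied to the weighted sum of inputs plus bias."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0

def train_delta_rule(samples, n_inputs, lr=1.0, epochs=10):
    """Delta rule: adjust each weight by lr * (target - output) * input."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - perceptron_output(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy example: learn the (linearly separable) logical AND.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_delta_rule(data, n_inputs=2)
```

Since the AND data set is linearly separable, the weights converge after a few epochs and the trained perceptron classifies all four patterns correctly.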

2.4.2 Multi-layer network

Single-layer NNs work appropriately when we are dealing with a simple function or system. In most cases they are not adequate, and it is necessary to implement a more complex neural architecture. When there are one or more hidden layers, we have a multi-layer network. A feedforward structure with one or more hidden layers is called a multi-layer perceptron (MLP). This architecture is very widespread nowadays because it is a universal approximator: it has been demonstrated that, using an MLP, it is possible to approximate every continuous function, under mild assumptions. The important result that guarantees this is the universal approximation theorem. This theorem states that a feedforward network with just one hidden layer containing a finite number of neurons (i.e. an MLP) can approximate arbitrarily closely any continuous function defined on a closed and bounded subset of $\mathbb{R}^n$, under the assumptions that the activation function $\sigma(\cdot)$ is differentiable and monotonically increasing, and that the input to the j-th neuron ($\xi_j$) enjoys the following property:

\[
-\infty < \lim_{\xi_j \to -\infty} \sigma(\xi_j) < \lim_{\xi_j \to \infty} \sigma(\xi_j) < \infty \tag{2.18}
\]

The theorem thus states that simple Neural Networks can represent a wide variety of interesting functions when given appropriate parameters; however, it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved in 1989 for sigmoid activation functions (Cybenko, 1989). Hornik (1991) showed that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself, which gives NNs the potential of being universal approximators.
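To make the architecture addressed by the theorem concrete, here is a minimal forward pass through a 2-layer MLP with a logistic hidden layer and a single linear output neuron. All weight values below are arbitrary, chosen only for illustration:

```python
import math

def mlp_forward(x, W1, b1, w2, b2):
    """2-layer MLP: logistic hidden layer, single linear output neuron.

    x:  input vector (length m)
    W1: hidden weights; W1[j][i] multiplies input i for hidden neuron j
    b1: hidden biases (length n)
    w2: output weights (length n)
    b2: output bias (scalar)
    """
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(wji * xi for wji, xi in zip(wj, x)) + bj)))
        for wj, bj in zip(W1, b1)
    ]
    return sum(w2j * hj for w2j, hj in zip(w2, hidden)) + b2

# Illustrative parameters for m = 2 inputs and n = 2 hidden neurons.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
w2 = [2.0, -1.0]
b2 = 0.5

y = mlp_forward([0.0, 0.0], W1, b1, w2, b2)
```

With a zero input both hidden neurons output 0.5 (the sigmoid of zero), so the network output is a linear combination of those values plus the output bias.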

In figure 2.7 we can see an example of an MLP which respects the assumptions of the universal approximation theorem.

Figure 2.7: A 2-layer MLP with n neurons in the hidden layer, with sigmoidal activation function, and one neuron in the output layer, with linear activation function. Image from Soncini-Sessa et al. (2007).

Once the architecture of the NN has been fixed, the relationship between input and output is not fully defined: the parameters of the model have to be identified. The vector of the parameters θ is composed of the weights of the synapses and the biases of the neurons. Referring to the MLP in figure 2.7, the parametrization is:

\[
\theta = \left[\, \gamma^{1}_{1,1}, \ldots, \gamma^{1}_{1,n}, \ldots, \gamma^{1}_{m,1}, \ldots, \gamma^{1}_{m,n},\; \beta^{1}_{1}, \ldots, \beta^{1}_{n},\; \gamma^{2}_{1}, \ldots, \gamma^{2}_{n},\; \beta^{2} \,\right] \tag{2.19}
\]

2.4.3 Network identification
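As a preliminary note on the size of θ: for the 2-layer MLP of figure 2.7, with m inputs, n hidden neurons and a single output, (2.19) contains m·n + n hidden-layer parameters plus n + 1 output-layer parameters. A small helper (our own construction) makes the count explicit:

```python
def n_parameters(m, n, q=1):
    """Parameters of a 2-layer MLP with m inputs, n hidden neurons, q outputs.

    Hidden layer: m*n weights + n biases; output layer: n*q weights + q biases.
    """
    return (m * n + n) + (n * q + q)

# Example: 3 inputs, 5 hidden neurons, 1 output -> 15 + 5 + 5 + 1 = 26.
count = n_parameters(3, 5)
```

This count grows linearly in both m and n, which is what makes the complexity-selection step discussed below so delicate.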

As for every other model, the identification of a NN requires two steps:

• the definition of the network structure;

• the identification of the value of each parameter.

These two steps are usually performed only once but, when we are considering a NN, it is not easy to obtain a satisfying performance at the first attempt. It is therefore better to set up an iterative identification process based on trial and error, adjusting the structure at each step on the basis of the obtained performance.

determination of the neural architecture First of all, it is necessary to define the structure of the NN, which is based on three elements strictly dependent on one another:

• selection of the input variables;

• determination of the number of layers;

• determination of the number of neurons.

In choosing the features of the network structure, we have to take into account that it is important to select the right complexity, in order to guarantee a balance between having sufficient parameters to ensure a good representation capacity and having too many parameters, which can result in overfitting.

Many authors suggest empirical criteria to help choose the complexity of the network (Weigend et al. (1990), for instance, suggest selecting a number of parameters lower by at least one order of magnitude than the dimension of the training set). The main problem with these criteria is that the optimal complexity is generally highly problem-dependent. For this reason, some systematic approaches have been proposed: pruning and constitutive algorithms.

Pruning algorithms are based on the idea that the optimal complexity can be obtained starting from an over-parametrized network and removing unnecessary synapses and/or neurons. These algorithms can also be used to prune possibly insignificant input variables.

On the contrary, constitutive algorithms start from a very reduced network and then add new elements (i.e. layers, neurons and connections) in order to improve the performance of the model.
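As a worked example of the complexity-selection problem, the rule of thumb by Weigend et al. (1990) mentioned above can be used to bound the hidden-layer size of a 2-layer MLP: keep the number of parameters at least one order of magnitude below the number of training samples. The sketch below (the function name and the parameter-count formula from (2.19) applied here are our own construction) picks the largest admissible hidden layer:

```python
def max_hidden_neurons(m, n_training_samples, q=1, ratio=10):
    """Largest hidden-layer size n such that a 2-layer MLP has at most
    n_training_samples / ratio parameters (Weigend et al.'s rule of thumb).
    Returns 0 if even a single hidden neuron exceeds the budget."""
    budget = n_training_samples / ratio
    n = 0
    # Parameter count for hidden size k: m*k + k hidden-layer parameters
    # plus k*q + q output-layer parameters, as in (2.19).
    while (m * (n + 1) + (n + 1)) + ((n + 1) * q + q) <= budget:
        n += 1
    return n

# Example: 4 inputs, one output, 1000 training samples -> budget of 100
# parameters; the MLP has 4n + n + n + 1 = 6n + 1 parameters, so n <= 16.
n_max = max_hidden_neurons(m=4, n_training_samples=1000)
```

Such a bound is of course only a starting point for the trial-and-error process described above, since the optimal complexity remains problem-dependent.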
