Comparative evaluation of grey- and black-box methods for forecasting building heating demand


By

Paolo Vecchiolla

Dipartimento di Ingegneria dell’Energia, dei Sistemi, del Territorio e delle Costruzioni

Università di Pisa

May 2019

Thesis advisors:

Prof. Daniele Testi

Dott. Ing. Paolo Conti

Dott. Ing. Eva Schito


The climate control of buildings has recently seen the development of new methods aimed at energy saving. Modern control techniques require knowledge of the building loads not only in real time, but also in the future. For this reason, a mathematical model of the building is needed in order to make a prediction. In literature, these models are known as white, grey or black boxes. A review of several models has been conducted.

The objective of this thesis is to compare four methods: CTSM (grey box), SEAS (grey box), correlation analysis (black box) and NARX neural network (black box). The models have been tested on several white-box building models, all with ideal temperature control and no internal gains. Positive and negative aspects have been identified for each model.

The most effective model in terms of mean squared error of heat power prediction and relative error of energy prediction turned out to be the NARX model. This model has also been tested in other situations, such as with a real temperature control and in the presence of internal gains.


To a laura, who has to keep her head in its usual place, that is... on her neck.


Contents

List of Tables
List of Figures
List of Acronyms

I Opening remarks

1 Introduction
1.1 Building models
1.2 Purpose of this work
1.3 Organization of this thesis
2 State of art
2.1 White boxes
2.1.1 CFD methods
2.1.2 Zonal approach
2.1.3 Nodal approach
2.2 Grey boxes
2.2.1 Model structure
2.2.2 Identification algorithms
2.3 Black boxes

II Grey boxes

3 Continuous Time Stochastic Model
3.1 Stochastic processes
3.2 Itô calculus
3.2.1 Itô integration
3.2.2 Itô SDEs
3.3 Parameters estimation
3.4 Circuits
4 SEAS model
4.1 Modelling of heat transfer through opaque walls
4.2 Grey box
4.2.1 Model
4.2.2 Training

III Black boxes

5 Correlation analysis
5.1 Single input
5.2 Multiple input
5.3 Study case
6 Nonlinear autoregressive exogenous model
6.1 Neural networks
6.2 Recurrent Neural Networks
6.3 NARX
6.4 Study case

IV Auxiliary algorithms

7 Kálmán filter
7.1 Linear case
7.2 Nonlinear case
8 Trust region methods
8.1 Levenberg–Marquardt algorithm
8.2 Trust region
9 Nelder-Mead algorithm
9.1 Simplex
9.2 The algorithm
9.3 The initial simplex
10 Limited-memory BFGS algorithm with box constraints
10.1 Quasi-Newton methods
10.2 Broyden-Fletcher-Goldfarb-Shanno method
10.3 Saving memory
10.4 Box constraints
11 Backpropagation
11.1 Gradient-based optimization
11.3 Chain rule of calculus
11.4 Levenberg-Marquardt backpropagation
12 Preprocessing data for black box algorithms

V Simulations and results

13 Training and validation data
13.1 Nodes model
13.2 TRNSYS model: the Gagno building
13.3 Palazzo Blu
13.4 Climate data
13.4.1 Reference year
13.4.2 MERRA-2 database
13.4.3 The whether in the weather
14 Results
14.1 Comments for each method
14.1.1 CTSM
14.1.2 SEAS
14.1.3 Correlation analysis
14.1.4 NARX
14.2 Comparison
15 Getting more complicated
15.1 Sapiens influence
15.2 Real control
16 Conclusions and further development

A Model Predictive Control
B Other neural networks
B.1 Convolutional Neural Network
B.1.1 The architecture
B.2 Long short-term memory
B.2.1 The problem with classic RNNs
B.2.2 Solution

List of Tables

1.1 White, grey and black box differences
3.1 Thermal-electrical analogy
13.1 Walls of the simulated building
13.2 Layers of vertical walls
13.3 Layers of the roof
13.4 Layers of the floor
13.5 Windows properties
13.6 Gagno apartment walls characteristics
13.7 Gagno apartment windows characteristics
13.8 Weather forecasting errors
14.1 Qualitative comparison
14.2 Quantitative comparison

List of Figures

2.1 Grey-box methods scheme
2.2 Data driven modelling techniques
3.1 Random walk
3.2 Standard Wiener process
3.3 Grey-box circuits
6.1 ANN
6.2 RNN
6.4 Simple open-loop NARX
6.5 Closed-loop NARX used in this work
7.1 Real system and its mathematical model
7.2 Feedback and Kálmán filter
9.1 2- and 3-simplexes
9.2 Nelder-Mead algorithm example
9.3 Nelder-Mead reflection
13.1 Gagno apartment 1
13.2 Gagno apartment 2
13.3 Palazzo Blu
13.4 Palazzo Blu visitors
14.1 CTSM output
14.2 SEAS output
14.3 Correlation analysis weights
14.4 Correlation analysis output
14.5 NARX output
14.6 RMSE comparison
14.7 Relative error comparison
15.1 Internal gains
15.2 NARX with Sapiens influence
15.3 On/Off controller
15.4 NARX output with real control


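Notation

$a$ a scalar
$\boldsymbol{a}$ a vector
$\boldsymbol{a} \cdot \boldsymbol{b}$ scalar product between two vectors
$\boldsymbol{a} \times \boldsymbol{b}$ cross product between two vectors
$\boldsymbol{A}$ a matrix
$\boldsymbol{I}$ identity matrix with implied dimensionality
$\mathsf{A}$ a tensor
$\mathbb{R}$ the set of real numbers
$A_{i,k}$ element on row $i$ and column $k$ of a matrix $\boldsymbol{A}$
$\boldsymbol{A}^\top$ transpose of matrix $\boldsymbol{A}$
$\det(\boldsymbol{A})$ determinant of matrix $\boldsymbol{A}$
$\mathrm{diag}(\boldsymbol{A})$ diagonal matrix that has the same diagonal as the matrix $\boldsymbol{A}$
$\mathrm{diag}(a_1, \dots, a_n)$ $n \times n$ diagonal matrix with elements $a_1, \dots, a_n$ on the main diagonal
$\frac{dy}{dx}$ derivative of $y$ with respect to $x$
$\nabla_{\boldsymbol{x}} f$ or $\frac{\partial f}{\partial \boldsymbol{x}}$ Jacobian matrix $\boldsymbol{J} \in \mathbb{R}^{m \times n}$ of $f: \mathbb{R}^n \to \mathbb{R}^m$
$\mathrm{E}[f(x)]$ expectation of $f(x)$
$\mathrm{V}[f(x)]$ variance of $f(x)$
$\mathrm{C}[f(x), g(x)]$ covariance of $f(x)$ and $g(x)$
$\mathcal{N}(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ Gaussian distribution over $\boldsymbol{x}$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$
$\log(x)$ natural logarithm of $x$
$\exp(x)$ exponential function of $x$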


List of Acronyms

CTSM Continuous-Time Stochastic Model
ANN Artificial Neural Network
RNN Recurrent Neural Network
NARX Nonlinear AutoRegressive model with eXogenous inputs
MPC Model Predictive Control
HVAC Heating, Ventilation, and Air Conditioning
RMSE Root Mean Squared Error
TES Thermal Energy Storage
SEAS Simplified Energy Auditing Software
SDE Stochastic Differential Equation
UT Unscented Transform
CTI Comitato Termotecnico Italiano
L-BFGS-B Limited-memory BFGS algorithm with Box constraints
CFD Computational Fluid Dynamics
BFGS Broyden-Fletcher-Goldfarb-Shanno method
GA Genetic Algorithm
PID Proportional–Integral–Derivative controller
SISO Single-Input and Single-Output system
MIMO Multiple-Input and Multiple-Output system

Chapter 1

Introduction

In Europe the building sector is responsible for 40% of the total energy demand and for about 33% of CO2 emissions [32]. Within 30 years this sector should become CO2-free, which means that:

• the heating demand has to be reduced by improving buildings;

• the remaining heating and cooling demand has to be covered by renewable energy sources.

Apart from retrofitting and modernizing buildings, a cheaper and increasingly popular approach to optimizing energy consumption is the deployment of advanced control algorithms. The most widespread solution is Model Predictive Control.

Model Predictive Control (MPC) is a family of controllers that make direct use of an explicit and separately identifiable model [28]. The main advantage of MPC is that it optimizes the current time step while taking future time steps into account. In other words, model predictive controllers rely on dynamic mathematical models of the process. In this work, the process is the thermal behavior of the building, and its mathematical model is used to predict the future states of the building.
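The receding-horizon idea can be illustrated with a toy example. The following is a minimal sketch, not the controller used in this work: it assumes a hypothetical single-node (1R1C) room model with made-up parameters, a short horizon, and a brute-force search over a few discrete heating levels. The cost adds the heating energy to a quadratic penalty on the deviation from the setpoint; only the first move of the best sequence is applied, and the optimization is repeated at the next time step.

```python
from itertools import product


def step_1r1c(t_in, t_out, q_heat, r, c, dt):
    """One explicit-Euler step of a hypothetical single-node (1R1C) room model.

    t_in, t_out in degC, q_heat in W, r in K/W, c in J/K, dt in s.
    """
    return t_in + dt / c * ((t_out - t_in) / r + q_heat)


def mpc_first_move(t_in, t_out_forecast, setpoint, levels, r, c, dt,
                   comfort_weight=1e4):
    """Return the heating power to apply now (receding-horizon principle).

    Every sequence of discrete power levels over the forecast horizon is
    simulated with the model; the cost is the heating energy (kWh) plus a
    quadratic penalty on the deviation from the setpoint. Only the first
    move of the cheapest sequence is returned.
    """
    best_cost, best_plan = float("inf"), None
    for plan in product(levels, repeat=len(t_out_forecast)):
        t, cost = t_in, 0.0
        for q, t_out in zip(plan, t_out_forecast):
            t = step_1r1c(t, t_out, q, r, c, dt)
            cost += q * dt / 3.6e6 + comfort_weight * (t - setpoint) ** 2
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]


# Hypothetical room: UA = 200 W/K, C = 10 MJ/K, hourly steps, 3-hour horizon.
print(mpc_first_move(18.0, [0.0, 0.0, 0.0], 20.0, [0, 2000, 4000],
                     r=0.005, c=1e7, dt=3600))
```

A real MPC would use a proper optimizer instead of enumeration, whose cost grows exponentially with the horizon length; enumeration is used here only to keep the logic transparent.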

Hence, building energy consumption modelling and forecasting is a key tool to achieve smart and sustainable designs. It can guide energy management at local and global scales.

1.1 Building models

Before discussing the purpose of this work and presenting an overview of the state of art, the terminology used must be explained.

The thermal behavior of a building is governed by the physics of its thermodynamics. To model these phenomena, three types of approaches are generally used:

• white box;

• black box;

• grey box.

White box White box models are also called physics-based models: their equations are derived from the governing laws of physics and from detailed knowledge of the underlying process.

Black box Black box models are also known as the data-driven approach. The system data are collected during normal operation or during a specific test. The purpose of a black box is to find a relationship between the input and output variables using mathematical techniques [1].

Hence, black boxes are statistical models. Compared with physical methods, their distinctive feature is that they do not require any physical information: no heat transfer equations and no thermal or geometrical parameters are needed beforehand. Instead, black boxes are based on a function deduced only from samples of training data describing the behaviour of a specific system.

On the other hand, they rely entirely on measurements. Usually a large amount of data is required, and in some cases it is difficult to collect enough of it [26].
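As a minimal illustration of a purely data-driven model, the sketch below fits a straight line between outdoor temperature and heating energy (a simple "energy signature") by ordinary least squares. The data are hypothetical and this is not one of the black boxes compared in this work; the point is only that the coefficients come from samples, with no heat transfer equation or building parameter involved.

```python
def fit_energy_signature(t_out, q_heat):
    """Ordinary least-squares fit of q = a + b * t_out from measured samples only."""
    n = len(t_out)
    mean_t = sum(t_out) / n
    mean_q = sum(q_heat) / n
    s_tt = sum((t - mean_t) ** 2 for t in t_out)
    s_tq = sum((t - mean_t) * (q - mean_q) for t, q in zip(t_out, q_heat))
    b = s_tq / s_tt          # slope
    a = mean_q - b * mean_t  # intercept
    return a, b


# Hypothetical daily data: mean outdoor temperature (degC) and heating energy (kWh).
t_out = [0.0, 5.0, 10.0, 15.0]
q_heat = [100.0, 75.0, 50.0, 25.0]
a, b = fit_energy_signature(t_out, q_heat)
print(a, b)          # intercept and slope of the fitted line
print(a + b * 8.0)   # forecast for a day at 8 degC
```

The model is only as good as the range and quantity of the training samples: outside the observed temperatures nothing guarantees a physically meaningful extrapolation.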

Grey box White-box approaches assume that all physical mechanisms can be described with high accuracy. However, although most thermal phenomena are well known, some of them rely on assumptions and remain difficult to model accurately.

Grey boxes are models whose parameters have a physical meaning but are determined empirically, on the basis of measured data from the real system. They are used when prior knowledge of the object is not comprehensive enough for satisfactory modelling (white box) and, in addition, purely empirical (black box) methods are not sufficient for the purpose, or collecting data is too difficult [10].
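A minimal sketch of the grey-box idea, assuming the simplest possible structure: a single-node room with one thermal resistance $R$ and one capacitance $C$ (a hypothetical 1R1C model, not one of the models compared in this work). Once the model is discretized with an explicit Euler step, it becomes linear in two coefficients, so $R$ and $C$ can be recovered from measured temperatures and heating power by linear least squares, and they keep their physical meaning.

```python
def simulate_1r1c(t0, t_out, q_heat, r, c, dt):
    """Explicit-Euler simulation of C dT/dt = (T_out - T)/R + Q."""
    temps = [t0]
    for to, q in zip(t_out, q_heat):
        t = temps[-1]
        temps.append(t + dt / c * ((to - t) / r + q))
    return temps


def identify_1r1c(t_in, t_out, q_heat, dt):
    """Estimate R (K/W) and C (J/K) from data by linear least squares.

    The discretized model reads
        T[k+1] - T[k] = theta1 * (T_out[k] - T[k]) + theta2 * Q[k],
    with theta1 = dt / (R C) and theta2 = dt / C.
    """
    n = len(t_in) - 1
    y = [t_in[k + 1] - t_in[k] for k in range(n)]
    x1 = [t_out[k] - t_in[k] for k in range(n)]
    x2 = list(q_heat[:n])
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(v * w for v, w in zip(x1, x2))
    s1y = sum(v * w for v, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12          # 2x2 normal equations, Cramer's rule
    theta1 = (s1y * s22 - s2y * s12) / det
    theta2 = (s2y * s11 - s1y * s12) / det
    c = dt / theta2
    r = theta2 / theta1
    return r, c


# Synthetic "measurements" from a hypothetical room (R = 0.005 K/W, C = 10 MJ/K).
dt = 3600.0
t_out = [5.0 + (k % 7) for k in range(48)]
q_heat = [3000.0 if k % 3 else 0.0 for k in range(48)]
t_in = simulate_1r1c(19.0, t_out, q_heat, r=0.005, c=1e7, dt=dt)
print(identify_1r1c(t_in, t_out, q_heat, dt))
```

With noise-free synthetic data the estimates coincide with the values used to generate them; with real measurements the structure and the excitation of the data determine how reliable the physical interpretation is.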

Table 1.1, by Foucquier et al. [26], outlines the main differences between these three methods.

1.2 Purpose of this work

The purpose of this work is the comparison of four building modelling techniques:

• CTSM (grey box);

• SEAS (grey box);

• Correlation Analysis (black box);

• NARX (black box).

These methods are compared on the basis of the mean squared error of the power prediction, the relative error of the energy prediction, the computational cost, the physical interpretability and the amount of data required.
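The two accuracy measures can be sketched as follows, assuming uniformly sampled heating power and approximating the energy as the sum of power samples times the time step. The sketch computes the root mean squared error (the square root of the mean squared error); the function names and the sign convention of the relative error are assumptions made here for illustration.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted power samples."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def relative_energy_error(measured, predicted, dt_h=1.0):
    """Relative error of the predicted energy over the whole period.

    Energy is approximated as the sum of power samples times the (uniform) time step.
    """
    e_meas = sum(measured) * dt_h
    e_pred = sum(predicted) * dt_h
    return (e_pred - e_meas) / e_meas


measured = [1000.0, 2000.0, 3000.0]     # W, hourly samples (hypothetical)
predicted = [1100.0, 1900.0, 3000.0]
print(rmse(measured, predicted))                   # about 81.6 W
print(relative_energy_error(measured, predicted))  # 0.0: the hourly errors cancel out
```

The example shows why both measures are useful: a model can predict the total energy exactly while still being wrong hour by hour.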

| | White box | Black box | Grey box |
|---|---|---|---|
| Building geometry | A detailed description of the building geometry is required | A description of the geometry is not required | A rough description of the building geometry is required |
| Training data | No training data are required | A large amount of training data collected over an exhaustive period of time is required | A small amount of training data collected over a short period of time is required |
| Physical interpretation | Results can be interpreted in physical terms | Results are difficult to interpret in physical terms | Results can be interpreted in physical terms |

Table 1.1. White, grey and black box main differences.

1.3 Organization of this thesis

This work is organized in five parts.

Part I will continue with a comprehensive review of several grey- and black-box methods used in literature.

Part II will explain in detail the two grey boxes used for this work. Chapter 3 is dedicated to the CTSM method and Chapter 4 to the SEAS model.

Part III will explain in detail the two black boxes. Chapter 5 is dedicated to the Correlation Analysis and Chapter 6 to the NARX networks.

Part IV can be skipped without undermining the comprehension of the work. It includes several algorithms used in this thesis to support the grey and black boxes. Chapter 7 describes the Kalman filter, an algorithm used for the stochastic estimation of variables. Chapters 8, 9, 10 and 11 describe four different optimization techniques. Chapter 12 explains how data are preprocessed before the optimization (and postprocessed after it).

Part V shows the results and how they are obtained. Chapter 13 describes the datasets used for training and validation of the models. Chapter 14 comments on each method and presents a quantitative comparison between them. Chapter 15 explains how the assumptions used for the models can be modified. Chapter 16 summarises all the models and the results of this work, and describes possible future developments.

Appendix A describes the MPC, while Appendix B proposes other black boxes for the forecasting.

Unfortunately every model has its own notation and terminology. For this reason, the Chapters of this work also use different notations.

Chapter 2

State of art

In this Chapter, a brief review of grey- and black-box methods in literature is given. First of all, a short comment on terminology is necessary. In this work, the terms white, grey and black boxes are used with the meaning described in Section 1.1. However, other terms are also used in literature to indicate the same types of modelling:

• white boxes are often called physics-based modelling, transparent box or forward approach;

• black boxes are most commonly called data-driven, but also inverse approach;

• grey boxes are also called hybrid models or, less often, parametric modelling.

2.1 White boxes

Although white-box modelling is not the purpose of this work, it is useful to outline the main physics-based ways to model buildings.

The physical techniques are based on equations describing the physical behaviour of heat transfer. Three main thermal building models are currently used: CFD, zonal and nodal methods.

2.1.1 CFD methods

The CFD approach is the most complete method for the thermal simulation of buildings. CFD methods are based on a microscopic approach to heat transfer modelling, which allows the flow field to be described in detail. They essentially solve the Navier-Stokes equations and therefore produce a detailed description of the different flows inside buildings.

Many CFD software packages are available, such as Fluent, COMSOL Multiphysics, MIT-CFD and PHOENICS-CFD.


The main disadvantage of the CFD approach resides in its huge computation time. For this reason this approach is often substituted by the zonal approach.

2.1.2 Zonal approach

The zonal approach consists in dividing each building zone into several cells, each corresponding to a small part of a room. Thus, it is possible to evaluate the spatial distribution of fields such as temperature, pressure, concentration or air velocity while keeping the computation time reasonable. Wurtz et al. [51] showed that zonal simulation is a suitable method for an accurate estimation of the temperature field in a room and of indoor thermal comfort. It also allows the airflows in the building to be visualized.

2.1.3 Nodal approach

The nodal approach is the simplest one. It relies on the following assumption: each building zone is a homogeneous volume characterized by uniform state variables.
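Under this assumption each zone reduces to a single energy balance. As an illustrative sketch (the symbols are chosen here and are not taken from a specific program), let $C_z$ be the heat capacity of zone $z$, $T_z$ its uniform temperature, $R_{zj}$ the thermal resistance between the zone and a neighbouring node $j$ (another zone, a wall node or the outdoor air), and $\Phi_z$ the sum of heating, solar and internal gains:

$$C_z \frac{dT_z}{dt} = \sum_j \frac{T_j - T_z}{R_{zj}} + \Phi_z$$

Writing one such equation per node yields the system of ordinary differential equations that nodal simulation programs integrate in time.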

Each of the following energy simulation programs is de facto a white box based on the nodal approach.

• TRNSYS, for example, is a commercial software package developed at the University of Wisconsin, that allows building simulations in a clear, physics-based way.

• EnergyPlus is a whole building energy simulation software developed by the U.S. Department of Energy Building Technology Office. It is used to model both energy consumption (for heating, cooling, ventilation, lighting, and process and plug loads) and water use in buildings. EnergyPlus is a console-based program that reads input and writes output to text files. Several comprehensive graphical interfaces for EnergyPlus are also available.

Other software packages are also currently used for the same purpose:

• ApacheSim;

• Carrier HAP;

• DOE-2;

• ESP-r;

• IDA;

• SPARK;

• TAS.

In each of these programs, all the characteristics of the building are specified upfront. Several toolboxes to simulate buildings and HVAC systems have also been developed for Matlab Simulink [47].

The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) has published several handbooks on HVAC system fundamentals, which include dynamic, steady-state and quasi-steady-state models, with both zonal and nodal approaches [24].

Standardization organizations have also made a considerable effort, and many procedures for the dynamic simulation of buildings are now standardized.


2.2 Grey boxes

In literature, grey-box models are usually considered the best modelling technique for the control of HVAC systems [1, 73].

A grey box is usually a model involving several parameters. Once the model structure is defined, the methodology for estimating these parameters is called system identification in literature [85]; it is basically an optimization procedure.

All the grey-box methods used for the dynamic simulation of buildings differ from each other in two respects:

• the type of model structure;

• the optimization algorithm.

2.2.1 Model structure

Model structures used in literature could be very different, and the imagination of the author really plays its part. However, there are two common ways to model the power of a building: the phenomena approach and the nodal approach.

The "phenomena" approach The first approach divides the heat exchange into parts, each related to a particular phenomenon. For each phenomenon, the heat flow is calculated through one or more parameters. Examples of phenomena are:

• heat conduction through opaque walls;

• heat conduction through glazed surfaces;

• ventilation;

• infiltration;

• heat gains;

• solar radiation;

and so on. These contributions are then combined.
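A minimal sketch of the "phenomena" idea, assuming a steady-state balance and hypothetical parameter names (this is not the SEAS formulation described in Chapter 4): each phenomenon contributes its own term, and in a grey box the coefficients such as the conductances would be the parameters identified from data.

```python
def heating_power(t_in, t_out, ua_opaque, ua_glazed, air_changes, volume,
                  solar_gains, internal_gains):
    """Heating power (W) as a sum of separately modelled phenomena (steady state).

    ua_* in W/K, air_changes in 1/h, volume in m3, gains in W.
    """
    rho_cp_air = 1200.0                      # J/(m3 K), approximate value for air
    delta_t = t_in - t_out
    q_opaque = ua_opaque * delta_t           # conduction through opaque walls
    q_glazed = ua_glazed * delta_t           # conduction through glazed surfaces
    q_air = rho_cp_air * air_changes * volume / 3600.0 * delta_t  # ventilation + infiltration
    # Gains reduce the demand; the heating power cannot be negative.
    return max(0.0, q_opaque + q_glazed + q_air - solar_gains - internal_gains)


print(heating_power(20.0, 0.0, ua_opaque=100.0, ua_glazed=50.0, air_changes=0.5,
                    volume=300.0, solar_gains=500.0, internal_gains=300.0))
```

Because no state variable is involved, each term can be inspected and calibrated separately, which is the main appeal of this approach.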

The SEAS model (see References [89, 91]), described in Chapter 4, is an example of this approach.

Another example is described by Singaravel et al. [87]. In this case, the heat flux is calculated not only for each phenomenon, but also for each part of the building envelope.

Nielsen et al. [67] used the same approach, particularly underlining the advantages related to the absence of state variables (nodes) in a grey-box method.

Ghrab-Morcos [29] showed how this approach can be applied to Mediterranean residential buildings in both hot and cold seasons. The procedure, called CHEOPS, requires minimum input data.


The nodal approach Although the "phenomena" approach is the simpler one, the second approach is by far the most used grey-box model in literature: the nodal approach. It implies the presence of state variables, i.e. node temperatures. A set of equations (linear or nonlinear) relates these state variables. The number of nodes may vary considerably, but it is usually very small because of the computational cost of optimizing the parameters.

In most cases the relations between nodes are visually described by an electric-thermal analogy (see Chapter 3), i.e. with circuits.

Himpe et al. [37], for example, used a three-state model. The state variables are the temperature of the interior, a temperature in the building envelope, and a temperature in the walls and floors towards the boundary zones. In this case, an effort to give the state variables a physical meaning can be noted.

Reynders et al. [77] compared one-state, two-state and three-state building models. They showed that in most cases the three-state model is the most accurate, although the two-state model is sufficiently accurate at a much lower computational cost. Therefore the two-state model may be the best choice.

De Coninck et al. [17] trained several models for a single-family building with the number of states varying from 2 to 6, and the number of parameters varying from 2 to 14. They identified in particular the relationship between the forecasting and the identification data set.

Hudson et al. [40] used a state-space description of the building implemented in Matlab. Their model has six states: two external walls, the floor, the ceiling, one partition and the internal air. They also explicitly use the common notation in which the time derivative of the state vector is a linear combination of the state variables themselves and of the inputs.
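To make the state-space notation concrete, consider an illustrative two-state circuit (chosen here as an example, not the six-state model of Hudson et al.): the indoor air at temperature $T_i$ with capacity $C_i$, the envelope at temperature $T_e$ with capacity $C_e$, a resistance $R_{ie}$ between them and a resistance $R_{ea}$ between the envelope and the ambient air at $T_a$, with heating power $\Phi_h$ delivered to the indoor air. The node balances

$$C_i \frac{dT_i}{dt} = \frac{T_e - T_i}{R_{ie}} + \Phi_h, \qquad C_e \frac{dT_e}{dt} = \frac{T_i - T_e}{R_{ie}} + \frac{T_a - T_e}{R_{ea}}$$

can be collected in the form $\dot{\boldsymbol{x}} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B}\boldsymbol{u}$ with $\boldsymbol{x} = [T_i, T_e]^\top$ and $\boldsymbol{u} = [T_a, \Phi_h]^\top$:

$$\frac{d}{dt}\begin{bmatrix} T_i \\ T_e \end{bmatrix} = \begin{bmatrix} -\frac{1}{R_{ie} C_i} & \frac{1}{R_{ie} C_i} \\ \frac{1}{R_{ie} C_e} & -\frac{1}{R_{ie} C_e} - \frac{1}{R_{ea} C_e} \end{bmatrix} \begin{bmatrix} T_i \\ T_e \end{bmatrix} + \begin{bmatrix} 0 & \frac{1}{C_i} \\ \frac{1}{R_{ea} C_e} & 0 \end{bmatrix} \begin{bmatrix} T_a \\ \Phi_h \end{bmatrix}$$

The four physical parameters $R_{ie}$, $R_{ea}$, $C_i$, $C_e$ are the unknowns of the identification problem.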

The same notation is used by Déqué et al. [25]. Their purpose was to represent residential building envelopes based on the entry of a limited number of parameters. They found good results with ten parameters, comparing the results with software outputs and experimental measurements.

A convenient way to denote a building model is proposed by Brastein et al. [69]: in their notation, R3C2 indicates a circuit model with three resistances and two capacitors. Their paper investigates the dispersion of parameter estimates by means of randomisation. They show that, in order to assign a physical interpretation to grey-box model parameters, the estimated parameters must converge independently of the initial conditions and of the dataset used.

Harb et al. [35] developed a model with the same notation, adding several parameters for the internal gains related to the occupancy of the buildings. The analysis revealed that a two-capacity model structure with an additional consideration of the indoor air as a mass-less node (4R2C-model) enables the most accurate qualitative prediction of the indoor temperature.

Afram et al. [3] focused their work on grey-box modelling of residential HVAC systems, i.e. the generator. However, the same methods could also be applied when the grey box models the dynamics of a building.

[Figure 2.1. Grey-box methods scheme: a grey-box model is the combination of a model structure ("phenomena" approach or circuit) and an identification method (least squares or statistical method).]

2.2.2 Identification algorithms

A list of identification algorithms was compiled by Ljung [61]. However, for building dynamic forecasting, two types of identification methods are mostly used:

• minimizing-the-error methods (from here on out they will be called least squares methods, although different measures of the error could be used);

• statistical methods.

Least squares methods According to this approach, the measured output is compared with the model output, which depends on the parameters. The error is calculated for each measurement, and the solution minimizes the sum of the squared residuals over all measured values. In general, the method can be linear or nonlinear depending on the model structure; in either case, the purpose is the minimization of a cost function of the parameters.

Afram et al. [3], for instance, used a nonlinear least squares optimization implemented in the Simulink Control and Estimation Tools Manager.

Fux et al. [27] used a Nelder-Mead simplex algorithm (described in Chapter 9) for minimizing the mean squared error.
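For reference, a compact version of the Nelder-Mead simplex method is sketched below (the algorithm is discussed in Chapter 9). It is a minimal sketch with the usual textbook coefficients for reflection, expansion, contraction and shrinkage, not the implementation used by Fux et al.; a toy quadratic stands in for the mean squared error between measured and simulated outputs.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f starting from x0 with the Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        # Reflection of the worst vertex through the centroid.
        xr = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            # Expansion: go further in the promising direction.
            xe = [c + gamma * (r - c) for c, r in zip(centroid, xr)]
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            # Contraction towards the centroid.
            xc = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink all vertices towards the best one.
                best = simplex[0]
                simplex = [best] + [[b + sigma * (v - b) for b, v in zip(best, vertex)]
                                    for vertex in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Toy cost with minimum at (1, -2); in a grey box, f would be the mean squared
# error between measured and simulated outputs as a function of the parameters.
x_opt, f_opt = nelder_mead(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0])
print(x_opt, f_opt)
```

The method only needs function evaluations, not gradients, which is convenient when the cost requires a full simulation of the building model.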

Lee et al. [58] used a classical (non-numerical) method: the Kuhn-Tucker equations. Almost all the papers cited in Section 2.2.1 use other types of least squares optimization, but the particular algorithm is often not specified.

Recently the most common algorithms for optimizing such a cost function have been Genetic Algorithms (GAs). GAs are adaptive heuristic or metaheuristic search algorithms belonging to the larger class of evolutionary algorithms, inspired by the process of natural selection: the basic techniques of GAs are designed to simulate processes in natural systems necessary for evolution [75]. As such, they represent an intelligent exploitation of a random search used to solve optimization problems. Although randomised, GAs are by no means random; instead, they exploit historical information to direct the search towards regions of better performance within the search space.

How do GAs simulate the process of natural selection? In nature, the individuals that best adapt to changes in their environment survive, reproduce and pass on to the next generation. In simple words, GAs simulate the survival of the fittest among the individuals of consecutive generations in order to solve a problem. Each generation consists of a population of individuals, and each individual represents a point in the search space, i.e. a possible solution. Each individual is represented as a string of characters, integers, floats or bits; this string is analogous to a chromosome.
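The steps above can be sketched with a minimal real-coded genetic algorithm: each individual is a vector of floats (one common encoding among those listed), selection keeps the fittest half, uniform crossover mixes two parents, and Gaussian mutation perturbs the genes. Population size, mutation scale and the toy cost are assumptions for illustration; elitism (the best individual always survives) guarantees that the best cost never worsens from one generation to the next.

```python
import random


def genetic_minimize(f, bounds, pop_size=30, generations=60,
                     mutation_scale=0.1, seed=0):
    """Minimize f over a box with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []                                  # best cost of each generation
    for _ in range(generations):
        ranked = sorted(population, key=f)        # fitness = low cost
        history.append(f(ranked[0]))
        parents = ranked[: pop_size // 2]         # selection: the fittest half
        children = [ranked[0]]                    # elitism: best individual survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is taken from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clipped to the search bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_scale * (hi - lo))))
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = children
    best = min(population, key=f)
    history.append(f(best))
    return best, history


best, history = genetic_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                                 bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, history[0], history[-1])
```

In a grey-box identification, `f` would be the cost function of the model parameters and `bounds` their physically admissible ranges.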

A nodal approach is used together with Genetic Algorithm optimization by several authors: Lauret et al. [56], Wang et al. [93, 94] and Znouda et al. [97].

Siddharth et al. [86] developed a grey box software with nine control variables to be determined through a GA.

Tuhus-Dubrow et al. [92] developed a tool to optimize the building geometry, described by a comprehensive list of parameters. Purpose of this optimization, however, was not finding a grey-box modelling, but rather finding an optimal building shape for residential houses. The results of the optimization indicate that rectangular and trapezoidal shaped buildings consistently have the best performance.

Statistical methods Statistical methods imply the use of the maximum likelihood and/or stochastic differential equations, better described in Chapter 3.

One example of such methods is the work by Brastein et al. [69]. Their circuit parametric model is described by stochastic differential equations, and the maximum likelihood is used to find the parameters. For this particular work, the COBYLA optimization algorithm is used.

Stochastic differential equations are also used by Prívara et al. [73, 74]. They stated that this model (called 4SID) is particularly suitable when prior information about the building is known.

Much work has been done by researchers of the Technical University of Denmark [6, 7, 45, 46, 55, 62, 68]. Their work concerns the computation of the likelihood based on the Kalman filter. More details about these efforts are given in Chapter 7.
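The idea of computing the likelihood with a Kalman filter can be sketched for a scalar linear-Gaussian model (a hypothetical example, much simpler than the continuous-time models of Chapter 3): the filter yields the one-step prediction errors (innovations) and their variances, and the negative log-likelihood is the sum of the Gaussian terms. Minimizing it over the parameters gives the maximum likelihood estimate.

```python
import math


def kalman_nll(y, u, a, b, q, r, x0, p0):
    """Negative log-likelihood of y for x[k+1] = a x[k] + b u[k] + w, y[k] = x[k] + v.

    w ~ N(0, q) and v ~ N(0, r); x0, p0 are the prior mean and variance of x[0].
    """
    x, p, nll = x0, p0, 0.0
    for y_k, u_k in zip(y, u):
        e = y_k - x                      # innovation (one-step prediction error)
        s = p + r                        # innovation variance
        nll += 0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        gain = p / s                     # Kalman gain
        x, p = x + gain * e, (1.0 - gain) * p        # measurement update
        x, p = a * x + b * u_k, a * a * p + q        # time update (prediction)
    return nll


# Synthetic noise-free data generated with a = 0.9, b = 0.1 (hypothetical values).
u = [10.0] * 20
y = [20.0]
for u_k in u[:-1]:
    y.append(0.9 * y[-1] + 0.1 * u_k)
print(kalman_nll(y, u, 0.9, 0.1, 0.01, 0.01, 20.0, 1.0))   # true parameters
print(kalman_nll(y, u, 0.9, 0.3, 0.01, 0.01, 20.0, 1.0))   # wrong input gain: higher NLL
```

In practice the negative log-likelihood is passed to a numerical optimizer, such as those described in Part IV, to estimate $a$, $b$, $q$ and $r$ together.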

Reynders et al. [76] used the same method to calculate how much the uncertainty increases when the building is occupied by people.

The Kalman filter is also used after the parameter estimation, for a more accurate output forecast. Fux et al. [27] proposed self-adaptive thermal building models, whose parameters are tuned automatically on the basis of online measurements of the reference room temperature.


2.3 Black boxes

According to Bourdeau et al. [11], among the three main approaches to building energy consumption modelling and forecasting, data-driven techniques emerge as the most suitable option to ensure the integration of buildings in smart environments. These frameworks include algorithms that benefit from the significant recent developments in machine learning, providing flexibility and reliability to modelling and forecasting tools.

As already shown in Table 1.1, the results of a black-box method are difficult to interpret in physical terms. Because of this potential inconsistency with physical reality when they are applied under demanding conditions, black-box models are mainly used for error detection. Their advantage is the rapid and automated identification of the outputs of building thermal energy consumption [1, 5].

Afram et al. [1] made an accurate review of all the black-box methods used in literature. Although this paper actually concerns the HVAC systems modelling, the same observations could be made for the building modelling with a few adjustments. Figure 2.2, from this paper, organizes all the black boxes in nine subcategories.

Frequency Domain models These are first- and second-order models developed for SISO or MIMO systems. They are based on the transfer function of the system, with one or two poles (first or second order). The dead time $L$ represents the time delay due to the heavy thermal inertia of the system.

$$G(s) = \frac{K}{\tau s + 1} e^{-Ls} \qquad (2.1)$$

$$G(s) = \frac{1}{a s^2 + b s + c} e^{-Ls} \qquad (2.2)$$

The models have three or four parameters to be determined from the measured data. Owing to the wealth of literature on first- and second-order systems, the controller design is also straightforward [9].
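For the first-order model (2.1), the response to a step input has a simple closed form: nothing happens during the dead time $L$, then the output approaches $K$ times the step with time constant $\tau$. The sketch below evaluates it; the parameter values (a heating step applied to a zone, time in hours) are hypothetical.

```python
import math


def fopdt_step_response(t, k_gain, tau, dead_time, step=1.0):
    """Output at time t of G(s) = K e^{-Ls} / (tau s + 1) after an input step at t = 0."""
    if t < dead_time:
        return 0.0                                   # nothing happens during the dead time
    return k_gain * step * (1.0 - math.exp(-(t - dead_time) / tau))


# Hypothetical parameters: K = 2 degC/kW, tau = 5 h, L = 0.5 h; 3 kW heating step.
for t in [0.0, 0.5, 5.5, 20.0]:
    print(t, fopdt_step_response(t, k_gain=2.0, tau=5.0, dead_time=0.5, step=3.0))
```

Fitting $K$, $\tau$ and $L$ to a measured step response is exactly the identification of the three parameters mentioned above.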

Sometimes this method is classified as a grey box because of the presence of parameters.

Data mining Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

Nowadays the most used data mining methods are artificial neural networks (ANNs), described in Chapter 6. A comprehensive survey of ANN applications in building energy systems is provided in Reference [48]. This paper analyzes all the applications in literature: forecasting not only energy consumption, but also the useful energy of a solar water heating system, solar radiation, wind speed, airflow in a building and indoor temperature.

In particular, Kalogirou et al. [49] showed that ANNs are able to predict the daily energy needed for heating an unoccupied building of which some physical elements are known. In another paper [50], Kalogirou also proposes a statistical method to verify the quality of ANN methods.

Ben-Nakhi et al. [8] used General Regression Neural Networks for cooling load prediction for buildings, in order to optimally control the TES.

[Figure 2.2. Data driven modelling techniques (from Afram et al. [1]). Black-box models are grouped into nine subcategories: frequency domain models, data mining, fuzzy logic, statistical models, state-space models, geometric models, case-based reasoning, stochastic models and instantaneous models. Techniques shown in the figure include over-damped process with dead time, artificial neural network, support vector machine, fuzzy inference adaptive network, fuzzy adaptive network, Takagi-Sugeno models, auto regression exogeneous, ARMAX, polynomial, time series regression, sub-space state identification, thin plate spline approximation, topological case base modelling, probability density function approximation and just in time (JIT) model.]

Fuzzy logic Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1 inclusive. A fuzzy model predictive control (FMPC) approach has been introduced to design control systems for highly nonlinear processes. In this approach, a process system is described by a fuzzy convolution model that consists of a number of quasi-linear fuzzy implications. See the references by Killian et al. [51–53] for more information. A general time series modelling technique called fuzzy time series has been developed by Song et al. [88]. Li et al. [59] presented an alternative approach, namely a hybrid genetic algorithm-adaptive network-based fuzzy inference system. This model is used for building energy prediction, and the authors stated that it performs better than ANNs in terms of prediction accuracy.

Statistical models The statistical black-box models consist of single and multivariate regression, autoregressive exogenous (ARX), autoregressive moving average exogenous (ARMAX), autoregressive integrated moving average (ARIMA), finite impulse response (FIR), Box Jenkins (BJ), and output error (OE) models. See References [85] or [33] for more information about the mathematical bases of these methods.

A review of all these forecasting technique in literature for buildings energy consumption is made by Deb et al. [19].

Gustin et al. [31] compared two of these methods, ARX and ARMAX, for forecasting indoor temperatures. They demonstrated that, whilst the two produce almost identical one-step-ahead forecasts, when longer multi-step-ahead forecasts are performed with a recursive strategy the ARX models are simpler to derive and offer slightly more consistent, reliable and accurate predictions.

Statistical models have also been used for other types of forecasting. Li et al. [60], for example, take into account temperature, precipitation, insolation and humidity in order to forecast the power output of a grid-connected 2.1 kW photovoltaic system. The method used is ARMAX, and the paper compares it with a previous work on the same photovoltaic system based on ARIMA.

Stochastic models Stochastic models deal with stochastic processes. Some processes in buildings behave as random variables and can be modelled using probability density functions (PDFs). A description of what a stochastic process is can be found in Section 3.1.

According to Afram et al. [1], a large amount of data is required to obtain the accurate shape of the PDF of a random variable, and the model predictions suffer if the PDF is not modelled properly. Nevertheless, stochastic models have recently been used more and more. Gaussian process forecasting is used in several fields: finance [78], energy [96], demography [95]. In particular, Gaussian processes are widely used for time series forecasting in general. The topic is very broad, but Roberts et al. [79] provide a detailed explanation and good examples for a closer look at this subject. The topic is more visual than mathematical, hence many video tutorials are easily available on the Internet under the tags Gaussian processes forecasting or Gaussian process machine learning.


Chapter 3

Continuous Time Stochastic Model

Stochastic differential equations provide a grey-box approach to system identification (Söderström and Stoica, 1989 [85]), combining a priori knowledge about the system with statistical methods for parameter estimation and model validation [68]. The purpose of this Chapter is to describe a method for identifying the parameters of nonlinear Itô stochastic differential equations (SDEs) driven by Wiener processes, using discrete-time measurements.

3.1 Stochastic processes

A stochastic process is usually defined as a collection of random variables indexed by time, so that it represents the numerical values of some system changing randomly over time. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner [23].

A stochastic process may be discrete-time or continuous-time. A continuous-time stochastic process is also called a random function, because it can be interpreted as a random element in a function space [70].

Here, only some relevant stochastic processes are explained informally. For a formal definition of stochastic processes and for more examples, see Reference [4].

Bernoulli process A Bernoulli process is a (finite or infinite) sequence of independent random variables $x_1, x_2, \ldots$ such that:

• for each $i$, the value of $x_i$ is either 0 or 1;

• the probability that $x_i = 1$ is $p$, with $0 \le p \le 1$, and $p$ is the same for all values of $i$.

Figure 3.1. A random walk stochastic process (horizontal axis: time).

Random walk A random walk describes a path that consists of a succession of random steps on some mathematical space. One example of this discrete-time stochastic process is the random walk on the integer number line $\mathbb{Z}$: starting at 0, at each time step it moves +1 or −1 with equal probability. Figure 3.1 shows an example of random walk with time step $1/50$ and equal probability of going up or down.
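The random walk above can be generated directly from a Bernoulli process: each variable $x_i \in \{0, 1\}$ is mapped to a step of +1 or −1, and the walk is the cumulative sum of the steps. The following is a minimal NumPy sketch of this construction, reproducing the setting of Figure 3.1 (50 steps on $[0, 1]$, $p = 1/2$); the function name `random_walk` and the seed are illustrative choices, not part of the original work.

```python
import numpy as np


def random_walk(n_steps: int, p: float = 0.5, seed: int = 0) -> np.ndarray:
    """Random walk on the integers built from a Bernoulli process.

    Each Bernoulli variable b_i (P[b_i = 1] = p) is mapped to a step of
    +1 (b_i = 1) or -1 (b_i = 0); the walk is the cumulative sum, starting at 0.
    """
    rng = np.random.default_rng(seed)
    bernoulli = rng.random(n_steps) < p      # Bernoulli process: True = 1, False = 0
    steps = np.where(bernoulli, 1, -1)       # map {0, 1} -> {-1, +1}
    return np.concatenate(([0], np.cumsum(steps)))


n = 50                                       # time step 1/50 on [0, 1], as in Figure 3.1
t = np.linspace(0.0, 1.0, n + 1)
walk = random_walk(n)
```

With `p = 1.0` every Bernoulli variable equals 1 and the walk simply climbs by one unit per step, which is a quick sanity check of the mapping.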

Wiener process The Wiener process is a continuous-time stochastic process, also called Brownian motion because of its historical connection with the physical process known as Brownian movement.

The Wiener process $w(t)$ is characterised by the following properties:

• $w(0) = 0$ almost surely;

• $w(t)$ has independent increments: for every $t > 0$, the future increments $w(t+u) - w(t)$, with $u \ge 0$, are independent of the past values $w(s)$, with $s < t$;

• $w(t)$ has Gaussian increments: $w(t+u) - w(t)$ is normally distributed with mean 0 and variance $u$, in probability theory notation $w(t+u) - w(t) \sim \mathcal{N}(0, u)$.

It can be noted that
$$\lim_{u \to 0} \mathrm{V}\left[w(t+u) - w(t)\right] = \lim_{u \to 0} u = 0$$
so the increments vanish as $u \to 0$, and the Wiener process defined here can be taken to be a continuous function of time (almost surely). Figure 3.2 shows an example of a standard Wiener process (unit variance per unit time).

Figure 3.2. A standard Wiener stochastic process.

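It can be shown that the Wiener process is the limit of a suitably scaled random walk: with steps of size $\pm\sqrt{\Delta t}$ taken every $\Delta t$, the random walk converges (in distribution) to a Wiener process as $\Delta t \to 0$.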
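Both constructions can be compared numerically. The minimal sketch below (function names and seeds are illustrative) builds Wiener paths by summing independent $\mathcal{N}(0, \Delta t)$ increments and, separately, random walks with steps $\pm\sqrt{\Delta t}$; at time $t$ both should show mean close to 0 and variance close to $t$.

```python
import numpy as np


def wiener_paths(n_paths: int, n_steps: int, T: float = 1.0, seed: int = 0) -> np.ndarray:
    """Standard Wiener paths on [0, T]: w(0) = 0 and independent N(0, dt) increments."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.hstack([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)])


def scaled_random_walk(n_paths: int, n_steps: int, T: float = 1.0, seed: int = 1) -> np.ndarray:
    """Symmetric random walk with steps of +/- sqrt(dt); tends to a Wiener process as dt -> 0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    steps = np.where(rng.random((n_paths, n_steps)) < 0.5, 1.0, -1.0) * np.sqrt(dt)
    return np.hstack([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)])


w = wiener_paths(n_paths=5000, n_steps=200)
rw = scaled_random_walk(n_paths=5000, n_steps=200)
# At time t = 1 both sample variances should be close to 1.
print(w[:, -1].mean(), w[:, -1].var(), rw[:, -1].var())
```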

3.2 Itô calculus

Kiyosi Itô (1915 – 2008) was a Japanese mathematician who pioneered the theory of stochastic integration and stochastic differential equations, now known as the Itô calculus. Its basic concept is the Itô integral [43].

3.2.1 Itô integration

Let $w(t)$ be a standard Wiener process, i.e. $w(0) = 0$ and $w(t+u) - w(t) \sim \mathcal{N}(0, u)$. The change of the Brownian motion in time $dt$ is formally called $dw(t)$. The independent-increments property implies that $dw(t)$ is independent of $dw(t^*)$ when $t^* \neq t$.

Let $f(t)$ be a stochastic process. Corresponding to the Riemann sum approximation of the Riemann integral, we define the following approximation to the Itô integral, where the integrand is evaluated at the left end point $t_k$ of each interval:
$$Y_{\Delta t}(t) = \sum_{t_k < t} f(t_k)\,\Delta w_k \tag{3.2}$$
where $\Delta w_k = w(t_{k+1}) - w(t_k)$. If the limit exists, the Itô integral is
$$Y(t) = \lim_{\Delta t \to 0} Y_{\Delta t}(t) \tag{3.3}$$
and it is possible to use the notation:
$$Y(T) = \int_0^T f(t)\,dw(t) \tag{3.4}$$
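A classical check of the left-point definition is the integral of the Wiener process against itself: Itô calculus gives $\int_0^T w\,dw = \frac{1}{2}\left(w(T)^2 - T\right)$, which differs from the ordinary-calculus result $\frac{1}{2}w(T)^2$ by $-T/2$. The minimal sketch below (helper name `ito_sum` is illustrative) evaluates the Riemann-type sum of Eq. 3.2 on one simulated path and compares it with that closed form.

```python
import numpy as np


def ito_sum(f_values: np.ndarray, w: np.ndarray) -> float:
    """Left-point (Itô) sum  sum_k f(t_k) * (w(t_{k+1}) - w(t_k))  of Eq. 3.2."""
    dw = np.diff(w)
    return float(np.sum(f_values[:-1] * dw))


rng = np.random.default_rng(42)
T, n = 1.0, 100_000
dt = T / n
w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))

approx = ito_sum(w, w)                 # integrand f(t) = w(t), evaluated at t_k
exact = 0.5 * (w[-1] ** 2 - T)         # Itô calculus result
print(approx, exact)
```

The residual difference equals $\frac{1}{2}\left(\sum_k \Delta w_k^2 - T\right)$, which shrinks as $\Delta t \to 0$ because the quadratic variation of the Wiener process on $[0, T]$ is $T$.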

3.2.2 Itô SDEs

A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process, therefore the solution is also a stochastic process [44].

The following notation is mostly the same as in Reference [55]. Let $S$ be a dynamic system and let $x_t \in \mathbb{R}^n$, $t_0 \le t \le T$, be the states of $S$.

The system $S$ may be described by a set of nonlinear SDEs, i.e.:
$$dx_t = f(x_t, u_t, t, \theta)\,dt + \sigma(u_t, t, \theta)\,dw_t \tag{3.5}$$
where:

• $t \in \mathbb{R}$ is the time variable;

• $x_t \in X \subset \mathbb{R}^n$ is a vector of state variables;

• $u_t \in U \subset \mathbb{R}^m$ is a vector of input variables;

• $\theta \in \Theta \subset \mathbb{R}^p$ is a vector of (possibly unknown) parameters;

• $f(\cdot) \in \mathbb{R}^n$ and $\sigma(\cdot) \in \mathbb{R}^{n \times n}$ are nonlinear functions;

• $w_t$ is an $n$-dimensional standard Wiener process.

The first term on the right-hand side is commonly called drift term and the second term is commonly called diffusion term.
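A path of Eq. 3.5 can be simulated with the Euler-Maruyama scheme, the Itô-sum counterpart of the explicit Euler method: $x_{k+1} = x_k + f(x_k, t_k)\,\Delta t + \sigma(t_k)\,\Delta w_k$. The sketch below is illustrative only and is not the identification method of this Chapter: it assumes a hypothetical single wall node with drift $(T_e - x)/(RC)$ and constant diffusion, with made-up parameter values.

```python
import numpy as np


def euler_maruyama(f, sigma, x0, t_grid, n_paths=1, seed=0):
    """Simulate dx = f(x, t) dt + sigma(t) dw on t_grid (Euler-Maruyama scheme).

    f(x, t) returns the drift (same shape as x); sigma(t) returns the
    diffusion coefficient, which, as in Eq. 3.5, does not depend on x.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    out = [x.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)   # Wiener increments ~ N(0, dt)
        x = x + f(x, t0) * dt + sigma(t0) * dw
        out.append(x.copy())
    return np.array(out)                             # shape (len(t_grid), n_paths)


# Hypothetical one-node example: dT = (Te - T)/(R*C) dt + s dw
R, C, Te, s = 0.01, 1.0e5, 5.0, 0.05    # [K/W], [J/K], [degC], diffusion [K/sqrt(s)]
tau = R * C                              # time constant: 1000 s
t_grid = np.linspace(0.0, 3 * tau, 301)
paths = euler_maruyama(lambda x, t: (Te - x) / tau, lambda t: s, 20.0, t_grid, n_paths=2000)
```

For this linear example the exact mean relaxes as $T_e + (T_0 - T_e)e^{-t/\tau}$ and the variance approaches $s^2\tau/2$, which gives a simple check of the simulated ensemble.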

The system $S$ is partially observed at discrete times through the output variable $y_k \in Y \subset \mathbb{R}^l$:
$$y_k = h(x_k, u_k, t_k, \theta) + v_k \tag{3.6}$$
where:

• $k = 0, \ldots, N$ are the sampling instants;

• $h(\cdot) \in \mathbb{R}^l$ is a nonlinear function;

• $v_k \sim \mathcal{N}\left(0, R(u_k, t_k, \theta)\right)$ is an $l$-dimensional white noise process with covariance matrix $R(\cdot)$.

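Linear model The sub-class of linear models is described by the equations:
$$dx_t = \left[A(x_t, u_t, t, \theta)\,x_t + B(x_t, u_t, t, \theta)\,u_t\right] dt + \sigma(u_t, t, \theta)\,dw_t \tag{3.7}$$
$$y_k = C(x_k, u_k, t_k, \theta)\,x_k + D(x_k, u_k, t_k, \theta)\,u_k + v_k \tag{3.8}$$
where $A(\cdot) \in \mathbb{R}^{n \times n}$, $B(\cdot) \in \mathbb{R}^{n \times m}$, $C(\cdot) \in \mathbb{R}^{l \times n}$, $D(\cdot) \in \mathbb{R}^{l \times m}$ and $\sigma(\cdot) \in \mathbb{R}^{n \times n}$ are (possibly nonlinear) matrix-valued functions.

In addition, if the model is also time invariant (LTI), it can be described by the equations:
$$dx_t = \left[A(\theta)\,x_t + B(\theta)\,u_t\right] dt + \sigma(\theta)\,dw_t \tag{3.9}$$
$$y_k = C(\theta)\,x_k + D(\theta)\,u_k + v_k \tag{3.10}$$
where this time $v_k$ is an $l$-dimensional white noise process with $v_k \sim \mathcal{N}\left(0, R(\theta)\right)$.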

3.3 Parameters estimation

The extended Kalman filter (Section 7.2) works on linearized and discretized models. Discretization and reshaping [62] of Equations 3.9 and 3.10 give:
$$x_k = A x_{k-1} + B u_k + w_k \tag{3.11}$$
$$y_k = C x_k + D u_k + v_k \tag{3.12}$$
The dependence of $A$, $B$, $C$, $D$ on the parameters $\theta$ is implicit (after discretization, $A$ and $B$ are in general different from the matrices of Eq. 3.9). If unknown, even the covariances of the noises, $Q = \mathrm{E}[w_k w_k^\top]$ and $R = \mathrm{E}[v_k v_k^\top]$, can be considered parameters and be included in $\theta$.

Such parameters can be determined by maximizing the likelihood function of a given sequence of observations $y_0, y_1, \ldots, y_N$. Let the following notation be used:
$$Y_k = [y_k, y_{k-1}, \ldots, y_1, y_0] \tag{3.13}$$
The likelihood function $L(\theta; Y_k)$ is defined as a function of the parameters $\theta$ equal to the density of the observed data with respect to a common or reference measure, for both discrete and continuous probability distributions. In other words, the likelihood function describes the plausibility of a model parameter value, given specific observed data:
$$L(\theta; Y_N) = p(Y_N \mid \theta) \tag{3.14}$$
where $p(Y_N \mid \theta)$ is the density function of the values $Y_N$ for the parameter value $\theta$. Technically, $L(\theta; Y_N)$ should not be considered a conditional probability density, whereas $p(Y_N \mid \theta)$ should.

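Applying repeatedly the rule $P(A \cap B) = P(A \mid B)\,P(B)$ (the definition of conditional probability), Eq. 3.14 turns into:
$$L(\theta; Y_N) = \left(\prod_{k=1}^{N} p(y_k \mid Y_{k-1}, \theta)\right) p(y_0 \mid \theta) \tag{3.15}$$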

Since the diffusion term in Eq. 3.5 does not depend on the state variable $x$, and since the increments of a Wiener process are Gaussian, it is reasonable to assume that the conditional densities can be well approximated by Gaussian densities. A method based on the extended Kalman filter, which relies on a first-order linearization of the model, is applied [55]. See Chapter 7 for information about how a Kalman filter works.

The following notation is used:

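$$\hat{y}_{k|k-1} = \mathrm{E}[y_k \mid Y_{k-1}, \theta] \tag{3.16}$$
$$S_{k|k-1} = \mathrm{V}[y_k \mid Y_{k-1}, \theta] \tag{3.17}$$
$$\varepsilon_k = y_k - \hat{y}_{k|k-1} \tag{3.18}$$
where $\mathrm{E}[\cdot]$ and $\mathrm{V}[\cdot]$ are the expectation and covariance operators. Note that $S_{k|k-1}$ is an $l \times l$ matrix. The likelihood function can be written as:
$$L(\theta; Y_N) = \left(\prod_{k=1}^{N} \frac{\exp\left(-\frac{1}{2}\varepsilon_k^\top S_{k|k-1}^{-1} \varepsilon_k\right)}{\sqrt{\det\left(S_{k|k-1}\right)}\left(\sqrt{2\pi}\right)^{l}}\right) p(y_0 \mid \theta) \tag{3.19}$$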

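Conditioning on $y_0$ and taking the negative logarithm gives:
$$-\log L(\theta; Y_N \mid y_0) = \frac{1}{2}\sum_{k=1}^{N}\left(\log\det\left(S_{k|k-1}\right) + \varepsilon_k^\top S_{k|k-1}^{-1}\varepsilon_k\right) + \frac{1}{2}\left(\sum_{k=1}^{N} l\right)\log(2\pi) \tag{3.20}$$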

The last term does not depend on the parameters $\theta$. At this point, the estimate of the parameters (and of the initial state, if unknown) can be determined by solving the following nonlinear optimization problem:
$$\hat{\theta} = \arg\min_{\theta \in \Theta}\left\{\sum_{k=1}^{N}\left(\log\det\left(S_{k|k-1}\right) + \varepsilon_k^\top S_{k|k-1}^{-1}\varepsilon_k\right)\right\} \tag{3.21}$$
It is important to notice that $S_{k|k-1}$ is the same matrix defined in Chapter 7 and used in the first equation of the update step of the filter:
$$S_{k|k-1} = C P_{k|k-1} C^\top + R \tag{3.22}$$
where $P_{k|k-1} = \mathrm{V}[x_k - \hat{x}_{k|k-1}]$ is the covariance of the one-step prediction error of the state.

When the computation starts, it is necessary to make an assumption about $P_{1|0}$, whereas $\hat{x}_{1|0}$ can either be pre-specified or estimated along with the unknown parameters as a part of the overall problem.
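For the linear discrete model of Eqs. 3.11-3.12 the extended Kalman filter coincides with the ordinary Kalman filter, so the objective of Eq. 3.20 can be accumulated from the innovations $\varepsilon_k$ and their covariances $S_{k|k-1}$. The following is a minimal sketch under that assumption; the function name and argument layout are illustrative, and the initial values passed in play the role of the assumed $\hat{x}_{1|0}$ and $P_{1|0}$.

```python
import numpy as np


def kalman_nll(A, B, C, D, Q, R, u, y, x_pred0, P_pred0):
    """Negative log-likelihood (Eq. 3.20) for the linear model
    x_k = A x_{k-1} + B u_k + w_k,   y_k = C x_k + D u_k + v_k   (Eqs. 3.11-3.12).

    u: (N, m) inputs and y: (N, l) outputs, holding y_1, ..., y_N (y_0 is
    conditioned on). x_pred0, P_pred0: the assumed x_{1|0} and P_{1|0}.
    """
    x_pred = np.asarray(x_pred0, dtype=float)
    P_pred = np.asarray(P_pred0, dtype=float)
    l = C.shape[0]
    nll = 0.0
    for k in range(len(y)):
        # Innovation eps_k (Eq. 3.18) and its covariance S_{k|k-1} (Eq. 3.22)
        eps = y[k] - (C @ x_pred + D @ u[k])
        S = C @ P_pred @ C.T + R
        _, logdet = np.linalg.slogdet(S)
        nll += 0.5 * (logdet + eps @ np.linalg.solve(S, eps) + l * np.log(2.0 * np.pi))
        # Update step: x_{k|k}, P_{k|k}
        K = P_pred @ C.T @ np.linalg.inv(S)
        x_filt = x_pred + K @ eps
        P_filt = (np.eye(len(x_filt)) - K @ C) @ P_pred
        # Prediction step for the next sample: x_{k+1|k}, P_{k+1|k}
        if k + 1 < len(y):
            x_pred = A @ x_filt + B @ u[k + 1]
            P_pred = A @ P_filt @ A.T + Q
    return nll
```

To estimate $\theta$ as in Eq. 3.21, one would wrap this function in a map from $\theta$ to $(A, B, C, D, Q, R)$ and hand it to a numerical optimizer; SciPy's `scipy.optimize.minimize`, for instance, offers both `method="Nelder-Mead"` and `method="L-BFGS-B"`.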

The likelihood has been maximized (i.e. the objective of Eq. 3.21 has been minimized) with two different optimization algorithms: Nelder-Mead (described in Chapter 9) and L-BFGS-B (described in Chapter 10). The differences between the results of the two methods are shown in Section 14.2.

| Thermal quantity | Unit | Electrical quantity | Unit |
|---|---|---|---|
| Heat flux | [W] | Current | [A] |
| Temperature | [K] | Voltage | [V] |
| Thermal resistance | [K/W] | Electrical resistance | [Ω] |
| Heat capacity | [J/K] | Capacitance | [F] |

Table 3.1. Thermal-electrical analogy

3.4 Circuits

In order to simulate the dynamics of the building, the thermal-electrical analogy is applied [80]. Table 3.1 summarises this analogy.

Figure 3.3 shows several types of circuits used to model the building. See Section 2.2 for more references.

In this Section, the circuit in Figure 3.3b is analyzed, because of its better performance (described in Section 14.2). Equations 7.3 and 3.12 are written for this circuit. The inputs are:

• the internal temperature $T_i$;

• the external temperature $T_e$;

• the solar horizontal irradiance $I$.

Therefore, the vector of inputs at the $k$-th time step is:
$$u_k = \begin{bmatrix} T_{i,k} \\ T_{e,k} \\ I_k \end{bmatrix} \tag{3.23}$$

The vector of states contains the temperatures of the internal nodes (walls):
$$x_k = \begin{bmatrix} T_{1,k} \\ T_{2,k} \end{bmatrix} \tag{3.24}$$
The output is the measured quantity, i.e. the heating power $P$:
$$y_k = P_k \tag{3.25}$$

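The vector of parameters to be determined is:
$$\theta = \begin{bmatrix} R_1 & R_2 & R_3 & R_4 & C_1 & C_2 & A_w & q_1 & q_2 & r_1 & T_{1,0} & T_{2,0} \end{bmatrix}^\top \tag{3.26}$$
where:

• $R_1$, $R_2$, $R_3$, $R_4$ are thermal resistances;

• $C_1$, $C_2$ are heat capacities;

• $A_w$ is an equivalent area: multiplied by the solar irradiance, it gives the solar power captured by the indoor air of the building (for instance, $A_w$ could be regarded as the window surface);

• $q_1$ and $q_2$ are two positive real numbers such that $\mathrm{diag}(q_1, q_2)$ is the covariance matrix of the error process $w_k$ in Eq. 3.11;

• $r_1$ is the variance of the process $v_k$ in Eq. 3.12;

• $T_{1,0}$ and $T_{2,0}$ are the initial guesses for the two states (first time step).

[Figure 3.3. Circuits used to model the building. (a) Resistances $R_0$, $R_1$, $R_2$, $R_3$ and capacitances $C_0$, $C_1$, $C_2$, with wall nodes $T_1$, $T_2$; (b) resistances $R_1$ to $R_4$, capacitances $C_1$, $C_2$ and solar aperture $A_w$, with wall nodes $T_1$, $T_2$; (c) resistances $R_1$ to $R_3$, capacitance $C_1$ and solar aperture $A_w$, with wall node $T_1$. Inputs: $T_i$, $T_e$, $I$; output: $P$.]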

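The equations are discretized with an implicit (backward Euler) scheme. The matrices in Equations 3.11 and 3.12 are:
$$A_k = \begin{bmatrix} \dfrac{1}{d_1 D_1} & \dfrac{\Delta t}{R_2 C_1 d_1 d_2 D_1} \\[2ex] \dfrac{\Delta t}{R_2 C_2 d_1 d_2 D_2} & \dfrac{1}{d_2 D_2} \end{bmatrix} \tag{3.27}$$
$$B_k = \begin{bmatrix} \dfrac{\Delta t}{R_1 C_1 d_1 D_1} & \dfrac{\Delta t^2}{R_2 R_3 C_1 C_2 d_1 d_2 D_1} & 0 \\[2ex] \dfrac{\Delta t^2}{R_1 R_2 C_1 C_2 d_1 d_2 D_2} & \dfrac{\Delta t}{R_3 C_2 d_2 D_2} & 0 \end{bmatrix} \tag{3.28}$$
$$C_k = \begin{bmatrix} -\dfrac{1}{R_1} & 0 \end{bmatrix} \tag{3.29}$$
$$D_k = \begin{bmatrix} \dfrac{1}{R_1} + \dfrac{1}{R_4} & -\dfrac{1}{R_4} & -A_w \end{bmatrix} \tag{3.30}$$
where:
$$d_1 = 1 + \frac{\Delta t}{R_1 C_1} + \frac{\Delta t}{R_2 C_1} \tag{3.31}$$
$$d_2 = 1 + \frac{\Delta t}{R_2 C_2} + \frac{\Delta t}{R_3 C_2} \tag{3.32}$$
$$D_1 = 1 - \frac{\Delta t^2}{R_2^2 C_1 C_2 d_1 d_2} \tag{3.33}$$
$$D_2 = 1 - \frac{\Delta t^2}{R_2^2 C_1 C_2 d_1 d_2} \tag{3.34}$$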
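Eqs. 3.27-3.34 translate directly into code. The sketch below builds the four matrices for the circuit of Figure 3.3b; the function name and the numerical parameter values are hypothetical. A useful consistency check is the steady state: with constant inputs and $I = 0$, the implicit scheme must reproduce the static heat flow through the series $R_1 + R_2 + R_3$ in parallel with $R_4$, i.e. $P = (T_i - T_e)\left(\frac{1}{R_1 + R_2 + R_3} + \frac{1}{R_4}\right)$.

```python
import numpy as np


def circuit_b_matrices(R1, R2, R3, R4, C1, C2, Aw, dt):
    """Discrete matrices of Eqs. 3.27-3.34 (implicit Euler, circuit of Figure 3.3b).

    State x_k = [T1, T2], input u_k = [Ti, Te, I], output y_k = P.
    """
    d1 = 1 + dt / (R1 * C1) + dt / (R2 * C1)                 # Eq. 3.31
    d2 = 1 + dt / (R2 * C2) + dt / (R3 * C2)                 # Eq. 3.32
    D1 = 1 - dt**2 / (R2**2 * C1 * C2 * d1 * d2)             # Eq. 3.33
    D2 = D1                                                  # Eq. 3.34 is identical
    A = np.array([
        [1 / (d1 * D1), dt / (R2 * C1 * d1 * d2 * D1)],
        [dt / (R2 * C2 * d1 * d2 * D2), 1 / (d2 * D2)],
    ])
    B = np.array([
        [dt / (R1 * C1 * d1 * D1), dt**2 / (R2 * R3 * C1 * C2 * d1 * d2 * D1), 0.0],
        [dt**2 / (R1 * R2 * C1 * C2 * d1 * d2 * D2), dt / (R3 * C2 * d2 * D2), 0.0],
    ])
    C = np.array([[-1 / R1, 0.0]])
    D = np.array([[1 / R1 + 1 / R4, -1 / R4, -Aw]])
    return A, B, C, D


# Hypothetical parameter values, for illustration only
R1, R2, R3, R4 = 2e-3, 5e-3, 3e-3, 1e-2     # [K/W]
C1, C2, Aw, dt = 5e6, 2e7, 3.0, 900.0       # [J/K], [J/K], [m^2], [s] (15-minute step)
A, B, C, D = circuit_b_matrices(R1, R2, R3, R4, C1, C2, Aw, dt)

# Steady state with constant inputs: x = A x + B u  =>  x = (I - A)^-1 B u
u = np.array([20.0, 0.0, 0.0])               # Ti = 20 degC, Te = 0 degC, I = 0
x_ss = np.linalg.solve(np.eye(2) - A, B @ u)
P_ss = (C @ x_ss + D @ u).item()             # expected 20 * (1/0.01 + 1/0.01) = 4000 W
```

Because the backward Euler scheme is unconditionally stable, the eigenvalues of $A_k$ stay inside the unit circle for any positive time step.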

Chapter 4

SEAS model

The SEAS model is a simplified model of the dynamics of buildings developed by Testi et al. [89, 91]. The model has lumped parameters and can be used to estimate the power requirement both in winter and in summer. It has a single node, i.e. a single value of the internal temperature is defined. Balance equations are written for the internal node, and the power requirement is computed. These balance equations take into account ventilation, internal heat gains, solar radiation, transmission through opaque walls and transmission through glazed areas.

Let $t$ be the time variable and let $\Delta t$ be the time step. The purpose of the balance equation is to calculate the energy requirement for the time step that starts at $t$ and ends at $t + \Delta t$. To calculate the energy requirement, two different contributions are needed:

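• the energy needed to bring the temperature back to the set point, $Q_{C,A}$;

• the energy needed to overcome the thermal dispersion through the building envelope, $Q_{C,B}$.

$$Q_{C,A} = \beta_C K_z \left(T_s - T_z(t-1)\right) \tag{4.1}$$
$$K_z \frac{\Delta T_z}{\Delta t} = \sum_{\mathrm{disp}} \dot{Q} = \left[\dot{Q}_{\mathrm{trans}} + \dot{Q}_{\mathrm{sky,op}} - \dot{Q}_{\mathrm{rad,op}}\right] \tag{4.2}$$
$$\qquad + \dot{Q}_{\mathrm{trans,win}} - \alpha_{\mathrm{sun}}\,\dot{Q}_{\mathrm{rad,win}} + \dot{Q}_{\mathrm{vent}} - \dot{Q}_{\mathrm{gain}} \tag{4.3}$$
$$Q_{C,B}(t) = \beta_C \sum_{\mathrm{disp}} \dot{Q}\,\Delta t = \beta_C K_z \Delta T_z \tag{4.4}$$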

where:

• $\dot{Q}_{\mathrm{trans}}$ is the power dispersion through opaque walls;

• $\dot{Q}_{\mathrm{trans,win}}$ is the power dispersion through windows (glazed areas);

• $\dot{Q}_{\mathrm{rad,op}}$ is the power gain due to the solar radiation impinging on opaque walls;

• $\dot{Q}_{\mathrm{rad,win}}$ is the power gain due to the solar radiation passing through windows (glazed areas);

• $\dot{Q}_{\mathrm{vent}}$ is the power dispersion due to ventilation;

• $\dot{Q}_{\mathrm{gain}}$ is the internal power gain;

• $K_z$ is the effective capacity as defined in [42];

• $\beta_C$ is a tuning parameter for the capacity;

• $\alpha_{\mathrm{sun}}$ is a tuning parameter for the solar gains.

For the purpose of this work, $\dot{Q}_{\mathrm{trans}}$ approximates the dispersion through the whole envelope, including $\dot{Q}_{\mathrm{trans,win}}$, $\dot{Q}_{\mathrm{sky,op}}$ and $\dot{Q}_{\mathrm{vent}}$. Likewise, the power gain due to solar radiation $\dot{Q}_{\mathrm{rad}}$ includes both the radiation on opaque walls and the radiation through glazed areas. As in the SEAS model, this work uses the model of heat transfer through opaque walls described in the following section.

4.1 Modelling of heat transfer through opaque walls

Opaque walls are massive elements, and for this reason the dynamic effects due to the fluctuations of the outdoor temperature must be considered. The model aims to condense these effects into a single equation.

Two contributions to the thermal dispersion are taken into account:

• static: $\dot{Q}_s$;

• dynamic: $\dot{Q}_d$.

Both are expressed as the product of a thermal transmittance and a temperature difference. Let $A_p$ be the surface of the opaque wall:
$$\dot{Q}_s(t) = U_s A_p \Delta T_s(t) \tag{4.5}$$
$$\dot{Q}_d(t) = U_d A_p \Delta T_d(t) \tag{4.6}$$
It remains to determine $U_s$, $\Delta T_s(t)$, $U_d$ and $\Delta T_d(t)$.

• For the static term, the static transmittance of the wall is considered, and the temperature difference is calculated between the indoor temperature $T_i$ and the outdoor temperature $T_o$, both averaged over the last 24 hours:
$$\Delta T_s(t) = \bar{T}_i(t) - \bar{T}_o(t) \tag{4.7}$$
where $\bar{T}$ denotes the mean over the last 24 hours.

• For the dynamic term, the dynamic transmittance introduced in standard UNI EN ISO 13786 (see Reference [42]) is considered. Reference is made to the exact solution of a simplified case: a multi-layer wall subject to a sinusoidal outdoor boundary condition and a constant (zero) indoor condition. The dynamic transmittance is a complex number identified by its modulus $U_d$ and its phase $\varphi_d$. $U_d$ is the ratio between the heat flux on the internal surface and the amplitude of the outdoor temperature variation, whereas:


4.2 Grey box

4.2.1 Model

The model has five parameters to be optimized. In particular, three parameters are related to the transmittance through the envelope:

• the static transmittance $H_s = U_s A_p$;

• the dynamic transmittance $H_d = U_d A_p$;

• the phase $\varphi$.

In addition, two parameters are related to the solar radiation. Let $I_e(t)$ and $I_w(t)$ be the solar radiation at time step $t$ per unit area on an east-facing vertical wall and on a west-facing vertical wall, respectively [W/m²]. Then $\alpha_e$ and $\alpha_w$ are the tuning parameters for the radiation. Since they multiply an irradiance to give a power, $\alpha_e$ and $\alpha_w$ are measures of area.

Finally, at time step $t$ the power requirement is calculated as:
$$\dot{Q} = H_s \Delta T_s(t) + H_d \Delta T_d(t) + \alpha_e I_e + \alpha_w I_w \tag{4.9}$$
where the temperature differences are defined in Equations 4.7 and 4.8. The internal temperature is considered to be thermostatically controlled (set-point).
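Eq. 4.9 can be evaluated with a trailing 24-hour mean for the static difference of Eq. 4.7. The minimal sketch below assumes a 15-minute time step (96 samples per day, as in Chapter 5) and takes the dynamic difference $\Delta T_d$ of Eq. 4.8 as a precomputed input; the function names `trailing_mean` and `seas_power` are illustrative, not part of the SEAS implementation.

```python
import numpy as np


def trailing_mean(x, window):
    """Mean of each sample and the previous window-1 samples (shorter window at the start)."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))
    end = np.arange(1, len(x) + 1)
    start = np.maximum(end - window, 0)
    return (c[end] - c[start]) / (end - start)


def seas_power(Ti, To, dTd, Ie, Iw, Hs, Hd, alpha_e, alpha_w, steps_per_day=96):
    """Power requirement of Eq. 4.9, with the static difference of Eq. 4.7.

    dTd (the dynamic temperature difference of Eq. 4.8) is passed in precomputed.
    steps_per_day = 96 assumes a 15-minute time step.
    """
    dTs = trailing_mean(Ti, steps_per_day) - trailing_mean(To, steps_per_day)   # Eq. 4.7
    return (Hs * dTs + Hd * np.asarray(dTd, dtype=float)
            + alpha_e * np.asarray(Ie, dtype=float) + alpha_w * np.asarray(Iw, dtype=float))
```

The cumulative-sum formulation keeps the 24-hour average cheap even for long records, since each mean costs one subtraction instead of a sum over 96 samples.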

4.2.2 Training

There are five parameters to be tuned, and for this reason a training period is required. During this training period, both input and output data are collected in order to optimize the model. In particular, the input data are the external temperature, $I_e$ and $I_w$.

In some cases, $I_e$ and $I_w$ cannot be measured easily, since at least two sensors (solarimeters) are necessary. In order to simplify the hardware, an algorithm is used to compute $I_e$ and $I_w$ from the measured solar radiation on a horizontal surface, $I_h$. In this way, only one solarimeter is required.

Once input and output (target) data are available for a sufficient amount of time, an optimization algorithm is required. In this case, the real (measured) output is compared with the model output, and a least-squares problem is solved to optimize the parameters with the trust-region algorithm. See Chapter 8 for more information.
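If the phase $\varphi$ is held fixed (so that $\Delta T_d$ is a known series, assuming, as the parameter list suggests, that $\varphi$ enters only through $\Delta T_d$), Eq. 4.9 is linear in $H_s$, $H_d$, $\alpha_e$ and $\alpha_w$, and the least-squares problem has a closed-form solution. The sketch below illustrates only that reduced linear problem on synthetic data; it is not the trust-region procedure used in this work, which also handles the nonlinear dependence on $\varphi$. All data and "true" values are made up.

```python
import numpy as np


def fit_seas_linear(dTs, dTd, Ie, Iw, Q_measured):
    """Least-squares estimate of [Hs, Hd, alpha_e, alpha_w] in Eq. 4.9 for a fixed phase."""
    X = np.column_stack([dTs, dTd, Ie, Iw])            # regressor matrix
    theta, *_ = np.linalg.lstsq(X, Q_measured, rcond=None)
    return theta


# Synthetic training period (hypothetical data, 15-minute steps over 10 days)
rng = np.random.default_rng(1)
n = 96 * 10
t = np.arange(n) / 96.0                                # time in days
dTs = 15 + 2 * np.sin(2 * np.pi * t / 7)
dTd = 5 * np.sin(2 * np.pi * t)
Ie = np.clip(400 * np.sin(2 * np.pi * (t - 0.25)), 0, None)   # morning sun, east wall
Iw = np.clip(400 * np.sin(2 * np.pi * (t - 0.45)), 0, None)   # afternoon sun, west wall
true = np.array([120.0, 40.0, -3.0, -2.5])
Q = np.column_stack([dTs, dTd, Ie, Iw]) @ true + rng.normal(0, 20, n)
theta_hat = fit_seas_linear(dTs, dTd, Ie, Iw, Q)
```

In a full identification, this linear solve could be nested inside a one-dimensional search over $\varphi$, or all five parameters could be passed to a nonlinear least-squares solver as done in this work.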

Chapter 5

Correlation analysis

5.1 Single input

Discrete processes can be modelled through weighting functions. This model is described by a curve, function or table that carries some information about the characteristic properties of the system. For this reason the model is called nonparametric:

$$y(t) = \sum_{k=0}^{\infty} w(k)\,u(t-k) \tag{5.1}$$

where $y$ is the output, $u$ the input and $w$ the weighting function sequence. In the case of a building, $y(t)$ is the energy consumption and $u(t)$ the difference between indoor and outdoor temperature, at time $t$.

Multiplying Eq. 5.1 by $u(t-\tau)$, with $\tau \ge 0$:
$$y(t)\,u(t-\tau) = \sum_{k=0}^{\infty} w(k)\,u(t-k)\,u(t-\tau) \tag{5.2}$$

and taking expectations gives:
$$\mathrm{E}\left[y(t)\,u(t-\tau)\right] = \sum_{k=0}^{\infty} w(k)\,\mathrm{E}\left[u(t-k)\,u(t-\tau)\right] \tag{5.3}$$

Using the covariance functions gives the Wiener-Hopf equation:
$$r_{yu}(\tau) = \sum_{k=0}^{\infty} w(k)\,r_u(\tau-k) \tag{5.4}$$
where $r_{yu}(\tau) = \mathrm{E}\left[y(t)\,u(t-\tau)\right]$ and $r_u(\tau) = \mathrm{E}\left[u(t+\tau)\,u(t)\right]$.

One approach to simplifying Eq. 5.4 is to consider a truncated weighting function. Assume a natural number $M$ such that $w(k) = 0$ for each $k \ge M$. Eq. 5.1 turns into:
$$y(t) = \sum_{k=0}^{M-1} w(k)\,u(t-k) \tag{5.5}$$
and Eq. 5.4 turns into:
$$r_{yu}(\tau) = \sum_{k=0}^{M-1} w(k)\,r_u(\tau-k) \tag{5.6}$$

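If data are available, it is possible to estimate the covariance functions. Assume that $\hat{r}_{yu}(\tau)$ and $\hat{r}_u(\tau)$ are estimates of $r_{yu}(\tau)$ and $r_u(\tau)$. The purpose is to find $\hat{w}(k)$, the estimate of $w(k)$. Using the estimates:
$$\hat{r}_{yu}(\tau) = \sum_{k=0}^{M-1} \hat{w}(k)\,\hat{r}_u(\tau-k) \tag{5.7}$$
Writing Eq. 5.7 for $\tau = 0, \ldots, M-1$ and using $\hat{r}_u(-\tau) = \hat{r}_u(\tau)$, the following linear system of equations is obtained:
$$\begin{bmatrix} \hat{r}_u(0) & \hat{r}_u(1) & \cdots & \hat{r}_u(M-1) \\ \hat{r}_u(1) & \hat{r}_u(0) & \cdots & \hat{r}_u(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}_u(M-1) & \hat{r}_u(M-2) & \cdots & \hat{r}_u(0) \end{bmatrix} \begin{bmatrix} \hat{w}(0) \\ \hat{w}(1) \\ \vdots \\ \hat{w}(M-1) \end{bmatrix} = \begin{bmatrix} \hat{r}_{yu}(0) \\ \hat{r}_{yu}(1) \\ \vdots \\ \hat{r}_{yu}(M-1) \end{bmatrix} \tag{5.8}$$
This linear system has a unique solution if and only if the matrix is nonsingular.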
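The single-input procedure can be sketched in a few lines: estimate $\hat{r}_u$ and $\hat{r}_{yu}$ from the records, fill the symmetric Toeplitz matrix of Eq. 5.8 and solve. The sketch assumes zero-mean signals (in practice the means would be subtracted first) and uses the biased $1/N$ covariance estimator; the data are synthetic, generated from hypothetical weights with a white-noise input, and the function names are illustrative.

```python
import numpy as np


def sample_cov(a, b, tau):
    """Biased estimate of E[a(t + tau) b(t)] for tau >= 0 (zero-mean signals assumed)."""
    N = len(a)
    return float(np.dot(a[tau:], b[:N - tau]) / N)


def correlation_analysis(u, y, M):
    """Estimate w(0), ..., w(M-1) by solving the Toeplitz system of Eq. 5.8."""
    r_u = np.array([sample_cov(u, u, k) for k in range(M)])          # r_u(k)
    r_yu = np.array([sample_cov(y, u, tau) for tau in range(M)])     # r_yu(tau) = E[y(t) u(t - tau)]
    lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))    # entry (tau, k) -> |tau - k|
    return np.linalg.solve(r_u[lags], r_yu)


# Synthetic check with hypothetical weights and white-noise input
rng = np.random.default_rng(0)
N, M = 20_000, 5
w_true = np.array([0.5, 0.3, 0.15, 0.05, 0.0])
u = rng.normal(size=N)
y = np.convolve(u, w_true)[:N] + 0.05 * rng.normal(size=N)   # Eq. 5.5 plus noise
w_hat = correlation_analysis(u, y, M)
```

A white-noise input makes the Toeplitz matrix close to the identity, which is the best-conditioned case; strongly autocorrelated inputs (such as a slowly varying temperature difference) make the system harder to solve accurately.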

5.2 Multiple input

If the output depends on more than one input, Eq. 5.5 becomes:
$$y(t) = \sum_{i=1}^{n}\sum_{k=0}^{M-1} w_i(k)\,u_i(t-k) \tag{5.9}$$
where the subscript $i$ indicates the $i$-th input $u_i$. Multiplying this equation by $u_j(t-\tau)$, where $j = 1, \ldots, n$, and taking expectations gives:
$$\mathrm{E}\left[y(t)\,u_j(t-\tau)\right] = \sum_{i=1}^{n}\sum_{k=0}^{M-1} w_i(k)\,\mathrm{E}\left[u_i(t-k)\,u_j(t-\tau)\right] \tag{5.10}$$
Let the covariances be:²
$$r_{yi}(\tau) = \mathrm{E}\left[y(t+\tau)\,u_i(t)\right] \qquad \tau \in \mathbb{Z},\ i = 1, \ldots, n \tag{5.11}$$
$$r_{ij}(\tau) = \mathrm{E}\left[u_i(t+\tau)\,u_j(t)\right] \qquad \tau \in \mathbb{Z},\ i, j = 1, \ldots, n \tag{5.12}$$

² Note that $r_{ij}(\tau) \neq r_{ji}(\tau)$ in general; instead, $r_{ij}(-\tau) = r_{ji}(\tau)$.

For each $\tau$ the following equation can be written:
$$r_{yj}(\tau) = \sum_{i=1}^{n}\sum_{k=0}^{M-1} w_i(k)\,r_{ij}(\tau-k) \qquad j = 1, \ldots, n \tag{5.13}$$
Using the estimates based on data and the truncated weighting functions:
$$\hat{r}_{yj}(\tau) = \sum_{i=1}^{n}\sum_{k=0}^{M-1} \hat{w}_i(k)\,\hat{r}_{ij}(\tau-k) \qquad j = 1, \ldots, n \tag{5.14}$$
where $\hat{w}_i(k)$ are the unknowns.

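One way to solve this linear system is to write it out, for $\tau = 0, \ldots, M-1$, in block form:
$$\begin{bmatrix} \hat{R}_u(0) & \hat{R}_u(1) & \cdots & \hat{R}_u(M-1) \\ \hat{R}_u(1)^\top & \hat{R}_u(0) & \cdots & \hat{R}_u(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{R}_u(M-1)^\top & \hat{R}_u(M-2)^\top & \cdots & \hat{R}_u(0) \end{bmatrix} \begin{bmatrix} \hat{w}(0) \\ \hat{w}(1) \\ \vdots \\ \hat{w}(M-1) \end{bmatrix} = \begin{bmatrix} \hat{r}_{yu}(0) \\ \hat{r}_{yu}(1) \\ \vdots \\ \hat{r}_{yu}(M-1) \end{bmatrix} \tag{5.15}$$
where the block in row $\tau$ and column $k$ is $\hat{R}_u(k-\tau)$, with $\hat{R}_u(-\tau) = \hat{R}_u(\tau)^\top$, and:
$$\hat{R}_u(\tau) = \begin{bmatrix} \hat{r}_{11}(\tau) & \hat{r}_{12}(\tau) & \cdots & \hat{r}_{1n}(\tau) \\ \hat{r}_{21}(\tau) & \hat{r}_{22}(\tau) & \cdots & \hat{r}_{2n}(\tau) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}_{n1}(\tau) & \hat{r}_{n2}(\tau) & \cdots & \hat{r}_{nn}(\tau) \end{bmatrix} \tag{5.16}$$
$$\hat{r}_{yu}(\tau) = \begin{bmatrix} \hat{r}_{y1}(\tau) \\ \hat{r}_{y2}(\tau) \\ \vdots \\ \hat{r}_{yn}(\tau) \end{bmatrix} \tag{5.17}$$
$$\hat{w}(\tau) = \begin{bmatrix} \hat{w}_1(\tau) \\ \hat{w}_2(\tau) \\ \vdots \\ \hat{w}_n(\tau) \end{bmatrix} \tag{5.18}$$
for $\tau = 0, \ldots, M-1$.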
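The multiple-input system of Eqs. 5.14-5.15 can be assembled entry by entry: the row for output lag $\tau$ and input $j$ has coefficient $\hat{r}_{ij}(\tau - k)$ on the unknown $\hat{w}_i(k)$. The sketch below assumes zero-mean signals and the biased $1/N$ estimator; the synthetic data use two deliberately correlated inputs and hypothetical weights, and the function names are illustrative.

```python
import numpy as np


def sample_cov(a, b, s):
    """Estimate E[a(t + s) b(t)] (zero-mean signals assumed); s may be negative."""
    N = len(a)
    if s >= 0:
        return float(np.dot(a[s:], b[:N - s]) / N)
    return float(np.dot(a[:N + s], b[-s:]) / N)


def correlation_analysis_mimo(U, y, M):
    """Solve Eq. 5.15 for w_i(k), i = 1..n, k = 0..M-1.

    U has shape (N, n), one column per input. Returns W with W[k, i] = w_i(k).
    Block (tau, k) of the matrix equals R_u(k - tau), i.e. entry (j, i) is r_ij(tau - k).
    """
    N, n = U.shape
    G = np.empty((M * n, M * n))
    rhs = np.empty(M * n)
    for tau in range(M):
        for j in range(n):
            rhs[tau * n + j] = sample_cov(y, U[:, j], tau)                       # r_yj(tau)
            for k in range(M):
                for i in range(n):
                    G[tau * n + j, k * n + i] = sample_cov(U[:, i], U[:, j], tau - k)   # r_ij(tau - k)
    return np.linalg.solve(G, rhs).reshape(M, n)


# Synthetic check: two correlated inputs, hypothetical weights W_true[k, i]
rng = np.random.default_rng(3)
N, M = 20_000, 4
U = rng.normal(size=(N, 2))
U[:, 1] += 0.5 * U[:, 0]
W_true = np.array([[0.6, -0.2], [0.3, 0.1], [0.1, 0.05], [0.0, 0.0]])
y = sum(np.convolve(U[:, i], W_true[:, i])[:N] for i in range(2)) + 0.05 * rng.normal(size=N)
W_hat = correlation_analysis_mimo(U, y, M)
```

With the study-case settings ($n = 2$, $M = 105$) the matrix is $210 \times 210$, still small enough for a direct dense solve.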

The correlation analysis method can also be called the RX method, and belongs to a larger family of algorithms, called Statistical methods in Figure 2.1. RX stands for Regressive model with eXogenous inputs:

• exogenous means that the output depends on external inputs, which are determined outside the system and are not influenced by the output;

• regressive means that, for the calculation of the output at a certain time step, the inputs are considered over a specific number of previous time steps.

5.3 Study case

For the purpose of this work, two different inputs are used, so $n = 2$:

1. the difference between indoor and outdoor temperature [K];

2. the solar radiation [W/m²];

and the output is the power consumption. Both inputs and outputs are considered to be measurable by sensors. In addition, perfect forecasts of the inputs are assumed.

The time step is 15 minutes, and the sum in Eq. 5.10 is truncated at $M = 105$. This choice ensures that a complete day (24 hours, i.e. 96 time steps) is included, since 105 time steps cover 26.25 hours. In this way the model is able to capture all the correlations, even those related to what happened 24 hours before.

If needed, this model allows more inputs to be used, for instance, external humidity, time of the day, day of the week.

Chapter 6

Nonlinear autoregressive exogenous model

Deep learning is an aspect of artificial intelligence concerned with emulating the learning approach that human beings use to gain certain types of knowledge. Most modern deep learning models are based on artificial neural networks, although they can also include propositional formulas, deep belief networks or deep Boltzmann machines. For the purpose of this work, artificial neural networks are considered.

This Chapter will briefly explain what a neural network is, and how a NARX network can be used for the purpose of forecasting building heating demand.

6.1 Neural networks

Artificial Neural Networks (ANNs) are computing systems vaguely inspired by biological neural networks that constitute animal brains. An ANN is not an algorithm, but rather a framework for many different machine learning algorithms to work together. An ANN is a collection of connected units (or nodes) called neurons, which model the neurons in a biological brain. Each connection between two neurons can transmit a signal from one artificial neuron to another, like the synapses in a biological brain. Each neuron receives a signal, processes it (through an appropriate nonlinear function) and delivers it to another neuron. Signals at connections between artificial neurons are real numbers.

Typically, neurons are aggregated into layers. Information flows from the first layer (input layer) to the last layer (output layer), possibly traversing the middle layers (hidden layers) multiple times. Figure 6.1 shows an example of a simple neural network structure.

Unless indicated otherwise, most neurons can be described as accepting a vector of inputs $\upsilon$, computing an affine transformation $W^\top \upsilon + b$ and then applying an element-wise nonlinear function $g(W^\top \upsilon + b)$. Several choices of this nonlinear function are used in the literature. The most common ones are:

• rectifier: $g(x) = \max(0, x)$;

• logistic sigmoid: $g(x) = \sigma(x) = \dfrac{e^x}{e^x + 1}$;

• hyperbolic tangent: $g(x) = \tanh(x) = 2\sigma(2x) - 1$.

Figure 6.1. An artificial neural network. Each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another. This ANN has only one hidden layer. In particular, this figure represents a feedforward neural network.
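The three activation functions and the affine-plus-nonlinearity structure of a neuron can be written in a few lines of NumPy. This is an illustrative sketch (the names `relu`, `sigmoid` and `layer` are not from any specific library); it also checks numerically the identity $\tanh(x) = 2\sigma(2x) - 1$ stated above.

```python
import numpy as np


def relu(x):
    """Rectifier g(x) = max(0, x)."""
    return np.maximum(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, written as 1 / (1 + e^-x), which equals e^x / (e^x + 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def layer(v, W, b, g=sigmoid):
    """Neurons of one layer: element-wise g applied to the affine map W^T v + b."""
    return g(W.T @ v + b)


x = np.linspace(-5.0, 5.0, 11)
# The maximum deviation from the identity tanh(x) = 2*sigma(2x) - 1 is at rounding level.
print(np.max(np.abs(np.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0))))
```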

The two main types of ANNs are:

• feedforward neural networks;

• recurrent neural networks (RNNs).

Feedforward neural networks are ANNs wherein connections between neurons do not form a cycle (see for example Figure 6.1).

6.2 Recurrent Neural Networks

Conversely, in RNNs the connections between nodes form a directed graph with cycles: RNNs perform the same task for every element of a sequence, with the output depending on the previous computations. For example, in order to predict the next word in a sentence, it is useful to know which words came before.

RNNs are networks with loops in them, allowing information to persist. Figure 6.2 shows a chunk $A$ of an RNN that takes an input $x_t$ and returns an output $h_t$. A loop allows information to be passed from one step of the network to the next. Unrolling the loop reveals that RNNs are closely related to sequences and lists; for this reason, they are well suited to forecasting signals. For example, RNNs are widely used for natural language processing, i.e. the field of computer science concerned with the understanding of human language (both written and spoken).

Figure 6.2. Recurrent Neural Network (folded form, and the same network unrolled over the time steps $0, 1, \ldots, t$).

6.3 NARX

The idea of using NARX for this work comes from the results of the correlation analysis algorithm (explained in Chapter 5), also called RX algorithm.

ARX stands for AutoRegressive model with eXogenous inputs, whereas NARX stands for Nonlinear AutoRegressive model with eXogenous inputs. This is a powerful class of models, which has been shown to be well suited to modelling nonlinear systems and especially time series [22]. One proven advantage is that these networks converge much faster and generalize better than other networks, and are often much better at discovering long time-dependencies than conventional recurrent neural networks. NARX networks are sometimes not classified as recurrent in the literature, because the network can be trained with no recurrence at all, as will be explained.

A NARX model can be stated algebraically as:
$$y_t = F(y_{t-1}, y_{t-2}, \ldots, y_{t-s}, x_{t-1}, x_{t-2}, \ldots, x_{t-z}) + \varepsilon_t \tag{6.1}$$
where:

• $y_t \in \mathbb{R}^n$ is the discretized time series to predict at time $t$ (output, variable of interest);

• $x_t \in \mathbb{R}^m$ is the discretized time series of inputs at time $t$ (exogenous series);

• $s$ and $z$ are the numbers of previous values of output and input on which the new output depends;

• $\varepsilon_t$ is an error term, which accounts for the fact that knowledge of the other terms does not enable the current value of the time series to be predicted exactly.
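In open-loop training, each target $y_t$ is paired with a regressor vector $[y_{t-1}, \ldots, y_{t-s}, x_{t-1}, \ldots, x_{t-z}]$ taken from measured data. The sketch below builds that design matrix; the function name is illustrative. To check the layout it fits a *linear* $F$ by least squares (which turns Eq. 6.1 into an ARX model) on noiseless synthetic data; in the NARX network, $F$ is instead the nonlinear map realized by the hidden layers.

```python
import numpy as np


def narx_regressors(y, x, s, z):
    """Open-loop design matrix for Eq. 6.1.

    y: (N,) output series; x: (N, m) exogenous series.
    Row for time t: [y_{t-1}, ..., y_{t-s}, x_{t-1}, ..., x_{t-z}] (m columns per x_{t-j});
    the corresponding target is y_t.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    p = max(s, z)
    rows = []
    for t in range(p, len(y)):
        past_y = y[t - s:t][::-1]                 # y_{t-1}, ..., y_{t-s}
        past_x = x[t - z:t][::-1].ravel()         # x_{t-1}, ..., x_{t-z}
        rows.append(np.concatenate([past_y, past_x]))
    return np.array(rows), y[p:]


# Hypothetical linear system with two exogenous inputs (s = z = 2)
rng = np.random.default_rng(0)
N = 500
x = rng.normal(size=(N, 2))
y = np.zeros(N)
for t in range(2, N):
    y[t] = 0.6 * y[t - 1] - 0.1 * y[t - 2] + 0.8 * x[t - 1, 0] + 0.3 * x[t - 1, 1] - 0.2 * x[t - 2, 0]

Phi, target = narx_regressors(y, x, s=2, z=2)
coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)   # linear F, i.e. an ARX model
```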

The purpose of a NARX neural network is to find the function $F$. In the case of thermal models of buildings, for example, $y_t \in \mathbb{R}$ could be the power needed for air conditioning, and $x_t$ could collect the exogenous inputs, such as the outdoor temperature and the solar radiation.

Figure 6.3. Simple NARX neural network in closed-loop.

Figure 6.3 shows a NARX neural network working in closed loop. $x(t) = x_t$ is the exogenous input. The number 2 below the input box specifies the number of exogenous inputs (for example, outdoor temperature and solar radiation). One more "input" is the output itself (closed loop). The labels 1:3 and 1:5 mean that $z = 3$ and $s = 5$. The number 15 indicates the number of neurons in the hidden layer. Each neuron is calculated by the following operations:

• first, each input of the hidden layer is multiplied by a weight; the $W$ blocks in the hidden layer of Figure 6.3 indicate this operation, where $W$ is an appropriate matrix;

• all these weighted inputs are summed up, adding a bias parameter ($b$ in the Figure);

• a logistic sigmoid is applied to the scalar of each neuron, as shown in the last block of the hidden layer of Figure 6.3.

The network can be formed by several hidden layers, followed by an output layer. The output layer is linear: the scalar of each neuron of the last hidden layer is weighted and summed up with a bias, and the line symbol in the figure means that no nonlinear function is applied.

Such a neural network seems simple. Nevertheless, the number of parameters to optimize is considerable:

• the hidden layer contains two weight blocks ($W$): the first one includes $15 \cdot 2 \cdot 3 = 90$ parameters and the second one $15 \cdot 1 \cdot 5 = 75$;

• the bias block in the hidden layer includes 15 parameters;

• the weight block in the output layer includes 15 parameters;

• the bias block in the output layer includes 1 parameter.

In total, there are 196 parameters to be optimized. It should be noted that Figure 6.3 shows just a simple NARX network. Other networks can include more hidden layers, with hundreds of neurons, and much higher values of $s$ and $z$. The number of parameters can easily increase up to hundreds of thousands. How can an algorithm optimize such a huge number of parameters? Two different ideas come to the aid: open loop and backpropagation.

Figure 6.4. Simple NARX neural network in open-loop. If the true output is available, this scheme can be used for training.

The first idea is the following: even though the NARX network is a closed-loop network, if the true output is available during training, an architecture that does not feed the output back to the input is well suited for training. Thus, the closed-loop form of the network used for predictions is replaced by an open-loop architecture (Figure 6.4) [65]. This has two advantages. The first is that the input to the feedforward network is more accurate, since it contains the true past outputs instead of estimated ones. The second is that the resulting network has a purely feedforward architecture, so static backpropagation can be used for training.
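The difference between the two configurations can be made concrete with a generic one-step map $F$. In the sketch below (function names and the toy linear system are hypothetical), the same imperfect model is run in open loop, fed with the measured past outputs as during training, and in closed loop, fed with its own past predictions as during forecasting. In closed loop the one-step errors are fed back and accumulate, which is why the open-loop error is smaller.

```python
import numpy as np


def predict(F, y_start, x, s, z, closed_loop=True, y_true=None):
    """Run a NARX one-step map F(past_y, past_x) along the exogenous series x.

    Open loop (training): past outputs are the measured values y_true.
    Closed loop (forecasting): past outputs are the model's own predictions.
    The first max(s, z) outputs are copied from y_start to start the recursion.
    """
    p = max(s, z)
    y_hat = np.zeros(len(x))
    y_hat[:p] = y_start[:p]
    for t in range(p, len(x)):
        past = y_hat if closed_loop else y_true
        past_y = past[t - s:t][::-1]              # y_{t-1}, ..., y_{t-s}
        past_x = x[t - z:t][::-1].ravel()         # x_{t-1}, ..., x_{t-z}
        y_hat[t] = F(past_y, past_x)
    return y_hat


def F_exact(past_y, past_x):                      # hypothetical true system (s = z = 1)
    return 0.9 * past_y[0] + 0.5 * past_x[0]


def F_approx(past_y, past_x):                     # hypothetical imperfect model of it
    return 0.85 * past_y[0] + 0.5 * past_x[0]


rng = np.random.default_rng(0)
N = 300
x = rng.normal(size=(N, 1))
y = np.zeros(N)
for t in range(1, N):
    y[t] = F_exact(y[t - 1:t], x[t - 1])

open_loop = predict(F_approx, y, x, s=1, z=1, closed_loop=False, y_true=y)
closed_loop = predict(F_approx, y, x, s=1, z=1, closed_loop=True)


def rms(e):
    return float(np.sqrt(np.mean(e ** 2)))


print(rms(open_loop - y), rms(closed_loop - y))
```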

Backpropagation is a method used in almost every artificial neural network to calculate the gradient needed to optimize the weights of the network. See Chapter 11 for more information.

6.4 Study case

Figure 6.5 shows the actual NARX network used for modelling the building described in Section 13.1. As can be seen, $s = z = 50$. Moreover, there are two nonlinear hidden layers, with 50 and 20 neurons respectively. As for Figure 6.3, Figure 6.5 shows only the closed-loop form, whereas the open-loop form is used during training.
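The parameter count given above for Figure 6.3 generalizes to any number of delays and hidden layers. The small helper below (an illustrative name, not a library function) reproduces the 196 parameters of Figure 6.3 and, assuming the delay ranges shown in Figure 6.5 (0:50 for the two exogenous inputs, i.e. 51 taps, and 1:50 for the feedback, i.e. 50 taps), gives the size of the network used in this work.

```python
def narx_param_count(n_inputs, n_input_taps, n_feedback_taps, hidden, n_outputs=1):
    """Count the weights and biases of a NARX network like those of Figures 6.3-6.5.

    The first hidden layer receives n_inputs * n_input_taps delayed exogenous values
    plus n_outputs * n_feedback_taps delayed outputs; every neuron has one bias;
    the output layer is linear.
    """
    n_in = n_inputs * n_input_taps + n_outputs * n_feedback_taps
    total = 0
    for n_neurons in hidden:
        total += n_in * n_neurons + n_neurons     # weights + biases of this layer
        n_in = n_neurons
    total += n_in * n_outputs + n_outputs         # linear output layer
    return total


print(narx_param_count(2, 3, 5, [15]))            # Figure 6.3 (delays 1:3 and 1:5) -> 196
print(narx_param_count(2, 51, 50, [50, 20]))      # Figure 6.5 (delays 0:50 and 1:50) -> 8691
```

Almost all of these parameters sit in the first hidden layer, because its input grows linearly with the number of delays $s$ and $z$.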

The inputs are:

1. the delta temperature;

2. solar radiation on horizontal plane.

All the results shown in Section 14.2 are the output of this network.

[Figure 6.5 diagram: 2 exogenous inputs with delays 0:50, output feedback with delays 1:50, hidden layers of 50 and 20 neurons, 1 linear output.]

Figure 6.5. NARX scheme used for this work, for modelling of the building described in Section 13.1.