
Ph.D. Thesis: Part I

Artificial Intelligence and Machine Learning Techniques

Author:

Pier Giuseppe Giribone, PhD

Supervisors:

Prof. Marco Guerrazzi

Prof. Ottavio Caligaris

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

in

Economics (XXXIII Cycle)

Department of Economics


“In regard to every form of human activity it is necessary that the question should be asked from time to time: What is its purpose and ideal? In what way does it contribute to the beauty of human existence?

As respects those pursuits which contribute only remotely, by providing the mechanism of life, it is well to be reminded that not the mere fact of living is to be desired, but the art of living in the contemplation of great things.

Still more in regard to those avocations which have no end outside themselves, which are to be justified, if at all, as actually adding to the sum of the world’s permanent possessions, it is necessary to keep alive a knowledge of their aims, a clear prefiguring vision of the temple in which creative imagination is to be embodied”

Bertrand Russell - The Study Of Mathematics (1907)


UNIVERSITY OF GENOA

Abstract

Department of Economics
Doctor of Philosophy

Artificial Intelligence and Machine Learning Techniques
by Pier Giuseppe Giribone, PhD

The first part of my PhD Thesis deals with different Machine Learning techniques mainly applied to solve financial engineering and risk management issues. After a short literature review, each chapter analyzes a particular topic linked to the implementation of these models, showing the most suitable methodologies for solving it efficiently. The following topics are therefore covered:

• Data Fitting and Regression

• Forecasting

• Classification

• Outlier Detection and Data Quality

• Pricing

Every chapter provides the theoretical explanation of the model, the description of its implementation in a numerical computing environment and the solution of real case studies. Among others, the main technologies discussed in this work are the following:

• Shallow Multi-Layer networks

• Feed-forward and static networks

• Radial Basis Functions (RBF) networks

• Recurrent and Dynamic Neural Networks

• Nonlinear Autoregressive (NAR) networks and Nonlinear Autoregressive networks with exogenous variables (NARX)

• Deep Neural networks

• Convolutional Networks (Conv Net)

• Fuzzy C-Means (FCM) clustering

• Self-Organizing Maps (SOM) and Kohonen networks

• Neural Networks with Circular Neurons

• Auto-Associative Neural Networks (AANN) and Auto-encoders for Nonlinear Principal Component Analysis (NLPCA)


Acknowledgements

I would like to express my heartfelt thanks to the University Professors of the PhD School in Economics and Quantitative Methods, in particular to my Supervisor, Prof. Marco Guerrazzi, and to the PhD Coordinator, Prof. Anna Bottasso, who have constantly guided and advised me.

I thank my second Supervisor, Prof. Ottavio Caligaris, for the careful reading of this work and for all our delightful mathematical conversations, which have been a constant reference and inspiration for me ever since I was a first-year student in Engineering.

Finally, I would like to thank my parents, Piero and Giuliana, because I could not aspire to have better guides on the complex real-life journey.

Italo Calvino, one of my favourite writers, wrote in his immortal work Invisible Cities, first published in 1972 by Einaudi:

“You take delight not in a city’s seven or seventy wonders, but in the answer it gives to a question of yours.”

Calvino, Invisible Cities, Introduction to the third Part

From this point of view, a new Research project can, for me, be compared to a passionate exploration of a Calvinian invisible city.

Approaching a new study, you can never know a priori how many and which discoveries you will encounter, but there is always a reason that has led us to ask questions and to look for an answer.

Just as Invisible Cities recounts a series of travel reports that the explorer Marco Polo makes to the emperor of the Tartars, Kublai Khan, I would like to bring you, through the first part of my PhD Thesis, some notes from my journey in the invisible city of Artificial Intelligence.

During the exploration, which began for me in 2015, I had the pleasure of meeting many brave traveling companions who helped me in this fascinating cognitive journey.

Therefore, I want to thank all the co-authors of my research works published during this time on the topic of Machine Learning: Arianna Agosto, Stefano Bonini, Alessia Cafferata, Giuliana Caivano, Ottavio Caligaris, Paola Cerchiello, Carlo Decherchi, Simone Fioribello, Simone Ligato, Marco Neffelli, Francesco Penone and Marina Resta.

I dedicate this work to my uncle Gian . . . I miss you so much


In memory of my uncle Gian Augusto Ghiglione

It is difficult to make a person who doesn’t know you understand your love for life.

Your inexhaustible energy gave shape to every idea, modeling it with your unmistakable personality.

Following the example of your father Piero, you learnt a dedication to work so strong that it could be considered unique and distinctive: now these traces are indelible imprints that continue to speak to all of us about you.

The passion you put into managing your activities can be fully understood only by those who have been close to you: finding an entrepreneur who thinks beyond the logic of mere profit is increasingly rare today.

Strength, determination, perseverance, ability, sensitivity . . . How difficult it is to behave ethically in this world dazzled by deceptive values and false gods!

Following the example of your mother Celeste, you have always fought firsthand to affirm your principles of morality and fairness: how good you were at carrying out your projects, based on solid pillars, against the current perverse system that rewards relativism, fluidity, opacity and sophisticated complexity.

The commitment you put into your work was not a goal in itself, but it was certainly one of the many means with which you concretely loved your family, protecting it and guaranteeing it a serene future.

Altruism and integrity were the guidelines of your journey on this earth: I will be proud to follow the virtuous footprints left by you along the direction you indicated.

I would like to conclude this dedication by recalling a personal episode that happened some time ago.

I was in Turin to attend an applied mathematics conference and the weather threatened frost, storm and fog, preventing me from safely returning home.

My uncle Gian Augusto, having learnt about my participation in this event, called me to inquire as to how I would return to Liguria.

Knowing that he was at work and not wanting to bother him, I reassured him, telling him that everything was all right and that I would take public transportation.

At the end of the conference, without any warning and without my having given any information about when it would end, I found my uncle waiting for me in front of the entrance with his car.

Opening the door, I thanked him, amazed at the pleasant surprise.

With his usual cheerful spontaneity, smiling at me, he told me in Piedmontese dialect:

"Chicco, I understood from the tone of your voice that you were worried, even if you didn’t tell me. Your equations might be useful, but remember that in life the things that truly matter are very different, like this gesture of affection".

Dear Uncle, thank you for all the love you gave.

In the picture, my uncle Gian Augusto and me at the age of one year.




Contents

Acknowledgements v

1 Introduction 1

1.1 Literature Review . . . 1

1.1.1 Credit Risk . . . . 1

1.1.2 Financial Risk . . . . 6

1.1.3 Operational Risk . . . . 8

1.2 Author’s research work . . . 10

2 Data Fitting and Regression 13
2.1 Theoretical Aspects . . . 13

2.1.1 Regression through RBF network . . . 16

2.2 Applications . . . 16

3 Forecast 27
3.1 Theoretical Aspects . . . 27

3.1.1 Static Neural Network . . . 27

3.1.2 Training the network in supervised learning . . . 30

3.1.3 Dynamic neural networks . . . 33

3.1.4 Neural networks used for time series prediction . . . 35

3.2 Forecasting Neural networks implementation and validation . . . 37

3.2.1 Zero-rates curve fitting via artificial neural networks . . . 38

3.2.2 Volatility surface estimation via artificial neural networks . . . 39

3.2.3 Non-linear autoregressive (NAR) network . . . 39

3.2.4 Non-linear autoregressive network with exogenous variable (NARX) . . . 41

3.3 Combining Dynamic NARX with traditional indicators . . . 42

3.3.1 Implementation of traditional technical indicators . . . 42

3.3.2 The trading system rules . . . 43

3.3.3 Trading system backtesting . . . 44

4 Classification 57
4.1 Theoretical aspects of supervised ANN for classification purposes . . . 58

4.1.1 Classification by means of ANN: traditional approach . . . 59

4.1.2 Classification by means of deep learning neural networks . . . 62

4.1.3 Classification by means of Convolutional Network . . . 64

4.1.4 The architecture of the battery . . . 65

4.2 The financial application . . . 66

5 Outlier Detection and Data Quality 71
5.1 Fuzzy C-means clustering . . . 71

5.1.1 The FCM technique . . . 72

5.1.2 Financial Application . . . 73


5.2 Self-Organizing Maps . . . 77

5.2.1 The training algorithm . . . 79

5.2.2 Implementation and Validation . . . 79

5.2.3 Application . . . 81

6 Pricing 85
6.1 Circular and Auto-Associative Neural Networks . . . 85

6.2 NLPCA via Autoassociative Neural Network . . . 92

6.3 Implementation and Validation . . . 95

6.4 Surface Reconstruction via Auto-Associative Neural Network . . . 97

6.5 Inflation swap pricing using feed-forward ANN with Circular Network . . . 98
6.5.1 Pricing of a ZCIIS . . . 99

6.5.2 Pricing of a YYIIS . . . 100

6.5.3 Traditional seasonality model for Inflation Swap . . . 102

6.5.4 Comparison between simulation methodologies: traditional versus Machine Learning approach . . . 103

6.5.5 Comparison of the market fair value of a YYIIS using the two different techniques . . . 105

7 Conclusions 109

Bibliography 117


List of Figures

2.1 Architecture of an MLP feed-forward neural network . . . 14
2.2 The signal transformation process of a perceptron in a MLP network . . . 14
2.3 Architecture of a Radial Basis Function network . . . 16
2.4 Interest rates term structure, Tenor 3 month - 30th December 2016 . . . 18
2.5 NS fitting model on Interest rates term structure, Tenor 3 month - 30th December 2016 . . . 22
2.6 SV fitting model on Interest rates term structure, Tenor 3 month - 30th December 2016 . . . 22
2.7 dRF fitting model on Interest rates term structure, Tenor 3 month - 30th December 2016 . . . 23
2.8 RBF network architecture used for fitting the Interest rates term structure, Tenor 3 month - 30th December 2016 . . . 23
2.9 From top to bottom and from left to right: interpolation of the EUR003M 2004 yield curve with the Nelson–Siegel (NS), Svensson (SV), de Rezende–Ferreira (dRF) models and with RBF . . . 24
2.10 From top to bottom and from left to right: interpolation of the USD003M 2004 yield curve with the Nelson–Siegel (NS), Svensson (SV), de Rezende–Ferreira (dRF) models and with RBF . . . 24
2.11 From top to bottom and from left to right: interpolation of the EUR003M 2016 yield curve with the Nelson–Siegel (NS), Svensson (SV), de Rezende–Ferreira (dRF) models and with RBF . . . 25
2.12 From top to bottom and from left to right: interpolation of the USD003M 2016 yield curve with the Nelson–Siegel (NS), Svensson (SV), de Rezende–Ferreira (dRF) models and with RBF . . . 25
3.1 1 input – 1 neuron – 1 layer artificial neural network . . . 28
3.2 An R input – 1 neuron – 1 layer artificial neural network . . . 28
3.3 An R input – 1 neuron – 1 layer artificial neural network in abbreviated notation . . . 29
3.4 Comparison between an R input – S neurons – 1 layer artificial neural network in extended and abbreviated notation . . . 29
3.5 The general case, an R input – S1 neuron – S2 layer network in abbreviated notation . . . 30
3.6 An example of a static feed-forward ANN architecture . . . 30
3.7 Neural network process (supervised learning) . . . 31
3.8 An example of feed-forward dynamic neural network with a single delay, having only one layer and one neuron . . . 33
3.9 An example of recurrent dynamic neural network with a single feedback, having only one layer and one neuron . . . 34

3.10 Structure of a two-layer NARX network, shown in abbreviated notation 35

3.11 Target feedback versus output feedback . . . 35


3.12 Discretization of a continuous function into a time series, which can be used to predict its future values . . . 36
3.13 One step ahead prediction with a NAR model and with a NARX model . . . 37
3.14 Two-step-ahead prediction with a NAR model, both direct and indirect . . . 37
3.15 The network errors after training . . . 45

3.16 The network goodness of fit . . . 45

3.17 Output-target comparison . . . 46

3.18 Network errors after training . . . 46

3.19 Network error surface . . . 47

3.20 Network goodness of fit . . . 47

3.21 The estimated volatility surface . . . 48

3.22 NAR network goodness of fit . . . 48

3.23 Errors committed by the network: difference between network outcome and actual target . . . 49
3.24 Target-output comparison for the NAR network: comparison between the network estimation of the time series and the actual time series . . . 49

3.25 Error autocorrelation for different lags . . . 50

3.26 Cross-correlation between error and input for different lags . . . 50

3.27 Network goodness of fit . . . 51

3.28 Network output – actual target difference . . . 51

3.29 Target-output comparison for the NARX network . . . 52

3.30 Error autocorrelation . . . 52

3.31 Error-input (price) cross-correlation . . . 53

3.32 The trading rule . . . 53

3.33 Backtesting the system applied on the Dow Jones Industrial Average Index (2016) . . . 54

3.34 Backtesting the system applied on the Nasdaq Composite Index (2016) . . . 55
4.1 Layers of a neural network . . . 58

4.2 ANN architectures for classification problems . . . 59

4.3 Concept of training in supervised learning . . . 60

4.4 Backpropagation in a supervised multi-layer neural network . . . 62

4.5 Illustration of dropout technique . . . 63

4.6 Concept of training in a Convolutional Network . . . 67

4.7 Illustration of the process for extracting the feature maps . . . 67

4.8 Architecture of the Convolutional Network . . . 68

4.9 The architecture of an Artificial Neural Network Battery . . . 68

4.10 The training and the inference process for the ANN Battery . . . 69

4.11 Flow-chart of the main Matlab functions . . . 69

4.12 Filtering the time series through the Robust Loess methodology . . . . 70

4.13 Time-windows creation from the set of knots . . . 70

5.1 Fuzzy C-means clustering and centroids . . . 73

5.2 The most widespread network topologies for the architecture of a SOM . . . 78
5.3 Starting disposition for the first test . . . 80

5.4 SOM training phase after 10, 100 and 1000 epochs . . . 80

5.5 Starting and final position of the neurons constituting the SOM grid (Training: 5000 epochs) . . . 81

5.6 Detection of market anomalies for the first security . . . 82

5.7 Detection of market anomalies for the second security . . . 82


6.1 Schematic representation of a non-linear (on the left) and a linear (on the right) neuron . . . 86
6.2 The most popular activation functions: s1: sigmoid, s2: step-function, s3: hyperbolic tangent, s4: ramp . . . 86

6.3 Connections between layers . . . 87

6.4 Representation of a non-linear (on the left) and a linear (on the right) neuron . . . 88

6.5 Representation of a neuron in the first (on the left) and in the last (on the right) layer . . . 88

6.6 A linear neural network architecture . . . 89

6.7 A non-linear neural network architecture . . . 90

6.8 Architecture of Circular Neurons . . . 91

6.9 Principal Component Analysis through Artificial Neural Network . . . 93

6.10 Auto-Associative Neural Network (AANN) architecture . . . 94

6.11 The surface after deleting 30% of the values and after the introduction of a white noise . . . 95

6.12 Reconstruction of the surface through Auto-Associative Neural Network (AANN) after 1,000 epochs . . . 95
6.13 Reconstruction of a benchmark surface through Auto-Associative Neural Network (AANN) . . . 96
6.14 Forecasting on the benchmark dataset using ANN with Circular neurons . . . 96

6.15 The incomplete Black swaption volatility surface . . . 97

6.16 Reconstruction of the swaption volatility surface through Auto-Associative Neural Network (AANN) . . . 98

6.17 Historical time-series of the European inflation index (May 2013 - May 2018) and its projection . . . 104

6.18 Historical and projected seasonality estimated using standard market methodology . . . 106

6.19 Historical and projected seasonality estimated using a Machine Learning methodology . . . 106


List of Tables

1.1 Machine Learning techniques for the three types of events . . . 12

2.1 Parameter Estimation of NS, SV and dRF model. (*) Overfitting problems. Market Data: 30th December 2016 . . . 18

2.2 Estimated coefficients for the parametric techniques: (2004-2016) vs. (EUR003M-USD003M) . . . 19

2.3 RBF nets settings for the observed yield curves . . . 19

5.1 Quoted CDS premium with tenors: 1Y, 3Y, 5Y, 7Y, 10Y (Source: Bloomberg) . . . 74
5.2 Fuzzy C-means centroids . . . 75

5.3 Membership levels of points 1-10 with respect to the clusters . . . 75
5.4 Membership levels of points 11-20 with respect to the clusters . . . 75
5.5 Membership levels of points 21-30 with respect to the clusters . . . 75
5.6 Membership levels of points 31-40 with respect to the clusters . . . 75
5.7 Membership levels of points 41-50 with respect to the clusters . . . 76
5.8 Membership levels of points 51-60 with respect to the clusters . . . 76
5.9 Membership levels of points 61-70 with respect to the clusters . . . 76
5.10 Membership levels of points 71-77 with respect to the clusters . . . 76

6.1 Euro Inflation Swap Rate - 29th June 2018. Source: Bloomberg . . . 99

6.2 Inflation Swap rate quoted on 29th June 2018: K(T_M), I_M(0) and ∆_M . . . 104
6.3 Monthly standardized residuals ε_M . . . 104

6.4 Financial characteristics of the YYIIS . . . 107

6.5 EUR-OIS term structure. Reference Date: 29th June 2018. Source: Bloomberg . . . 107

6.6 Discounted Cash Flows of the YYIIS paying leg . . . 108

6.7 Discounted Cash Flows estimated by the standard method: YYIIS receiving leg . . . 108
6.8 Discounted Cash Flows estimated by the network: YYIIS receiving leg . . . 108


Chapter 1

Introduction

“Finally the journey leads to the city of Tamara.

You explore it along streets thick with signboards jutting from the walls. The eye does not see things but images of things that mean other things”

Italo Calvino, Invisible cities, Cities and Signs 1


In this chapter I introduce the topic of Machine Learning in Risk Management: the first section presents a short literature review, while the second reports the list of the candidate's published studies in this field.

1.1 Literature Review

The literature on Machine Learning techniques in Risk Management can be divided into three parts depending on the main risks that a bank has to face: Credit Risk, Financial Risk and Operational Risk.

1.1.1 Credit Risk

Credit risk analysis is one of the most important topics in banking risk management and one of the main concerns of financial institutions.

Technological progress has brought new opportunities, such as increasing computational power and the increasing availability of data, that make it easier to overcome the challenges that Risk Management has to face.

This has created fertile ground for the application of Machine Learning techniques typically used in other fields of science, such as engineering, by combining them with methods of estimating risk factors based on traditional statistics.

Unlike traditional models, Machine Learning techniques are for the most part non-parametric, a very important feature since it allows characteristics and data patterns to be captured that would otherwise be lost by imposing constraints and parametric assumptions on the data.

In addition, these methods have been created specifically to better manage large amounts of data and to facilitate adaptation and recalibration through continuous updating.

Banking management and regulation on credit risk revolve around the estimation of three key components, combined in the expected loss formula shown below:

1. the Probability of Default (PD), that is the probability that a counterparty will enter the default state within a time horizon of one year;

2. the Loss Given Default (LGD), i.e. the expected value of the ratio between the loss due to the default and the amount of exposure at the time of default (EAD);

3. the Exposure at Default (EAD), i.e. the value of risk assets on and off balance sheet.
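Taken together, these three components yield the standard regulatory expected loss over the one-year horizon:

EL = PD × LGD × EAD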

In recent years many Machine Learning techniques have been designed for the estimation of binary variables in several fields of science. The application of these new techniques to credit scoring and to the estimation of the PD component was therefore straightforward.

One of these innovations is the Neural Network, a model that connects explanatory variables with the target through different layers made up of combinations of transformed input variables.

The importance of Neural Networks as a credit scoring model has been presented in many studies, emphasizing the ability of this tool to adapt to data thanks to a wide range of settings, such as the number of hidden layers and nodes (Khashman, 2010).

Performance much higher than that of the standard statistical models reported in the most recent relevant literature can be obtained by combining this powerful model with an accurate optimization of the data distribution, so as to balance the number of instances within the dataset, which is a typical problem when estimating the probability of default (Zhao et al., 2015).

In fact, several studies have shown that neural networks are able to produce much more accurate PD estimates than the market standard of logistic regression (Lessmann et al., 2015).

Thanks to the high performance it can achieve, this technique has become very popular, attracting the attention of many scholars who have proposed increasingly complex versions of the method. One of these is the Random Subspace Neural Network, which aims to reduce the correlation between estimators by training them on random samples of variables rather than on the entire available set (Lumini and Nanni, 2009).
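As a purely illustrative aid, the sketch below trains a small feed-forward network as a PD classifier; the feature matrix X (borrower ratios), the default flag y and all hyperparameters are hypothetical, and scikit-learn is used only as a convenient stand-in for the implementations discussed in this thesis.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical data: 1000 borrowers, 8 balance-sheet ratios, binary default flag
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1.5).astype(int)

# A stratified split preserves the (unbalanced) default rate in both samples
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Two hidden layers; depth and width are free settings, as noted by Khashman (2010)
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)

pd_hat = net.predict_proba(X_te)[:, 1]  # estimated probability of default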

Another typical Machine Learning model is the Decision Tree (CART), a technique adaptable to both regression and classification problems. With this method it is possible to obtain accurate estimates through consecutive, dependent splits of the data space based on threshold values of the individual explanatory variables. Despite their simple structure, CARTs are a particularly powerful tool, as well as intuitive and easy to interpret, capable of high predictive performance (Khandani, Kim, and Lo, 2010).

In addition, tree models not only achieve more accurate results than classical statistical methodologies, but also produce more stable results than those obtained with models based on Multilayer Neural Networks (Addo, Guegan, and Hassani, 2010).

CART techniques are particularly useful where a deeper understanding of the problem is required, allowing an automatic selection of the best variables and overcoming the problem of incomplete datasets (Strobl, Boulesteix, and Augustin, 2010).

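A hedged sketch of the same idea on synthetic credit data (all names, depths and thresholds hypothetical): a shallow tree keeps the consecutive splits interpretable and exposes the automatic variable ranking mentioned above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))                       # synthetic explanatory ratios
y = (X[:, 0] - X[:, 2] + rng.normal(size=1000) > 1.0).astype(int)

# Shallow CART: consecutive splits on threshold values of single variables
cart = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
cart.fit(X, y)

print(cart.feature_importances_)  # automatic ranking of the explanatory variables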

A direct evolution of the CART models is the Random Forest technique, derived from a combination of several tree models built in different ways, e.g. on subsets of the data and/or with different sets of variables. This aggregation can improve the estimate obtained by each single CART, both in terms of accuracy and in terms of stability.

In the literature, the Random Forest, together with Nearest Neighbors, another nonparametric technique, obtains extremely consistent results in probabilistic estimation (Malley et al., 2012), and both have often been compared with classic parametric methods by testing them on different types of data.

As regards credit risk, this comparison was made by estimating the probability of default from a large set of historical credit data (Kruppa et al., 2013); the excellent results obtained with the Random Forest show the validity of its use, especially when the sample size is particularly large, making these models a viable alternative to models based on traditional statistics.

Furthermore, as in the case of CARTs, Random Forests are very useful for overcoming problems of high data dimensionality, as they yield a subset of variables determined by a particular measure of importance. This is one of the reasons why they are also often used in studies related to genetics, in which the number of variables is much higher than that of the observations (Schwarz, König, and Ziegler, 2010).
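A minimal Random Forest sketch along the same lines (synthetic data, hypothetical settings): the out-of-bag score gives a built-in estimate of generalization accuracy, and the importance vector supports the variable-subset selection described above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] - X[:, 2] + rng.normal(size=1000) > 1.0).astype(int)

# Each tree grows on a bootstrap sample, with a random subset of variables per split
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)

print(rf.oob_score_)            # out-of-bag accuracy
print(rf.feature_importances_)  # importance measure used for variable selection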

Another estimation technique, applicable to both regression and classification problems, which derives from an evolution of CART models, is Boosting. It is a method based on the iterated union of very simple estimators, such as Decision Trees formed by a single node, in order to create one with better accuracy and higher performance. The innovation of this model lies in the introduction of a function ("pseudo-loss") that forces each new estimator to focus its attention on the portion of data that was not correctly estimated in the previous step, thus progressively correcting the errors. The Boosting model is completed by the weighted combination of the individual binary trees, depending on their accuracy (Freund and Schapire, 1996).

Again, much has been said about the high performance of this particular technique and many proposals inspired by it can be found in the literature. Two models have become particularly well known, Adaptive Boosting (or AdaBoost) and Gradient Boosting; the latter has found application in credit scoring problems and in the calculation of the Probability of Default, showing the ability to manage the estimation of extreme values (Fonseca and Lopez, 2017).
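For illustration only, both variants on hypothetical synthetic data; scikit-learn's AdaBoost uses single-split trees ("stumps") as its default weak learner, matching the single-node estimators described above.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] - X[:, 2] + rng.normal(size=1000) > 1.0).astype(int)

# AdaBoost: each new stump is fitted on re-weighted data (the "pseudo-loss" idea)
ada = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# Gradient Boosting: each new tree fits the residual errors of the current ensemble
gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X, y)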

As already mentioned, the combination of several models of the same type improves the estimates both in terms of performance and of stability. In the context of Credit Risk, attempts have also been made to apply this logic to methods other than Decision Trees (Yu et al., 2010).

Although the results obtained are promising, these models encounter some computational difficulties when combining very complex models, as well as the risk of overfitting the estimates.

A large number of models are available in the literature, and they all have particular characteristics that make them unique, as well as drawbacks that limit their performance and applicability; for example, although CARTs are very accurate and intuitive estimation methods that can also automatically select the most important variables, they are not able to approximate mathematical functions with frequent irregular peaks.

In the same way, Neural Networks are able to achieve very high performance, but they can produce very different results on the same dataset if the initial settings with which the machine has been trained are changed even slightly. In addition, other more complex models may require repeated calibration, which often leads to overfitting problems.

One way to overcome these limitations is the use of hybrid models, made up of the union of different estimation techniques combining Machine Learning and Artificial Intelligence models. Analyzing the dataset of a bank in Taiwan, it has been shown how a hybrid model based on the combination of a Logistic classifier and a Neural Network can achieve very high performance and maximize profits (Chen and Tsai, 2010).

Researchers have studied the estimation of the PD component in depth, establishing sophisticated measurements. For the other two components set forth in banking regulation, LGD and EAD, innovations are scarce and studies have been proposed only recently.

Some of the proposals concern alternative methods of transforming and selecting variables from the original dataset in order to reduce the size of the data and keep only the most important information. Others are based on software's ability to handle an ever increasing amount of data, changing the concept of information and creating more complex and informative databases, or creating time series from which to obtain historical information (Khandani, Kim, and Lo, 2010).

Furthermore, it has been demonstrated that the majority of the Machine Learning and Artificial Intelligence methods adopted for the estimation of the Probability of Default can also be used for the LGD and EAD components. Among these, Support Vector Machine (SVM) methodologies are able to minimize the empirical classification error and maximize the geometric margin (Bhavsar and Panchal, 2012).

For this reason, there have been several uses of SVMs in credit risk assessment (Yu et al., 2010): by creating a data-division surface that maximizes the distance between observations of different classes, they obtain particularly satisfactory results, comparable in terms of performance to those of Neural Networks (Huang et al., 2004), and in some studies they even outperform the latter (Ghodselahi, 2011; Chaudhuri and De, 2011).

In the context of credit risk, a further area of application of machine learning is represented by Support Vector Machine (SVM) models in the classification of the borrower with the assignment of a probability of default (PD).

SVM appears to be a powerful technique, particularly useful for data whose distribution is unknown (in the presence of "non-regularity in the data"). In addition to its simpler linear version, SVM can also be used for non-linear data, with the advantage of not requiring any a priori hypotheses on the functional form of the distributions.

Since this method allows the data to be separated with the maximum possible margin, SVM proves to be a robust approach capable of handling different types of data: datasets with "noise" or biased training data.

In fact, SVM relies on the available information (the training dataset) to perform a division of the data itself, via hyperplanes, which allows the "positive" event (which usually represents the trigger event or the borrower default) to be classified more efficiently against the observed real value. For this reason, the SVM model must be carefully estimated, otherwise the accuracy of this method may not give satisfactory results. Furthermore, it should be remembered that the SVM technique is closely related to regression: for linear data we can compare the SVM outputs with those obtained with a linear regression, while the "non-linear" version of SVM is comparable with the outputs obtained with logistic regression.
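A hedged sketch of a kernel SVM on synthetic default data (features and settings hypothetical): the RBF kernel produces the non-linear separating surface described above, and the balanced class weighting up-weights the minority default class, echoing the "weighted SVM" idea discussed below.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] - X[:, 2] + rng.normal(size=1000) > 1.5).astype(int)  # rare "default" class

# Maximum-margin separation with an RBF kernel and balanced class weights
svm = SVC(kernel="rbf", C=1.0, gamma="scale",
          class_weight="balanced", probability=True)
svm.fit(X, y)

pd_hat = svm.predict_proba(X)[:, 1]  # probability-like score for the default class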

The SVM method has been tested over the years in several empirical studies, which have mainly focused on the problem of estimating the creditor's probability of default, obtaining interesting results. However, one of the main disadvantages associated with the SVM method is the issue of its calibration. On the other hand, it must be said, this methodology allows the results produced by the estimations to be interpreted in a very intuitive way, i.e. through their graphical display.

In this regard, Hardle, Moro, and Schafer (2005) wrote the discussion paper published by the Collaborative Research Center 649 Economic Risk in which they analyzed a sample of large American companies (with a capitalization exceeding one billion dollars), considering those that survived and those that failed in 2001. The study presents the results of the classifications obtained with SVM and demonstrates how the variation of the parameters of the Kernel function (a sensitivity analysis of the SVM) allows the areas associated with the different levels of estimated probability of default to be appreciated through a scatter plot. In particular, as the complexity of the classification function increases, the graphical representation becomes more detailed, separating the areas of "survival" from those of "default" more clearly.

Subsequent studies, such as that of Min and Lee (2005), applying the SVM model to forecasting corporate bankruptcies, have demonstrated the attractive predictive power of SVM techniques compared to existing methods. In particular, mapping the input vectors into a multi-dimensional space (as carried out by SVM) allows complex problems to be transformed into simpler ones that can be analyzed with linear discriminant functions.

Furthermore, the Mathematical Institute of the University of Utrecht published in 2016 (through the master's thesis of Groot, 2016) a study on models for credit risk, successfully applying the SVM method. The analysis was carried out by studying the credit rating system used by Freddie Mac. Before the famous financial crisis, the models used by this intermediary were totally unable to predict the increased risk of defaults after 2007. The author of the empirical study, analyzing the Federal Home Loan Mortgage Corporation data set for the years 2001-2007, introduced macro-economic variables in addition to the set of balance-sheet ratios usually used previously in the literature. In particular, the study carried out the empirical analysis with a modified version of the SVM that assigns (high) weights to the default cases. This approach is motivated by the fact that default cases are in a numerical minority with respect to performing loans, thus constituting a substantially unbalanced sample (this version of the model is called "weighted SVM"). Also in this case, the final results showed that the SVM method can be considered a good model, able to discriminate between classes of different PD.

Basically, the results of the studies conducted so far by researchers, bankers and other practitioners have shown that Logit models, as well as the discriminant-analysis models traditionally used in the financial world, are inefficient at correctly classifying data that are not linearly separable.

In another study by the Collaborative Research Center 649 Economic Risk, the authors (Hardle, Moro, and Hoffmann, 2010) applied the SVM model for an empirical analysis on a sample of German companies (20 thousand solvent companies and one thousand insolvent) in the period from 1996 to 2002. The use of SVM, applied to 25 financial ratios, made it possible to estimate an individual PD for each company, with an overall superior classification performance compared to traditional Logit models. The study, in fact, concludes that based on the empirical evidence (applied to German companies), the SVM-based rating models over-perform the traditional parametric models, with particular regard to the out-of-sample prediction of the probability of default. The mentioned study reports, in particular, an Accuracy Ratio (AR) level of 60.51% for the final model with 8 ratios (selected from the 25 analyzed). A previous study (Chen, Hardle, and Moro, 2006), based on a sample of different size but on the same population (medium-sized German companies) and time horizon, reports that the percentage of correct classification on the out-of-sample observations (also defined as Accuracy and calculated on the basis of the Confusion Matrix), for the model with 8 ratios, is equal to 71.85% for the SVM against 67.24% for the Logit model. In addition, the SVM model (considering a series of simulations) reaches a median AR that exceeds 60%, compared to the Logit model, which fails to significantly exceed an AR of 35%.

In recent decades, banking crises have made it necessary to adopt early warning systems: models that can identify weaknesses in macroeconomic indicators as quickly as possible. Through their use, banks have the opportunity to make safer choices, reducing the risk of bankruptcies.

An Early Warning model, however, is not easy to implement, because of the large number of variables that constitute its indicators and because of the particular nature of the target variable.

These problems make the use of traditional statistical methods unsuitable because of their poor performance. Techniques such as logistic regression or the Cox model fail to take into account all the financial causes that lead to the bankruptcy of a bank, while Bayesian models are constrained by the specification of hypotheses that are often difficult to define and can in some cases be too stringent.

Even more sophisticated models, such as those related to Machine Learning and Artificial Intelligence, encounter several difficulties, first of all the possibility of incurring overfitting problems, which reduce their discriminating power and consequently their applicability.

For this reason, many studies have chosen to adopt multiple estimation methods, combining them in order to fill the gaps of each model and thus obtain better predictions. One of the proposals concerns the use of a hybrid technique consisting of a Support Vector Machine estimation model together with the Rough Sets classification technique. The former is chosen for its ability to reduce overfitting and the problems related to the restricted domain of the variable to be estimated, being based on the principle of structural risk minimization; the latter allows the noise present in economic data to be overcome and redundant information to be eliminated (Feng and Pang, 2006).

Another technique used in Early Warning problems, tested on data from various US banks, concerns the adoption of models structured as simple Neural Networks which, however, assign interpretation rules according to Fuzzy Logic, a many-valued logic that extends the Boolean one and is therefore suitable for this particular risk calculation (Fu, Nguyen, and Shi, 2007).

In doing so, a neuro-fuzzy model is not only able to obtain better results than simple Neural Networks, but it is also able to build reliable causal relationships between variables (Chang et al., 2008).

1.1.2 Financial Risk

Recent developments in financial markets have brought to light numerous irregularities in these markets, such as negative interest rates, illiquidity and extreme volatility (Burro et al., 2017); these features highlight the gaps in the tools of classical statistics when adapting to changing and complex events.


These shortcomings have led the literature to identify several solutions in Machine Learning techniques for adapting to the continuous changes of the markets. Academic evidence shows that these methods are particularly well suited to approaching key elements of the financial market, such as interest rate term structures (Sambasivan and Das, 2017).

One of the most widely used techniques in this field is the Artificial Neural Network (ANN), an algorithm used to solve complex problems whose features are not easy to understand. ANNs are considered a cornerstone of Machine Learning and Deep Learning.

Artificial Neural Networks derive their name from the fact that their working principle is similar to the information-processing mechanism of the human brain: the natural brain is composed of many neurons suitably connected by synapses, while neural networks are made up of nodes grouped into different layers and interconnected by means of weighted arcs, which map the greater or lesser importance of a connection between two neurons. The number of neurons in each layer can vary according to the needs of the algorithm.

Using the paradigms of neural networks, it is therefore possible to approach problems of classification and recognition in extremely variable and irregular environments, whose features cannot be detected using traditional statistical methods.
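As a purely illustrative sketch of the mechanism just described (sizes and random weights are hypothetical), the forward pass below applies, at each layer, the weight matrix of the "weighted arcs" followed by a nonlinear activation.

import numpy as np

def forward(x, layers):
    # layers: list of (W, b) pairs; tanh is the activation of every neuron
    for W, b in layers:
        x = np.tanh(W @ x + b)  # weighted arcs, then nonlinear neuron response
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(5, 3)), np.zeros(5)),  # 3 inputs -> 5 hidden neurons
          (rng.normal(size=(1, 5)), np.zeros(1))]  # 5 hidden -> 1 output neuron

print(forward(np.array([0.1, -0.2, 0.4]), layers))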

A widespread application of neural networks is the modeling of interest rate term structures (Cafferata, Giribone, and Resta, 2018). There are numerous studies showing that the performance of these models is clearly superior to classic regressive approaches (Josepha, Larrain, and Singh, 2011).

In the modeling of term structures, classical regressive approaches, based on a priori defined functions, need further ex post calibrations, while neural network approaches always provide optimal fitting, even in the case of strong irregularities (Giribone, 2017).

It has also been highlighted that the estimation of yield curves through neural networks is very versatile, and the underlying model can also be adopted in the forecasting of market prices, for example to determine the price of crude oil (Baruník and Malinská, 2016).

The problem of an a priori definition is also overcome in the automatic detection of trading opportunities: the literature has obtained more efficient results by applying neural network techniques such as Self-Organizing Maps (Kohonen, 1982) to the identification of market anomalies (Cafferata and Giribone, 2017) or by improving standard technical indicators using Non-linear Autoregressive Networks (Giribone, Ligato, and Penone, 2018).

The use of neural networks is also suitable for forecasting studies, since it allows recession cycles (Gogas et al., 2015) and typical patterns of historical time series, such as seasonality, to be identified (Caligaris and Giribone, 2018).

These algorithms are also very powerful for detecting patterns in the presence of large amounts of missing data. A typical problem in pricing derivatives is the lack of relevant portions of information on the volatility surfaces; the problem of reconstructing missing data is often solved by traditional methodologies, such as interpolation, which however are not able to produce reliable estimates for large portions.

By contrast, being tools suited to a non-linear analysis of the underlying data, neural networks are able to obtain reliable pricing in the absence of significant quantities of input data, because they allow the volatility surface to be studied globally (Caligaris, Giribone, and Neffelli, 2017).

The Radial Basis Function (RBF) neural network, proposed for the first time by Broomhead and Lowe (1988), is becoming increasingly popular in the world of financial markets. This type of neural network uses radial basis functions as activation knots, whose values depend exclusively on their distance from the origin. For this reason, it represents a very flexible instrument that adapts to any market characteristic, and its performance is clearly superior to that of traditional parametric techniques, especially in extreme market conditions such as negative rates and extreme volatility (Cafferata et al., 2019).

RBF neural networks are also widely used in calculating the fair value of options (Fioribello and Giribone, 2019), as they allow the PDEs and PIDEs based on the fundamental Black-Scholes-Merton equation to be solved very efficiently (Larsson et al., 2013).

Pricing financial instruments through radial basis functions allows higher precision than traditional integration schemes (FDM and FEM), provided that the shape parameter has been correctly defined (Giribone and Ligato, 2015).
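A minimal sketch of the fitting idea, assuming a handful of made-up tenor/yield pairs and using SciPy's RBFInterpolator as a convenient stand-in for a full RBF network; the epsilon argument plays the role of the shape parameter whose correct definition is stressed above.

import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical zero-rate quotes: tenors in years, yields in percent
tenors = np.array([[0.25], [1.0], [2.0], [5.0], [10.0], [30.0]])
yields = np.array([-0.35, -0.30, -0.20, 0.10, 0.55, 1.05])

# Gaussian radial basis functions; epsilon is the shape parameter
curve = RBFInterpolator(tenors, yields, kernel="gaussian", epsilon=0.5)

grid = np.linspace(0.25, 30.0, 200)[:, None]
fitted = curve(grid)  # smooth curve over the whole tenor range, negative rates included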

1.1.3 Operational Risk

Operational risk (OpRisk) is defined by the Basel Committee on Banking Supervision (BCBS) as the risk of losses resulting from inadequate or failed internal processes, people and systems, or from external events, including legal risk.

Within the new Basel III framework, the standardized approach for measuring the minimum capital requirements to cover operational risk replaces all the approaches existing in the previous version, Basel II.

This new standardized approach is based on the following elements, combined in the capital formula shown after the list:

• Business Indicator (BI) as a balance sheet proxy for operational risk;

• Business Indicator Component (BIC) calculated by multiplying BI by a set of marginal coefficients determined by regulation;

• Internal Loss Multiplier (ILM) as a standardization factor based on a bank’s average loss history.
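Concretely, these elements combine into the operational risk capital requirement ORC = BIC × ILM, where the regulation defines the multiplier as ILM = ln(exp(1) − 1 + (LC/BIC)^0.8) and the Loss Component LC equals 15 times the bank's average annual operational risk losses over the previous ten years.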

The literature of the 2000s has traditionally been dominated by approaches aimed at measuring, controlling and managing operational risk through the ex post calculation of hedging capital, thus implicitly assuming that risk events which have occurred once may occur again (see Cruz, 2002 for a broad and comprehensive overview).

However, it is essential to highlight the specificity of operational risks, which makes them inherently different from other risk categories: the events follow a power-law (or Pareto) distribution, with fat tails and rare events.

This implies that not all risky events can be "learned" from the historical data series, given the nature of rare events, typically accompanied by a very low or even zero number of occurrences in some cases.
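For reference, a Pareto law with scale x_m and tail index α has density f(x) = α · x_m^α / x^(α+1) for x ≥ x_m; the smaller α, the heavier the tail (for α ≤ 1 even the mean is infinite), which is why a handful of extreme losses can dominate an otherwise quiet loss history.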

At the same time, empirical evidence demonstrates the existence of some precursor signals or patterns that can be identified and highlighted with ML techniques aimed at implementing an early warning system.

The most recent literature has instead adopted an ex ante risk management perspective, that is, predicting before the event occurs, by taking advantage of the event regularity "learned" from the available data.

If, on the one hand, this approach has the advantage of aiming at preventing rather than managing 'damages', on the other hand the analysis becomes decidedly more complex due to the known limits of the historical data series and the specificity of the data themselves (fat tails and rare, or even never observed, events).

As a general approach, risk events could be classified into three macro-categories:

• Frequent and known events

• Known events but not frequent

• Unknown and extremely rare events

In a typical financial or non-financial production activity, the focus is on the first two categories of events, and most of the literature has studied and proposed useful models in this sense. A classic example of a known and frequent event (group 1) is fraud in the payment system.

The main methods of managing this event are based on a two-phase analysis in which the characteristics of customers' behavior are initially extracted in the payment phases (usually by credit card) and these same characteristics are then used to classify every single transaction and authorize it upon the customer's request.

The banks and corporations issuing the main credit cards began in the mid-90s to test and implement the first neural networks (even if much simpler than the current ones) to detect fraud (Ehramikar, 2000).

Systems combining more classical decision-tree-based methods with neural networks are currently being used successfully. Moving beyond purely banking fraud, Vocalink Analytics (a Mastercard company) and NatWest have launched Corporate Fraud Insights, specializing in the use of advanced ML, AI and behavioral analytics methods to determine the risk level of a payment.

In the case of rare events with a potentially high impact (group 2), one could be in the unpleasant condition of having no examples, or very few, of the event of interest. In this context, the use of traditional Machine Learning techniques, which by definition need examples, is more complex. If one tries to train an algorithm on rare events, there is a risk of running into overfitting or underfitting problems.

In the former case, the model adaptively learns the specific characteristics of the available data and is almost unable to generalize (extrapolate) satisfactorily to new data. On the contrary, in cases of underfitting, the model excessively simplifies the real generative mechanism of the data, once again underperforming in the extrapolation phase (systematic underestimation of the risk).

It is therefore essential to 'help' the data and models with the experience of experts; recent literature has shown that a mixed approach (data + expert-driven knowledge) can significantly improve performance (see Gigerenzer and Brighton, 2009; Gigerenzer and Gaissmaier, 2011).

Therefore, for the second type of events (low frequency, high impact), recent literature has shown that simpler ML approaches, supported by expert information, not only offer competitive performance but also reduce the so-called model risk. The latter refers to the risk associated with the use of "black box" models or algorithms with a particularly complex structure, difficult to interpret and potentially unstable under even minimal variations of the data set.

Typical examples of operational risks falling within this group are the events associated with money laundering, whose incidence is in the order of 1% and for which it is rather difficult to obtain useful datasets for training models.

As for the third type, unknown and extremely rare events, the analysis becomes particularly complex. This typology is typically characterized by two aspects:

• A gap between the actual financial loss and the potential worst case scenario.

• The complexity of the primary cause, which is typically given by the intertwining of various elementary causes.

The level of knowledge for this category of events is very low since no examples are available; consequently, an unaware application of an ML model can become an additional source of risk (i.e. model risk).

At the same time, the power-law distribution that typically characterizes operational risks can represent a guide, albeit a qualitative one, for understanding unknown events. Studies suggest that even unknown and extremely rare events can be broken down into smaller, elementary events that can be known a priori. Research is therefore trying to exploit these basic events, treating them as precursors, in order to make inferences on the related high-impact losses.
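As a rough illustration of this heavy-tailed behavior, the sketch below draws losses from a Pareto-type power law via inverse-transform sampling and measures how much of the total loss comes from the largest events; the tail index and scale are illustrative assumptions, not calibrated parameters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Inverse-transform sampling from a Pareto distribution:
# survival function S(x) = (x_min / x)^alpha  =>  x = x_min * u^(-1/alpha).
alpha = 1.2     # illustrative tail index; alpha < 2 implies infinite variance
x_min = 1.0     # illustrative minimum loss (scale parameter)
u = rng.uniform(size=100_000)
losses = x_min * u ** (-1.0 / alpha)

# Heavy tail: a tiny fraction of events accounts for most of the total loss.
losses_sorted = np.sort(losses)[::-1]
top_1pct = losses_sorted[: len(losses) // 100].sum()
print(f"share of total loss from the top 1% of events: {top_1pct / losses.sum():.1%}")
```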

Table 1.1 shows the types of ML methods suited to the three types of events, as classified by Milkau and Bott (2018).

The most recent approaches in operational risk use the most innovative methodologies, generally referred to as deep learning. In particular, Convolutional Neural Networks (LeCun et al., 1999; Szegedy et al., 2015) and the more recent Long Short-Term Memory feedback networks (LSTM, Hochreiter and Schmidhuber, 1997) are among the main examples of the latest methodologies.

However, it is important to highlight that any ML model needs large databases of previous events, and this can constitute the main limitation to their application in the Operational Risk field, whose intrinsic specificity lies in the rarity of some events or, in extreme cases, in the total absence of examples.

1.2 Author’s research work

This first part of the Ph.D. Thesis is mainly based on the following Author’s research work:

1. AGOSTO A., GIRIBONE P. G. (2019). Machine Learning methodologies in application to market risk. Conference: "Artificial Intelligence in Risk Management", 11th April 2019, Milan (Agosto and Giribone, 2019)

2. BONINI S., CAIVANO G., CERCHIELLO P., GIRIBONE P. G. (2019). Artificial Intelligence: Applications of Machine Learning and Predictive Analytics in Risk Management. AIFIRM (Italian Association of Financial Industry Risk Managers) Position paper N. 14 (Bonini et al., 2019)

3. CAFFERATA A., GIRIBONE P. G. (2017). Non supervised learning paradigms for neural networks in financial markets: design of a self-organizing map to track market anomalies. Risk Management Magazine Vol. 12, N. 2 (Cafferata and Giribone, 2017)

4. CAFFERATA A., GIRIBONE P. G., NEFFELLI M., RESTA M. (2019). Yield curve estimation under extreme conditions: do RBF networks perform better? Chapter 22 in the book: "Neural Advances in Processing Nonlinear Dynamic Signals", Springer (Cafferata et al., 2019)

5. CAFFERATA A., GIRIBONE P. G., RESTA M. (2018). Interest rates term structure models and their impact on actuarial forecasting. QFW18: Quantitative Finance Workshop, UniRoma3, Rome (Cafferata, Giribone, and Resta, 2018)

6. CALIGARIS O., FIORIBELLO S., GIRIBONE P. G. (2017). Fuzzy C-means clustering as automatic algorithm for the detection of market anomalies. AIFIRM (Italian Association of Financial Industry Risk Managers) Magazine Vol. 12, N. 1 (Caligaris, Fioribello, and Giribone, 2017)

7. CALIGARIS O., GIRIBONE P. G. (2015). Modeling the yield curves through machine-learning techniques: analysis and comparison between traditional regressive methods and neural networks. AIFIRM (Italian Association of Financial Industry Risk Managers) Magazine Vol. 10, N. 3 (Caligaris and Giribone, 2015)

8. CALIGARIS O., GIRIBONE P. G. (2018). Modeling seasonality in inflation indexed swap through machine-learning techniques: analysis and comparison between traditional methods and neural networks. Risk Management Magazine Vol. 13, N. 3 (Caligaris and Giribone, 2018)

9. CALIGARIS O., GIRIBONE P. G., LIGATO S. (2015). Application of a feed-forward Neural Network for the reconstruction of volatility surfaces. AIFIRM (Italian Association of Financial Industry Risk Managers) Magazine Vol. 10, N. 1 (Caligaris, Giribone, and Ligato, 2015)

10. CALIGARIS O., GIRIBONE P. G., NEFFELLI M. (2017). Volatility surface reconstruction through auto-associative neural networks: a case-study based on Nonlinear Principal Component Analysis. Risk Management Magazine Vol. 12, N. 3 (Caligaris, Giribone, and Neffelli, 2017)

11. DECHERCHI C., GIRIBONE P. G. (2020). Prospective estimation of financial and risk measures using dynamic neural networks: an application to the US market. Risk Management Magazine Vol. 15, N. 1 (Decherchi and Giribone, 2020)

12. FIORIBELLO S., GIRIBONE P. G. (2018). Design of an artificial Neural Network battery for an optimal recognition of patterns in financial time series. International Journal of Financial Engineering Vol. 5, N. 4 (Fioribello and Giribone, 2018)

13. GIRIBONE P. G., LIGATO S., PENONE F. (2018). Combining robust Dynamic Neural Networks with traditional indicators for generating mechanic trading signals. International Journal of Financial Engineering Vol. 5, N. 4 (Giribone, Ligato, and Penone, 2018)

The comparison covers, for the three event types (Frequent "Known" Events, Rare "Unknown" Events, "Unknown Unknowns"), the following approaches:

• Statistics of own risk event data: Power Law or GPD (Extreme Value Theory)

• Use of external public data: problem of unknown assumptions and methodologies

• OpRisk Self-Assessment: quantitative enhancement

• Key Risk Indicators (KRI): possibility for (delayed) forecast

• Heuristics (in FLD): danger of bias; heuristics for ad-hoc actions; heuristics for best guesses

• Machine Learning (statistical methods): pattern recognition; problem of sensitivity

• Machine Learning (ANN): enhanced pattern recognition; problem of sensitivity

• Machine Learning + Heuristics: complex patterns (e.g. fraud management)

• Machine Learning + Scenarios: example, autonomous cars

• Reinforced Machine Learning: examples, Google AlphaGo and Google AlphaGo Zero

• Machine Reasoning (Heuristics): problem solving; dynamic problem solving

TABLE 1.1: Machine Learning techniques for the three types of events


Chapter 2

Data Fitting and Regression

“On the day when I know all the emblems, - he asked Marco, - will I be able to possess my empire, at last?

And the Venetian answered: - Sire, do not believe it. On that day you will be yourself an emblem among emblems.”

Italo Calvino, Invisible cities, Conclusion of the first part

“Il giorno in cui conoscerò tutti gli emblemi, - chiese a Marco, - riuscirò a possedere il mio impero, finalmente?

E il veneziano: - Sire, non lo credere: quel giorno sarai tu stesso emblema tra gli emblemi”

Italo Calvino, Le città invisibili, Conclusione della parte I

This section deals with the most established Machine Learning techniques dedicated to regression. Firstly, the working principles of a feed-forward multi-layer neural network are described and then the characteristics of an RBF (radial basis function) network are discussed. In the last part, some financial applications of these principles are proposed.

2.1 Theoretical Aspects

A neural network is a parallel distributed system, composed of simple units, able to synthesize knowledge by processing the information contained in the external data.

The Artificial Neural Network (ANN) is a type of Artificial Intelligence (AI) that simulates the learning behavior of the human brain (Arbib, 2003).

ANNs have the feature of being able to model systems without the need to make a-priori assumptions on the mathematical function to use. On the contrary, this is necessary for most traditional statistical approaches and, thanks to this peculiarity, machine learning techniques have found successful applications in many areas of science and engineering (Principe, 2000).

Among the many feasible classifications of ANNs, the most frequent in the literature is the one that distinguishes between feed-forward networks and recursive networks, the latter also called feedback networks.

In order to solve fitting, approximation and reconstruction problems, the first class of nets, i.e. the feed-forward ones, is usually used (Caligaris, Giribone, and Ligato, 2015).


Among these, the most widespread architecture is constituted by a graph, whose nodes (neurons) are arranged on several levels (layers) and interconnected in a single direction from one layer to another.

The networks configured in this way are called MLPs (Multi-Layer Perceptron networks) and have a layer of input signals (input layer), one or more hidden layers that process the information, and an output layer which makes the processed information accessible (Shamisi, Assi, and Hejase, 2010). This architecture is outlined in Figure 2.1.

FIGURE 2.1: Architecture of an MLP feed-forward neural network

The neurons that make up the input layer act as a buffer that distributes the input signals (i.e. the independent variables of the problem) $x_i$, $i = 1, \dots, n$, to the neurons which compose the hidden layer (also called perceptrons).

Each perceptron $j$ processes its input signals $x_i$ by summing them, after having weighted each of them with the weight $\omega_{j,i}$ associated with the corresponding incoming connection. The neuron then computes its output $y_j$ by applying a function $f$ to the result of this sum: $y_j = f\left(\sum_{i=1}^{n} \omega_{j,i} x_i\right)$.

The activation function f can typically be a step, a sigmoid or a hyperbolic tangent.
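A minimal sketch of this computation, with arbitrary illustrative weights and inputs, evaluates $y_j = f\left(\sum_{i=1}^{n} \omega_{j,i} x_i\right)$ for the three activation functions just mentioned:

```python
import numpy as np

# Typical choices for the activation function f of a hidden perceptron.
def step(a):
    return np.where(a >= 0.0, 1.0, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# np.tanh already implements the hyperbolic tangent.

def hidden_layer(x, W, f):
    """Forward pass of one hidden layer: y_j = f(sum_i W[j, i] * x[i])."""
    return f(W @ x)

# Illustrative network with n = 3 input signals and 2 hidden perceptrons.
x = np.array([0.5, -1.0, 2.0])            # input signals x_i
W = np.array([[0.2, -0.4, 0.1],           # weights w_{j,i}: one row per perceptron j
              [0.7, 0.3, -0.5]])

for f in (step, sigmoid, np.tanh):
    print(f.__name__, hidden_layer(x, W, f))
```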

The signal transformation process of a neuron in the hidden layer is summarized in Figure 2.2.

FIGURE 2.2: The signal transformation process of a perceptron in an MLP network

The signal processed by the neurons in the output layer is similar to that of the perceptrons in the hidden layer, with the only difference that f is a linear function, since it must make the information directly usable to the outside. The responses to
