Artificial neural networks for interest and exchange rates models calibration. An approach based on FX implied volatilities and swaptions prices


POLITECNICO DI MILANO

School of Industrial and Information Engineering

Master of Science in Mathematical Engineering

Artificial neural networks for interest

and exchange rates models calibration

An approach based on FX implied volatilities and swaptions prices

Supervisor: Prof. Marcello Restelli

Co-supervisor: Luca Sabbioni

Candidate:

Elia Mazzoni

Matr. 899709

18th December 2019 Academic Year 2018-2019


To those who told me that I have a beautiful smile, and to those who helped me have it


Acknowledgements

This Master's thesis is the culmination of a long journey, full of satisfaction and stimulation, which has changed me deeply. With this work I complete the double-degree programme between Politecnico di Milano and Ecole polytechnique.

A big thank you to Prof. Restelli, who trusted me from the very beginning and passed on to me his energy and passion for work and research. His support was essential and his ideas always valuable and enlightening. I thank Luca Sabbioni for his help with this thesis; I admire the seriousness, the talent and the ability to lighten the mood that distinguish him. Thanks and greetings also to Lorenzo and Mirco for the good times spent together in the office at the airlab.

I thank my parents, who have always supported me and helped me in concrete ways, and on whom I could rely in difficult moments. I thank my relatives and friends, especially the closest ones, with whom I shared moments of relaxation that made me appreciate my free time and value my interests and the feeling of friendship.

Greetings to my teachers from primary, middle and high school: I am sure that the training I did with them earlier was essential and decisive for reaching this goal. Finally, I thank myself, because the courage was mine. And so was the heart, in believing in myself and in what I have done. A chapter ends here; now we turn the page and start again. The story continues.


Acronyms

ANN Artificial Neural Network.
ATM At-the-Money.
BFGS Broyden-Fletcher-Goldfarb-Shanno.
BM Brownian Motion.
B&S Black & Scholes.
CE Cross-Entropy.
CUDA Compute Unified Device Architecture.
EONIA Euro OverNight Index Average.
EURIBOR Euro InterBank Offered Rate.
FX Foreign Exchange.
FFNN Feed-Forward Neural Network.
FWD Forward Curve.
G2++ Two-additive Factor Gaussian Model.
GPU Graphics Processing Unit.
IRS Interest Rate Swap.
LIBOR London InterBank Offered Rate.
ML Machine Learning.
MLP Multilayer Perceptron.
MP Multiprocessor.
MRS Mean Reversion Speed.
NN Neural Network.
NPV Net Present Value.
ODE Ordinary Differential Equation.
OIS Overnight Indexed Swap.
PCA Principal Component Analysis.
PDE Partial Differential Equation.
ReLU Rectified Linear Unit.
SDE Stochastic Differential Equation.
SL Supervised Learning.
VOL Volatility.


Summary

The essential goal of this thesis is to present interest rate and exchange rate models and to assess the calibration of their parameters by means of neural networks, based on a dataset of At-the-Money European swaption prices and implied volatilities of options on exchange rates. The innovative aspect lies in the simultaneous calibration of rates and FX models, using two-factor Gaussian models for the rates dynamics, introducing cross-correlations among rates and with FX, and using several techniques to transform the outputs of the neural network so as to guarantee the validity of the correlation matrices. This calibration will later be applied to pricing and hedging on the xVA desk, in the simulation of rates and FX dynamics.

Keywords: calibration, interest rates, exchange rates, swaptions, correlation, artificial neural network.

Abstract

The essential goal of this thesis is to present some interest and foreign exchange rates models and to evaluate the calibration of their parameters relying on the use of neural networks and of a dataset made of European ATM swaptions prices and implied volatilities of options on FX rates. The innovation is in the simultaneous calibration of rates and FX models, using two-factor Gaussian models for rates dynamics, introducing cross-correlations between rates and also with FX, and using different techniques to transform the outputs of the neural network to guarantee the validity of correlation matrices. Such calibration can then be applied for pricing/hedging purposes in the xVA trading desk in the simulation of rates and FX dynamics.


1 Introduction

“All models are wrong, but some are useful”. George E. P. Box

Starting from the ’80s, the financial world has seen an incredible explosion of interest in derivatives. These products nowadays cover the entire GDP of the world several times, if considered collectively. They are flexible, useful to hedge risks and to share them across investors and institutions, but they can also be dangerous. The main issues are related to their valuation (e.g. the correct pricing) and their hedging. In this thesis we will show how to calibrate interest and exchange rate models (which describe their dynamics) using the prices of European swaptions and the implied volatilities of options on exchange rates. For FX data, when we mention implied volatilities we mean the implied volatilities that are obtained by inverting the Black Scholes formula from prices of options on foreign exchange rates. In order to speed up the calibration process we used artificial neural networks. Instead of calibrating the models on the current prices observed in the market at each time, the approach studied in this thesis and also in [Sabbioni] is based on the calibration of the weights of the ANN. In this way, the calibration of the ANN is very time consuming but it is executed only once, and the consequent calibration of model parameters is immediate, since the parameters are the output of the ANN. After supplying our already calibrated ANN with the current prices observed on the market, we see the model parameters as outputs of the NN and they are computed very quickly. What is currently happening in the industry is that simpler models are used (e.g., Vasicek for rate dynamics), without correlation between rates and FX, and sometimes without a method that systematically calibrates on valid correlations between dynamics. 
The choice of the derivatives that make up the dataset for calibration is arbitrary as well, but since European swaptions are very liquid and frequently exchanged products, it often happens that they are in the panel that is finally used for calibration.

We start now with a brief description of the world of derivatives and we will focus on interest rate swaps and swaptions. Then we show the pricing framework we are interested in. After that, we will deal with machine learning algorithms (in our case, artificial neural networks) and some optimization tasks that had to be performed during our work. Finally, we will see the main results obtained and we will end with some conclusions.

We now mention that this project is also based on the previous work of three other students ([Cella], [Donati], [Sabbioni]), who also presented their Master’s theses at Politecnico di Milano in collaboration with Banca IMI:

- Leonardo Cella, former student of Computer Science and Engineering, trained neural networks based on the results of the calibrator provided by Banca IMI and on a Vasicek short rate model. In this way, no pricing of swaps/swaptions was involved, but only a calibration of mean reversion speed and volatility.

- Andrea Donati, former student of Computer Science and Engineering, used analytical pricing of swaptions with Vasicek model and calibrated as well the parameters of the model with a black-box approach.

- Luca Sabbioni, former student of Mathematical Engineering, chose a more complex rates model, the G2++, and used a neural-network-based calibrator as an alternative to the calibration part, in order to provide the best parameters for the model.


with an extension to multi-currency calibration, the presence of correlations and an optimization of the parameters contained in the exchange rate models. We provide with this thesis the expression of the integrated variance of forward FX rates when short rates are distributed as G2++ processes and when the exchange rate is lognormal, with correlations among rates and between rates and FX. We will see that by non-arbitrage the drift of the spot FX rate is not a free parameter, so the FX dynamics introduce in our framework as new free parameters only the volatility of the FX rate and the correlations with the other Brownian motions. Our approach is based on the use of an artificial neural network that is trained on a panel of swaptions prices and FX implied volatilities. We checked such a method to see that the ANN, once calibrated, finds the correct parameter values entered by the tester. We studied the performance of such a calibration on synthetically generated data. In addition to that, when the number of neurons and the number of input data are high enough, we train on synthetically generated data without overfitting the test data. This thesis also offers the reader four ways to impose that the correlations between the Brownian motions produce systematically valid correlations between asset dynamics, which otherwise would not occur based on the only calibration process. In fact, when we execute the calibration without additional constraints on the fact that the overall covariance matrix has to be semipositive definite, very often we end up with correlations between Brownian motions that would not imply valid correlations between rates or with FX. On the contrary using our approach we can execute a calibration that provides as final parameters some correlations which are part of a coherent and consistent model. 
This issue was already present when the traders of the xVA desk presented the idea of the project, so we think that our calibration with ANN and our methods to obtain valid correlations are useful and mathematically coherent approaches to face the problem of calibration for value adjustment purposes. Our four techniques are presented in this order: (1) the additional term in the loss function, (2) the projection onto the space of semidefinite positive matrices, (3) the multiplication of a matrix by its transpose, and (4) the parameterization with angles. Overall, with this thesis, for the first time, we calibrate the parameters present in FX models, which have a big impact on the simulations done by IMI Bank when making the value adjustments for its portfolio of derivatives. In fact, the previous theses only concerned parameters in rates. Finally, as an innovative contribution, we also computed the partial derivatives of the integrated variance in a flexible framework (G2++ dynamics for short rates, lognormal diffusion for FX, with correlations between all the pairs of BMs) with respect to the parameters, and this can be used for backpropagation when we train the ANN.

We structure this thesis as follows. First, in Chapter 1, we present the most common derivatives and we recall some useful results of stochastic calculus. Then we continue with describing Gaussian models for short rates (with one and two factors) in Chapter 2. In the following chapter we show how to price IRS and swaptions, and why practitioners use a multi-curve framework. After that, in Chapter 4, we deal with neural networks, showing how they are used in general and in our problem. We continue with the description of the calibration process in the fifth chapter, and then we present the dataset and the preprocessing procedure in Chapter 6. We then explain in chapter 7 how the optimization process is structured and how we parallelized some tasks. At that point, we present in the eighth chapter four possible techniques that systematically let us have valid correlations. Then we show our main results and we mention what we can obtain with our algorithms. We finally conclude in chapter 9 with some remarks, comments on the applications in the trading desk of xVA, and we give a few hints for further research.


1.1 The world of derivatives

From a general point of view, derivatives are contracts whose value depends on an "underlying". In a sense, they are therefore composite. According to the type of underlying, we can classify them. The main underlying assets may be equity, interest rates, exchange rates, commodities, or credit risk. Other classifications can be made on the types of rights included in the contract (e.g., between issuer and purchaser): the most important ones are options, futures/forwards, and swaps. Futures and forwards are linear payoffs in the underlying, which means that they are an obligation to buy the underlying asset at a predetermined price at the agreed maturity in the future. Options are instead the right, but not the obligation, to do such an action. This means that, at maturity, they can be exercised or not, according to market conditions. In particular, they will be exercised by the purchaser only if it is profitable for her. The options are divided into call (i.e., the right to buy the underlier) and put (i.e., the right to sell the underlier), and also into vanilla (if the payoff depends on the only value of the underlier at maturity) or exotic (if the payoff of the option is a function of the entire trajectory followed by the asset value from the starting date until maturity, which in practice means that they are more complex functions of the underlier). Swaps are exchanges between the seller and the buyer, usually consisting in an exchange of a deterministic amount of money with a random one. The main swaps are CDS (on credit) and IRS (on interest rates). There are also the so-called "variance swaps" which are actually forwards on the realized variance, and also volatility and gamma swaps. In general, when in a portfolio, derivatives can be long or short. This distinction refers to the fact that they were respectively bought or sold. Derivatives can also be OTC or traded on standard markets.

1.2 Notions of stochastic calculus

In this subsection we just collect and recall (focusing only on the main properties, without going rigorously into all the technicalities) a few useful theorems and definitions that have been used in this thesis and appear in some proofs. Stochastic calculus is a type of calculus that is deeply used in quantitative finance, mainly for derivatives pricing and hedging. In some points it gives different results than the standard Newtonian calculus. This calculus relies on the use of Brownian motion to model stochastic processes.

Definition 1.1. Brownian Motion

A Brownian motion is a stochastic process $\{W_t\}_{t \ge 0}$ with three properties:

1) It starts at zero: $W_0 = 0$.

2) It has independent increments:

$$W_{t_4} - W_{t_3} \perp W_{t_2} - W_{t_1} \quad \text{for } t_4 \ge t_3 \ge t_2 \ge t_1 \ge 0.$$

3) It is normally distributed:

$$W_t \sim \mathcal{N}(0, t).$$

Definition 1.2. Stochastic integral


motion $(W_t)_{t \ge 0}$ in this way:

$$\int_a^b X_t \, dW_t = \lim_{\max_i |t_{i+1} - t_i| \to 0} \sum_{i=0}^{N-1} X_{t_i} \left( W_{t_{i+1}} - W_{t_i} \right),$$

where the sum is over a partition of $[a, b]$ (i.e. $a = t_0 < t_1 < \dots < t_i < t_{i+1} < \dots < t_N = b$) and the limit, as $\max_i |t_{i+1} - t_i| \to 0$, is taken in the quadratic-mean norm.

Definition 1.3. Itô process

We will say that $(X_t)_{t \ge 0}$ is an Itô process if there exist two random processes $(u_s)_s$ and $(v_s)_s$ such that it can be written

$$X_t(\omega) = X_0(\omega) + \int_0^t u_s(\omega) \, ds + \left( \int_0^t v_s \, dW_s \right)(\omega),$$

for a.e. $\omega \in \Omega$. With a shorter notation, the same statement is commonly written as

$$dX_t = u_t \, dt + v_t \, dW_t.$$

Theorem 1.1. Itô lemma

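Let $\{X_t\}_t$ be an Itô process $dX_t = \mu_t \, dt + \sigma_t \, dW_t$, and let $f$ be a function of class $C^{1,2}([0, T] \times \mathbb{R})$, $(t, x) \mapsto f(t, x)$. Then the stochastic process $Y_t = f(t, X_t)$ is an Itô process that satisfies the equation

$$df(t, X_t) = \left( \frac{\partial f}{\partial t}(t, X_t) + \mu_t \frac{\partial f}{\partial x}(t, X_t) + \frac{1}{2} \sigma_t^2 \frac{\partial^2 f}{\partial x^2}(t, X_t) \right) dt + \sigma_t \frac{\partial f}{\partial x}(t, X_t) \, dW_t.$$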
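As a quick illustration of how Itô's lemma departs from ordinary calculus, one can take the simplest case $X_t = W_t$ (so $\mu_t = 0$ and $\sigma_t = 1$) and $f(t, x) = x^2$. This choice of $f$ is ours and is made only for illustration:

```latex
d(W_t^2) = \Big( 0 + 0 \cdot 2 W_t + \tfrac{1}{2} \cdot 1 \cdot 2 \Big) \, dt + 2 W_t \, dW_t
         = dt + 2 W_t \, dW_t
\quad \Longrightarrow \quad
\int_0^T W_t \, dW_t = \tfrac{1}{2} \left( W_T^2 - T \right).
```

The extra term $-T/2$, absent from the classical rule $\int w \, dw = w^2/2$, comes entirely from the second-order derivative in the drift.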

Definition 1.4. Stochastic differential equation

A stochastic differential equation (SDE) is a differential equation that contains a stochastic process. It can be written as

$$dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dW_t, \qquad X|_{t=0} = X_0,$$

where $(W_t)_{t \ge 0}$ is a vector Brownian motion adapted to a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, and $X_0$ is $\mathcal{F}_0$-measurable.
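To make the definition more concrete, the following is a minimal sketch of the Euler-Maruyama discretisation of a scalar SDE. The function name `euler_maruyama`, the step count and the seed are our own illustrative choices, not something taken from the thesis code. With $\mu = 0$ and $\sigma = 1$ the scheme reduces to a sampled Brownian motion, so the terminal values should have mean close to 0 and variance close to $t$.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng):
    """Simulate one path of dX = mu(t, X) dt + sigma(t, X) dW on [0, t_end]."""
    dt = t_end / n_steps
    path = [x0]
    x, t = x0, 0.0
    for _ in range(n_steps):
        # Brownian increment W_{t+dt} - W_t ~ N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + mu(t, x) * dt + sigma(t, x) * dw
        t += dt
        path.append(x)
    return path


# Illustrative check: zero drift and unit volatility give a Brownian motion.
rng = random.Random(0)
terminal = [
    euler_maruyama(lambda t, x: 0.0, lambda t, x: 1.0, 0.0, 1.0, 50, rng)[-1]
    for _ in range(4000)
]
sample_mean = sum(terminal) / len(terminal)
sample_var = sum((w - sample_mean) ** 2 for w in terminal) / (len(terminal) - 1)
print(f"mean of W_1: {sample_mean:.3f}, variance of W_1: {sample_var:.3f}")
```

The same routine can be reused for any of the short rate dynamics discussed later by passing the corresponding drift and volatility functions.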

1.3 Short rate modeling

Rate derivatives are contracts whose value depends on the underlying, which in this case is an interest rate. The pricing of such contracts is model-dependent (see [Barucci, Marsala, Nencini, Sgarra]). Therefore, it is crucial to properly choose an appropriate model of the short rate dynamics. Short rates are the most basic element when modeling derivatives on rates: they are a theoretical tool that ideally corresponds to an instantaneous rate, i.e. applied to an infinitesimal amount of time. In practice, all rates on the markets are on discrete time, but sometimes it is very convenient and even common to use a continuous time framework in finance. The most common dynamics for a short rate is the following SDE

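$$dr(t) = \mu(r(t), t) \, dt + \sigma(r(t), t) \, dW(t),$$

where $W(t)$ is the value of a Brownian motion, and $\mu(r(t), t)$ and $\sigma(r(t), t)$ are respectively the so-called drift and volatility. The term in $dt$ is deterministic, while the one in $dW(t)$ is stochastic. This equation entails that the price of a ZCB is described by this PDE:

$$\frac{\partial p}{\partial t} + \frac{\sigma^2}{2} \frac{\partial^2 p}{\partial r^2} + q(r(t), t) \frac{\partial p}{\partial r} - r p = 0, \qquad \text{where } q(r(t), t) = \mu(r(t), t) - \lambda(r(t), t) \, \sigma(r(t), t).$$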

The presence of the market price of risk $\lambda$ is justified by the fact that, unlike other derivatives (such as European options on equity, which can be hedged by trading the underlying), in this case the short rate is not directly tradable. It can be shown that for $\mu$ and $\sigma^2$ affine in the short rate (i.e., $\mu(r(t), t) = \alpha(t) r(t) + \beta(t)$ and $\sigma^2(r(t), t) = \gamma(t) r(t) + \delta(t)$) the ZCB value takes a particular form:

$$p(r, t, T) = e^{A(t, T) - r B(t, T)}.$$

By substituting this expression of $p$ into the PDE above, we get the differential equations that govern $A(t, T)$ and $B(t, T)$:

$$\begin{cases} \dfrac{\partial B}{\partial t} + \alpha(t) B(t, T) - \dfrac{1}{2} \gamma(t) B^2(t, T) = -1, & B(T, T) = 0, \\[2mm] \dfrac{\partial A}{\partial t} = \beta(t) B(t, T) - \dfrac{1}{2} \delta(t) B^2(t, T), & A(T, T) = 0. \end{cases}$$
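The equation for $B$ can be checked numerically in a simple case. The sketch below assumes constant coefficients and a drift written directly under the pricing measure (i.e. $\lambda = 0$); both are our simplifying assumptions. For a Vasicek-type drift $k(\theta - r)$ with constant volatility we have $\alpha = -k$ and $\gamma = 0$, and the backward integration should reproduce $B(t, T) = (1 - e^{-k(T-t)})/k$. The helper `solve_B` is hypothetical and uses a plain fourth-order Runge-Kutta scheme.

```python
import math


def solve_B(alpha, gamma, t, T, n_steps=1000):
    """Integrate dB/dt = -1 - alpha*B + 0.5*gamma*B**2 backward from B(T, T) = 0.

    Constant alpha and gamma are assumed (illustrative simplification).
    """
    def rhs(b):
        return -1.0 - alpha * b + 0.5 * gamma * b * b

    h = (t - T) / n_steps  # negative step: we move from T back to t
    b = 0.0
    for _ in range(n_steps):
        k1 = rhs(b)
        k2 = rhs(b + 0.5 * h * k1)
        k3 = rhs(b + 0.5 * h * k2)
        k4 = rhs(b + h * k3)
        b += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return b


# Vasicek-type case: alpha = -k, gamma = 0 (illustrative parameter values).
k = 0.5
numerical = solve_B(alpha=-k, gamma=0.0, t=0.0, T=5.0)
closed_form = (1.0 - math.exp(-k * 5.0)) / k
print(f"numerical B(0, 5) = {numerical:.10f}, closed form = {closed_form:.10f}")
```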


2 Rates derivatives

In this thesis we will mainly focus on rates derivatives, in particular swaptions. They are options on IRS, for this reason we are going to introduce first the IRS and after that we will deal with swaptions.

2.1 Interest rate swap

Interest rate swaps (IRS) are contracts based on the exchange between two agents of two cash flows, one of which is fixed in advance (deterministic) while the other is a random variable (so its amount is not known in advance). These two amounts are called legs. The random one is called the floating leg, and the other one is the fixed leg of the swap. Normally the floating leg was simply the LIBOR or the EURIBOR (or the LIBOR plus a certain number of basis points). LIBOR and EURIBOR are two interbank rates (see Section 2.2 for more details) and are often chosen as underliers for IRS. Another necessary element to define an IRS is the notional: it is the amount of money to which the legs are applied when computing the cash flows.

Let us introduce some notation to be more precise. Let $t \le T_0$ be the current time, and $T_0, T_1, \dots, T_N$ a set of dates, where $T_1, \dots, T_N$ are the dates scheduled for exchanging the cash flows of the swap. If the dates are equispaced, then in particular $T_n = T_0 + n\delta$, where $\delta = T_i - T_{i-1}$. The coupon paid by the floating leg is $c_n = \delta \, L(T_{n-1}, T_n) \cdot \text{Notional}$, where $L(T_{n-1}, T_n)$ is the LIBOR rate determined at time $T_{n-1}$ for the time interval $[T_{n-1}, T_n]$.

We recall here that

$$L(T_{n-1}, T_n) = \frac{1 - ZCB(T_{n-1}, T_n)}{\delta \, ZCB(T_{n-1}, T_n)}.$$

The LIBOR rate is obtained as a simply compounded rate. If we denote by $p(t_{n-1}, t_n)$ the price at time $t_{n-1}$ of a ZCB maturing at $t_n$, then its payoff is 1 at time $t_n$. In general, the simply discounted value of a product that pays out $G$ after a time interval of length $\tau$ with constant rate $i$ is

$$P = \frac{G}{1 + \tau i},$$

so in our case we can write

$$p(t_{n-1}, t_n) = \frac{1}{1 + \delta L(t_{n-1}, t_n)}, \qquad \text{which implies} \qquad L(t_{n-1}, t_n) = \frac{\frac{1}{p(t_{n-1}, t_n)} - 1}{\delta},$$

and from there we obtain the LIBOR rate

$$L(T_{n-1}, T_n) = \frac{1 - ZCB(T_{n-1}, T_n)}{\delta \, ZCB(T_{n-1}, T_n)}.$$

The fixed leg is simply $\text{Notional} \cdot \delta r$, where $r$ is the swap rate. The cash flow at time $T_n$ is therefore $\text{Notional} \cdot \delta (L(T_{n-1}, T_n) - r)$, and when we discount it to the current time $t$ it becomes

$$\text{Notional} \left[ ZCB(t, T_{n-1}) - (1 + \delta r) ZCB(t, T_n) \right].$$

The value of the contract at time $t$ is the sum of all the discounted cash flows it will provide:

$$V(t) = \text{Notional} \sum_{n=1}^{N} \left[ ZCB(t, T_{n-1}) - (1 + \delta r) ZCB(t, T_n) \right].$$

There is a particular value of the rate that makes the value of the swap zero at time $t$: it is called the par swap rate. It is computed as

$$\bar{r} = \frac{ZCB(t, T_0) - ZCB(t, T_N)}{\delta \sum_{n=1}^{N} ZCB(t, T_n)}.$$

What is remarkable, and deserves attention, is that the swap rate (which is a kind of fair strike for an interest rate swap, like the fair strike of a variance swap, etc.) only depends on ZCB prices. Once we fix the dynamics of short rates in our framework, we can get the prices of ZCBs and from them the swap rates. At $t = 0$ we can rewrite the par swap rate as

$$\bar{r} = \sum_{n=1}^{N} \frac{ZCB(0, T_n)}{\sum_{m=1}^{N} ZCB(0, T_m)} \, L(0, T_{n-1}, T_n),$$

which underlines the fact that the swap rate is a weighted average of the forward LIBOR rates at different expiries.
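The two expressions of the par swap rate can be compared on a toy discount curve. The sketch below is only illustrative: the function names, the flat 2% continuously compounded curve and the schedule ($T_0 = 1$, $\delta = 0.5$, $N = 4$) are our assumptions. The forward LIBOR is taken as the simply compounded forward rate $(ZCB(t, T_{n-1})/ZCB(t, T_n) - 1)/\delta$.

```python
import math


def par_swap_rate(zcb, delta):
    """zcb[n] = ZCB(t, T_n) for n = 0..N; returns the par swap rate."""
    annuity = delta * sum(zcb[1:])
    return (zcb[0] - zcb[-1]) / annuity


def swap_value(zcb, delta, rate, notional=1.0):
    """Sum of the discounted cash flows of the swap at fixed rate `rate`."""
    return notional * sum(
        zcb[n - 1] - (1.0 + delta * rate) * zcb[n] for n in range(1, len(zcb))
    )


def forward_libor(zcb, delta, n):
    """Simply compounded forward rate for [T_{n-1}, T_n] seen at time t."""
    return (zcb[n - 1] / zcb[n] - 1.0) / delta


# Illustrative flat curve: 2% continuously compounded, T_n = 1 + 0.5 n.
delta = 0.5
zcb = [math.exp(-0.02 * (1.0 + delta * n)) for n in range(5)]

r_bar = par_swap_rate(zcb, delta)
total = sum(zcb[1:])
weighted = sum(zcb[n] / total * forward_libor(zcb, delta, n) for n in range(1, 5))

print(f"par swap rate:             {r_bar:.8f}")
print(f"weighted forward LIBORs:   {weighted:.8f}")
print(f"swap value at par rate:    {swap_value(zcb, delta, r_bar):.2e}")
```

Both computations should agree up to rounding, and the swap value at the par rate should vanish.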

2.2 A new reform of benchmark rates

In this section we discuss a few changes that will affect financial markets because of the reform of some benchmark rates ([Reuters. Factbox: The global benchmarks replacing Libor]). EONIA (Euro OverNight Index Average) will be replaced by the so-called ESTER, and LIBOR (London InterBank Offered Rate) by new overnight rates that depend on the currency.

The maturities for EURIBOR are 1-3-6-12 months. From a practical point of view, EURIBOR is considered as a benchmark rate for transactions in EUR, while LIBOR is dollar-based and makes reference to transactions in various currencies (USD, EUR, CHF, GBP, JPY). EURIBOR will not be discontinued after this reform, but its computation will be slightly modified: it will be computed as an average over a larger panel of transactions, involving a larger number of institutions, and in practice the change with respect to the current definition is estimated to be of the order of a few basis points (1-5 bps).

The LIBOR rate was computed in this way (simplifying the procedure a bit): a panel of banks (8 to 16) was chosen daily and they were asked to communicate to the BBA (British Bankers' Association), through the intermediation of Thomson Reuters, the respective rates at which they would accept to borrow money on the interbank market. The extreme rates (lowest and highest), up to a maximum of four, were discarded, and the average was finally taken over the eight central rates only. A critical point is the fact that LIBOR is not an average of market-observed rates, but only of values communicated by banks after an internal estimation. In the past this procedure has led to scandals, mainly consisting of manipulations to influence the LIBOR in order (i) to make profits or (ii) to hide a critical internal situation. As already mentioned, the unsecured LIBOR will be replaced for USD by SOFR, which is secured with U.S. Treasuries as collateral (transactions with collateral are called repurchase agreements, and this is a niche example with overnight secured rates). The clearing by LCH of SOFR-based IRS started in July 2018.

Coming back to the overnight rate, we now mention the differences between the current EONIA and the future ESTER rate. EONIA is currently computed on a panel of data provided by 28 banks, it concerns only the interbank market and it is published by 7pm of the same working day (T convention). ESTER instead concerns the wholesale market; it is computed on data from 52 banks (that satisfy the MMSR criteria), only on transactions involving more than 1 million EUR of notional, and it is published with convention T+1 by 8am of the next day.

Dealing with the other main currencies, we mention that there will be respectively

• the Secured Overnight Financing Rate (SOFR) for USD
• TONAR for JPY
• SARON for CHF
• SONIA for GBP

2.3 Swaption

We now introduce swaptions, which are options on IRS. In these contracts, the buyer is given the possibility to enter at time $T$ (the so-called expiry of the swaption) into an IRS with maturity $T_N$ and payments at $T_1, \dots, T_N$ at a predetermined rate $r$, which is the strike of the option. The difference in time between the date of the last payment and the expiry is the so-called tenor (which corresponds to $T_N - T$). Since a swaption is an option on a rate swap, we start from the value of a swap. As we said, it depends only on the prices of ZCBs, and in particular the value at time $t$ of a swap with rate $r$ is

$$\text{Swap}(t) = \text{Notional} \sum_{n=1}^{N} \left[ p(t, T_{n-1}) - (1 + \delta r) p(t, T_n) \right].$$

In order to simplify, we take 1 as notional. At time $T$ (with $T_0 = T$) the value of such a swap will be

$$\begin{aligned} \text{Swap}(T) &= \sum_{n=1}^{N} \left[ p(T, T_{n-1}) - (1 + \delta r) p(T, T_n) \right] \\ &= \sum_{n=1}^{N} p(T, T_{n-1}) - \sum_{n=1}^{N} p(T, T_n) - \delta r \sum_{n=1}^{N} p(T, T_n) \\ &= p(T, T) - p(T, T_N) - \delta r \sum_{n=1}^{N} p(T, T_n). \end{aligned}$$

We also know, by definition of ZCB, that $p(T, T) = 1$ because it is the payoff of a zero coupon bond at maturity, so we have

$$\text{Swap}(T) = 1 - p(T, T_N) - \delta r \sum_{n=1}^{N} p(T, T_n).$$

The value of the swaption at expiry is the positive part of this quantity, because options are derivatives that give rights but not obligations. In conclusion the price of the swaption with generic notional is

$$\text{Swaption}(t) = \text{Notional} \; E_t\!\left[ \text{Swap}(T)_+ \right] = \text{Notional} \; E_t\!\left[ \left( 1 - p(T, T_N) - \delta r \sum_{n=1}^{N} p(T, T_n) \right)_+ \right].$$

In our thesis we will focus on At The Money (ATM) European swaptions, in which the strike is chosen equal to the par swap rate (i.e. the rate that makes the value of the IRS equal to zero at inception).
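The payoff inside the expectation can be written as a short function. This is a sketch under our own naming: `payer_swaption_payoff` is hypothetical, and the ZCB prices at expiry are made-up numbers used only to show that the payoff is positive when the strike is below the par swap rate realised at expiry and zero otherwise.

```python
def payer_swaption_payoff(zcb_at_expiry, delta, strike, notional=1.0):
    """Positive part of Swap(T) = 1 - p(T, T_N) - delta * strike * sum_n p(T, T_n).

    zcb_at_expiry[n-1] = p(T, T_n) for n = 1..N.
    """
    swap = 1.0 - zcb_at_expiry[-1] - delta * strike * sum(zcb_at_expiry)
    return notional * max(swap, 0.0)


# Illustrative ZCB prices observed at expiry (not market data).
zcb_T = [0.99, 0.98, 0.97, 0.96]
delta = 0.5
par_at_expiry = (1.0 - zcb_T[-1]) / (delta * sum(zcb_T))

for strike in (par_at_expiry - 0.01, par_at_expiry + 0.01):
    payoff = payer_swaption_payoff(zcb_T, delta, strike)
    print(f"strike = {strike:.4f}, payoff = {payoff:.4f}")
```

Pricing the swaption then amounts to taking the expectation of this payoff over the distribution of the ZCB prices at expiry implied by the chosen short rate model.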


3 The pricing framework

In this chapter we will present some models for short rates, and also some formulas to price ZCBs, swaps and swaptions in the respective framework. We will also describe the model for FX dynamics that we use in this thesis, and we will conclude the chapter by giving more information about the reasons behind the choice of a multi-curve framework for forwarding and discounting.

3.1 The Vasicek model

Now we see in detail how to work with the model made by Vasicek, which has a great importance in financial engineering thanks to its analytical tractability and its mean-reversion property. Moreover, the fact that it allows negative rates is not considered anymore a drawback as it was in the past. In fact, after the crisis, also due to the massive impact of the action of the ECB known as quantitative easing, we have sometimes seen in the markets even negative rates. In Vasicek the short rate dynamics is modeled as

$$dr_t = k(\theta - r_t) \, dt + \sigma \, dW_t.$$

Each parameter has a specific meaning. The parameter $k$ is usually called mean-reversion speed (MRS): it indicates how fast the process is pulled back towards its asymptotic value. The higher the MRS, the quicker the process gets close to the asymptotic value. The parameter $\theta$ represents the asymptotic value of the short rate, i.e. the value towards which the short rate constantly tends. The parameter $\sigma$ represents the volatility of the process. In case of zero volatility, the short rate is deterministic and it converges exponentially towards the asymptotic rate with rate $k$ (in practice, it becomes the solution of an ODE).

In general, if the dynamics were

$$dr_t = a(t, r_t) \, dt + b(t, r_t) \, dW_t,$$

then by Itô's lemma we would get, for a function $f = f(t, r_t)$, the following:

$$df = \left( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial r_t} a(t, r_t) + \frac{1}{2} b^2(t, r_t) \frac{\partial^2 f}{\partial r_t^2} \right) dt + b(t, r_t) \frac{\partial f}{\partial r_t} \, dW_t.$$

Now, coming back to the Vasicek model, we consider the particular function $f(t, r_t) = e^{kt} r_t$, so we get

$$df = \left( k e^{kt} r_t + e^{kt} k(\theta - r_t) \right) dt + e^{kt} \sigma \, dW_t = k\theta e^{kt} \, dt + e^{kt} \sigma \, dW_t.$$

Integrating between the times $s$ and $t$ leads to

$$e^{kt} r_t = e^{ks} r_s + k\theta \int_s^t e^{ku} \, du + \sigma \int_s^t e^{ku} \, dW_u.$$

So in this framework we know the solution of the previous SDE:

$$r_t = r_s e^{-k(t-s)} + k\theta \int_s^t e^{-k(t-u)} \, du + \sigma \int_s^t e^{-k(t-u)} \, dW_u.$$

The short rate has a conditional Gaussian distribution, with conditional mean

$$E[r_t \mid \mathcal{F}_s] = r_s e^{-k(t-s)} + \theta \left( 1 - e^{-k(t-s)} \right),$$

and conditional variance

$$\begin{aligned} V[r_t \mid \mathcal{F}_s] &= E\left[ \sigma^2 \left( \int_s^t e^{-k(t-u)} \, dW_u \right)^2 \,\middle|\, \mathcal{F}_s \right] = \sigma^2 \int_s^t e^{-2k(t-u)} \, du = \frac{\sigma^2 e^{-2kt}}{2k} \int_s^t 2k e^{2ku} \, du \\ &= \frac{\sigma^2 e^{-2kt}}{2k} \left( e^{2kt} - e^{2ks} \right) = \frac{\sigma^2}{2k} \left( 1 - e^{-2k(t-s)} \right). \end{aligned}$$
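Because the conditional law is Gaussian with known mean and variance, the Vasicek short rate can be sampled exactly over any time step, without discretisation error. The sketch below uses our own function names and illustrative parameters ($\theta = 0.03$, $\sigma = 0.01$, $r_0 = 0.01$, and an assumed $k = 0.5$); it also checks that composing two steps gives the same moments as a single longer step.

```python
import math
import random


def vasicek_mean(r_s, k, theta, dt):
    """E[r_t | F_s] with dt = t - s."""
    return r_s * math.exp(-k * dt) + theta * (1.0 - math.exp(-k * dt))


def vasicek_var(k, sigma, dt):
    """V[r_t | F_s] with dt = t - s."""
    return sigma ** 2 / (2.0 * k) * (1.0 - math.exp(-2.0 * k * dt))


def sample_vasicek(r_s, k, theta, sigma, dt, rng):
    """Exact draw of r_t given r_s, using the Gaussian conditional law."""
    mean = vasicek_mean(r_s, k, theta, dt)
    std = math.sqrt(vasicek_var(k, sigma, dt))
    return rng.gauss(mean, std)


# Illustrative parameters.
k, theta, sigma, r0 = 0.5, 0.03, 0.01, 0.01

# One step of length 3 versus a step of length 1 followed by a step of length 2.
mean_direct = vasicek_mean(r0, k, theta, 3.0)
mean_two_steps = vasicek_mean(vasicek_mean(r0, k, theta, 1.0), k, theta, 2.0)
var_direct = vasicek_var(k, sigma, 3.0)
var_two_steps = vasicek_var(k, sigma, 2.0) + math.exp(-2.0 * k * 2.0) * vasicek_var(k, sigma, 1.0)

print(f"mean: {mean_direct:.8f} vs {mean_two_steps:.8f}")
print(f"var:  {var_direct:.3e} vs {var_two_steps:.3e}")
print(f"one exact draw of r_3: {sample_vasicek(r0, k, theta, sigma, 3.0, random.Random(0)):.6f}")
```

As the horizon grows, the mean tends to $\theta$ and the variance to $\sigma^2 / (2k)$, which is the stationary law of the process.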

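The next step is to find the expression of the prices of zero coupon bonds (see also [Mamon] for different approaches):

$$P(t, T) = E\left[ e^{-\int_t^T r_s \, ds} \,\middle|\, \mathcal{F}_t \right].$$

First, we start with a derivative:

$$\frac{\partial P(t, T)}{\partial r_t} = \frac{\partial}{\partial r_t} E\left[ e^{-\int_t^T r_s \, ds} \,\middle|\, \mathcal{F}_t \right] = E\left[ e^{-\int_t^T r_s \, ds} \, \frac{\partial}{\partial r_t} \left( -\int_t^T r_s \, ds \right) \,\middle|\, \mathcal{F}_t \right].$$

Since we know that

$$r_s = r_t e^{-k(s-t)} + k\theta \int_t^s e^{-k(s-u)} \, du + \sigma \int_t^s e^{-k(s-u)} \, dW_u,$$

this implies

$$\frac{\partial r_s}{\partial r_t} = e^{-k(s-t)}.$$

By injecting this in the previous formula we get

$$\frac{\partial P(t, T)}{\partial r_t} = E\left[ e^{-\int_t^T r_s \, ds} \, (-1) \int_t^T e^{-k(s-t)} \, ds \,\middle|\, \mathcal{F}_t \right] = -\left( \int_t^T e^{-k(s-t)} \, ds \right) E\left[ e^{-\int_t^T r_s \, ds} \,\middle|\, \mathcal{F}_t \right] = -A(t, T) P(t, T),$$

where $A(t, T) := \int_t^T e^{-k(s-t)} \, ds$. So far we got

$$\frac{\partial P(t, T)}{\partial r_t} = -A(t, T) P(t, T), \qquad \text{which means} \qquad P(t, T) = h(t, T) e^{-A(t, T) r_t},$$

for some function $h$ that does not depend on $r_t$. By posing

$$f(t, T, r_t) = e^{-\int_0^t r_s \, ds} \, E\left[ e^{-\int_t^T r_s \, ds} \,\middle|\, \mathcal{F}_t \right],$$

then, since $f$ is a martingale, by Itô's lemma we first deduce that

$$df = \left( \frac{\partial f}{\partial t} + \frac{\partial f}{\partial r_t} k(\theta - r_t) + \frac{1}{2} \frac{\partial^2 f}{\partial r_t^2} \sigma^2 \right) dt + \frac{\partial f}{\partial r_t} \sigma \, dW_t,$$

and then we know that for all martingales the coefficient in $dt$ has to be zero. We know that $f(t, T, r_t) = e^{-\int_0^t r_s \, ds} P(t, T)$, so we can compute its partial derivatives:

$$\begin{aligned} \frac{\partial f}{\partial t} &= e^{-\int_0^t r_s \, ds} (-r_t) P(t, T) + e^{-\int_0^t r_s \, ds} \frac{\partial P(t, T)}{\partial t}, \\ \frac{\partial f}{\partial r_t} &= e^{-\int_0^t r_s \, ds} \frac{\partial P(t, T)}{\partial r_t}, \\ \frac{\partial^2 f}{\partial r_t^2} &= e^{-\int_0^t r_s \, ds} \frac{\partial^2 P(t, T)}{\partial r_t^2}. \end{aligned}$$

By combining all these partial derivatives and injecting them in the null coefficient of the martingale, we obtain the following PDE:

$$\frac{\partial P(t, T)}{\partial t} - r_t P(t, T) + k(\theta - r_t) \frac{\partial P(t, T)}{\partial r_t} + \frac{1}{2} \sigma^2 \frac{\partial^2 P(t, T)}{\partial r_t^2} = 0,$$

which has to be solved for $P(t, T)$. We already know that $P(t, T) = h(t, T) e^{-A(t, T) r_t}$, so we get

$$\begin{aligned} \frac{\partial P(t, T)}{\partial t} &= (\partial_t h) e^{-A(t, T) r_t} + h(t, T) e^{-A(t, T) r_t} (-\partial_t A) r_t, \\ \frac{\partial P(t, T)}{\partial r_t} &= h(t, T) e^{-A(t, T) r_t} (-A(t, T)), \\ \frac{\partial^2 P(t, T)}{\partial r_t^2} &= h(t, T) A^2(t, T) e^{-A(t, T) r_t}. \end{aligned}$$

By injecting these results in the PDE we get

$$(\partial_t h) e^{-A(t, T) r_t} + h(t, T) e^{-A(t, T) r_t} (-\partial_t A) r_t - r_t P(t, T) + k(\theta - r_t) h(t, T) e^{-A(t, T) r_t} (-A(t, T)) + \frac{1}{2} \sigma^2 h(t, T) A^2(t, T) e^{-A(t, T) r_t} = 0.$$

Rearranging the terms, and using $P(t, T) = h(t, T) e^{-A(t, T) r_t}$ to divide by the exponential, we arrive at

$$\partial_t h - (\partial_t A) r_t h - r_t h - k(\theta - r_t) h A + \frac{1}{2} \sigma^2 h A^2 = 0.$$

Since $h$ does not depend on $r_t$, this must hold for every value of $r_t$; in particular, for $r_t = 0$,

$$\partial_t h - k\theta h A + \frac{1}{2} \sigma^2 h A^2 = 0.$$

Since $P(T, T) = 1$, because it is the payoff of a ZCB at maturity, and $A(T, T) = 0$, we also know that $h(T, T) = 1$. So the function $h(t, T)$, which we already knew does not depend on $r_t$, is the solution of the following ODE:

$$\begin{cases} \dfrac{\partial h(t, T)}{\partial t} + \left( \dfrac{1}{2} \sigma^2 A^2(t, T) - k\theta A(t, T) \right) h(t, T) = 0, \\[2mm] h(T, T) = 1. \end{cases}$$

By solving this ODE, we get first the function $h(t, T)$ and then $P(t, T) = e^{\log(h(t, T)) - A(t, T) r_t}$. Now we are interested in the pricing of an IRS, which corresponds to finding the fair strike. We inject the expression that holds for the Vasicek model inside the formula for the IRS rate:

$$\bar{r} = \frac{P(t, T_0) - P(t, T_N)}{\delta \sum_{n=1}^{N} P(t, T_n)} = \frac{h(t, T_0) e^{-A(t, T_0) r_t} - h(t, T_N) e^{-A(t, T_N) r_t}}{\delta \sum_{n=1}^{N} h(t, T_n) e^{-A(t, T_n) r_t}},$$

where

$$h(t, T) = \exp\left( \frac{\left( \frac{1}{k} \left[ 1 - e^{-k(T-t)} \right] - (T - t) \right) \left( k^2 \theta - \sigma^2 / 2 \right)}{k^2} - \frac{\frac{\sigma^2}{k^2} \left[ 1 - e^{-k(T-t)} \right]^2}{4k} \right) \qquad \text{and} \qquad A(t, T) = \frac{1}{k} \left[ 1 - e^{-k(T-t)} \right].$$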
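The closed forms for $A$ and $h$ translate directly into code. The following sketch uses hypothetical function names and illustrative parameter values, with $\tau = T - t$. Two simple sanity checks are included: the price at zero time to maturity is 1, and with $\sigma = 0$ and $r_t = \theta$ the short rate stays constant, so the price must be $e^{-\theta \tau}$.

```python
import math


def vasicek_A(k, tau):
    """A(t, T) = (1 - exp(-k (T - t))) / k, with tau = T - t."""
    return (1.0 - math.exp(-k * tau)) / k


def vasicek_h(k, theta, sigma, tau):
    """h(t, T) from the closed form above, with tau = T - t."""
    a = vasicek_A(k, tau)
    return math.exp(
        (a - tau) * (k * k * theta - 0.5 * sigma * sigma) / (k * k)
        - sigma * sigma * a * a / (4.0 * k)
    )


def vasicek_zcb(r_t, k, theta, sigma, tau):
    """P(t, T) = h(t, T) * exp(-A(t, T) * r_t)."""
    return vasicek_h(k, theta, sigma, tau) * math.exp(-vasicek_A(k, tau) * r_t)


# Illustrative parameters: initial rate 2%, MRS 0.5, asymptotic rate 3%, sigma 5%.
for tau in (0.0, 1.0, 5.0, 10.0):
    print(f"tau = {tau:4.1f}  P = {vasicek_zcb(0.02, 0.5, 0.03, 0.05, tau):.6f}")

# Deterministic limit: sigma = 0 and r_t = theta give P = exp(-theta * tau).
print(vasicek_zcb(0.03, 0.5, 0.03, 0.0, 4.0), math.exp(-0.03 * 4.0))
```

Feeding these prices into the par swap rate formula gives the Vasicek swap rate for any schedule of payment dates.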

Figure 1: Impact of the mean reversion speed, focusing on one sample at a time. The parameters for this simulation were θ = 0.03 and σ = 0.01. The initial short rate was r0= 0.01 and we considered


Figure 2: Impact of mean reversion speed on the term structure of ZCB prices in Vasicek. We used as parameters an initial rate = 2% and an asymptotic rate = 3%.

Figure 3: Impact of the asymptotic rate on the term structure of ZCB prices in Vasicek. We used as parameters an initial rate = 2% , MRS = 0.5 and σ=5%.


Figure 4: Impact of volatility on the term structure of ZCB prices in Vasicek. We used as parameters an initial rate = 2% , MRS = 0.5 and an asymptotic rate = 3%.

Figure 5: ZCB(t, T ) as a function of t in Vasicek, looking only at one sample. We used as parameters an initial rate = 2%, MRS = 0.1, an asymptotic rate = 3%, a maturity = 1Y and σ = 0.05.


3.2 G2++ with a single short curve

Here we describe a short rate model that contains two Gaussian factors instead of only one (as in Vasicek). The short rate is decomposed into the sum of two Vasicek processes plus a deterministic shift. The sum of two Vasicek processes is called a G2 process; the one we consider is the so-called G2++, in which a deterministic shift is added to match the initial discount and forward curves. The parameters are two MRSs and two volatilities, with no asymptotic values because all that information is included in the shift. This framework is more powerful and complete than a simple Vasicek model: not only can the two factors have different MRSs, but the two Brownian motions are also correlated (with correlation $\rho$), which makes the distribution more complex and flexible. The short rate dynamics is $r(t) = x(t) + y(t) + \varphi(t)$, where

$$\begin{cases} dx(t) = -a\,x(t)\,dt + \sigma\,dW_1(t) \\ dy(t) = -b\,y(t)\,dt + \eta\,dW_2(t). \end{cases}$$

The function φ(t) is just a deterministic shift.

Theorem 3.1. Let $I(t,T)$ be defined as $I(t,T) := \int_t^T (x(u) + y(u))\,du$. Then it has a Gaussian distribution $I(t,T) \sim \mathcal{N}(M(t,T), V(t,T))$ with mean and variance

$$M(t,T) = \frac{1-e^{-a(T-t)}}{a}\,x(t) + \frac{1-e^{-b(T-t)}}{b}\,y(t)$$

$$V(t,T) = \frac{\sigma^2}{a^2}\left[T-t+\frac{2}{a}e^{-a(T-t)}-\frac{1}{2a}e^{-2a(T-t)}-\frac{3}{2a}\right] + \frac{\eta^2}{b^2}\left[T-t+\frac{2}{b}e^{-b(T-t)}-\frac{1}{2b}e^{-2b(T-t)}-\frac{3}{2b}\right] + \frac{2\rho\sigma\eta}{ab}\left[T-t+\frac{e^{-a(T-t)}-1}{a}+\frac{e^{-b(T-t)}-1}{b}-\frac{e^{-(a+b)(T-t)}-1}{a+b}\right].$$

Proof. We can compute the two terms in the sum separately; by symmetry they have the same structure. Integrating by parts,

$$\int_t^T x(u)\,du = u\,x(u)\Big|_t^T - \int_t^T u\,dx(u) = T x(T) - t x(t) - \int_t^T u\,dx(u) =$$

$$= T x(T) - T x(t) + T x(t) - t x(t) - \int_t^T u\,dx(u) = (T-t)\,x(t) + \int_t^T (T-u)\,dx(u).$$

We substitute the dynamics of the stochastic process $x(t)$, that is $dx(t) = -a\,x(t)\,dt + \sigma\,dW_1(t)$, so that the term with the integral becomes

$$\int_t^T (T-u)\left(-a\,x(u)\,du + \sigma\,dW_1(u)\right) = -a\int_t^T (T-u)\,x(u)\,du + \sigma\int_t^T (T-u)\,dW_1(u).$$

Since

$$x(u) = x(t)\,e^{-a(u-t)} + \sigma\int_t^u e^{-a(u-s)}\,dW_1(s),$$

the first integral on the right-hand side, multiplied by $-a$, is equal to

$$-a\left(\int_t^T (T-u)\,x(t)\,e^{-a(u-t)}\,du + \sigma\int_t^T (T-u)\int_t^u e^{-a(u-s)}\,dW_1(s)\,du\right).$$

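Now we just show how to compute the first term:

$$-a\int_t^T (T-u)\,x(t)\,e^{-a(u-t)}\,du = -a\,x(t)\,e^{at}\left(T\int_t^T e^{-au}\,du - \int_t^T u\,e^{-au}\,du\right) = -a\,x(t)\,e^{at}\left(\frac{T}{-a}\left(e^{-aT}-e^{-at}\right) - F\right),$$

where

$$F = \int_t^T u\,d\!\left(\frac{e^{-au}}{-a}\right) = \frac{u\,e^{-au}}{-a}\Big|_t^T + \frac{1}{a}\int_t^T e^{-au}\,du = \frac{t\,e^{-at} - T\,e^{-aT}}{a} - \frac{1}{a^2}\left(e^{-aT}-e^{-at}\right).$$

Therefore what we were computing is equal to

$$T x(t)\left(e^{-a(T-t)}-1\right) + e^{at}x(t)\left[t\,e^{-at} - T\,e^{-aT} - \frac{1}{a}\left(e^{-aT}-e^{-at}\right)\right] =$$

$$= -x(t)\,T\left(1-e^{-a(T-t)}\right) + t\,x(t) - T\,x(t)\,e^{-a(T-t)} - \frac{x(t)}{a}\left(e^{-a(T-t)}-1\right) =$$

$$= -x(t)(T-t) + \frac{x(t)}{a}\left(1-e^{-a(T-t)}\right).$$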

This was the first term; the second term requires slightly longer computations, but in the end we reach these formulas:

$$\int_t^T x(u)\,du = \frac{1-e^{-a(T-t)}}{a}\,x(t) + \frac{\sigma}{a}\int_t^T \left[1-e^{-a(T-u)}\right]dW_1(u)$$

$$\int_t^T y(u)\,du = \frac{1-e^{-b(T-t)}}{b}\,y(t) + \frac{\eta}{b}\int_t^T \left[1-e^{-b(T-u)}\right]dW_2(u).$$

As a result, in this framework we know that $I(t,T) \sim \mathcal{N}(M(t,T), V(t,T))$, where

$$M(t,T) = \frac{1-e^{-a(T-t)}}{a}\,x(t) + \frac{1-e^{-b(T-t)}}{b}\,y(t)$$

and

$$V(t,T) = \frac{\sigma^2}{a^2}\left[T-t+\frac{2}{a}e^{-a(T-t)}-\frac{1}{2a}e^{-2a(T-t)}-\frac{3}{2a}\right] + \frac{\eta^2}{b^2}\left[T-t+\frac{2}{b}e^{-b(T-t)}-\frac{1}{2b}e^{-2b(T-t)}-\frac{3}{2b}\right] + \frac{2\rho\sigma\eta}{ab}\left[T-t+\frac{e^{-a(T-t)}-1}{a}+\frac{e^{-b(T-t)}-1}{b}-\frac{e^{-(a+b)(T-t)}-1}{a+b}\right].$$

We present now a theorem about the pricing of ZCB.

Theorem 3.2. Consider a G2++ model for the short rate. Then the price $P(t,T)$ of a ZCB is equal to

$$P(t,T) = e^{-\int_t^T \varphi(u)\,du - M(t,T) + \frac{1}{2}V(t,T)}.$$

Proof.

$$P(t,T) = E\left[e^{-\int_t^T r(s)\,ds}\,\Big|\,\mathcal{F}_t\right] = E\left[e^{-\int_t^T (x(s)+y(s)+\varphi(s))\,ds}\,\Big|\,\mathcal{F}_t\right] = E\left[e^{-\int_t^T \varphi(s)\,ds - I(t,T)}\,\Big|\,\mathcal{F}_t\right].$$

We have the exponential of a Gaussian variable, so it is a lognormal random variable. Knowing the characteristic function of a Gaussian random variable $X \sim \mathcal{N}(\mu, \sigma^2)$,

$$\varphi_X(u) = E\left[e^{iuX}\right] = e^{i\mu u - \frac{1}{2}\sigma^2 u^2},$$

we get that, in our case,

$$E\left[e^{-I(t,T)}\right] = \varphi_I(i) = e^{i^2 M - \frac{1}{2}i^2 V} = e^{-M + \frac{1}{2}V}$$

and this leads us to the conclusion.
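The two theorems above translate directly into code. The following Python sketch evaluates $M(t,T)$, $V(t,T)$ and the resulting ZCB price; the function names and the `shift_integral` argument (standing for $\int_t^T \varphi(u)\,du$, set to zero by default) are hypothetical choices made for this example only.

```python
import math


def B(z, t, T):
    """B(z, t, T) = (1 - exp(-z (T - t))) / z."""
    return (1.0 - math.exp(-z * (T - t))) / z


def g2_mean(x_t, y_t, a, b, t, T):
    """Mean M(t, T) of I(t, T) given the current factors x(t), y(t)."""
    return B(a, t, T) * x_t + B(b, t, T) * y_t


def g2_variance(a, b, sigma, eta, rho, t, T):
    """Variance V(t, T) of I(t, T), as in Theorem 3.1."""
    tau = T - t
    ea, eb, eab = math.exp(-a * tau), math.exp(-b * tau), math.exp(-(a + b) * tau)
    vx = sigma ** 2 / a ** 2 * (tau + 2.0 / a * ea - ea ** 2 / (2.0 * a) - 3.0 / (2.0 * a))
    vy = eta ** 2 / b ** 2 * (tau + 2.0 / b * eb - eb ** 2 / (2.0 * b) - 3.0 / (2.0 * b))
    vxy = 2.0 * rho * sigma * eta / (a * b) * (
        tau + (ea - 1.0) / a + (eb - 1.0) / b - (eab - 1.0) / (a + b))
    return vx + vy + vxy


def g2_zcb(x_t, y_t, a, b, sigma, eta, rho, t, T, shift_integral=0.0):
    """P(t, T) = exp(-shift_integral - M(t, T) + V(t, T) / 2), as in Theorem 3.2.

    shift_integral is the integral of the deterministic shift over [t, T]
    (an input assumed to be known here).
    """
    m = g2_mean(x_t, y_t, a, b, t, T)
    v = g2_variance(a, b, sigma, eta, rho, t, T)
    return math.exp(-shift_integral - m + 0.5 * v)
```

Note that the variance vanishes for $T = t$, so the price correctly reduces to one at maturity.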

Now we proceed by computing the swap rate within a G2++ model:

$$\bar r = \frac{P(t,T_0) - P(t,T_N)}{\delta\sum_{n=1}^{N} P(t,T_n)} = \frac{\Phi(t,T_0)\,e^{-M(t,T_0)+\frac{1}{2}V(t,T_0)} - \Phi(t,T_N)\,e^{-M(t,T_N)+\frac{1}{2}V(t,T_N)}}{\delta\sum_{n=1}^{N}\Phi(t,T_n)\,e^{-M(t,T_n)+\frac{1}{2}V(t,T_n)}},$$

where we put $\Phi(t,T) = e^{-\int_t^T \varphi(u)\,du}$.

3.3 Swaptions pricing in a multicurve framework and G2++

We have introduced up to now the building blocks which are necessary to understand the final formula for swaptions pricing in a G2++ framework. For more details on this, and also for the proof, see [Sabbioni]. IRS values can be written as

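$$\text{Swap}(t,T_\alpha,K) = \text{Notional}\left[\sum_{i=\alpha+1}^{\beta}\left(\frac{\Phi_d(S_{i-1},S_i)}{\Phi_F(S_{i-1},S_i)}\,P_d(t,S_{i-1}) - P_d(t,S_i)\right) - K\sum_{i=\alpha+1}^{\beta}\tau_i P_d(t,T_i)\right] =$$

$$= \text{Notional}\left[\sum_{i=\alpha+1}^{\beta}\frac{\Phi_d(S_{i-1},S_i)}{\Phi_F(S_{i-1},S_i)}\,\Phi_d(t,S_{i-1})A(t,S_{i-1})\,e^{-M(t,S_{i-1})} - \sum_{i=\alpha+1}^{\beta}\Phi_d(t,S_i)A(t,S_i)\,e^{-M(t,S_i)} - K\sum_{i=\alpha+1}^{\beta}\tau_i\,\Phi_d(t,T_i)A(t,T_i)\,e^{-M(t,T_i)}\right].$$

A swaption is ATM when the strike is chosen so that the NPV of the corresponding IRS is equal to zero. This means that the strike of an ATM swaption is

$$K = \frac{\sum_{i=\alpha+1}^{\beta}\left(\frac{\Phi_d(S_{i-1},S_i)}{\Phi_F(S_{i-1},S_i)}\,P_d(t,S_{i-1}) - P_d(t,S_i)\right)}{\sum_{i=\alpha+1}^{\beta}\tau_i P_d(t,T_i)}.$$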

In the following lines we will write swap and swaption prices in a more concise way. We introduce a new notation:

$$d^1_i = \frac{\Phi_d(S_{i-1},S_i)}{\Phi_F(S_{i-1},S_i)}\,\Phi_d(t,S_{i-1})A(t,S_{i-1})$$

$$d^2_i = -\Phi_d(t,S_i)A(t,S_i)$$

$$d^3_i = -K\tau_i\,\Phi_d(t,T_i)A(t,T_i).$$

In this way, we can collect the terms that multiply the same exponential in the previous sums (i.e. $e^{-M(t,T_i)}$). For each time $t_j \in \Gamma$, we define $c_j = \sum_{k=1}^{3} d^k_j$. Note, however, that for each $c_j$ the three terms are all present only if the payment dates of the floating and fixed legs coincide exactly; otherwise, for each time $t_j$ in the union of dates (for floating and fixed payments) we sum only the terms $d^k_i$ that actually appear in the stream of payments. With this notation we can write the swap more concisely as

$$\text{Swap}(t,T_\alpha,K) = \text{Notional}\sum_{j\in\Gamma} c_j\,e^{-M(t,t_j)} = \text{Notional}\sum_{j\in\Gamma} c_j\,e^{-B(a,t,t_j)x(t) - B(b,t,t_j)y(t)}$$

and the swaption as

$$\text{Swaption}(t,T_\alpha,K) = \text{Notional}\;P_d(0,T_\alpha)\int_{-\infty}^{+\infty}\frac{e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}}{\sigma_x\sqrt{2\pi}}\left[\sum_{j\in\Gamma}\lambda_j(x)\,e^{k_j(x)}\,\Phi(-h_j(x))\right]dx,$$

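where we used these expressions

$$h_j(x) = \bar h + B(b,T,t_j)\,\sigma_y\sqrt{1-\rho_{xy}^2}$$

$$\bar h = \frac{\bar y(x)-\mu_y}{\sigma_y\sqrt{1-\rho_{xy}^2}} - \frac{\rho_{xy}(x-\mu_x)}{\sigma_x\sqrt{1-\rho_{xy}^2}}$$

$$\lambda_j(x) = c_j\,e^{-B(a,t,t_j)x}$$

$$k_j(x) = -B(b,T,t_j)\left[\mu_y - \frac{1}{2}\left(1-\rho_{xy}^2\right)\sigma_y^2\,B(b,T,t_j) + \rho_{xy}\sigma_y\frac{x-\mu_x}{\sigma_x}\right]$$

$$\mu_x = -M_x^{T_\alpha}(0,T_\alpha) \qquad \mu_y = -M_y^{T_\alpha}(0,T_\alpha)$$

and $\bar y(x)$ is the only root of

$$\sum_{j\in\Gamma} c_j\,e^{-B(a,T_\alpha,t_j)x - B(b,T_\alpha,t_j)\bar y(x)} = 0.$$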
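The root $\bar y(x)$ has no closed form and must be found numerically for each value of $x$. The sketch below uses the classical Halley iteration, which exploits the first and second derivatives of the function; it is meant only as an illustration of this family of methods and is not necessarily the variant used in our implementation. The function name `halley_root`, the coefficients `c` and the dates are hypothetical values chosen for the example.

```python
import math


def B(z, t, T):
    """B(z, t, T) = (1 - exp(-z (T - t))) / z."""
    return (1.0 - math.exp(-z * (T - t))) / z


def halley_root(c, Ba, Bb, x, y0=0.0, tol=1e-12, max_iter=50):
    """Solve sum_j c_j exp(-Ba_j x - Bb_j y) = 0 in y with classical Halley steps."""
    y = y0
    for _ in range(max_iter):
        terms = [cj * math.exp(-ba * x - bb * y) for cj, ba, bb in zip(c, Ba, Bb)]
        f = sum(terms)                                          # function value
        f1 = sum(-bb * term for bb, term in zip(Bb, terms))     # first derivative in y
        f2 = sum(bb * bb * term for bb, term in zip(Bb, terms)) # second derivative in y
        denom = 2.0 * f1 * f1 - f * f2
        if denom == 0.0:
            raise ZeroDivisionError("Halley step undefined at y = %r" % y)
        step = 2.0 * f * f1 / denom
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Halley iteration did not converge")


# Hypothetical example: expiry T_alpha = 1, annual dates up to 6 years
a, b, T_alpha = 0.1, 0.3, 1.0
dates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
c = [1.0, -0.03, -0.03, -0.03, -0.03, -1.03]  # illustrative coefficients c_j
Ba = [B(a, T_alpha, tj) for tj in dates]
Bb = [B(b, T_alpha, tj) for tj in dates]
x = 0.01
y_bar = halley_root(c, Ba, Bb, x)
```

With coefficients of this shape (one positive term at the expiry date, negative terms afterwards) the function is monotone in $y$, so the root is unique and the iteration converges in a handful of steps from $y_0 = 0$.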

We mention here that the algorithm used in our implementation to find the root of that (nonlinear) function is the so-called irrational Halley’s algorithm (see [Salmi Noorani, Shloof] for details). Finally, in our algorithms we used a Gaussian numerical quadrature, which led us to implement this formula for swaption prices:

$$\text{Swaption}(t,T_\alpha,K) = \frac{\text{Notional}\;P_d(0,T_\alpha)}{\pi}\sum_{k=1}^{n}\omega_k\sum_{j\in\Gamma}\lambda_j(x_k)\,e^{k_j(x_k)}\,\Phi(-h_j(x_k))$$

where $\{x_k\}_k$ are $n$ Gaussian nodes, and $\{\omega_k\}_k$ are the corresponding Gaussian weights used in the numerical quadrature.
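To illustrate how an integral against a Gaussian density is replaced by a finite weighted sum, the following sketch uses Gauss-Hermite nodes and weights from NumPy. Assuming the Gauss-Hermite rule is an assumption of this example, as is the helper name `gaussian_expectation`. Under the standard Gauss-Hermite convention (weight function $e^{-z^2}$) the change of variable $x = \mu + \sqrt{2}\,\sigma z$ produces the normalising constant $1/\sqrt{\pi}$; the constant appearing in front of the sum depends on how nodes and weights are normalised.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def gaussian_expectation(g, mu, sigma, n=20):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) with an n-point Gauss-Hermite rule.

    g must accept a NumPy array of evaluation points.
    """
    nodes, weights = hermgauss(n)             # rule for the weight exp(-z^2)
    x = mu + np.sqrt(2.0) * sigma * nodes     # map the nodes to the N(mu, sigma^2) scale
    return float(np.sum(weights * g(x)) / np.sqrt(np.pi))


# Sanity checks against known Gaussian moments
second_moment = gaussian_expectation(lambda x: x ** 2, 0.01, 0.02)  # mu^2 + sigma^2
exp_moment = gaussian_expectation(np.exp, 0.01, 0.02)               # exp(mu + sigma^2 / 2)
```

In the swaption formula the function `g` would be the inner sum over $j \in \Gamma$, evaluated at each node after solving for $\bar y(x_k)$.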

3.4 Approximated price of swaptions for Gaussian models

Here we just recall a result that was initially suggested in [Schrager] and then described and developed in detail in [Di Francesco]. We warn that this last paper was written in a single-curve framework. The idea behind this approximation is that a random variable whose expected value changes very little over time is treated as a martingale (therefore its expected value is approximated by its value at time zero). Let us consider a Gaussian framework with $n$ factors. The exact price would be:

$$\text{Swaption}(0,T,t_k,K,N) = \text{Notional}\;P(0,T)\int_{\mathbb{R}^n}\left(1-\sum_{j=1}^{k} c_j A(T,t_j)\,e^{-\sum_{i=1}^{n} B_i(T,t_j)x_i}\right)^{+} g(x_1,\dots,x_n)\,dx_1\dots dx_n$$

for some coefficients $\{c_j\}$. In particular $g(\cdot)$ is the density function of an $n$-dimensional Gaussian random variable with mean $(-M_1(0,T),\dots,-M_n(0,T))$ and covariance matrix given by

$$C_{ij}(T) = \int_0^T \sigma_i(u)\,\sigma_j(u)\,\rho_{i,j}\,e^{-a_i(T-u)}\,e^{-a_j(T-u)}\,du.$$

If we take constant volatilities (as in the Black-Scholes framework) the result gets simpler:

$$C_{ij}(T) = \frac{\sigma_i\sigma_j}{a_i+a_j}\,\rho_{i,j}\left(1-e^{-(a_i+a_j)T}\right).$$

Now we show instead the approximated pricing. This formula applies only to ATM European swaptions:

$$\text{Swaption}(0,T,t_k,K,N) = \text{Notional}\,\frac{VOL}{\sqrt{2\pi}}\sum_{i=1}^{k}\tau_i P(0,t_i) =: \text{Notional}\,\frac{VOL}{\sqrt{2\pi}}\,P^{t_k}_{t_1}$$

where

$$VOL = \sqrt{\sum_{i,j=1}^{n}\int_0^T \sigma_i(u)\,\sigma_j(u)\,\rho_{i,j}\,A_i A_j\,e^{(a_i+a_j)u}\,du}$$

and

$$A_i = e^{-a_i T}\frac{P(0,T)}{P^{t_k}_{t_1}} - e^{-a_i t_k}\frac{P(0,t_k)}{P^{t_k}_{t_1}} - K\sum_{j=1}^{k} e^{-a_i t_j}\,\tau_j\,\frac{P(0,t_j)}{P^{t_k}_{t_1}}.$$

We remark here that this paragraph (and in particular the definition of the $\{A_i\}_i$) is single-curve-based.

When we consider constant volatilities we end up with a simplified version of the vol term:

$$VOL = \sqrt{\sum_{i,j}^{n}\sigma_i\sigma_j\rho_{i,j}\,A_i A_j\,\frac{e^{(a_i+a_j)T}-1}{a_i+a_j}}.$$
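The constant-volatility version of the approximation is straightforward to evaluate. The sketch below transcribes the formulas of this section as written; the function name `approx_atm_swaption`, the `discount` callable returning $P(0,t)$, and the choice of the ATM strike as the single-curve forward swap rate $(P(0,T) - P(0,t_k))/P^{t_k}_{t_1}$ are assumptions made for this example.

```python
import math


def approx_atm_swaption(notional, T, pay_times, taus, discount, a, sigma, rho):
    """Approximated ATM swaption price with constant volatilities.

    discount: callable t -> P(0, t); a, sigma: per-factor lists; rho: n x n matrix.
    Returns (price, vol, strike).
    """
    annuity = sum(tau * discount(t) for tau, t in zip(taus, pay_times))  # P_{t_1}^{t_k}
    t_k = pay_times[-1]
    strike = (discount(T) - discount(t_k)) / annuity  # ATM strike (single-curve assumption)
    n = len(a)
    A = []
    for i in range(n):
        fixed = sum(math.exp(-a[i] * t) * tau * discount(t)
                    for tau, t in zip(taus, pay_times))
        A.append((math.exp(-a[i] * T) * discount(T)
                  - math.exp(-a[i] * t_k) * discount(t_k)
                  - strike * fixed) / annuity)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            s = a[i] + a[j]
            variance += (sigma[i] * sigma[j] * rho[i][j] * A[i] * A[j]
                         * (math.exp(s * T) - 1.0) / s)
    vol = math.sqrt(variance)
    price = notional * vol / math.sqrt(2.0 * math.pi) * annuity
    return price, vol, strike


# Hypothetical example: flat 2% curve, 1Y expiry into a 5Y swap, one factor
flat = lambda t: math.exp(-0.02 * t)
price_1f, vol_1f, k_atm = approx_atm_swaption(
    1.0, 1.0, [2.0, 3.0, 4.0, 5.0, 6.0], [1.0] * 5, flat, [0.1], [0.01], [[1.0]])
```

Since the price is a closed-form expression in the model parameters, it is cheap to evaluate many times, which is what makes this approximation attractive inside a calibration loop.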


3.5 Black’s formula for swaptions

Instead of modeling the short rate dynamics, which is sometimes unnecessary, an alternative is to model the dynamics of the swap rate directly. This model is attributed to Black (see [Black]), and the resulting formula is known as Black’s formula for swaption pricing. The assumption is that the swap rate $R(t)$ follows a lognormal dynamics:

$$dR(t) = R(t)\,\sigma(t)\,dW(t)$$

with numeraire $S(t)$. In particular, $W$ is a Brownian motion with respect to the risk-neutral measure associated with that numeraire. In the case of constant volatility (as in Black-Scholes) the price of the swaption is

$$\text{Swaption}(t) = \text{Notional}\;S(t)\left[R(t)\,N(d_1) - K\,N(d_2)\right]$$

with

$$d_1 = \frac{1}{\sigma\sqrt{T-t}}\left[\log\frac{R(t)}{K} + \frac{\sigma^2}{2}(T-t)\right] \qquad d_2 = d_1 - \sigma\sqrt{T-t}$$

where $K$ is the strike rate and $R(t)$ is the forward starting swap rate.
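Black’s formula can be coded in a few lines. The following is a minimal sketch; the function names `norm_cdf` and `black_swaption` and the argument names are choices made for this example, and the annuity $S(t)$ is assumed to be supplied by the caller.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_swaption(notional, annuity, fwd_swap_rate, strike, sigma, tau):
    """Black price: Notional * S(t) * [R(t) N(d1) - K N(d2)], with tau = T - t."""
    std_dev = sigma * math.sqrt(tau)
    d1 = (math.log(fwd_swap_rate / strike) + 0.5 * sigma ** 2 * tau) / std_dev
    d2 = d1 - std_dev
    return notional * annuity * (fwd_swap_rate * norm_cdf(d1) - strike * norm_cdf(d2))


# Hypothetical ATM example: R = K = 2%, sigma = 20%, one year to expiry
atm_price = black_swaption(1_000_000.0, 4.5, 0.02, 0.02, 0.20, 1.0)
```

In the ATM case $R(t) = K$ the formula collapses to $\text{Notional}\,S(t)\,R(t)\left[2N\!\left(\tfrac{1}{2}\sigma\sqrt{T-t}\right) - 1\right]$, which is a convenient check for an implementation.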

3.6 FX rates modeling

In this section we give a first description of the dynamics of FX rates. A foreign exchange rate is a positive-valued process that indicates at each time how many units of foreign currency are equivalent to one unit of the domestic currency. For example, the EURUSD exchange rate is the number of EUR (foreign currency) that correspond to one USD (domestic currency). We can distinguish between intrinsic and implicit FX rates: we call "intrinsic" an exchange rate between the domestic currency and a foreign one, and "implicit" an exchange rate between two foreign currencies. By definition, and consequently by the no-arbitrage condition, the entire market is completely determined by the intrinsic exchange rates alone. Because of this, only intrinsic FX rates have to be modeled, while the implicit rates take the corresponding values implied by no-arbitrage. If we call $\varphi^i_1$ the FX rate from the domestic currency to currency $i$, the following formula shows how every implicit exchange rate can be written as a function of two intrinsic ones:

$$\varphi^i_j = \frac{\varphi^i_1}{\varphi^j_1}.$$

Finally, an additional distinction can be made between spot and forward rates. Spot rates express the current exchange rate between currencies, while a forward FX rate is an agreement between two counterparties that fix now the exchange rate they will use between two currencies at a future maturity date. By no-arbitrage, the forward FX rate is linked to the spot FX rate through the prices of ZCBs in the two currencies. Let us see how to use the no-arbitrage principle:

Lemma 3.1. Let the dynamics of the intrinsic exchange rate (from the domestic currency to foreign currency no. $i$) be modeled by

$$d\varphi^i_1(t) = \varphi^i_1(t)\left[\mu_{\varphi^i_1}(t)\,dt + \dots\right]$$

We consider the tenor $b$ for the forwarding curve, and we take the OIS curve as discount curve. Moreover we denote by $r^{ois}_1(t)$ the short rate in the domestic currency for discounting, and by $r^b_i(t)$ the short rate in the foreign currency for forwarding under the tenor $b$. Then we get

$$\mu_{\varphi^i_1}(t) = r^{ois}_1(t) - r^b_i(t).$$

Proof. First, we consider the zero-coupon bond dynamics in the domestic and the foreign market respectively:

$$dB_d(t) = r_d(t)\,B_d(t)\,dt$$

$$dB_f(t) = r_f(t)\,B_f(t)\,dt.$$

After that, we have to take into account the dynamics of the exchange rate (foreign to domestic currency, so in particular it is an intrinsic exchange rate):

$$d\varphi(t) = \varphi(t)\left[\mu(t)\,dt + \sigma(t)\,dW(t)\right].$$

By definition of the exchange rate, since the two currencies are related by this rate, we know that $B_d(t) = B_f(t)\,\varphi(t)$. Let us now compute the differential of the domestic bank account, using the Itô decomposition:

$$dB_d(t) = d(B_f\varphi)(t) = dB_f(t)\,\varphi(t) + B_f(t)\,d\varphi(t) + \langle dB_f, d\varphi\rangle$$

where the bracket is zero because $dB_f$ is $\mathcal{F}_t$-measurable. We briefly recall the definition of the bracket. Given two stochastic processes (in this case $B_f$ and $\varphi$), their bracket is the expected value of the product of their differentials. In symbols:

$$\langle dB_f, d\varphi\rangle = E\left[dB_f(t)\,d\varphi(t)\right] = E\left[\left(r_f(t)B_f(t)\,dt\right)\left(\varphi(t)\left[\mu(t)\,dt + \sigma(t)\,dW(t)\right]\right)\right].$$

In this case it is equal to zero, because the first differential ($dB_f$) has no term in $dW_t$, and it is therefore already $\mathcal{F}_t$-measurable at time $t$ (even though it is still a random variable, because it depends on $B_f(t)$, which is the solution of an SDE driven by the short rate, and the short rate is a stochastic process). From now on we will denote by $\varphi^i_j$ the FX rate between the two currencies $i$ and $j$, in the sense that an amount of money $X_i$ in currency $i$ is equivalent to $\varphi^i_j X_j$ in currency $j$.

$$dB_d(t) = \varphi(t)\left[r_f(t)B_f(t)\right]dt + B_f(t)\,\varphi(t)\left[\mu(t)\,dt + \sigma(t)\,dW(t)\right] =$$

$$= B_f(t)\,\varphi(t)\left[r_f(t) + \mu(t)\right]dt + B_f(t)\,\varphi(t)\,\sigma(t)\,dW(t).$$

Moreover we already know that $dB_d(t) = r_d(t)B_d(t)\,dt$ (because ZCBs are theoretically risk-free, hence by no-arbitrage their instantaneous return is exactly the short rate, with no additional stochastic component), so the drift terms have to be equal:

$$r_d(t)\,B_d(t)\,dt = B_f(t)\,\varphi(t)\left[r_f(t) + \mu(t)\right]dt$$

$$r_d(t)\,B_d(t) = B_f(t)\,\varphi(t)\left[r_f(t) + \mu(t)\right].$$

Since, by definition of the exchange rate, we know that $B_d(t) = B_f(t)\,\varphi(t)$, substituting this into the previous formula we finally get

$$\mu(t) = r_d(t) - r_f(t) = r^{ois} - r^b.$$

Here we remark that the discounting curve $r^{ois}$ is unique (because of no-arbitrage), while we use a generic index $b$ to indicate the tenor of the forwarding curve. We recall, in fact, that in a multicurve framework such as the one we are considering (which is the one most used by practitioners in trading after the crisis) there are several forwarding curves, one per tenor. Let us now prove the formula for the implied volatility. In order to do so, we first recall a useful result:

Theorem 3.3. Let $X_t$ be a lognormal stochastic process. Then the following formula holds:

$$\text{Var}\left[\log\frac{X_T}{X_t}\,\Big|\,X_t\right] = \int_t^T \text{Var}\left[\frac{dX(u)}{X(u)}\right].$$

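Proof. Suppose the lognormal diffusion is $dX_t = X_t(\mu\,dt + \sigma\,dW_t)$. For this dynamics we know the strong solution, $X_T = X_t\,e^{\left(\mu - \frac{\sigma^2}{2}\right)(T-t) + \sigma(W_T - W_t)}$, so we can compute the variance on the left-hand side of our formula:

$$\text{Var}\left[\log\frac{X_T}{X_t}\,\Big|\,X_t\right] = \text{Var}\left[\log\frac{X_T}{X_t}\,\Big|\,X_t, W_t\right] = \text{Var}\left[\left(\mu - \frac{\sigma^2}{2}\right)(T-t) + \sigma(W_T - W_t)\,\Big|\,X_t, W_t\right] = \text{Var}\left[\sigma Z_{T-t}\right] = \sigma^2(T-t)$$

because we set $Z_{T-t} \sim \mathcal{N}(0, T-t)$. Let us now check that the right-hand side of the formula is equivalent:

$$\int_t^T \text{Var}\left(\frac{dX_u}{X_u}\right) = \int_t^T \text{Var}\left[\mu\,du + \sigma\,dW_u\right] = \int_t^T \sigma^2\,du = \sigma^2(T-t).$$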
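The identity of Theorem 3.3 can also be checked numerically in the constant-coefficient case. The sketch below samples the log-return from the strong solution and compares its sample variance with $\sigma^2(T-t)$; the helper names, the parameter values and the sample size are arbitrary choices made for this illustration.

```python
import math
import random


def sample_log_returns(mu, sigma, tau, n_paths, seed=0):
    """Draw log(X_T / X_t) from the strong solution of the lognormal SDE."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * tau
    scale = sigma * math.sqrt(tau)
    return [drift + scale * rng.gauss(0.0, 1.0) for _ in range(n_paths)]


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


mu, sigma, tau = 0.03, 0.2, 2.0
empirical = sample_variance(sample_log_returns(mu, sigma, tau, 100_000))
theoretical = sigma ** 2 * tau  # right-hand side of Theorem 3.3
```

The drift only shifts the distribution of the log-return and therefore does not contribute to the variance, which is why only $\sigma$ appears in the result.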

Let us now compute the implied volatility in a framework based on Vasicek-modeled rates (with known parameters) and lognormally diffused exchange rates. We denote by $k_x$ and $k_y$ the two MRSs, and by $\sigma_x$ and $\sigma_y$ the two volatilities of the short rates (for the domestic and foreign currencies respectively). We call $\rho_{xy}$ the correlation between the short rates.

Theorem 3.4. Let the spot exchange rate evolve according to

$$dS(t) = S(t)\left[\mu(t)\,dt + \sigma_S(t)\,dZ_s\right]$$

with piecewise-constant (between consecutive maturities) local volatility, and let the dynamics of the bonds in the domestic and foreign markets be respectively

$$dB_1 = B_1\left[\mu_1\,dt + \sigma_1(t,T)\,dZ_1\right] \qquad dB_2 = B_2\left[\mu_2\,dt + \sigma_2(t,T)\,dZ_2\right].$$

Assume $\hat\nu_{h-1}$ is the current implied FX volatility observed in the market for maturity $T_{h-1}$ and $\hat\nu_h$ the one for maturity $T_h$, with $\Delta T_h = T_h - T_{h-1}$. If we assume that there is no correlation between rates and FX, then we get

$$\sigma_S^2(T_h)\,\Delta T_h = \hat\nu_h^2 T_h - \hat\nu_{h-1}^2 T_{h-1} - \frac{\sigma_x^2}{k_x^2}\left[\Delta T_h - 2\,\frac{e^{-k_x T_{h-1}} - e^{-k_x T_h}}{k_x} + \frac{e^{-2k_x T_{h-1}} - e^{-2k_x T_h}}{2k_x}\right]$$

$$- \frac{\sigma_y^2}{k_y^2}\left[\Delta T_h - 2\,\frac{e^{-k_y T_{h-1}} - e^{-k_y T_h}}{k_y} + \frac{e^{-2k_y T_{h-1}} - e^{-2k_y T_h}}{2k_y}\right]$$

$$+ 2\rho_{xy}\frac{\sigma_x\sigma_y}{k_x k_y}\left[\Delta T_h - \frac{e^{-k_x T_{h-1}} - e^{-k_x T_h}}{k_x} - \frac{e^{-k_y T_{h-1}} - e^{-k_y T_h}}{k_y} + \frac{e^{-(k_x+k_y)T_{h-1}} - e^{-(k_x+k_y)T_h}}{k_x+k_y}\right].$$

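Proof. We start from the interest rate parity $F(t,T) = S(t)\,B_2(t,T)/B_1(t,T)$, where $F(t,T)$ denotes the forward exchange rate. Now we apply the Itô formula with

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} S(t) \\ B_1(t,T) \\ B_2(t,T) \end{pmatrix}.$$

We set $f(x, y, z, t) = \frac{xz}{y}$, which is the function to which we apply the Itô lemma. Let us first compute its partial derivatives: $\partial_t f = 0$, $\partial_x f = \frac{z}{y}$, $\partial_y f = -\frac{xz}{y^2}$, $\partial_z f = \frac{x}{y}$. So we get

$$dF = (\dots)\,dt + \begin{pmatrix} \dfrac{B_2}{B_1} & -\dfrac{S B_2}{B_1^2} & \dfrac{S}{B_1} \end{pmatrix} \begin{pmatrix} \sigma_s S & 0 & 0 \\ 0 & \sigma_1 B_1 & 0 \\ 0 & 0 & \sigma_2 B_2 \end{pmatrix} \begin{pmatrix} dZ_s \\ dZ_1 \\ dZ_2 \end{pmatrix}$$

so, substituting these values and dividing by $F$, we get

$$\text{Var}\left(\frac{dF}{F}\right) = \text{Var}\left[\sigma_s\,dZ_s - \sigma_1(t,T)\,dZ_1 + \sigma_2(t,T)\,dZ_2\right] = \left(\sigma_s^2 + \sigma_1^2 + \sigma_2^2 - 2\rho_{s1}\sigma_s\sigma_1 + 2\rho_{s2}\sigma_s\sigma_2 - 2\rho_{12}\sigma_1\sigma_2\right)ds.$$

Since we assumed that there is no correlation between the noise in the rate processes and in the FX dynamics, the only non-zero correlation is $\rho_{12}$. Finally we get

$$\text{Var}\left(\frac{dF}{F}\right) = \left(\sigma_S^2(s) + \sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2\right)ds$$

and by integrating we obtain

$$\hat\nu_h^2 T_h = \int_0^{T_h} \text{Var}\left(\frac{dF}{F}\right) = \int_0^{T_h}\left(\sigma_S^2(s) + \sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2\right)ds.$$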

In this formula, the only unknown is the FX volatility $\sigma_S$. It is actually a function of time, but under the simplifying assumption that it is piecewise constant we can compute it on each small interval, by splitting the integral of an unknown function into a sum of integrals of constants over small time intervals. The explicit formula for this task is:

$$\sigma_S^2\,\Delta T_h = \hat\nu_h^2 T_h - \hat\nu_{h-1}^2 T_{h-1} - \frac{\sigma_x^2}{k_x^2}\left[\Delta T_h - 2\,\frac{e^{-k_x T_{h-1}} - e^{-k_x T_h}}{k_x} + \frac{e^{-2k_x T_{h-1}} - e^{-2k_x T_h}}{2k_x}\right]$$

$$- \frac{\sigma_y^2}{k_y^2}\left[\Delta T_h - 2\,\frac{e^{-k_y T_{h-1}} - e^{-k_y T_h}}{k_y} + \frac{e^{-2k_y T_{h-1}} - e^{-2k_y T_h}}{2k_y}\right]$$

$$+ 2\rho_{xy}\frac{\sigma_x\sigma_y}{k_x k_y}\left[\Delta T_h - \frac{e^{-k_x T_{h-1}} - e^{-k_x T_h}}{k_x} - \frac{e^{-k_y T_{h-1}} - e^{-k_y T_h}}{k_y} + \frac{e^{-(k_x+k_y)T_{h-1}} - e^{-(k_x+k_y)T_h}}{k_x+k_y}\right].$$

Thanks to this theorem we can retrieve the values of a piecewise-constant volatility model for the FX rate from option prices, in a Vasicek model for the short rates.
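The recursion of Theorem 3.4 can be applied maturity by maturity. The following sketch implements it as stated; the function names (`rate_term`, `cross_term`, `bootstrap_fx_vols`) are hypothetical, and the first interval is assumed to start at $T_0 = 0$ with $\hat\nu_0^2 T_0 = 0$, which is a convention adopted here for the example.

```python
import math


def rate_term(sigma, k, T0, T1):
    """sigma^2/k^2 [dT - 2 (e^{-k T0} - e^{-k T1})/k + (e^{-2k T0} - e^{-2k T1})/(2k)]."""
    dT = T1 - T0
    return sigma ** 2 / k ** 2 * (
        dT - 2.0 * (math.exp(-k * T0) - math.exp(-k * T1)) / k
        + (math.exp(-2.0 * k * T0) - math.exp(-2.0 * k * T1)) / (2.0 * k))


def cross_term(rho, sx, sy, kx, ky, T0, T1):
    """Correlation term between the two short rates over [T0, T1]."""
    dT = T1 - T0
    return 2.0 * rho * sx * sy / (kx * ky) * (
        dT - (math.exp(-kx * T0) - math.exp(-kx * T1)) / kx
        - (math.exp(-ky * T0) - math.exp(-ky * T1)) / ky
        + (math.exp(-(kx + ky) * T0) - math.exp(-(kx + ky) * T1)) / (kx + ky))


def bootstrap_fx_vols(maturities, implied_vols, sx, kx, sy, ky, rho):
    """Piecewise-constant FX volatilities from implied volatilities (Theorem 3.4)."""
    vols = []
    T_prev, nu_prev = 0.0, 0.0  # assumed starting point: T_0 = 0
    for T, nu in zip(maturities, implied_vols):
        total = (nu ** 2 * T - nu_prev ** 2 * T_prev
                 - rate_term(sx, kx, T_prev, T) - rate_term(sy, ky, T_prev, T)
                 + cross_term(rho, sx, sy, kx, ky, T_prev, T))
        if total < 0.0:
            raise ValueError("negative local variance on the interval ending at %r" % T)
        vols.append(math.sqrt(total / (T - T_prev)))
        T_prev, nu_prev = T, nu
    return vols


# Hypothetical market quotes and Vasicek parameters
fx_vols = bootstrap_fx_vols([0.5, 1.0, 2.0], [0.10, 0.11, 0.12],
                            0.01, 0.5, 0.015, 0.3, 0.4)
```

The explicit check on the sign of the local variance is useful in practice: a steeply decreasing implied volatility term structure can make the right-hand side negative, in which case no piecewise-constant volatility reproduces the quotes.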


3.7 G2++ with several rates

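Now we will see how the previous constraints change when the dynamics of the interest rates is based on a two-factor Gaussian model, the so-called G2++. The dynamics of the two rates (domestic and foreign respectively) are

$$\begin{cases} r_1(t) = x_1(t) + y_1(t) + \varphi_1(t) \\ dx_1(t) = -a_1 x_1(t)\,dt + \sigma_1\,dW_{11,t} \\ dy_1(t) = -b_1 y_1(t)\,dt + \eta_1\,dW_{12,t} \end{cases} \qquad \begin{cases} r_2(t) = x_2(t) + y_2(t) + \varphi_2(t) \\ dx_2(t) = -a_2 x_2(t)\,dt + \sigma_2\,dW_{21,t} \\ dy_2(t) = -b_2 y_2(t)\,dt + \eta_2\,dW_{22,t} \end{cases}$$

where the correlations between the Brownian motions are

$$\langle dW_{11,t}, dW_{12,t}\rangle = \rho_1\,dt \qquad \langle dW_{21,t}, dW_{22,t}\rangle = \rho_2\,dt$$

$$\langle dW_{11,t}, dW_{21,t}\rangle = \rho_{11}\,dt \qquad \langle dW_{11,t}, dW_{22,t}\rangle = \rho_{12}\,dt$$

$$\langle dW_{12,t}, dW_{21,t}\rangle = \rho_{21}\,dt \qquad \langle dW_{12,t}, dW_{22,t}\rangle = \rho_{22}\,dt.$$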

Here we remark that by introducing all these correlations we are increasing the complexity of the model with respect to a simpler Vasicek. Moreover, while in Vasicek the correlation between the BMs of two currencies was already the correlation between the two rates, here the situation is more complicated: all the correlation parameters are between single BMs, and the correlation between two rates is a combination of correlations among BMs.

We know that under these dynamics the zero-coupon bonds have these values:

$$P(t,T) = \Phi(t,T)\,A(t,T)\,e^{-B(a,t,T)x(t) - B(b,t,T)y(t)}$$

for each currency, and the auxiliary functions in the previous formula are the following:

$$B(z,t,T) = \frac{1-e^{-z(T-t)}}{z} \qquad A(t,T) = e^{V(t,T)/2} \qquad \Phi(t,T) = e^{-\int_t^T \varphi(s)\,ds}$$

$$V(t,T) = \frac{\sigma^2}{a^2}\left[T-t+\frac{2}{a}e^{-a(T-t)}-\frac{1}{2a}e^{-2a(T-t)}-\frac{3}{2a}\right] + \frac{\eta^2}{b^2}\left[T-t+\frac{2}{b}e^{-b(T-t)}-\frac{1}{2b}e^{-2b(T-t)}-\frac{3}{2b}\right] + \frac{2\rho\sigma\eta}{ab}\left[T-t+\frac{e^{-a(T-t)}-1}{a}+\frac{e^{-b(T-t)}-1}{b}-\frac{e^{-(a+b)(T-t)}-1}{a+b}\right].$$

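Theorem 3.5. Let the rates dynamics be a two-factor Gaussian model and let the exchange rate dynamics be a lognormal diffusion. Then the implied volatility (of an ATM option on the spot exchange rate) satisfies

$$\hat\nu_h^2 T_h = \int_0^{T_h}\sigma_s^2(t)\,dt + \text{Var}(\sigma_1, a_1, 0, T_h) + \text{Var}(\eta_1, b_1, 0, T_h) + \text{Var}(\sigma_2, a_2, 0, T_h) + \text{Var}(\eta_2, b_2, 0, T_h)$$

$$+ 2\,\text{Covar}(\rho_1, \sigma_1, \eta_1, a_1, b_1, 0, T_h) + 2\,\text{Covar}(\rho_2, \sigma_2, \eta_2, a_2, b_2, 0, T_h)$$

$$- 2\,\text{Covar}(\rho_{11}, \sigma_1, \sigma_2, a_1, a_2, 0, T_h) - 2\,\text{Covar}(\rho_{12}, \sigma_1, \eta_2, a_1, b_2, 0, T_h)$$

$$- 2\,\text{Covar}(\rho_{21}, \eta_1, \sigma_2, b_1, a_2, 0, T_h) - 2\,\text{Covar}(\rho_{22}, \eta_1, \eta_2, b_1, b_2, 0, T_h)$$

$$+ 2\,\text{MixCovar}(\rho_{s,11}, \sigma_s, \sigma_1, a_1, 0, T_h) + 2\,\text{MixCovar}(\rho_{s,12}, \sigma_s, \eta_1, b_1, 0, T_h)$$

$$- 2\,\text{MixCovar}(\rho_{s,21}, \sigma_s, \sigma_2, a_2, 0, T_h) - 2\,\text{MixCovar}(\rho_{s,22}, \sigma_s, \eta_2, b_2, 0, T_h).$$

Remark: for the notations, see the proof below.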

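Proof. Given the rates dynamics, we recall that in a G2++ model the dynamics of the ZCBs are

$$\begin{cases} \dfrac{dP_1(t,T)}{P_1(t,T)} = r_1(t)\,dt - \sigma_1 B(a_1,t,T)\,dW_{11,t} - \eta_1 B(b_1,t,T)\,dW_{12,t} \\[2mm] \dfrac{dP_2(t,T)}{P_2(t,T)} = r_2(t)\,dt - \sigma_2 B(a_2,t,T)\,dW_{21,t} - \eta_2 B(b_2,t,T)\,dW_{22,t} \end{cases}$$

for the domestic and the foreign currency respectively. We can now apply the Itô formula with

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} S(t) \\ P_1(t,T) \\ P_2(t,T) \end{pmatrix}.$$

We set $f(x, y, z, t) = \frac{xz}{y}$, which is the function to which we apply the Itô lemma. Let us first compute its partial derivatives: $\partial_t f = 0$, $\partial_x f = \frac{z}{y}$, $\partial_y f = -\frac{xz}{y^2}$, $\partial_z f = \frac{x}{y}$. So we get

$$dF = (\dots)\,dt + \begin{pmatrix} \dfrac{P_2}{P_1} & -\dfrac{S P_2}{P_1^2} & \dfrac{S}{P_1} \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 & 0 & \sigma_s S \\ -\sigma_1 B(a_1,t,T)P_1 & -\eta_1 B(b_1,t,T)P_1 & 0 & 0 & 0 \\ 0 & 0 & -\sigma_2 B(a_2,t,T)P_2 & -\eta_2 B(b_2,t,T)P_2 & 0 \end{pmatrix} \begin{pmatrix} dW_{11} \\ dW_{12} \\ dW_{21} \\ dW_{22} \\ dZ_s \end{pmatrix}.$$

Since we are interested in the variance, we can drop all the terms in $dt$ and we get:

$$\text{Var}\left(\frac{dF}{F}\right) = \text{Var}\left[\sigma_s\,dZ_s + \sigma_1 B(a_1,t,T)\,dW_{11,t} + \eta_1 B(b_1,t,T)\,dW_{12,t} - \sigma_2 B(a_2,t,T)\,dW_{21,t} - \eta_2 B(b_2,t,T)\,dW_{22,t}\right] =$$

$$= \Big[\sigma_s^2 + \sigma_1^2 B^2(a_1,t,T) + \eta_1^2 B^2(b_1,t,T) + \sigma_2^2 B^2(a_2,t,T) + \eta_2^2 B^2(b_2,t,T)$$

$$+ 2\rho_1\sigma_1\eta_1 B(a_1,t,T)B(b_1,t,T) + 2\rho_2\sigma_2\eta_2 B(a_2,t,T)B(b_2,t,T)$$

$$- 2\rho_{11}\sigma_1\sigma_2 B(a_1,t,T)B(a_2,t,T) - 2\rho_{12}\sigma_1\eta_2 B(a_1,t,T)B(b_2,t,T)$$

$$- 2\rho_{21}\eta_1\sigma_2 B(b_1,t,T)B(a_2,t,T) - 2\rho_{22}\eta_1\eta_2 B(b_1,t,T)B(b_2,t,T)$$

$$+ 2\rho_{s,11}\sigma_s\sigma_1 B(a_1,t,T) + 2\rho_{s,12}\sigma_s\eta_1 B(b_1,t,T) - 2\rho_{s,21}\sigma_s\sigma_2 B(a_2,t,T) - 2\rho_{s,22}\sigma_s\eta_2 B(b_2,t,T)\Big]dt.$$

Just to clarify the notation, by $\rho_{s,11}$ we mean the correlation between $dZ_s$ and $dW_{11}$, by $\rho_{s,21}$ the correlation between $dZ_s$ and $dW_{21}$, and so on. Since we have to integrate all these expressions in order to get the implied volatility, we first present three auxiliary formulas. The first one is

$$\int_S^T B(z,t,T)\,dt = \int_S^T \frac{1-e^{-z(T-t)}}{z}\,dt = \frac{T-S}{z} - \frac{1}{z}\int_S^T e^{-z(T-t)}\,dt = \frac{T-S}{z} - \frac{e^{-zT}}{z^2}\left(e^{zT} - e^{zS}\right) = \frac{T-S}{z} - \frac{1-e^{-z(T-S)}}{z^2} = \frac{1}{z}\left[T - S - B(z,S,T)\right] \qquad (1)$$

and the second one is the integral of the square

$$\int_S^T B^2(z,t,T)\,dt = \int_S^T \frac{1 - 2e^{-z(T-t)} + e^{-2z(T-t)}}{z^2}\,dt = \frac{1}{z^2}(T-S) - \frac{2}{z^2}e^{-zT}\frac{1}{z}\left(e^{zT} - e^{zS}\right) + \frac{1}{z^2}e^{-2zT}\frac{1}{2z}\left(e^{2zT} - e^{2zS}\right) =$$

$$= \frac{T-S}{z^2} - \frac{2\left(1-e^{-z(T-S)}\right)}{z^3} + \frac{1-e^{-2z(T-S)}}{2z^3} = \frac{1}{z^2}\left[T - S - 2B(z,S,T) + B(2z,S,T)\right] \qquad (2)$$

and the third one is mixed:

$$\int_S^T B(z_1,t,T)B(z_2,t,T)\,dt = \int_S^T \frac{1-e^{-z_1(T-t)}}{z_1}\,\frac{1-e^{-z_2(T-t)}}{z_2}\,dt = \frac{T-S}{z_1 z_2} - \frac{1}{z_1 z_2}\int_S^T e^{-z_2(T-t)}\,dt - \frac{1}{z_1 z_2}\int_S^T e^{-z_1(T-t)}\,dt + \frac{1}{z_1 z_2}\int_S^T e^{-(z_1+z_2)(T-t)}\,dt =$$

$$= \frac{1}{z_1 z_2}\left[T - S - B(z_1,S,T) - B(z_2,S,T) + B(z_1+z_2,S,T)\right]. \qquad (3)$$

We now introduce the notation

$$\text{Var}(\sigma, z, S, T) = \sigma^2\int_S^T B^2(z,t,T)\,dt = \sigma^2\,\frac{1}{z^2}\left[T - S - 2B(z,S,T) + B(2z,S,T)\right]$$

and also

$$\text{Covar}(\rho, \sigma, \eta, z_1, z_2, S, T) = \rho\sigma\eta\int_S^T B(z_1,t,T)B(z_2,t,T)\,dt = \rho\sigma\eta\,\frac{1}{z_1 z_2}\left[T - S - B(z_1,S,T) - B(z_2,S,T) + B(z_1+z_2,S,T)\right]$$

and

$$\text{MixCovar}(\rho, \sigma_1, \sigma_2, z, S, T) = \rho\sigma_1\sigma_2\int_S^T B(z,t,T)\,dt = \rho\sigma_1\sigma_2\,\frac{1}{z}\left[T - S - B(z,S,T)\right]$$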

so that, integrating each term of the variance over $[0, T_h]$, the final formula for the implied volatility $\hat\nu_h^2 T_h$ is the one stated in Theorem 3.5.
