
Chapter 2 Stocks Analysis


Academic year: 2021


In this chapter we show the results of the first part of our analysis. We have focused our attention on the analysis of price time series, looking for correlations among shares. To this purpose, we have extracted from our database only the information about price and date-time for every share, building as many time series as there are different shares. For statistical reasons, we have looked only at the price time series of the 30 shares listed in Table 2.1; among the stocks composing the Ibex-35 index, these were the most traded in the Spanish stock market during the period of examination. From now on, these shares form our “portfolio”.

2.1 Probability Distributions and Price Returns

Following the same line as an earlier analysis [3] performed on the NYSE, we first focused our attention on the main statistical properties of the price time series, such as probability distributions, returns, cross-correlations and so on. Before that, we introduce some basic concepts of economic analysis.

In economic time series analysis there are different stochastic variables that can be investigated to extract statistical information. The most immediate and simple variable is the price change

$$Z_\tau(t) = Y(t+\tau) - Y(t) \tag{2.1}$$

where $Y(t)$ is the price of a financial asset at time $t$ and $\tau$ is the time interval considered. This variable is useful because it does not introduce any nonlinear transformation of the data, but it is influenced by changes in the price scale. In a financial market the price unit of a stock is usually the currency of the country, and the value of this unit is not constant in time. Many factors can change the value of a currency; among them we recall inflation, the state of the economy (growth or recession) and random fluctuations in the global currency market. Another variable, which takes into account the differences among the prices of stocks, is the price return

$$R_\tau(t) = \frac{Y(t+\tau) - Y(t)}{Y(t)} \tag{2.2}$$

This variable is especially suitable for comparing the price time series of different stocks, because it gives directly the percentage price variation in a given time period. However, for large time horizons price returns are affected by changes of the price scale. The most common choice is the so-called logarithmic price return, $r_\tau(t)$, defined as

$$r_\tau(t) = \ln Y(t+\tau) - \ln Y(t) \tag{2.3}$$

which represents the per cent variation of the price of a good after a time interval $\tau$, on a logarithmic scale. This variable incorporates the average correction of price scale changes without requiring detrending functions. This choice introduces, however, a nonlinear transformation of the data, and this deeply affects the statistical properties of the time series. Another problem is that the growth rate of the economy is not constant: it generally fluctuates, and this aspect is not corrected by $r_\tau(t)$. In any case $r_\tau(t)$ seems to be the best variable for the aim of this thesis, in which we analyse the statistical properties of the price time series over short and long time horizons and the cross-correlations among them.

All the stochastic variables defined above have the same behaviour for short time horizons. For high-frequency data, $\tau$ is small and $|Z_\tau(t)| \ll Y(t)$. Hence from 2.2 and 2.3

$$r_\tau(t) = \ln\left(1 + \frac{Z_\tau(t)}{Y(t)}\right) \approx \frac{Z_\tau(t)}{Y(t)} = R_\tau(t) \tag{2.4}$$
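The three variables and their agreement at short horizons can be checked numerically. The following is a minimal sketch, not the code used in the thesis: the function names, the `lag` parameter (the interval expressed in number of samples) and the toy price series are assumptions introduced here for illustration.

```python
import numpy as np

def price_change(prices, lag=1):
    """Price change over `lag` samples (lag must be >= 1)."""
    prices = np.asarray(prices, dtype=float)
    return prices[lag:] - prices[:-lag]

def price_return(prices, lag=1):
    """Relative price change over `lag` samples."""
    prices = np.asarray(prices, dtype=float)
    return (prices[lag:] - prices[:-lag]) / prices[:-lag]

def log_return(prices, lag=1):
    """Difference of the logarithms of the price over `lag` samples."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[lag:]) - np.log(prices[:-lag])

# Toy high-frequency series (hypothetical values): changes are tiny
# compared with the price level, which is the regime where the
# return and the logarithmic return are expected to coincide.
prices = np.array([1000.0, 1001.0, 999.5, 1000.5, 1002.0])

Z = price_change(prices)
R = price_return(prices)
r = log_return(prices)

# Largest discrepancy between logarithmic return and return.
max_gap = np.max(np.abs(r - R))
```

For price changes of about one part in a thousand, the discrepancy `max_gap` is of the order of the square of the return, which is why the choice of variable matters only at longer horizons.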


Code  Extended name                    Sector              Transactions
TEF   Telefonica                       Telecommunications  5720506
TRR   Terra Networks                   Telecommunications  2619468
BBVA  Banco Bilbao Vizcaya Argentaria  Banking             2562916
SAN   Banco Santander                  Banking             1743947
REP   Repsol YPF                       Petroleum Refining  1630826
ELE   Endesa                           Energy production   1614251
ACR   Aceralia                         Building            1581845
ZEL   Zeltia                           Biotechnology       1011903
VAL   Sacyr-Valle                      Building            990013
AMS   Amadeus class A                  Tourist             814997
ALT   Altadis                          Drug Manufacturing  775900
IBE   Iberia                           Communications      742237
TPI   TPI-Amarillas                    Telecommunications  722462
POP   Banco Popular                    Banking             585404
ALB   Corporación Financiera Alba      Telecommunications  557531
SGC   Sogecable                        Telecommunications  533764
UNF   Union Fenosa                     Energy production   525525
IDR   Indra Sistemas                   Telecommunications  510519
TPZ   Tele Pizza                       Restaurants         497935
FCC   Fomento de Construcciones        Building            492219
ACS   Actividades de Construcción      Building            469705
PRS   Prisa                            Telecommunications  405056
BKT   Bankinter                        Banking             403933
FER   Grupo Ferrovial                  Transports          382377
ACE   Abertis                          Building            341754
DRC   Grupo Dragados                   Building            338961
NHH   NH Hoteles                       Tourist             319508
ACX   Acerinox                         Building            309897
ANA   Acciona                          Building            304348
SOL   Sol Melia                        Tourist             303691

Table 2.1: The thirty selected stocks among those composing the Ibex-35 index. We report the market symbol, the name of the company, the sector, and the number of transactions during the period examined.

Efficient Market Hypothesis vs Real Markets

According to classical economic models, markets are governed by the efficient market hypothesis [4]. This hypothesis, first formulated in the 1960s by Samuelson, is a fundamental assumption about the market, and it is nowadays the most widely accepted paradigm among scholars in finance. A market is said to be efficient if:

• the participants quickly and comprehensively obtain all information relevant to trading.

• it is liquid. This means that an investor can easily buy or sell a financial product at any time. The more liquid the market is, the more secure it is to invest, since investors know that they can always cash in their assets. This easy exchange between money and financial products raises the attractiveness of the market. In a “mature” liquid market, the myriad of transactions efficiently balances the decision of a single investor (or of a small group of investors), so that individual purchases or sales are possible at any time without destabilising the asset price.

• there is low market friction. Market friction is a collective expression for all kinds of trading costs. These include traders' commissions, transaction costs, taxes, and bid-ask spreads, i.e., the differences between the price that an investor obtains when selling (bid price) and the price that has to be paid when buying (ask price). Under the hypothesis of low market friction, the sum of these costs is negligible compared with the transaction volume.

A market with these properties “digests” new information so efficiently that all the current information about the market development is, at all times, completely contained in the present price. No advantage is gained by taking into account all or part of the previous price history. In other words, the expected value of the price of a given asset at time $t+1$, $Y_{t+1}$, is related to the previous values $Y_0, Y_1, \dots, Y_t$ through the relation

$$E\{Y_{t+1} \mid Y_0, Y_1, \dots, Y_t\} = Y_t \tag{2.5}$$

which means that the best estimate of the price at time $t+1$ is the current price. Stochastic processes obeying this conditional expectation relation are called “martingales”.

In a real market residual inefficiencies are always present; nevertheless, price returns are difficult, if not impossible, to predict starting from the time series of past price returns. In economics, geometric Brownian motion is used as a model of stock price changes because it implies the efficient market hypothesis. In a geometric Brownian motion the differences of the logarithms of prices are Gaussian distributed. This model satisfies eq. 2.5, but it is known to provide only a first approximation of what is observed in real data. Nevertheless, it is one of the most widely used models nowadays to estimate the value of options, futures, etc.

In real data, the pdf of price returns shows some “universal” aspects. By “universal” we mean that they are observed in different financial markets and at different time scales, provided that a sufficiently long time period is used in the empirical analysis. The first of these “universal” or stylised facts is the leptokurtic nature of the pdf. A leptokurtic distribution is a pdf symmetrical in shape, similar to a normal (Gaussian) distribution, but with a much higher central peak; that is, there is a higher frequency of values near the mean. In addition, a leptokurtic distribution is characterised by thick tails, as we can see in Figure 2.1. Leptokurtic pdfs have been observed in the time series of stocks and indices by analysing high-frequency and daily data. The origin of the observed leptokurtosis is still debated, and several models try to explain it. One of them assumes that non-Gaussian behaviour occurs as a result of the uneven activity during market hours [5]. In real data analysis a Gaussian pdf has never been observed, and for this reason the economic literature only contains studies of the deviations of the empirical pdf from a Gaussian pdf, with kurtosis and leptokurtosis calculations.

Figure 2.1: Example of a highly leptokurtic distribution compared with a Gaussian distribution (P(x) plotted against x).
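The kurtosis calculation mentioned above can be illustrated with a short numerical sketch. It is only an example under stated assumptions: the function name `excess_kurtosis` is introduced here, and the Laplace distribution is used merely as a convenient synthetic stand-in for a peaked, fat-tailed pdf, not as a model of the Ibex-35 returns.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    variance = np.mean(centred**2)
    return np.mean(centred**4) / variance**2 - 3.0

rng = np.random.default_rng(0)

# Synthetic samples: a Gaussian reference and a Laplace sample,
# whose pdf has a sharper peak and heavier tails than a Gaussian.
gaussian_sample = rng.normal(size=200_000)
fat_tailed_sample = rng.laplace(size=200_000)

k_gauss = excess_kurtosis(gaussian_sample)
k_fat = excess_kurtosis(fat_tailed_sample)
```

A positive excess kurtosis, as obtained for the second sample, is the quantitative signature of a leptokurtic distribution.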

Among the alternative models proposed, the most revolutionary is Mandelbrot's hypothesis that price changes follow a Lévy stable distribution [6]. Lévy stable processes are stochastic processes fulfilling the generalised central limit theorem [6, 7], and as a consequence they have a number of interesting properties. They are stable (as are the more common Gaussian processes), i.e., the sum of two independent stochastic processes characterised by the same Lévy distribution of index $\alpha$ is itself a stochastic process characterised by a Lévy distribution of the same index. The shape of the distribution is therefore maintained when summing independent, identically distributed Lévy stable random variables. A mathematical explanation of Lévy stable distributions and of memory in time series is given in Appendix A. In recent years, a correction to Mandelbrot's hypothesis has been found: price changes seem to follow a Lévy stable distribution only for small values, while for large values they follow an inverse power law with exponent equal to three.

2.1.1 Empirical returns distributions

After these considerations, we started our data analysis. We calculated the logarithmic return time series for each of the stocks composing our portfolio (see Table 2.1) at different time intervals $\tau$, ranging from minutes to one day, for the period under examination (June 2000 - December 2002). Before investigating the statistical properties of the time series of the stocks composing our portfolio and comparing them, we have to consider another aspect. Different stocks cannot be directly compared because of, for example, their different price fluctuations, their different trading frequencies, or their different non-stationary behaviour over long time horizons. In other words, we need to normalise our time series. This is done by transforming the logarithmic price return time series into a new series whose elements have zero mean and unit variance. To this end we define the normalised logarithmic price return for stock $i$ and time interval $\tau$ as

$$\tilde r_{i,\tau}(t) = \frac{r_{i,\tau}(t) - \langle r_{i,\tau} \rangle}{\sigma_{i,\tau}} \tag{2.6}$$

From now on, angular brackets indicate the standard time average over all the $N$ trading time intervals of length $\tau$ within the investigated time period:

$$\langle r_{i,\tau} \rangle = \frac{1}{N} \sum_{t=1}^{N} r_{i,\tau}(t) \tag{2.7}$$

Here $N$ is the number of such intervals, i.e., the total number of trading minutes in the investigated period divided by $\tau$. The quantity $\sigma_{i,\tau}$ is the standard deviation, i.e., the square root of the variance, defined as

$$\sigma_{i,\tau} = \sqrt{\langle r_{i,\tau}^2 \rangle - \langle r_{i,\tau} \rangle^2} \tag{2.8}$$

With this choice, one can easily verify for the mean that (dropping the indices $i$ and $\tau$ to simplify the notation)

$$\langle \tilde r \rangle = \frac{\langle r \rangle - \langle r \rangle}{\sigma} = 0 \tag{2.9}$$

and for the variance, from 2.6 and 2.9, that

$$\langle \tilde r^2 \rangle - \langle \tilde r \rangle^2 = \frac{\langle (r - \langle r \rangle)^2 \rangle}{\sigma^2} = \frac{\langle r^2 \rangle - \langle r \rangle^2}{\sigma^2} = 1 \tag{2.10}$$
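The normalisation and the two checks on mean and variance can be reproduced numerically. This is a minimal sketch on synthetic data; the function name `normalise_returns` and the parameters of the random series are assumptions made here, not values taken from the thesis.

```python
import numpy as np

def normalise_returns(log_returns):
    """Subtract the time average and divide by the standard deviation."""
    r = np.asarray(log_returns, dtype=float)
    mean = r.mean()
    # Standard deviation written as sqrt(<r^2> - <r>^2).
    sigma = np.sqrt(np.mean(r**2) - mean**2)
    if sigma == 0:
        raise ValueError("constant series: the standard deviation is zero")
    return (r - mean) / sigma

rng = np.random.default_rng(1)

# Synthetic logarithmic returns with a small drift (hypothetical values).
r = 0.0002 + 0.003 * rng.standard_normal(10_000)
r_tilde = normalise_returns(r)

mean_tilde = r_tilde.mean()
variance_tilde = np.mean(r_tilde**2) - mean_tilde**2
```

Up to rounding errors, `mean_tilde` vanishes and `variance_tilde` equals one, whatever the drift and volatility of the input series, which is what makes different stocks comparable.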

At this point we have built the normalised logarithmic price returns $\tilde r_{i,\tau}(t)$ for all the stocks at a fixed high-frequency time interval $\tau$. We have then calculated the probability density function $P(\tilde r_\tau)$, comparing it with a Gaussian pdf. As one can see in figure 2.2, the distribution of the price returns departs from a Gaussian pdf and looks more similar to an inverse power law; it is almost symmetric, with thick tails. This empirical distribution also exhibits a high leptokurtosis, which is a typical feature of the return pdfs of the other markets studied. The central peak of the pdf, corresponding to null price returns, is produced by the low statistics of some stocks, whose price does not always change every five minutes, causing a large number of zeros in the returns time series. This figure is similar to that of [3, page 69] for the analysis of the changes of the Standard and Poor's 500 stock price index (S&P 500). In order to compare the statistical properties of the returns pdf for the Spanish Ibex-35 index with those of the S&P 500, we have calculated the cumulative probability distribution $F(\tilde r_\tau)$, defined as

$$F(\tilde r_\tau) = \int_{-\infty}^{\tilde r_\tau} P(x)\,dx \tag{2.11}$$

Figure 2.2: Probability density function $P(\tilde r_\tau)$ of the high-frequency Ibex-35 price returns (logarithmic vertical axis), compared with the Gaussian pdf (dashed line). The difference between the two distributions is evident.

The function $F(\tilde r_\tau)$ takes values between zero and one, and is monotonically increasing with $\tilde r_\tau$. In figure 2.3 we plot the function $1 - F(\tilde r_\tau)$ on a log-log scale. We use $1 - F(\tilde r_\tau)$ because this function shows better the probability of observing rare events, i.e., returns larger than a fixed value. We mentioned that Mandelbrot suggested that the pdf could have the form of a Lévy distribution (see Appendix A). A Lévy distribution would have an exponent $\alpha < 2$. We have found an inverse power-law behaviour for both the positive and the negative tail of $1 - F(\tilde r_\tau)$, with an exponent $\alpha \simeq 3$ close to the one found in [3, page 75], implying that this distribution is not of Lévy type. This result also implies that the second moment of the returns is finite, as also found in [8, 9, 10, 11].
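An empirical complementary cumulative distribution and a tail exponent can be estimated as sketched below. This is an illustration under explicit assumptions: the data are synthetic Pareto samples with a known exponent of 3, and the exponent is obtained with the Hill estimator, a standard technique swapped in here for convenience. The thesis does not state how its exponent was fitted, and the function names are hypothetical.

```python
import numpy as np

def complementary_cdf(sample):
    """Sorted values and the fraction of observations above each of them."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    ccdf = 1.0 - np.arange(1, n + 1) / n
    return x, ccdf

def hill_exponent(sample, n_tail):
    """Hill estimate of the power-law exponent from the n_tail largest values."""
    x = np.sort(np.asarray(sample, dtype=float))
    if not 0 < n_tail < x.size:
        raise ValueError("n_tail must be between 1 and len(sample) - 1")
    tail = x[-n_tail:]
    threshold = x[-n_tail - 1]
    return n_tail / np.sum(np.log(tail / threshold))

rng = np.random.default_rng(2)

# Synthetic data: classical Pareto samples with minimum 1 and exponent 3,
# so that the complementary cumulative distribution decays as x**(-3).
synthetic = 1.0 + rng.pareto(3.0, size=100_000)

x, ccdf = complementary_cdf(synthetic)
alpha_hat = hill_exponent(synthetic, n_tail=2_000)
```

Plotting `ccdf` against `x` on a log-log scale gives a straight line whose slope is minus the exponent; an estimate well above 2 is incompatible with a Lévy stable distribution.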

This statistical similarity between the return pdfs of the two markets is encouraging, and it justifies the use of the same correlation analysis techniques already applied successfully to the NYSE. In other words, the Spanish financial market shares the most common statistical properties of other financial markets. These statistical properties are a leptokurtic shape of the returns pdf, an inverse power-law behaviour of the tails of the returns pdf, and a cubic exponent for the tail of the cumulative probability distribution of the returns.

Figure 2.3: Cumulative probability distribution of the high-frequency Ibex-35 price returns, plotted on a log-log scale for the positive and the negative tail. The distribution exhibits an inverse power-law behaviour with the same exponent for both tails. The exponent is larger than 2, which is the maximum value compatible with Lévy processes. In the plot the straight line marks the Lévy regime.

2.2 Correlation Analysis

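Finally we have looked for cross-correlations among different stocks. The starting point of our cross-correlation investigation has been to quantify the degree of similarity between the synchronous time evolution of a pair of normalised returns time series, for stocks $i$ and $j$, through the correlation coefficient at a fixed $\tau$:

$$\rho_{ij}(\tau) = \langle \tilde r_{i,\tau}\, \tilde r_{j,\tau} \rangle = \frac{\langle r_{i,\tau}\, r_{j,\tau} \rangle - \langle r_{i,\tau} \rangle \langle r_{j,\tau} \rangle}{\sigma_{i,\tau}\, \sigma_{j,\tau}} \tag{2.12}$$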

Figure 2.4: Probability density function $P(\rho(\tau))$ of the correlation coefficients $\rho_{ij}(\tau)$ for the shorter of the two time intervals considered.

By definition, $\rho_{ij}(\tau)$ can vary from $-1$ (completely anti-correlated pair of stocks) to $1$ (completely correlated pair of stocks). When $\rho_{ij}(\tau) = 0$ the two stocks are uncorrelated. We have thus determined a $30 \times 30$ matrix of correlation coefficients, i.e., a square matrix with side equal to the number of shares considered. This matrix is symmetric, with $\rho_{ii} = 1$ along the diagonal. Hence, in our portfolio, $30 \times 29 / 2 = 435$ correlation coefficients characterise the matrix completely.
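The construction of the correlation matrix can be sketched as follows. The returns are synthetic: each series is the sum of a common “market” factor and an independent noise term, an assumption made here only to obtain positive correlations, and the function name `correlation_matrix` is hypothetical.

```python
import numpy as np

def correlation_matrix(returns):
    """Correlation coefficients from an array of shape (n_stocks, n_times)."""
    r = np.asarray(returns, dtype=float)
    # Normalise every row to zero mean and unit variance.
    r_tilde = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
    # Time average of the products of normalised returns.
    return r_tilde @ r_tilde.T / r.shape[1]

rng = np.random.default_rng(3)
n_stocks, n_times = 30, 5_000

# Synthetic returns: common factor plus independent fluctuations.
market = rng.standard_normal(n_times)
returns = 0.5 * market + rng.standard_normal((n_stocks, n_times))

rho = correlation_matrix(returns)

# Independent coefficients: the strictly upper triangle of the matrix.
n_independent = n_stocks * (n_stocks - 1) // 2
upper = rho[np.triu_indices(n_stocks, k=1)]
```

The array `upper` holds the 435 independent coefficients, which is the set whose pdf is studied in the rest of the section.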

In figures 2.4 and 2.5 we plot the probability density functions of the full set of 435 correlation coefficients for the matrices corresponding to two time intervals, a short intraday one and a longer one of one day. The correlation coefficient clearly has a positive mean, which increases with $\tau$. The shapes of the two pdfs are different. When the time interval is too small, price fluctuations dominate and the correlation is lower. If the time interval is larger, for example one day, the information shared among stocks becomes relevant and the correlation is higher. It is interesting to note that in both cases the correlation coefficients are all positive.

Figure 2.5: Probability density function $P(\rho(\tau))$ of the correlation coefficients $\rho_{ij}(\tau)$ for the longer of the two time intervals considered.

This is strong evidence that in our portfolio there are no anti-correlated pairs of stocks. In general it is not very probable to find strongly anti-correlated pairs of stocks. Moreover our portfolio is small, containing only the thirty most traded stocks in the Spanish stock market, so this result is not so surprising. In the analysis of the correlation between stocks carried out for the S&P 500 index, which comprises 500 stocks, the pdf of the correlation coefficients has a very small negative tail, see [3, page 102]. An example [12] with only positive correlation coefficients is the set of thirty stocks of the Dow-Jones Industrial Average of the NYSE, analysed during the year 1990, where both the minimum and the maximum value of $\rho_{ij}$ are positive. This feature of the correlation coefficients can be attributed to the fact that the majority of stocks more or less “follow” the same market indices, and this introduces a kind of underlying positive cross-correlation. In fact, the analysed portfolio belongs to the Ibex-35, which is the main index of the Spanish stock market.


A little application

To test the efficiency of the correlation coefficient $\rho_{ij}(\tau)$, we have calculated it for the most correlated pair of stocks, BBVA and SAN, two of the largest banks in Spain. These two banks are known to have similar market interests, and they are expected to show correlation. In figure 2.6 we plot the price time series of the two selected stocks. It is evident that the two price time series are remarkably synchronised. Only at a time scale of minutes can one see some differences, due to small fluctuations caused by the arrival of external information. We have hence calculated the correlation coefficient for different time intervals $\tau$, and we have plotted $\rho_{ij}(\tau)$ in figure 2.7 on a log-linear scale. The correlation coefficient increases with the time interval, from a relatively small correlation to a large one, following an inverse power-law behaviour. The increase of the correlation coefficient is related to the fact that over large intervals stock prices follow the market trend. The power-law behaviour of this increase might be related to the correlation of price changes, owing to the similarity of the exponents, but we have not looked at this aspect in this work. We have fitted $\rho_{ij}(\tau)$ with a function that approaches a limit value for $\tau \to \infty$ as an inverse power law, obtaining from the fit the limit value and the exponent. The limit value is not 1 because of the independent fluctuations, and the value of $\rho_{ij}(\tau)$ at the one-day limit corresponds to the maximum value of $\rho$ in figure 2.5. This test shows the validity of this parameter.

Figure 2.6: Price time series (price in ptas versus time in seconds) of two stocks chosen among the most correlated ones. They are BBVA (bottom curve) and SAN (top curve), two of the largest Spanish banks.

Figure 2.7: Cross-correlation $\rho_{ij}(\tau)$ between the two most correlated stocks, BBVA and SAN, for different temporal windows $\tau$ (in minutes), shown together with the one-day limit and the best fit.

2.2.1 From correlation to distance

We have then analysed the correlation coefficient matrix to detect the hierarchical organisation present in our portfolio. To do so, we have introduced a metric, using as distance a function of the correlation coefficient. The correlation coefficient of a pair of stocks cannot itself be used as a distance between the two stocks, because it does not fulfil the three axioms that define a metric: (i) $d_{ij} = 0$ if and only if $i = j$; (ii) $d_{ij} = d_{ji}$; (iii) $d_{ij} \le d_{ik} + d_{kj}$. An appropriate function is:

$$d_{ij} = \sqrt{2\,(1 - \rho_{ij})} \tag{2.13}$$

With this choice, $d_{ij}$ fulfils the three axioms of a metric distance: (i) $d_{ij} = 0$ if and only if $i = j$; (ii) $d_{ij} = d_{ji}$; and (iii) $d_{ij} \le d_{ik} + d_{kj}$. The first axiom is valid because $d_{ij} = 0$ if and only if the correlation is total ($\rho_{ij} = 1$, namely only if the two stocks follow the same stochastic process). The second axiom is valid because the correlation coefficient matrix, and hence the distance matrix, is symmetric by definition. The third axiom is valid because equation 2.13 is equivalent to the Euclidean distance between two vectors $\tilde{\mathbf{r}}_i$ and $\tilde{\mathbf{r}}_j$, which are obtained from the time series $\tilde r_i(t)$ and $\tilde r_j(t)$ by considering each record of the time series as a component of a vector. Each vector has unit norm, since it is obtained by subtracting the average value from each record and normalising to the standard deviation. The introduction of a distance between pairs of stocks was first proposed in [13], where a distance numerically verifying the axioms of a metric distance was used. The knowledge of the distance matrix between a set of objects is customarily used to decompose the set into subsets of closely correlated objects.
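The equivalence between eq. 2.13 and a Euclidean distance can be made explicit in a few lines. The sketch below assumes that each normalised series of $T$ records is additionally divided by $\sqrt{T}$, so that the vector $\mathbf{u}_i$ has exactly unit norm; the symbols $\mathbf{u}_i$ and $T$ are introduced here for illustration and are not the author's notation.

```latex
\begin{aligned}
u_i(t) &= \frac{\tilde r_i(t)}{\sqrt{T}}, \qquad
\|\mathbf{u}_i\|^2 = \frac{1}{T}\sum_{t=1}^{T} \tilde r_i(t)^2
                   = \langle \tilde r_i^{\,2} \rangle = 1, \\[4pt]
\|\mathbf{u}_i - \mathbf{u}_j\|^2
  &= \|\mathbf{u}_i\|^2 + \|\mathbf{u}_j\|^2 - 2\,\mathbf{u}_i \cdot \mathbf{u}_j
   = 2 - 2\,\langle \tilde r_i\, \tilde r_j \rangle
   = 2\,(1 - \rho_{ij}) = d_{ij}^{\,2}.
\end{aligned}
```

Since the Euclidean distance satisfies the triangular inequality, so does $d_{ij}$. The extreme cases follow directly: $\rho_{ij} = 1$ gives $d_{ij} = 0$, $\rho_{ij} = 0$ gives $d_{ij} = \sqrt{2}$, and $\rho_{ij} = -1$ gives $d_{ij} = 2$.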

2.2.2 Ultrametric Space and Minimal Spanning Tree

To better point out the hierarchies present in our portfolio, we have represented our system using the mathematical theory of graphs, i.e., mathematical objects formed by vertices connected by arcs. Hence we have built a network whose vertices are the 30 selected stocks and whose arcs are obtained from the return cross-correlations. We have used the distance matrix to determine the minimal spanning tree (MST) [14] connecting our 30 stocks.

Among all the different graphs we have chosen the MST because it is attractive and because it provides a topological arrangement of the stocks that selects the most relevant connections of each element of the set. Moreover the MST gives, in a direct way, the subdominant ultrametric hierarchical organisation of the stocks investigated. An ultrametric space is a space in which the distance between the $n$ objects it links is an ultrametric distance. An ultrametric distance $\hat d_{ij}$ fulfils the first two properties of a metric distance, $\hat d_{ij} = 0$ if and only if $i = j$ and $\hat d_{ij} = \hat d_{ji}$, while the usual triangular inequality is replaced by a stronger inequality, called the ultrametric inequality,

$$\hat d_{ij} \le \max\{\hat d_{ik}, \hat d_{kj}\} \tag{2.14}$$

From now on, we use the symbol $\hat d_{ij}$ for the ultrametric distance between two stocks. With this choice, an ultrametric space provides a natural way to describe hierarchies among our stocks, since the concept of ultrametricity is directly connected to the concept of hierarchy. The method used to construct an MST linking a set of $n$ objects is Kruskal's algorithm [14]. In a few words, the MST associated with a Euclidean distance matrix can be obtained as follows. First one sorts the non-diagonal elements of the distance matrix in increasing order; then the MST is progressively built by linking all the elements of the set together in a graph characterised by a minimal distance between stocks. One starts with the pair of elements with the shortest distance, drawing an edge between them. Then one adds edges between the successive pairs, following the increasing order and skipping any edge that would create a cycle, until all the elements are connected, i.e., until $n - 1$ edges have been added. The resulting tree is a sub-graph that contains all the stocks of our distance matrix and whose total weight (the sum of the weights of its edges) is minimum.
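Kruskal's procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the function name `mst_and_ultrametric`, the cluster-label bookkeeping and the toy 4 × 4 distance matrix are assumptions made here. The sketch also fills in the subdominant ultrametric distance, using the fact that when an edge merges two clusters, its weight is the largest edge on the tree path between any element of one cluster and any element of the other.

```python
import numpy as np

def mst_and_ultrametric(distance):
    """Return the MST edges (i, j, weight) and the ultrametric distance matrix."""
    d = np.asarray(distance, dtype=float)
    n = d.shape[0]
    label = np.arange(n)            # cluster label of every vertex
    ultra = np.zeros((n, n))
    edges = []
    # Non-diagonal elements sorted in increasing order of distance.
    pairs = sorted((float(d[i, j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for weight, i, j in pairs:
        if label[i] == label[j]:
            continue                # this edge would close a cycle
        in_i = np.flatnonzero(label == label[i])
        in_j = np.flatnonzero(label == label[j])
        # All pairs across the two merged clusters get the current weight.
        ultra[np.ix_(in_i, in_j)] = weight
        ultra[np.ix_(in_j, in_i)] = weight
        label[in_j] = label[i]      # merge the two clusters
        edges.append((i, j, weight))
        if len(edges) == n - 1:
            break                   # the tree is complete
    return edges, ultra

# Toy distance matrix (hypothetical values) with two tight pairs.
toy = np.array([[0.0, 0.3, 0.9, 1.2],
                [0.3, 0.0, 0.8, 1.1],
                [0.9, 0.8, 0.0, 0.4],
                [1.2, 1.1, 0.4, 0.0]])

edges, ultra = mst_and_ultrametric(toy)
total_weight = sum(weight for _, _, weight in edges)
```

In the toy example the two tight pairs are joined first and then linked to each other, so the ultrametric distance between any element of the first pair and any element of the second equals the weight of that last link.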

Before calculating the MST of our portfolio, we have calculated the MST of a random distance matrix ($30 \times 30$). We have generated a set of 435 correlation coefficients according to a Gaussian distribution with zero mean and unit variance. We have then normalised the 435 extracted coefficients to the interval $[-1, 1]$, because they must be correlation coefficients, and calculated the random distance matrix by using 2.13. We have done this to have a term of comparison and to be able to point out the presence of hierarchies in our portfolio.
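A random distance matrix of this kind can be generated as sketched below. The text does not say how the coefficients were normalised to the interval between -1 and 1; dividing by the largest absolute value is an assumption made here, as are all variable names. Note also that a matrix filled in this way is not guaranteed to be a valid (positive semidefinite) correlation matrix; it serves only as a structureless term of comparison.

```python
import numpy as np

rng = np.random.default_rng(4)

n_stocks = 30
n_pairs = n_stocks * (n_stocks - 1) // 2

# Gaussian coefficients with zero mean and unit variance.
coefficients = rng.standard_normal(n_pairs)
# Assumed normalisation: rescale so that the largest magnitude is 1.
coefficients = coefficients / np.max(np.abs(coefficients))

# Fill a symmetric matrix with ones along the diagonal.
upper_part = np.zeros((n_stocks, n_stocks))
upper_part[np.triu_indices(n_stocks, k=1)] = coefficients
rho_random = upper_part + upper_part.T + np.eye(n_stocks)

# Distance as a function of the correlation coefficient; the clip
# guards against tiny negative arguments caused by rounding.
d_random = np.sqrt(np.clip(2.0 * (1.0 - rho_random), 0.0, None))
```

The matrix `d_random` can then be passed to any minimal spanning tree routine to obtain the reference tree.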

As one can see in figure 2.8, a Gaussian MST has a very typical shape. There are two nodes of large correlation. This is due to the fact that in a Gaussian pdf the probability of large values is very small, so that one obtains at most one or two large distances, one drawn to the right and one to the left. All the other values are almost symmetrically divided into positive (correlated) and negative (anti-correlated) ones, so that the two nodes correspond to the two subsets of correlated and anti-correlated stocks, with the smallest and the largest value respectively.

Figure 2.8: Minimal Spanning Tree generated from a random distance matrix derived from a Gaussian distribution of correlation coefficients.

We have then calculated the MST for a time interval $\tau$ of one day. We have used this time interval on the one hand to follow previous analyses carried out for the NYSE, and on the other hand after some theoretical and empirical considerations on different time intervals. If $\tau$ is too small (intraday), the correlation among stocks is nearly null; in other words, price fluctuations dominate and the corresponding MST is very similar to the Gaussian one. If $\tau$ is too large, a week or a month, almost all the main stocks behave more or less in the same way. In practice, they follow the Ibex-35 index, which they themselves generate. This behaviour gives rise to a huge correlation among all stocks, connected only to the market trend. In all cases, on time scales between one day and one month the resulting MST maintains its structure, which exhibits a satisfactory economic hierarchy, called economic taxonomy.

Figure 2.9: Minimal Spanning Tree of our portfolio, showing a quite evident hierarchical structure. The vertices are the thirty stocks of Table 2.1.

2.3 Results

Figure 2.9 shows the first result of our analysis. We have obtained the well-known economic structure underlying our selected stocks starting only from the price time series of the stocks, without any additional assumption. There is a main cluster centred on BBVA (bank); the other banks, SAN, BKT and POP, are linked to this main cluster. Other branches containing only stocks of the same category are visible, such as ACE-ACS-ANA-DRC and ACX-FCC for the building sector, TEF-IDR-TPI-TRR for the telecommunications sector, and AMS-SOL-NHH for the tourist sector. In some cases, in branches containing stocks of the same sector, we can find one or two stocks of a different sector, for example ALB (holding) in the telecommunications sector. This can be easily explained: some companies control other companies of different sectors, and this fact causes a kind of underlying correlation.
