
Simone Torre

Betting according to Bookmakers

Forecasts: the case of English

Football Leagues

Master thesis

University of Pisa & Sant'Anna School of Advanced Studies

February 2017

Master of Science in Economics

Thesis by

Simone Torre

Supervisor: Prof. Giulio Bottazzi

Candidate: Simone Torre

23 February 2017

Academic year 2015/2016

To my family, friends and whoever has supported me so far. I would especially like to mention my parents, whose love I never lacked, and my brother Marco, to whom I could never be thankful enough. And, since life is a matter of love, I dedicate this achievement to Anna.


Abstract

Gambling as a research topic has fascinated the academic world for centuries, and scientists from the most disparate areas have devoted efforts to describing and structuring this social phenomenon. We provide an overview of the different contributions and analyze the case of the football betting market. The goal is to set out a betting strategy that allows a bettor to obtain profits while remaining clueless about the dynamics of the underlying events. To do so we discriminate between bookmakers' advice, via the study of the odds, in the effort to form educated predictions of match outcomes and to minimize the regret arising from not complying with the best forecasts. To this aim we resort to Online Portfolio Selection methodologies, in the attempt to translate them from the financial reality to the specific context.


I would like to acknowledge Prof. Bottazzi for his patient and kind guidance.

Contents

1 Introduction
1.1 Scientific collocation
1.2 The importance of the psychological perspective

2 Betting markets
2.1 Structure and main features
2.2 Bookmakers
2.2.1 How to derive implicit probabilities
2.3 Forecasting
2.3.1 Accuracy measures

3 Experts and investment strategies
3.1 General idea
3.2 An information theory set-up
3.2.1 Universal Portfolio
3.2.2 Universal Portfolio with experts
3.2.3 Logarithmic Loss
3.2.4 Gambling
3.2.5 Kelly criterion
3.3 Online Portfolio Selection

4 An application to Football Betting Market
4.1 Dataset and Rationale
4.2 Empirical set-up
4.3 Descriptive statistics
4.3.1 Focus on forecasting
4.3.2 Favourite-Longshot bias

5 Results
5.1 Sequential Investment
5.2 Additive Investment
5.3 Kelly Criterion investment
5.4 Half-season
5.5 OLPS investment

6 Discussion

List of Figures

2.1 Market Book
4.1 Margins
4.2 No sweat strategy
4.3 Returns distribution vs odds posted
5.1 Cover weights
5.2 Sequential investment losses
5.3 Additive investment losses

List of Tables

1.1 Psychological literature review
2.1 Football forecasting literature review
4.1 Bookmakers
4.2 Ex-post realizations, football season 2011/2012
4.3 Legend
4.4 Highest-lowest odds ratios
4.5 Draw odds correlation (League2)
4.6 Friedman Test
4.7 Disagreement
4.8 Shin probabilities and Entropy
4.9 Margin and z
4.10 Percentage of experts performance
4.11 Accuracy measures
4.12 Favourite-Longshot bias
5.1 Strategies wealth
5.2 Benchmark wealth
5.3 Strategies return
5.4 Benchmark returns
5.5 Kelly Betting
5.6 Seasons' first half
5.7 Seasons' second half
5.8 Follow the winner average losses

Chapter 1

Introduction

1.1 Scientific collocation

In its very essence, gambling means taking part in a game of chance with a certain wager and an uncertain outcome (either positive or null). In common parlance, though, it is most often understood in its negative sense, hinting at reckless behavior.

Gambling has been ubiquitous in human societies through the years, with examples ranging from the Stone Age¹ to modern times.

Due to its intrinsic nature, the study of gambling permits a deep analysis of decision-making under uncertainty, which has historically attracted diverse scientific research efforts. In certain respects, in fact, statistics, probability theory and gambling overlap in terms of defining aims and nature. Indeed, in all these areas uncertainty plays the central role and, not surprisingly, statistical reasoning has served as the most obvious means by which to deepen the understanding of gambling dynamics. Conversely, scientific discussion has benefited, either directly or indirectly, from applications dedicated to games of chance. It is worth mentioning that gambling problems were the focus of the first mathematical approach tailored to probability theory.

In this vein, as reported by Siegrist [49, 2011], we could enumerate de Méré's problem (1654), the problem of points (1654), Pepys' problem (1693) and, perhaps the most prominent example, Bernoulli's famous St. Petersburg paradox (1713). The concept of utility theory, a cornerstone of economic and social sciences, has established roots in gambling as well, with contributions by Cramer (1728), Bernoulli (marginal utility, 1738), Morgenstern and von Neumann (von Neumann-Morgenstern utility theory, 1947) and Ward (1950). The interplay and mutual influence of the different bodies of research is thus clear.

When contextualized to sport events, we commonly refer to betting.

Bettors form subjective probabilities on the outcomes of events and, as discussed below, rather often biased ones. A discussion of the origin of probability theory and of the conflictual philosophical relationship between objectivism and subjectivism being out of scope, our aim is just to highlight the sentimental nature of bettors' decisions [27]². Hence we need to delve into the psychology of gamblers to grasp what stands behind their betting attitude.

¹ Specifically, there is evidence of a primitive game of chance in which the astragalus (the ankle bone) played the role of a rudimentary die.

² See Al-Najjar and De Castro [15, 2009] for an exhaustive analysis on subjective probability

1.2 The importance of the psychological perspective

Driven by the multidisciplinary nature of the subject, we provide a flavor of the psychological literature on the issue, in the attempt to investigate the nature of the biases we referred to. In this framework the idea of Homo oeconomicus is severely challenged, and the emotional and biased character of agents' assessments cannot be neglected. The rationality of individuals (bettors, for our purposes) has been heavily questioned in economic contexts since the pioneering work by Kahneman and Tversky [36, 1979] and, among others, Shiller (2000) and Sunstein and Thaler (2008). For our purposes we refer to rationality as the ability to effectively assess true chances and bet accordingly. As notably argued by Kahneman [35, 2011], individuals tend to overestimate the probabilities of unlikely events and to overweight them in their decisions. Formally, such a concept is presented by Tversky and Koehler [57, 1994] in terms of Support Theory, and summarized as: "[. . . ] both qualitative and quantitative assessments of uncertainty are not carried out in a logically coherent fashion [pag.565]". This attitude seems to explain an established result in the betting literature known as the positive favourite-longshot bias. The latter consists in the distorted implied probabilities assigned to events by bookmakers, so as to underestimate favourites' chances of winning and overestimate underdogs' chances. Such a phenomenon could also be interpreted, in a neoclassical fashion, as the response to the presence of a portion of risk-loving individuals in the bettors' population. Additionally, the wide psychological analysis touches other delicate issues such as problem gambling and illusion of expertise, suggesting that individuals commit serious cognitive biases in judgmental and decisional activities (Campitelli and Speelman [7, 2014]).
Related to this, Langer [39, 1975] defines illusion of control as: "an expectancy of a [personal] success probability inappropriately higher than the objective probability would warrant [pag.314]". The study, in fact, reports that individuals feel overconfident in chance situations due to such an illusion. The concept of illusion of control is broad and well outlined in all its dimensions in Thompson, Armstrong and Thomas [55, 1998], while an exhaustive and comprehensive review of overconfidence is provided in Moore and Healy [42, 2008].

On the other hand, Ceci and Liker [8, 1986] reported evidence of a complex decision-making process taking place, allowing some individuals to make more accurate assessments of fair odds than others, irrespective of their IQ and expertise levels. Their study gathers 30 regular racetrack attendants and shows that some of them predict outcomes through complex modeling with multiple interaction effects and nonlinearity. Although the object of the study is different, we can easily apply it to our case. Later, Evans [22, 2012] broadened the discussion to Risk Intelligence (RI), defining it as the ability to estimate probabilities accurately. In this vein people are characterized by different degrees of RI, just as holds for IQ, and a point is made about the lack of effort devoted to its development.

All the cited contributions fit our context and fully and successfully depict the reality of both sides of the gambling environment. In what follows we will consider the average bettor as characterized by a low RI, with a portion of the population supposed to perform relatively better (as explained in the Shin model in subsection 2.2.1). The underlying assumption is that bookmakers, given their greater risk exposure in case of probability miscalculations, are able to produce more accurate forecasts, either by recruiting skilled people (as claimed by a professional bookmaker in [51]) or by exploiting reliable models. Either way, odds-makers are assumed, rightly so, to be endowed with a higher degree of RI, which justifies confidence in them. Below we report a summary of how the literature applies to the betting reality.

Authors               Year   Assessment¹   Contribution                       Data
Langer                1975   Negative      Illusion of control                631 adults
Ceci and Liker        1986   Positive      Systematic prediction ability      30 racetrack attendants
Tversky and Koehler   1994   Negative      Biased probabilities assessments   120 students
Evans                 2011   Positive      Risk intelligence                  -

¹ Assessment on gamblers' skills (directly or as an individuals' subcategory).

Table 1.1: Psychological literature review


Chapter 2

Betting markets

2.1 Structure and main features

Betting markets have been thoroughly analysed in the economic literature, with different approaches tackling rather disparate issues. We will provide an overview of the most compelling findings and partially replicate some of them with the data in our possession. As argued by Thaler and Ziemba [54, 1988], betting markets seem to offer the possibility to investigate the rationality of investors and are likely to show evidence on market efficiency. Unlike canonical financial markets, each asset has a pre-determined termination point along with a certain return conditional on the occurrence of the event at stake (hence resembling option markets). While anomalies are then reported, in principle the structure of the market could lead to efficiency, due to conditions that facilitate learning. According to the authors, efficiency could be either strong, when all bets have expected values equal to the amount bet (adjusted for transaction costs), or weak, when there are no bets with positive expected values. An early attempt to investigate the efficiency of betting markets in parallel with the financial literature can be found in Dowie [20, 1976]. The author concludes: "weak efficiency is complete if the expected values are equal throughout the range and decreases to the extent that expected values diverge from equality [pag.147]". Levitt [40, 2004] stresses the undeniable similarities between the two markets. In particular, both are characterized by profit-driven investors with heterogeneous beliefs, by the zero-sum game feature and by the high volumes involved. He further deals with striking differences, though, among which price dynamics are the most compelling. In particular, in betting markets prices are set by odd-setters and are not the product of demand-supply interactions.

Several works have dealt with efficiency in the literature, all stemming (as far as football betting is concerned) from the pioneering work by Pope and Peel [45, 1989]. They argue that, although odds may be inclined to offer profitable opportunities, betting markets turn out to be weakly efficient, in the sense that no betting strategy guarantees abnormal returns. Undoubtedly the favourite-longshot bias represents the best known and most studied inefficiency of betting markets, the nature and direction of which are still a source of disagreement in the literature. The first evidence was reported in horse-racing markets and later contextualized to football scenarios. In the presence of the bias, the underlying probabilities inferred from the posted odds are no longer fair but skewed, either positively or negatively. In the first case, the most probable events turn out overpriced (the implicit probabilities are lower than the fair ones) while the opposite holds for underdogs (less probable events). The reverse holds in the second scenario. Among others, evidence of a positive bias is found by Cain, Pope and Peel [6, 2000], examining the exact score market, and Deschamps and Gergaud [16, 2012]; Dixon and Pope [18, 2004] reported the persistence of a negative bias, while Kuypers [38, 2000] and Forrest, Goddard and Simmons [25, 2005] claim the nonexistence of the phenomenon. Moreover, Paton and Williams [44, 2005] found evidence of inefficiencies in the spread betting market, and [16] also concentrated on disagreement in the effort to exploit out-of-the-market odds. Our analysis focuses on the fixed-odds football betting market (as opposed to person-to-person exchanges), and in the next section we aim to characterize its major players.

2.2 Bookmakers

Bookmakers provide the odds for all types of sport events, acting as market makers. Prices are set well in advance of the underlying match and bettors act as price takers. Firms reserve the right to tune odds until the very beginning of the event, even though updates appear to be rare and of small size. Adjustments, if any, usually occur as the match approaches. Due to their modus operandi and the striking turnover figures, several attempts have been made to shed light on their behavior, igniting an intriguing stream of research. Especially after the advent of online bookmakers, the prices offered are easily available to the general public and comparisons between odds can be drawn straightforwardly. The two competing definitions depict odd-setters' activity as aiming either to maximize profits or to balance their books [23, 1997]. In our view, though, the two characterizations are not mutually exclusive. It is fair to first describe the pricing process and then enter into details. We draw our attention to the simplest market, consisting of Home win, Draw and Away win. Bookmakers set unfair odds so as to earn a margin on every bet. If we compute the sum of the inverse of the prices ($\pi_i = \frac{1}{c_i}$), in fact, we obtain an amount exceeding unity. That amount represents the overround (also known as vigorish or margin), an implicit fee firms charge to provide their services.
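As a minimal sketch of the booksum computation described above, the snippet below inverts a set of decimal odds and sums the results. The odds are illustrative values, not taken from the dataset, and the function name `booksum_and_margin` is hypothetical. The margin is taken here as the excess of the booksum over one, which is one common reading of the overround.

```python
def booksum_and_margin(odds):
    """Return the booksum (sum of inverse odds) and its excess over one."""
    # pi = 1 / c for each posted decimal odd c
    inverse_prices = [1.0 / c for c in odds]
    booksum = sum(inverse_prices)
    return booksum, booksum - 1.0


# Illustrative odds for Home win, Draw, Away win (assumed values)
example_odds = [2.10, 3.30, 3.60]
booksum, margin = booksum_and_margin(example_odds)
print(f"booksum = {booksum:.4f}, margin = {margin:.4f}")
```

With fair odds the booksum would equal one and the margin would vanish.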

Figure 2.1: Market Book. Source: www.betdevil.com

Figure 2.1 above partially depicts the spectrum of prices offered by different bookmakers and betting exchanges, along with the relative margins, for a random match.

According to Fingleton and Waldron [23] the margin is mainly induced by:

• positive operating costs

• positive monopoly rent

• risk aversion on the part of bookmakers

• inside information on the side of the punters.

A higher margin, ceteris paribus, translates into a less competitive price offered. Indeed, as Fingleton and Waldron [23] pointed out, setting a lower vigorish could be interpreted as risk-seeking behavior, while margins higher than competitors' may characterize risk-averse firms. Aversion to risk has always been considered the main driver of the odds-setting process, even though a growing literature stemming from Shin [48, 1993] and recently intensifying (Strumbelj [53, 2010], Humphreys [33, 2011]) has challenged it. In particular [33], examining the NFL point spread market, models and clearly sets out the convenience and profitability of unbalanced books for odds-setters¹. Detractors of the predominant view consider bookmakers to be taking a position in the market, rather than sitting and earning marginal returns. The reasons are numerous, ranging from hedging due to the presence of informed bettors, to superiority in assessing outcome probabilities ([40]), access to information beyond the normal flow (the broken-leg cue, as in Webby and O'Connor [58, 1996]) and knowledge of the irrationality characterizing bettors. The point made in [58] is extremely valid in such a context, as the claim "judges will outperform models when they have contextual information to help them comprehend discontinuities in series [pag.95]" gains even more prominence when both models and judgments are combined. Accepting this argument, we will not consider bookmakers as neutral odds providers but rather as taking a position, with the goal of building betting strategies derived from their implicitly assigned probabilities. In what follows we attempt to isolate the margin, bearing in mind that, as pointed out in [45], we are not able to disentangle (if not by means of assumptions) the particular contribution of each factor affecting it.

2.2.1 How to derive implicit probabilities

As the well-known axiom of Kolmogorov's theory imposes, the probabilities assigned to a finite and complete set of events must sum to unity. As already shown, though, if we were simply to invert the odds to obtain probabilities, we would not meet this basic axiomatic requirement. In order to comply with it, two methods are mostly used in the literature, and they seem to hint subtly at different views on odd-setters' behavior. The simplest approach, Normalization, assumes that the margin the bookmakers set is unrelated to the specific outcome, suggesting neutrality on the market makers' side. For every match $i = 1, \dots, n$ and outcome $y \in \{h, d, a\}$:

\[
p^{Norm}_{y,i} = \frac{\pi_{y,i}}{\pi_{h,i} + \pi_{d,i} + \pi_{a,i}} \tag{2.1}
\]
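Equation 2.1 can be sketched in a few lines. The example below is only an illustration under assumed inputs: the odds are made-up values and `normalized_probabilities` is a hypothetical helper name, not part of the thesis.

```python
def normalized_probabilities(odds):
    """Apply equation 2.1: divide each inverse odd by the booksum.

    `odds` maps the outcome labels 'h', 'd', 'a' to posted decimal odds.
    """
    inverse = {y: 1.0 / c for y, c in odds.items()}
    booksum = sum(inverse.values())
    return {y: pi / booksum for y, pi in inverse.items()}


# Illustrative odds for a single match (assumed values)
norm_probs = normalized_probabilities({"h": 2.10, "d": 3.30, "a": 3.60})
print(norm_probs)
```

By construction the three probabilities sum to one, and the margin is removed in the same proportion from every outcome.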

¹ Worth exploring also is the literature referenced there, where the same conclusions for different

Despite being an easy solution, such a procedure seems not to capture some fundamental features proved to be an integral part of the market, such as the favourite-longshot bias and the heterogeneity of the bettors' population. We conceive the odds-setting process as taking into account the peculiarities described above, thus attributing a less decentralized role to bookmakers. Hence the method applied derives from the maximization of the profit of bookmakers who are aware of interacting with a portion of interlocutors who are either insiders or skilled. Shin [48] first proposed this optimization, and further developments and applications have been introduced by, among others, Jullien and Salanié [34, 1994], Fingleton and Waldron [23] and Strumbelj [52, 2014]. According to Shin, z represents the percentage of the total volume of bets coming from insiders or, in our usage, from better skilled investors (with above-average RI levels). At this stage the essential features are that bookmakers offer the full menu of bets and that the expected return is equal to zero. Each bookmaker posting odds $c_y = \frac{1}{\pi_y}$ for the outcome y then faces an expected liability:

\[
\frac{1}{\pi_y}\left(p_y(1 - z) + z\right)
\]

where the term $\frac{z}{\pi_y}$ is assumed to be paid out with certainty.

In the application in [52] it is further assumed that bookmakers form probabilistic beliefs p, shared by the uninformed bettors. That, though, would be inconsistent with our understanding of betting markets, as we rely on the superiority of odd-setters' assessments. Yet, provided we put forward a betting strategy which fully incorporates bookmakers' views, we overcome the issue by requiring the same p to characterize uninformed bettors. Nevertheless, in [48] the aggregate behavior of uninformed bettors leads to the true probabilities, clearly recalling the widespread concept of the wisdom of the crowd, which would be even more evident if we were dealing with betting exchanges. In any case p is different for each bookmaker b, reflecting the imperfect competition intrinsic to the market. Every bookmaker seems to set prices independently, so that the following rationale applies indiscriminately. Hence the bookmaker's unconditional expected liabilities turn out to be:

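\[
\sum_{j=1}^{3} \frac{p^{b}_{j,i}}{\pi^{b}_{j,i}} \left( p^{b}_{j,i}(1 - z^{b}_{i}) + z^{b}_{i} \right)
\]

where we specify b to highlight the different beliefs, even if only marginally so, on the odd-setters' side. The total expected profit then reads:

\[
T(\pi) = 1 - \sum_{j=1}^{3} \frac{p_{j,i}}{\pi_{j,i}} \left( p_{j,i}(1 - z_{i}) + z_{i} \right)
\]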

Therefore, a bookmaker fixes $\pi_i = \frac{1}{c_i}$ (where $c_i$ are the posted odds) so as to achieve the threefold goal of maximizing $T(\pi)$, remaining competitive (posting odds neither too poor with respect to competitors nor too favorable to those who could fully exploit them) and avoiding negative odds. The latter two features are achieved by subjecting the maximization exercise to $\Pi \le \beta$, which means that the booksum is constrained from above by market conditions², and $0 \le \pi_i \le 1$. In particular, $\beta$ refers to the fictitious outcome of a bidding stage between entrant and incumbent bookmakers. We are not going to present all the theory behind it, for which [48] is the main reference.

² $\Pi = \frac{1}{p_{ih}} + \frac{1}{p_{id}} + \frac{1}{p_{ia}}$

The solution reads:

\[
\pi^{Shin}_{y,i} = \sqrt{z_i p_{y,i} + (1 - z_i)(p_{y,i})^2} \, \sum_{j=1}^{3} \sqrt{z_i p_{j,i} + (1 - z_i)(p_{j,i})^2}
\]

where, with some abuse of notation, y refers to the specific event we are calculating the probability of, j = 1, 2, 3 simply codes on an ordinal scale what we referred to above as y = h, d, a, and i refers to the specific match instance. The parameter z, as sketched above, roughly stands for the percentage of informed bettors in the market, and its dynamics are captured by the following:

\[
z_i = \sum_{j=1}^{3} \sqrt{z_i^2 + 4(1 - z_i)\frac{\pi_{j,i}^2}{\Pi_i}} - 2
\]

This equation can be solved by fixed-point iteration starting at $z_0 = 0$ and exploiting $\sum_{j=1}^{3} p_j = 1$, as in [52]. In order to obtain a set of probabilities, Jullien and Salanié [34] inverted Shin's solution as follows:

\[
p_{y,i} = \frac{\sqrt{z_i^2 + 4(1 - z_i)\frac{\pi_{y,i}^2}{\Pi_i}} - z_i}{2(1 - z_i)} \tag{2.2}
\]

As shown in [52], the latter method better captures the underlying market dynamics and proves to be more accurate. Moreover, we dealt at length with biases on the demand side, and Shin's method seems to take account of them. Hence, as our use of bookmakers' implicit probabilities is aimed at maximizing forecasting accuracy, we adopt it.
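The fixed-point iteration for z and the inversion in equation 2.2 can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: the odds are assumed values, the function name `shin_probabilities`, the tolerance and the iteration cap are hypothetical choices, and the constant 2 in the update is specific to the three-outcome market considered here.

```python
from math import sqrt


def shin_probabilities(odds, tol=1e-12, max_iter=10_000):
    """Return (z, probabilities) for a three-outcome market.

    z is found by fixed-point iteration starting at z = 0; the
    probabilities then follow from equation 2.2.
    """
    if len(odds) != 3:
        raise ValueError("this sketch assumes exactly three outcomes")
    pi = [1.0 / c for c in odds]      # inverse odds
    booksum = sum(pi)                 # Pi in the text

    def root(z, p):
        # Square-root term shared by the z update and equation 2.2
        return sqrt(z * z + 4.0 * (1.0 - z) * p * p / booksum)

    z = 0.0
    for _ in range(max_iter):
        z_next = sum(root(z, p) for p in pi) - 2.0
        converged = abs(z_next - z) < tol
        z = z_next
        if converged:
            break

    probs = [(root(z, p) - z) / (2.0 * (1.0 - z)) for p in pi]
    return z, probs


# Illustrative odds for Home win, Draw, Away win (assumed values)
z, probs = shin_probabilities([2.10, 3.30, 3.60])
print(f"z = {z:.4f}, probabilities = {[round(p, 4) for p in probs]}")
```

The iteration converges slowly and in an oscillating fashion for odds of this kind, which is why the iteration cap is set generously; a root-finding routine would be a reasonable alternative.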

2.3 Forecasting

Stemming from the work by Hill [32, 1974], the academic world started to move in the direction of forecasting football outcomes, rather than believing in a purely chance-driven setting. Various models have been proposed, mainly relying either on a pure statistical approach or on more complex computer science applications. Either way, past football data and ancillary information are gathered and used to form educated predictions. To provide some examples, Dixon and Coles [17, 1997] introduced a time-dependent Poisson regression model to forecast the number of goals scored in a match; Rue and Salvesen [46, 2000] grounded predictions on the estimation of time-dependent attacking and defensive skills via a Bayesian dynamic generalized linear model; Goddard and Asimakopoulos [30, 2004] proposed an ordered probit model for match results; Constantinou, Fenton and Neil [12, 2013] presented a Bayesian network taking into consideration both objective and subjective information. Another perspective has been put forward by Forrest and Simmons [26, 2000] and [45], where in both cases predictions were based on tipsters' advice, though with unremarkable results (worse than a naïve statistical model and bookmakers' implied predictions). More interesting proved to be the approach conditioning forecasting exercises on the bookmakers' probabilities implied by the odds. Successful results with respect to statistical counterparts were reported by Forrest, Simmons and Goddard [25, 2005], overturning the widespread idea that objective statistical methods always outperform less complex methodologies. Along the same line are [53, 2010] and Smith, Paton and Williams [50, 2009]. Given its parsimony and a growing literature, we focus our analysis on this procedure for building forecasts and rely on it for betting purposes. Effectively, our approach requires no prediction effort, as we are simply inverting the odds and extrapolating probabilities for forecasting ends.

Authors                     Year   Source             Method                Data
Dixon and Coles             1997   Statistics         Poisson model         6629 matches
Rue and Salvesen            2000   Computer Science   Bayesian model        932 matches
Forrest and Simmons         2000   Tipsters           Newspapers tipsters   1694 matches
Pope and Peel               1989   Tipsters           Newspapers tipsters   1066 matches
Goddard and Asimakopoulos   2004   Statistics         Probit model          7781 matches
Constantinou et al.         2013   Computer Science   Bayesian network      6624 matches
Forrest et al.              2005   Bookmakers         Odds study            10000 matches

Table 2.1: Football forecasting literature review

2.3.1 Accuracy measures

After producing a set of probabilities, either objective or subjective, and putting forward a forecasting exercise consistent with it, we are left with evaluating the goodness of the output. Accuracy has been tested in different ways in the literature, ranging, for instance, from pseudo-likelihood measures [46] to the Brier Score [53] and conditional logistic regression [50]. Given our setup and aims, though, another measure seems to fit better. First proposed by Epstein [21, 1969], the Rank Probability Score (RPS) has been adapted to the specific context by several authors, e.g. [53], to evaluate a multi-category forecast (Home win, Draw, Away win).

\[
RPS_i = \sum_{j=1}^{3} \left( \sum_{k=1}^{j} p_k - \sum_{k=1}^{j} y_k \right)^2 \tag{2.3}
\]

As equation 2.3 shows, RPS represents the sum of the squared differences between the cumulative forecast probability and the cumulative outcome probability (where the outcomes have been coded as h = 1, d = 2, a = 3, and $y = 1$ if the outcome is realized, hence $y = y^*$, and 0 otherwise). The fundamental explanation relies on the classification of football outcomes on an ordinal scale³. The convenience of RPS lies in its ability to capture and penalize the dispersion of the predictions from the realized outcome. Our goal being, though, to exploit such probabilities for betting purposes, we also consider an alternative forecasting measure. Our crude objective is to give prominence to the bookmaker which, via the implied predictions and the induced betting activity, assured the highest profit given the ex post realizations. By doing so, we implicitly acknowledge the three outcomes as being mutually exclusive and therefore we consider predictions with a lexicographic reasoning. To do so we exploit the cumulative product of the probabilities of the realized outcome (log likelihood measure, LLM):

\[
LLM = \log \prod_{i=1}^{n} I_{y^*_t = y} \, p_{y,i} \tag{2.4}
\]

where, again, $y^*_t = y$ refers to the realized outcome and $p_{y,i}$ to its relative

³ An enlightening and exhaustive analysis of the rationale can be found in Constantinou and
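The two accuracy measures of equations 2.3 and 2.4 can be illustrated with a short sketch. The forecasts and outcomes below are made-up values, and the function names are hypothetical. Outcomes are indexed 0, 1, 2 for home win, draw and away win, and the log likelihood measure is computed as a sum of logs, which equals the log of the product of the probabilities assigned to the realized outcomes.

```python
from math import log


def rank_probability_score(probs, outcome):
    """Equation 2.3 for one match: probs is ordered (home, draw, away)."""
    cumulative_forecast = 0.0
    cumulative_outcome = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cumulative_forecast += p
        cumulative_outcome += 1.0 if k == outcome else 0.0
        score += (cumulative_forecast - cumulative_outcome) ** 2
    return score


def log_likelihood_measure(forecasts, outcomes):
    """Equation 2.4: log of the product of realized-outcome probabilities."""
    return sum(log(probs[y]) for probs, y in zip(forecasts, outcomes))


forecast = (0.5, 0.3, 0.2)  # assumed probabilities for one match
rps_home = rank_probability_score(forecast, 0)  # home win realized
rps_away = rank_probability_score(forecast, 2)  # away win realized
llm = log_likelihood_measure([forecast, forecast], [0, 2])
print(rps_home, rps_away, llm)
```

A forecast that puts most of its mass far from the realized outcome on the ordinal scale receives a larger RPS, which is the dispersion penalty mentioned above.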

Chapter 3

Experts and investment strategies

3.1 General idea

After having dealt with accuracy measures, we turn to finding betting strategies, if any, that could guarantee profitability.

Various betting strategies have been proposed in the literature and, despite undeniable differences, almost all share a common point. In fact, the idea of Value Betting is the driver behind a fairly large number of investment schemes. By Value Betting we refer to gambling activity grounded on a set of probabilities different from the bookmakers' ones. In the words of [17], to achieve profitability "it requires a determination of probabilities that is sufficiently more accurate from those obtained by published odds". Accordingly, a bet is placed if the underlying model assigns a higher probability to a specific event than the "institutional" counterpart (via the study of the posted odds), and profits arise whenever the built-in margin is overcome. Precisely on this issue, [45] state that bookmakers do not employ the available information optimally, not meeting the axioms of rational expectations, but that positive flows are not enough to balance expenses. Clearly, then, profitability strongly relies on the accuracy of the forecasts. In this manner, Buraimo, Peel and Simmons [5] suggest betting whenever Fink Tank predictions exhibit positive expected returns when compared to posted odds. Their focus is solely on Home win, it being the outcome with the highest frequency. Likewise, [16] play around the concept put forward, refining and restricting it to the specific case of a sufficient level of disagreement among bookmakers. Their approach leads to a drastic reduction in the size of the losses otherwise incurred. Strategies focusing exclusively on the Draw outcome have also been put forward (Archontakis and Osborne [4, 2007]).

Refining Value Betting, [12] employed different betting strategies. The focal idea is still to form predictions other than the bookmakers' ones, but the investment scheme relies on the intensity of the discrepancy between the two sets of probabilities. Additionally, the number of outcomes to bet on in each match instance varies from 1 to 3 (the full array of possibilities). To the best of our knowledge, this is the only case of multiple betting in the literature referenced here. Except for [12] and [4] (but with a different rationale from what follows) and [45], with a seemingly Kelly flavour, the applications covered deal extensively with the need to choose which event to bet on, but scarcely with the amount of wealth to punt. For the latter, the best known application resorts to the seminal paper by Kelly [37, 1956]. As will be argued in the following section, by assuming a logarithmic utility function (which captures risk aversion on the bettors' side) we can find the ideal bet size as the output of an optimization exercise.

As explained below, we attempt to build betting strategies with the twofold aim of reducing the occurrence of bankroll ruin in a sequential investment frame and exploiting experts' views in the form of bookmakers' implicit probabilities. We partially aim to reply to the challenge in [45], a cornerstone of the betting literature: "Moreover, there are systematic differences in the apparent odds-setting processes employed by the firms which suggest that pooling of information contained in the odds will lead to more efficient forecasts. However, these superior forecasts do not appear to be translatable into betting strategies that generate post-tax profits [pag.339]". Orthogonally to the mainstream applications, our goal consists in allowing any punter to earn profits without any knowledge of the underlying event by simultaneously betting on each outcome. In order to do so, our focus inevitably turns to the online portfolio selection literature. The next section provides a background on the subject.

3.2 An information theory set-up

Information theory and gambling are two strongly tied fields, the outset of this relationship being the work by Kelly [37]. Portfolio Selection theory deserves a mention as the trait d'union between the two subjects, at least for our ends. As outlined by Hoi and Li [41, 2014], the two major schools for investigating the distribution of wealth among assets are Mean Variance Theory and Capital Growth Theory. The latter fully draws from information theory and originates from the above-mentioned work by Kelly. It focuses on sequential portfolio selection and aims to maximize the portfolio's expected growth rate.

As for our setup and aims, we lean on it and try to contextualize it to our case study. Capital growth theory is mainly characterized by the logarithmic shape of the objective function, and sequential investment is multiplicative, due to the compounding nature of the process (Hakansson and Ziemba [31, 1995]), even if we will also outline an additive investment scheme.

Kelly recognizes the gambling framework as a communication channel, whose symbols represent the outcomes of a chance event.¹ The receiver may profit from any knowledge of the input symbol (exploiting insider information) or from a more accurate estimate of the underlying probabilities. The author then provides a rather simple betting rule through which a bettor receiving a determined symbol, and punting accordingly, maximizes his/her capital growth rate.² Basically nature transmits a symbol, which is the realized outcome, while what is received consists of any sort of predictions (in our case experts' views), formed before the realizations become common knowledge. The optimal fraction of capital to bet on the occurrence of the transmitted symbol, after observing the received symbol, is equal to the conditional probability of the transmitted symbol given the received symbol. In the sections to come we will outline our usage of such a criterion.

Following this fil rouge, various applications spread, directly or indirectly related. We focus on a Portfolio Theory strategy with established roots in information theory as main reference and we will attempt to adapt it to the specific betting context.

¹ See Shannon [47, 1948] for a detailed explanation of transmission systems.

3.2.1 Universal Portfolio

Cover [13, 1991] developed an algorithm capable of outperforming the best stock in the market and of asymptotically tracking, to first order in the exponent, the performance of the best constantly rebalanced portfolio in hindsight. Such an idea is synthetically conveyed as universality, hence the name Universal Portfolio (UP). The underlying concept acknowledges the arbitrary nature of the sequences of prices and the low learning power, stricto sensu, about the continuation of the sequence. Consequently its major strength lies in not formulating any assumptions whatsoever on the statistical distribution of the future stock prices, if any.

Before entering into more detail, some concepts and notation are due. We define $x = (x_1, x_2, \dots, x_m)^t \ge 0$ as a stock market vector for one investment period, $x_j$ referring to the price relative (the ratio of closing to opening price) of the $j$th stock. We invest our entire wealth according to the weights $w_j \ge 0$, $\sum_{j=1}^{m} w_j = 1$. Such weights define the portfolio $w_i = (w_1, w_2, \dots, w_m)_i$, and $S_n(w) = \prod_{i=1}^{n} w_i x_i$ represents the wealth achieved after $n$ investment periods.

The simplest investment strategy, Buy-and-Hold, distributes our wealth in advance in fixed proportions among $m$ stocks, without trading any further in the periods to come. The wealth achieved then reads:

$$S_n(Q, x^n) = \sum_{j=1}^{m} Q_{j,1} \prod_{i=1}^{n} x_{j,i} \qquad (3.1)$$

where $Q_j$ represents the distribution we rely on. In a way then, we do not trade anymore after choosing the stocks in the first period, so that $w_i$ (intended as portion of portfolio wealth) may well vary, and extremely likely it does, throughout the investment span.

More refined strategies refer to the class of constantly rebalanced portfolios (CRP). In such a context the weights defined above remain constant throughout time, and active trading ensures this to be the case. A CRP, if the price relatives are independent and identically distributed, achieves the optimal growth rate of capital ([14]). The resulting wealth is computed as:

$$S_n(w) = \prod_{i=1}^{n} w_i x_i \qquad (3.2)$$
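To make the difference between equations 3.1 and 3.2 concrete, the following is a minimal sketch. The two-asset market (cash plus a stock that alternately doubles and halves) and the function names are hypothetical and serve only as an illustration; they are not part of the empirical exercise.

```python
from math import prod

def buy_and_hold_wealth(q, price_relatives):
    """Eq. (3.1): sum over stocks j of q_j * prod_i x_{j,i}."""
    return sum(q_j * prod(x[j] for x in price_relatives)
               for j, q_j in enumerate(q))

def crp_wealth(w, price_relatives):
    """Eq. (3.2) with weights held constant: prod_i (w . x_i)."""
    return prod(sum(w_j * x_j for w_j, x_j in zip(w, x))
                for x in price_relatives)

# Hypothetical market: asset 0 is cash, asset 1 alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 5   # ten investment periods
uniform = (0.5, 0.5)

bah = buy_and_hold_wealth(uniform, market)
crp = crp_wealth(uniform, market)
print(f"buy-and-hold: {bah:.4f}, constantly rebalanced: {crp:.4f}")
```

In this toy sequence neither asset gains anything over the ten periods, so buy-and-hold ends where it started, while the rebalanced portfolio grows by a factor 1.125 every two periods.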

Universal portfolio, as already sketched, aims to track the performance of the best portfolio in the latter class (BCRP), which we will largely deal with. In what follows, with $Q$ we refer to the class of all constantly rebalanced portfolios and with $W = (W_1, \dots, W_m) \in D$ to the vector defining each component of the class. Cover's algorithm defines the following strategy:

$$P_{j,i}(x^{i-1}) = \frac{\int_D W_j\, S_{i-1}(W, x^{i-1})\, \mu(W)\, dW}{\int_D S_{i-1}(W, x^{i-1})\, \mu(W)\, dW}, \qquad j = 1, \dots, m, \quad i = 1, \dots, n \qquad (3.3)$$


where $\mu$ could either be the uniform density or the Dirichlet distribution.³ As Cesa-Bianchi and Lugosi [9, 2006] put it, the universal portfolio is just an average of the wealth achieved by the individual strategies in the (constantly rebalanced portfolios) class:

$$S_n(P, x^n) = \int_D S_n(W, x^n)\, \mu(W)\, dW$$

where $D$ is the probability simplex in $\mathbb{R}^m$. By means of a Riemann approximation we obtain:

$$S_n(P, x^n) \approx \sum_i Q_i\, S_n(W_i, x^n) \qquad (3.4)$$

With this formulation it is easy to see that the universal portfolio acts as a buy-and-hold strategy over all constantly rebalanced portfolios.⁴

Hence, in analogy with the discrete time example we will introduce later, but with a different approach, we define the UP weighting process as:

$$\hat w_{i+1} = \frac{\int_D w\, S_i(w)\, d\mu(w)}{\int_D S_i(w)\, d\mu(w)}$$

In mathematical terms, universality as outlined above reads:

$$\lim_{n \to \infty} \sup_{x^n, y^n} \frac{1}{n} \log \frac{S_n^*(x^n \mid y^n)}{S_n(x^n \mid y^n)} = 0 \qquad (3.5)$$

where $S_n^*(x^n \mid y^n)$ corresponds to the wealth achieved by the best constantly rebalanced portfolio. To link this idea to the simplest among our purposes we refer to [14], as the next section explains.
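The Riemann approximation in equation 3.4 can be sketched by replacing the integral over the simplex with a finite grid of constantly rebalanced portfolios, each given the same weight (a discretised uniform $\mu$). The grid resolution `steps`, the function names and the toy market are assumptions made for illustration only.

```python
def simplex_grid(m, steps):
    """Weight vectors with m components, multiples of 1/steps, summing to 1."""
    def compositions(parts, total):
        if parts == 1:
            yield (total,)
            return
        for k in range(total + 1):
            for rest in compositions(parts - 1, total - k):
                yield (k,) + rest
    return [tuple(k / steps for k in c) for c in compositions(m, steps)]

def universal_portfolio(price_relatives, steps=20):
    """Grid approximation of Cover's universal portfolio (uniform weighting)."""
    m = len(price_relatives[0])
    grid = simplex_grid(m, steps)
    crp_wealth = [1.0] * len(grid)      # S_0(W) = 1 for every grid portfolio W
    up_wealth = 1.0
    weights_history = []
    for x in price_relatives:
        norm = sum(crp_wealth)
        # Performance-weighted average of the grid portfolios.
        w_hat = [sum(s * W[j] for s, W in zip(crp_wealth, grid)) / norm
                 for j in range(m)]
        weights_history.append(w_hat)
        up_wealth *= sum(w_j * x_j for w_j, x_j in zip(w_hat, x))
        crp_wealth = [s * sum(W_j * x_j for W_j, x_j in zip(W, x))
                      for s, W in zip(crp_wealth, grid)]
    mean_crp_wealth = sum(crp_wealth) / len(grid)
    return weights_history, up_wealth, mean_crp_wealth

# Hypothetical market: cash and a stock that alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 5
weights_history, up_wealth, mean_crp_wealth = universal_portfolio(market, steps=10)
print(up_wealth, mean_crp_wealth)
```

On the grid the wealth of the universal portfolio coincides with the plain average of the wealths of the grid portfolios, which is the buy-and-hold reading of the strategy.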

3.2.2 Universal Portfolio with experts

We could incorporate experts' opinions in the frame discussed above and still achieve universality. Cover and Ordentlich [14, 1996] show it for the discrete case, on which we are now focusing, as well as for the previous continuous case.

Let us consider a class of $B = (1, \dots, b)$ experts and let $w_i^b$ be the portfolio recommendation of expert $b$ on day $i$, with $i = 1, \dots, n$. We denote by $S_n^{(b)}$ the wealth factor achieved by the $b$th expert's strategy. By allocating wealth $\frac{1}{b}$ per expert and investing according to each sequence of portfolio selections, we would accrue:

$$\hat S_n = \frac{1}{b} \sum_{b=1}^{B} S_n^{(b)}$$

which attains at least $\frac{1}{b} \max_b S_n^{(b)}$ for every sequence $x_1, \dots, x_n$ of price relatives.

³ In other words Dirichlet$(1, \dots, 1)$ or Dirichlet$(\frac{1}{2}, \dots, \frac{1}{2})$.

⁴ For a technical analysis and a detailed explanation, see [9].

By the same reasoning as in equation 3.5 this scheme is universal for the class of $B$ expert strategies. Abandoning the idea of uniform distribution, the consequent portfolio $\hat w_i$ at time $i$ reads:

$$\hat w_i = \frac{\sum_{b=1}^{B} S_{i-1}^{(b)}\, w_i^b}{\sum_{b=1}^{B} S_{i-1}^{(b)}}$$

Hence at each time the portfolio choice results in a performance-weighted average of the portfolio allocations of the $B$ experts. This implementation and, more generally, Universal portfolio selection, belongs to the Follow the Winner online portfolio selection class, as classified by [41]. The basic idea is to weigh past winners more prominently when deciding how to distribute our wealth. In a way, the stock or the expert performing better is assumed to keep the trend also in the near future, thus implying momentum. Clearly, this approach would pay off handsomely during market rallies.
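The performance-weighted average can be sketched as follows. The two experts, their recommendations and the price relatives are hypothetical; the price relatives are written as an odds-scaled vector with a single non-zero entry, anticipating the betting application. The sketch also checks numerically that the wealth accrued equals the average of the experts' wealths.

```python
def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))

def follow_the_experts(recommendations, price_relatives):
    """recommendations[i][b] is the portfolio advised by expert b for period i."""
    n_experts = len(recommendations[0])
    expert_wealth = [1.0] * n_experts        # S_0^(b) = 1 for every expert
    wealth = 1.0
    portfolios = []
    for recs, x in zip(recommendations, price_relatives):
        total = sum(expert_wealth)
        # Mixture of the experts' portfolios, weighted by past wealth.
        mix = [sum(s * w[j] for s, w in zip(expert_wealth, recs)) / total
               for j in range(len(x))]
        portfolios.append(mix)
        wealth *= dot(mix, x)
        expert_wealth = [s * dot(w, x) for s, w in zip(expert_wealth, recs)]
    return portfolios, wealth, expert_wealth

# Hypothetical data: two experts repeating the same advice on (home, draw, away).
expert_a = (0.5, 0.3, 0.2)
expert_b = (0.4, 0.3, 0.3)
recommendations = [[expert_a, expert_b]] * 3
# Only the realized outcome pays, at (hypothetical) decimal odds.
price_relatives = [(2.0, 0.0, 0.0), (0.0, 3.5, 0.0), (2.2, 0.0, 0.0)]

portfolios, wealth, expert_wealth = follow_the_experts(recommendations, price_relatives)
print(portfolios[-1], wealth, expert_wealth)
```

After the first home win expert A has accumulated more wealth than expert B, so the later mixtures tilt towards A's recommendation.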

The approach of wealth maximization via repeated investment is strongly tied to sequential prediction aimed at minimizing logarithmic loss. The next section is devoted to it.

3.2.3 Logarithmic Loss

To introduce the subject, [9], using analogies with probability distributions, define expert and forecaster respectively as:

$$f_n(y^n) = \prod_{i=1}^{n} f_i(y_i \mid y^{i-1}), \qquad \hat p_n(y^n) = \prod_{i=1}^{n} \hat p_i(y_i \mid y^{i-1})$$

for every $n \ge 1$ and outcome $y^n \in Y^n$, where $f$ and $\hat p$ are sequences of functions, respectively $f_i, \hat p_i : Y^{i-1} \to D$. Clearly then, the past matters in the decision stage.

The analogy with probability distributions is then grasped by noting that:

$$\sum_{y^n \in Y^n} f_n(y^n) = \sum_{y^n \in Y^n} \hat p_n(y^n) = 1$$

Then for a sequence of realized outcomes $y^{*n} = (y^*_1, y^*_2, \dots, y^*_n)$ we define the cumulative loss function of expert and forecaster as:

$$L_f(y^{*n}) = \sum_{i=1}^{n} \ln \frac{1}{f_i(y^*_i \mid y^{*\,i-1})} = -\ln f_n(y^{*n}), \qquad \hat L(y^{*n}) = \sum_{i=1}^{n} \ln \frac{1}{\hat p_i(y^*_i \mid y^{*\,i-1})} = -\ln \hat p_n(y^{*n})$$

which basically states that the cumulative loss is nothing else than the negative log likelihood assigned to the outcome sequence by the agents.

The approach aims to minimize the difference between the cumulative loss of the forecaster and that of the best expert, which the authors refer to as regret:

$$\hat L(y^{*n}) - \inf_{f \in F} L_f(y^{*n}) = \sup_{f \in F} \ln \frac{f_n(y^{*n})}{\hat p_n(y^{*n})} \qquad (3.6)$$

To link the minimization of the logarithmic loss to the sequential investment outlined in the previous sections the following holds:


where $x_1, \dots, x_n$ are Kelly market vectors⁵ and by which $f$ and $\hat p$ are induced by the respective investment strategies $\Phi$ and $\Theta$.

Hence the regret that a forecaster may experience by not following the strategy of an expert or, as in our usage, a strategy based on the expert's views, is defined as:

$$\hat L_n - L_{f,n} = \ln \frac{f_n(y^{*n})}{\hat p_n(y^{*n})} = \ln \frac{\Phi(y^{*n})}{\Theta(y^{*n})}$$

and equation 3.6 applies to define the minimization of the regret also in the specific case at stake. In our application the difference between forecaster and expert gets pretty blurred, as will become clear in the sections to come.
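The cumulative logarithmic loss and the regret of equation 3.6 can be sketched as below. The outcome labels, the two constant experts and the uniform forecaster are hypothetical and chosen only to keep the arithmetic transparent.

```python
from math import log

def cumulative_log_loss(probs, outcomes):
    """Negative log likelihood assigned to the realized outcome sequence."""
    return -sum(log(p[y]) for p, y in zip(probs, outcomes))

def regret(forecaster, experts, outcomes):
    """Loss of the forecaster minus the loss of the best expert, eq. (3.6)."""
    best = min(cumulative_log_loss(e, outcomes) for e in experts)
    return cumulative_log_loss(forecaster, outcomes) - best

# Hypothetical sequence of realized outcomes: home, draw, home, away.
outcomes = ["h", "d", "h", "a"]
n = len(outcomes)
expert_a = [{"h": 0.5, "d": 0.3, "a": 0.2}] * n
expert_b = [{"h": 0.4, "d": 0.3, "a": 0.3}] * n
uniform = [{"h": 1 / 3, "d": 1 / 3, "a": 1 / 3}] * n

experts = [expert_a, expert_b]
uniform_regret = regret(uniform, experts, outcomes)
print(f"regret of the uniform forecaster: {uniform_regret:.4f}")
```

A forecaster who simply copies the best expert has zero regret, while the uniform forecaster pays a positive one on this sequence.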

3.2.4 Gambling

A further step to tune what has been exposed to our case foresees the forecaster and the expert investing consistently with the formed predictions in a gambling environment. In other words, their entire wealth is always distributed among a finite set of mutually exclusive outcomes, and bet sequentially.

Given the outcomes sequence and the assigned probabilities $(f, \hat p)$ already defined (interchangeably we could refer to $\Phi$ and $\Theta$), the capital of both agents after the $n$th event reads:

$$C \prod_{i=1}^{n} \hat p_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i), \qquad C \prod_{i=1}^{n} f_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i) \qquad (3.7)$$

where $C$ stands for the initial capital and

$$\hat p_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i) = \sum_{y \in Y} I_{y = y^*_i}\, \hat p_i(y)\, o_i(y)$$

multiplies the capital by the odds and probability of the realized outcome (the same holds for $f_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i)$).

Again our goal is to compare the forecaster and best expert performances via:

$$\sup_{f \in F} \frac{C \prod_{i=1}^{n} f_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i)}{C \prod_{i=1}^{n} \hat p_i(y^*_i \mid y^{*\,i-1})\, o_i(y^*_i)} = \sup_{f \in F} \frac{f_n(y^{*n})}{\hat p_n(y^{*n})}$$

The latter is independent of the odds posted⁶ and its logarithm corresponds to the cumulative regret as outlined earlier.
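The fact that the ratio of capitals does not depend on the odds can be checked numerically. In the sketch below the odds are assumed to be decimal odds (stake included), and all probabilities, odds and outcomes are hypothetical.

```python
from math import isclose, log

def capital(probs, odds, outcomes, initial=1.0):
    """Eq. (3.7): capital after betting the whole wealth according to probs."""
    c = initial
    for p, o, y in zip(probs, odds, outcomes):
        c *= p[y] * o[y]    # only the stake on the realized outcome pays
    return c

outcomes = ["h", "d", "h"]
expert = [{"h": 0.5, "d": 0.3, "a": 0.2}] * 3
forecaster = [{"h": 0.4, "d": 0.3, "a": 0.3}] * 3

# Two hypothetical sets of decimal odds for the same three matches.
odds_1 = [{"h": 2.0, "d": 3.4, "a": 3.8}] * 3
odds_2 = [{"h": 1.6, "d": 3.9, "a": 5.5}] * 3

ratio_1 = capital(expert, odds_1, outcomes) / capital(forecaster, odds_1, outcomes)
ratio_2 = capital(expert, odds_2, outcomes) / capital(forecaster, odds_2, outcomes)

# Difference of cumulative log losses, forecaster minus expert.
loss_gap = sum(log(e[y]) - log(f[y]) for e, f, y in zip(expert, forecaster, outcomes))
print(ratio_1, ratio_2, log(ratio_1), loss_gap)
```

Both ratios coincide, and their logarithm equals the difference between the cumulative logarithmic losses of the forecaster and the expert.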

In this vein we will consider bookmakers as experts, aiming to establish whether pooling could provide better forecasts than individuals in the football betting context and whether experts' views could be translated into profitable strategies in a portfolio selection fashion.⁷

We will then base a strategy on the outcomes of the portfolio selection and compare it to other simple online alternatives and benchmarks. The next chapter defines our exercise.

⁵ Vectors with the $i$th component equal to 1 and 0 elsewhere.

⁶ As we will argue later, in our application we refer to a unique set of odds and every strategy relies on it.

⁷ See Clemen and Winkler [10, 1986] and Fomby and Samant [24, 1991] on the benefits of


3.2.5 Kelly criterion

As outlined earlier, Kelly aligns the gambling environment to a communication system. The purpose of his work lies in finding the optimal bet size which guarantees the maximal growth rate of wealth. For our ends, we refer to the implementation in [5]. We assume a logarithmic utility function, whose expected value is as follows:

$$U_i = p_{y,i} \log\big(S_i + f_i\, o(i)\big) + (1 - p_{y,i}) \log(S_i - f_i)$$

where $S_i$ refers to the wealth level, $f_i$ to the amount bet, $o(i)$ to the odds at which the bets are placed (as stated before, $o(i) = \max c(i)$) and $p_{y,i}$ to the Shin probabilities for the $y$th outcome.

The optimization exercise outputs the Kelly or optimal stake as a percentage of wealth:

$$\frac{f_i}{S_i} = \frac{p_{y,i}\, o(i) - (1 - p_{y,i})}{o(i)}$$

where obviously $f_i \le S_i$, and $f_i = S_i$ if and only if we consider insider bettors (according to the Shin model, the $z$ portion of the bettors' community), thus implying $p_{y,i} = 1$.
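The stake formula can be verified against the utility it is derived from. In the sketch below `net_odds` is read as the net gain per unit staked, which is what the term $S_i + f_i\, o(i)$ implies; if decimal odds are quoted, one is subtracted first. This reading, the function names and the numbers are assumptions made for illustration.

```python
from math import log

def expected_log_utility(stake, wealth, p, net_odds):
    """U_i = p log(S + f o) + (1 - p) log(S - f)."""
    return p * log(wealth + stake * net_odds) + (1 - p) * log(wealth - stake)

def kelly_fraction(p, net_odds):
    """Optimal stake as a share of wealth: (p o - (1 - p)) / o."""
    return (p * net_odds - (1 - p)) / net_odds

# Hypothetical bet: probability 0.45 on an outcome quoted at decimal odds 2.5.
p, decimal_odds, wealth = 0.45, 2.5, 100.0
net_odds = decimal_odds - 1.0
fraction = kelly_fraction(p, net_odds)
stake = fraction * wealth

# The utility at the Kelly stake should not be beaten by nearby stakes.
u_star = expected_log_utility(stake, wealth, p, net_odds)
u_low = expected_log_utility(stake - 1.0, wealth, p, net_odds)
u_high = expected_log_utility(stake + 1.0, wealth, p, net_odds)
print(f"Kelly fraction: {fraction:.4f}")
```

A negative fraction signals that the bet has negative expected value at the assumed probability, in which case no stake would be placed.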

3.3 Online Portfolio Selection

Universal Portfolio is not exempt from criticism though. One critique stems from Samuelson and actually refers to Capital growth theory. He blames the theory for relying too heavily on the choice of the logarithmic utility function, which he argues is not always the best way to characterize agents, and for leading to poor asset diversification. Without going into details, the picture is clarified in Ziemba [59, 2015], where an interesting discussion is reported and both Samuelson's objections and Ziemba's replies are outlined. With partially the same arguments with respect to Bernoulli's log utility function, but not directly pointing to Kelly's findings, Kahneman and Tversky's Prospect Theory is also worth mentioning.

The more significant flaw, by Cover's own admission as well, consists in poor early growth. The point is clearly made in O'Sullivan and Edelman [43, 2015], and a solution is proposed. Although the UP algorithm seems to guarantee sensational asymptotic results, it needs a great number of days to recognize the dominant rebalanced portfolio. The authors then lay out an Adaptive variation of Cover's work with the introduction of a function $\alpha_i$, in the effort to weigh more prominently portfolios which have performed better up to time $i$. We do not go into great detail on this strategy, except for describing the portfolio weighting process:

$$\hat w_{i+1} = \frac{\int_D \alpha_i(w)\, w\, S_i(w)\, dw}{\int_D \alpha_i(w)\, S_i(w)\, dw}$$

Yet, still drawing from the Follow the Winner class in the OLPS literature, we delve into a parallel approach to Cover's. The focus is on the Successive constant rebalanced portfolio (SCRP) and the Weighted successive constant rebalanced portfolio (WSCRP), developed by Gaivoronski and Stella [29, 2000] and coinciding with Universal Portfolio exclusively in terms of ultimate goal and target.


As for UP, the only available and used information regards the stocks' price relatives up to the decision time. Also in this case, BCRP represents the benchmark against which to compare performances. Differently from UP, the approach proposed in [29] is computationally easier and thus not exposed to the so-called curse of dimensionality arising with integration over a fairly high number of stocks. The SCRP algorithm draws from methodologies developed for the solution of nonstationary optimization problems. In fact the market is not considered to be stationary and only the boundedness of the series of price relatives is required. The underlying idea is to invest in each period $i$ with the weights corresponding to the BCRP up to time $i - 1$. Also here wealth is measured as:

$$S_n(w) = \prod_{i=1}^{n} w_i x_i$$

On the first trading day the portfolio distributes wealth equally among all assets, $w_1 = (\frac{1}{m}, \dots, \frac{1}{m})$, while from the second day on the portfolio is built as the outcome of the following optimization:

$$\max_{w \in W} F_{i-1}(w) = \max_{w \in W} \frac{1}{i-1} \sum_{k=1}^{i-1} f_k(w) \qquad (3.8)$$

where $i = 1, \dots, n$ represents the decision time, $f_k(w) = \log(w^T x_k)$ and $W = \{w : w_j \ge 0, \sum_{j=1}^{m} w_j = 1\}$.

In order to cope with scarce data the authors introduced WSCRP, which smooths the selection process in order to avoid that new data releases could dramatically change the portfolio composition. In other words the aim is to prevent outliers or very short term market situations from having a substantial long run weight. Such a smoothing is implemented via a linear combination between the previous portfolio and the new one (output of the same maximization as for SCRP). In practical terms, the starting portfolio composition corresponds to the uniform weighting $w_1 = (\frac{1}{m}, \dots, \frac{1}{m})$, while for $i = 2, \dots, n$ wealth is distributed according to:

$$w_i = \gamma\, w_{i-1} + (1 - \gamma)\, v_i$$

where $\gamma \in (0, 1)$ represents the weighting parameter, $w_{i-1}$ stands for the previous period portfolio and $v_i = \arg\max_{v \in W} F_{i-1}(v)$ is defined in the same manner as in
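Both updates can be sketched in a few lines. The inner maximization of equation 3.8 is solved here with Cover's multiplicative update for log-optimal portfolios, which is a swapped-in solver chosen for compactness and not necessarily the procedure used in [29]; the function names, the iteration count and the toy price relatives are likewise assumptions. Setting `gamma=0` recovers SCRP as the limiting case of WSCRP.

```python
def log_optimal_crp(history, iters=500):
    """Maximise the average of log(w . x_k) over the simplex.

    Uses the multiplicative update w_j <- w_j * mean_k(x_kj / (w . x_k)),
    which keeps the weights on the simplex at every step.
    """
    m = len(history[0])
    w = [1.0 / m] * m
    for _ in range(iters):
        scale = [0.0] * m
        for x in history:
            denom = sum(w_j * x_j for w_j, x_j in zip(w, x))
            for j in range(m):
                scale[j] += x[j] / denom
        w = [w_j * s / len(history) for w_j, s in zip(w, scale)]
    return w

def successive_crp(price_relatives, gamma=0.0):
    """SCRP (gamma = 0) or WSCRP (0 < gamma < 1) weights for each period."""
    m = len(price_relatives[0])
    weights = [[1.0 / m] * m]                     # uniform start, w_1
    for i in range(1, len(price_relatives)):
        v = log_optimal_crp(price_relatives[:i])  # BCRP up to the previous period
        prev = weights[-1]
        weights.append([gamma * p + (1 - gamma) * v_j for p, v_j in zip(prev, v)])
    return weights

# Hypothetical price relatives for cash and one risky asset.
market = [(1.0, 3.0), (1.0, 0.5), (1.0, 0.5), (1.0, 3.0), (1.0, 0.5), (1.0, 0.5)]
best = log_optimal_crp(market[:3])
scrp = successive_crp(market, gamma=0.0)
wscrp = successive_crp(market, gamma=0.5)
print(best, scrp[1], wscrp[1])
```

After a single observation SCRP jumps almost entirely into the asset that has just performed best, whereas WSCRP with $\gamma = 0.5$ moves only halfway from the uniform start, which is the smoothing effect described above.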

Chapter 4

An application to Football Betting Market.

4.1 Dataset and Rationale

The empirical analysis is carried out on data referring to the four major English Football Leagues (Premier League, Championship, League 1, League 2) for a total of 2033 match instances. The historical odds, downloaded from www.football-data.co.uk, regard the season 2010-2011. The odds collected refer to Friday's data as far as weekend matches are concerned and to Tuesday's for weekday matches. Table 4.1 shows the bookmakers sample and the relative coding.

| Bet365 | BetandWin | Gamebookers | Ladbrokes | SportingBet | WilliamHill | StanJames | VCBet | BlueSquare |
|---|---|---|---|---|---|---|---|---|
| b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | b9 |

Table 4.1: Bookmakers

The choice to cover English leagues vertically stems from a twofold rationale. First, betting in England is a very lively activity. Hence we assume the odds to reflect a more active and perhaps skilled demand. Second, differentiating across leagues with monotonically decreasing appeal could, in principle, shed light on the behavior of bookmakers. The season analyzed, instead, represents a cherry pick, as an almost complete set of odds is available for it.

| League | obs. | Home win | Draw | Away win |
|---|---|---|---|---|
| Premier League | 379 | 47% | 29% | 24% |
| Championship | 552 | 44.6% | 26.8% | 28.6% |
| League 1 | 551 | 45% | 25% | 30% |
| League 2 | 551 | 41% | 30% | 29% |

Table 4.2: Ex-post realizations


CHAPTER 4. AN APPLICATION TO FOOTBALL BETTING MARKET. 4.2. EMPIRICAL SET-UP

4.2 Empirical set-up

Before introducing the exercise undertaken, we outline a guide on how to read the graphs that follow. When they refer to all leagues, the order is:

Premier

League 1

Championship

League 2

Whereas when all the bookmakers, or ETFs in the practical application, are considered the following legend holds:

b1 b2 b3 b4 b5 b6 b7 b8 b9

Table 4.3: Legend

Despite the already listed similarities with financial markets, bets are obviously not comparable to assets. Bets last one period and are contingent on the realization of the event. Hence there is no exposure whatsoever to inventory risk and no possibility to trade.¹ Still, the idea of constant rebalanced portfolio may be applied, provided that the absence of trading activity makes any distinction with the concept of buy and hold meaningless. In fact we could in principle maintain fixed proportions of our wealth bet on the three outcomes, regardless of the underlying events.

Even more so, the assumption of independence is compelling in such a context, as every match is played by different teams and under different conditions. Other than this, we assume that league-wise matches are not played contemporaneously, and what Thorp [56, 1997] defines as divisibility of capital. The former assumption guarantees a fairly wide dataset to investigate. With the latter, instead, we allow betting portions of wealth other than integers in the sequential investment context. Both assumptions could be overcome at the expense of greater computational costs, but with expected gains in terms of research quality and precision. Bets are by definition independently distributed and we do not encounter transaction costs. In our view bookmakers take a position in the market mainly to exploit the biased demand they interact with. Implicitly then, they assign a probability other than the one implied by a mere interest in balancing the book. Each bookmaker sets prices independently, at least as far as the model used is concerned,² and thus the study of the underlying probabilities must be conducted idiosyncratically.

The reference is to the simplest market: Home win, Draw and Away win.

Therefore, each period we define the outcomes set to be $y = \{h, d, a\}$ and the outcome sequence $y^n = (y_1, \dots, y_n)$, where $n$ is the number of matches in a season.

As for the Kelly set-up, we consider the implicit probabilities as the signal received from each bookmaker and in every period we build a portfolio distributing wealth on each outcome consistently: $w^b_{j,i} = p^b_{j,i}$ (in line with the idea sketched in 3.2.3 and given that $\sum_{j=1}^{3} p_j = 1$).

We assume the nine bookmakers to be the only dealers and every bet to always be placed at the best odds in the market,³ regardless of the bookmaker on whose probabilities the investment strategy is based. Hence:

$$o(i) = \max c^b(i)$$

where $c^b(i) = (c^{b_1}(i), \dots, c^{b_9}(i))$ is the set of odds for the $i$th match.

Given the evident difficulties in considering bets as assets, we introduce a fiction enabling us to apply elements from the portfolio selection literature to the context of interest. Specifically we consider nine ETFs whose portfolio managers invest in every match according to the probabilities implied by each bookmaker and at the odds $o(i)$. Accordingly, after the conclusion of every event, wealth gets multiplied as in equation 3.7. Hence we obtain the analogue of an ETF market price vector $x_i = (x_{1,i}, x_{2,i}, \dots, x_{9,i})^t$ for every investment period (match instance), where $x_{m,i}$, with $m = 1, \dots, 9$, captures the wealth dynamics of the $m$th ETF after the $i$th match. Therefore, by means of this construction we are conditioning the investment scheme fully on the expert advice. In the spirit of the work we deliberately omit any subjective consideration about the teams playing or the particular contingent states. The average bettor is in fact assumed to be always endowed with an inferior information set and hence any allocation other than the ETFs' ones is sub-optimal. Evidently, we are not interested in insiders or skilled bettors ($z$), since they would not rely on similar strategies.
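The construction of the ETF price vector for a single match can be sketched as follows. The sketch assumes decimal odds, takes the best odds outcome by outcome across bookmakers, and uses two bookmakers with made-up probabilities in place of the nine Shin-implied ones; all names and numbers are hypothetical.

```python
OUTCOMES = ("h", "d", "a")

def best_odds(odds_by_bookmaker):
    """o(i): for each outcome, the highest odds posted by any bookmaker."""
    return {y: max(book[y] for book in odds_by_bookmaker) for y in OUTCOMES}

def etf_price_relatives(probs_by_bookmaker, odds_by_bookmaker, realized):
    """x_{m,i}: wealth multiplier of the ETF following bookmaker m in match i."""
    o = best_odds(odds_by_bookmaker)
    # Each ETF stakes p(y) on every outcome; only the realized one pays.
    return [p[realized] * o[realized] for p in probs_by_bookmaker]

# Hypothetical match quoted by two bookmakers.
odds = [{"h": 2.0, "d": 3.4, "a": 3.8},
        {"h": 2.1, "d": 3.3, "a": 3.6}]
probs = [{"h": 0.48, "d": 0.27, "a": 0.25},
         {"h": 0.46, "d": 0.28, "a": 0.26}]

relatives = etf_price_relatives(probs, odds, realized="d")
print(best_odds(odds), relatives)
```

Stacking these vectors match after match yields the sequence of price relatives on which the portfolio selection rules of the previous chapter can be run.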

First we consider the computationally lighter solution outlined in subsection 3.2.2, which consists in a mere weighted average between the ETFs, where the cumulated wealth achieved by each of them up to the decision time acts as discriminant. The output of the process provides the weights applied to the 3-dimensional outcomes set for each match. We then sequentially bet according to Cover weights and compare the results both with other online rebalanced portfolios (either constantly or not) and with in-hindsight benchmarks. The purpose is both to prove the efficacy of relying on experts' views in online selection and to build profitable investment schemes.

As for the peer alternatives, we refer to:

• Naïve strategy with equal portfolio weight for each match in the season: $w = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$

• Ex-post strategy with previous season's (ps) final outcomes proportions: $w = (w^{ps}_{Home}, w^{ps}_{Draw}, w^{ps}_{Away})$

• Bayesian strategy with weights computed via Bayesian probabilities updating (up): $w_i = (w^{up}_{i,Home}, w^{up}_{i,Draw}, w^{up}_{i,Away})$

• Average strategy with weights computed averaging (av) among bookmakers, with equal weights: $w_i = (w^{av}_{i,Home}, w^{av}_{i,Draw}, w^{av}_{i,Away})$

² By the reasoning in section 4.3.

The benchmarks with which we compare our strategy are instead:

• Best Constant Rebalanced Portfolio, based on the idea sketched in section 3.2 and directly computed on the matrix of the realized events' odds. Hence such a scheme corresponds to the best allocation that an agent who perfectly foresees the realizations could put forward if forced to distribute his/her wealth in fixed proportions.

• Best stock, which in such a context refers to the expert whose forecasts led to the maximal profit or, interchangeably, to the minimal logarithmic loss.

• Oracle, which refers to the unlikely case where we are always in the position to spot the expert assigning the highest probability to the realized outcome and base our portfolio choices accordingly.
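Two of the benchmarks above can be sketched directly. When exactly one outcome pays, the log wealth of a fixed allocation $w$ separates into $\sum_i \log w_{y^*_i} + \sum_i \log o_i(y^*_i)$, so the best fixed proportions in hindsight are the realized outcome frequencies, whatever the odds. The sketch relies on this observation rather than on the computation actually carried out on the odds matrix, and the data and function names are hypothetical.

```python
from collections import Counter
from math import log

def bcrp_weights(outcomes, labels=("h", "d", "a")):
    """Best fixed proportions in hindsight: the realized outcome frequencies."""
    counts = Counter(outcomes)
    return {y: counts[y] / len(outcomes) for y in labels}

def log_wealth(weights, odds, outcomes):
    """Log of the wealth multiplier of a fixed allocation at the given odds."""
    return sum(log(weights[y] * o[y]) for o, y in zip(odds, outcomes))

def best_expert(experts_probs, outcomes):
    """Index of the expert with the minimal cumulative logarithmic loss."""
    losses = [-sum(log(p[y]) for p, y in zip(e, outcomes)) for e in experts_probs]
    return min(range(len(losses)), key=losses.__getitem__)

# Hypothetical season of ten matches at constant (hypothetical) decimal odds.
outcomes = ["h", "h", "d", "a", "h", "d", "h", "a", "h", "d"]
odds = [{"h": 2.1, "d": 3.3, "a": 3.6}] * len(outcomes)
naive = {"h": 1 / 3, "d": 1 / 3, "a": 1 / 3}

w_star = bcrp_weights(outcomes)
experts = [[{"h": 0.5, "d": 0.3, "a": 0.2}] * 10,
           [{"h": 0.4, "d": 0.3, "a": 0.3}] * 10]
print(w_star, log_wealth(w_star, odds, outcomes), log_wealth(naive, odds, outcomes))
print("best expert:", best_expert(experts, outcomes))
```

The in-hindsight allocation dominates the naïve one on any sequence, and here the first expert, whose probabilities happen to match the realized frequencies, is the best stock.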

We then relax the assumption of divisibility of capital, switching to a milder investment approach. The latter foresees additive betting; in other words, for every match one unit is bet, regardless of the wealth achieved through the previous bets. This strategy is in line with the mainstream football betting literature. As a further comparison, we put forward pure Kelly strategies (as outlined in section 3.2.5). Consistently with the spirit of the work, namely to invest while being clueless of any fundamentals, we set our Kelly strategies to bet repeatedly on the same outcome for every match.

Afterwards, we switch to the more complex OLPS methodologies described in section 3.3 and compare the results with the BCRP computed on the set of nine ETFs. It represents the best fixed proportions in hindsight on the expert predictions.

4.3 Descriptive statistics

Before conducting individual analysis, we present the major features of the leagues in the attempt to draw comparisons.

First of all we need to justify the motive of the work itself. As a matter of fact, had the bookmakers posted the same odds, or were they equally good forecasters, the exercise would be meaningless.


As for the first point, we computed the highest-lowest odds ratio in the market for each event as $\tau = \frac{Odd_{max}}{Odd_{min}}$. Table 4.4 reports the mean results, where ratios considerably higher than unity hint at persistent differences in the odd-setting process.

| | Premier | Championship | League 1 | League 2 |
|---|---|---|---|---|
| $\bar\tau_{home}$ | 1.09 | 1.09 | 1.09 | 1.09 |
| $\bar\tau_{draw}$ | 1.1 | 1.1 | 1.1 | 1.1 |
| $\bar\tau_{away}$ | 1.16 | 1.15 | 1.15 | 1.14 |

Table 4.4: Highest-lowest odds ratios
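The ratio $\tau$ and its average over matches can be sketched as follows. The odds below are hypothetical and limited to three bookmakers and two matches; only the definition of $\tau$ is taken from the text.

```python
from statistics import mean

def highest_lowest_ratio(odds_across_bookmakers):
    """tau = highest odds / lowest odds posted on one outcome of one match."""
    return max(odds_across_bookmakers) / min(odds_across_bookmakers)

def mean_ratios(matches, outcomes=("h", "d", "a")):
    """Average tau per outcome over a list of matches."""
    return {y: mean(highest_lowest_ratio(match[y]) for match in matches)
            for y in outcomes}

# Hypothetical odds: for every match and outcome, one quote per bookmaker.
matches = [
    {"h": [2.0, 2.1, 2.2], "d": [3.2, 3.3, 3.4], "a": [3.5, 3.8, 4.0]},
    {"h": [1.5, 1.55, 1.6], "d": [3.8, 4.0, 4.0], "a": [6.0, 6.5, 7.5]},
]
tau = mean_ratios(matches)
print(tau)
```

In this made-up sample, as in Table 4.4, the widest spread is found on the Away win.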

Despite the poor information added by these averaged values, we notice that the ratios exceed unity by at least about 9%. The same pattern seems to hold for all the leagues, as Away win is characterized across them by wider bounds with respect to the other outcomes. This may reflect a risk-prone attitude by bookmakers towards the realization of this event. Since it is the least likely in the set, and since Draw is considered as a random variable ([19, 2001]), hence not an object of probability manipulation, such a feature could hint at an attempted exploitation of the biases on the bettors' side.

Further, we computed the odds correlation matrices league-wise and outcome-wise. The results highlight coefficients close to unity for Home and Away win odds, regardless of the specific league, while as far as Draw odds are concerned the degree of correlation decreases appreciably.

Therefore it appears that bookmakers share a general view on the market, and tune odds at the margin due to idiosyncratic valuations. The lower coefficients regarding Draw confirm the peculiarity in the handling of the event. For illustration purposes, Table 4.5 reports the draw odds correlation matrix for League 2.

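| | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | b9 |
|---|---|---|---|---|---|---|---|---|---|
| b1 | 1.000 | | | | | | | | |
| b2 | 0.753 | 1.000 | | | | | | | |
| b3 | 0.814 | 0.760 | 1.000 | | | | | | |
| b4 | 0.787 | 0.687 | 0.761 | 1.000 | | | | | |
| b5 | 0.816 | 0.756 | 0.824 | 0.775 | 1.000 | | | | |
| b6 | 0.784 | 0.710 | 0.727 | 0.689 | 0.713 | 1.000 | | | |
| b7 | 0.795 | 0.727 | 0.805 | 0.738 | 0.790 | 0.709 | 1.000 | | |
| b8 | 0.873 | 0.748 | 0.814 | 0.766 | 0.795 | 0.792 | 0.787 | 1.000 | |
| b9 | 0.808 | 0.711 | 0.791 | 0.777 | 0.810 | 0.737 | 0.774 | 0.804 | 1.000 |

Table 4.5: Draw odds correlation (League 2)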

To assess the differences, if any, in the accuracy of the sampled bookmakers we carry out the Friedman test (Friedman [28, 1937]) using the output of the RPS analysis ([53], [52]). We reject the null hypothesis of equal accuracy and Table 4.6 reports the relative p-values for each league. Thus, the different performances of the agents justify the philosophy behind our work.


| Premier League | Championship | League 1 | League 2 |
|---|---|---|---|
| $1.14 \times 10^{-3}$ | $9.66 \times 10^{-6}$ | $6.55 \times 10^{-10}$ | $1.43 \times 10^{-10}$ |

Table 4.6: Friedman Test
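The input of the test is the Ranked Probability Score of every bookmaker on every match. A minimal sketch of the score for a single forecast is given below, assuming the standard formulation on the ordered outcomes (home, draw, away); the function name and the example forecasts are hypothetical.

```python
def rps(probs, realized_index):
    """Ranked Probability Score for ordered outcomes.

    probs          : forecast probabilities in the order (home, draw, away)
    realized_index : position of the realized outcome in that order
    """
    r = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for k in range(r - 1):
        cum_forecast += probs[k]
        cum_observed += 1.0 if k == realized_index else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (r - 1)

# Hypothetical forecasts evaluated on a home win (index 0) and on a draw (index 1).
confident = rps([0.9, 0.1, 0.0], realized_index=0)
uniform = rps([1 / 3, 1 / 3, 1 / 3], realized_index=1)
print(confident, uniform)
```

Lower scores denote better forecasts. The per-match scores of the bookmakers, one sample per bookmaker, could then be passed to a routine such as `scipy.stats.friedmanchisquare`, which is one possible implementation of the test and not necessarily the one used here.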

Moreover, we computed the average level of disagreement between bookmakers (measured by the standard deviation of the odds for each outcome and match). The higher this measure, the higher the chances that a bookmaker, either intentionally or not, sets a price out of the market.

| | Premier | Championship | League 1 | League 2 |
|---|---|---|---|---|
| $\bar\sigma_{home}$ | 0.99% | 1.12% | 1.12% | 1.1% |
| $\bar\sigma_{draw}$ | 0.7% | 0.67% | 0.7% | 0.6% |
| $\bar\sigma_{away}$ | 0.93% | 1.02% | 1.04% | 1.04% |

Table 4.7: Disagreement

The results in Table 4.7 hint at a stable hierarchical order in the disagreement measure, as the decreasing ranking $\bar\sigma_{home}$, $\bar\sigma_{away}$, $\bar\sigma_{draw}$ holds throughout the leagues. In any case the average levels are well contained below the 2% threshold, meaning that we are unlikely to find odds posted extremely out of the market, or at least not in a repetitive fashion.

After the due prologue, we outline the descriptive statistics stricto sensu for the season. As shown in Table 4.2, Home win has been the most frequent outcome for all leagues, and the ex-post realizations' proportions appear to be rather stable across the different leagues.

Table 4.8 reports the average probabilities assigned by the sample of odd-setters along with the average level of entropy. Albeit bookmakers performed fairly well on average, all of them slightly overestimated the occurrence of Premier League's Away win and League 2 Home win. We consider entropy as the measure of the bookmakers' expected surprise with regard to the final outcome of each match.⁴ On average it appears that the uncertainty in the odds for Premier League is considerably lower than for the other leagues. We could impute such a result to the higher degree of information available for the league, the likely more leveled strength of the teams belonging to it and the larger demand attracted, which could ignite the mechanism of the Wisdom of the crowd effect, hence shrinking uncertainty to some extent. Both the probabilities and entropy levels look pretty close in an intra-bookmakers analysis, corroborating the necessity of a shared view of the market by the firms. We also try to relate entropy to the margin set, in order to investigate whether a higher perceived uncertainty could turn into a higher overround imposed on the population of bettors. We regressed the margin imposed, match by match, on each bookmaker's entropy levels (league-wise). The econometric outputs, though, do not highlight any compelling relationship whatsoever.
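The entropy measure can be sketched as below. The normalizing constant is assumed here to be the maximum entropy $\log 3$ of a three-outcome event, so that the measure lies between 0 and 1; this choice, the function name and the example probabilities are assumptions made for illustration.

```python
from math import log

def normalized_entropy(probs):
    """Shannon entropy of a forecast divided by its maximum, log(#outcomes)."""
    h = -sum(p * log(p) for p in probs if p > 0)
    return h / log(len(probs))

# Hypothetical forecasts for (home, draw, away).
balanced = normalized_entropy([1 / 3, 1 / 3, 1 / 3])
lopsided = normalized_entropy([0.80, 0.13, 0.07])
certain = normalized_entropy([1.0, 0.0, 0.0])
print(balanced, lopsided, certain)
```

A match with a clear favourite yields a low value, so a league with many lopsided fixtures would display a lower average entropy.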

Having prevented for the presence of bias, and hence distributed the margin for the outcomes with a criterion, we can now focus on the remarkable dierences in

4To ease of comparisons we normalized the measured Shannon Entropy dividing by the

(33)

CHAPTER 4. AN APPLICATION TO FOOTBALL BETTING MARKET. 4.3. DESCRIPTIVE STATISTICS b1 b2 b3 b4 b5 b6 b7 b8 b9 Premier League ¯ phome 46.1% 46.1% 45.7% 46% 46% 45.5% 46.1% 46.3% 46% ¯ pdraw 25.4% 25.2% 25.5% 25.2% 25.1% 26% 25.4% 25.4% 25.5% ¯ paway 28.5% 28.7% 28.8% 28.8% 28.9% 28.5% 28.6% 28.3% 28.5% entropy 0.886 0.885 0.893 0.888 0.888 0.894 0.888 0.883 0.889 Championship ¯ phome 44.8% 44.6% 44.7% 44.9% 44.8% 44.3% 44.6% 45.1% 44.8% ¯ pdraw 27.3% 27.3% 27.1% 27.3% 27% 27.7% 27.1% 27.1% 27.2% ¯ paway 28% 28.2% 28.2% 27.8% 28.3% 28% 28.3% 27.8% 28% entropy 0.950 0.954 0.953 0.952 0.952 0.954 0.955 0.946 0.952 League 1 ¯ phome 44.6% 44.5% 44.4% 44.9% 44.7% 44.5% 44.6% 45.% 44.7% ¯ pdraw 27% 26.9% 26.8% 26.6% 26.8% 27.2% 26.9% 26.7% 26.7% ¯ paway 28.4% 28.6% 28.9% 28.4% 28.6% 28.3% 28.5% 28.3% 28.6% entropy 0.949 0.952 0.953 0.949 0.95 0.954 0.951 0.944 0.95 League 2 ¯ phome 43.8% 43.8% 43.8% 44.3% 44% 43.9% 44.2% 44.1% 43.9% ¯ pdraw 27.3% 27.3% 27% 27% 27% 27.4% 27.1% 27.1% 27% ¯ paway 28.8% 28.9% 29.1% 28.7% 29% 28.7% 28.7% 28.8% 29.1% entropy 0.960 0.961 0.962 0.959 0.961 0.955 0.96 0.955 0.96

Table 4.8: Shin probabilities and Entropy

the bookmakers' margin-setting policy. We look for regularities to exploit in order to refine potential strategies. As already mentioned, we are not able to disentangle the differences in vigorish but, as shown later, we are inclined to reject the idea of a causal link between bad forecasting skills and higher margins. The margins are contained within fairly tight bounds for all bookmakers, indicating a limited dependence on the particular underlying event. A close look at the margins' standard deviations confirms the low dispersion around the mean. As reported below, though, it appears that lower mean margins correspond to higher mean volatility. Bookmakers seem to keep a margin policy that holds across all the different leagues, as if responding to a precise firm-specific choice, rather than taking into account the likely differences between championships.


Figure 4.1: Margins

The average z can be interpreted as the share of turnover attributable to insider traders (who, in Shin's model, are assumed to know the outcome in advance) or, relaxing this assumption somewhat, as the intensity of the favorite-longshot bias. By the construction set out in subsection 2.2.1, z is proportional to the margin.

                  b1      b2      b3      b4      b5      b6      b7      b8      b9
Premier League
  λ̄              0.054   0.080   0.077   0.065   0.101   0.065   0.054   0.036   0.073
  σ_λ             0.29%   0.75%   0.28%   0.82%   0.24%   0.95%   1.11%   1.04%   0.59%
  z̄_Shin         0.028   0.041   0.039   0.033   0.052   0.033   0.027   0.018   0.037
Championship
  λ̄              0.065   0.111   0.088   0.065   0.101   0.067   0.08    0.038   0.074
  σ_λ             0.27%   0.61%   0.33%   0.57%   0.27%   1.26%   0.67%   1.15%   0.91%
  z̄_Shin         0.033   0.056   0.045   0.032   0.051   0.034   0.041   0.019   0.038
League 1
  λ̄              0.065   0.112   0.100   0.065   0.101   0.068   0.083   0.043   0.074
  σ_λ             0.38%   0.33%   0.39%   0.54%   0.25%   2.14%   0.49%   0.97%   0.38%
  z̄_Shin         0.033   0.056   0.050   0.033   0.051   0.034   0.042   0.022   0.037
League 2
  λ̄              0.065   0.112   0.100   0.065   0.101   0.068   0.085   0.043   0.075
  σ_λ             0.41%   0.21%   0.37%   0.66%   0.28%   1.49%   0.48%   1.84%   0.98%
  z̄_Shin         0.033   0.056   0.050   0.033   0.051   0.034   0.043   0.022   0.037

Table 4.9: Margin and z

The levels of z, then, consistently with the margins, seem to be independent of the specific league and of the specific match. If we also interpret the value of z as the intensity of the longshot bias ([52]), the implied probabilities estimated by means of Shin's method could be considered fair, or at least fairer than in other applications.
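To make the link between margin and z concrete, the sketch below recovers Shin probabilities and z from a single set of decimal odds. It assumes the closed form commonly used for Shin's model, p_i = (sqrt(z² + 4(1 − z)π_i²/B) − z) / (2(1 − z)), with π_i = 1/odds_i and booksum B = Σπ_i, and it assumes the margin is defined as λ = B − 1. The function name, the bisection bracket and the example odds are hypothetical.

```python
import math

def shin_probabilities(odds, tol=1e-12):
    """Return (probabilities, z) from decimal odds via Shin's formula.

    Assumptions: pi_i = 1/odds_i, booksum B = sum(pi_i) > 1, and z is the
    value in (0, 0.5) for which the implied probabilities sum to one.
    """
    pi = [1.0 / o for o in odds]
    booksum = sum(pi)
    if booksum <= 1.0:
        raise ValueError("odds carry no positive margin")

    def probs(z):
        return [(math.sqrt(z * z + 4 * (1 - z) * q * q / booksum) - z)
                / (2 * (1 - z)) for q in pi]

    # At z = 0 the probabilities sum to sqrt(B) > 1; the sum falls as z
    # grows, so bisection locates the z at which it equals 1.
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    return probs(z), z

# Hypothetical home/draw/away decimal odds for one match.
odds = (2.05, 3.70, 3.40)
p, z = shin_probabilities(odds)
margin = sum(1.0 / o for o in odds) - 1.0
print(p, z, margin)
```

For a three-outcome event with a small margin, this construction yields z close to half the margin, which is in line with the ratio between λ̄ and z̄_Shin in Table 4.9.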

4.3.1 Focus on forecasting

First we aim to check for the presence of a bookmaker in our sample whose probabilities repeatedly outperformed those of its competitors. In order to do so we report the


percentages of best performances throughout the season, or in other words the frequency of best forecasts.

                  b1      b2      b3      b4      b5      b6      b7      b8      b9
Premier League    11%     12.1%   4.3%    11.3%   9.4%    15.4%   9.3%    15.1%   12.1%
Championship      6.2%    13.5%   3.3%    13.4%   8%      13.1%   8%      23.3%   11.2%
League 1          9%      9.5%    8.5%    9.5%    7%      12.5%   12%     20%     12%
League 2          10.3%   12.7%   5.54%   10%     7.5%    18.72%  8.74%   16.3%   10.2%

Table 4.10: Percentage of experts performance

The figures reported in Table 4.10 show that no bookmaker clearly leads in any league, apart from relative spikes of about 20%.
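A frequency-of-best-forecast tally of this kind can be sketched as follows. The per-match loss is left generic (for instance the negative log of the probability assigned to the realized outcome), and the equal splitting of ties is an assumption made so that the shares add up to one. The function name and the toy data are hypothetical.

```python
from collections import Counter

def best_forecast_shares(losses):
    """Share of matches in which each bookmaker attains the lowest loss.

    losses: list with one dict {bookmaker: loss} per match.
    Assumption: when several bookmakers tie for the lowest loss, the match
    is split equally among them.
    """
    credit = Counter()
    for match in losses:
        best = min(match.values())
        winners = [b for b, v in match.items() if v == best]
        for b in winners:
            credit[b] += 1.0 / len(winners)
    n = len(losses)
    return {b: credit[b] / n for b in losses[0]}

# Hypothetical losses for four matches and three bookmakers.
losses = [
    {"b1": 0.90, "b2": 0.95, "b3": 0.92},  # b1 is best
    {"b1": 1.20, "b2": 1.10, "b3": 1.10},  # b2 and b3 tie
    {"b1": 0.40, "b2": 0.45, "b3": 0.38},  # b3 is best
    {"b1": 0.70, "b2": 0.72, "b3": 0.71},  # b1 is best
]
shares = best_forecast_shares(losses)
print(shares)
```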

Consistent with subsection 2.3.1, we measured individual accuracy by means of RPS and LLM. The differences among bookmakers, albeit small, confirm the conclusions drawn from the Friedman test. Table 4.11 reports the average and median RPS and the LLM. A close look at it highlights the tight relationship between the two accuracy measures: in 3 out of the 4 leagues they indicate the same best bookmaker. As stated above, no correlation seems to emerge between a higher margin and a lower performance. On the contrary, as regards the Championship and League 2, the bookmaker fixing the highest margins (b2) also reports the best performance in profitability terms (as measured by LLM). Provided, in fact, that we bet at the same set of best odds, which holds for all strategies based on bookmakers' probabilities, LLM is rightly considered a profitability measure. Nevertheless, if we were to consider each time the highest capital multiplier implied by the bookmakers' probabilities, we would find that the firm keeping the lowest average margin (b8) reports the highest frequency in almost all scenarios.
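The two accuracy measures can be illustrated with the short sketch below. It assumes the usual definitions: the ranked probability score for ordered outcomes (home, draw, away), and a log-likelihood measure equal to the sum of the natural logs of the probabilities assigned to the realized outcomes. The exact definitions are those of subsection 2.3.1; the function names and the example forecasts here are hypothetical.

```python
import math

def rps(probs, outcome):
    """Ranked probability score for one match (lower is better).

    probs: probabilities over ordered outcomes, e.g. (home, draw, away).
    outcome: index of the realized outcome.
    """
    r = len(probs)
    cum_p = 0.0   # cumulative forecast probability
    cum_e = 0.0   # cumulative indicator of the realized outcome
    total = 0.0
    for i in range(r - 1):
        cum_p += probs[i]
        cum_e += 1.0 if i == outcome else 0.0
        total += (cum_p - cum_e) ** 2
    return total / (r - 1)

def log_likelihood(forecasts, outcomes):
    """Sum of log-probabilities of the realized outcomes (higher is better)."""
    return sum(math.log(p[y]) for p, y in zip(forecasts, outcomes))

# Hypothetical forecasts for two matches; 0 = home, 1 = draw, 2 = away.
forecasts = [(0.5, 0.3, 0.2), (0.4, 0.3, 0.3)]
outcomes = [0, 2]
print([rps(p, y) for p, y in zip(forecasts, outcomes)])
print(log_likelihood(forecasts, outcomes))
```

Because RPS accumulates probabilities along the outcome ordering, a forecast that misses by one position (home versus draw) is penalized less than one that misses by two (home versus away), whereas the log-likelihood only looks at the probability of the realized outcome.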

It is worth investigating whether there is any relationship between forecasting accuracy and the cost imposed by bookmakers on every bet. Had we found one, a rather simple no-sweat strategy, exploiting the forecasts of the bookmaker charging the highest fees and placing bets at the best odds, could lead to effortless gains. To this end we introduce another performance measure (P), bearing in mind that every bet is placed at the same set of best odds regardless of the portfolio selection adopted.⁵ Hence we rely on the bookmaker-wise relative entropy of each realized outcome (y*):

$$P_b = \log \frac{1}{p^{b}_{y^{*}}}$$


                  b1       b2       b3       b4       b5       b6       b7       b8       b9
Premier League
  RPS             0.1999   0.2010   0.2011   0.2005   0.2007   0.2000   0.2009   0.2011   0.2011
  RPS_median      0.1644   0.1660   0.1676   0.1689   0.1684   0.1664   0.1686   0.1700   0.1696
  LLM             −383.3   −384.4   −384.4   −383.4   −383.7   −382.6   −383.6   −384.1   −383.5
Championship
  RPS             0.2159   0.2156   0.2162   0.2162   0.2163   0.2167   0.2167   0.2159   0.2165
  RPS_median      0.1843   0.1902   0.1860   0.1845   0.1875   0.1839   0.1844   0.1867   0.1873
  LLM             −576.1   −574.9   −576.2   −576.0   −576.1   −576.6   −577.4   −575.9   −576.6
League 1
  RPS             0.2204   0.2205   0.2203   0.2211   0.2204   0.2209   0.2214   0.2200   0.2206
  RPS_median      0.187    0.191    0.195    0.189    0.196    0.199    0.192    0.195    0.193
  LLM             −574.2   −574.7   −574.4   −575.8   −574.5   −575.9   −576.1   −573.9   −574.8
League 2
  RPS             0.2159   0.2156   0.2162   0.2162   0.2163   0.2167   0.2167   0.2159   0.2165
  RPS_median      0.1843   0.1902   0.1860   0.1845   0.1875   0.1839   0.1844   0.1867   0.1873
  LLM             −576.1   −574.9   −576.2   −576.0   −576.1   −576.6   −577.4   −575.9   −576.6

Table 4.11: Accuracy measures

Clearly, a lower P_b indicates a better performance. We then plot each bookmaker's λ against the average performance level P_b for each league. A no-sweat strategy as sketched above would require a linearly decreasing curve to justify interest in it. As the graph below shows, though, on average this does not seem to be the rule.
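The check described here can be sketched numerically: compute P_b = log(1/p) for the probability each bookmaker assigned to the realized outcome, average it per bookmaker, and look at the sign of the least-squares slope of the average P_b on the margin λ. A clearly negative slope would be what the no-sweat strategy requires. All names and figures below are hypothetical and serve only as an illustration.

```python
import math

def performance(p_realized):
    """P_b = log(1 / p), with p the probability given to the realized outcome."""
    return math.log(1.0 / p_realized)

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

# Hypothetical average margins and average P_b for four bookmakers.
margins = [0.04, 0.06, 0.08, 0.10]
avg_perf = [1.0100, 1.0080, 1.0110, 1.0090]

slope = ols_slope(margins, avg_perf)
print(performance(0.45))  # P_b for a single match
print(slope)              # near zero here: no decreasing pattern
```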
