Soccer Analytics:
how Data Science is changing the “Beautiful Game”
L. Pappalardo
@lucpappalard
Sports Analytics
L. Bornn, D. Cervone, J. Fernandez, Soccer analytics: Unravelling the complexity of “the beautiful game”, Significance, 15: 26-29, 2018.
C. Anderson, D. Sally, The numbers game: why everything you know about football is wrong, Penguin, 2013.
● Popularized by the book and movie Moneyball
● Soccer: first to try, last to adopt
● On-field: performance analysis, soccer scouting
● Off-field: talent scouting, gambling, merchandising
Charles Reep 1950s
R. Pollard, Charles Reep (1904-2002): pioneer of notational and performance analysis in football, Journal of Sports Sciences 20(10):853-855, 2002.
2,194 matches annotated by hand from the 1950s to the 1990s
Long-ball theory 1950s
[Figure: frequency (%) by length of the pass chain that ended in a goal]
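A minimal sketch of the tally behind a chart like this one, assuming each possession has already been reduced to a hypothetical `(n_passes, ended_in_goal)` pair. Reep annotated his matches by hand, so this only illustrates the computation, not his procedure.

```python
from collections import Counter


def goal_chain_distribution(chains):
    """Percentage of goals scored after pass chains of each length.

    chains: iterable of (n_passes, ended_in_goal) tuples, one per possession
    (a hypothetical input format, not Reep's original notation).
    """
    # Count only the chains that ended in a goal, keyed by their length
    goal_lengths = Counter(n for n, ended_in_goal in chains if ended_in_goal)
    total_goals = sum(goal_lengths.values())
    # Convert counts to percentages, ordered by chain length
    return {n: 100 * c / total_goals for n, c in sorted(goal_lengths.items())}


possessions = [(0, True), (1, True), (1, True), (3, False), (2, True)]
print(goal_chain_distribution(possessions))  # {0: 25.0, 1: 50.0, 2: 25.0}
```

When no chain ends in a goal, the function returns an empty dict rather than dividing by zero.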
Long-ball theory 1950s
C. Reep and B. Benjamin, Skill and Chance in Association Football, Journal of the Royal Statistical Society. Series A (General), 131(4):581-585, 1968.
Not more than three passes: “If a team tries to play football and keeps it down to not more than three passes, it will have a much higher chance of winning matches. Passing for the sake of passing can be disastrous.”
Long-ball theory 1950s
[Figure: long/short passes vs. ranking]
Valeri Lobanovskyi 1970s
A.M. Zelentsov, V.V. Lobanovsky, Methodological foundations for developing models of training sessions (in Russian).
Taggers 2010s
{'eventName': 'pass', 'eventSec': 8.221464, 'matchId': 2576132, 'matchPeriod': '1H', 'playerId': 8306,
'positions': [{'x': 42, 'y': 14}, {'x': 74, 'y': 33}],
'subEventName': 'key pass', 'tags': ['accurate'],
'teamId': 3158}
Soccer-logs
About 1,700 events per match on average
{'eventName': 8,
'eventSec': 8.221464, 'id': 217097515,
'matchId': 2576132, 'matchPeriod': '1H', 'playerId': 8306,
'positions': [{'x': 42, 'y': 14}, {'x': 74, 'y': 33}],
'subEventName': 83,
'tags': [{'id': 1801}], 'teamId': 3158}
8 = pass, tag 1801 = accurate: event types and tags are stored as numeric identifiers
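A minimal sketch of turning the identifier-based record back into readable names. The lookup tables are hypothetical and deliberately partial: they contain only the two codes shown on this slide, so the sub-event code 83 is left as it is. The key holding the event code is a parameter (defaulting to the key used in the record above); `mean_events_per_match` is likewise a hypothetical helper for checking the per-match event count.

```python
from collections import Counter

# Hypothetical, deliberately partial lookup tables: only the two
# identifiers that appear in the record above.
EVENT_NAMES = {8: "pass"}
TAG_NAMES = {1801: "accurate"}


def decode_event(raw, event_key="eventName"):
    """Return a copy of a soccer-log record with numeric codes replaced by names.

    Codes missing from the lookup tables are left unchanged.
    """
    event = dict(raw)
    event[event_key] = EVENT_NAMES.get(raw[event_key], raw[event_key])
    event["tags"] = [TAG_NAMES.get(tag["id"], tag["id"]) for tag in raw["tags"]]
    return event


def mean_events_per_match(events):
    """Average number of events per match in a list of records."""
    per_match = Counter(e["matchId"] for e in events)
    return sum(per_match.values()) / len(per_match)


raw = {"eventName": 8, "eventSec": 8.221464, "id": 217097515,
       "matchId": 2576132, "matchPeriod": "1H", "playerId": 8306,
       "positions": [{"x": 42, "y": 14}, {"x": 74, "y": 33}],
       "subEventName": 83, "tags": [{"id": 1801}], "teamId": 3158}
decoded = decode_event(raw)
print(decoded["eventName"], decoded["tags"])  # pass ['accurate']
```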
Performance vector: features such as passes, xG, pressing, accuracy, ...
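A sketch of how a performance vector can be aggregated from decoded events, one vector per (player, match). Only two illustrative features are computed here (pass count and pass accuracy); the feature names and the input format, which mirrors the readable record shown earlier, are assumptions.

```python
from collections import defaultdict


def performance_vectors(events):
    """Build a small per-(player, match) performance vector from decoded events.

    Only two illustrative features are computed (pass count and pass
    accuracy); a real vector would hold many more (xG, pressing, ...).
    """
    passes = defaultdict(int)
    accurate = defaultdict(int)
    for e in events:
        if e["eventName"] != "pass":
            continue  # this sketch only looks at passes
        key = (e["playerId"], e["matchId"])
        passes[key] += 1
        if "accurate" in e["tags"]:
            accurate[key] += 1
    return {
        key: {"passes": n, "pass_accuracy": accurate[key] / n}
        for key, n in passes.items()
    }


events = [
    {"eventName": "pass", "playerId": 8306, "matchId": 2576132, "tags": ["accurate"]},
    {"eventName": "pass", "playerId": 8306, "matchId": 2576132, "tags": []},
    {"eventName": "shot", "playerId": 8306, "matchId": 2576132, "tags": []},
]
print(performance_vectors(events))
# {(8306, 2576132): {'passes': 2, 'pass_accuracy': 0.5}}
```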
Ranking soccer players
Evaluating and ranking teams
Passing network
J. Duch, J.S. Waitzman, L.A.N. Amaral, Quantifying the Performance of Individual Players in a Team Activity, PLoS One, 5(6), 2010.
J.L. Peña, H. Touchette, A network theory analysis of football strategies, arXiv:1206.6904v1, 2012.
Flow centrality: a player’s betweenness centrality in the passing network
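A minimal sketch of betweenness centrality on a passing network, in pure Python using Brandes' algorithm for unweighted directed graphs. Assumptions: passes are given as hypothetical `(passer, receiver)` pairs, pass counts are ignored, and there are no shot nodes, so this is plain betweenness rather than the exact flow-centrality formulation of Duch et al. Graph libraries such as NetworkX offer `betweenness_centrality` with weighted variants.

```python
from collections import defaultdict, deque


def passing_network(passes):
    """Directed passing network: passer -> set of receivers."""
    graph = defaultdict(set)
    for passer, receiver in passes:
        graph[passer].add(receiver)
    return graph


def betweenness_centrality(graph):
    """Normalized betweenness of every node in an unweighted directed graph (Brandes)."""
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if n > 2:
        # Normalize by the number of ordered pairs that exclude the node
        bc = {v: c / ((n - 1) * (n - 2)) for v, c in bc.items()}
    return bc


passes = [("GK", "DF"), ("DF", "MF"), ("MF", "FW"), ("DF", "FW")]
bc = betweenness_centrality(passing_network(passes))
print({k: round(v, 3) for k, v in sorted(bc.items())})
# {'DF': 0.333, 'FW': 0.0, 'GK': 0.0, 'MF': 0.0}
```

Here the defender lies on every shortest path from the goalkeeper to the midfielder and to the forward, so it is the only player with non-zero betweenness.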
Team flow centrality
H indicator
European Ranking, 2014
P. Cintia et al., The harsh rule of the goals: data-driven performance indicators for football teams, in Procs of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2015.
H indicator simulation
Example differences in points and league positions from the H indicator simulation:
● +11 points, +2 positions
● -13 points, -5 positions
● -17 points, -6 positions
Harshness
Passing network
Pros
• simple representation
• considers interactions between players
Cons
• only passes are considered
• all passes are treated as equal
Data: 5 seasons, 18 competitions, 20K matches, 21K players, 30M events
L. Pappalardo et al., PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach, arXiv:1802.04987, 2018.
Feature Weighting
[Figure: the weight of each feature is unknown (= ?)]
Feature Weighting
[Figure: team performance vectors (passes, xG, pressing, accuracy, ...) for team1 and team2; the match outcome (?) is unknown]
Feature Weighting
Features: passes, xG, pressing, accuracy, ...
L. Pappalardo, P. Cintia, Quantifying the relation between performance and success in soccer, Advances in Complex Systems, 2017, doi:10.1142/S021952591750014X.
Feature Weighting
76 features in total
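A sketch of one way to learn such weights: fit a linear classifier that predicts whether team1 wins from the difference between the two team performance vectors, then read the coefficients as feature weights. Here, as an assumption, plain logistic regression trained by gradient descent stands in for whatever linear model is actually used; the function name, the two features and the toy data are hypothetical.

```python
import math


def learn_feature_weights(x_diff, team1_won, lr=0.1, epochs=500):
    """Logistic regression (no intercept) on team performance-vector differences.

    x_diff[i]    : team1 features minus team2 features for match i
    team1_won[i] : 1 if team1 won match i, else 0
    Returns weights rescaled so that their absolute values sum to 1.
    """
    n_features = len(x_diff[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in zip(x_diff, team1_won):
            z = sum(wj * xj for wj, xj in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(team1 wins)
            for j in range(n_features):
                grad[j] += (p - y) * x[j]
        # Gradient step on the mean log-loss
        w = [wj - lr * gj / len(x_diff) for wj, gj in zip(w, grad)]
    total = sum(abs(wj) for wj in w) or 1.0
    return [wj / total for wj in w]


# Hypothetical features: [xG difference, passes difference (hundreds)]
x_diff = [[1.0, 0.2], [0.8, -0.5], [0.5, 0.4],
          [-0.9, 0.3], [-1.2, -0.1], [-0.6, -0.4]]
team1_won = [1, 1, 1, 0, 0, 0]
weights = learn_feature_weights(x_diff, team1_won)
print([round(v, 2) for v in weights])
```

Using the difference of the two vectors makes the model symmetric: swapping team1 and team2 flips the sign of the input, which is why no intercept is fitted.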
Evaluating the weights
● stability across competitions and roles
● evaluation of the resulting ranking
Are these weights “universal”?
[Figure: feature weights compared across leagues]
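Whether the weights are “universal” can be checked by comparing the weight vectors learned on different competitions. A sketch, assuming cosine similarity as the comparison measure (an assumption for illustration); the league names and weight values are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_similarity(weights_by_league):
    """Cosine similarity for every pair of competitions."""
    return {
        (a, b): cosine_similarity(weights_by_league[a], weights_by_league[b])
        for a, b in combinations(sorted(weights_by_league), 2)
    }


weights_by_league = {  # hypothetical weights, same feature order in each
    "league_A": [0.40, 0.35, 0.25],
    "league_B": [0.42, 0.33, 0.25],
    "league_C": [0.10, 0.20, 0.70],
}
for pair, sim in pairwise_similarity(weights_by_league).items():
    print(pair, round(sim, 3))
```

Values close to 1 indicate that two competitions weight the features in almost the same proportions.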
Rating Computation
[Equation: performance rating of player u in game g]
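A sketch of a rating computation, assuming the rating of player u in game g is the weighted sum of u's feature values in g, using the learned feature weights and divided by a normalizing constant (called `r_norm` here, a hypothetical name). Averaging the per-game ratings into an overall rating is also an assumption.

```python
def performance_rating(features, weights, r_norm=1.0):
    """Rating of one player in one game: weighted sum of features / r_norm.

    features: {feature_name: value} for player u in game g
    weights:  {feature_name: weight} from the feature-weighting step
    Features missing from the player's vector count as 0.
    """
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / r_norm


def overall_rating(game_ratings):
    """Average of a player's per-game ratings (assumed aggregation)."""
    return sum(game_ratings) / len(game_ratings)


weights = {"xG": 0.5, "passes": 0.25}
game_1 = {"xG": 2.0, "passes": 4.0}
print(performance_rating(game_1, weights, r_norm=4.0))  # 0.5
```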
How to evaluate the evaluation?
For each pair of players, the algorithm’s choice is compared with the choices of three experts (expert 1, expert 2, expert 3):
● majority agreement
● unanimity agreement
Evaluation of 211 pairs
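A sketch of the two agreement measures, under assumed definitions: majority agreement is the share of pairs where the algorithm picks the same player as most experts, and unanimity agreement is the share of unanimous pairs (all experts agree) where the algorithm matches them. The function name and the toy choices are hypothetical; ties are not handled, which is fine with three experts and two players per pair.

```python
from collections import Counter


def agreement_rates(algorithm_choices, expert_choices):
    """Agreement between an algorithm and a panel of experts on player pairs.

    algorithm_choices[i]: the player the algorithm rates higher in pair i
    expert_choices[i]:    tuple with each expert's choice for pair i
    Returns (majority, unanimity):
      majority  = share of pairs where the algorithm matches the experts' majority
      unanimity = share of the unanimous pairs where the algorithm matches them
                  (None if no pair is unanimous)
    """
    majority_hits = unanimous_pairs = unanimous_hits = 0
    for algo, votes in zip(algorithm_choices, expert_choices):
        top_choice, top_count = Counter(votes).most_common(1)[0]
        if algo == top_choice:
            majority_hits += 1
        if top_count == len(votes):
            unanimous_pairs += 1
            unanimous_hits += algo == top_choice
    majority = majority_hits / len(algorithm_choices)
    unanimity = unanimous_hits / unanimous_pairs if unanimous_pairs else None
    return majority, unanimity


# Hypothetical pairs: 0 = first player better, 1 = second player better
algorithm = [0, 1, 1, 1]
experts = [(0, 0, 0), (1, 0, 1), (0, 0, 1), (1, 1, 1)]
print(agreement_rates(algorithm, experts))  # (0.75, 1.0)
```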
Evolution of players
Performance patterns
Versatility of players
In summary...
1. weights are similar across different leagues
2. weights for the World Cup and the Euro Cup differ slightly
3. the ranking has high agreement with experts…
4. …when a difference emerges between players
5. future work: exploring new ways of extracting weights that capture non-linearity
Open challenges
1. forecasting the performance of players
2. searching for the player(s) who best adapt to a team’s playing style
3. making substitutions during a match that maximize the probability of winning against a given opponent
● L. Pappalardo et al., PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach, ACM TIST 10(5), 2019.
● L. Pappalardo et al., An open data set of spatio-temporal match events in soccer competitions, Nature Scientific Data 6:236, 2019.
Flow Centrality (FC)
Duch et al. (2010) Quantifying the Performance of Individual Players in a Team Activity. PLoS ONE 5(6): e10937.
fraction of a player’s accurate shots
Validation: 8 of the 20 players in the list of the competition’s best players
Pass Shot Value (PSV)
Brooks et al. (2016) Developing a Data-Driven Player Ranking in Soccer using Predictive Model Weights, SIGKDD
each pass is represented as a vector of size 360
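One way to obtain a 360-dimensional pass vector, assumed here for illustration, is to split the pitch into 180 zones (an 18 × 10 grid, an assumption) and one-hot encode the zone where the pass starts and the zone where it ends (180 + 180 = 360). Coordinates are taken as 0-100 percentages, like the `positions` field of the soccer-log records; `GRID_X`, `GRID_Y`, `zone` and `pass_vector` are hypothetical names.

```python
GRID_X, GRID_Y = 18, 10  # assumed grid: 18 x 10 = 180 zones


def zone(x, y):
    """Map percentage coordinates (0-100) to a zone index in [0, 180)."""
    col = min(int(x / 100 * GRID_X), GRID_X - 1)  # clamp x = 100 into the last column
    row = min(int(y / 100 * GRID_Y), GRID_Y - 1)
    return row * GRID_X + col


def pass_vector(origin, destination):
    """One-hot origin zone followed by one-hot destination zone (length 360)."""
    n_zones = GRID_X * GRID_Y
    v = [0] * (2 * n_zones)
    v[zone(*origin)] = 1
    v[n_zones + zone(*destination)] = 1
    return v


# The pass from the soccer-log record: (42, 14) -> (74, 33)
v = pass_vector((42, 14), (74, 33))
print(len(v), v.index(1), zone(74, 33))  # 360 25 67
```

A linear model trained on such vectors assigns one weight per origin zone and one per destination zone, which is what makes the learned weights readable on a pitch map.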