Were the refs more lenient on the hosts in Russia 2018?

We may be on the eve of the new domestic football season, but before we consign the World Cup, Russia 2018, to history, I just want to share this plot with you.

It compares the fouls committed per game by a team, and the number of yellow cards received.

By eye, there does appear to be some correlation between the number of fouls committed by a team and the number of yellow cards they received. This would be expected.

The hosts, Russia, do appear to be an outlier – the third most fouling nation, yet their card count was low.

This could be for a number of reasons – e.g. if many of the fouls they committed were for, say, offside, you wouldn’t expect them to receive yellow cards for those offences. But everyone loves a good conspiracy theory – perhaps the refs, either intentionally or subconsciously, were a little more reluctant to flash their cards for the home team …

It prompts an interesting question to explore for the forthcoming season: do away teams get carded more frequently than home teams?

Many thanks to Answer Miner for creating and sharing the plot. You can see his original tweet with it here.

Posted in Handling Data

Saints or Sinners?

The football may be over, but the fun never stops!

There is plenty of data on the recent Russia 2018 World Cup to be found on the official FIFA site.

Using their statistics, I have compared the number of fouls committed versus the number of fouls suffered and plotted the scatter graph above. Fouls committed are on the x axis, fouls suffered on the y axis. The line is a (computer generated) line of best fit using linear regression.

The greater the distance above the line, the more “saintly” we can say a team was – more fouled against than fouling; those below the line were the “sinners” of the tournament.
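The "distance above the line" idea can be sketched in a few lines of Python. This is only an illustration: the function names are my own, an ordinary least-squares fit is assumed for the line of best fit, and the three teams and their numbers are made up purely to show the mechanics (they are not the FIFA figures).

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals_from_line(teams):
    """Map each team to its vertical distance from the fitted line.

    teams: dict of name -> (fouls_committed, fouls_suffered).
    Positive residual = above the line ("saint"); negative = below ("sinner").
    """
    xs = [committed for committed, _ in teams.values()]
    ys = [suffered for _, suffered in teams.values()]
    slope, intercept = least_squares_line(xs, ys)
    return {
        name: suffered - (intercept + slope * committed)
        for name, (committed, suffered) in teams.items()
    }


# Hypothetical numbers, NOT real tournament data.
toy_teams = {"Team A": (10, 12), "Team B": (20, 18), "Team C": (30, 30)}
for name, residual in residuals_from_line(toy_teams).items():
    label = "saint" if residual > 0 else "sinner"
    print(f"{name}: {residual:+.2f} ({label})")
```

Dividing each team's totals by the number of games it played before fitting would give the per-game version of the same comparison.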

Using my criteria, we can say that, despite not coming home with the trophy, England were the Saints of the World Cup!

(A note of caution, however. As ever with data, we must always consider its validity.  Despite this data coming from the official FIFA website, it has a total of 1734 fouls committed,  but only 1642 fouls suffered – at the time of writing I can’t reconcile the difference.)


A few readers have (correctly) pointed out that the plot is skewed as not all teams play the same number of games: one would expect France, Croatia, Belgium and England to all be towards the right of the graph as they played more games than other teams.

So I went back to the data and produced a plot for fouls committed per game v fouls suffered per game. You can see the plot below. I think we can safely say those above the line were the saints, those below the sinners.


Posted in Handling Data

Anyone for tennis?

And so the sun sets on another Wimbledon tournament, one that will be remembered in part for the two losing singles finalists.

Serena Williams was the runner-up in the ladies’ final, ten months after giving birth, and Kevin Anderson lost in the men’s final to Novak Djokovic, two days after playing the second-longest match in Wimbledon history, taking six hours, thirty-six minutes to overcome John Isner in the semi-final.

Understandably, Anderson has called for a change in how close matches are decided.

To win a tennis set, a player must win 6 games, and be two clear games ahead of their opponent. If the score is 6-6, the set is decided by a tie break, where the winner is the first to 7 points (as long as they lead by two clear points.)

Except for the final set of a match. If the fifth set is tied at 6-6, it continues – with no tie break – until one player leads by two games. And this is where the problem lies.

If both players are good and evenly matched (a reasonable expectation in the semi-final of a Grand Slam tournament) then the maths tells us there will be a significant number of games before a player loses a game on their serve. That is, stalemate sets in, with neither player able to break the opponent’s serve and win by the required two game margin.

The score in Friday’s final set was 26-24, i.e. it took fifty games to decide the final set.

Looking at the stats for the match, Anderson won 213 points from his 278 serves, giving him a probability of winning any point on his serve of 0.7661. Isner won 206 of his 291 serves, giving him a probability of 0.7079 of winning a point when he served.

To win a game, you need to win (at least*) 4 points on your serve, with your opponent winning no, one or two points. We can use a bit of basic probability to work out the probability of winning “to love”, “to 15” and “to 30”.

* The problem is made harder as we need to consider “deuce”, when the score reaches 40-40. Due to the need to win by two clear points, “deuce” games could go on for ever. The good news is that we can model “deuce” games as a geometric series, which we can sum to infinity, thereby coming up with a probability for winning a “deuce” game.
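One way to write out that geometric series, assuming every point is independent and the server wins each one with probability p (so the receiver wins with q = 1 − p): from deuce the server either wins the next two points, or the next two points are shared (probability 2pq) and the score is back at deuce. This is a sketch of the standard argument, which may be arranged differently from the working used for the figures in this post.

```latex
\begin{align*}
P(\text{server wins from deuce})
  &= p^2 + (2pq)\,p^2 + (2pq)^2\,p^2 + \cdots \\
  &= p^2 \sum_{k=0}^{\infty} (2pq)^k \\
  &= \frac{p^2}{1 - 2pq}, \qquad q = 1 - p .
\end{align*}
```

The series converges because 2pq can never exceed 1/2. With p = 0.7661 it gives a probability of roughly 0.91 of winning a game once it has reached deuce.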

Using the probabilities above, I was able to calculate the probability that each player would win a game when they were serving. More importantly, this allowed me to calculate the probability they would lose a game on their serve, or have their serve broken.

The final set would go on until (at least) one player lost a game on their serve, so we can model the number of service games up to and including a break as a geometric distribution, and calculate its “expectation”: the average number of service games per game lost.

For Anderson, the expectation is 25: you would expect him to lose one service game for every twenty-five he plays. The number is lower for Isner: we would expect him to lose one service game in every eleven.
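For anyone who wants to check these figures, here is a minimal Python sketch of the calculation described above. It assumes every point on serve is independent with the same probability p, and the function names are my own; it is one standard way of doing the sums rather than necessarily the exact working behind the numbers quoted.

```python
def hold_probability(p: float) -> float:
    """Probability the server wins a game, given probability p of winning each point."""
    q = 1.0 - p
    # Win "to love", "to 15" or "to 30": the server takes the final point, and
    # the receiver's 0, 1 or 2 points can fall anywhere among the earlier ones
    # (1, 4 and 10 arrangements respectively).
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each): C(6, 3) = 20 possible orderings.
    reach_deuce = 20 * p**3 * q**3
    # From deuce: win two in a row (p^2), or share the next two points (2pq)
    # and return to deuce. The geometric series sums to p^2 / (1 - 2pq).
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


def expected_games_per_break(p: float) -> float:
    """Mean of a geometric distribution whose 'success' is losing a service game."""
    return 1.0 / (1.0 - hold_probability(p))


# Points won on serve / points served, as quoted in the match stats above.
for name, won, served in [("Anderson", 213, 278), ("Isner", 206, 291)]:
    p = won / served
    print(
        f"{name}: p = {p:.4f}, "
        f"P(hold) = {hold_probability(p):.4f}, "
        f"service games per break = {expected_games_per_break(p):.1f}"
    )
```

Rounded to the nearest whole number, the last column comes out at 25 for Anderson and 11 for Isner.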

What this means is that we can expect long final sets, unless the pragmatic decision is made to revert to allowing tie-breaks in fifth and final sets. As players continue to improve, and the advantage of serve continues to increase, the sport’s administrators will have to grapple with this conundrum, or look forward to future marathons as the sport descends into a war of attrition.


If you are interested in the formula I derived to calculate the expectation, you can see it below. p is the probability of a player winning a point on their own serve.

Expectation to lose a game on serve. p is the probability of winning a point on serve. The formula gives the average number of games in which one game would be lost (the others won). For example, Kevin Anderson has a probability of 0.7661 of winning a point on his serve. The formula gives an expectation of 25 (rounded to the nearest whole number). This means we would expect him to lose one game in every twenty-five he plays. (Note: it does not mean we would expect him to win the first twenty-four then lose the twenty-fifth, but that in a series of twenty-five games he would, on average, lose one game.)

Posted in Probability

Oh, what a night

A Maths Teacher Celebrates

or why football remains the most popular and exciting sport

Oh, what a night. It had drama, heroes and villains, and, for once, the tears shed at the end of the game were tears of joy. On a night of pure theatre, England beat Colombia in a penalty shoot-out to proceed to the quarter finals of the World Cup.

A nation rejoiced and when, perhaps still a little bleary eyed, it woke realising it wasn’t just a dream, the feel good factor across the land was palpable. Workmates chatted amiably, neighbours conversed happily, strangers smiled as they passed each other; everyone was happy.

Everyone, that is, except for one man. One grumpy old man, writing in The Guardian.

I have no objection to Simon Jenkins expressing his opinion in the paper, but unfortunately for him, maths blows away all his arguments.

He objects to games being decided by penalty shoot-outs, and in this article he calls on FIFA to make the goalposts bigger, which will mean more goals and therefore fewer draws.

And if you want the “best” team to always win, then he has a point.

But do we always want the best team to win? No, that is the beauty of football. Football is the least predictable of sports – by that I mean that the favourites win less often than in any other sport and that is what gives the game its drama, that’s why millions watch it.

Increasing the distance between goal posts would mean more goals, and more goals (or points in other sports) means more predictability, and more predictability means less drama, less excitement. It can be shown mathematically (more goals/points = more predictability), but it can also be deduced intuitively.

Imagine that I take to the court to play Andy Murray. On any given point I may, just may, win the point but in the long term he is going to win (many) more points than me, so the more points are played, the more likely he is to emerge victorious.

Suppose (and we really are stretching the bounds of reality here) I win one point in every ten (meaning Andy wins nine in ten). If we play a single point match, the probability of me winning is one in ten, i.e. if we played ten matches I could expect to win one of those games, or if we played 250 one point matches I would win 25 of them.

If we now play a two point match (first to two) then the probability of me winning a match is 0.028, or I would expect to win 7 out of 250 matches. More points, more predictable. By the time we are playing a three point match my chances of success start to become vanishingly small. (If you don’t believe me, sketch out a tree diagram, plug in the numbers and do the sums.)
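If you would rather not draw the tree diagram, the sums can be sketched in Python. This assumes each point is independent and that I win any given point with probability p; the function name is my own.

```python
from math import comb


def first_to_n_win_probability(p: float, n: int) -> float:
    """Probability of winning a 'first to n points' match when each point is won with probability p."""
    q = 1.0 - p
    # I must win the final point. Before that I have won n - 1 points and my
    # opponent has won k points (k = 0 .. n - 1), which can be ordered in
    # C(n - 1 + k, k) ways.
    return sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n))


for n in (1, 2, 3):
    print(f"first to {n}: {first_to_n_win_probability(0.1, n):.5f}")
```

With p = 0.1 this gives 0.1 for a single point match, 0.028 for first to two, and 0.00856 (fewer than one match in a hundred) for first to three.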

In his article, Simon Jenkins says:

At root, the trouble is soccer’s notorious inability to deliver scoring opportunities …. So far, only 16 out of the first 56 matches in the current World Cup have been decided by more than a single goal. The contrast with free-scoring rugby, cricket and tennis is stark.

As neutrals, we want games to be decided by a single goal – it keeps it exciting until the very end.

Rugby Union has become so free scoring it has a problem. International matches now see twice as many points scored as they did thirty years ago.

On the one hand this is a positive – we all want to see points (or goals) being scored, but the dominance of the favourites is detracting from the excitement of the spectacle.

The gulf in class between the best and weakest teams in the football World Cup is vast, but there is always the chance of an upset. In Rugby Union, it is a major shock when even the world’s second or third ranked nation beats the All Blacks. Great if you are a New Zealander, but not much fun for the rest of the world.

To finish, I would like to look at the article’s last line:

Or we can always watch the tennis, whose scoring system is close to perfect.

It depends what you mean by perfect. Yes, it creates great drama, plenty of decisive moments, but if you asked a statistician to devise a system to determine the best tennis player, they wouldn’t come up with the current scoring system.

But sport isn’t about the perfect ranking system, it’s about drama, despair and hope. I think football has got it pretty much right.


Posted in Probability

It’s all getting quite exciting

Russia 2018 – it’s turning out to be a vintage World Cup.

We’ve had shocks, upsets and – at the time of writing – England are still in it (with a supposedly “easy” route to the final. Half of me dares to hope, the other half fears we will be dumped out tomorrow night by the Colombians.)

It is also the tournament when stats and data have come of age, and are readily available to those of us who find the numbers (almost) as entrancing as the fancy footwork.

As I did at the end of Euro 2016, I will “Rank the rankings”: compare the finishing positions with the FIFA world rankings. There was no significant correlation in the Euros, and with only four of the top ten still in the competition, it doesn’t look too good this time round, either.

But that must wait until the conclusion of the competition. Today, I wanted to share some great work by John W. Miller who has produced the box plots and scatter graph on this page. You can read his full blog post here where he will also talk you through how to get your hands on the data he used.

Some great charts that should spark some interesting discussions in the classroom.

Distribution of Player Height sorted by Country

Height vs. weight by player position

Large Data Sets are the current buzzword in A level maths, and it is wise to ensure that your students are familiar with the data set provided by the exam board you are following. However, here is another large data set containing (nearly) 40,000 international football results from 1872 to 2018: to my mind far more interesting than the transport arrangements of unitary authorities.

And here are the World Football Elo Ratings – I must confess, I’m not (yet) 100% sure how an Elo rating is calculated, but I shall try and find out.

So, whether it ends in cheers or (more likely) tears, when the competition is over, we can fill the void by exploring all the wonderful stats and data the beautiful game generates.

(And thanks again to John W. Miller for allowing me to use his images. If you, or your students, have a penchant for data and computer science, his blog is well worth a visit.)

Posted in Handling Data, Large Data Sets