An ‘L’ of a chance

If you look at the Premier League table this evening you will see that Liverpool sit proudly atop the football pyramid.

No great surprise there, you may think.

But then see who tops the Championship tonight: Leeds United.

League One? Luton Town, and League Two? Yes, you’ve guessed it, Lincoln City.

So all four of the top leagues are crowned by a team beginning with the letter L. All the more surprising as the only other team in the 92 that begins with L is Leicester City (who, coincidentally, play Liverpool this evening).

But it gets even better – if we go down to the next tier, the National League, we find that Leyton Orient lead that division!

What are the chances of that – having a team beginning with L at the top of all of the first five leagues in English football?

Less than one in three million, I reckon.

(One in 3,317,760 to be precise)

So how did I arrive at this answer? Here’s how:

I assumed each side had the same chance of topping their table. With two out of twenty teams in the Premier League beginning with L, the probability of one of those teams topping the table is 2/20, which simplifies to 1/10. By no means a certainty, but not improbable, either.

The Championship, League One, League Two and National League each have twenty-four teams, with only one beginning with an L. So the probability is 1/24 that an L will top, say, the Championship.

But for all five leagues to be topped by a team beginning with L, we need the Prem, and the Championship, and League One, and League Two, and the National League to all have an L at the top. Treating the leagues as independent of one another, we multiply those probabilities together:

1/10 x 1/24 x 1/24 x 1/24 x 1/24 = 1/3,317,760
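
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It makes the same assumptions as the calculation above (every team equally likely to be top, and the five leagues independent of one another); the names `divisions` and `all_led_by_l` are my own, made up for this illustration.

```python
from fractions import Fraction

# (teams beginning with L, teams in the division), as quoted in the post
divisions = {
    "Premier League": (2, 20),
    "Championship": (1, 24),
    "League One": (1, 24),
    "League Two": (1, 24),
    "National League": (1, 24),
}

def all_led_by_l(divisions):
    """Probability that every division is topped by an 'L' team.

    Assumes every team is equally likely to be top of its table
    and that the divisions are independent of one another.
    """
    p = Fraction(1)
    for l_teams, total in divisions.values():
        p *= Fraction(l_teams, total)
    return p

probability = all_led_by_l(divisions)
print(probability)  # 1/3317760
```

Using `Fraction` rather than decimals keeps the answer exact, so it can be compared directly with the 1/3,317,760 above.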


Not sure if it’s ever happened before, but take a moment to enjoy it whilst it lasts – not an everyday occurrence.

I know who I’ll be supporting between now and the end of the season!

Posted in Probability

Six Nations Stats

Back in the summer, we were all struck with World Cup fever and, in this post, I shared some stats and charts looking at the heights of players in the tournament.

In a few days’ time the Six Nations rugby tournament kicks off for another year, so I thought it appropriate to have a look at the stats of those involved.

Above you can see a box plot illustrating the weights of the six squads.

[In a box plot, the line through the box is the median – or middle – value: half the players are heavier than this value, half lighter; the top of the box is the upper quartile – 75% of data values (in this case, player weights) are below this line; the bottom is the lower quartile – 25% of data values are below this line; the upper and lower whiskers cover data values that fall outside the middle 50%.]
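
If you want to see where the three lines of the box come from, here is a small sketch using Python's built-in `statistics` module. The weights are hypothetical, invented purely for illustration; they are not taken from the squad data.

```python
from statistics import median, quantiles

# Hypothetical player weights in kg, invented purely for illustration
weights = [85, 90, 95, 100, 105, 110, 115, 120, 125]

# quantiles(..., n=4) returns the three cut points that split the data
# into quarters; method="inclusive" treats the data as the whole population.
lower_quartile, middle, upper_quartile = quantiles(
    weights, n=4, method="inclusive"
)

# The height of the box is the interquartile range (the middle 50%).
interquartile_range = upper_quartile - lower_quartile

print(lower_quartile, middle, upper_quartile)  # 95.0 105.0 115.0
print(interquartile_range)                     # 20.0
```

Different software uses slightly different conventions for quartiles, so the box on a chart may not match these numbers exactly for small data sets.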

The plot above suggests to me that England have the heaviest squad and France the lightest, with France also showing the greatest range of weights.

Another way to look at the weight of the squads is using a density plot, below:

Having plotted the above, it looked like there were a couple of peaks in some of the distributions, particularly noticeable in the England, Ireland and Scotland squads. If we took a random selection of the population we would expect to see a much smoother “normal” distribution, or bell curve.

But a rugby squad is not normal! I suspected that those who play in the scrum would weigh more than the backs. So I let the data do the talking, comparing all players taking part in the tournament by position: scrum or backs. (I combined the data from all nations, as doing this on a country by country basis would result in sample sizes that were too small.)

The individual distributions for the backs and scrum are fairly “normal”, but it is clear that backs tend to be lighter than those who ply their trade up front in the scrum.
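
The same split by position can be sketched in a few lines of Python. The players below are hypothetical, invented for illustration only, and the grouping is just one simple way of making the comparison, not necessarily how the charts here were produced.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (position, weight in kg) pairs, invented for illustration only
players = [
    ("scrum", 118), ("scrum", 112), ("scrum", 109), ("scrum", 121),
    ("backs", 92), ("backs", 88), ("backs", 97), ("backs", 85),
]

# Collect the weights for each position group
by_position = defaultdict(list)
for position, weight in players:
    by_position[position].append(weight)

# Summarise each group with its mean and median weight
summary = {
    position: (mean(weights), median(weights))
    for position, weights in by_position.items()
}

for position, (avg, mid) in summary.items():
    print(f"{position}: mean {avg} kg, median {mid} kg")
```

Two groups with clearly different averages, lumped together, are exactly what produces a distribution with two peaks.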

So what about player height? Below you can see some charts that map this data.

I also had a look at the age of each squad:

I’m not sure if any of the above can help predict the eventual tournament winners, but you might find it interesting reading as we eagerly await Friday night’s kick-off.

(Data source)

Posted in Handling Data

The Ten Year Challenge

The “10 Year Challenge” is the current rage on social media, whereby you are encouraged to post a picture of yourself from 10 years ago, alongside one from now, to see how you’ve aged.

I thought it would be interesting to see how the last ten years have been for the nation’s football teams.

I found their league position at the end of the 2008/09 season and compared it to their current* league position.

*before kick off, Saturday 19th January 2019.

I’ve plotted the results above – those above the blue diagonal line are in a better place now than ten years ago, those below are now lower down the football league structure. The further from the line the better (or worse) a team has performed over those years.

I’ve reversed the direction of the x and y axes – it seemed more intuitive that way: the higher up the y axis, the higher your current league position (although that means a lower number, e.g. Liverpool, who top the table on 19th January 2019, have a ranking of 1).

Winners and losers:

Bournemouth are the success story, a full 77 places above where they were ten years ago. Brighton are 47 places better off, all of which must be somewhat galling to their south coast cousins, Portsmouth, who, along with Sunderland, have had the toughest ten years, slipping 31 places down the league ladder.

Same old, same old:

Four teams, Newcastle Utd., QPR, Sheffield Utd., and West Ham, are in exactly the same place now as they were ten years ago.

Gone, but not forgotten:

The observant amongst you will have spotted that there are not 92 clubs on the plot above. In that time ten clubs have been relegated from the league:

Aldershot Town
Chester City
Dagenham & Redbridge
Hartlepool United
Hereford United
Leyton Orient
Stockport County

To be replaced by:

Burton Albion
Forest Green
Mansfield Town
Oxford Utd

As ever, statistics only tell us so much, and end up posing more questions than they answer. What is the secret of Bournemouth, Brighton, Luton & Rotherham’s success? I think it would be interesting to see if there is any correlation between length of tenure/number of managers and how well a team has fared. And if so, which is the cause and which the effect? Do managers stay because the team is successful, or is a team successful over the long term because the manager stays? Does geography have a part to play? What happens if we look at trends over twenty years, not ten?

So how has your team done over the last ten years?

Below are some screenshots of the spreadsheet I used in my calculations. The number is how many places a team has moved: a positive number is how many places they have climbed, a negative number how many they have dropped.
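
If you would rather do this in code than in a spreadsheet, here is a rough sketch in Python. It assumes a club's overall position is found by counting every team in the divisions above and adding its position within its own division; the function names and the example club are hypothetical, made up for this illustration.

```python
# Teams per division, top to bottom of the 92
DIVISION_SIZES = [
    ("Premier League", 20),
    ("Championship", 24),
    ("League One", 24),
    ("League Two", 24),
]

def overall_rank(division, position):
    """Rank in the whole league structure: 1 is top of the Premier League."""
    teams_above = 0
    for name, size in DIVISION_SIZES:
        if name == division:
            return teams_above + position
        teams_above += size
    raise ValueError(f"unknown division: {division}")

def places_moved(rank_then, rank_now):
    """Positive means the club has climbed, negative means it has dropped."""
    return rank_then - rank_now

# Hypothetical club: 10th in League One ten years ago, 5th in the Championship now
then = overall_rank("League One", 10)   # 20 + 24 + 10 = 54
now = overall_rank("Championship", 5)   # 20 + 5 = 25
print(places_moved(then, now))          # 29
```

Subtracting the current rank from the old one is what makes a climb come out as a positive number, matching the convention described above.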



Posted in Handling Data

GDP ranking by country

I recently discovered the video above that ranks the top ten countries by GDP from 1960 to 2017.

It is quite mesmeric watching it (I am reminded of the great Hans Rosling and how he presented data) and it got me wondering: is it Economics, History or Statistics? (It is, of course, all three.)

And, as I’ve said before, good statistics always prompt more questions than they answer. The first of which may be:

What about GDP per capita?

Well here’s a video that answers that question:

… which prompts the question:

So what happened to Monaco in 2012 and Liechtenstein in 2016?

… for which I don’t (yet) have an answer.

Whatever you take from the above videos, however you use them, I hope you enjoyed them, and I hope they’ve raised some questions of your own.

The videos were made by WawamuStas, with the original data coming from the World Bank, a great source of Large Data Sets.

Posted in Handling Data, Large Data Sets

A tough question

A student asked me a difficult question the other day.

I’m normally pretty confident with my subject knowledge, and am rarely stumped when quizzed out of the blue. Sometimes a tricky question from Further Maths, or a more esoteric A level problem, may leave me scratching my head for a minute or two. Worst case scenario, I may need to ponder the problem for ten minutes in the calm, peace and quiet of break time or lunch time, when I can focus on it without distraction, but, typically, I’ll get there in the end and give the pupil the answer they were seeking.

But not this time. As soon as the question was asked, I knew I could not give a definitive answer.

I tried flannelling and digressing, diverting and avoiding, but this Year 11 student was having none of it (perhaps a future career as a “Paxman” on Newsnight beckons?).

In the end I had to come clean. I had to give an answer, so I did, but I still feel uneasy about it as I’m not sure I’d give the same answer today as I did then (but I might do).

So what was this question that floored me?

Sir, what is your best, ever, music track?

And you can’t answer that question as it constantly changes (but if you want to know what answer I gave when my resistance crumbled, then keep reading.)

I was reminded of this exchange as I’ve just seen my Spotify data for the year.

We live in the age of Big Data, and understanding this, how it’s used and how it shapes our lives is an important lesson for us all to learn.

Fortunately, I love data and I didn’t just stop with what Spotify told me in their glossy end of year review of me, and my listening habits.

I was able to work out that it costs me less than a penny a minute to listen to Spotify: over the year I spent about 50p an hour listening to my music through their streaming site.

Good value? I think it is: I love my music, and having so much on tap makes that a price I’m happy to pay (and, as I’m on a family membership, the cost per hour for all four of us in the household is significantly less).

But the important thing is that I was able to calculate that cost, and then decide if it was good value for money for me. Many of your students will have received a similar review from Spotify – why not get them to calculate what it costs them (or, more likely, their parents) for each hour they use the service? The maths is pretty simple, but the process and analysis are so important. I suspect that if I did a similar calculation for my gym membership it may not be such good value for money. Netflix – how much do I pay for each hour I watch?
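
For students who would like to try the calculation in code, here is a minimal sketch in Python. The figures are hypothetical, chosen only so the result lands near the 50p an hour mentioned above; they are not my actual bill or listening time, and the function names are made up for this example.

```python
def cost_per_hour_pence(annual_cost_pounds, minutes_listened):
    """Cost in pence for each hour of listening over the year."""
    hours = minutes_listened / 60
    return annual_cost_pounds * 100 / hours

def cost_per_minute_pence(annual_cost_pounds, minutes_listened):
    """Cost in pence for each minute of listening over the year."""
    return annual_cost_pounds * 100 / minutes_listened

# Hypothetical figures: £120 for the year and 14,400 minutes listened
annual_cost = 120
minutes = 14_400

print(cost_per_hour_pence(annual_cost, minutes))              # 50.0
print(round(cost_per_minute_pence(annual_cost, minutes), 2))  # 0.83
```

Swap in the minutes from your own end of year review and what you actually pay, and the same two functions work for the gym or Netflix too.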

Still with me? That’s probably because you want to know my “favourite track”.

Well, these are my favourites based on my Spotify listening:

But how did I answer the question?

Well, the band in question – The Jam – is in the list above, but not the song.

So what is my all-time favourite track? With the caveat that it changes, I can reveal it as “Thick as Thieves” by The Jam.

Posted in Handling Data, Large Data Sets