Six Nations Stats

Back in the summer, we were all struck with World Cup fever and, in this post, I shared some stats and charts looking at the heights of players in the tournament.

In a few days’ time the Six Nations rugby tournament kicks off for another year, so I thought it appropriate to have a look at the stats of those involved.

Above you can see a box plot illustrating the weights of the six squads.

[In a box plot, the line through the box is the median – or middle – value: half the players are heavier than this value, half lighter; the top of the box is the upper quartile – 75% of data values (in this case, player weights) are below this line; the bottom is the lower quartile – 25% of data values are below this line; the upper and lower whiskers show the spread of the data values that fall outside the middle 50%.]
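If you want to see where those lines come from, here is a minimal sketch in Python. The weights are made up purely for illustration (they are not the real squad data), and `box_plot_summary` is a hypothetical helper name of my own. Note that there are several conventions for calculating quartiles; this sketch uses the "inclusive" method from Python's standard `statistics` module, so other tools may give slightly different values.

```python
import statistics

# Hypothetical player weights in kg (illustrative only, not the real squad data)
weights_kg = [82, 88, 91, 95, 99, 104, 108, 112, 118]


def box_plot_summary(values):
    """Return the three values that define the box in a box plot."""
    # quantiles(n=4) returns three cut points:
    # lower quartile, median, upper quartile
    lower_q, median, upper_q = statistics.quantiles(
        values, n=4, method="inclusive"
    )
    return {
        "lower_quartile": lower_q,   # 25% of values lie below this
        "median": median,            # half below, half above
        "upper_quartile": upper_q,   # 75% of values lie below this
        "iqr": upper_q - lower_q,    # height of the box (the middle 50%)
    }


summary = box_plot_summary(weights_kg)
print(summary)
```

The box itself spans the lower to upper quartile, so its height is the interquartile range.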

The plot above suggests to me that England have the heaviest squad and France the lightest; France are also the team with the greatest range of weights.

Another way to look at the weight of the squads is using a density plot, below:

Having plotted the above, it looked like there were a couple of peaks in some of the distributions, particularly noticeable in the England, Ireland and Scotland squads. If we took a random selection of the population we would expect to see a much smoother “normal” distribution, or bell curve.

But a rugby squad is not normal! I suspected that the weights of those who play in the scrum would be greater than those of the backs. So I let the data do the talking, comparing all players taking part in the tournament by position: scrum or backs. (I combined the data from all nations, as doing this on a country by country basis would result in sample sizes that were too small.)

The individual distributions for the backs and scrum are fairly “normal”, but it is clear that backs tend to be lighter than those who ply their trade up front in the scrum.

So what about player height? Below you can see some charts that map this data.

I also had a look at the age of each squad:

I’m not sure if any of the above can help predict the eventual tournament winners, but you might find it interesting reading as we eagerly await Friday night’s kick off.

(Data source)

Posted in Handling Data

The Ten Year Challenge

The “10 Year Challenge” is the current rage on social media, whereby you are encouraged to post a picture of yourself from 10 years ago, alongside one from now, to see how you’ve aged.

I thought it would be interesting to see how the last ten years have been for the nation’s football teams.

I found their league position at the end of the 2008/09 season and compared it to their current* league position.

*before kick off, Saturday 19th January 2019.

I’ve plotted the results above – those above the blue diagonal line are in a better place now than ten years ago, those below are now lower down the football league structure. The further from the line the better (or worse) a team has performed over those years.

I’ve reversed the direction of the x and y axes – it seemed more intuitive that way: the higher up the y axis, the higher your current league position (although that means a lower number, e.g. Liverpool, who top the table on 19th January 2019, have a ranking of 1).

Winners and losers:

Bournemouth are the success story, a full 77 places above where they were ten years ago. Brighton are 47 places better off, all of which must be somewhat galling to their south coast cousins, Portsmouth, who, along with Sunderland, have had the toughest ten years, slipping 31 places down the league ladder.

Same old, same old:

Four teams, Newcastle Utd., QPR, Sheffield Utd. and West Ham, are in exactly the same place now as they were ten years ago.

Gone, but not forgotten:

The observant amongst you will have spotted that there are not 92 clubs on the plot above. In that time ten clubs have been relegated from the league:

Aldershot Town
Chester City
Dagenham & Redbridge
Hartlepool United
Hereford United
Leyton Orient
Stockport County

To be replaced by:

Burton Albion
Forest Green
Mansfield Town
Oxford Utd

As ever, statistics only tell us so much, and end up posing more questions than they answer. What is the secret of Bournemouth, Brighton, Luton & Rotherham’s success? I think it would be interesting to see if there is any correlation between length of managerial tenure (or number of managers) and how well a team has fared. And if so, which is the cause and which the effect? Do managers stay because the team is successful, or is a team successful over the long term because the manager stays? Does geography have a part to play? What happens if we look at trends over twenty years, not ten?

So how has your team done over the last ten years?

Below are some screenshots of the spreadsheet I used in my calculations. The number is how many places a team has moved: a positive number is good (places gained), while a negative number is how many places they have dropped.
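For anyone who would rather not use a spreadsheet, the same calculation can be sketched in a few lines of Python. The clubs and positions below are invented for illustration, and `places_moved` is a hypothetical helper name; I am assuming that positions are ranked 1 to 92 across the whole league structure, with 1 being top.

```python
def places_moved(position_then: int, position_now: int) -> int:
    """Places gained over the period.

    Positions are ranks (1 = top), so a smaller number now means
    the club has climbed. Positive = climbed, negative = dropped.
    """
    return position_then - position_now


# Hypothetical clubs: (position ten years ago, position now)
positions = {
    "Club A": (80, 10),
    "Club B": (15, 15),
    "Club C": (20, 51),
}

changes = {
    club: places_moved(then, now)
    for club, (then, now) in positions.items()
}

# List the biggest climbers first
for club, change in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{club}: {change:+d}")
```

Subtracting "now" from "then" (rather than the other way round) is what makes a climb up the ladder come out as a positive number.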



Posted in Handling Data

GDP ranking by country

I recently discovered the video above that ranks the top ten countries by GDP from 1960 to 2017.

It is quite mesmeric watching it (I am reminded of the great Hans Rosling and how he presented data) and it got me wondering: is it Economics, History or Statistics? (It is, of course, all three.)

And, as I’ve said before, good statistics always prompt more questions than they answer. The first of which may be:

What about GDP per capita?

Well here’s a video that answers that question:

… which prompts the question:

So what happened to Monaco in 2012 and Liechtenstein in 2016?

… for which I don’t (yet) have an answer.

Whatever you take from the above videos, however you use them, I hope you enjoyed them, and I hope they’ve raised some questions of your own.

The videos were made by WawamuStas, with the original data coming from the World Bank, a great source of Large Data Sets.

Posted in Handling Data, Large Data Sets

A tough question

A student asked me a difficult question the other day.

I’m normally pretty confident with my subject knowledge, and am rarely stumped when quizzed out of the blue. Sometimes a tricky question from Further Maths, or a more esoteric A level problem, may leave me scratching my head for a minute or two. Worst case scenario, I may need to ponder the problem for ten minutes in the calm, peace and quiet of break time or lunch time, when I can focus on it without distraction, but, typically, I’ll get there in the end and give the pupil the answer they were seeking.

But not this time. As soon as the question was asked, I knew I could not give a definitive answer.

I tried flannelling and digressing, diverting and avoiding, but this Year 11 student was having none of it (perhaps a future career as a “Paxman” on Newsnight beckons?).

In the end I had to come clean. I had to give an answer, so I did, but I still feel uneasy about it as I’m not sure I’d give the same answer today as I did then (but I might do).

So what was this question that floored me?

Sir, what is your best, ever, music track?

And you can’t answer that question, as the answer constantly changes (but if you want to know what answer I gave when my resistance crumbled, then keep reading).

I was reminded of this exchange as I’ve just seen my Spotify data for the year.

We live in the age of Big Data, and understanding it, how it’s used and how it shapes our lives, is an important lesson for us all to learn.

Fortunately, I love data and I didn’t just stop with what Spotify told me in their glossy end of year review of me, and my listening habits.

I was able to work out that it costs me less than a penny a minute to listen to Spotify: over the year I spent about 50p an hour listening to my music through their streaming site.

Good value? I think it is, I love my music and having so much on tap makes that a price I’m happy to pay (and, as I’m on a family membership, the cost per hour for all four of us in the household is significantly less.)

But the important thing is that I was able to calculate that cost, and then decide if it was good value for money for me. Many of your students will have received a similar review from Spotify – why not get them to calculate what it costs them (or, more likely, their parents) for each hour they use the service? The maths is pretty simple, but the process and analysis are so important. I suspect that if I did a similar calculation for my gym membership it may not be such good value for money. Netflix – how much do I pay for each hour I watch?
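As a rough sketch of the calculation students could do, here it is in Python. The subscription price and listening time are invented figures, chosen only so that the answer comes out close to the 50p an hour mentioned above; they are not my actual Spotify numbers. `cost_per_hour` is a hypothetical helper name.

```python
def cost_per_hour(annual_cost_pounds: float, minutes_used: float) -> float:
    """Cost in pounds for each hour of use over the year."""
    if minutes_used <= 0:
        raise ValueError("minutes_used must be greater than zero")
    hours_used = minutes_used / 60
    return annual_cost_pounds / hours_used


# Hypothetical figures (assumptions, not real data)
monthly_fee_pounds = 10.00   # assumed subscription price
minutes_listened = 14_400    # assumed listening time: 240 hours in the year

annual_cost = 12 * monthly_fee_pounds
per_hour = cost_per_hour(annual_cost, minutes_listened)
per_minute_pence = per_hour * 100 / 60

print(f"Cost per hour: £{per_hour:.2f}")
print(f"Cost per minute: {per_minute_pence:.2f}p")
```

The same function works for a gym membership or Netflix: swap in the annual cost and the minutes of use.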

Still with me? That’s probably because you want to know my “favourite track”.

Well, these are my favourites based on my Spotify listening:

But how did I answer the question?

Well, the band in question – The Jam – is in the list above, but not the song.

So what is my all time favourite track? With the caveat that it changes, I can reveal it as “Thick as Thieves” by The Jam.

Posted in Handling Data, Large Data Sets

Deaths due to terrorism in the UK

I came across the graph above and I was immediately struck by the stories it tells by forcing you, the reader, to ask the obvious questions.

Clearly something happened in the 1990s.

The peace process was begun in Northern Ireland, culminating in the Good Friday Agreement of 1998. Surely this graph alone is enough to convince anyone of the importance and historical significance of the Good Friday Agreement? Why would anyone do anything, anything, to jeopardise its continued success? If anyone should need any convincing that we shouldn’t, we mustn’t, return to a hard border on the island of Ireland, then surely this graph must be all it takes.

86% of the deaths between 1970 and 1990 were in Northern Ireland.

1988 – includes 270 deaths due to the Lockerbie bombing, when Pan Am flight 103 from Frankfurt to Detroit, via London & New York, was destroyed in the air over the Scottish town of Lockerbie by a terrorist bomb.

2005? The tragedy of the London bombings, or 7/7.

A simple, sobering graph, but one that deserves – demands – to be viewed.

Posted in Handling Data