The statistics of a squeaky bum

21 May 2015

Football, for a fan, often feels like a week of torture followed by 90 minutes of hell but never more so than when your team is involved in the play-offs. This Bank Holiday weekend sees the League 2, League 1 and Championship play-off finals take place at Wembley, prompting Sir Alex Ferguson to announce that it’s “squeaky bum time” and football commentators around the country to confidently proclaim that “it all boils down to this,” where ‘this’ is a tumbling maelstrom of apprehension; a cacophony of swirling butterflies; a mind electrified, distracted, unable to hold onto any thought that does not involve ‘this’. With only two days to go, with nerves drawing closer to a Himalayan precipice, the fan is liable to search for straws ever more frantically and clutch at them ever more possessively, taking solace in the company of similarly-afflicted supporters of the same team and football’s blurred lines of fact and myth.

The Agony so far

The play-offs take place at the end of the regular season: the top four teams to finish outside the automatic promotion places frenetically play a mini knockout competition, and the winner is promoted to the league above. The tables below show the teams that are going through the agony this year.

Final league tables for the 2014/15 season. The teams highlighted in green were promoted automatically; the teams highlighted in blue must take part in the play-offs for their league, the winner of which is promoted into the league above.

To avoid confusion between leagues (League 2 has the top three teams going up automatically into League 1, whereas the Championship and League 1 have the top two), from now on we’ll ignore the teams that were automatically promoted and label the teams that went into the play-offs from one to four, with first being the position immediately below automatic promotion (Norwich, Preston and Wycombe) and fourth the team that just sneaked into the end-of-season bonanza (Ipswich, Chesterfield and Plymouth). Two teams from each league have already been knocked out in the semi-finals, and so this weekend’s finals are Norwich v Middlesbrough (Championship), Preston v Swindon (League 1) and Wycombe v Southend (League 2).

We finished first, so we should win!

Fans of Norwich, Preston and Wycombe, all of whom finished just outside the automatic promotion places in their respective leagues, may be hoping that their superior position over the course of the season will give them an advantage when the hour comes. The statistics for the Championship to League 2 from the 1999/2000 to 2013/14 seasons seem to bear out their hopes: out of the 45 teams promoted during this time, 17 had finished first, 10 second, nine third and nine fourth (see above for our definition of first to fourth).

Before jumping to conclusions, however, we should measure if this distribution is statistically different to what would be expected if the final position was completely irrelevant, i.e. all four teams had equal probability of success (25%). What is known as a chi-square test will do this for us by computing a p-value: in this case, if p is less than 0.05 then we can say that the two distributions are different; otherwise, we cannot. Our p-value is 0.26: over the past fifteen years, there is no statistically significant difference between our distribution and a flat distribution with every position getting promoted $45 \div 4 = 11.25$ times.
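The test described above can be reproduced in a few lines. What follows is a minimal sketch using only Python’s standard library, not necessarily the code used for the article: the helper `chi2_sf` is our own name, and it evaluates the upper tail of the chi-square distribution through a standard recurrence for the incomplete gamma function. A statistics package (for example `scipy.stats.chisquare`) would do the same job.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Promotions by play-off seeding (first to fourth), 1999/2000 to 2013/14
observed = [17, 10, 9, 9]
# Flat distribution: every position promoted 45 / 4 = 11.25 times
expected = [sum(observed) / len(observed)] * len(observed)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf(statistic, df=len(observed) - 1)

print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

With four categories there are three degrees of freedom, and the p-value comes out at roughly 0.26, matching the figure quoted above.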

Histogram showing the number of times a team finishing first, second, third or fourth has been promoted. Statistically speaking, there is no difference between this bar plot and one where all the columns are the same height.

With fans of Middlesbrough, Swindon and Southend no longer reaching for tissues to mop sweaty brows, they may wish to consider the fact that this result is largely because it makes virtually no difference to the chances of getting promoted whether a team finishes second, third or fourth. If we re-run the chi-square goodness of fit test but this time merge the teams who finished in these positions together (so that 28 teams who did not finish first were promoted) we obtain a statistically significant p-value of 0.048: you are more likely to get promoted if you finished first than if you did not. Perhaps the best example of this is Lincoln City, who found themselves in the League 2 play-offs five years in a row between 2003 and 2007, finishing in every position except the one below the automatic promotion places: although they got to the final twice, they never won.
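The merged test can be sketched in the same way. This is a minimal standard-library version, assuming the null hypothesis that a first-placed team wins 25% of the time and the other three positions share the remaining 75%. With two categories there is one degree of freedom, for which the chi-square tail probability reduces to a complementary error function.

```python
import math

# Promotions: finished first vs. finished second, third or fourth
observed = [17, 28]
# Under the null hypothesis each of the four teams has a 25% chance
null_probabilities = [0.25, 0.75]

total = sum(observed)
expected = [total * p for p in null_probabilities]  # [11.25, 33.75]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# One degree of freedom: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(statistic / 2.0))

print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

The p-value lands just under the 0.05 threshold, which is why merging the three lower positions changes the conclusion.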

Histogram showing how many teams who finished first, second, third or fourth got to the play-off final. Interestingly, at least to a football fan, all three play-off finals this year are between teams who finished in the top two positions outside the automatic promotion places, a feat that was last achieved in the 1999/2000 season when the finals were contested between Ipswich and Barnsley (in the then Division One); Gillingham and Wigan (Division Two); and Darlington (no longer in existence) and Peterborough (Division Three).

But our form is pretty good!

If league position isn’t a source of comfort for the fan mulling the future over a few pints, perhaps form going into the play-offs will be. Ever since the 2003/04 season in the First Division (now the Championship), when Crystal Palace rocketed up from twentieth in the table in November to reaching and winning the play-offs after a phenomenal run of form in which they won 16 out of 23 games in the new year, commentators (in their wisdom) have been promoting the conquering power of “form” and “being on a roll”. Perhaps, therefore, there is something to be said for meticulously studying the form tables.

Trawling the Internet, one can find the results over a whole season for each team that has ended up in the play-offs since 1999/2000 (and possibly even further back to when the play-offs were invented in 1987). From these, a data set was created containing the ten results obtained by each team (of which there were 180) leading up to the play-offs. From this data set, we compute the total number of points gained by each team over these ten games and then define a points difference, so that we have some way of taking into account the form of the teams you are playing against: a team that has garnered 30 points over the ten games playing a team that has taken 25 is probably not in the same situation as a team that has also got 30 playing one that has 12.

Team	Position	Total Points	Points Difference
Darlington	1	12	0
Peterborough	2	16	4
Barnet	3	18	6
Hartlepool	4	17	5

Table showing the teams competing in the then Division Three (now League 2) play-offs in the 1999/2000 season. The total points they obtained over the last ten games of the season are shown in the third column, with the points difference we have defined shown in the fourth. Note that the team’s position (second column) is based on the number of points obtained throughout the season (not shown). Who won the play-offs? Peterborough.

The table best shows how we have done this. We first look for the lowest number of points obtained over the course of the ten games (in this case Darlington with 12) and subtract this from the ‘Total Points’ of each team. So the ‘Points Difference’ for Darlington is $12 - 12 = 0$; that of Peterborough is $16 - 12 = 4$; and so on.
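The points difference calculation can be sketched in a few lines of Python, using the totals from the 1999/2000 Division Three table. The function name `points_difference` is ours, chosen for illustration.

```python
# Points gained over the last ten games (1999/2000 Division Three play-off teams)
total_points = {
    "Darlington": 12,
    "Peterborough": 16,
    "Barnet": 18,
    "Hartlepool": 17,
}


def points_difference(totals):
    """Subtract the lowest ten-game total in the group from every team's total."""
    worst = min(totals.values())
    return {team: points - worst for team, points in totals.items()}


differences = points_difference(total_points)
for team, diff in differences.items():
    print(f"{team}: {diff}")
```

By construction, at least one team in every league and season ends up with a points difference of zero.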

Relative frequency distribution of teams finishing with the points difference on the $x$-axis winning (bars shaded in green) or losing (bars shaded in red) the play-offs. The relative frequency distribution is obtained by dividing the frequency of each points difference by the sum of all the frequencies.

The bar plot above shows the relative frequency distributions, for each points difference, of both the 45 teams that won the play-offs (green) and the 135 that didn’t (red). Note that there is probably no need to get excited about the tall column appearing at 0: a points difference of zero will appear for each league for each year, as there will always be at least one worst team, whereas there isn’t always a team with a points difference of 1, or 2, etc. However, we do note that, out of all the data, the teams with the best form compared to their rivals were all unsuccessful in the play-offs (the red bars on the far right-hand side). Despite this, the mean points difference of the teams that won the play-offs ($\mu_{\text{win}} = 4.5$) is higher than that of those who did not ($\mu_{\text{lost}} = 3.7$), although the fan supporting the team with the better form may have realised by now the dangers of grabbing too quickly at straws: using a $p$-value again to see whether this difference is statistically significant (this time by applying what is known as a t-test) leads to the conclusion that there is no statistically significant difference between the two, i.e. it may be due to pure chance.
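For readers who have not met a t-test, here is a rough sketch of the quantity it is built on. The article does not say which variant was used, so this assumes Welch’s version (unequal variances), and the two short lists are placeholder numbers for illustration only, not the play-off data. Turning the statistic into a p-value needs the t distribution, for which a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False`) is the usual route.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance of each group divided by its size
    va = statistics.variance(sample_a) / len(sample_a)
    vb = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Placeholder samples (NOT the real points differences)
winners = [1, 2, 3, 4, 5]
losers = [2, 3, 4, 5, 6]

t_stat, dof = welch_t(winners, losers)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

The further the statistic sits from zero, relative to the spread of the data, the smaller the p-value and the less plausible it is that the difference in means is down to chance.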

But we won 7 out of 10!

If we ignore the points difference and look only at the total number of points gained over the ten games before the play-offs, we can plot a histogram of how the teams with the best form and worst form did. This is shown in the figure below, where the teams with the best form are represented by the colour green and the teams with the worst by the colour red. Plotting the relative frequency and comparing the two, we see that a greater proportion of teams with the best form won the play-offs, a greater proportion of teams with the worst form lost in the play-off final and marginally more teams with the worst form than the best were knocked out in the semi-finals. However, there isn’t much difference between the two.

Relative frequency of outcome of play-offs (win, lose in the final or get knocked out in the semi-finals) for the teams with the best form (green shaded column) and worst form (red shaded column) going into the play-offs. Relative frequency is again the frequency of each occurrence divided by the total frequency.

Running the chi-square test again on each set of data in turn, to see whether the observed distribution differs significantly from what we would expect if every team had an equal chance (25% winning, 25% losing in the final and 50% going out in the semi-finals, as two teams are knocked out at that stage), indicates that it does not: the chances of you winning the play-offs because you have the best total number of points seem to be statistically no different to just spinning a roulette wheel.
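The only change from the earlier chi-square tests is that the expected proportions are no longer all equal. A minimal sketch follows, in which the observed counts are hypothetical placeholders (the real counts are shown only in the figure) and the variable names are ours. With three outcomes there are two degrees of freedom, for which the chi-square tail probability is simply an exponential.

```python
import math

# Outcomes: win the play-offs, lose in the final, lose in the semi-finals
null_probabilities = [0.25, 0.25, 0.50]

# HYPOTHETICAL counts for illustration only, not taken from the data set
observed = [12, 11, 22]

total = sum(observed)
expected = [total * p for p in null_probabilities]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Two degrees of freedom: P(X > x) = exp(-x / 2)
p_value = math.exp(-statistic / 2.0)

print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

Counts that sit close to the 25/25/50 split, as these placeholders do, give a large p-value and therefore no evidence against the roulette wheel.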

Get me another drink!

Does form really have anything to do with play-off success then? In short, no. We construct what is known as a logit regression model using the points difference as a variable and winning or not winning as the outcome. To see if our points difference variable can predict success we work out its p-value: in this case 0.205; not statistically significant. In fact, we might as well just randomly guess! The story is even worse if we consider only the last five or three games before the play-offs, with p-values getting even larger. Perhaps this is because as teams guarantee their play-off spot, they begin to rest their most important players in anticipation of the trials to come; this analysis suggests that there is no harm in doing so.
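To give a flavour of what fitting a logit model involves, here is a minimal sketch with one predictor, fitted by Newton’s method using only the standard library. Everything in it is an assumption made for illustration: the function names are ours, the six data points are toy values rather than the play-off data, and the p-value is the usual Wald approximation, which may differ from whatever software produced the 0.205 quoted above.

```python
import math


def _gradient_and_hessian(b0, b1, xs, ys):
    """Gradient and (negated) Hessian of the logistic log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted P(win)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def fit_logit(xs, ys, iterations=25):
    """Fit P(win) = 1 / (1 + exp(-(b0 + b1 * x))); return b0, b1 and a Wald p-value for b1."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _gradient_and_hessian(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _gradient_and_hessian(b0, b1, xs, ys)
    standard_error = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return b0, b1, p_value


# TOY data: predictor value and outcome (1 = promoted, 0 = not promoted)
toy_x = [0, 0, 0, 1, 1, 1]
toy_y = [0, 0, 1, 0, 1, 1]

intercept, slope, slope_p = fit_logit(toy_x, toy_y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, p = {slope_p:.3f}")
```

Even though the toy data leans in favour of the higher predictor value, six observations are far too few for the slope to be statistically significant, which is the same kind of verdict the real model gives for the points difference.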

Football offers us a wealth of data to analyse, which is a joy to statisticians but often a bane for commentators and journalists. For every Crystal Palace in 2004 (22 points, +3 points difference compared to the team with the worst form), there’s a Rochdale in 2008 (24 points, +15 points difference) that did not get promoted. Or, indeed, there’s the Crystal Palace of 2013 vintage that won the play-offs despite only getting eight points from their last ten games (11 fewer than Brighton & Hove Albion, who went in with the best form but were knocked out in the semi-finals). There’s much more we could do, such as incorporating the number of points gained over the ten games into the model to see if it improves, or perhaps the number of points won over the entire season, or maybe seeing what happens the next season to teams who fail in the play-offs… The possibilities for finding something that will make the football fan sleep marginally more easily are endless.

For now, though, I’m ready to sit back, turn on the television and let Alan Parry tell me that “it all boils down to this”…

Pietro would like to thank Rafael Prieto Curiel for making sure that his claims were not as far-fetched as some heard during football commentaries. You can download the dataset used here and please send us your discoveries or let us know if you have extended it back to 1987. For the record, Pietro is most relieved that Barnet won the Conference title and therefore did not have to add to his already large collection of white hairs.

Pietro Servini

Pietro is interested in history and sport. He also happens to be doing a PhD in fluid dynamics at UCL. If he can combine any two of the three it makes him a happy man.
