Simple Probability Model for Predicting Sports Results

At a machine learning meetup a few years back, we were talking about applying probability to sports. The point raised was that football has few goals per match, which makes prediction hard. Sports like handball or basketball, on the other hand, have many scores per match, which lets the errors cancel each other out.

I’ve already read Nate Silver’s book The Signal and the Noise, but here I want to build the simplest possible prediction model.



Watching my girlfriend play handball, I thought I could predict the final score, with a confidence interval, using only the information available at minute 20. I tried a binomial model, waited until the end of the game, and it turned out to be quite good.

Here, I’m applying the model to two Spurs matches for two reasons: I don’t want to cheat by using favorable data, and the data from my girlfriend’s match is not available on the internet.

The Model

An NBA basketball game has four quarters of 12 minutes each. Given the score after the first quarter, I want to predict the final result. The model treats every second as an independent trial in which a team scores either one point or none. For instance, suppose team 1 scored x1 points in the first quarter. Then the estimated probability of scoring a point in any given second is p1 = x1/720, because 12*60 = 720 seconds.

A complete game has 2880 seconds; therefore, the final score follows a binomial distribution with parameters n = 2880 and p = p1, and the expected final score is 2880*p1 = 4*x1.
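A minimal sketch of this model in Python, assuming only the standard library. The helper names (`binom_pmf`, `final_score_pmf`, `central_interval`) are illustrative and not necessarily the code behind the plots in this post; the input of 29 points in the first quarter is just an example value.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_prob)

def final_score_pmf(points_so_far, trials_so_far, total_trials):
    """Distribution of the final score: index k holds P(final score = k)."""
    p = points_so_far / trials_so_far  # estimated scoring probability per trial
    return [binom_pmf(k, total_trials, p) for k in range(total_trials + 1)]

def central_interval(pmf, level=0.95):
    """Smallest scores at which the CDF reaches the lower and upper tail cutoffs."""
    tail = (1 - level) / 2
    lower = upper = None
    cdf = 0.0
    for k, prob in enumerate(pmf):
        cdf += prob
        if lower is None and cdf >= tail:
            lower = k
        if cdf >= 1 - tail:
            upper = k
            break
    return lower, upper

# Example input: 29 points after one quarter (720 of 2880 seconds).
pmf = final_score_pmf(29, 720, 2880)
expected = sum(k * prob for k, prob in enumerate(pmf))
print(round(expected), central_interval(pmf))
```

Working in log space avoids overflow in the binomial coefficient, which becomes enormous for n = 2880.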

I will demonstrate with two of the Spurs’ games in 2014.

1. Spurs vs Wolves

This game finished with 121 points for the Spurs and 92 for the Wolves. After the first quarter, the score was 29 to 26. Using this data, I computed the distribution of final results.


Spurs vs Wolves


As can be seen, the Spurs were clear favorites: their probability of winning the game was 78.3%. The expected result was 116 to 104, and the actual result fell within the 95% interval for both teams.
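One way to compute a win probability like this is to sum over all pairs of final scores. The sketch below assumes the two teams' scores are independent and counts only strict wins (ties excluded); both are my assumptions, and the function names are illustrative. Under those conventions the result should land in the neighborhood of the figure quoted above.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_prob)

def win_probability(points_a, points_b, trials_so_far, total_trials):
    """P(final score of A > final score of B), assuming independent binomial scores."""
    p_a = points_a / trials_so_far
    p_b = points_b / trials_so_far
    pmf_a = [binom_pmf(k, total_trials, p_a) for k in range(total_trials + 1)]
    pmf_b = [binom_pmf(k, total_trials, p_b) for k in range(total_trials + 1)]
    prob = 0.0
    cdf_b = 0.0  # P(B <= k - 1) at the start of iteration k
    for k in range(total_trials + 1):
        prob += pmf_a[k] * cdf_b  # A scores exactly k and B scores fewer than k
        cdf_b += pmf_b[k]
    return prob

# First-quarter score from the Spurs vs Wolves game: 29 to 26.
spurs_win = win_probability(29, 26, 720, 2880)
print(f"{spurs_win:.3f}")
```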

2. Nets vs Spurs

This game was more interesting. The final result was 87 for the Nets and 99 for the Spurs, while after the first quarter it was 18 to 25, respectively.


Nets vs Spurs


Given the first quarter, the probability of the Nets winning this game was as low as 1.7%.

The expected result for the Spurs was 100, almost exactly the actual result, which suggests they played the whole match as they did the first quarter. In contrast, the expected result for the Nets was 72, much lower than their actual 87. Moreover, the probability of the Nets scoring at least as many points as they actually did is 4.6%. This suggests that something about their play changed after the first quarter.
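The "at least as big as the actual result" figure is an upper-tail probability of the binomial distribution. A minimal sketch, using only the standard library and an illustrative function name (`prob_at_least`):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_prob)

def prob_at_least(score, points_so_far, trials_so_far, total_trials):
    """P(final score >= score) under the binomial model."""
    p = points_so_far / trials_so_far
    return sum(binom_pmf(k, total_trials, p) for k in range(score, total_trials + 1))

# Nets: 18 points in the first quarter, 87 at the final whistle.
nets_tail = prob_at_least(87, 18, 720, 2880)
print(f"{nets_tail:.3f}")
```

A small tail probability like this says the final score is hard to reconcile with the first-quarter scoring rate; it does not say what changed.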

In fact, the Nets’ scores by quarter were 18, 18, 28 and 23. Clearly, something changed between the second and third quarters, and I want to know what.

Handball

For handball, the number of goals per match is lower than in basketball, so I found that using minutes instead of seconds as the time unit gives good results. After the first third of the match (minute 20 of 60), the score was 10 to 6 in favor of the home team. With this data, we compute the binomial distribution with n = 60.


San Miguel vs Argentinos


As the distributions show, with only the information from the first third, the prediction was already quite polarized.

The probability of the away team winning was less than 1%. The expected result was 30 to 18, while the actual result was 30 to 21. The prediction for the home team was exact, and the probability of the away team reaching its actual score (or higher) was 25%. This means they did a great job of improving their chances.
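The choice of time unit matters because it changes the spread of the binomial distribution while leaving the expected score the same. The sketch below is my own illustration rather than part of the original analysis: it compares the standard deviation of the home team's predicted score with one trial per minute against one trial per second.

```python
import math

def binom_sd(points_so_far, trials_so_far, total_trials):
    """Standard deviation of a Binomial(total_trials, p) final score."""
    p = points_so_far / trials_so_far
    return math.sqrt(total_trials * p * (1 - p))

# Home team: 10 goals after 20 of 60 minutes; the expected final score is 30 either way.
sd_minutes = binom_sd(10, 20, 60)            # one trial per minute
sd_seconds = binom_sd(10, 20 * 60, 60 * 60)  # one trial per second
print(sd_minutes, sd_seconds)
```

With minutes the standard deviation is roughly 3.9 goals, with seconds roughly 5.5, so the coarser unit gives narrower intervals around the same expected score.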

I was happy with this model, and also happy to watch Melisa scoring goals and moving the bell curve to the right.

Cover Image by Nazionale Calcio

Javier Burroni

About Javier Burroni

Data Science Consultant. Actuary, Msc Economics (candidate), future PhD student of Computer Science - Knowledge and Data Discovery "Entropy is what kills"
