Monday, September 25, 2006

Double result?

I have updated my forecasting/econometric model of attendances at Oldham Athletic matches to include all games up to the Gillingham game on Saturday just gone. The updated model looks something like:

log(Att)_t = 5.593 + 0.081 log(Att)_{t-1} - 0.098 log(Distance)_t - 0.396 LeagueCup
- 0.569 LDVVans + 0.013 LeagueForm + 0.097 ClubSize + 0.446 LLaticsAveAtt
- 0.171 Tue - 0.380 FridayGame - 0.100 Feb + 0.022 PlayersOut + 0.015 LoansIn
+ 0.031 LoansOut - 0.464 LTicketprices + 0.793 Grimsby + 0.436 Torquay
+ 0.587 Playoffs + 0.200 OnTV - 0.269 DEarlySample + 0.267 BoxingDay
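
Because the dependent variable is in logs, a coefficient on a 0/1 dummy translates into an approximate percentage effect on attendance of 100 * (exp(b) - 1). The sketch below does that conversion for a few coefficients from the model, assuming (the equation itself doesn't say so) that Tue, FridayGame, BoxingDay and LeagueCup are simple 0/1 dummies. The function name is purely illustrative.

```python
import math

# Coefficients copied from the model above.
# Assumption: each of these variables is a 0/1 dummy.
dummy_coefficients = {
    "LeagueCup": -0.396,
    "Tue": -0.171,
    "FridayGame": -0.380,
    "BoxingDay": 0.267,
}

def percent_effect(coefficient):
    """Percentage change in attendance when a 0/1 dummy switches on
    in a model whose dependent variable is log(attendance)."""
    return 100.0 * (math.exp(coefficient) - 1.0)

effects = {name: percent_effect(b) for name, b in dummy_coefficients.items()}

for name, effect in effects.items():
    print(f"{name}: {effect:+.1f}%")
```

On this reading, a Tuesday game costs somewhere around a sixth of the crowd, other things being equal.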

Using it to predict the attendance at the Gillingham game gave a figure of 4602.

Not only did Oldham slay Gillingham 4-1 to my great enjoyment, the attendance turned out to be 4652, just 50 out from my forecast! A vast improvement on the previous home game against Scunthorpe, where the model was 600 out.
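
To put that miss in context, here is the forecast error for the Gillingham game in levels, in percent, and in logs (the scale the model is actually estimated on). Only the two attendance figures come from the post; the variable names are just for illustration.

```python
import math

forecast = 4602   # model prediction for the Gillingham game
actual = 4652     # reported attendance

error = actual - forecast                              # error in people
percent_error = 100.0 * error / actual                 # error as a share of the crowd
log_error = math.log(actual) - math.log(forecast)      # error on the model's log scale

print(f"error: {error}")
print(f"percent error: {percent_error:.2f}%")
print(f"log error: {log_error:.4f}")
```

An error of about 1% is tiny next to the 600 miss against Scunthorpe.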

However, it is exactly the same model, simply updated to include recent matches. Those recent matches have seen a marked upturn in form. Having scored one goal in the first five league matches, Oldham have scored 12 in the second five league matches. Having occupied 23rd place (only off the bottom of the table due to Rotherham being deducted 10 points) after five games, they now occupy 8th place.

It is possible that my model doesn't capture downturns in form quite as well as it captures normal form.

It is also possible that the PcGets algorithm used to select the model isn't as effective as it might be, because I've asked it to do something it's not designed for: choosing between various measures of the same thing. Hendry (2000, Econometrics: Alchemy or Science?, Epilogue) explicitly says that this is not what PcGets is designed for. Variables measuring the same thing are likely to be highly collinear, so the algorithm has to choose between a large number of competing models and is unlikely to pick the best one.

That's the subject of further work.

In the meantime, tomorrow night Oldham take on Rotherham United at Boundary Park. The prediction? 5074. That sounds a bit too high really, given it's a mere 3 days since the last home game. However, a variable I included in the original model that counts the number of days since the previous home game is highly insignificant and has a very small coefficient, so including it despite its insignificance will be of little help. In fact it will be a hindrance, as the model with this variable included predicts an attendance of 5137!

On the other hand, adapting the Intercept Correction theory of Hendry and Clements (2001) so that it corrects only for the recent bad forecast of the Scunthorpe game (the one that was 600 out), rather than for the last observation or the most recent observations, produces a forecast of 4524, which appears altogether more satisfying.
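
In its simplest textbook form, an intercept correction adds a past residual to the model's forecast, which in a log model means shifting log(attendance) by that residual before converting back. The sketch below shows the mechanics only. The Scunthorpe forecast and attendance aren't given here, so the residual is backed out from the two Rotherham forecasts (5074 and 4524) rather than computed from the Scunthorpe game, and the function name is hypothetical.

```python
import math

def intercept_correct(raw_forecast, log_residual):
    """Shift a log-scale forecast by a past log residual and
    return the corrected forecast in attendance terms."""
    return math.exp(math.log(raw_forecast) + log_residual)

raw_forecast = 5074        # uncorrected model forecast for Rotherham
corrected_in_post = 4524   # intercept-corrected forecast quoted above

# Log residual implied by the two quoted forecasts (not taken from
# the Scunthorpe data, which the post does not report).
implied_residual = math.log(corrected_in_post) - math.log(raw_forecast)

print(f"implied log residual: {implied_residual:.4f}")
print(f"corrected forecast: {round(intercept_correct(raw_forecast, implied_residual))}")
```

The implied shift is a little over 0.11 in logs, that is, roughly an 11% cut to the raw forecast.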

Yes, Oldham are in fantastic form, thrashing previously in-form Gillingham 4-1 on Saturday. But it's a Tuesday, and Oldham fans are famous sceptics who know what will happen tomorrow night: Rotherham, who are clearly a bogey side of ours, will turn us over. And despite my insignificant variable, I know that some fans will not be able to turn up, because they turned up on Saturday and can't afford both games.

So: 4524.
