Monday, September 25, 2006

Double result?

I have updated my forecasting/econometric model of attendances at Oldham Athletic matches to include all games up to the Gillingham game on Saturday just gone. The updated model looks something like:

log(Att) = 5.593 + 0.081 log(Att)_{t-1} - 0.098 log(Distance)_t - 0.396 LeagueCup
- 0.569 LDVVans + 0.013 LeagueForm + 0.097 ClubSize + 0.446 LLaticsAveAtt
- 0.171 Tue - 0.380 FridayGame - 0.100 Feb + 0.022 PlayersOut + 0.015 LoansIn
+ 0.031 LoansOut - 0.464 LTicketprices + 0.793 Grimsby + 0.436 Torquay
+ 0.587 Playoffs + 0.200 OnTV - 0.269 DEarlySample + 0.267 BoxingDay
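Because the dependent variable is in logs, the coefficient on log(Distance) reads directly as an elasticity, while the coefficient on a 0/1 dummy converts to a percentage effect via exp(b) - 1. The sketch below does that conversion for a handful of the coefficients above. It assumes these particular variables are 0/1 dummies, which the equation itself does not state.

```python
import math

# Coefficients copied from the model above. Treating these variables as
# 0/1 dummies is an assumption made for this illustration.
dummy_coefs = {
    "LeagueCup": -0.396,
    "Tue": -0.171,
    "FridayGame": -0.380,
    "OnTV": 0.200,
    "BoxingDay": 0.267,
}


def pct_effect(coef):
    """Percentage change in attendance when a dummy switches from 0 to 1."""
    return 100.0 * (math.exp(coef) - 1.0)


effects = {name: pct_effect(b) for name, b in dummy_coefs.items()}
for name, effect in effects.items():
    print(f"{name:12s} {effect:+.1f}%")
```

For small coefficients the percentage effect is close to 100 times the coefficient itself; the gap widens as the coefficient grows.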

Using it to predict the attendance at the Gillingham game gave a figure of 4602.

Not only did Oldham slay Gillingham 4-1 to my great enjoyment, the attendance turned out to be 4652, just 50 out from my forecast! A vast improvement on the previous home game against Scunthorpe, where the model was 600 out.
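To put the two misses on a comparable footing, the forecast errors can be written out as raw numbers, as percentages, and on the log scale the model is estimated in. The figures are the ones quoted in these posts (5399 forecast against 4812 for Scunthorpe, 4602 against 4652 for Gillingham); the helper name is just for illustration.

```python
import math

# (opponent, forecast, actual gate), as quoted in the posts
games = [("Scunthorpe", 5399, 4812), ("Gillingham", 4602, 4652)]


def forecast_errors(forecast, actual):
    """Return (raw error, % error, log-scale residual) for one game."""
    error = forecast - actual                           # positive = over-forecast
    pct_error = 100.0 * error / actual
    log_residual = math.log(actual) - math.log(forecast)  # actual minus fitted, in logs
    return error, pct_error, log_residual


results = {name: forecast_errors(f, a) for name, f, a in games}
for name, (err, pct, log_res) in results.items():
    print(f"{name:11s} error={err:+5d}  pct={pct:+6.2f}%  log residual={log_res:+.4f}")
```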

However, it is exactly the same model, simply updated to include recent matches. Those recent matches have seen a marked upturn in form. Having scored one goal in the first five league matches, Oldham have scored 12 in the second five league matches. Having occupied 23rd place (only off the bottom of the table due to Rotherham being deducted 10 points) after five games, they now occupy 8th place.

It is possible that my model doesn't capture downturns in form quite as well as it captures normal form.

It is also possible that the PcGets algorithm used to select the model isn't as effective as it might be, because I've asked it to do something it's not designed for - choosing between various measures of the same thing. Hendry (2000 - Econometrics, Alchemy or Science, Epilogue) explicitly says that this is not what PcGets is designed for, and that it is unlikely to produce the best model: variables measuring the same thing will be highly collinear, so the algorithm has to choose between a large number of competing models.

That's the subject of further work.

In the meantime, tomorrow night Oldham take on Rotherham United at Boundary Park. The prediction? 5074. Which sounds a bit too high really, given it's a mere 3 days since the last home game. However, a variable I included in the original model that counts the number of days since the previous home game is highly insignificant and has a very small coefficient. Hence including that variable, despite its insignificance, will be of little help. In fact it will be a hindrance, as the model with this variable included predicts an attendance of 5137!

On the other hand, adapting the Intercept Correction theory of Hendry and Clements (2001) to cover simply the recent bad forecast of the game against Scunthorpe (the 600 out game) and not the last observation, or the most recent observations, produces a forecast of 4524, which appears wholly more satisfying.
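As a rough sketch of the idea (not the exact calculation behind the 4524 figure), an intercept correction in a log model amounts to adding a past log residual to the new log forecast. The version below uses the Scunthorpe forecast as originally published (5399) and the actual gate (4812). The updated model's residual for that game may differ slightly, so the result should land near, but not necessarily on, the figure above.

```python
import math


def intercept_corrected(raw_forecast, past_forecast, past_actual):
    """Shift a new forecast by the log residual from one earlier game.

    Hypothetical helper for illustration: the correction is applied in logs
    because the model is estimated in logs.
    """
    log_residual = math.log(past_actual) - math.log(past_forecast)
    return math.exp(math.log(raw_forecast) + log_residual)


# 5074 is the uncorrected Rotherham forecast; 5399 and 4812 are the
# Scunthorpe forecast and actual attendance.
corrected = intercept_corrected(5074, 5399, 4812)
print(round(corrected))
```

In levels this is the same as scaling the new forecast by the ratio of actual to forecast attendance in the badly forecast game.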

Yes, Oldham are in fantastic form, thrashing previously in-form Gillingham 4-1 on Saturday. But it's a Tuesday. And Oldham fans are famous sceptics who know what will happen tomorrow night: Rotherham, who are clearly a bogey side of ours, will turn us over. And despite my insignificant variable, I know that fans will not be able to turn up because they turned up on Saturday and can't afford both games.

So: 4524.

Tuesday, September 12, 2006

Attendances again...

My model of Oldham Athletic attendances had its first test this last Saturday, and provided somewhat deflating results. The model suggested an attendance of 5399 (not that I claim that level of precision), but the actual gate was 4812, so the forecast was almost 600 too high.

I'm not the only one to be a little deflated about numbers passing through the gates at Boundary Park in recent months and seasons. Simon Corney, the club's managing director, has spoken out recently, suggesting there is little more the club can do to attract fans.

My model doesn't appear to be capturing enough variation in attendances, which is a little disappointing. Potential variables to try include one for whether Manchester City are playing, and maybe one for Manchester United too. However, I imagine City to be more of a rival for people thinking of attending Oldham matches: their stadium is now very close to Boundary Park since they took over the City of Manchester stadium, and I imagine you stand at least a small chance of getting into a Man City match at reasonably short notice, as opposed to Man U. And everyone knows, Mancunians don't support Man U anyway...

Suggestions for possible additional variables to consider will be gratefully received. That said, an error of 600 people was about as bad as any of the "training period" forecasts got in the paper itself, so maybe this was just a bad week. The next home game will shed more light.

The start to the season has been particularly poor, and one wonders whether poor early season form is more detrimental than poor mid- or late-season form, as people have little to go off when deciding whether or not to go to a match. If it's a mid-season dip in form, fans can look back to earlier season games to whet the appetite, but as yet there's little to do that. The two wins achieved have both been 1-0, three of the four defeats have been 1-0, with a 0-0 draw, and the other game an aberration - a 3-2 defeat at Bournemouth. However, goals in recent games is a variable in the model, yet it is insignificant. More work to be done...

Model Averaging and trying to get some theory behind my results

For the entirety of my Ph.D. thus far, I've been dabbling with Model Averaging. It's helped me get to two conferences in nice places (Oslo and Santander), but it has always left me unsatisfied. I don't like rubbishing the work of others, and destroying a modelling/forecasting technique is much easier than proposing one.

However, model averaging is gaining acceptance as a modelling methodology as well as a forecasting methodology, and this is pretty concerning, if one wants to see econometrics telling us what the data tells us, and not what any particular econometrician would like to tell you. Even were model averaging to be disproved as a viable strategy however, it would not alter the fact that the Bible tells us this world is fallen, and as such, people will always try to manipulate and/or hide evidence to back their position up. So in this work I don't hold out some lofty ideal that all the ailments of econometrics will be solved, but I hope to make a positive contribution.

It's a concern because model averaging at its worst provides biased regression coefficients and can tell us next to nothing about the effect any particular variable has on the variable/parameter of interest. This is because model averaging takes every possible subset of variables from a K variable dataset (2^K models in all), and averages the results of these individual models. It's reasonable to suggest that within the set of models averaged over, there will be a "best" model. The "true" model might not be in there, but there will be a model which best captures the variation in the variable of interest, and is coherently specified, with no autocorrelation, no heteroskedasticity, and normal residuals. But model averaging will give this "best" model only a partial weight when it averages, and will give non-zero weights to very bad models.
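A minimal sketch of averaging over every subset makes the point about weights concrete. The weighting scheme here (weights proportional to exp(-BIC/2)) is an assumption chosen for illustration, since the post does not say which scheme is in use, and the data are simulated.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


def average_over_subsets(X, y):
    """Average slope coefficients over all 2^K subsets of the K columns of X.

    Every model includes a constant. A variable left out of a model
    contributes a coefficient of zero to the average.
    """
    n, K = X.shape
    const = np.ones((n, 1))
    betas, bics = [], []
    for r in range(K + 1):
        for subset in itertools.combinations(range(K), r):
            cols = np.array(subset, dtype=int)
            Xs = np.hstack([const, X[:, cols]])
            beta, rss = fit_ols(Xs, y)
            padded = np.zeros(K)
            padded[cols] = beta[1:]
            betas.append(padded)
            bics.append(n * np.log(rss / n) + Xs.shape[1] * np.log(n))
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))  # assumed BIC-based weights
    weights /= weights.sum()
    return weights @ np.array(betas), weights


# Simulated data: the second regressor is irrelevant by construction.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = 1.0 * X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(100)

avg_beta, weights = average_over_subsets(X, y)
print("models averaged:", len(weights))
print("largest weight:", weights.max(), "smallest weight:", weights.min())
```

Even the constant-only model keeps a strictly positive weight here, however tiny, which is the behaviour the paragraph above objects to.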

One can see some sense in the arguments for model averaging: will we ever know which model this "best" model is? Won't there be a good few of these "good" models? Answer: yes. But one then needs to select these "best" models, and only average over these, and cut out the very bad models that one is bound to average over if one simply averages over every model in the space.
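One simple way to express "average only over the good models" is to zero out the weight of any model whose fit criterion falls too far behind the best one, then renormalise. The sketch below does this with BIC values; the BIC numbers and the cut-off of 6 are hypothetical, picked only to show the mechanics, and selecting the good models with PcGets-style testing would be a different procedure.

```python
import math


def trimmed_weights(bics, max_gap=6.0):
    """BIC-based weights, with models more than max_gap behind the best dropped."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) if b - best <= max_gap else 0.0 for b in bics]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical BIC values for five candidate models (lower is better).
example_bics = [100.0, 101.5, 104.0, 130.0, 180.0]
weights = trimmed_weights(example_bics)
print([round(w, 3) for w in weights])
```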

This is the crux of my work. Now, using Monte Carlo simulations, this is very easy to show. Even in very benign situations (i.e. perfectly nice datasets with none of the problems real-world datasets face), forecasting using model averaging appears to be a bad idea. But showing theoretically why this is the case is less straightforward, and a lot more messy. However, it's vital if people are to actually read anything I write on this and put on the internet, and if I hope to put it towards my Ph.D.
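A toy version of such a Monte Carlo experiment might look like the following. It is not the design used in the actual work: it assumes three independent regressors that all matter, equal weights across all eight subset models, and a one-step-ahead forecast, which is a set-up that is deliberately unkind to naive averaging.

```python
import itertools

import numpy as np


def one_step_forecasts(X, y, x_new):
    """Forecast from the full model and from an equal-weight average over all subsets."""
    n, K = X.shape
    forecasts = []
    for r in range(K + 1):
        for subset in itertools.combinations(range(K), r):
            cols = np.array(subset, dtype=int)
            Xs = np.hstack([np.ones((n, 1)), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            forecasts.append(beta[0] + x_new[cols] @ beta[1:])
    # The last subset enumerated contains all K regressors (the full model).
    return float(forecasts[-1]), float(np.mean(forecasts))


def run_monte_carlo(reps=500, n=50, seed=0):
    """Mean squared forecast error of the full model and of the equal-weight average."""
    rng = np.random.default_rng(seed)
    true_beta = np.array([1.0, 1.0, 1.0])  # assumed data-generating process
    sq_full, sq_avg = [], []
    for _ in range(reps):
        X = rng.standard_normal((n + 1, 3))
        y = X @ true_beta + rng.standard_normal(n + 1)
        f_full, f_avg = one_step_forecasts(X[:n], y[:n], X[n])
        sq_full.append((y[n] - f_full) ** 2)
        sq_avg.append((y[n] - f_avg) ** 2)
    return float(np.mean(sq_full)), float(np.mean(sq_avg))


msfe_full, msfe_avg = run_monte_carlo()
print("MSFE, full model:      ", msfe_full)
print("MSFE, equal-weight avg:", msfe_avg)
```

Data-driven weights would narrow the gap, so this only illustrates the direction of the effect, not its size in any realistic setting.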

Here's hoping...

Tuesday, September 05, 2006

To do with Oldham Athletic

Having slacked (well, done a Christian camp in North Wales, and then moved house) for a few weeks, I'm now back at work. Two areas are currently on my radar, and one I finished off to a very basic level of completion yesterday. It's now up on my website, and it's an econometric model of attendances at Oldham Athletic football matches. The second is an investigation of betting exchanges using high frequency data and econometric techniques used for financial data. It is interesting because high volumes of money are placed on the outcomes of big sporting events on betting exchanges, and just as the stock market provides an approximation of the true value of a company, so a betting exchange provides an approximation to the true probability of an event occurring.
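The link between exchange prices and probabilities is simple to write down: decimal odds of d imply a probability of roughly 1/d, and dividing by the sum across outcomes removes any margin left in the prices. The odds below are made up for illustration, not taken from any real market.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to one."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


# Hypothetical decimal odds for a single match.
odds = {"home": 2.10, "draw": 3.40, "away": 3.90}
probs = implied_probabilities(odds)
for outcome, p in probs.items():
    print(f"{outcome:5s} {p:.3f}")
```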

However, the latter is very tentative. The former, because I could just use OLS (coupled with PcGets), is very simple, and feels to me a lot like a school project. However, I liked it, and all comments are welcome. The pdf can be found here. The model will be used to predict future attendances at Oldham games, and it will be interesting to see how close the model gets, and whether improvements can be found. There are a huge number of forecasting techniques available, but the best methods are surely those robust to the fact the world is constantly changing around us. It's too early to tell, but so far this football season, attendances at Oldham games seem lower than they were last season. One way to potentially enable better forecasting is to test the model for the presence of a structural break in the last few observations. This is work yet to be done.
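One standard candidate for such a test is the Chow predictive failure statistic, which compares the fit over the whole sample with the fit over the sample excluding the last few observations. The sketch below computes only the statistic, on simulated data; whether this is the test eventually used is an open question, and the function names are illustrative.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_forecast_stat(X, y, n_last):
    """Chow predictive failure statistic for the last n_last observations.

    Under the null of no break (and the usual OLS assumptions) it follows an
    F distribution with (n_last, n1 - k) degrees of freedom.
    """
    n, k = X.shape
    n1 = n - n_last
    rss_full = rss(X, y)
    rss_early = rss(X[:n1], y[:n1])
    return ((rss_full - rss_early) / n_last) / (rss_early / (n1 - k))


# Simulated example: identical data, with and without a drop in the last five points.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y_stable = X @ np.array([1.0, 0.5]) + 0.2 * rng.standard_normal(n)
y_break = y_stable.copy()
y_break[-5:] -= 1.0  # level shift in the final five observations

stat_stable = chow_forecast_stat(X, y_stable, 5)
stat_break = chow_forecast_stat(X, y_break, 5)
print("no break:", stat_stable, " with break:", stat_break)
```

A large value relative to the F critical value would flag the recent observations as inconsistent with the earlier fit.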

Attendances could well be lower because the end of the previous season at Oldham Athletic was very disappointing. A strong run of form had seen the team in contention for the end of season play-off competition to determine who gains the final promotion spot (the top two teams gain promotion automatically). However, the last seven games saw five defeats and two draws, to leave Oldham in a very deflating midtable position. Further to this, the supporters didn't particularly like the manager in charge, despite his previous good track record, and the fact he'd substantially improved the finishing position on the previous season. For a current update, this former manager, Ronnie Moore, has moved to Tranmere Rovers. Already, Tranmere have beaten Oldham this season, and Tranmere sit in 2nd place. Oldham, with four defeats in their opening six matches, occupy 22nd position. Too early to tell, but despite continual positive statements from manager and players, the team is not getting the results, and as such, a position even nearly as good as that attained last season is hard to imagine.

This is a variable that is hard to capture, particularly when actually it appears to make little difference to the attendance, if early games this season are a guide. Oldham Athletic fans appear to be footballing purists, and would like someone to get the team to play "good" football. Good football means passing the ball well, and creating goalscoring opportunities through patient (and pleasing on the eye) build-up play. It appears Ronnie Moore didn't encourage this enough in his time at Boundary Park. My view is the guy took the very unglamorous Rotherham United up two divisions, and made them competitive in the second tier of English football, when a club of their stature belongs in the fourth tier. He didn't do this by getting his team to play particularly pleasing on the eye football, but he gave Rotherham United fans three or so seasons of rubbing shoulders with some of the largest teams in this land. The kind of people who can do this do not grow on trees, yet because the brand of football wasn't pretty enough, he's been sacked, so he can instead get Tranmere Rovers promoted this season, as opposed to Oldham Athletic.

In the meantime, Oldham begin to play nicer football, and lose games. An example of the mentality of Oldham fans can be found in a report on the new manager's comments after the most recent defeat, where a goal with virtually the last kick of the game condemned Oldham to a 3-2 defeat. John Sheridan, the manager, accepts blame for a poor first half performance, where he experimented with a 4-5-1 formation to try to nullify Bournemouth, the opposition. However, Oldham finished the first half 2-0 down. Apparently, Sheridan's honesty is "promising for the future", and "With a manager who is not afraid to hold up his hands when he makes a mistake, Athletic deserve that their luck should turn." I wonder since when honesty in post-match interviews has made the blindest bit of difference to where a team finishes in the league.

Stuart Pearce of Manchester City is very honest. Jose Mourinho of Chelsea would not be described as being honest, as Chelsea are without doubt, in his opinion, the best team in every match they take part in. However, Chelsea win things, Manchester City don't. I can see the argument that if the players have a manager they see to be genuine, decent and honest, they might be more motivated to play for him. But honesty doesn't substitute for managerial ability. Have Oldham sacked a good manager who perhaps doesn't accept the blame himself enough of the time, for a poor manager who is just a nice chap, who the fans and players like?