A few days back I was on a phone support call with a Web Connection customer whose vertical software is run by some very large companies. He’s been a loyal customer for years and has had tremendous success, along with occasional problems, with this installation: it’s a large-scale application that really pushes the Web Connection framework.

So anyway, we’re on the phone going over the issues and going back and forth. Nothing was immediately obvious, and in the half hour or so we spent on it we looked at any number of fairly advanced configuration options for the OS and the application. After eliminating a lot of possible culprits, the answer I gave in the end was: we need more information. Take the application out of the SYSTEM context, run it on the desktop, and see what actually fails.

Well, it turned out to be a really simple thing: a log file (which wasn't even supposed to be enabled at all) had gotten so large that it filled up the disk of the machine the app was running on, and the app started erroring out. Since this is a FoxPro COM application running in the SYSTEM context without a logged-on user, the failure was not trappable. VFP has a nasty habit of responding to local file errors not with an error but with a file open dialog, which can hang a server. By making the app visible on the desktop, it was easy to see what was happening.

This would have been easy to catch by checking some obvious signs on the machine, like free disk space. The actual failure wasn't being recorded because the error logging mechanism was also affected by the lack of disk space: rather than the usual error handling kicking in, the error handler itself failed. Hence the mysterious failure and lock-up. The app runs against SQL Server, so it generally doesn't write files locally at all, and there's an option to log to SQL Server as well. Somehow, though, the switch got thrown to turn logging on and to log locally.
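
One cheap guard against this kind of failure is to check free disk space before writing a local log entry, and to fall back to a channel that doesn't need the disk when space runs low, so the logger can't take the error handler down with it. The sketch below is in Python rather than FoxPro, purely for illustration; the `safe_log` and `has_free_space` names and the 100 MB threshold are hypothetical assumptions, not part of Web Connection.

```python
import os
import shutil
import sys
import tempfile

# Hypothetical threshold: skip local logging when less than this many bytes are free.
MIN_FREE_BYTES = 100 * 1024 * 1024  # 100 MB


def has_free_space(path, min_free=MIN_FREE_BYTES):
    """Return True if the volume holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free


def safe_log(message, log_path, min_free=MIN_FREE_BYTES):
    """Append `message` to `log_path` only if the disk has room.

    Returns True if the line was written. When the disk is nearly full, or the
    write itself fails, the message goes to stderr instead so the failure stays
    visible rather than breaking the caller (for example, an error handler).
    """
    log_dir = os.path.dirname(os.path.abspath(log_path))
    try:
        if not has_free_space(log_dir, min_free):
            print(f"LOG SKIPPED (low disk space): {message}", file=sys.stderr)
            return False
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(message + "\n")
        return True
    except OSError as exc:
        # The disk can still fill up between the check and the write.
        print(f"LOG FAILED ({exc}): {message}", file=sys.stderr)
        return False


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "app.log")
    safe_log("request processed", path)
```

The pre-flight check is cheap, but the `except OSError` branch still matters because space can run out between the check and the write. In a service running without a desktop, stderr may not be visible either; there you would point the fallback at whatever channel is still reachable, such as logging to SQL Server.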

Anyway, we both fell into the trap of looking at the hard stuff before looking at the obvious. It's quite common. I spend a fair amount of time debugging customer support issues, and I’d say easily 90% of the time it’s a very simple problem: one overlooked setting or obvious detail. That's often tough to find from my end because I’m not sitting in front of the broken computer, but even so I’m as guilty as the next person of forgetting about the obvious.

You know the rule: always look at the obvious things first! More often than not, the culprit is something simple, an obvious omission.

A couple of weeks ago I found out the hard way that this holds true in general. I went on a car-shuttle mountain bike ride near Mount Hood with a couple of friends. We drove into the middle of nowhere, parked one car, and drove the other (mine) further up into the wilderness. Then we started this four-hour monster ride, sloshing through occasionally knee-deep snow (walking, mind you) and trying to find trails that were still partially covered in snow. Although the ride is a huge downhill, there are still a lot of uphill sections, and by the end all of us (most of all me, on my first ride of the year after coming back from Maui) were cold and utterly exhausted at the bottom, 3,500 feet below where we started.

OK, so now it’s back up the mountain to get my car. It’s starting to get dark, and up on the mountain it’s freezing. We get there and... the car won't start. Cold, irritable, and pissed off that the Volvo has invoked Murphy's Law by dying in the worst possible spot in the world, we fumble with this and that, checking for gas, spark, etc., but only half-heartedly because it’s cold and miserable and about to get dark. So we quickly abandon the effort and just leave the car.

The next day I get the car towed (for $200, no less) to my mechanic, who ends up keeping it for four days trying to figure out exactly what’s wrong. It turns out several things had failed, but the key problem was a bad coil wire that, wiggled this way or that, would probably have worked just fine. Oddly enough, the mechanic didn’t find this one right away either: I drove off after the car was ‘fixed’ only to have it die on left turns. Try explaining that to a mechanic with a straight face. "Yeah, I turn left and it just quits." He looked at me strangely and didn't change his attitude until we took a test drive and the car died. Now that’s funny. Turn left and, click, the car dies. We get back to the garage, my mechanic looks at this and that, eventually wiggles the coil wire the wrong way, and the car dies. Problem found.

Get it? The obvious is always staring us right in the face, and a little intuition can get us there, if only we weren’t so distracted by all the complicated things we know and want to apply to our problems. Ego gets in the way of the obvious.

Anyway, it was an odd realization standing in the mechanic's shop, with all that high-tech gear piled high, only to see him find the real problem by wiggling a few wires and hammering on relays to see if the car would die. Kinda sounds like a programmer with all these tools who still misses the obvious boundary condition on a while loop, eh?

Simple debugging rules the day, both in code and in life.