Monday, March 08, 2010

Nearing the end - Part 2

As I said in my previous post, we are nearing the end of a large project that has gone on for over a year. But last week, after finding (at the last minute) a problem in my code and finding out that it was really a problem with some code we bought, I regretfully agreed to just disable the part of my code that showed the problem. There wasn't enough time to fix the problem. It was just a convenience feature and wasn't critical to the operation of our program. So, as unsatisfying as it was to leave it that way, I seemed to be off the hook for any further problems before we shipped - until someone had a problem reading the settings for my part of the program that he'd saved before. My first thought was that it was just a temporary problem and that just retrying the operation would fix it. But just before leaving for the day, he replied that no, it failed every time. Oh, no! How could that be happening this late in the project? The saving and reading of settings had been working for months. Why would it start failing all of a sudden - and only on one machine? I went home that night dreading the thought of the next day.

I couldn't duplicate the problem on my computer. So, I had to copy all the data from the computer showing the problem. I still couldn't see the problem on my computer (where I could run the program inside the development system and analyze it). So I tried different types of settings to save and read. I still couldn't see the problem. The pressure was on and I would have to fix this - even if it took a lot of work and a big change to the code. This was turning into a nightmare. I was looking for anything that would cause any kind of problem that I could imagine. One thing I noticed was that I was asking the screen to update more often than it really needed to. So, in the hope that this was taking time away from a more important part of the program, I reduced the number of times it updated - and it failed for me like it did for the guy who reported the problem! Finally I had something to work with!

The nice thing about modern programming environments is that they let you put Breakpoints in a program. It's almost like when you're driving from one place to another and decide to stop for gas somewhere and you can check the tire pressure there, too. Then you decide to stop for lunch somewhere and you can look at the brake fluid while you're stopped. Likewise, in a computer program, when you are running the program from inside the development program, stopping at a Breakpoint lets you look at the state of the program at that point. And even better, but more time consuming, is Single-Stepping. This lets you execute your program one line at a time. You can examine the state of the program just before or after each line executes. Using these, I noticed that the program would get to a certain part of the code and then would veer off into the code that handles errors. So, I had to adjust my Breakpoints until I got closer and closer to the problem. Finally, I saw what it was - I had been expecting a certain string of characters to be either a number (a combination of the digits 0 - 9) or the special word "Updating". Well, in this case it was neither. It was just blank. Because I wasn't updating the screen as often, the string of characters hadn't been initialized correctly. So, that was it. The string didn't say "Updating" so I had assumed it was a number and tried to convert it as a number. Well, the code that did that didn't know how to handle a blank string and gave up. It was simple to fix, but I had to verify it on the machine where it first appeared, in case this wasn't the real problem after all.
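For anyone curious what that kind of bug looks like, here is a minimal sketch in Python. It is not the code from our program (which I can't show); the function name `parse_status` and the result labels are made up for illustration. The naive approach is to assume "anything that isn't Updating must be a number", and in Python `int("")` raises a `ValueError`, much like my conversion code gave up. The sketch checks for the blank case before trying to convert.

```python
def parse_status(text):
    """Classify a status string as 'updating', 'blank', or a number.

    Hypothetical helper for illustration only; the name and the
    returned labels are assumptions, not the original program's code.
    """
    cleaned = text.strip()
    if cleaned == "Updating":
        return ("updating", None)
    if cleaned == "":
        # The case the original logic missed: the string was still in
        # its default, uninitialized state.
        return ("blank", None)
    if cleaned.isascii() and cleaned.isdigit():
        # Only the digits 0 - 9 are accepted as a number.
        return ("number", int(cleaned))
    # Anything else is reported clearly instead of failing deep inside
    # the conversion code.
    raise ValueError(f"unexpected status text: {text!r}")
```

Calling `parse_status("")` gives back the "blank" label instead of an error, so the caller can decide what to do with a value that isn't ready yet.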

I rushed to make a new version of the program that could be installed on the problem machine, ran it, and it worked! The computer that had shown the problem was running with so many connections and so much data that the initialization code I wrote was running slowly and wasn't setting the strings of characters to either "Updating" or a number that could be converted. The slow machine hadn't had time to change the strings from their default "nothing" state before being asked for the data.
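The timing side of this can be sketched too. The following Python example is only an illustration under assumptions of my own: it uses a background thread and a `threading.Event` to stand in for slow initialization, and the class name `StatusField` and the delay are invented. It shows a reader seeing the default empty string when it asks too early, and the initialized value when it waits. Waiting like this is an alternative approach, not the fix I actually made, which was to handle the blank string.

```python
import threading
import time


class StatusField:
    """Hypothetical field that starts blank and is filled in later."""

    def __init__(self):
        self.text = ""                  # default "nothing" state
        self.ready = threading.Event()  # set once initialization is done

    def initialize(self, delay_seconds):
        time.sleep(delay_seconds)       # stands in for a heavily loaded machine
        self.text = "Updating"
        self.ready.set()

    def read(self, timeout=None):
        # With a timeout, wait for initialization before reading.
        if timeout is not None:
            self.ready.wait(timeout)
        return self.text


def demo():
    field = StatusField()
    worker = threading.Thread(target=field.initialize, args=(0.2,))
    worker.start()
    early = field.read()           # asked before initialization finished
    late = field.read(timeout=5)   # waits until the field is ready
    worker.join()
    return early, late
```

On a fast machine the delay is tiny and the early read rarely happens, which fits with why I couldn't reproduce the failure on my own computer at first.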

So after a night of tossing and turning in bed imagining what the problem could be, I'd fixed it in a few hours. I was exhausted. I felt like I'd been in a fight. It took me the rest of the day to calm down. But it felt great to have the problem behind me. Here's hoping no more problems crop up before we can release this thing.
