Cut And Paste Engineering

Thursday, September 9th, 2010 Mike Ficco

Several years ago I was involved in a project that expected to have a large production volume.  The development group was working with a few prototypes but the manufacturing team was not yet fully engaged.  Part of my work required a unique device serial number for security and other purposes.  Unfortunately, our prototypes had no serial numbers since they were not produced by the normal manufacturing process.  I needed a serial number so I came up with a relatively simple solution.  On power up I would read the area of non-volatile memory that was intended to hold the serial number and other information.  If the information passed a validity check I would write the serial number into another special area of non-volatile memory.  If the validity check failed I would instead write a fictitious serial number[1].  All other code made use of this “special memory” that I created and managed.  My immediate development problem was solved and all the code would automatically start using real serial numbers as soon as the equipment was being made on a production line.  Problem solved!
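The power-up logic described above might look roughly like the following C sketch. It is an illustration, not the original code: the record layout, the validity check (a magic value plus an XOR checksum), the constants, and every name here are assumptions, and the two non-volatile areas are simulated with ordinary variables.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical layout of the factory-programmed record. The real
 * structure and its validity check are not described in the article. */
typedef struct {
    uint32_t magic;     /* assumed marker written by manufacturing */
    uint32_t serial;    /* device serial number */
    uint32_t checksum;  /* assumed check value: magic XOR serial */
} factory_record_t;

#define FACTORY_MAGIC     0x53455249u  /* arbitrary value chosen for this sketch */
#define FICTITIOUS_SERIAL 0xFFFF0001u  /* stand-in for a serial built from a
                                          non-existent shift and line [1] */

/* Simulated non-volatile areas; on real hardware these would be
 * flash or EEPROM locations accessed through a driver. */
static factory_record_t nv_factory_area;   /* written on the production line */
static uint32_t         nv_special_serial; /* the privately managed copy */

/* Validity check: the marker must match and the checksum must agree. */
static bool factory_record_is_valid(const factory_record_t *rec)
{
    return rec->magic == FACTORY_MAGIC &&
           rec->checksum == (rec->magic ^ rec->serial);
}

/* Run once at power up: copy the real serial number if the factory
 * record is valid, otherwise fall back to the fictitious one. */
void serial_init_on_power_up(void)
{
    if (factory_record_is_valid(&nv_factory_area)) {
        nv_special_serial = nv_factory_area.serial;
    } else {
        nv_special_serial = FICTITIOUS_SERIAL;
    }
}

/* All other code reads the serial number only through this accessor. */
uint32_t serial_get(void)
{
    return nv_special_serial;
}
```

With this arrangement, a prototype whose factory area is blank reports the fictitious number, and a production unit reports its real one without any code change. It also shows how the fallback branch can quietly outlive its purpose once every device has a valid record.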

The product shipped and actually did get produced in high volume.

Fast-forward about eight years.

A coworker came into my office and asked if I could take a look at some old code.  We walked to his office where he brought up a page of code I had not seen for many years.  It was my power up serial number initialization function.  Well, more accurately, it was a descendant of my code.  After many years and several million devices, the code was still present.  The operating system had changed at least three times and several people, perhaps more than a dozen, had had their fingers in the surrounding code.  The details had changed and the content of the structure that was validity checked had changed.  Even the method of doing the validity check was different.  Yet there was my fake serial number and my privately managed memory.

The developer said he didn’t understand what the code accomplished or why it was needed.  He had asked the person who previously worked on the code, and that person didn’t know either.  Some of the original project folks had left the company.  I had gone off to new products and problems.  Others had seen the code but did not know the rationale behind it.  Eventually my name came up as he continued to ask questions, so he thought he would come talk to me.

I quickly explained what the code was for and that it was no longer needed.  I also took the opportunity to congratulate him for being conscientious in wanting to get the code right and bold enough to ask about the code when so many who came before him had not.

This is a true story and you may or may not have enjoyed it.  The problem is that such stories of tracking down the meaning of mysterious code are far too rare.  More often code proliferates and becomes progressively more convoluted because programmers are afraid to touch or delete what they don’t understand.

One very popular coding technique is to copy an existing piece of code that solves a problem similar to the one on which you are working.  Over time a large code base becomes fabricated from bits and pieces of old code.  This is outstanding in that it is something like code reuse.  It is beyond horrible in that such reuse is occasionally perverted into a bloated and unreliable mess.  It seems to be a basic instinct for most programmers to allow poorly understood code to remain.  I have seen developers and managers too fearful, and I truly mean fearful, to remove bizarre code because it might be doing something worthwhile.

Last year I worked on a one-chip-wonder microcontroller.  I inherited over 90K of buggy code that needed additional features.  Four months later I had 5K of reliable code that had all the needed features.  Not all of my projects result in an 18-fold code reduction (about 94% in this case), but this basic scenario has played itself out over and over.  A great deal of my work finding and fixing bugs on legacy products has involved removing large amounts of code.

Let me leak out a well-kept secret:  If you want your code to be reliable, you have to understand what it does.

You are not a very good developer – at least not one confident in your ability – if you are afraid to touch some mysterious code for fear of breaking it.  Poke it!  Tweak it!  Test it!  Figure out what it does and determine if it does that correctly.  You or your manager may worry that this is wasted time, but my personal experience has been that mysterious and bloated code is often the cause of problems and takes forever to debug.

Well-understood code is not only shipped faster, but you can also ship it with pride.

[1] Our serial numbers were to be based on the production shift, production line, and the date and time of production.  I worked with the manufacturing group to create a serial number based on a non-existent production shift and line to guarantee my fictitious serial number would never be mistaken for a genuine one.