Posts Tagged ‘safety’

First, Do No Harm

Monday, September 17th, 2001 Michael Barr

Many of the folks I hang around with own digital watches. I have one myself. In addition to keeping fairly accurate time, mine features a stopwatch, a countdown timer, and a set of five alarms. A friend has a different watch from the same company—one that features a digital compass. To take a compass reading, you stand still, hold the watch level, press the “Compass” button, and read a cardinal point (N, NNE, NE, ENE, E, etc.) and angle in degrees. The reading tells you where the top of the watch is currently pointing.

Multi-function devices like these aren’t unique. Lots of products have multiple features. Today’s buzzword for this phenomenon is “convergence.” Staying only with the watch theme for a minute, there are GPS watches, watches with digital cameras, calculator watches, and full-featured PDAs for your wrist. There are also watches that double as heart rate monitors, pagers, cell phones, and TVs. There’s even one watch that runs Linux—X-Windows and all. It seems the wrist is pretty valuable real estate.

No matter which of these devices you might choose to wear on your favored wrist, the device is still primarily your watch. No one in their right mind would wear a multi-function watch on one wrist and a backup timekeeping device on the other. It’s reasonable to expect that, no matter what goes wrong with the extra features of the watch, you’d at least be able to get the time from the thing. Or is it?

My friend learned the hard way that the failure of a watch's extra features can interfere with its implied ability to keep time. On a recent trip to Alaska, my friend managed to confuse the digital compass in his watch twice. This happened first on the airplane, several miles above the Earth. The second time it happened he was hiking north of Anchorage. Each time, he attempted to take a compass reading only to have the watch reboot itself. Apparently, the strength of the Earth's magnetic field isn't within a useful range in those locations.

The designers of this particular watch must have decided that bad sensor readings were more likely than useless magnetic field values. And they deemed rebooting a good way to "fix" a bad sensor. This might have made sense in a lab environment, where the only bad readings were manufactured. However, in the real world, there are places where compasses—even those of the analog sort—aren't useful. In the process of rebooting the watch, the current time was lost—a far more costly problem for the wearer than an out-of-range compass reading.

The lesson here is not exclusive to watch designers. All of us who design multi-function devices should treat the functions as separate. A problem with one function should never cause problems with another.

NOTE: this article was originally published on 7/13/01.

21st Century Blues

Thursday, September 13th, 2001 Michael Barr

Let me be the first to properly welcome you to the 21st century and the new millennium. Just one short year ago, it seemed as though life as we know it (or at least computing as we know it) might grind to a halt on the false millennial eve because of short-sighted engineering decisions made decades earlier.

Having earned my stripes in the embedded trenches, I was quick to tell anyone who asked that there was nothing to fear on New Year's Eve 1999. "Embedded developers simply don't build unneeded functionality, like calendars, into their systems," I must have explained to a hundred friends and family. It seems I was right. The power stayed on; the water ran; no elevators stuck; no airplanes fell from the sky; traffic lights continued to control access to intersections; and Dick Clark remained on the air (the latter, unfortunately).

But these days I'm less confident in the embedded systems we entrust our lives and livelihoods to. It seems that everywhere I go vendors are encouraging the inclusion of unneeded functionality, and far too many developers are taking them up on it. Consider embedded Linux. While not so unreasonable a choice in a few specific classes of systems—like set-top boxes or embedded PCs—Linux is clearly overkill in the vast majority.

How do you even begin to test the safety and reliability of a system with so much complexity and so many authors? Can systems made from a mish-mash of off-the-shelf software components and rushed to the production floor be trusted? Who will certify that these systems are worthy of deployment or purchase? And who will ensure that they are safe and reliable?

Looking back now, I wonder how anyone even found time in 1999 to fix date-related bugs and certify systems as "Y2K Compliant." The U.S. economy has been running at full speed for well-nigh a decade. The high-tech job market is hot and the amount of work for each engineer to do is astounding. In such a climate, anyone halfway to a technical degree can find a job writing software for real products. Combine that with the pressures to get products to market quickly and you've got a clear recipe for disaster.

Surely, despite such horrible past disasters as Therac-25, the worst software-induced losses of life and limb lie ahead of us. We must, as an industry and to a person, insist on a higher standard of engineering. We must test our systems and design them to ensure their consistent behavior. Safety and reliability must be our first goals, not our last.

I implore all of you to raise the issues of safety and reliability within your own companies. Avoid unneeded functionality at all cost. After all, years or decades from now human lives or livelihoods may still depend on the engineering decisions you make today.

NOTE: this article was originally published on 01/01/01.