embedded software boot camp

The Practice of Engineering

December 24th, 2001 by Michael Barr

As another academic semester draws to a close this month, I feel compelled to share some thoughts about how traditional institutions of higher learning are failing to train engineers and computer scientists to become embedded programmers.  

Writing software for embedded—particularly safety-critical or real-time—systems is not easy. Many folks with a degree in computer science or engineering will never be qualified. A deep understanding of computer hardware and specific I/O interfaces and devices is one requirement. Knowledge of and experience with programming techniques that hold up under soft or hard real-time deadlines is another. Willingness—even eagerness—to debug difficult interactions between software and hardware with limited visibility into a system is an absolute must.
The subject I teach, operating system internals, deals with some of these issues. I expect the students in my course to gain an understanding of real-time scheduling, multithreaded programming and synchronization, and device driver writing. These skills would serve them well as embedded programmers—but one course is not enough to make them such.
Most of my students arrive with only a limited understanding of computer architecture! It’s unbelievable, unless you’ve seen it firsthand, that the bulk of a group of graduate and senior undergraduate students studying computer engineering don’t understand basic concepts like stack frames, interrupts, DMA, and cache. It’s not that they haven’t heard those words before; rather, they lack any firsthand knowledge and, therefore, any real understanding of these extremely important concepts.
This is symptomatic of a larger problem with engineering education in general. Engineering students are not currently challenged in the right ways. They’re challenged to understand computer architecture without designing a computer. They’re challenged to understand operating systems without writing one. And they’re challenged to understand (and write, in my class) multithreaded programs with only one freshman course in basic C programming behind them.
Why aren’t engineering departments challenging their students to be engineers? Engineering is the practice of science. Students are taught the basic science well: math, physics, chemistry, etc. But they’re not taught the practice. By the time they get to their junior year, and throughout grad school, all students should be required to work on real projects—every semester, in every course. A little assembly and C programming here, a CPU design there, and some hands-on DSP work could go a long way toward preparing them for the world of work.
Without that firsthand experience—and I know that’s not just lacking at the University of Maryland, because my graduate students are mostly from other schools—these folks just plain aren’t useful as engineers. The accreditation requirements for engineering degree programs are mostly poor and outdated, but one recent innovation does seek to address this problem. In their senior year, all undergraduate students are now required to work on a “capstone project”: working in a small team, with the assistance of an advisor, on a project of their own choosing.
While the capstone project is certainly a good idea, it’s also too little, too late. Because students working on these projects haven’t had any practical experiences before, they rarely succeed in achieving much of anything. The capstone should be part of the end of the program, but it should also be the last in a long string of practical projects—in all manner of courses along the way.

Emergency

November 23rd, 2001 by Michael Barr

In the days immediately following September 11 a pair of articles, one in EE Times and the other in The Washington Post, about emergency cell phone location technology caught my attention. Both articles focused on renewed lobbying efforts on Capitol Hill aimed at forcing cellular providers to meet the FCC’s deadline for implementing Phase II of the Enhanced 911 standard. (At press time, most cellular carriers have requested waivers for the October 1 deadline.)

The irony underlying the timing of these new lobbying efforts is that the technology, as proposed, would have helped very few, if any, of the thousands of victims of the terrorist attacks. Ignoring that most of those in the twin towers probably didn’t live past the crushing collapses, consider these more technological issues.
Handset-based locators:
  • In order for a GPS receiver in a handset to determine the owner’s location to within the required 50 m (for 67% of 911 callers) or 150 m (for 95%), a clear view of several satellites is required. It can be difficult to get an adequate view of the sky in a downtown section of a city (a big part of what those percentage-of-callers requirements are all about) even on a normal day. Imagine trying to acquire a signal from even one GPS satellite while you’re buried in a pile of rubble that’s ten stories deep and mostly underground (or in a subway system, a traffic tunnel, or many other likely emergency sites, for that matter).
  • Even if your handset could somehow manage to acquire a sufficient number of satellites, it’s questionable whether the mandated accuracy range would have been adequate in this disaster. With literally millions of tons of debris to move, even 50 m accuracy is nowhere near precise enough to point rescuers in the proper direction to dig. The larger 150 m radius could put you anywhere within the base of one tower. And how deeply should they look? (Start digging with heavy equipment or hands?)
Network-based locators:

  • Perhaps, in this disaster, network-based triangulation would have been more useful to rescuers. At least it wouldn’t have required victims to have recently upgraded their phones or have a clear view of the sky. Yet the lower required accuracy for this technology (100 m for 67% of 911 callers, 300 m for 95%) would have made the data that much less useful to the rescuers. 

In either case, both technologies would require that the victim’s phone also be: still in her possession after the collapse; still working; and at least partly charged. In addition, the victim turning on her phone would have to be lucky enough to be greeted by something other than a lack of signal (several base stations in the immediate vicinity of the World Trade Center were destroyed) or a network-busy condition (cellular and land-line telephone traffic surged even on networks clear across the country).

Rather than pointing to the need to implement the current generation of E-911 technology more quickly, this tragedy only points to the complete inadequacy of the current requirements for certain kinds of disasters. The current E-911 technology may, in fact, be useful in some sorts of emergencies. But we can’t stop there. In addition to implementing the current technology, other technologies and approaches need to be considered as well. For example, handheld devices that pinpoint the location and distance of handset signals should be available en masse within hours of such disasters.  
Surely someone in our industry is in a position to help solve this problem before the next disaster strikes.

Embedded Java Update

October 9th, 2001 by Michael Barr

I first encountered Java in 1996. At the time, I was working for a company that built satellite telemetry equipment. Parts of that work were real-time/embedded—mostly VME boards running VxWorks—but the data post-processing was done on Sun workstations. I became interested in the language when, pulled away from my usual embedded programming responsibilities for several months, I was enlisted to help write some workstation software in Java. It was love at first compile.

I was immediately wowed by the Java language. I loved that its basic syntax was much like C’s, but that its creators had addressed many of that language’s inconsistencies and pitfalls. I was also impressed at how much faster it was to program in Java than in C++, a language I’d tried to love but never been able to. At first blush, Java also had some cool features for embedded programmers—things like fixed-width integers, a portable binary format, guaranteed protection from memory leaks, and built-in support for threads and synchronization (no more porting your embedded code from one RTOS API to yet another).
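That fixed-width integer guarantee is easiest to appreciate from the C side, where the width of a plain int varies by compiler and target (16 bits on many small micros, 32 bits on most workstations). Here’s a minimal C sketch of my own—the wrap_count helper is purely illustrative, not from any product discussed here—using C99’s <stdint.h> types to get the same always-the-same-width behavior Java promises:

```c
#include <stdint.h>
#include <assert.h>

/* A 16-bit counter that must wrap at 65535 on every target, the way a
 * Java counter declared with a guaranteed-width type would. With a
 * plain "unsigned int" the wrap point would differ between a 16-bit
 * micro and a 32-bit workstation; uint16_t pins it down. */
static uint16_t wrap_count(uint32_t raw)
{
    return (uint16_t)(raw & 0xFFFFu);   /* mask to exactly 16 bits */
}
```

C99’s <stdint.h> narrows the gap here, but in 1996 Java was alone in building the guarantee into the language itself.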

The only problem was that I loved embedded programming more—I still do. So I’ve been on a five-year odyssey of hope: wanting to use Java in my embedded programming, but never having the opportunity to do anything more than experiment. I’m still convinced the language is awesome and that embedded programmers using C++ would benefit greatly from switching over to it. However, the priority in our field is always on shipping a product, not on hoping some new language or tool will make the process easier.

To find out where things stand with embedded Java today, I organized and moderated a panel called “Is Java Ready for Embedded Use?”, which took place last night at the Embedded Systems Conference in Boston. The panel consisted of two real-time Java gurus and four embedded programmers who’d recently used Java technologies from Wind River, Newmonics, aJile, and other vendors. Here’s what I and the other attendees learned from the panelists.

The most important thing we learned was that people are now building real systems (and even shipping some of them!) with Java. Of course, the Real-Time Specification for Java (RTSJ) is still several months away from having a complete reference implementation (RI) and technology compatibility kit (TCK), so today’s embedded Java users are mostly from the non-real-time side of the house.

A common current use for Java is as a user interface engine. In other words, the real-time and device-specific stuff is still being done in C. A Java Virtual Machine is simply added to the system as a mechanism for executing complex, and perhaps field-upgradable, user interface threads. The big advantages are that: (1) any competent Java programmer can be enlisted to produce that part of the embedded software, and (2) feedback from marketing and target customers can be sought and incorporated long before the actual hardware is complete. Wind River says that about 10% of its RTOS customers are also licensing a JVM from them and using it for such purposes.

Another option is to base your system design on a Java processor and write all of the firmware in Java. The processors from aJile Systems, in particular, support many of the features recommended by the draft RTSJ spec. This means that real-time programming in Java is available today, in quantity. And aJile has already stated they’ll conform their technology to the final spec, as soon as the RI and TCK are available.

It should be extremely interesting to watch the embedded Java space over the next year or two and to see how an increasing number of users take advantage of its many great features.

Safety Patrol

September 20th, 2001 by Michael Barr

When I was in the sixth grade, I was a member of my school’s Safety Patrol. It was my responsibility to ensure that younger children got on and off the school bus safely. “Safeties” wore bright orange sashes and helped other kids cross streets adjacent to their bus stops. This is just one measure in a complex web of overlapping steps taken to protect the most vulnerable members of our communities.

As children and adults alike increasingly place their lives in the hands of computer hardware and software, we need to add layers of safety there as well. No software bug or hardware glitch (or combination) can ever be allowed to bring down an aircraft, whether there are hundreds of passengers on board or just a pilot. The failure of many other systems must be similarly prevented. But software and hardware do fail—perhaps inevitably. As engineers, we use system partitioning, redundancy, protection mechanisms, and other techniques to contain and work around failures when they do occur.
As software’s role in safety-critical systems continues to expand, I expect we’ll see a rapid increase in the number of civil lawsuits filed against companies that design and manufacture embedded systems. (Adding several new levels of meaning to the phrase project post mortem.) Indeed, there is anecdotal evidence that lawsuits of this sort may already be on the rise. With most of the action in hush-hush settlements outside the courtroom, though, the media hasn’t yet noticed the trend.
One organization that has definitely taken notice of the hazards posed by software in products is Underwriters Laboratories. An independent, not-for-profit product safety certification and ANSI-accredited standards organization, UL initiated a “Standard for Software in Programmable Components” in 1994. The resulting ANSI/UL-1998 standard addresses “the detailed safety-related characteristics of specific software in a product.”
In addition to focusing on top-down design and development processes, it may also be beneficial to use an operating system that’s been designed with safety-critical systems in mind. Above all else, an RTOS should not compromise the stability of the system. But an operating system can go further, doing many things to reduce the risks inherent in your application code. Keeping software tasks from overwriting each other’s data and stacks is merely the beginning.
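For illustration, here is one common way an RTOS without full memory protection can at least detect one task’s stack trampling on its neighbors: plant a known guard pattern at each stack’s limit and check it on every context switch. This is a minimal sketch of the general technique, not any particular RTOS’s API; the names (task_t, stack_guard_ok) are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define STACK_GUARD 0xDEADBEEFu   /* pattern unlikely to occur by accident */

typedef struct {
    uint32_t *stack_base;   /* lowest address of the task's stack region */
    /* ... other task control block fields ... */
} task_t;

/* Plant the guard word at the stack limit when the task is created. */
static void stack_guard_init(task_t *t, uint32_t *base)
{
    t->stack_base = base;
    base[0] = STACK_GUARD;
}

/* Called by the scheduler at each context switch; returns 0 if the
 * guard word was overwritten, i.e., the stack has likely overflowed
 * into the adjacent region. The kernel can then suspend the offending
 * task rather than let the corruption spread. */
static int stack_guard_ok(const task_t *t)
{
    return t->stack_base[0] == STACK_GUARD;
}
```

A guard word only detects overflow after the fact; hardware memory protection, where available, can prevent it outright.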
In your rush to select an RTOS for use in a mission critical system or life-critical medical device, do make sure you know what you’re getting, though. It turns out that one prominent new operating system marketed specifically for inclusion in products of these sorts has a potentially dangerous hole in its “innovative” protection mechanism. You don’t want to wind up on the wrong side of something like that in court.
Ultimately, the key to designing safety-critical systems is to include multiple layers of protection. The hardware, the operating system, and your application software must each do everything they can to prevent catastrophe—even if the fault itself lies outside that subsystem.

First, Do No Harm

September 17th, 2001 by Michael Barr

Many of the folks I hang around with own digital watches. I have one myself. In addition to keeping fairly accurate time, mine features a stopwatch, a countdown timer, and a set of five alarms. A friend has a different watch from the same company—one that features a digital compass. To take a compass reading, you stand still, hold the watch level, press the “Compass” button, and read a cardinal point (N, NNE, NE, ENE, E, etc.) and angle in degrees. The reading tells you where the top of the watch is currently pointing.

Multi-function devices like these aren’t unique. Lots of products have multiple features. Today’s buzzword for this phenomenon is “convergence.” Staying only with the watch theme for a minute, there are GPS watches, watches with digital cameras, calculator watches, and full-featured PDAs for your wrist. There are also watches that double as heart rate monitors, pagers, cell phones, and TVs. There’s even one watch that runs Linux—X-Windows and all. It seems the wrist is pretty valuable real estate.

No matter which of these devices you might choose to wear on your favored wrist, the device is still primarily your watch. No one in their right mind would wear a multi-function watch on one wrist and a backup timekeeping device on the other. It’s reasonable to expect that, no matter what goes wrong with the extra features of the watch, you’d at least be able to get the time from the thing. Or is it?

My friend learned the hard way that the failure of one of a watch’s extra features can interfere with its implied ability to keep the time. On a recent trip to Alaska, he managed to confuse the digital compass in his watch twice: first on the airplane, several miles above the Earth, and again while hiking north of Anchorage. Each time, he attempted to take a compass reading only to have the watch reboot itself. Apparently, the strength of the Earth’s magnetic field isn’t within a useful range in those locations.

The designers of this particular watch must have decided that bad sensor readings were more likely than useless magnetic field values, and they deemed rebooting a good way to “fix” a bad sensor. This might have made sense in a lab environment, where the only bad readings were artificially induced. In the real world, however, there are places where compasses—even those of the analog sort—aren’t useful. In the process of rebooting, the watch lost the current time—a far more costly problem for the wearer than an out-of-range compass reading.

The lesson here is not exclusive to watch designers. All of us who design multi-function devices should treat each function as separate. A problem with one function should never cause problems with another.
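The containment principle above can be sketched in a few lines of C. This is my own illustration, with hypothetical names and limits—not the watch’s actual firmware: the compass reports an out-of-range field through a status code and leaves the rest of the device (including the clock) alone, instead of resetting everything.

```c
#include <assert.h>

typedef enum { SENSOR_OK, SENSOR_OUT_OF_RANGE } sensor_status_t;

/* Hypothetical raw field-strength limits for a trustworthy reading. */
#define FIELD_MIN  10
#define FIELD_MAX 900

/* Contain the fault: an unusable magnetic field yields an error code
 * the UI can display ("compass unavailable"), never a device reset.
 * The heading conversion here is a stand-in for the real math. */
static sensor_status_t compass_read(int raw_field, int *heading_out)
{
    if (raw_field < FIELD_MIN || raw_field > FIELD_MAX) {
        return SENSOR_OUT_OF_RANGE;   /* report, don't reboot */
    }
    *heading_out = raw_field % 360;
    return SENSOR_OK;
}
```

The point is the structure, not the numbers: each feature fails within its own boundary, and the primary function keeps running.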

NOTE: this article was originally published on 7/13/01.