Archive for September, 2006

Agile Embedded Development

Sunday, September 24th, 2006

Silicon Valley finally seems to be taking a serious look at “agile development” as a competitive advantage. Articles like “Reinventing the Software Development Strategy” by John Seybold give us a glimmer of hope that maybe software development doesn’t always need to be a “death march” of missed schedules, but rather can actually be fun.

If you accept the arguments made in Seybold’s article (and many other articles and books about agile development and extreme programming), then you must look at testing in an entirely new light. Testing is not some pain-in-the-neck chore performed long after the design and coding by the most junior and inexperienced team members. Rather, continuous testing is the primary activity that drives everything else that’s going on in the project. In fact, if you truly put Testing, understood in this way (with a capital T), at the center, the whole agile process falls out more or less automatically from this single principle.

Testing, in the agile sense, has been notoriously difficult in the embedded space. The desktop guys have powerful, commodity hardware with plenty of standard development tools. We embedded folks, on the other hand, by definition work on some custom design interfaced to proprietary, often buggy (or not yet existing) hardware.

But this doesn’t mean that embedded developers cannot dramatically improve the Testability of their software. If you truly, seriously think about Testing, you need to bend everything in the project toward Testing, not the other way around.

Let’s start with the design. Everybody knows that modular software with independently testable pieces is good. The trick, of course, is to build it that way.

The conventional approaches, unfortunately, aren’t helping here. Take for example a traditional RTOS. The natural units of decomposition are tasks. But when you try to unit-test any real-world task, you quickly notice that it is hopelessly intertwined with other tasks by means of semaphores, shared resources, mutexes, condition variables, event flags, message mailboxes, message queues, and so on. Surely, traditional RTOSes provide no shortage of mechanisms to tie the application in a hopeless knot.

Experienced embedded gurus know to be wary of most of the RTOS mechanisms, and strictly build applications around the message-passing paradigm. Strict encapsulation is the name of the game. A task hides all its internal data and resources and communicates with the outside world only by sending and receiving events. Such systems use only a tiny fraction of the RTOS, namely message queues, and have really no need for all the other tricky RTOS mechanisms. Software components designed that way are not only easier to unit-test. They are also safer, more reusable, maintainable, and extensible.
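Just to make this concrete, here is a minimal sketch of such a strictly encapsulated task. I’m using FreeRTOS-style queue and task calls purely for illustration, and the Blinky task, the Event structure, and the signal names are made up for this example:

#include <stdint.h>
#include "FreeRTOS.h"                /* FreeRTOS used only for illustration */
#include "task.h"
#include "queue.h"

/* events are the ONLY interface to the task */
typedef struct {
    uint8_t  sig;                    /* signal identifying the event */
    uint16_t param;                  /* optional event parameter */
} Event;

enum { BUTTON_PRESSED_SIG = 1, TIMEOUT_SIG };

static QueueHandle_t l_blinkyQueue;  /* the only RTOS object this task needs */

/* all task data is hidden inside the task function (strict encapsulation) */
static void Blinky_task(void *pvParameters) {
    uint32_t blinkCount = 0U;        /* private state, never shared */
    Event e;
    (void)pvParameters;

    for (;;) {
        /* block until an event arrives: no semaphores, no shared memory */
        if (xQueueReceive(l_blinkyQueue, &e, portMAX_DELAY) == pdTRUE) {
            switch (e.sig) {
                case BUTTON_PRESSED_SIG:
                    ++blinkCount;
                    /* ... toggle the LED ... */
                    break;
                case TIMEOUT_SIG:
                    /* ... */
                    break;
            }
        }
    }
}

/* the only way the outside world can talk to Blinky
 * (from an ISR you would use xQueueSendFromISR() instead) */
void Blinky_post(Event const *e) {
    (void)xQueueSend(l_blinkyQueue, e, 0U);
}

void Blinky_start(void) {
    l_blinkyQueue = xQueueCreate(10, sizeof(Event));
    (void)xTaskCreate(&Blinky_task, "Blinky", configMINIMAL_STACK_SIZE,
                      (void *)0, tskIDLE_PRIORITY + 1, (TaskHandle_t *)0);
}

Because Blinky_post() is the only way into the task, a unit test can simply inject a sequence of events through it and observe the outputs, without ever touching the task’s internal data.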

But at this point I need to ask the nagging questions. Why isn’t structuring all systems that way somehow enforced in the RTOS itself? Why do RTOS vendors bend over backwards to keep adding even more ways to couple the tasks?

The second aspect of software development that can make or break any Testing strategy is the error- and exception-handling policy. I’m really amazed at how much complexity is added to the code by “defensive programming” techniques that somehow attempt to “handle” erroneous situations that never should have occurred in the first place, like overrunning an array index or dereferencing a NULL pointer. The problem is that defensive programming hinders Testing… and demoralizes the testers.

You see, defensively written code accepts a much wider range of inputs than it should, and by doing so it hides bugs. Your tests don’t appear to uncover evident errors. Yet such tests don’t build much confidence in the system, because the code might be wandering around, nights and weekends, silently sweeping errors under the rug.

A much better alternative is to confront errors head-on, by liberally using assertions (or, more scientifically, the Design by Contract philosophy). Testing a piece of code peppered with assertions is an entirely different experience than testing “defensive” code. Every successful Test run means that the program passed all its assertions. Every Test failure is much harder to dismiss as “not reproducible”, because you have a record in the form of the file name and line number where the assertion fired. This information gives you an excellent starting point for understanding and ultimately fixing the bug.
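Here is a small sketch of the difference. The ASSERT macro and the onAssert_failed() handler below are just placeholders for whatever assertion facility your project uses:

#include <stdint.h>

/* the application decides what happens when a contract is violated
 * (log the location, reset the system, break into the debugger, ...) */
void onAssert_failed(char const *file, int line);

#define ASSERT(test_) \
    ((test_) ? (void)0 : onAssert_failed(__FILE__, __LINE__))

#define BUF_SIZE 16U
static uint8_t l_buf[BUF_SIZE];

/* "defensive" version: a bad index is quietly ignored and the bug lives on */
void put_defensive(uint8_t idx, uint8_t value) {
    if (idx < BUF_SIZE) {
        l_buf[idx] = value;
    }
}

/* Design-by-Contract version: a bad index is a bug in the caller,
 * so it is caught immediately, with a file name and line number */
void put_dbc(uint8_t idx, uint8_t value) {
    ASSERT(idx < BUF_SIZE);          /* precondition */
    l_buf[idx] = value;
}

With the defensive version, a test that passes a bad index “passes” too, proving nothing. With the assertion-based version the very same test fails loudly, and the assertion location tells you exactly which contract was violated.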

And finally, Testing almost always requires instrumenting the code to give the tester additional visibility into the inner workings of the software. Unfortunately, in many embedded systems even the primitive printf() facility is unavailable (there is no screen to print to). Fortunately, you can do much better than printf() (e.g., see the Quantum Spy software trace facility).
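To give you an idea, a software trace can be as simple as a small ring buffer of compact binary records that a background task (or the idle loop) later drains to the host over a spare serial line, where a desktop tool formats them. The following sketch only illustrates the idea; it is not the actual Quantum Spy API:

#include <stdint.h>

/* each trace record is just an ID plus a timestamp, which is far cheaper
 * than formatting strings with printf() on the target */
typedef struct {
    uint8_t  id;                     /* what happened (event posted, ...) */
    uint32_t time;                   /* when it happened (timer ticks) */
} TraceRecord;

#define TRACE_SIZE 64U
static TraceRecord l_trace[TRACE_SIZE];
static uint8_t volatile l_head;

uint32_t volatile g_tickCtr;         /* incremented by a timer ISR (not shown) */

/* called from the instrumented code; overwrites the oldest record when full
 * (a real implementation would briefly disable interrupts around this) */
void TRACE(uint8_t id) {
    uint8_t h = l_head;
    l_trace[h].id   = id;
    l_trace[h].time = g_tickCtr;
    l_head = (uint8_t)((h + 1U) % TRACE_SIZE);
}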

As you can see, Testing in the agile sense requires serious upfront investments and rethinking many of the time-honored embedded practices. You can no longer build a system without accounting for Testing right from the start.

What do you think about agile embedded software development? What do you do to improve Testability of your systems?

What Do Embedded Programs Have to Do with Hollywood?

Tuesday, September 19th, 2006

I still remember the “Triumph of the Nerds” PBS special, in which Steve Jobs recalled his early days at Apple and how the young Apple team picked the brains of scientists at the Xerox Palo Alto Research Center (PARC). Steve explained how PARC researchers showed them three revolutionary things: (1) the graphical user interface (GUI), (2) computer networking, and (3) object-oriented programming. Of these three things, Steve confessed to having understood only the first one at the time. This alone, however, proved enough to launch the Mac, and the rest is history.

I believe that the embedded industry still hasn’t learned from PARC even as much as Apple did some three decades ago. The question standing in my mind is: why aren’t most embedded programs structured the same way as virtually all GUI programs?

If you’re baffled as to why I am comparing embedded systems to GUIs, consider that just about every embedded system, just like every GUI, is predominantly event-driven by nature. In both cases, the primary function of the system is reacting to events. In the case of embedded systems, the events might be different (e.g., time ticks or arrivals of data packets rather than mouse clicks and button presses), but the essential job is still the same: reacting to events that arrive in an order and at times that are difficult to foresee.

Even the earliest GUIs, such as the original Mac or the early-days Windows, were structured according to the “Hollywood principle”, which means “Don’t call us, we’ll call you”. The “Hollywood principle” recognizes that the program is not really in control: the events are. So instead of pretending that the program is running the system, the system runs your program by calling your code to process events.

This reversal of control seems natural, I hope, and has served all GUI systems well. However, the concept hasn’t really caught on in the embedded space. The time-honored approaches are still either the “superloop” (main+ISR) or an RTOS, neither of which really embodies the “Hollywood principle”.

It really takes more than “just” an API, such as a traditional RTOS. What you typically need is a framework that provides the main body of the application and calls the code that you provide. Such event-driven real-time frameworks are not new. Today, virtually every design automation tool for embedded systems incorporates a variant of such an event-driven framework. The frameworks buried inside these tools prove that the concept works very well in a very wide range of embedded systems.
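The following sketch shows the inversion of control that I have in mind. The framework names here are hypothetical; the point is only that main() and the event loop belong to the framework, while the application supplies nothing but event handlers:

#include <stdint.h>
#include <stddef.h>

/* ---- a tiny event-driven framework (hypothetical) ---- */
typedef struct { uint8_t sig; } Event;
typedef void (*EventHandler)(Event const *e);    /* code YOU provide */

enum { TICK_SIG, BUTTON_SIG, MAX_SIG };

static EventHandler l_handlers[MAX_SIG];

void Framework_register(uint8_t sig, EventHandler h) {
    l_handlers[sig] = h;
}

/* in a real port this would block on an interrupt-fed event queue; it is
 * stubbed with a TICK event here only to keep the sketch self-contained */
static Event const l_tickEvt = { TICK_SIG };
static Event const *Framework_waitForEvent(void) {
    return &l_tickEvt;
}

void App_init(void);                 /* supplied by the application */

/* the framework, not the application, owns main() and the event loop */
int main(void) {
    App_init();
    for (;;) {
        Event const *e = Framework_waitForEvent();
        if (l_handlers[e->sig] != NULL) {
            (*l_handlers[e->sig])(e);    /* "don't call us, we'll call you" */
        }
    }
}

/* ---- the application: just event handlers, no loop, no blocking ---- */
static void onTick(Event const *e) {
    (void)e;
    /* react and return quickly (run to completion), never block here */
}

void App_init(void) {
    Framework_register(TICK_SIG, &onTick);
}

Compare this with the “superloop” or the RTOS approach, where your code sits in a loop (or blocks inside a task) and decides when to look for events; here the framework decides, and your handlers simply run to completion.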

My point is that a Real-Time Framework (RTF) should, and I believe eventually will, replace the traditional RTOS. What do you think?