embedded software boot camp

A nail for a fuse

Friday, November 27th, 2009 by

If I were to search my soul, I’d have to admit that the use of assertions has helped me more than any other single technique, even more than my favorite state machines. But the use of assertions, simple as they are, is surrounded by so many misconceptions and misunderstandings that it’s difficult to know where to start. The discussion around Jack Ganssle’s recent article “The Use of Assertions” shows many of these misunderstandings.

I suppose that the main difficulty in understanding assertions lies in the fact that, while the implementation of assertions is trivial, their effective use requires a paradigm shift in the view of software construction and of the nature of software errors in particular.

Perhaps the most important point to understand about assertions is that they neither handle nor prevent errors, in the same way as fuses in electrical circuits don’t prevent accidents or abuse. In fact, a fuse is an intentionally introduced weak spot in the circuit that is designed to fail sooner than anything else, so actually the whole circuit with a fuse is less robust than without it.

I believe that the analogy between assertions and fuses (which, by the way, was originally proposed by Niall Murphy in a private conversation at one of the Embedded Systems Conferences) is accurate and valuable, because it helps in making the paradigm shift in understanding many aspects of using assertions. Here I’d like to elaborate on just two of them.

First, the analogy to fuses correctly suggests that assertions work best in the “weakest” spots. Such “weak spots” are often found at the interfaces between components (e.g., preconditions in a function), but there are many others. The best assertions are those that protect the largest part of the system. In other words, the best assertions catch errors that would have the most impact on the rest of the system.
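As a minimal sketch of an assertion placed at such an interface, consider a small byte queue. The `ByteQueue` type, the `queue_put()` function, and the capacity are hypothetical names invented for this illustration, and the standard `assert()` is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8u   /* hypothetical size, chosen for illustration */

typedef struct {
    uint8_t buf[QUEUE_CAPACITY];
    size_t  head;    /* index of the next free slot */
    size_t  count;   /* number of bytes currently stored */
} ByteQueue;

/* Store one byte; the caller is responsible for not overfilling the queue. */
void queue_put(ByteQueue *q, uint8_t byte) {
    /* Preconditions: the "fuses" at the interface between caller and queue. */
    assert(q != NULL);
    assert(q->count < QUEUE_CAPACITY);

    q->buf[q->head] = byte;
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    ++q->count;
}
```

A caller that overfills the queue trips the assertion inside `queue_put()`, right at the interface, instead of silently overwriting data that some other part of the system still depends on.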

The second important implication of the fuse analogy concerns disabling assertions in production code. As the comments to the aforementioned article suggest, most engineers tend to disable assertions before shipping the code, especially in safety-critical products. I believe that this is exactly backwards.

I understand that the standard “assert.h” header file is designed to use assertions only in a debug build, so the macro assert() compiles to nothing when the symbol NDEBUG is defined. I strongly suggest rethinking this philosophy, because disabling assertions in the release configuration is like using nails, paper clips, or coins for fuses. Just imagine finding a nail in place of a fuse in a hospital’s operating room or in the dashboard of an airliner. What would you think of this sort of “repair”?

Yet, by disabling assertions in our code we do exactly this.
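To make the contrast concrete, here is a minimal sketch of an assertion macro that stays active regardless of NDEBUG. The names `FUSE_ASSERT`, `fuse_blown()`, and `duty_to_pwm()` are assumptions made up for this example, and the handler assumes a hosted environment with stderr and abort(), which a bare-metal target typically would not have.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure handler (name and behavior are assumptions).
 * On a hosted system it reports the location and aborts; a bare-metal
 * target has no stderr and would have to react differently. */
static void fuse_blown(char const *file, int line) {
    fprintf(stderr, "assertion failed: %s:%d\n", file, line);
    abort();
}

/* Hypothetical always-on assertion: unlike assert(), it never looks at
 * NDEBUG, so the check is present in debug and release builds alike. */
#define FUSE_ASSERT(cond_) \
    ((cond_) ? (void)0 : fuse_blown(__FILE__, __LINE__))

/* Example: convert a duty cycle in percent to an 8-bit PWM compare value. */
unsigned duty_to_pwm(unsigned percent) {
    FUSE_ASSERT(percent <= 100u);   /* still checked when NDEBUG is defined */
    return (percent * 255u) / 100u;
}
```

With the standard `assert()` in its place, the range check in `duty_to_pwm()` would vanish from any build compiled with `-DNDEBUG`.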

I believe it is very important to understand that assertions have a crucial role to play, especially in the field and especially in mission-critical systems, because they add an additional safety layer to the software. Perhaps the biggest fallacy of our profession is the naïve optimism that our software will not fail. In a nutshell, we somehow believe that when we stop checking for errors, they will stop occurring. After all, we don’t see them anymore. But this is not how computer systems work. An error, no matter how small, can cause catastrophic failure. With software, there are no “small” errors. Our software is either in complete control over the machine or it isn’t. Assertions help us know when we lose control.

So what do I suggest we do when an assertion fires in the field? The proper course of action requires a lot of thinking and sometimes a lot of work. In safety-critical systems, software failure should be part of the fault-tree analysis. Sometimes, reaching a fail-safe state requires some redundancy in the hardware. In any case, the handling of assertion failures should be extensively tested.
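One possible shape of such a handler is sketched below, under heavy assumptions: the `Outputs` structure stands in for real hardware, `FaultRecord` stands in for memory that would survive a reset, and `longjmp()` stands in for a CPU reset so that the sketch can run on a host. All names are hypothetical, and the safe range for the heater is invented for the example.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hardware outputs of a heater controller. */
typedef struct {
    bool heater_on;
    bool alarm_on;
} Outputs;

/* Hypothetical fault record; a real target might keep it in RAM that
 * survives a reset so it can be reported after restart. */
typedef struct {
    char const *file;
    int         line;
    bool        valid;
} FaultRecord;

static Outputs     outputs    = { false, false };
static FaultRecord last_fault = { 0, 0, false };
static jmp_buf     reset_vector;   /* host-side stand-in for a CPU reset */

/* Drive every output to its safe value (assumed here: heater off, alarm on). */
static void enter_fail_safe(void) {
    outputs.heater_on = false;
    outputs.alarm_on  = true;
}

/* Assertion handler: record the location, go fail-safe, then "reset".
 * It never returns to the code that violated the assertion. */
void on_assert_failed(char const *file, int line) {
    last_fault.file  = file;
    last_fault.line  = line;
    last_fault.valid = true;
    enter_fail_safe();
    longjmp(reset_vector, 1);
}

#define REQUIRE(cond_) \
    ((cond_) ? (void)0 : on_assert_failed(__FILE__, __LINE__))

/* Example client: the setpoint must lie in the assumed safe range 0..90 C. */
void set_heater(int setpoint_c) {
    REQUIRE(setpoint_c >= 0 && setpoint_c <= 90);
    outputs.heater_on = (setpoint_c > 0);
}
```

Because the handler is ordinary code, the failure path can be exercised on purpose: call `setjmp(reset_vector)`, pass an out-of-range setpoint to `set_heater()`, and check that the heater ends up off, the alarm on, and the fault location recorded.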

But this is really the best we can do.

3 Responses to “A nail for a fuse”

  1. fileoffset says:

    One way to leave assertions in but not produce unexpected results in production is to define a macro or set of macros that you would use instead of the classic assert(). You can then modify the behavior of the assert macros so they function differently for production and debug code. For example:

        #ifdef DEBUG
        #define ASSERT(a) assert(a)
        #elif defined(RELEASE)
        #define ASSERT(a) lcd.displayError(a)
        #endif

  2. Miro Samek says:

    In embedded systems you typically need to define your own assertion macro, as the standard <assert.h> is not applicable (no stderr to write to and no OS to exit to either). As you define your own assertion macro, you typically end up calling a function to handle the assertion violation (see for example my DDJ article "An Exception or a Bug?"). This callback function is perhaps a better place to differentiate the behavior in the production code vs. the debug version:

        void onAssert__(char const *file, int line) {
        #ifdef NDEBUG /* release version? */
            . . .
        #else /* debug version */
            . . .
        #endif
        }

  3. Jim Moore says:

    I’m not really a software type, having been focused mainly on the systems engineering disciplines of modeling and simulation (M&S) and what is now called model-based SE (MBSE). However, when I read this comment “….I’d have to admit that the use of assertions has helped me more than any other single technique”, I think I have a vague understanding of the author’s perspective. I had a similar experience after reading Steve Maguire’s “Writing Solid Code” back in the mid-1990s. Maguire points out techniques for code construction where he doesn’t just use asserts to simply check values, but in some cases actually invokes complete parallel test solutions. These assert algorithms serve as baseline reference comparisons of known results to the “production code” implementation. This represented an entire paradigm shift in thinking for me about code testing using macro asserts in their various flavors.
    Just recently, I needed to implement some ECEF to LLA transformations, and I used two other sources (MATLAB and a second online applet) to build specific reference test cases to compare my code to (using conditionally compiled tests). Once I have those checks (checking double-precision math accuracy), I leave them in the code even if they are not activated. It makes it easy to go back and recheck if something goes amiss (x86 vs. ARM implementations). At this point you might not call the macros “assert” any more; perhaps “assert_ECEF_tests” is more descriptive. However, I must profess that, to a certain extent, I have somewhat outgrown traditional asserts, as I regularly leave full test harnesses in my code which I conditionally compile. I could still leave “assert_mytest()” in the code, but I have pushed the conditional compilation well beyond the notion of assert to conditionally compiled test harnesses that test the code within the context of the larger system operation (say, a machine-to-machine protocol test). Having said all of that, the flavor of this blog has more to do with the general topic of system/software safety design (vs. pre-release inline software tests), and yes, asserts can play an important role there as well.

    With respect to safety design and conditional compilation, over the years I have had to rethink some of my approaches to system design and my full embrace of the “walking wounded” concept, where no matter what happens to the embedded safety-critical software (e.g., a missile), the code needs to keep working as best it can. One of the traps is that the software can be so resilient that it masks not only external faults, but also internal bugs. This is an area where imagination with asserts can be very useful.
    One of the principles that I have used to reform my thinking about this walking wounded approach is that during development you want hard failures or loud protestations when there are faults/failures. The principal purpose is so that you can fix these things before fielding. I agree with the concept that asserts should be left in production code and turned ON; silent failures/faults are to be avoided. Remember that most software failures arise from unanticipated external inputs, and so having some form of “in-line” instrumentation, even for the user, can be very valuable in diagnosing and fixing issues quickly. For modeling and simulation, there is an M&S user to report these issues to.
    In the case of autonomous flight vehicles in operation, where there is no operator, we rely heavily on telemetry or on-board “black box” recording of the entire mission. Typical recordings are various system statuses. A higher level of sophistication might involve error stacks, or other forms of encoded fault reporting that are expanded into a human-intelligible format in the delogging process.
    The realization here is that there are many opportunities to instrument code that remains in place even in production and in operation. You probably turn off the “verbose mode”, but turning it off altogether is a poor idea.
