embedded software boot camp

A Heap of Problems

January 24th, 2010 by admin

Some design problems never seem to go away. You think that anybody who has been in the embedded software development business for a while must have learned to be wary of malloc() and free() (or their C++ counterparts new and delete). Then you find that many developers actually don’t know why embedded real-time systems are so particularly intolerant of heap problems.

For example, an Embedded.com reader recently attacked my comment on the article “Back to the Basics – Practical Embedded Coding Tips: Part 1 Reentrancy, atomic variables and recursion”, in which I advised against using the heap. Here is this reader’s argument:

I have no idea why did you bring up the pledge not to use the heap, on modern 32-bit MCUs (ARMs etc) there is no reason – and no justification – to avoid using the heap. The only reason not to use the heap is to avoid memory fragmentation, but good heap implementation and careful memory allocation planning will overcome that.

As I could not disagree more with the statements above, I decided that it’s perhaps time to re-post my “heap of problems” list, which goes as follows:

  • Dynamically allocating and freeing memory can fragment the heap over time to the point that the program crashes because of an inability to allocate more RAM. The total remaining heap storage might be more than adequate, but no single piece satisfies a specific malloc() request.
  • Heap-based memory management is wasteful. All heap management algorithms must maintain some form of header information for each block allocated. At the very least, this information includes the size of the block. For example, if the header causes a four-byte overhead, then a four-byte allocation requires at least eight bytes, so only 50 percent of the allocated memory is usable to the application. Because of these overheads and the aforementioned fragmentation, determining the minimum size of the heap is difficult. Even if you were to know the worst-case mix of objects simultaneously allocated on the heap (which you typically don’t), the required heap storage is much more than a simple sum of the object sizes. As a result, the only practical way to make the heap more reliable is to massively oversize it.
  • Both malloc() and free() can be (and often are) nondeterministic, meaning that they potentially can take a long (hard to quantify) time to execute, which conflicts squarely with real-time constraints. Although many RTOSs have heap management algorithms with bounded, or even deterministic performance, they don’t necessarily handle multiple small allocations efficiently.
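
The fragmentation problem from the first bullet can be illustrated with a toy model. The sketch below is not a real allocator: the 8-unit heap, the first-fit search, and all names (toy_alloc, toy_free, toy_total_free, fragmentation_demo) are assumptions made up for this illustration. It fills the heap with one-unit blocks, frees every other block, and then asks for two contiguous units.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_UNITS 8u              /* hypothetical toy heap of 8 equal units */

static bool used[HEAP_UNITS];      /* true = unit is currently allocated */

/* First-fit search for n contiguous free units.
 * Returns the index of the first unit, or -1 if no run is long enough. */
int toy_alloc(size_t n) {
    size_t run = 0;
    if (n == 0 || n > HEAP_UNITS) {
        return -1;
    }
    for (size_t i = 0; i < HEAP_UNITS; ++i) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            size_t start = i + 1 - n;
            for (size_t j = start; j <= i; ++j) {
                used[j] = true;
            }
            return (int)start;
        }
    }
    return -1;
}

/* Release n units starting at 'start'. A real free() finds the size in
 * the block header; the toy version takes it as a parameter instead. */
void toy_free(int start, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        used[(size_t)start + j] = false;
    }
}

/* Total number of free units, contiguous or not. */
size_t toy_total_free(void) {
    size_t count = 0;
    for (size_t i = 0; i < HEAP_UNITS; ++i) {
        if (!used[i]) {
            ++count;
        }
    }
    return count;
}

/* Fill the heap with 1-unit blocks, free every other one, then request
 * 2 contiguous units. Returns the result of that last request. */
int fragmentation_demo(void) {
    int blocks[HEAP_UNITS];
    for (size_t i = 0; i < HEAP_UNITS; ++i) {
        blocks[i] = toy_alloc(1);
    }
    for (size_t i = 0; i < HEAP_UNITS; i += 2) {
        toy_free(blocks[i], 1);
    }
    return toy_alloc(2);
}
```

After fragmentation_demo() runs, half of the toy heap is free, yet the two-unit request fails because no two free units are adjacent.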

Unfortunately, the list of heap problems doesn’t stop there. A new class of problems appears when you use the heap in a multithreaded environment. The heap becomes a shared resource and consequently causes all the headaches associated with resource sharing, so the list goes on:

  • Both malloc() and free() can be (and often are) non-reentrant; that is, they cannot be safely called simultaneously from multiple threads of execution.
  • The reentrancy problem can be remedied by protecting malloc(), free(), realloc(), and so on internally with a mutex, which lets only one thread at a time access the shared heap. However, this scheme could cause excessive blocking of threads (especially if memory management is nondeterministic) and can significantly reduce parallelism. Mutexes can also be subject to priority inversion. Naturally, the heap management functions protected by a mutex are not available to interrupt service routines (ISRs) because ISRs cannot block.
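
The mutex remedy described above might look roughly like the following sketch. It assumes a POSIX threads environment purely for illustration; an RTOS would supply its own mutex API, and the wrapper names locked_malloc and locked_free are hypothetical. Many C libraries already do something similar inside malloc() itself.

```c
#include <pthread.h>
#include <stdlib.h>

/* One lock guards the whole heap, so only one thread at a time
 * can be inside the allocator. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: serialize access to a non-reentrant malloc(). */
void *locked_malloc(size_t size) {
    void *p;
    pthread_mutex_lock(&heap_lock);    /* may block the calling thread */
    p = malloc(size);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

/* Hypothetical wrapper: serialize access to a non-reentrant free(). */
void locked_free(void *p) {
    pthread_mutex_lock(&heap_lock);    /* may block the calling thread */
    free(p);
    pthread_mutex_unlock(&heap_lock);
}
```

Every caller now waits for whichever thread currently holds heap_lock, for as long as that thread's malloc() or free() call takes. Because both wrappers can block, neither may be called from an ISR.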

Finally, all the problems listed previously come on top of the usual pitfalls associated with dynamic memory allocation. For completeness, I’ll mention them here as well.

  • If you destroy all pointers to an object and fail to free it or you simply leave objects lying about well past their useful lifetimes, you create a memory leak. If you leak enough memory, your storage allocation eventually fails.
  • Conversely, if you free a heap object but the rest of the program still believes that pointers to the object remain valid, you have created dangling pointers. If you dereference such a dangling pointer to access the recycled object (which by that time might be already allocated to somebody else), your application can crash.
  • Most of the heap-related problems are notoriously difficult to test. For example, a brief bout of testing often fails to uncover a storage leak that kills a program after a few hours, or weeks, of operation. Similarly, exceeding a real-time deadline because of nondeterminism can show up only when the heap reaches a certain fragmentation pattern. These types of problems are extremely difficult to reproduce.

A nail for a fuse

November 27th, 2009 by admin

If I were to search my soul, I’d have to admit that the use of assertions has helped me more than any other single technique, even more than my favorite state machines. But the use of assertions, simple as they are, is surrounded by so many misconceptions and misunderstandings that it’s difficult to know where to start. The discussion around Jack Ganssle’s recent article “The Use of Assertions” shows many of the misunderstandings.

I suppose that the main difficulties in understanding assertions lie in the fact that while the implementation of assertions is trivial, the effective use of assertions requires a paradigm shift in the view of software construction and the nature of software errors in particular.

Perhaps the most important point to understand about assertions is that they neither handle nor prevent errors, in the same way as fuses in electrical circuits don’t prevent accidents or abuse. In fact, a fuse is an intentionally introduced weak spot in the circuit that is designed to fail sooner than anything else, so actually the whole circuit with a fuse is less robust than without it.

I believe that the analogy between assertions and fuses (which, by the way, was originally proposed by Niall Murphy in a private conversation at one of the Embedded Systems Conferences) is accurate and valuable, because it helps in making the paradigm shift in understanding many aspects of using assertions. Here I’d like to elaborate on just two aspects.

First, the analogy to fuses correctly suggests that assertions work best in the “weakest” spots. Such “weak spots” are often found at the interfaces between components (e.g., preconditions in a function), but there are many others. The best assertions are those that protect the largest part of the system. In other words, the best assertions catch errors that would have the most impact on the rest of the system.

The second important implication of the fuse analogy concerns disabling assertions in production code. As the comments on the aforementioned article suggest, most engineers tend to disable assertions before shipping the code, especially in safety-critical products. I believe that this is exactly backwards.

I understand that the standard “assert.h” header file is designed to use assertions only in a debug build, so the macro assert() compiles to nothing when the symbol NDEBUG is defined. I strongly suggest rethinking this philosophy, because disabling assertions in the release configuration is like using nails, paper clips, or coins for fuses. Just imagine finding a nail in place of a fuse in a hospital’s operating room or in the dashboard of an airliner. What would you think of this sort of “repair”?

Yet, by disabling assertions in our code we do exactly this.
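
As a minimal sketch of the alternative, an assertion macro can simply ignore NDEBUG and route failures to a handler chosen by the application. Everything named below (APP_ASSERT, assert_handler_t, set_assert_handler, counting_handler, set_duty_percent) is hypothetical and invented for this illustration; it is not taken from any particular library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handler type. A product would install a handler that
 * drives the system into its fail-safe state and never returns. */
typedef void (*assert_handler_t)(const char *file, int line);

/* Default handler: report the location and stop. */
static void halt_handler(const char *file, int line) {
    fprintf(stderr, "assertion failed: %s:%d\n", file, line);
    abort();
}

static assert_handler_t app_assert_handler = halt_handler;
static int app_assert_failures;

void set_assert_handler(assert_handler_t handler) {
    app_assert_handler = handler;
}

/* Host-side handler that only counts failures. It returns so that the
 * sketch can be exercised in a test; a real handler must not return. */
void counting_handler(const char *file, int line) {
    (void)file;
    (void)line;
    ++app_assert_failures;
}

/* Unlike assert() from <assert.h>, this macro does not look at NDEBUG,
 * so the check stays in the release build. */
#define APP_ASSERT(cond) \
    ((cond) ? (void)0 : app_assert_handler(__FILE__, __LINE__))

static int duty_percent;

/* Example precondition at a component interface. */
void set_duty_percent(int percent) {
    APP_ASSERT(percent >= 0 && percent <= 100);
    duty_percent = percent;
}

int get_duty_percent(void) {
    return duty_percent;
}
```

With counting_handler installed, calling set_duty_percent(150) increments app_assert_failures whether or not NDEBUG is defined. With the default handler, the same call reports the file and line and then aborts.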

I believe it is very important to understand that assertions have a vital role to play, especially in the field and especially in mission-critical systems, because they add an additional safety layer to the software. Perhaps the biggest fallacy of our profession is the naïve optimism that our software will not fail. In a nutshell, we somehow believe that when we stop checking for errors, they will stop occurring. After all–we don’t see them anymore. But this is not how computer systems work. An error, no matter how small, can cause catastrophic failure. With software, there are no “small” errors. Our software is either in complete control over the machine or it isn’t. Assertions help us know when we lose control.

So what do I suggest we do when an assertion fires in the field? The proper course of action requires a lot of thinking and sometimes a lot of work. In safety-critical systems, software failure should be part of the fault-tree analysis. Sometimes, reaching a fail-safe state requires some redundancy in the hardware. In any case, the handling of assertion failures should be extensively tested.

But this is really the best we can do.

Cute Creator

April 28th, 2009 by admin

For a long time I’ve been looking for a good cross-platform development environment that would allow fast exploration and navigation of C/C++ source code, not just editing of individual files. For a while I thought that Eclipse would fit the bill, but as I wrote previously, the CDT (C/C++ Development Tooling) was really disappointing for me.

In this post I’d like to tell you about my recent big hope for a truly productive IDE, which is the Qt Creator from qtsoftware.com. Qt Creator is based on the popular cross-platform Qt framework and runs natively on Windows, Linux, BSD, Mac OS X, and some embedded platforms. No Java (as in the case of Eclipse) means speed and snappy interface. Qt Software (previously Trolltech, acquired in 2008 by Nokia) offers free downloads of Qt Creator for all major platforms.

Qt Creator is primarily targeted as the IDE for Qt-related development. However, the recently released version 1.1 (April 23, 2009) supports external projects, so adding your embedded or any other projects unrelated to Qt is easy.

For example, I’ve created an embedded project for a “game” shown in the screenshot below:

[Screenshot: Qt Creator]

The editing surface maximizes the screen real-estate for file viewing and supports sophisticated splitting, so that my favorite side-by-side code editing is easy.

As shown in the left pane, you can add to your project as many files in different directories as you like. Given this information, Qt Creator builds an internal database of all symbols in your code to let you explore and navigate your source code quickly. For example, you can jump from a symbol’s usage to its definition by pressing F2 (press Alt-back-arrow to jump back to the previous context).

Everything in the editor is designed to enhance quick navigation. For example, every editor pane has a drop-down list of functions and other elements in the file. The editor also supports selective viewing with collapsible/expandable code sections, so you can fit more information on the screen. To quickly view the collapsed section you can simply hover your mouse cursor over it.

I immensely like the support for project-wide searching (as well as search-and-replace), which is available at the bottom of the screen. This feature alone is worth installing the tool.

Even though it is so new, Qt Creator is already a very interesting, free, cross-platform IDE with features comparable to Visual Studio 2008 and other best-in-class tools. Qt Software seems very committed to enhancing Qt Creator, and I hope that Qt Creator will soon catch up with Eclipse as third-party plug-ins are developed. One feature that I will be looking forward to is side-by-side code differencing. But already, it is a powerful tool that you should try.

Insects of the computer world

March 9th, 2009 by admin

Jack Ganssle’s recent “Breakpoints” blog on Embedded.com makes an excellent point that the same forces (Moore’s law) that drive down the prices of high-end processors open even more market opportunities at the low end of the price spectrum. I also agree that the deciding factor for the price of a single-chip microcontroller (MCU) is the efficiency of its memory use, in other words, the code density. This becomes obvious when one looks at the silicon die of any MCU, which is completely dominated by the ROM and RAM blocks, the CPU being almost insignificant somewhere in the corner.

But I would disagree with Jack’s statement that “tiny (8-bit) processors make more efficient use of memory”. From my experience with several single-chip MCUs I draw a different conclusion: the CPU size (8, 16, or 32 bits) hardly matters for the code density. The deciding factor is how old a design is: the newer instruction set architectures (ISAs) generally far outperform the older ISAs.

To support the point, I present below a table that shows the code size of a tiny state machine framework written in C (called QP-nano), which has been compiled for a dozen or so very different single-chip MCUs. The code consists of a small hierarchical state machine processor (called QEP-nano) and a tiny framework (called QF-nano). QEP-nano consists mostly of conditional logic to execute hierarchical state machines. QF-nano contains an event queue, a timer module, and a simple event loop. I believe that this code is quite representative of typical projects that run on these small MCUs.

CPU type                  C compiler                     QEP-nano (bytes)         QF-nano (bytes)
-------------------------+------------------------------+------------------------+----------------------
PIC18                     MPLAB-C18 (student edition)    3,214                    2,072
8051 (SiLabs)             IAR EW8051                     952                      603
PSoC (M8C)                ImageCraft M8C                 2,765                    2,425
68HC08                    CodeWarrior HC(S)08            957                      660
AVR (ATmega)              IAR EWAVR                      541                      650
AVR (ATmega)              WinAVR (GNU)                   998                      810
MSP430                    IAR EW430                      552                      460
M16C                      HEW4/NC30                      984                      969
TMS320C28x (Piccolo)      C2000                          369 words (738 bytes)    331 words (662 bytes)
ARM7 (ARM/THUMB)          IAR EWARM                      588 (THUMB)              1,112 (ARM)
ARM Cortex-M3 (THUMB2)    IAR EWARM                      524                      504
-------------------------+------------------------------+------------------------+----------------------

Interestingly, the winner is the MSP430, a 16-bit architecture, with the smallest combined footprint (1,012 bytes). It seems that the 16-bit ISA somehow hits the “sweet spot” for code density, perhaps because the addresses are also 16 bits wide and are handled in a single instruction. In contrast, 8-bitters need multiple instructions to handle 16-bit addresses.

I would also point out the excellent code density (and C-friendliness) of the new ARM Cortex-M3, which is a modern 32-bit ISA and still far outperforms all 8-bitters, including the good ol’ 8051.

On the other hand, the venerable PIC architecture is by far the worst (or the least C-friendly). That’s interesting, because this is the 8-bit market leader. I honestly don’t understand how Microchip makes money when their chips require the most silicon for a given functionality. Clearly some forces other than technical merit must be at work here.

In conclusion, I understand that my data is highly subjective and different code sets (and different compilers) could perhaps produce different results. However, I believe that the general trend is true and this is an important lesson for engineers selecting MCUs.

RTOS Alternatives

January 7th, 2009 by admin

As hundreds of commercial and other RTOS offerings can attest, the greatest demand for third-party software in the embedded systems community is for the RTOS. But this is perhaps because most embedded developers believe that the traditional preemptive RTOS on one end of the complexity spectrum and the customary superloop (main+ISRs) on the other are the only choices for the embedded software architecture.

However, a lesser-known alternative is an *event-driven* software structure based on an event-driven framework and encapsulated state machines (called active objects in the UML). This active object-based architecture is not new, and in fact, has been in quite widespread use for at least two decades. Virtually all commercially successful design automation tools on the market today (Telelogic Rhapsody, Rose Real-Time, IAR visualSTATE, Mathworks StateFlow, and many others) are based on hierarchical state machines and internally incorporate a variant of an event-driven framework. For example, Rhapsody generates code either for the Object eXecution Framework (OXF) or the Interrupt-Driven Framework (IDF). OXF requires a traditional RTOS for preemptive scheduling, while IDF was created specifically to avoid the need for an RTOS.

Most developers are accustomed to basic sequential control, in which a program (a task in an RTOS) waits for events in various places in its execution path by either actively polling for events or passively blocking on a semaphore or other such RTOS mechanism. Though this approach is functional in many situations, it doesn’t work very well when the system must react in a timely manner to multiple events whose arrival times and order cannot be predicted. The fundamental problem is that while a sequential task is waiting on one kind of event, it is not doing any other work and is not *responsive* to other events.

Event-driven programming requires a distinctly different way of thinking than conventional sequential programming, such as “superloops” or tasks in a traditional RTOS. Event-driven systems are structured according to the Hollywood principle, which means “Don’t call us, we’ll call you”. So, an event-driven program is not in control while waiting for an event; in fact, it’s not even active. Only when an event arrives is the program called to process it, after which it quickly relinquishes control again. This arrangement allows an event-driven system to wait for many events in parallel, so the system remains *responsive* to all events it needs to handle.

This scheme has three important consequences. First, it implies that an event-driven system is naturally divided into the application, which actually handles the events, and the supervisory event-driven infrastructure (framework), which waits for events and dispatches them to the application. Second, the control resides in the event-driven infrastructure, so from the application standpoint, the control is inverted compared to a traditional sequential program. And third, the event-driven application must return control after handling each event, so the execution context cannot be preserved in the stack-based variables and the program counter as it is in a sequential task. Instead, the event-driven application becomes a *state machine*, or actually a set of collaborating state machines that preserve the context from one event to the next in the static variables.
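
The three consequences can be sketched in a few lines of C. The example below is an illustrative toy, not the API of any real framework: the Lamp state machine, its events, and the functions lamp_init, lamp_dispatch, and framework_run are all hypothetical names. The “framework” owns the loop and calls the application. The application handles one event, returns, and keeps its context in a structure rather than on the stack.

```c
#include <stddef.h>

/* Hypothetical events and states for a toy "lamp" state machine. */
typedef enum { EVT_BUTTON, EVT_TIMEOUT } Event;
typedef enum { STATE_OFF, STATE_ON } State;

/* The context that must survive from one event to the next lives here,
 * not in stack variables or the program counter. */
typedef struct {
    State    state;
    unsigned on_count;     /* how many times the lamp was switched on */
} Lamp;

void lamp_init(Lamp *me) {
    me->state = STATE_OFF;
    me->on_count = 0;
}

/* Application side: handle exactly one event and return right away.
 * The function never blocks or waits for anything. */
void lamp_dispatch(Lamp *me, Event e) {
    switch (me->state) {
    case STATE_OFF:
        if (e == EVT_BUTTON) {
            me->state = STATE_ON;
            ++me->on_count;
        }
        break;
    case STATE_ON:
        if (e == EVT_BUTTON || e == EVT_TIMEOUT) {
            me->state = STATE_OFF;
        }
        break;
    }
}

/* Framework side (a minimal stand-in): it owns the control flow and
 * calls the application once per event, which is the inversion of
 * control described above. A real framework would take events from a
 * queue filled by ISRs and other state machines. */
void framework_run(Lamp *lamp, const Event *events, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        lamp_dispatch(lamp, events[i]);
    }
}
```

Feeding the sequence EVT_BUTTON, EVT_TIMEOUT, EVT_TIMEOUT, EVT_BUTTON through framework_run() leaves the lamp in STATE_ON with on_count equal to 2. The second timeout arrives while the lamp is already off and is ignored.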

Traditionally, event-driven programming was done with a specific design-automation tool, such as Rose-RT or Rhapsody (now both acquired by IBM). But recently, lightweight, open source event-driven frameworks have become available. These lightweight frameworks allow direct coding of hierarchical state machines (UML statecharts) in C or C++ and then combining multiple concurrent state machines into systems, all without big tools (e.g., see www.state-machine.com).