embedded software boot camp

RTOS without blocking?

April 19th, 2010 by Miro Samek

In my previous post, “I hate RTOSes”, I identified blocking as the main cause of the particular brittleness and inflexibility of programs based on RTOSes. Here I’d like to discuss techniques for minimizing blocking and eradicating it completely from the application-level code. In other words, I’d like to show you how to use an RTOS to build responsive event-driven software.

For reasons I’ve outlined before, experienced RTOS users have learned to be wary of peppering the code with blocking calls to the RTOS. So, even though every RTOS boasts a plethora of communication and synchronization mechanisms (all of them based on blocking), advanced real-time developers intentionally limit their designs to just one generic blocking call per task, as shown in the following pseudocode:

void task_routine(void *arg) {
    while (1) {
        // block on any event designated for this task (generic)
        // process the event *without* further blocking (task specific)
    }
}

Most RTOSes provide mechanisms to wait for multiple events in a single blocking call, for example: event flags, message mailboxes, message queues, the select() call, condition variables, and many others. From all these possibilities, I’d like to single out the message queue, because it is the most generic and flexible mechanism. A message posted to a message queue not only unblocks any task that waits on the queue (synchronization), but the message can also carry any information associated with the event (interprocess communication). For example, a message from an analog-to-digital converter (ADC) can signal that the conversion has completed and also carry the actual conversion result.
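As a concrete illustration of such a message, here is a minimal sketch in C of an event that both signals what happened and carries the ADC result. The type and field names are my own, not any particular RTOS API:

```c
#include <stdint.h>

/* Illustrative event message: the signal says *what* happened,
 * the payload carries the data associated with the event. */
typedef enum {
    EVT_ADC_DONE,   /* ADC conversion completed */
    EVT_TIMEOUT     /* a timeout expired */
} EventSignal;

typedef struct {
    EventSignal sig;    /* what happened */
    uint16_t    value;  /* e.g., the 12-bit ADC conversion result */
} Event;
```

Posting an `Event` to the queue thus accomplishes synchronization and communication in one step.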

The generic pseudocode of a task based on a message queue looks as follows:

void task_routine(void *arg) {
    while (1) { // main event loop of the task
        void *event = msg_queue_get(); // wait for event
        // process the event *without* further blocking (task specific)
    }
}

The most important premise of this event-loop design is that the task-specific code that processes the events obtained from the queue is not allowed to block. The event-processing code must execute quickly and return to the event loop, so that the loop can check for other events.

This design also automatically guarantees that each event is processed in run-to-completion (RTC) fashion. By design, the event loop must necessarily complete processing of the current event before looping back to obtain and process the next event. Also note that the need for queuing events is an immediate consequence of the RTC processing style. Queuing prevents losing events that arrive while the event-loop is executing an RTC step.

The event-loop pseudocode shown above is still task-specific, but it is quite easy to make it completely generic. As shown below, you can combine a message queue and an event-handler pointer-to-function in the TCB structure. A pointer to the TCB struct can then be passed to the task in the argument of the task routine (arg). This is quite easily achieved when the task is created.

typedef struct {
    MessageQueue queue;        // event queue associated with the task
    void (*handler)(void *event); // event handler pointer-to-function
} TCB;   // task control block

void task_routine(void *arg) {
    TCB *tcb = (TCB *)arg;  // the task's own control block
    while (1) { // main event loop of the task
        void *event = msg_queue_get(&tcb->queue); // wait for event
        (*tcb->handler)(event); // handle the event without blocking
    }
}

The last snippet of code is generic, meaning that this simple event loop can be used for all tasks in your application. So at this point, you can consider the task_routine() function as part of the generic event-driven infrastructure for executing your applications, which consist of event-handler functions.
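To see the whole chain in action, the following is a runnable, single-threaded sketch of this generic infrastructure. A trivial ring buffer stands in for the RTOS message queue, and dispatch_one() performs one pass of the event loop (a real task would call it from `while (1)`, blocking inside msg_queue_get()). All names are illustrative:

```c
#include <stddef.h>

#define QUEUE_LEN 8

/* Trivial ring buffer standing in for an RTOS message queue. */
typedef struct {
    void *slots[QUEUE_LEN];
    int head, tail, count;
} MessageQueue;

typedef struct {
    MessageQueue queue;           /* event queue associated with the task */
    void (*handler)(void *event); /* event handler pointer-to-function   */
} TCB; /* task control block */

static void msg_queue_put(MessageQueue *q, void *event) {
    q->slots[q->tail] = event;          /* store the event pointer */
    q->tail = (q->tail + 1) % QUEUE_LEN;
    ++q->count;
}

static void *msg_queue_get(MessageQueue *q) {
    void *e = q->slots[q->head];        /* retrieve the oldest event */
    q->head = (q->head + 1) % QUEUE_LEN;
    --q->count;
    return e;
}

/* One pass of the generic event loop: get an event, dispatch it.
 * A real task_routine() would repeat this forever, blocking in
 * msg_queue_get() when the queue is empty. */
static void dispatch_one(TCB *tcb) {
    void *event = msg_queue_get(&tcb->queue);
    (*tcb->handler)(event); /* handle the event without blocking */
}

/* Example application-level handler. */
static int g_handled;
static void my_handler(void *event) { g_handled = *(int *)event; }
```

Note that dispatch_one() knows nothing about any particular task; all task-specific behavior lives in the handler.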

This way of thinking gives you something quite significant, because in fact you have just created your first event-driven framework.

The distinction between a framework and a toolkit is simple. A toolkit, such as an RTOS, is essentially a collection of functions that you can call. When you use a toolkit, you write the main body of the application (such as all the task routines) and you call the various functions from the RTOS. When you use a framework, you reuse the main body (such as the task_routine() function) and you provide the code that the framework calls. In other words, a framework uses inverted control compared to a traditional RTOS.

Inversion of control is a very common phenomenon in all event-driven architectures, because it recognizes the plain fact that the events are controlling the application, not the other way around.

In my next post in the “I hate RTOSes” series, I’ll talk about challenges of programming without blocking. I’ll explain what you need to sacrifice when you write non-blocking code and why this often leads to “spaghetti” code. Stay tuned!

RTOS considered harmful

April 12th, 2010 by Miro Samek

I have to confess that I’ve been experiencing a severe writer’s block lately. It’s not that I’m short of subjects to talk about, but I’m getting tired of circling around the issues that matter most to me and should matter most to any embedded software developer. I mean the basic software structure.

Unfortunately, I find it impossible to talk about truly important issues without stepping on somebody’s toes, which means picking a fight. So, in this installment I decided to come out of the closet and say it openly: I consider RTOSes harmful, because they are a ticking bomb.

The main reason I say so is because a conventional RTOS implies a certain programming paradigm, which leads to particularly brittle designs. I’m talking about blocking. Blocking occurs any time you wait explicitly in-line for something to happen. All RTOSes provide an assortment of blocking mechanisms, such as various semaphores, event flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will consume all the CPU cycles. Typically, however, tasks block in many places scattered throughout various functions called from the task routine (the endless loop). For example, a task can block and wait for a semaphore that indicates the end of an ADC conversion. In another part of the code, the same task might wait for a timeout event flag, and so on.
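To make the anti-pattern concrete, here is a compilable caricature of such a task. The RTOS calls are stubbed out so the sketch compiles; in a real system sem_take() and flag_wait() would suspend the task, and all names here are hypothetical:

```c
/* Stubs standing in for blocking RTOS calls. In a real system these
 * would suspend the calling task until the event occurs. */
static int sem_take(void)  { return 122; } /* "ADC done"; returns raw sample */
static int flag_wait(void) { return 1;   } /* "timeout flag set" */

static int read_sensor(void) {
    int raw = sem_take();  /* BLOCKS here: while waiting for the ADC, the
                            * task is deaf to every other event */
    return raw / 2;        /* the code past the blocking call is wired to
                            * handle only the one event it waited for */
}

/* One pass of the task's endless loop: it blocks in two different
 * places, buried in different functions. */
static int task_body_once(void) {
    int value = read_sensor();
    (void)flag_wait();     /* BLOCKS again, somewhere else entirely */
    return value;
}
```

Extending such a task to react to a new event means untangling every one of these scattered wait points.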

Blocking is insidious, because it appears to work initially, but quickly degenerates into an unmanageable mess. The problem is that while a task is blocked, the task is not doing any other work and is not responsive to other events. Such a task cannot be easily extended to handle other events, not just because the system is unresponsive, but also because the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.

You might think that the difficulty of adding new features (events and behaviors) to such designs matters only later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is developed over time by gradually adding new events and behaviors. The inflexibility prevents an application from growing that way, so the design degenerates in the process known as architectural decay. This in turn often makes it impossible to even finish the original application, let alone maintain it.

The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is the unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks, are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data as another feature in a different task (we call such features cohesive). But placing the new feature in a different task requires very careful sharing of the common data. So mutexes and other such mechanisms must be applied. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, hairy, unintended side effects.

For decades embedded engineers were taught to believe that the only two alternatives for structuring embedded software are a “superloop” (main+ISRs) or an RTOS. But this is of course not true. Other alternatives exist; specifically, event-driven programming with modern state machines is a much better way. It is not a silver bullet, of course, but after having used this method extensively for over a decade I will never go back to a raw RTOS. I plan to write more about this better way, why it is better and where it is still weak. Stay tuned.

Free store is not free lunch

January 29th, 2010 by

In my previous post “A Heap of Problems” I have compiled a list of problems the free store (heap) can cause in real-time embedded (RTE) systems. This was quite a litany, although I didn’t even touch the more subtle problems yet (for example, the C++ exception handling mechanism can cause memory leaks when a thrown exception bypasses memory de-allocation).

But even though the free store is definitely not a free lunch, getting by without the heap is certainly easier said than done. In C, you will have to rethink implementations that use lists, trees, and other dynamic data structures. You’ll also have to severely limit your choice of third-party libraries and legacy code you want to reuse (especially if you borrow code designed for the desktop). In C++, the implications are even more serious because the object-oriented nature of C++ applications results in much more intensive dynamic-memory use than in applications using procedural techniques. For example, most standard C++ libraries (e.g., STL, Boost, etc.) require the heap. Without it, C++ simply does not feel like the same language.

Here are a few common sense guidelines for dealing with the heap:

1. For smaller systems, such as microcontrollers with only on-chip RAM, you probably don’t want to open the heap can of worms at all. The problems and waste that go with the heap simply aren’t worth the trouble.

For systems with sufficient RAM, such as processors with megabytes of external DRAM, trading some of this cheap RAM for convenience in programming might be a reasonable deal. In the following discussion I assume that the system is big enough to run under a preemptive RTOS.

2. The simplest option is to limit the use of the heap to just one task. In this case, the heap is not shared concurrently and does not need any mutual-exclusion protection mechanism. To limit the non-determinism of the heap, I would recommend assigning a low priority to the task that uses the heap. The priority should be lower than that of any real-time task.

3. At the expense of introducing mutual-exclusion protection for *all* heap operations (e.g., a mutex), you can allow more than one task to use the heap. However, I would still strongly recommend against using the heap in any tasks with real-time deadlines. All tasks that use the heap should run at a lower priority than any of the real-time tasks.
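A minimal sketch of this guideline, using POSIX mutexes for illustration (a real RTOS would supply its own mutex API, and the wrapper names are my own):

```c
#include <pthread.h>
#include <stdlib.h>

/* Serialize every heap operation behind a single mutex, so that only
 * one task at a time can touch the allocator's internal structures.
 * Note: tasks contending for this mutex block for the full duration
 * of the (possibly nondeterministic) heap operation. */
static pthread_mutex_t heap_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *sys_malloc(size_t size) {
    pthread_mutex_lock(&heap_mutex);
    void *p = malloc(size);
    pthread_mutex_unlock(&heap_mutex);
    return p;
}

static void sys_free(void *p) {
    pthread_mutex_lock(&heap_mutex);
    free(p);
    pthread_mutex_unlock(&heap_mutex);
}
```

Many RTOS-supplied allocators already lock internally, in which case these wrappers would be redundant; the point is only that *every* path to the heap must go through the same lock.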

4. In any case, the heap should never be used inside interrupt service routines (ISRs).

In summary, using the heap in real-time embedded (RTE) systems always requires extra thought and discipline. You should always make sure that the heap is correctly integrated with your runtime environment.

A Heap of Problems

January 24th, 2010 by

Some design problems never seem to go away. You’d think that anybody who has been in the embedded software development business for a while must have learned to be wary of malloc() and free() (or their C++ counterparts new and delete). Then you find that many developers actually don’t know why embedded real-time systems are so particularly intolerant of heap problems.

For example, recently an Embedded.com reader attacked my comment on the article “Back to the Basics – Practical Embedded Coding Tips: Part 1 Reentrancy, atomic variables and recursion“, in which I advised against using the heap. Here is this reader’s argument:

I have no idea why did you bring up the pledge not to use the heap, on modern 32-bit MCUs (ARMs etc) there is no reason – and no justification – to avoid using the heap. The only reason not to use the heap is to avoid memory fragmentation, but good heap implementation and careful memory allocation planning will overcome that.

As I cannot disagree more with the statements above, I decided that it’s perhaps time to re-post my “heap of problems” list, which goes as follows:

  • Dynamically allocating and freeing memory can fragment the heap over time to the point that the program crashes because of an inability to allocate more RAM. The total remaining heap storage might be more than adequate, but no single piece satisfies a specific malloc() request.
  • Heap-based memory management is wasteful. All heap management algorithms must maintain some form of header information for each block allocated. At the very least, this information includes the size of the block. For example, if the header causes a four-byte overhead, then a four-byte allocation requires at least eight bytes, so only 50 percent of the allocated memory is usable to the application. Because of these overheads and the aforementioned fragmentation, determining the minimum size of the heap is difficult. Even if you were to know the worst-case mix of objects simultaneously allocated on the heap (which you typically don’t), the required heap storage is much more than a simple sum of the object sizes. As a result, the only practical way to make the heap more reliable is to massively oversize it.
  • Both malloc() and free() can be (and often are) nondeterministic, meaning that they potentially can take a long (hard to quantify) time to execute, which conflicts squarely with real-time constraints. Although many RTOSs have heap management algorithms with bounded, or even deterministic performance, they don’t necessarily handle multiple small allocations efficiently.
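The header-overhead arithmetic from the list above can be captured in a one-line helper. The 4-byte header is the article’s illustrative figure, not any particular allocator’s real overhead:

```c
#include <stddef.h>

#define HEADER_BYTES 4u /* assumed per-block bookkeeping overhead */

/* Bytes actually consumed from the heap by an allocation request:
 * the requested size plus the allocator's per-block header. For a
 * 4-byte request this is 8 bytes, so only 50% is usable payload. */
static size_t heap_cost(size_t request) {
    return request + HEADER_BYTES;
}
```

Real allocators also round requests up to an alignment boundary, which makes small allocations even more wasteful than this simple model suggests.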

Unfortunately, the list of heap problems doesn’t stop there. A new class of problems appears when you use the heap in a multithreaded environment. The heap becomes a shared resource and consequently causes all the headaches associated with resource sharing, so the list goes on:

  • Both malloc() and free() can be (and often are) non-reentrant; that is, they cannot be safely called simultaneously from multiple threads of execution.
  • The reentrancy problem can be remedied by protecting malloc(), free(), realloc(), and so on internally with a mutex, which lets only one thread at a time access the shared heap. However, this scheme could cause excessive blocking of threads (especially if memory management is nondeterministic) and can significantly reduce parallelism. Mutexes can also be subject to priority inversion. Naturally, the heap management functions protected by a mutex are not available to interrupt service routines (ISRs) because ISRs cannot block.

Finally, all the problems listed previously come on top of the usual pitfalls associated with dynamic memory allocation. For completeness, I’ll mention them here as well.

  • If you destroy all pointers to an object and fail to free it or you simply leave objects lying about well past their useful lifetimes, you create a memory leak. If you leak enough memory, your storage allocation eventually fails.
  • Conversely, if you free a heap object but the rest of the program still believes that pointers to the object remain valid, you have created dangling pointers. If you dereference such a dangling pointer to access the recycled object (which by that time might be already allocated to somebody else), your application can crash.
  • Most of the heap-related problems are notoriously difficult to test. For example, a brief bout of testing often fails to uncover a storage leak that kills a program after a few hours, or weeks, of operation. Similarly, exceeding a real-time deadline because of nondeterminism can show up only when the heap reaches a certain fragmentation pattern. These types of problems are extremely difficult to reproduce.

A nail for a fuse

November 27th, 2009 by Michael Barr

If I were to search my soul, I’d have to admit that the use of assertions has helped me more than any other single technique, even more than my favorite state machines. But the use of assertions, simple as they are, is surrounded by so many misconceptions and misunderstandings that it’s difficult to know where to start. The discussion around Jack Ganssle’s recent article “The Use of Assertions” shows many of the misunderstandings.

I suppose that the main difficulties in understanding assertions lie in the fact that while the implementation of assertions is trivial, the effective use of assertions requires a paradigm shift in the view of software construction and the nature of software errors in particular.

Perhaps the most important point to understand about assertions is that they neither handle nor prevent errors, in the same way as fuses in electrical circuits don’t prevent accidents or abuse. In fact, a fuse is an intentionally introduced weak spot in the circuit that is designed to fail sooner than anything else, so actually the whole circuit with a fuse is less robust than without it.

I believe that the analogy between assertions and fuses (which, by the way, was originally proposed by Niall Murphy in a private conversation at one of the Embedded Systems Conferences) is accurate and valuable, because it helps in making the paradigm shift in understanding many aspects of using assertions. Here I’d like to elaborate on just two aspects.

First, the analogy to fuses correctly suggests that assertions work best in the “weakest” spots. Such “weak spots” are often found at the interface between components (e.g., preconditions in a function), but there are many others. The best assertions are those that protect most of the system. In other words, the best assertions catch errors that would have the most impact on the rest of the system.

The second important implication of the fuse analogy is the issue of disabling assertions in the production code. As the comments to the aforementioned article suggest, most engineers tend to disable assertions before shipping the code, especially in safety-critical products. I believe that this is exactly backwards.

I understand that the standard “assert.h” header file is designed to use assertions only in a debug build, so the macro assert() compiles to nothing when the symbol NDEBUG is defined. I strongly suggest rethinking this philosophy, because disabling assertions in the release configuration is like using nails, paper clips, or coins for fuses. Just imagine finding a nail in place of a fuse in a hospital’s operating room or in the dashboard of an airliner. What would you think of that sort of “repair”?

Yet, by disabling assertions in our code we do exactly this.
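One way to keep assertions armed in the release build is to define your own macro that never compiles away and routes failures to a user-defined handler. This is only a sketch in the spirit of the post, with names of my own invention; in a shipped system on_assert() would typically log the location and reset the CPU or enter a fail-safe state (here it merely counts failures so the sketch can be exercised):

```c
/* Count of assertion failures; a real handler would log the file/line
 * and then reset the CPU or drive the system to a fail-safe state. */
static int assert_failures;

static void on_assert(char const *file, int line) {
    (void)file;          /* a real handler would record these */
    (void)line;
    ++assert_failures;   /* stand-in for the production failure policy */
}

/* Unlike assert(), this macro does NOT vanish when NDEBUG is defined:
 * the fuse stays in the circuit in the release build. */
#define ALWAYS_ASSERT(cond) \
    ((cond) ? (void)0 : on_assert(__FILE__, __LINE__))
```

The key design decision is not the macro itself but the policy inside the handler, which is exactly the fault-tree-analysis question discussed below.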

I believe it is very important to understand that assertions have a very important role to play, especially in the field and especially in mission-critical systems, because they add an additional safety layer to the software. Perhaps the biggest fallacy of our profession is the naïve optimism that our software will not fail. In a nutshell, we somehow believe that when we stop checking for errors, they will stop occurring. After all, we don’t see them anymore. But this is not how computer systems work. An error, no matter how small, can cause catastrophic failure. With software, there are no “small” errors. Our software is either in complete control of the machine or it isn’t. Assertions help us know when we lose control.

So what do I suggest we do when an assertion fires in the field? The proper course of action requires a lot of thinking and sometimes a lot of work. In safety-critical systems, software failure should be part of the fault-tree analysis. Sometimes, reaching a fail-safe state requires some redundancy in the hardware. In any case, the assertion failures should be extensively tested.

But this is really the best we can do.