Archive for the ‘RTOS Multithreading’ Category

Beyond the RTOS: A Better Way to Design Real-Time Embedded Software

Wednesday, April 27th, 2016 Miro Samek

An RTOS (Real-Time Operating System) is the most universally accepted way of designing and implementing embedded software. It is the most sought after component of any system that outgrows the venerable “superloop”. But it is also the design strategy that implies a certain programming paradigm, which leads to particularly brittle designs that often work only by chance. I’m talking about sequential programming based on blocking.

Blocking occurs any time you wait explicitly in-line for something to happen. All RTOSes provide an assortment of blocking mechanisms, such as time-delays, semaphores, event-flags, mailboxes, message queues, and so on. Every RTOS task, structured as an endless loop, must use at least one such blocking mechanism, or else it will take all the CPU cycles. Typically, however, tasks block not in just one place in the endless loop, but in many places scattered throughout various functions called from the task routine. For example, in one part of the loop a task can block and wait for a semaphore that indicates the end of an ADC conversion. In other part of the loop, the same task might wait for an event flag indicating a button press, and so on.

This excessive blocking is insidious, because it appears to work initially, but almost always degenerates into a unmanageable mess. The problem is that while a task is blocked, the task is not doing any other work and is not responsive to other events. Such a task cannot be easily extended to handle new events, not just because the system is unresponsive, but mostly due to the fact that the whole structure of the code past the blocking call is designed to handle only the event that it was explicitly waiting for.

You might think that difficulty of adding new features (events and behaviors) to such designs is only important later, when the original software is maintained or reused for the next similar project. I disagree. Flexibility is vital from day one. Any application of nontrivial complexity is developed over time by gradually adding new events and behaviors. The inflexibility makes it exponentially harder to grow and elaborate an application, so the design quickly degenerates in the process known as architectural decay.


The mechanisms of architectural decay of RTOS-based applications are manifold, but perhaps the worst is the unnecessary proliferation of tasks. Designers, unable to add new events to unresponsive tasks are forced to create new tasks, regardless of coupling and cohesion. Often the new feature uses the same data and resources as an already existing feature (such features are called cohesive). But unresponsiveness forces you to add the new feature in a new task, which requires caution with sharing the common data. So mutexes and other such blocking mechanisms must be applied and the vicious cycle tightens. The designer ends up spending most of the time not on the feature at hand, but on managing subtle, intermittent, unintended side-effects.

For these reasons experienced software developers avoid blocking as much as possible. Instead, they use the Active Object design pattern. They structure their tasks in a particular way, as “message pumps”, with just one blocking call at the top of the task loop, which waits generically for all events that can flow to this particular task. Then, after this blocking call the code checks which event actually arrived, and based on the type of the event the appropriate event handler is called. The pivotal point is that these event handlers are not allowed to block, but must quickly return to the “message pump”. This is, of course, the event-driven paradigm applied on top of a traditional RTOS.

While you can implement Active Objects manually on top of a conventional RTOS, an even better way is to implement this pattern as a software framework, because a framework is the best known method to capture and reuse a software architecture. In fact, you can already see how such a framework already starts to emerge, because the “message pump” structure is identical for all tasks, so it can become part of the framework rather than being repeated in every application.


This also illustrates the most important characteristics of a framework called inversion of control. When you use an RTOS, you write the main body of each task and you call the code from the RTOS, such as delay(). In contrast, when you use a framework, you reuse the architecture, such as the “message pump” here, and write the code that it calls. The inversion of control is very characteristic to all event-driven systems. It is the main reason for the architectural-reuse and enforcement of the best practices, as opposed to re-inventing them for each project at hand.

But there is more, much more to the Active Object framework. For example, a framework like this can also provide support for state machines (or better yet, hierarchical state machines), with which to implement the internal behavior of active objects. In fact, this is exactly how you are supposed to model the behavior in the UML (Unified Modeling Language).

As it turns out, active objects provide the sufficiently high-level of abstraction and the right level of abstraction to effectively apply modeling. This is in contrast to a traditional RTOS, which does not provide the right abstractions. You will not find threads, semaphores, or time delays in the standard UML. But you will find active objects, events, and hierarchical state machines.

An AO framework and a modeling tool beautifully complement each other. The framework benefits from a modeling tool to take full advantage of the very expressive graphical notation of state machines, which are the most constructive part of the UML.

In summary, RTOS and superloop aren’t the only game in town. Actor frameworks, such as Akka, are becoming all the rage in enterprise computing, but active object frameworks are an even better fit for deeply embedded programming. After working with such frameworks for over 15 years , I believe that they represent a similar quantum leap of improvement over the RTOS, as the RTOS represents with respect to the “superloop”.

If you’d like to learn more about active objects, I recently posted a presentation on SlideShare: Beyond the RTOS: A Better Way to Design Real-Time Embedded Software

Also, I recently ran into another good presentation about the same ideas. This time a NASA JPL veteran describes the best practices of “Managing Concurrency in Complex Embedded Systems”. I would say, this is exactly active object model. So, it seems that it really is true that experts independently arrive at the same conclusions…

RTOS, TDD and the “O” in the S-O-L-I-D rules

Monday, June 11th, 2012 Miro Samek

In Chapter 11 of the “Test-Driven Development for Embedded C” book, James Grenning discusses the S-O-L-I-D rules for effective software design. These rules have been compiled by Robert C. Martin and are intended to make a software system easier to develop, maintain, and extend over time. The acronym SOLID stands for the following five principles:

S: Single Responsibility Principle
O: Open-Closed Principle
L: Liskov Substitution Principle
I: Interface Segregation Principle
D: Dependency Inversion Principle

Out of all the SOLID design rules, the “O” rule (Open-Closed Principle) seems to me the most important for TDD, as well as the iterative and incremental development in general. If the system we design is “open for extension but closed for modification”, we can keep extending it without much re-work and re-testing of the previously developed and tested code. On the other hand, if the design requires constant re-visiting of what’s already been done and tested, we have to re-do both the code and the tests and essentially the whole iterative, TDD-based approach collapses. Please note that I don’t even mean here extensibility for the future versions of the system. I mean small, incremental extensions that we keep piling up every day to build the system in the first place.

So, here is my problem: RTOS-based designs are generally lousy when it comes to the Open-Closed Principle. The fundamental reason is that RTOS-based designs use blocking for everything, from waiting on a semaphore to timed delays. Blocked tasks are unresponsive for the duration of the blocking and the whole intervening code is designed to handle this one event on which the task was waiting. For example, if a task blocks and waits for a button press, the code that follows the blocking call handles the button. So now, it is hard to add a new event to this task, such as reception of a byte from a UART, because of the timing (waiting on user input is too long and unpredictable) and because of the whole intervening code structure. In practice, people keep adding new tasks that can wait and block on new events, but this often violates the “S” rule (Single Responsibility Principle). Often, the added tasks have the same responsibility as the old tasks and have high degree of coupling (cohesion) with them. This cohesion requires sharing resources (a nightmare in TDD) and even more blocking with mutexes, etc.

Compare this with the event-driven approach, in which the system processes events quickly without ever blocking. Extending such systems with new events is trivial and typically does not require re-doing existing event handlers. Therefore such designs realize the Open-Closed Principle very naturally. You can also much more easily achieve the Single Responsibility Principle, because you can easily group related events in one cohesive design unit. This design unit (an active object) becomes also natural unit for TDD.

So, it seems to me that TDD should naturally favor event-driven approaches, such as active objects (actors), over traditional blocking RTOS.

I’m really curious about your thoughts about this, as it seems to me quite fundamental to the success of TDD. I’m looking forward to an interesting discussion.

ESD closes shop. What’s next in store for embedded programming?

Sunday, April 29th, 2012 Miro Samek

The demise of the ESD Magazine marks the end of an era. In his recent post “Trends in Embedded Software Design“, the magazine insider Michael Barr commemorates this occasion by looking back at the early days and offering a look ahead at the new emerging trends. As we all enjoy predictions, I’d also like to add a couple of my own thoughts about the current status and the future developments in embedded systems programming.


It’s never been better to be an embedded software engineer!

When I joined the embedded field in the mid 1990s, I thought that I was late to the party. The really cool stuff, such as programming in C, C++, or Ada (as opposed to assembly), RTOS, RMA, or even modeling (remember ROOM?) have been already invented and applied to programming embedded systems. The Embedded Systems Programming magazine brought every month invaluable, relevant articles that advanced the art and taught the whole generation. But then, right around the time of the first Internet boom, something happened that slowed down the progress. The universities and colleges stopped teaching C in favor of Java and, the numbers of graduates with skills necessary to write embedded code dropped sharply, at least in the U.S. In 2005 the Embedded Systems Programming magazine has been renamed to Embedded Systems Design (ESD), and lost its focus on embedded software. After this transition, I’ve noticed that the ESD articles became far less relevant to my software work.

All this is ironic, because at the same time embedded software grew to be more important than ever. The whole society is already completely dependent on embedded software and looks to embedded systems for solutions to the toughest problems of our time: environment protection, energy production and delivery, transportation, and healthcare. With the Moore’s law marching unstoppably into its fifth decade, even hardware design starts looking more like software development. There is no doubt in my mind: the golden time of embedded systems programming is right now. The skyrocketing demand for embedded software, on one hand, and diminished supply of qualified embedded developers on the other hand, creates a perfect storm. It simply has never been better to be an embedded software engineer! And I don’t see this changing in the foreseeable future.


Future trends

Speaking about the future, it’s obvious that we need to develop more embedded code with less people in less time. The only way to achieve this is to increase programmer’s productivity. And the only know way to boost productivity is to increase the level of abstraction either by working with higher-level concepts or by building code from higher-level components (code reuse).

Trend 1: Real-time embedded frameworks

My first prediction is that embedded applications will increasingly be based on specialized real-time embedded frameworks, which will increasingly augment and eventually displace the traditional RTOSes.

The software development for the desktop, the web, or mobile devices has already moved to frameworks (.NET, Qt, Ruby on Rails, Android, Akka, etc.), far beyond the raw APIs of the underlying OSes. In contrast, the dominating approach in the embedded space is still based directly on the venerable RTOS (or the “superloop” at the lowest end). The main difference is that when you use an RTOS you write the main body of the application (such as the thread routines for all your tasks) and you call the RTOS services (e.g., a semaphore, or a time delay). When you use a framework, you reuse the main body and write the code that it calls. In other words, the control resides in the framework, not in your code, so the control is inverted compared to an application based on an RTOS.

The advantage, long discovered by the developers of “big” software, is that frameworks offer much higher level of architectural reuse and give conceptual integrity (often based on design patterns) to the applications. Frameworks also encapsulate the difficult or dangerous aspects of their application domains. For example, most programmers vastly underestimate the skills needed to use the various RTOS mechanisms, such as semaphores, mutexes, or even simple time delays, correctly and therefore developers vastly underestimate the true costs of using an RTOS. A generic, event-driven, real-time framework can encapsulate such low-level mechanisms, so the programmers can develop responsive, multitasking applications without even knowing what a semaphore is. The difference is like having to build a bridge each time you want to cross a river, and driving over a bridge that experts in this domain have built for you.

Real-time, embedded frameworks are not new and, in fact, have been extensively used for decades inside modeling tools capable of code generation. For example, a leading such tool, IBM Rhapsody, comes with an assortment of frameworks (OXF, IDF, SXF, MXF, MicroC, as well as third-party frameworks, such as RXF from Willert).

However, I don’t think that many people realize that frameworks of this sort can be used even without the big modeling tools and that they can be very small, about the same size as bare-bones RTOS kernels. For example, the open source QP/C or QP/C++ frameworks from Quantum Leaps (my company) require only 3-4KB of ROM and significantly less RAM than an RTOS.

I believe that in the future we will see proliferation of various real-time embedded frameworks, just as we see proliferation of RTOSes today.


Trend 2: Agile modeling and code generation

Another trend, closely related to frameworks is modeling and automatic code generation.

I realize of course, that big tools for modeling and code generation have been tried and failed to penetrate the market beyond a few percentage points. The main reason, as I argued in my previous post “Economics 101: UML in Embedded Systems“, is the poor ROI (return on investment) of such tools. The investment, in terms of tool’s cost, learning curve, maintenance, and constant “fighting the tool” is so large, that the returns must also be extremely high to make any economic sense.

As the case in point take for example xtUML (executable UML) tools, which are perhaps the highest-ceremony tools of this kind on the market. In xtUML, you create platform-independent models (PIMs), which need to be translated to platform-specific models (PSMs) by highly customizable “model compilers”. To preserve the illusion of complete “platform and technology independence” (whatever that means in embedded systems that by any definition of the term are specific to the task and technology), any code entered into the model is specified in “Object Action Language” (OAL). AOL functionally corresponds to crippled C, but has a different syntax. All this is calculated to keep you completely removed from the generated code, which you can only influence by tweaking the “model compiler”.

But what if you could skip the indirection level of AOL and use C or C++ directly for programming “action functions”? (Maybe you don’t care for the ability to change the generated code from C to, say, Java). What if you could replace the “model compiler” indirection layer with a real-time framework? What if the tool can turn the code generation “upside down” and give you direct access to the code and the freedom to actually do the physical design (see my previous post “Turning code generation upside down“)?

Obviously, the resulting tool will be far less ambitious and “lover level” than xtUML. But “lower level” is not necessarily pejorative. In fact, the main reason for popularity of C in embedded programming is that C is a relatively “low level” high-level programming language.

I predict that we will see more innovation in the area of “low level” and low-cost modeling and code generating tools (such as the free QM modeling tool from Quantum Leaps). I hope that this innovation will eventually bring us “modeling and code generation for the masses”.

What’s the state of your Cortex?

Monday, September 26th, 2011 Miro Samek

Recently, I’ve been involved in a fascinating bug hunt related to a very peculiar behavior of the ARM Cortex-M3 core. Given the incredible popularity of this core, I thought that digging a little deeper into the mysteries of ARM Cortex could be interesting and informative.

First, I need to provide some background. So, the bug was related to the very unique ARM Cortex-M exception type called PendSV. This is an exception triggered by software, but unlike any regular software interrupt, PendSV is an asynchronous exception. This means that PendSV typically does not run immediately after it is triggered, but only after the Nested Vectored Interrupt Controller (NVIC) determines that the priority of the currently executing code drops below the priority associated with PendSV.

At this point, you might wonder, why and where would such “Pended Software Interrupt” be useful? Well, it turns out that PendSV is the only reliable way on ARM Cortex-M to find out when all (possibly nested) interrupt service routines (ISRs) have completed. And this determination is essential to run the scheduler in any preemptive real time kernel.

Virtually all preemptive RTOSes for ARM Cortex-M processors work as follows. Upon initialization the priority associated with PendSV is set to be the lowest of all exceptions (0xFF). All ISRs in the system, prioritized above PendSV, trigger the PendSV exception by writing 1 to the PENDSVSET bit in the NVIC ICSR register, like this:

*((uint32_t volatile *)0xE000ED04) = 0x10000000;

Now, the heavy lifting is left entirely to the NVIC hardware. NVIC will activate PendSV only after the last of all nested interrupts completes and is about to return to the preempted task context. This is exactly the right time for a context switch. In other words, the PendSV exception is designed to call the scheduler and perform the task preemption. ARM Cortex is so smart that it eliminates the overhead of exiting one exception (the last nested interrupt) and activating another (the PendSV) in the trick called “tail-chaining”.

Everything looks easy so far, but ARM Cortex has one more trick up it’s sleeve and this optimization, called “late-arrival”, has interesting side effects related to PendSV. This subtle interaction between PendSV and late-arrival leads essentially to a hardware race condition I’ve recently had a pleasure to chase down.

To illustrate the events that lead up to the bug, I’ve prepared a distilled hardware trace available for viewing at ARM-Cortex-M3_bug.txt. Please go ahead and click on this link to follow along.

The trace starts with an interrupt entry (labelled as Exception 83). This system runs under the preemptive kernel called QK, so the ISR calls QK_ISR_ENTRY() and later QK_ISR_EXIT() macros to inform the kernel about the interrupt. At trace index 069545 the QK_ISR_EXIT() macro triggers the PendSV exception by writing 0x10000000 into the ICSR register.

After this, the Exception 83 runs to completion and eventually tail-chains to Exception 14 (PendSV). This is all as expected.

However, the real problem starts at trace index 069618, at which the execution of the first instruction of PendSV (CPSID i) is cancelled due to arrival of a higher-priority Exception 36 (another interrupt).

This cancellation of low-priority Exception 14 in favor of the higher-priority Exception 36 is the ARM Cortex-special called late arrival. The ARM core optimizes the interrupt entry (which is identical for all exception), and instead of entering the low-priority exception and than immediately high-priority exception, it simply enters the high-priority exception.

The problem is that just before the late arrival, the PENDSVSET bit in the NVIC-ICSR register is already cleared.

However, the late-arriving Exception 36 sets this bit again in QK_ISR_EXIT(), which is normal for any interrupt (trace index 070126).

The Exception 36 eventually exits to the original PendSV (trace index 070130), but this is not the usual tail-chaining (the trace indicates tail-chaining by the pair Exception Exit/Exception Entry). This time around the trace shows only Exception Exit, but no entry.

This difference has very important implication, which is that the PENDSVSET bit in the NVIC-ICSR register is not cleared (remember that it is set, however).

What unfolds next is the consequence of the PENDSVSET bit being set. PendSV executes, fakes its own return to the QK scheduler, and eventually it unlocks interrupts. But before SVCall (Exception 11) can execute, the PendSV Exception 14 is taken again (because it is triggered by the PENDSVSET bit). This makes no sense and should never happen, because PendSV should never be in the triggered state at this point.

So, what are the consequences of this behavior and what is the fix?

Well, as you can see, due to late-arrival PendSV can be occasionally entered with the PENDSVSET bit being set, so it will be triggered again immediately after it completes. This might or might not have adverse consequences. In case of the QK kernel, this was unacceptable and led to a Hardware Fault. In other RTOSes it might simply cause another scheduler call, waste of some CPU, and delay of the task-level response, but perhaps not a catastrophic failure.

The actual fix of the problem is very simple. Since you cannot rely on the automatic clearing of the PENDSVSET bit in the NVIC-ICSR register, you need to clear it manually (by writing 1 to the PENDSVCLR bit in the NVIC-ICSR register.) Of course this is wasteful, because only one time in a million this bit is actually not cleared automatically.

Interestingly, I have not seen such writing to the PENDSVCLR bit in open source RTOSes for ARM Cortex-M (such as Recently, I’ve come across some posts to the ARM Community Forums that this problem exists for the Frescale MQX RTOS (see PendSV pending inside PendSV handler? (Cortex-M4)).

If you use a preemptive kernel on ARM Cortex-M0/M3, perhaps you could check how your kernel handles PendSV. If you don’t see an explicit write to the PENDSVCLR bit, I would recommend that you think through the consequences of re-entering PendSV. I’d be very interested to collect a survey of how the existing kernels for ARM Cortex-M handle this situation.

Is an RTOS really the best way to design embedded systems?

Tuesday, June 7th, 2011 Miro Samek

LinkedIn RTE group

Recently I’ve been involved in a discussion on the LinkedIn Real-Time Embedded Engineering group, which I started with the question “Is an RTOS really the best way to design embedded systems?“.

The discussion, which has swollen to way over 600 comments by now, has sometimes low signal to noise ratio, but I believe it is still interesting.

I consider this discussion to be a continuation of the topic from my April blog post I hate RTOSes.

As before, my main point is centered on the fundamental mismatch of using the sequential programming paradigm (RTOS or superloop) to solve problems that are event-driven by nature.

I’m really curious what the visitors to EmbeddedGurus think.