embedded software boot camp

What’s the state of your Cortex?

September 26th, 2011 by Miro Samek

Recently, I’ve been involved in a fascinating bug hunt related to a very peculiar behavior of the ARM Cortex-M3 core. Given the incredible popularity of this core, I thought that digging a little deeper into the mysteries of ARM Cortex could be interesting and informative.

First, I need to provide some background. So, the bug was related to the very unique ARM Cortex-M exception type called PendSV. This is an exception triggered by software, but unlike any regular software interrupt, PendSV is an asynchronous exception. This means that PendSV typically does not run immediately after it is triggered, but only after the Nested Vectored Interrupt Controller (NVIC) determines that the priority of the currently executing code drops below the priority associated with PendSV.

At this point, you might wonder, why and where would such “Pended Software Interrupt” be useful? Well, it turns out that PendSV is the only reliable way on ARM Cortex-M to find out when all (possibly nested) interrupt service routines (ISRs) have completed. And this determination is essential to run the scheduler in any preemptive real time kernel.

Virtually all preemptive RTOSes for ARM Cortex-M processors work as follows. Upon initialization the priority associated with PendSV is set to be the lowest of all exceptions (0xFF). All ISRs in the system, prioritized above PendSV, trigger the PendSV exception by writing 1 to the PENDSVSET bit in the NVIC ICSR register, like this:

*((uint32_t volatile *)0xE000ED04) = 0x10000000;

Now, the heavy lifting is left entirely to the NVIC hardware. NVIC will activate PendSV only after the last of all nested interrupts completes and is about to return to the preempted task context. This is exactly the right time for a context switch. In other words, the PendSV exception is designed to call the scheduler and perform the task preemption. ARM Cortex is so smart that it eliminates the overhead of exiting one exception (the last nested interrupt) and activating another (the PendSV) in the trick called “tail-chaining”.

Everything looks easy so far, but ARM Cortex has one more trick up it’s sleeve and this optimization, called “late-arrival”, has interesting side effects related to PendSV. This subtle interaction between PendSV and late-arrival leads essentially to a hardware race condition I’ve recently had a pleasure to chase down.

To illustrate the events that lead up to the bug, I’ve prepared a distilled hardware trace available for viewing at ARM-Cortex-M3_bug.txt. Please go ahead and click on this link to follow along.

The trace starts with an interrupt entry (labelled as Exception 83). This system runs under the preemptive kernel called QK, so the ISR calls QK_ISR_ENTRY() and later QK_ISR_EXIT() macros to inform the kernel about the interrupt. At trace index 069545 the QK_ISR_EXIT() macro triggers the PendSV exception by writing 0×10000000 into the ICSR register.

After this, the Exception 83 runs to completion and eventually tail-chains to Exception 14 (PendSV). This is all as expected.

However, the real problem starts at trace index 069618, at which the execution of the first instruction of PendSV (CPSID i) is cancelled due to arrival of a higher-priority Exception 36 (another interrupt).

This cancellation of low-priority Exception 14 in favor of the higher-priority Exception 36 is anotehr ARM Cortex-special called late arrival. The ARM core optimizes the interrupt entry (which is identical for all exception), and instead of entering the low-priority exception and than immediately high-priority exception, it simply enters the high-priority exception.

The problem is that just before the late arrival, the PENDSVSET bit in the NVIC-ICSR register is already cleared.

However, the late-arriving Exception 36 sets this bit again in QK_ISR_EXIT(), which is normal for any interrupt (trace index 070126).

The Exception 36 eventually exits to the original PendSV (trace index 070130), but this is not the usual tail-chaining (the trace indicates tail-chaining by the pair Exception Exit/Exception Entry). This time around the trace shows only Exception Exit, but no entry.

This difference has very important implication, which is that the PENDSVSET bit in the NVIC-ICSR register is not cleared (remember that it is set, however).

What unfolds next is the consequence of the PENDSVSET bit being set. PendSV executes, fakes its own return to the QK scheduler, and eventually it unlocks interrupts. But before SVCall (Exception 11) can execute, the PendSV Exception 14 is taken again (because it is triggered by the PENDSVSET bit). This makes no sense and should never happen, because PendSV should never be in the triggered state at this point.

***
So, what are the consequences of this behavior and what is the fix?

Well, as you can see, due to late-arrival PendSV can be occasionally entered with the PENDSVSET bit being set, so it will be triggered again immediately after it completes. This might or might not have grave consequences. In case of the QK kernel, this was unacceptable and led to a Hardware Fault. In other RTOSes it might simply cause another scheduler call, waste of some CPU, and delay of the task-level response, but perhaps not a catastrophic failure.

The actual fix of the problem is very simple. Since you cannot rely on the automatic clearing of the PENDSVSET bit in the NVIC-ICSR register, you need to clear it manually (by writing 1 to the PENDSVCLR bit in the NVIC-ICSR register.) Of course this is wasteful, because only one time in a million this bit is actually not cleared automatically.

Interestingly, I have not seen such writing to the PENDSVCLR bit in open source RTOSes for ARM Cortex-M (such as FreeRTOS.org). Recently, I’ve come across some posts to the ARM Community Forums that this problem exists for the Frescale MQX RTOS (see PendSV pending inside PendSV handler? (Cortex-M4)).

If you use a preemptive kernel on ARM Cortex-M0/M3, perhaps you could check how your kernel handles PendSV. If you don’t see an explicit write to the PENDSVCLR bit, I would recommend that you think through the consequences of re-entering PendSV. I’d be very interested to collect a survey of how the existing kernels for ARM Cortex-M handle this situation.

On the Origin of Software by Means of Artificial Selection

August 5th, 2011 by Miro Samek

If you haven’t put your hands on the recent James Grenning’s book “Test-Driven Development for Embedded C” yet, I highly recommend you do. Here is why.

First, you need to realize that this book is not really about testing–it is about software development. The central idea behind TDD (Test-Driven Development) is that software, as any complex system in nature, has to evolve gradually and has to keep working throughout all the development stages. This idea is of course not new and goes back all the way to the Darwin’s “On the Origin of Species”. More recently, in his 1977 book “Systemantics: How Systems Work and Especially How They Fail” John Gall wrote:

“A complex system that works is invariably found to have evolved from a simple system that worked…. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system”.

The key point of TDD is to subject the software to constant “struggle for existence” to actually see if it indeed is still working and in the process weed out any undesired mutations.

Of course, in developing software we don’t have the deep evolutionary time, so we need to accelerate the pace of software evolution. We do this by automating the testing.

For embedded development this means avoiding the target system bottleneck (James calls it DOH-Development On Hardware). The embedded TDD strategy is to develop embedded software on the desktop and only occasionally check it on the real embedded hardware. This means that the C/C++ compilers and tools for the desktop (such as Visual C++, MinGW, or Cygwin for Windows and GCC for Linux and Mac OS X) are important for us.

The book comes with testing frameworks (Unity and CppUTest) and plenty of example code. The code works right of the box on Linux, but I had some issues running it on Windows. In the process of learning the tools, I’ve prepared a small template for Visual C++ 2008, which is available for download from:

http://www.state-machine.com/attachments/blinky_tdd.zip

This demo assumes that you download and install the CppUTest framework (http://sourceforge.net/projects/cpputest/) and that you define the environment variable CPP_U_TEST to point to the directory where you installed CppUTest. The Visual Studio solution AllTest.sln is located in the blinky\tests directory.

I’d love to hear about your experiences with TDD in embedded programming. I’m sure I will blog more about it in the future.

Protothreads versus State Machines

June 9th, 2011 by Miro Samek

For a number of years I’ve been getting questions regarding Protothreads and comparisons to state machines. Here is what I think.

Protothreads are an attempt to write event-driven code in a sequential way. To do so, protothreads introduce a concept of “blocking abstraction” to event-driven programming–something that event-driven programming is trying to get rid of in the first place.

Obviously, to be compatible with event-driven programming, protothreads cannot *really* block, at least not in the same sense as traditional threads of a conventional blocking RTOS can. Instead, a protothread is still called for every event and still *returns* to the caller without really blocking. However, when a given event does not match the “blocking abstraction”, the protothread returns without doing anything and without progressing. Only when the current event matches the “blocking abstraction” the protothread advances to the next “blocking abstraction” and also returns. Please note that protothreads allow standard flow control statements, such as IF-THEN-ELSE and WHILE loops between any two “blocking abstractions”. Therefore the program feels more “natural” for designers used to the traditional sequential programming style. You simply see the expected sequence of events.

Protothreads are indeed a simplification, but only for *sequential problems*, in which only specific sequences of events are valid and all other sequences are invalid. Examples for using protothreads include the sequence of events for initializing a radio modem.

However, protothreads lose their advantage entirely if there are many valid sequences of events. This is actually the most common situation in event-driven systems. State machines are capable to handle multiple sequences of events easily. In fact, state machines are getting simpler when the sequence of events matters less. In contrast, protothreads are getting more and more complex when they need to accept more sequences of events.

So, in the end, protothreads are an attempt to replace state machines, which are considered complex and messy by the inventors of the protothread mechanism. But, the “messiness” of state machines is obviously a very subjective statement. A good state machine implementation technique can remove many (most) accidental difficulties of coding state machines. I mean, if a state machine as depicted in a state diagram is simple, and if the code does not reflect this simplicity, the problem is with the implementation technique, not with the inherent complexity of a state machine.

The bottom line: good state machine implementation techniques eliminate most reasons for using protothreads. State machines are far more generic and flexible, because they can easily handle multiple sequences of events. In comparison, protothreads are intentionally crippled state machines that transition implicitly from one “blocking abstraction” to another executing code in between as the “actions” on such implicit transition.

Is an RTOS really the best way to design embedded systems?

June 7th, 2011 by Miro Samek

LinkedIn RTE group

Recently I’ve been involved in a discussion on the LinkedIn Real-Time Embedded Engineering group, which I started with the question “Is an RTOS really the best way to design embedded systems?”.

The discussion, which has swollen to over 0xFF comments by now, has sometimes low signal to noise ratio, but I believe it is still interesting.

I consider this discussion to be a continuation of the topic from my April blog post I hate RTOSes.

As before, my main point is centered on the fundamental mismatch of using the sequential programming paradigm (RTOS or superloop) to solve problems that are event-driven by nature.

I’m really curious what the visitors to EmbeddedGurus think.

Rapid Prototyping with QP and Arduino

February 21st, 2011 by Miro Samek

Arduino (see arduino.cc) is an open-source electronics prototyping platform, designed to make digital electronics more accessible to non-specialists in multidisciplinary projects. Arduino has gained popularity, because it provides both hardware and compatible software focused on developing working prototypes. By making it easier to build the first prototype, Arduino lowers the barrier of entry to the field of modern microelectronics and enables a host of new applications.

However, programming the Arduino microcontroller (Atmel AVRmega) remains a challenge. Traditionally, Arduino programs are written in a sequential manner, which means that whenever an Arduino program needs to synchronize with some external event, such as a button press, arrival of a character through the serial port, or a time delay, it explicitly waits in-line for the occurrence of the event. Waiting “in-line” means that the Arduino processor spends all of its cycles constantly checking for some condition in a tight loop (called the polling loop).

Although this approach is functional in many situations, it doesn’t work very well when there are multiple possible sources of events whose arrival times and order you cannot predict and where it is important to handle the events in a timely manner. The fundamental problem is that while a sequential program is waiting for one kind of event (e.g., a button press), it is not doing any other work and is not responsive to other events (e.g., characters from the serial port).

Another big problem with the sequential program structure is wastefulness in terms of power dissipation. Regardless of how much or how little actual work is being done, the Arduino processor is always running at top speed, which drains the battery quickly and prevents you from making truly long-lasting battery-powered devices.

Event-Driven Programming for Arduino

For these and other reasons experienced programmers turn to the long-know design strategy called event-driven programming, which requires a distinctly different way of thinking than conventional sequential programs. All event-driven programs are naturally divided into the application, which actually handles the events, and the supervisory event-driven infrastructure (framework), which waits for events and dispatches them to the application. The control resides in the event-driven framework, so from the application standpoint, the control is inverted compared to a traditional sequential program.

It turns out that the QP/C++ state machine framework beautifully complements the Arduino platform and provides everything you need to build responsive, robust, and power-efficient Arduino programs based on modern hierarchical state machines. In many ways the open source QP/C++ state machine framework is like a modern real-time operating system (RTOS) specifically designed for executing event-driven state machines. The free QM™ graphical modeling tool takes Arduino programming to the next level, by enabling automatic code generation of complete Arduino sketches.

The QP Development Kit (QDK) for Arduino is different from most other QDKs available from state-machine.com in that it is self-contained and includes the simplified and compacted source-code version of the QP/C++ framework. The QDK-Arduino is designed to plug directly into the Arduino IDE to become immediately useful without building a binary library. The extensive Application Note “Event Driven Arduino Programming with QP” describes the main concepts and how to get started.