embedded software boot camp

Cutting Through the Confusion with ARM Cortex-M Interrupt Priorities

February 1st, 2014 by Miro Samek

The insanely popular ARM Cortex-M processor offers very versatile interrupt priority management, but unfortunately, the multiple numbering conventions used to manage interrupt priorities are often counter-intuitive, inconsistent, and confusing, which can lead to bugs. In this post I attempt to explain the subject and cut through the confusion.

The Inverse Relationship Between Priority Numbers and Urgency of the Interrupts

The most important fact to know is that ARM Cortex-M uses the “reversed” priority numbering scheme for interrupts, where priority zero corresponds to the highest urgency interrupt and higher numerical values of priority correspond to lower urgency. This numbering scheme poses a constant threat of confusion, because any use of the terms “higher priority” or “lower priority” immediately requires clarifying whether they refer to the numerical value of the priority or to the urgency of the interrupt.

NOTE: To avoid this confusion, in the rest of this post, the term “priority” means the numerical value of interrupt priority in the ARM Cortex-M convention. The term “urgency” means the capability of an interrupt to preempt other interrupts. A higher-urgency interrupt (lower priority number) can preempt a lower-urgency interrupt (higher priority number).

Interrupt Priority Configuration Registers in the NVIC

The number of priority levels in the ARM Cortex-M core is configurable, meaning that various silicon vendors can implement a different number of priority bits in their chips. However, there is a minimum number of interrupt priority bits that must be implemented: 2 bits in ARM Cortex-M0/M0+ and 3 bits in ARM Cortex-M3/M4.

But here again, the most confusing fact is that the priority bits are implemented in the most-significant bits of the priority configuration registers in the NVIC (Nested Vectored Interrupt Controller). The following figure illustrates the bit assignment in a priority configuration register for 3-bit implementation (part A), such as TI Tiva MCUs, and 4-bit implementation (part B), such as the NXP LPC17xx ARM Cortex-M3 MCUs.


Interrupt priority registers with 3 bits of priority (A), and 4 bits of priority (B)


The relevance of the bit representation in the NVIC priority register is that this creates another priority numbering scheme, in which the numerical value of the priority is shifted to the left by the number of unimplemented priority bits. If you ever write directly to the priority registers in the NVIC, you must remember to use this convention.

NOTE: The interrupt priorities don’t need to be uniquely assigned, so it is perfectly legal to assign the same interrupt priority to many interrupts in the system. That means that your application can service many more interrupts than the number of interrupt priority levels.

NOTE: Out of reset, all interrupts and exceptions with configurable priority have the same default priority of zero. This priority number represents the highest-possible interrupt urgency.

Interrupt Priority Numbering in the CMSIS

The Cortex Microcontroller Software Interface Standard (CMSIS) provided by ARM Ltd. is the recommended way of programming Cortex-M microcontrollers in a portable way. The CMSIS standard provides the function NVIC_SetPriority(IRQn, priority) for setting interrupt priorities.

However, it is very important to note that the ‘priority’ argument of this function must not be shifted by the number of unimplemented bits, because the function performs the shift by (8 - __NVIC_PRIO_BITS) internally before writing the value to the appropriate priority configuration register in the NVIC. The number of implemented priority bits, __NVIC_PRIO_BITS, is defined in CMSIS for each ARM Cortex-M device.

For example, calling NVIC_SetPriority(7, 6) will set the priority configuration register corresponding to IRQ#7 to 1100,0000 binary on an ARM Cortex-M with 3 bits of interrupt priority, and it will set the same register to 0110,0000 binary on an ARM Cortex-M with 4 bits of priority.

NOTE: The confusion about the priority numbering scheme used in NVIC_SetPriority() is further propagated by various code examples on the Internet and even in reputable books. For example, the book “The Definitive Guide to ARM Cortex-M3, Second Edition”, ISBN 979-0-12-382091-4, Section 8.3 on page 138, includes the call NVIC_SetPriority(7, 0xC0) with the intent to set the priority of IRQ#7 to 6. This call is incorrect and, at least in CMSIS version 3.x, will set the priority of IRQ#7 to zero.
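The arithmetic in the example and in the note above can be checked on a host PC with a small sketch that mimics the shifting NVIC_SetPriority() performs for an external interrupt: the unshifted priority is moved left by (8 - __NVIC_PRIO_BITS) and only the low 8 bits are stored. This is not the CMSIS source. The name sim_NVIC_SetPriority, the simulated register array sim_nvic_ip, and passing the number of priority bits as a run-time parameter (in CMSIS, __NVIC_PRIO_BITS is a compile-time constant) are assumptions made so that 3-bit and 4-bit parts can be compared in one program.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated NVIC priority bytes, one per external IRQ (hypothetical, host only). */
static uint8_t sim_nvic_ip[32];

/* Sketch of the shift that NVIC_SetPriority() applies for an external IRQ:
 * the caller passes the UNSHIFTED priority, and the function moves it into
 * the implemented most-significant bits of the 8-bit priority field.
 * prio_bits stands in for __NVIC_PRIO_BITS (assumed run-time parameter). */
static void sim_NVIC_SetPriority(unsigned irq, uint32_t priority,
                                 unsigned prio_bits)
{
    assert(irq < sizeof(sim_nvic_ip));
    assert(prio_bits >= 2U && prio_bits <= 8U);
    /* Any bits shifted above bit 7 are silently lost in the 8-bit store. */
    sim_nvic_ip[irq] = (uint8_t)((priority << (8U - prio_bits)) & 0xFFU);
}
```

With this sketch, sim_NVIC_SetPriority(7, 6, 3) stores 0xC0 (1100,0000 binary) and sim_NVIC_SetPriority(7, 6, 4) stores 0x60 (0110,0000 binary). The book's pre-shifted value does not survive: on a 3-bit part 0xC0 << 5 is 0x1800, whose low byte is 0x00, so IRQ#7 ends up at priority zero, the highest urgency.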

Preempt Priority and Subpriority

The interrupt priority register for each interrupt is further divided into two parts. The upper part (most-significant bits) is the preempt priority, and the lower part (least-significant bits) is the subpriority. The number of bits in each part is configurable via the PRIGROUP field of the Application Interrupt and Reset Control Register (AIRCR, at address 0xE000ED0C).
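To make the split concrete, here is a minimal host-side sketch of how a PRIGROUP value divides the 8-bit priority byte: PRIGROUP = n places the binary point after bit n, so bits [7:n+1] form the preempt priority and bits [n:0] form the subpriority. The type PrioSplit and the function split_priority are hypothetical names for illustration, and the input is assumed to be in the hardware (left-aligned) convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two fields of one priority byte. */
typedef struct {
    uint8_t preempt; /* group (preempt) priority field */
    uint8_t sub;     /* subpriority field */
} PrioSplit;

/* Split a left-aligned 8-bit NVIC priority byte according to PRIGROUP
 * (AIRCR bits [10:8]). PRIGROUP = n: bits [7:n+1] are the preempt priority,
 * bits [n:0] are the subpriority. */
static PrioSplit split_priority(uint8_t prio_byte, unsigned prigroup)
{
    PrioSplit s;
    assert(prigroup <= 7U);
    unsigned sub_bits = prigroup + 1U;                        /* 1..8 */
    s.preempt = (uint8_t)(prio_byte >> sub_bits);             /* 0 if sub_bits == 8 */
    s.sub     = (uint8_t)(prio_byte & ((1U << sub_bits) - 1U));
    return s;
}
```

On a part with 3 implemented bits, priority 6 is stored as 0xC0. With the reset value PRIGROUP = 0, split_priority(0xC0, 0) yields preempt 0x60 and sub 0, so all three implemented bits still act as preempt priority. With PRIGROUP = 5, split_priority(0xC0, 5) yields preempt 3 (binary 11) and sub 0: only the top two implemented bits now decide preemption.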

The preempt priority level defines whether an interrupt can be serviced when the processor is already running another interrupt handler. In other words, preempt priority determines if one interrupt can preempt another.

The subpriority level value is used only when two exceptions with the same preempt priority level are pending (because interrupts are disabled, for example). When the interrupts are re-enabled, the exception with the lower subpriority (higher urgency) will be handled first.
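The selection among several pending exceptions can be modeled in a few lines. In the ARMv7-M architecture the exception with the lowest preempt priority number is taken first; among equal preempt priorities the lowest subpriority number wins; and if both are equal, the lowest exception number wins. The sketch below only models this selection (not preemption of a running handler), and the type Pending and function next_to_service are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one pending exception. */
typedef struct {
    unsigned exc_num;  /* exception number */
    uint8_t  preempt;  /* preempt (group) priority number */
    uint8_t  sub;      /* subpriority number */
} Pending;

/* Index of the pending exception that would be serviced first:
 * lowest preempt number, then lowest subpriority number, then lowest
 * exception number. Returns n when the list is empty. */
static size_t next_to_service(const Pending *p, size_t n)
{
    size_t best = n;
    for (size_t i = 0U; i < n; ++i) {
        if (best == n
            || p[i].preempt < p[best].preempt
            || (p[i].preempt == p[best].preempt && p[i].sub < p[best].sub)
            || (p[i].preempt == p[best].preempt && p[i].sub == p[best].sub
                && p[i].exc_num < p[best].exc_num)) {
            best = i;
        }
    }
    return best;
}
```

For example, with pending entries {exception 20, preempt 2, sub 1}, {exception 18, preempt 2, sub 0} and {exception 25, preempt 3, sub 0}, the second one is chosen: it ties on preempt priority with the first but has the lower subpriority number.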

In most applications, I would highly recommend assigning all the interrupt priority bits to the preempt priority group, leaving no priority bits as subpriority bits, which is the default setting out of reset. Any other configuration complicates the otherwise direct relationship between the interrupt priority number and interrupt urgency.

NOTE: Some third-party code libraries (e.g., the STM32 driver library) change the priority grouping to a non-default configuration. Therefore, it is highly recommended to explicitly reset the priority grouping to the default by calling the CMSIS function NVIC_SetPriorityGrouping(0U) after initializing such external libraries.

Disabling Interrupts with PRIMASK and BASEPRI Registers

Often in real-time embedded programming it is necessary to perform certain operations atomically to prevent data corruption. The simplest way to achieve atomicity is to briefly disable and re-enable interrupts.

The ARM Cortex-M offers two methods of disabling and re-enabling interrupts. The simplest method is to set and clear the interrupt bit in the PRIMASK register. Specifically, disabling interrupts can be achieved with the “CPSID i” instruction and enabling interrupts with the “CPSIE i” instruction. This method is simple and fast, but it disables all interrupt levels indiscriminately. This is the only method available in the ARMv6-M architecture (Cortex-M0/M0+).
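A common refinement of the PRIMASK method is to save PRIMASK on entry and restore it on exit, rather than executing CPSIE i unconditionally, so that critical sections can nest safely. The sketch below runs on a host: the CMSIS intrinsics __get_PRIMASK(), __set_PRIMASK() and __disable_irq() are replaced by hypothetical stand-ins operating on a simulated PRIMASK variable, and crit_enter/crit_exit are illustrative names, not a CMSIS API.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for __get_PRIMASK(), __set_PRIMASK() and
 * __disable_irq() (assumption: on a real Cortex-M these come from the
 * CMSIS core header). Here PRIMASK is just a variable. */
static uint32_t sim_primask = 0U;   /* 0 = interrupts enabled, 1 = disabled */

static uint32_t sim_get_PRIMASK(void)       { return sim_primask; }
static void     sim_set_PRIMASK(uint32_t v) { sim_primask = v & 1U; }
static void     sim_disable_irq(void)       { sim_primask = 1U; } /* like CPSID i */

/* Enter a critical section and return the previous PRIMASK value. */
static uint32_t crit_enter(void)
{
    uint32_t saved = sim_get_PRIMASK();
    sim_disable_irq();
    return saved;
}

/* Leave the critical section by restoring the saved PRIMASK, so an inner
 * section does not re-enable interrupts that an outer section disabled. */
static void crit_exit(uint32_t saved)
{
    assert(saved <= 1U);   /* only bit 0 of PRIMASK is meaningful */
    sim_set_PRIMASK(saved);
}
```

When an inner section exits while an outer one is still active, PRIMASK stays set; interrupts are re-enabled only when the outermost section exits.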

However, the more advanced ARMv7-M architecture (Cortex-M3/M4/M4F) additionally provides the BASEPRI special register, which allows you to disable interrupts more selectively. Specifically, you can disable only the interrupts with urgency lower than a certain level and leave the higher-urgency interrupts enabled. (This feature is sometimes called “zero interrupt latency”.)

The CMSIS provides the function __set_BASEPRI(priority) for changing the value of the BASEPRI register. The function uses the hardware convention for the ‘priority’ argument, which means that the priority must be shifted left by the number of unimplemented bits (8 - __NVIC_PRIO_BITS).

NOTE: The priority numbering convention used in __set_BASEPRI(priority) is thus different from the one used in NVIC_SetPriority(IRQn, priority), which expects the ‘priority’ argument not shifted.
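The two conventions and the BASEPRI masking rule can be checked with a host-side sketch. It assumes 3 implemented priority bits (PRIO_BITS stands in for __NVIC_PRIO_BITS) and the default priority grouping. A BASEPRI value of zero disables masking; otherwise an exception whose left-aligned priority is greater than or equal to BASEPRI is held off. The functions basepri_value and is_masked_by_basepri are hypothetical helpers, not CMSIS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 3U  /* stands in for __NVIC_PRIO_BITS (assumed 3 here) */

/* Convert an unshifted priority (the NVIC_SetPriority() convention) into
 * the left-aligned hardware form that __set_BASEPRI() expects. */
static uint8_t basepri_value(uint32_t prio)
{
    assert(prio < (1U << PRIO_BITS));
    return (uint8_t)(prio << (8U - PRIO_BITS));
}

/* Model of BASEPRI masking under the default grouping: BASEPRI == 0 masks
 * nothing; otherwise priorities numerically >= BASEPRI are held off.
 * 'prio' is given in the unshifted convention. */
static bool is_masked_by_basepri(uint32_t prio, uint8_t basepri)
{
    return (basepri != 0U) && (basepri_value(prio) >= basepri);
}
```

With 3 priority bits, the priority 6 that you would pass to NVIC_SetPriority() corresponds to the BASEPRI value 0xC0. Under that mask, priorities 6 and 7 are blocked while priority 5 and all more urgent interrupts still run.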

For example, if you want to selectively block interrupts with priority numbers greater than or equal to 6 (that is, urgency at or below that of priority 6), you could use the following code:

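// code before critical section
uint32_t basepri_saved = __get_BASEPRI(); // remember the current mask (allows nesting)
__set_BASEPRI(6U << (8U - __NVIC_PRIO_BITS)); // block priority numbers >= 6
// critical section
__set_BASEPRI(basepri_saved); // restore the previous BASEPRI masking
// code after critical section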


Dual Targeting and Agile Prototyping of Embedded Software on Windows

April 12th, 2013 by Miro Samek

When developing embedded code for devices with non-trivial user interfaces, it often pays off to build a prototype (virtual prototype) of the embedded system on a PC. The strategy is called “dual targeting”, because you develop software on one machine (e.g., a Windows PC) and run it both on a deeply embedded target and on the PC. Dual targeting, popularized in the book “Test-Driven Development for Embedded C” by James Grenning, is the main strategy for avoiding the “target system bottleneck” in agile embedded software development.

Avoiding Target Hardware Bottleneck with Dual Targeting

Please note that dual targeting does not mean that the embedded device has anything to do with the PC. Nor does it mean that the simulation must be cycle-exact with the embedded target CPU.

Dual targeting simply means that from day one, your embedded code (typically in C) is designed to run on at least two platforms: the final target hardware and your PC. All you really need for this is two C compilers: one for the PC and another for the embedded device.

However, the dual targeting strategy does require designing the embedded software so that all target hardware dependencies are handled through a well-defined interface, often called the Board Support Package (BSP). This interface has at least two implementations: one for the actual target and one for the PC (for example, running Windows). With such an interface in place, the bulk of the embedded code can remain completely unaware of which BSP implementation it is linked to, so it can be developed quickly on the PC but can also run on the target hardware without any changes.
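Here is a minimal sketch of the BSP idea, assuming a board with one button and one LED. In a real project, the three parts would live in separate files (bsp.h, app.c, bsp_pc.c, plus a bsp_target.c that drives the real GPIO); they are shown in one listing here only so that it compiles on its own. All names (BSP_ledOn, BSP_ledOff, BSP_isButtonPressed, App_poll) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* ---- bsp.h: hardware-independent interface (hypothetical names) ---- */
void BSP_ledOn(void);
void BSP_ledOff(void);
bool BSP_isButtonPressed(void);

/* ---- app.c: application code, unaware of which BSP it is linked with ---- */
void App_poll(void)
{
    if (BSP_isButtonPressed()) {
        BSP_ledOn();
    } else {
        BSP_ledOff();
    }
}

/* ---- bsp_pc.c: PC implementation; bsp_target.c would touch real GPIO ---- */
static bool pc_led;     /* simulated LED state */
static bool pc_button;  /* simulated button, set by a test or a GUI */

void BSP_ledOn(void)           { pc_led = true;  puts("LED ON");  }
void BSP_ledOff(void)          { pc_led = false; puts("LED OFF"); }
bool BSP_isButtonPressed(void) { return pc_button; }
```

App_poll() is identical in both builds; only the linked BSP implementation changes. That separation is what lets the bulk of the code be developed and tested on the PC.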

While some embedded programmers may view dual targeting as a self-inflicted burden, more experienced developers generally agree that paying attention to the boundaries between software and hardware is actually beneficial, because it results in more modular, more portable, and more maintainable software with a much longer useful lifetime. The investment in dual targeting also has an immediate payback in a vastly accelerated compile-run-debug cycle, which is much faster and more productive on a powerful PC than on a much slower, resource-constrained, deeply embedded target with limited visibility into the running code.

Agile Rapid Prototyping of Embedded Software with Dual Targeting

Dual targeting can have many different objectives. For example, in test-driven development (TDD) of embedded software, the objective is to build relatively concise unit tests and execute them on the desktop as console-type applications. The main challenges are managing the inter-module dependencies and keeping the tests flexible, while the overall architecture of the final product is of lesser concern, because the unit tests are executed in isolation using special test harnesses.

However, dual targeting can also be used for (rapid) prototyping and simulating the whole embedded device on the PC, not just for executing unit tests. In this case, the objective is to build a possibly complete prototype of the embedded device as a GUI-type application. This approach is particularly interesting for embedded systems with non-trivial user interfaces, such as home appliances, office equipment, thermostats, medical devices, and industrial controllers. As it turns out, a significant percentage of the code embedded in all those devices is devoted to the user interface and can be, or even should be, developed on the desktop.

QWIN GUI Toolkit

When developing embedded code for devices with non-trivial user interfaces, one often runs into the problem of representing the embedded front panels as GUI elements on the PC. The problem is so common that I’m really surprised my internet search couldn’t uncover any simple C-only interface to the basic elements, such as LCDs, buttons, and LEDs. I’ve posted questions on StackOverflow and other such forums, but again, I got recommendations for .NET, C#, Visual Basic, and many expensive proprietary tools, none of which provided an easy, direct binding to C. My objective is not really that complicated, yet it seems that every embedded developer has to re-invent this wheel over and over again.




So, to help embedded developers interested in prototyping embedded devices on Windows, I have created the “QWIN GUI Toolkit” and posted it on SourceForge (as part of the Qtools collection) under the permissive MIT open source license. This toolkit relies only on the raw Win32 API in C and currently provides the following elements:

  • Graphic display for efficient, pixel-addressable displays, such as graphical LCDs, OLEDs, etc., with full 24-bit color.
  • Segment display for segmented displays, such as segment LCDs and segment LEDs, with generic, custom bitmaps for the segments.
  • Owner-drawn buttons with custom “depressed” and “released” bitmaps and capable of generating separate events when depressed and when released.

The toolkit comes with an example and an App Note, showing how to handle input from the owner-drawn buttons, regular buttons, the keyboard, and the mouse. You can also view a 1-minute YouTube video, “Fly ‘n’ Shoot game on windows”, that shows a virtual embedded board running a game.

Regarding the size and complexity of the “QWIN GUI Toolkit”, the implementation of the aforementioned GUI elements takes only about 250 lines of C. The example with all sources of input and a lot of comments amounts to some 300 lines of C. The toolkit has been tested with the free MinGW compiler, the free Visual C++ Express 2013, and the free ResEdit resource editor.


Embedded C Programming with ARM Cortex-M Video Course

January 21st, 2013 by Miro Samek

As part of my New Year’s resolution for 2013, I just started to teach an Embedded C Programming Course with ARM Cortex-M on YouTube. The playlist for this course is available at: http://www.youtube.com/playlist?list=PLPW8O6W-1chwyTzI3BHwBLbGQoPFxPAPM .

The course is intended for beginners and is structured as a series of short, focused, hands-on lessons that teach you how to program ARM Cortex-M microcontrollers in C.

I’ve designed this course not just to be watched, but to be followed along on your own computer. In the “Getting Started” Lesson 0, I show you how to download and install the free evaluation version of IAR EWARM and how to order the Stellaris Launchpad ARM Cortex-M4 board (for just $12.99). The board is optional, because I also show how to use the instruction set simulator.

My goal is not just to teach C; other courses already do that quite well. But there are virtually no courses that step down to the machine level and show you exactly what happens inside the ARM processor.

Starting from Lesson 1 you actually see how the ARM Cortex-M processor executes your code, how it manipulates registers, and how it counts. You learn how binary numbers map to the hexadecimal system used in the debugger (and in C) and you learn about the two’s complement number representation of signed numbers.

In lesson 2, you learn about the flow of control and the ARM branch instructions. In fact, you witness a dissection of the ARM B (branch) instruction. You also learn about the pipeline and the pipeline stalls caused by branching.

In lesson 3, you learn about variables and pointers. You learn how ARM accesses variables in memory through the load and store instructions (load-store architecture). You also learn how the fundamental concept of memory addresses maps to pointers in C, how to obtain an address of a variable and how to dereference a pointer.

I hope that this course will help you gain understanding of the ARM Cortex-M core, which will look really good on your resume.

This deeper understanding will allow you to use both the ARM processor and the C language more efficiently and with greater confidence. You will understand not just what your program does, but also how the C statements translate to machine instructions and how fast the processor can execute them.

I’d love to hear your comments about the course. Is there anything that you would like to see in the upcoming lessons? Do you see anything that you would teach differently? Or perhaps you have ideas for teaching specific subjects? Please share…

The Best Christmas Present for a Nerd

December 5th, 2012 by Miro Samek

Christmas is right around the corner, and if you are wondering about presents, I have just the idea for you. No, it is not the new iPad, Galaxy S3 phone, or any of the new “ultrabooks”. In fact, this is exactly the opposite. My present idea is to boost your productivity in creating “content”, not merely consuming it.

And when it comes to creating anything with a computer, you need a big screen, and the bigger the better. In fact, I’d recommend that you get yourself two new monitors. And don’t think small. How about two 27″, 1920x1080 (1080p) full-HD, LED-lit panels? You can get those for under $300 each, so a pair will still cost you less than a new iPad.

I got such a setup a few months ago, and now I’m absolutely convinced that this has been the best investment in my productivity, better than a faster CPU or a solid-state disk. I really can’t benefit from my machine being faster; that’s not what wastes my time. But I sure can use more screen, to read the documentation two pages at a time, and to see a complete IDE or a modeling tool on the other screen (modeling tools absolutely love big screens!).

The picture of my desk shows my setup. I have two 27″ HP 2711x 1080p monitors connected to an HP dv6 laptop. One monitor is connected via an HDMI cable and the other via an analog VGA cable. I don’t see any degradation in image quality on the VGA-driven monitor.

Dual Monitors

As you can see in the picture, I’ve placed my monitors on 6″ stands above my desk ($25 each). This is actually quite important, because too many people place their screens too low for comfortable work. (Using a laptop without a stand and an external keyboard is absolutely the worst!)

So, here it is: my Christmas present idea for a nerd. Write a letter to Santa about it, and maybe he will shove it down your chimney? (Only if you are good, that is!)

RTOS, TDD and the “O” in the S-O-L-I-D rules

June 11th, 2012 by Miro Samek

In Chapter 11 of the “Test-Driven Development for Embedded C” book, James Grenning discusses the S-O-L-I-D rules for effective software design. These rules have been compiled by Robert C. Martin and are intended to make a software system easier to develop, maintain, and extend over time. The acronym SOLID stands for the following five principles:

S: Single Responsibility Principle
O: Open-Closed Principle
L: Liskov Substitution Principle
I: Interface Segregation Principle
D: Dependency Inversion Principle

Out of all the SOLID design rules, the “O” rule (Open-Closed Principle) seems to me the most important for TDD, as well as the iterative and incremental development in general. If the system we design is “open for extension but closed for modification”, we can keep extending it without much re-work and re-testing of the previously developed and tested code. On the other hand, if the design requires constant re-visiting of what’s already been done and tested, we have to re-do both the code and the tests and essentially the whole iterative, TDD-based approach collapses. Please note that I don’t even mean here extensibility for the future versions of the system. I mean small, incremental extensions that we keep piling up every day to build the system in the first place.

So, here is my problem: RTOS-based designs are generally lousy when it comes to the Open-Closed Principle. The fundamental reason is that RTOS-based designs use blocking for everything, from waiting on a semaphore to timed delays. Blocked tasks are unresponsive for the duration of the blocking, and the whole intervening code is designed to handle the one event on which the task was waiting. For example, if a task blocks and waits for a button press, the code that follows the blocking call handles the button. So now it is hard to add a new event to this task, such as reception of a byte from a UART, both because of the timing (waiting on user input is too long and unpredictable) and because of the whole intervening code structure. In practice, people keep adding new tasks that can wait and block on new events, but this often violates the “S” rule (Single Responsibility Principle). Often, the added tasks have the same responsibility as the old tasks and a high degree of coupling with them. This coupling requires sharing resources (a nightmare in TDD) and even more blocking with mutexes, etc.

Compare this with the event-driven approach, in which the system processes events quickly without ever blocking. Extending such systems with new events is trivial and typically does not require re-doing existing event handlers. Therefore, such designs realize the Open-Closed Principle very naturally. You can also much more easily achieve the Single Responsibility Principle, because you can easily group related events in one cohesive design unit. This design unit (an active object) also becomes a natural unit for TDD.
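To illustrate the argument, here is a minimal sketch of an active object in plain C: private data, its own event queue, and a run-to-completion dispatch function with one short, non-blocking handler per event. This is not the QP framework or any other library, all names are hypothetical, and the queue is deliberately simplistic (in a real system, posting from an ISR would need a critical section around the queue update).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event signals. Adding an event means adding an enumerator
 * and a case below, without modifying the existing handlers. */
typedef enum { BUTTON_PRESSED_SIG, UART_RX_SIG, TIMEOUT_SIG } Signal;

typedef struct {
    Signal  sig;
    uint8_t data;  /* e.g. the received byte for UART_RX_SIG */
} Event;

#define QUEUE_LEN 8U

/* Minimal active object: private state plus its own event queue. */
typedef struct {
    Event    queue[QUEUE_LEN];
    unsigned head, tail, count;
    bool     led_on;     /* state used by the button and timeout handlers */
    uint8_t  last_byte;  /* state used by the UART handler */
} ActiveObject;

static void AO_post(ActiveObject *me, Event e)
{
    assert(me->count < QUEUE_LEN);  /* overflow treated as a design error */
    me->queue[me->head] = e;
    me->head = (me->head + 1U) % QUEUE_LEN;
    ++me->count;
}

/* Run-to-completion dispatch: each handler returns quickly, never blocks. */
static void AO_dispatch(ActiveObject *me, const Event *e)
{
    switch (e->sig) {
    case BUTTON_PRESSED_SIG: me->led_on = !me->led_on; break;
    case UART_RX_SIG:        me->last_byte = e->data;  break;
    case TIMEOUT_SIG:        me->led_on = false;       break;
    }
}

/* Drain the queue; returns the number of events processed. */
static unsigned AO_runPending(ActiveObject *me)
{
    unsigned n = 0U;
    while (me->count > 0U) {
        Event e = me->queue[me->tail];
        me->tail = (me->tail + 1U) % QUEUE_LEN;
        --me->count;
        AO_dispatch(me, &e);
        ++n;
    }
    return n;
}
```

In this sketch, the UART event was handled by adding one case, and the button handler was left untouched. Each handler is also a small unit that a test can drive by posting events and then inspecting the object's state.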

So, it seems to me that TDD should naturally favor event-driven approaches, such as active objects (actors), over traditional blocking RTOS.

I’m really curious about your thoughts about this, as it seems to me quite fundamental to the success of TDD. I’m looking forward to an interesting discussion.