embedded software boot camp

Economics 101: UML in Embedded Systems

April 17th, 2012 by Miro Samek

With UML, just as with anything else in the embedded space, the ultimate criterion for success is the return on investment (ROI). Sure, there are many factors at play, such as the “coolness factor”, the yearning for a “silver bullet”, and truly “automatic programming”, all fueled by the aggressive marketing rhetoric of tool vendors. But ultimately, to be successful, the benefits of a method must outweigh the learning curve, the cost of tools, the added maintenance costs, the hidden costs of “fighting the tool” and so on.

As it turns out, the ROI of UML is lousy unless the models are used to generate substantial portions of the production code. Without code generation, the models inevitably fall behind and become more of a liability than an asset. In this respect I tend to agree with the “UML Modeling Maturity Index (UMMI)”, invented by Bruce Douglass. According to the UMMI, without code generation UML can reach at most 30% of its potential, and this is assuming correct use of behavioral modeling. Without it, the benefits are below 10%. This is just too low to outweigh all the costs.

Unfortunately, code generation capabilities have always been associated with complex, expensive UML tools with a very steep learning curve and a price tag to match. With such a big investment side of the ROI equation, it’s quite difficult to reach a sufficient return. Consequently, all too often big tools get abandoned, and if they continue to be used at all, they end up as overpriced drawing packages.

So, to follow my purely economic argument, unless we make the investment part of the ROI equation low enough, without reducing the returns too much, UML has no chance. On the other hand, if we could achieve positive ROI (something like 80% of benefits for 10% of the cost), we would have a “game changer”.

To this end, when you look closer, the biggest “bang for the buck” in UML with respect to embedded code generation comes from two ingredients: (1) hierarchical state machines (UML statecharts) and (2) an embedded real-time framework. Of course, these two ingredients work best together and complement each other. State machines can’t operate in a vacuum and need an event-driven framework to provide execution context, thread-safe event passing, event queueing, etc. The framework, in turn, benefits from state machines for structure and code generation capabilities.

I’m not sure if many people realize the critical importance of a framework, but a good framework is in many ways even more valuable than the tool itself, because the framework is the big enabler of architectural reuse, testability, traceability, and code generation, to name just a few. The second component is state machines, but again I’m not sure if everybody realizes the importance of state nesting. Without support for state hierarchy, traditional “flat” state machines suffer from the phenomenon known as “state-transition explosion”, which renders them unusable for real-life problems.
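To make the point about state nesting concrete, here is a minimal sketch in C. It is not the QP API; all names (State, hsm_dispatch, the st_* states and the SIG_* signals) are made up for illustration, and entry/exit actions are left out. The "operational" superstate handles SIG_POWER_OFF once, and both of its substates inherit that transition. A flat state machine would have to repeat it in every state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signals for a small device. */
typedef enum { SIG_POWER_ON, SIG_START, SIG_DONE, SIG_POWER_OFF } Signal;

typedef struct State State;
struct State {
    const State *super;  /* enclosing (parent) state, NULL at the top level */
    /* Returns true if this state handles sig; may redirect *target. */
    bool (*handle)(Signal sig, const State **target);
};

extern const State st_off, st_operational, st_idle, st_busy;

static bool off_handle(Signal sig, const State **target) {
    if (sig == SIG_POWER_ON) { *target = &st_idle; return true; }
    return false;
}

/* Superstate: POWER_OFF is written once here and shared by all substates. */
static bool operational_handle(Signal sig, const State **target) {
    if (sig == SIG_POWER_OFF) { *target = &st_off; return true; }
    return false;
}

static bool idle_handle(Signal sig, const State **target) {
    if (sig == SIG_START) { *target = &st_busy; return true; }
    return false;
}

static bool busy_handle(Signal sig, const State **target) {
    if (sig == SIG_DONE) { *target = &st_idle; return true; }
    return false;
}

const State st_off         = { NULL,            off_handle };
const State st_operational = { NULL,            operational_handle };
const State st_idle        = { &st_operational, idle_handle };
const State st_busy        = { &st_operational, busy_handle };

/* Offer the event to the current state first, then to each enclosing
 * superstate in turn. Returns the new current state; an event that no
 * state handles is ignored and the current state is returned unchanged. */
const State *hsm_dispatch(const State *current, Signal sig) {
    assert(current != NULL);
    for (const State *s = current; s != NULL; s = s->super) {
        const State *target = current;
        if (s->handle(sig, &target)) {
            return target;
        }
    }
    return current;
}
```

Adding a third substate of st_operational would cost one handler and one table entry; the power-off transition would come along for free.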

As it turns out, the two critical ingredients for code generation can be had with a much lower investment than traditionally thought. An event-driven, real-time framework need be no more complex than a traditional bare-bones RTOS (e.g., see the family of the open source QP frameworks). A UML modeling tool for creating hierarchical state machines and generating production code can be free and can be designed to minimize the problem of “fighting the tool” (e.g., see QM). Sure, you don’t get all the bells and whistles of IBM Rhapsody, but you get what are arguably the most valuable ingredients. More importantly, you have a chance to achieve a positive ROI on your first project. As I said, this to me is game changing.

Can a lightweight framework like QP and the QM modeling tool scale to really big projects? Well, I’ve seen them used by big, distributed teams on projects tens of KLOC in size, and I haven’t seen any signs of over-stressing the architecture or the tool.

Turning automatic code generation upside down

February 14th, 2012 by Miro Samek

Much ink has been spilled on the Next Big Thing in software development. One of these things has always been “automatic code generation” from high-level models (e.g., from state machines).

But even though many tools on the market today support code generation, their acceptance has grown rather slowly. Of course, many factors contribute to this, but one of the main reasons is that the generated code simply has too many shortcomings, which too often require manual “massaging” of the generated code. But this breaks the connection with the original model. The tool industry’s answer has been “round-trip engineering”, which is the idea of feeding the changes in the code back to the model.

Unfortunately, “round-trip engineering” simply does not work well enough in practice. This should not be so surprising, considering that no other code generation in software history has ever worked that way. You don’t edit by hand the binary machine code generated by an assembler. You don’t edit by hand the assembly code generated by the high-level language compiler. This would be ridiculous. So, why do modeling tools assume that the generated code will be edited manually?

Well, the modeling tools have to assume this, because the generated code is hard to use “as-is” without modifications.

First, the generated code might be simply incomplete, such as skeleton code with “TODO” comments generated from class diagrams. I’m not a fan of this, because I think that in the long run such code generation is outright counterproductive.

Second, most code generating tools impose a specific physical design (by physical design I mean the partitioning of the code into directories and files, such as header files and implementation files). For example, for generation of C/C++ code (which dominates real-time embedded programming), the beaten path is to generate <class>.h and <class>.cpp files for every class. But what if I want to put the class declaration at file scope in a .cpp file and not generate the <class>.h file at all? Actually, I often want to do this to achieve even better encapsulation. A typical tool would not allow me to do this.

And finally, all too often the automatically generated code is hard to integrate with other code not created by the tool. For example, a class definition might rely on many included header files. But while most tools recognize that and allow inserting some custom code at the beginning of the file, they don’t allow inserting code at an arbitrary place in the file.

But, how about a tool that actually allows you to do your own physical design? How about turning the whole code generation process upside down?

A tool like this would allow you to create and name directories and files instead of the tool imposing it on you. Obviously, this is still manual coding. But, the twist here is that in this code you can “ask” the tool to synthesize parts of the code based on the model. (The “requests” are special tags that you mix in your code.) For example, you can “ask” the tool to generate a class declaration in one place, a class method definition in another, and a state machine definition in yet another place in your code.

This “inversion” of code generation responsibilities solves most of the problems with the integration between the generated code and other code. You can simply “ask” the tool to generate as much or as little code as you see fit. The tool helps, where it can add value, but otherwise you can keep it out of your way.
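The inversion can be illustrated with a toy expander written in C. This is only a sketch of the idea and not how QM works internally: the "@gen(...)" request syntax, the model table, and the function names are all invented for this example. The hand-written template owns the file layout, and the tool fills in only the places where it is asked to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "model": named elements and the code generated for each. */
typedef struct { const char *name; const char *code; } ModelElement;

static const ModelElement model[] = {
    { "Blinky.declaration", "typedef struct { int led_on; } Blinky;" },
    { "Blinky.toggle",
      "void Blinky_toggle(Blinky *me) { me->led_on = !me->led_on; }" },
};

/* Find the generated code for a model element name of the given length. */
static const char *lookup(const char *name, size_t len) {
    for (size_t i = 0; i < sizeof model / sizeof model[0]; ++i) {
        if (strlen(model[i].name) == len
            && strncmp(model[i].name, name, len) == 0) {
            return model[i].code;
        }
    }
    return NULL;
}

/* Copy the hand-written template to out, replacing every "@gen(<name>)"
 * request with the code generated for <name>. Returns 0 on success and
 * -1 on an unknown element, an unterminated request, or lack of space. */
int expand_template(const char *tmpl, char *out, size_t out_size) {
    static const char open[] = "@gen(";
    size_t used = 0;

    assert(tmpl != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    while (*tmpl != '\0') {
        const char *chunk = tmpl;  /* default: copy one character as-is */
        size_t len = 1;
        if (strncmp(tmpl, open, sizeof open - 1) == 0) {
            const char *name = tmpl + (sizeof open - 1);
            const char *close = strchr(name, ')');
            if (close == NULL) {
                return -1;
            }
            chunk = lookup(name, (size_t)(close - name));
            if (chunk == NULL) {
                return -1;
            }
            len = strlen(chunk);
            tmpl = close + 1;
        } else {
            ++tmpl;
        }
        if (used + len >= out_size) {
            return -1;  /* keep room for the terminating '\0' */
        }
        memcpy(out + used, chunk, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

A template such as "#include \"bsp.h\"\n@gen(Blinky.declaration)\n" would come out with the include line untouched and only the request replaced, which is the whole point: everything outside the requests stays under the programmer's control.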

The idea of “inverting” the code generation is so simple, that I would be surprised if it was not already implemented in some tools. One example I have is the free QM tool from my company (http://www.state-machine.com/qm). If you know of any other tool that works that way, I would be very interested to hear about it.

Online Embedded Software Store: a good idea?

February 11th, 2012 by Miro Samek

Have you visited the new online Embedded Software Store (embeddedsoftwarestore.com) operated by Avnet and ARM? Did you buy anything there? What do you think?

Well, I visited the website, but frankly, I wouldn’t be comfortable buying software there.

For example, suppose you are interested in operating systems. That’s easy enough, because on the home page Embeddedsoftwarestore.com lists “New Products” in this category. Yesterday they listed uC/OS-II and CMS-RX RTOS. I clicked on uC/OS-II, which brought me to the product page for “uC/OS-II on the TI LM3S9Bxx – Product Line” by Micrium for $40,982.14. There is really not much of a product description, except for the “Product License”, which is a click-through EULA (End User License Agreement). Otherwise you can just add the product to the shopping cart and head to check-out. Before you pay, you are presented with an order summary, which lists the product’s RoHS status, the packaging, and other equally “useful” information for software. You are also reminded that you are responsible for Duties and Taxes.

But, wait a minute. What are you buying here? First, you are not really buying the software, because you most likely already have it. It is available for a free download from Micrium (see http://micrium.com/page/downloads/source_code). So, you obviously don’t care about “shipping”. Rather, you buy the rights to use the software in your Product Line. But then the click-through EULA makes no sense. It has no binding signature of the vendor and it has no Product Line definition.

If I were really to spend $40,000 on legal rights, I would accept nothing less than a contract signed personally by an officer of Micrium. A click-through contract is good, perhaps, for buying a 99-cent song online, but then you actually get the song. Here, you are about to spend $40K, which is like buying two cars with a mouse click, and you don’t get anything.

Well, perhaps I’m missing something here, but it seems to me that software is a somewhat more complex product than chips and boards. What do you think of this business model?

What’s the state of your Cortex?

September 26th, 2011 by Miro Samek

Recently, I’ve been involved in a fascinating bug hunt related to a very peculiar behavior of the ARM Cortex-M3 core. Given the incredible popularity of this core, I thought that digging a little deeper into the mysteries of ARM Cortex could be interesting and informative.

First, I need to provide some background. The bug was related to the ARM Cortex-M exception type called PendSV, which is unique to this architecture. This is an exception triggered by software, but unlike any regular software interrupt, PendSV is an asynchronous exception. This means that PendSV typically does not run immediately after it is triggered, but only after the Nested Vectored Interrupt Controller (NVIC) determines that the priority of the currently executing code has dropped below the priority associated with PendSV.

At this point, you might wonder why and where such a “Pended Software Interrupt” would be useful. Well, it turns out that PendSV is the only reliable way on ARM Cortex-M to find out when all (possibly nested) interrupt service routines (ISRs) have completed. And this determination is essential to run the scheduler in any preemptive real-time kernel.

Virtually all preemptive RTOSes for ARM Cortex-M processors work as follows. Upon initialization the priority associated with PendSV is set to be the lowest of all exceptions (0xFF). All ISRs in the system, prioritized above PendSV, trigger the PendSV exception by writing 1 to the PENDSVSET bit in the NVIC ICSR register, like this:

*((uint32_t volatile *)0xE000ED04) = 0x10000000;

Now, the heavy lifting is left entirely to the NVIC hardware. NVIC will activate PendSV only after the last of all nested interrupts completes and is about to return to the preempted task context. This is exactly the right time for a context switch. In other words, the PendSV exception is designed to call the scheduler and perform the task preemption. ARM Cortex is so smart that it eliminates the overhead of exiting one exception (the last nested interrupt) and activating another (the PendSV) in the trick called “tail-chaining”.
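The setup described above can be sketched in C as follows. This is a minimal illustration, not code from any particular kernel: the function and macro names are made up, and the register is passed in as a pointer so that the same code can be exercised on a desktop against a plain variable. The addresses and bit positions (PendSV priority in bits 23:16 of SHPR3 at 0xE000ED20, PENDSVSET as bit 28 of ICSR at 0xE000ED04) are the ones documented for ARMv7-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register addresses on the real target (ARMv7-M System Control Block). */
#define ICSR_ADDR   UINT32_C(0xE000ED04)
#define SHPR3_ADDR  UINT32_C(0xE000ED20)

#define ICSR_PENDSVSET     (UINT32_C(1) << 28)     /* write 1 to pend PendSV */
#define SHPR3_PENDSV_MASK  (UINT32_C(0xFF) << 16)  /* PendSV priority field  */

/* Give PendSV the lowest urgency by setting its priority field to 0xFF.
 * Real devices implement only the upper bits of the field, so the value
 * read back on hardware may be, for example, 0xE0 instead of 0xFF. */
void pendsv_set_lowest_priority(volatile uint32_t *shpr3) {
    assert(shpr3 != NULL);
    *shpr3 |= SHPR3_PENDSV_MASK;
}

/* Pend the PendSV exception; meant to be called at the end of every ISR.
 * A plain write is used on purpose: the set/clear bits in ICSR act on
 * writing 1, so a read-modify-write is neither needed nor desirable. */
void pendsv_trigger(volatile uint32_t *icsr) {
    assert(icsr != NULL);
    *icsr = ICSR_PENDSVSET;
}
```

On the target, the calls would look like pendsv_set_lowest_priority((volatile uint32_t *)SHPR3_ADDR) during initialization and pendsv_trigger((volatile uint32_t *)ICSR_ADDR) at the end of each ISR.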

Everything looks easy so far, but ARM Cortex has one more trick up its sleeve, and this optimization, called “late-arrival”, has interesting side effects related to PendSV. This subtle interaction between PendSV and late-arrival leads essentially to a hardware race condition that I’ve recently had the pleasure of chasing down.

To illustrate the events that lead up to the bug, I’ve prepared a distilled hardware trace available for viewing at ARM-Cortex-M3_bug.txt. Please go ahead and click on this link to follow along.

The trace starts with an interrupt entry (labelled as Exception 83). This system runs under the preemptive kernel called QK, so the ISR calls QK_ISR_ENTRY() and later QK_ISR_EXIT() macros to inform the kernel about the interrupt. At trace index 069545 the QK_ISR_EXIT() macro triggers the PendSV exception by writing 0x10000000 into the ICSR register.

After this, the Exception 83 runs to completion and eventually tail-chains to Exception 14 (PendSV). This is all as expected.

However, the real problem starts at trace index 069618, at which the execution of the first instruction of PendSV (CPSID i) is cancelled due to arrival of a higher-priority Exception 36 (another interrupt).

This cancellation of the low-priority Exception 14 in favor of the higher-priority Exception 36 is the ARM Cortex specialty called late arrival. The ARM core optimizes the interrupt entry (which is identical for all exceptions), and instead of entering the low-priority exception and then immediately the high-priority exception, it simply enters the high-priority exception.

The problem is that just before the late arrival, the PENDSVSET bit in the NVIC-ICSR register is already cleared.

However, the late-arriving Exception 36 sets this bit again in QK_ISR_EXIT(), which is normal for any interrupt (trace index 070126).

The Exception 36 eventually exits to the original PendSV (trace index 070130), but this is not the usual tail-chaining (the trace indicates tail-chaining by the pair Exception Exit/Exception Entry). This time around the trace shows only Exception Exit, but no entry.

This difference has a very important implication: the PENDSVSET bit in the NVIC-ICSR register is not cleared (remember that Exception 36 has just set it again).

What unfolds next is the consequence of the PENDSVSET bit being set. PendSV executes, fakes its own return to the QK scheduler, and eventually it unlocks interrupts. But before SVCall (Exception 11) can execute, the PendSV Exception 14 is taken again (because it is triggered by the PENDSVSET bit). This makes no sense and should never happen, because PendSV should never be in the triggered state at this point.

***
So, what are the consequences of this behavior and what is the fix?

Well, as you can see, due to late-arrival PendSV can occasionally be entered with the PENDSVSET bit still set, so it will be triggered again immediately after it completes. This might or might not have adverse consequences. In the case of the QK kernel, this was unacceptable and led to a Hardware Fault. In other RTOSes it might simply cause another scheduler call, a waste of some CPU cycles, and a delay in the task-level response, but perhaps not a catastrophic failure.

The actual fix of the problem is very simple. Since you cannot rely on the automatic clearing of the PENDSVSET bit in the NVIC-ICSR register, you need to clear it manually by writing 1 to the PENDSVCLR bit in the same register. Of course this is wasteful, because only one time in a million is this bit actually not cleared automatically.
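A minimal sketch of this fix in C is shown below. The function name is made up, and the register is again passed in as a pointer so the code can be checked on a desktop; PENDSVCLR is bit 27 of ICSR (0xE000ED04) on ARMv7-M. Placing the call at the very beginning of the PendSV handler, before the scheduler runs, is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICSR_ADDR       UINT32_C(0xE000ED04)
#define ICSR_PENDSVCLR  (UINT32_C(1) << 27)  /* write 1 to un-pend PendSV */

/* Explicitly remove the pending state of PendSV. This covers the rare
 * late-arrival case in which the hardware leaves PENDSVSET set even though
 * PendSV is already executing. The write is a plain store: reading ICSR
 * and writing the value back could write a 1 to PENDSVSET and pend the
 * exception all over again. */
void pendsv_clear_pending(volatile uint32_t *icsr) {
    assert(icsr != NULL);
    *icsr = ICSR_PENDSVCLR;
}
```

On the target the call would be pendsv_clear_pending((volatile uint32_t *)ICSR_ADDR). The cost is a single store per PendSV entry, which is small compared to the context switch that follows.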

Interestingly, I have not seen such a write to the PENDSVCLR bit in open source RTOSes for ARM Cortex-M (such as FreeRTOS.org). Recently, I’ve come across some posts on the ARM Community Forums reporting that this problem exists for the Freescale MQX RTOS (see “PendSV pending inside PendSV handler? (Cortex-M4)”).

If you use a preemptive kernel on ARM Cortex-M0/M3, perhaps you could check how your kernel handles PendSV. If you don’t see an explicit write to the PENDSVCLR bit, I would recommend that you think through the consequences of re-entering PendSV. I’d be very interested to collect a survey of how the existing kernels for ARM Cortex-M handle this situation.

On the Origin of Software by Means of Artificial Selection

August 5th, 2011 by Miro Samek

If you haven’t gotten your hands on James Grenning’s recent book “Test-Driven Development for Embedded C” yet, I highly recommend you do. Here is why.

First, you need to realize that this book is not really about testing; it is about software development. The central idea behind TDD (Test-Driven Development) is that software, like any complex system in nature, has to evolve gradually and has to keep working throughout all the development stages. This idea is of course not new and goes all the way back to Darwin’s “On the Origin of Species”. More recently, in his 1977 book “Systemantics: How Systems Work and Especially How They Fail”, John Gall wrote:

“A complex system that works is invariably found to have evolved from a simple system that worked…. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system”.

The key point of TDD is to subject the software to constant “struggle for existence” to actually see if it indeed is still working and in the process weed out any undesired mutations.

Of course, in developing software we don’t have the deep evolutionary time, so we need to accelerate the pace of software evolution. We do this by automating the testing.

For embedded development this means avoiding the target system bottleneck (James calls it DOH-Development On Hardware). The embedded TDD strategy is to develop embedded software on the desktop and only occasionally check it on the real embedded hardware. This means that the C/C++ compilers and tools for the desktop (such as Visual C++, MinGW, or Cygwin for Windows and GCC for Linux and Mac OS X) are important for us.
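As a rough illustration of what developing on the desktop means, here is a small LED driver in C that is tested against an ordinary variable standing in for the memory-mapped LED register. This sketch uses plain assert() rather than a test framework, and every name in it (blinky_init, blinky_on, blinky_off, the test functions) is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The driver never hard-codes the register address; it is injected, so a
 * test can substitute a plain variable for the real hardware register. */
static volatile uint16_t *led_port;

void blinky_init(volatile uint16_t *port) {
    assert(port != NULL);
    led_port = port;
    *led_port = 0u;                      /* all LEDs off after init */
}

void blinky_on(unsigned led) {           /* led is a bit number, 0..15 */
    assert(led < 16u);
    *led_port = (uint16_t)(*led_port | (1u << led));
}

void blinky_off(unsigned led) {
    assert(led < 16u);
    *led_port = (uint16_t)(*led_port & ~(1u << led));
}

/* Desktop tests: each one starts from a deliberately "dirty" register. */
void test_all_leds_off_after_init(void) {
    uint16_t virtual_leds = 0xFFFFu;
    blinky_init(&virtual_leds);
    assert(virtual_leds == 0u);
}

void test_on_and_off_touch_only_one_bit(void) {
    uint16_t virtual_leds = 0xFFFFu;
    blinky_init(&virtual_leds);
    blinky_on(0u);
    blinky_on(8u);
    assert(virtual_leds == 0x0101u);
    blinky_off(0u);
    assert(virtual_leds == 0x0100u);
}
```

Calling the two test functions from main() on the desktop takes a fraction of a second, compared to a build, flash, and manual check on the target. On the real hardware, blinky_init() would instead be given the address of the actual LED register.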

The book comes with testing frameworks (Unity and CppUTest) and plenty of example code. The code works right out of the box on Linux, but I had some issues running it on Windows. In the process of learning the tools, I’ve prepared a small template for Visual C++ 2008, which is available for download from:

http://www.state-machine.com/attachments/blinky_tdd.zip

This demo assumes that you download and install the CppUTest framework (http://sourceforge.net/projects/cpputest/) and that you define the environment variable CPP_U_TEST to point to the directory where you installed CppUTest. The Visual Studio solution AllTest.sln is located in the blinky\tests directory.

I’d love to hear about your experiences with TDD in embedded programming. I’m sure I will blog more about it in the future.