Archive for March, 2010

How to Set the Size of your C Stack

Wednesday, March 24th, 2010 Michael Barr

A reader of my monthly Firmware Update newsletter recently sent an e-mail to ask:

I am a firmware engineer. I read your recent blog post regarding the C stack, about which I have two questions: First, how can I increment or decrement the size of the stack in my code? Second, what size should I choose?

Here’s what I told him:

The size of the stack is set either in the linker command file or in the C or C++ startup code. You should be able to learn more about how to change the stack size from your specific compiler vendor’s manual or customer support.

Identifying the minimum stack size required for your specific application is made challenging by these stubborn facts:

– MEASURING the maximum stack growth during testing may not be sufficient. If you test for half a year, the product is sure to be run for a year or longer in the field. Have you really tested all possible cases? What about all possible series of interrupt service routines on top of that worst case use by main()?

– TOP DOWN ANALYSIS of the compiled code can be done to determine the number of function calls and interrupt service routines at maximum depth; their individual parameter and local variable use, etc. Unfortunately, these things may keep changing whenever you change the code and recompile.

The best approach is usually to perform a conservative top down analysis of the source code; when in doubt, always round up. Don’t forget about nested interrupt service routines. Double that conservative to set your initial stack budget. Then measure actual stack utilization during testing, preferably with code coverage analysis tools running–to ensure that you’ve tested all possible paths (except interrupts, which may run at different times in the field).

Then if you need to reclaim memory to ship the product, start shrinking the stack. But also put into place a high water mark system (e.g., 0xDEADBEEF) complete with supervisor code to put the product into a failsafe state if more than, for example, 80% of the stack is ever used.

Toyota’s Embedded Software Image Problem

Friday, March 19th, 2010 Michael Barr

It remains unclear whether Toyota’s higher-than-industry-average number of complaints regarding sudden unintended acceleration (SUA) is caused (in whole or in part) by an embedded software problem. But whether it is or it isn’t actually firmware, the company has clearly denied it and yet still developed an embedded software “image problem”. They’ve brought some of this on themselves.

Side Note: I think it is a net positive that journalists, the mass media, and a broader swath of the general public are increasingly aware that there is software embedded inside cars, airplanes, medical devices, and just about everything else with a power supply or batteries. Firmware has been inside these products for many years, of course. But as I wrote in a recent article in Electronic Design, my experience working with companies across many industries lead me to believe there is a looming firmware quality crisis. Greater public awareness is sure to bring litigation. This will force engineering management to care more about firmware quality than they currently do.

Toyota’s Firmware Image Problem

Long before the “floor-mat recall” NHTSA had logged a higher number of unintended acceleration complaints (4.51 complaints per 100,000 cars sold for the 2005 to 2010 model years) for Toyota than any other company. (A recent Washington Post graphic has more data.) Apparently, NHTSA and Toyota were investigating the reports–but hadn’t yet taken any action.

It seems that what set that first Toyota recall in motion was a high-profile fatal August 2009 crash involving an off-duty California Highway Patrol office, his family, a runaway Lexus, and a disturbing 911 call,  Given the context of that specific crash, I’m not convinced the floor mat recall made much sense. In particular, I find it hard to believe that a police officer with adrenaline pumping through his veins and his family’s life on the line, wouldn’t just rip a stuck floor mat out of the way like the Incredible Hulk. (Or that he would choose running off the road at 125 mph vs. shutting the vehicle off entirely.)  But I don’t have all the facts about either that specific accident or the reasoning behind the floor mat recall.

The broader recalls that have happened since have focused on also adding mechanical strength to the accelerator pedals in a number of different makes and models. To this day, Toyota categorically denies any sort of electrical problem.  Yet some cars that have been modified in this way have since been reported to experience unintended acceleration!  Besides which, mechanical parts generally fail visibly or entirely once they first fail–rather than intermittently.  Intermittent failures are far more common with electronics (think EMI) and firmware.

Toyota’s firmware image problem stems from two things:  First, they have separately recalled the Prius for a braking-related firmware upgrade.  Other possible Prius software issues have been identified by Steve Wozniak and Jim Sikes, but these have not yet been confirmed.  Additionally, the continued reliance (by Toyota and NHTSA) on theories such as “we can’t reproduce the problem and we haven’t been able to see it during testing” as proof that there’s not a software bug is simply unbelievable.  

Anyone who works with software knows from experience that lots of bugs can’t be easily reproduced.  The fact that these incidents can’t be reproduced is not a proof of anything.

Software in Cars: The Future

Don’t get me wrong.  I want more software in my car not less.  I very much look forward to the day that an in-car computer takes over the driving for me.  After all, some cars already have more sensor data to make decisions on than the driver does.  Imagine what a car with an integrated GPS navigation system, auto-follow cruise control, and collision avoidance systems could do.  While I guess that I should move left one lane to avoid a crash, the computer is capable of seeing in all directions at once, calculating all of the trajectories of near-by cars, including instantaneous changes in their acceleration or deceleration.

Additionally, I suspect that even with bugs in a car’s drive-by-wire software the car may be much safer overall for its electronic traction control and anti-lock braking systems.

I just wish that Toyota would own up to the fact that the inability to reproduce a problem doesn’t rule out a software (or EMI) flaw.

Firmware-Specific Bug #5: Heap Fragmentation

Monday, March 15th, 2010 Michael Barr

Dynamic memory allocation is not widely used by embedded software developers—and for good reasons. One of those is the problem of fragmentation of the heap.

All data structures created via C’s malloc() standard library routine or C++’s new keyword live on the heap. The heap is a specific area in RAM of a pre-determined maximum size. Initially, each allocation from the heap reduces the amount of remaining “free” space by the same number of bytes. For example, the heap in a particular system might span 10 KB starting from address 0x20200000. An allocation of a pair of 4-KB data structures would leave 2 KB of free space.

The storage for data structures that are no longer needed can be returned to the heap by a call to free() or use of the delete keyword. In theory this makes that storage space available for reuse during subsequent allocations. But the order of allocations and deletions is generally at least pseudo-random—leading the heap to become a mess of smaller fragments.

To see how fragmentation can be a problem, consider what would happen if the first of the above 4 KB data structures is free. Now the heap consists of one 4-KB free chunk and another 2-KB free chunk; they are not adjacent and cannot be combined. So our heap is already fragmented. Despite 6 KB of total free space, allocations of more than 4 KB will fail.

Fragmentation is similar to entropy: both increase over time. In a long running system (i.e., most every embedded system ever created), fragmentation may eventually cause some allocation requests to fail. And what then? How should your firmware handle the case of a failed heap allocation request?

Best Practice: Avoiding all use of the heap may is a sure way of preventing this bug. But if dynamic memory allocation is either necessary or convenient in your system, there is an alternative way of structuring the heap that will prevent fragmentation. The key observation is that the problem is caused by variable sized requests.

If all of the requests were of the same size, then any free block is as good as any other—even if it happens not to be adjacent to any of the other free blocks. Thus it is possible to use multiple “heaps”—each for allocation requests of a specific size—can using a “memory pool” data structure.

If you like you can write your own fixed-sized memory pool API. You’ll just need three functions:

  • handle = pool_create(block_size, num_blocks) – to create a new pool (of size M chunks by N bytes);
  • p_block = pool_alloc(handle) – to allocate one chunk (from a specified pool); and
  • pool_free(handle, p_block).

But note that many real-time operating systems (RTOSes) feature a fixed-size memory pool API. If you have access to one of those, use it instead of the compiler’s malloc() and free() or your own implementation.

Firmware-Specific Bug #4

Firmware-Specific Bug #6

Firmware-Specific Bug #4: Stack Overflow

Thursday, March 11th, 2010 Michael Barr

Every programmer knows that a stack overflow is a Very Bad Thing™. The effect of each stack overflow varies, though. The nature of the damage and the timing of the misbehavior depend entirely on which data or instructions are clobbered and how they are used. Importantly, the length of time between a stack overflow and its negative effects on the system depends on how long it is before the clobbered bits are used.

Unfortunately, stack overflow afflicts embedded systems far more often than it does desktop computers. This is for several reasons, including:

  1. embedded systems usually have to get by on a smaller amount of RAM;
  2. there is typically no virtual memory to fall back on (because there is no disk);
  3. firmware designs based on RTOS tasks utilize multiple stacks (one per task), each of which must be sized sufficiently to ensure against unique worst-case stack depth;
  4. and interrupt handlers may try to use those same stacks.

Further complicating this issue, there is no amount of testing that can ensure that a particular stack is sufficiently large. You can test your system under all sorts of loading conditions but you can only test it for so long. A stack overflow that only occurs “once in a blue moon” may not be witnessed by tests that run for only “half a blue moon.” Demonstrating that a stack overflow will never occur can, under algorithmic limitations (such as no recursion), be done with a top down analysis of the control flow of the code. But a top down analysis will need to be redone every time the code is changed.

Best Practice: On startup, paint an unlikely memory pattern throughout the stack(s). (I like to use hex 23 3D 3D 23, which looks like a fence ‘#==#’ in an ASCII memory dump.) At runtime, have a supervisor task periodically check that none of the paint above some pre-established high water mark has been changed. If something is found to be amiss with a stack, log the specific error (e.g., which stack and how high the flood) in non-volatile memory and do something safe for users of the product (e.g., controlled shut down or reset) before a true overflow can occur. This is a nice additional safety feature to add to the watchdog task.

Firmware-Specific Bug #3

Firmware-Specific Bug #5 (coming soon)

Embedded Gurus – Site Redesign

Tuesday, March 2nd, 2010 Michael Barr

I am pleased to announce that the EmbeddedGurus website has been redesigned. Among the new features of the site are:

1. A dynamically updating home page, featuring the most recent posts from all of our bloggers. If you prefer, you may view these posts by category.

2. A common look and feel to all of the individual blogs.

3. The ability to search individual blogs, as well as to easily browse from one post to the next and via tags and categories.

4. A sixth guru named Gary Stringham with a blog called Embedded Bridge.

A number of other minor improvements have also been made.

We hope you like the new look and continue to find our blogs about embedded systems design both readable and informative.