Posts Tagged ‘embedded’

Toyota’s Embedded Software Image Problem

Friday, March 19th, 2010 Michael Barr

It remains unclear whether Toyota’s higher-than-industry-average number of complaints regarding sudden unintended acceleration (SUA) is caused (in whole or in part) by an embedded software problem. But whether it is or it isn’t actually firmware, the company has clearly denied it and yet still developed an embedded software “image problem”. They’ve brought some of this on themselves.

Side Note: I think it is a net positive that journalists, the mass media, and a broader swath of the general public are increasingly aware that there is software embedded inside cars, airplanes, medical devices, and just about everything else with a power supply or batteries. Firmware has been inside these products for many years, of course. But as I wrote in a recent article in Electronic Design, my experience working with companies across many industries lead me to believe there is a looming firmware quality crisis. Greater public awareness is sure to bring litigation. This will force engineering management to care more about firmware quality than they currently do.

Toyota’s Firmware Image Problem

Long before the “floor-mat recall” NHTSA had logged a higher number of unintended acceleration complaints (4.51 complaints per 100,000 cars sold for the 2005 to 2010 model years) for Toyota than any other company. (A recent Washington Post graphic has more data.) Apparently, NHTSA and Toyota were investigating the reports–but hadn’t yet taken any action.

It seems that what set that first Toyota recall in motion was a high-profile fatal August 2009 crash involving an off-duty California Highway Patrol office, his family, a runaway Lexus, and a disturbing 911 call,  Given the context of that specific crash, I’m not convinced the floor mat recall made much sense. In particular, I find it hard to believe that a police officer with adrenaline pumping through his veins and his family’s life on the line, wouldn’t just rip a stuck floor mat out of the way like the Incredible Hulk. (Or that he would choose running off the road at 125 mph vs. shutting the vehicle off entirely.)  But I don’t have all the facts about either that specific accident or the reasoning behind the floor mat recall.

The broader recalls that have happened since have focused on also adding mechanical strength to the accelerator pedals in a number of different makes and models. To this day, Toyota categorically denies any sort of electrical problem.  Yet some cars that have been modified in this way have since been reported to experience unintended acceleration!  Besides which, mechanical parts generally fail visibly or entirely once they first fail–rather than intermittently.  Intermittent failures are far more common with electronics (think EMI) and firmware.

Toyota’s firmware image problem stems from two things:  First, they have separately recalled the Prius for a braking-related firmware upgrade.  Other possible Prius software issues have been identified by Steve Wozniak and Jim Sikes, but these have not yet been confirmed.  Additionally, the continued reliance (by Toyota and NHTSA) on theories such as “we can’t reproduce the problem and we haven’t been able to see it during testing” as proof that there’s not a software bug is simply unbelievable.  

Anyone who works with software knows from experience that lots of bugs can’t be easily reproduced.  The fact that these incidents can’t be reproduced is not a proof of anything.

Software in Cars: The Future

Don’t get me wrong.  I want more software in my car not less.  I very much look forward to the day that an in-car computer takes over the driving for me.  After all, some cars already have more sensor data to make decisions on than the driver does.  Imagine what a car with an integrated GPS navigation system, auto-follow cruise control, and collision avoidance systems could do.  While I guess that I should move left one lane to avoid a crash, the computer is capable of seeing in all directions at once, calculating all of the trajectories of near-by cars, including instantaneous changes in their acceleration or deceleration.

Additionally, I suspect that even with bugs in a car’s drive-by-wire software the car may be much safer overall for its electronic traction control and anti-lock braking systems.

I just wish that Toyota would own up to the fact that the inability to reproduce a problem doesn’t rule out a software (or EMI) flaw.

Firmware-Specific Bug #4: Stack Overflow

Thursday, March 11th, 2010 Michael Barr

Every programmer knows that a stack overflow is a Very Bad Thing™. The effect of each stack overflow varies, though. The nature of the damage and the timing of the misbehavior depend entirely on which data or instructions are clobbered and how they are used. Importantly, the length of time between a stack overflow and its negative effects on the system depends on how long it is before the clobbered bits are used.

Unfortunately, stack overflow afflicts embedded systems far more often than it does desktop computers. This is for several reasons, including:

  1. embedded systems usually have to get by on a smaller amount of RAM;
  2. there is typically no virtual memory to fall back on (because there is no disk);
  3. firmware designs based on RTOS tasks utilize multiple stacks (one per task), each of which must be sized sufficiently to ensure against unique worst-case stack depth;
  4. and interrupt handlers may try to use those same stacks.

Further complicating this issue, there is no amount of testing that can ensure that a particular stack is sufficiently large. You can test your system under all sorts of loading conditions but you can only test it for so long. A stack overflow that only occurs “once in a blue moon” may not be witnessed by tests that run for only “half a blue moon.” Demonstrating that a stack overflow will never occur can, under algorithmic limitations (such as no recursion), be done with a top down analysis of the control flow of the code. But a top down analysis will need to be redone every time the code is changed.

Best Practice: On startup, paint an unlikely memory pattern throughout the stack(s). (I like to use hex 23 3D 3D 23, which looks like a fence ‘#==#’ in an ASCII memory dump.) At runtime, have a supervisor task periodically check that none of the paint above some pre-established high water mark has been changed. If something is found to be amiss with a stack, log the specific error (e.g., which stack and how high the flood) in non-volatile memory and do something safe for users of the product (e.g., controlled shut down or reset) before a true overflow can occur. This is a nice additional safety feature to add to the watchdog task.

Firmware-Specific Bug #3

Firmware-Specific Bug #5 (coming soon)

Firmware-Specific Bug #3: Missing Volatile Keyword

Thursday, February 18th, 2010 Michael Barr

Failure to tag certain types of variables with C’s ‘volatile’ keyword, can cause a number of symptoms in a system that works properly only when the compiler’s optimizer is set to a low level or disabled. The volatile qualifier is used during variable declarations, where its purpose is to prevent optimization of the reads and writes of that variable.

For example, if you write code that says:


    g_alarm = ALARM_ON;    // Patient dying--get nurse!
    // Other code; with no reads of g_alarm state.
    g_alarm = ALARM_OFF;   // Patient stable.

the optimizer will generally try to make your program both faster and smaller by eliminating the first line above–to the detriment of the patient. However, if g_alarm is declared as volatile this optimization will not take place.

Best Practice: The ‘volatile’ keyword should be used to declare any: (a) global variable shared by an ISR and any other code; (b) global variable accessed by two or more RTOS tasks (even when race conditions in those accesses have been prevented); (c) pointer to a memory-mapped peripheral register (or register set); or (d) delay loop counter.

Note that in addition to ensuring all reads and writes take place for a given variable, the use of volatile also constrains the compiler by adding additional “sequence points”. Accesses to multiple volatiles must be executed in the order they are written in the code.

Firmware-Specific Bug #2

Firmware-Specific Bug #4

Embedded Software is the Future of Product Quality and Safety

Monday, February 8th, 2010 Michael Barr

Last year a friend had a St. Jude pacemaker attached to his heart. When he reported an unexpected low battery reading (displayed on an associated digital watch) to his doctor a month later, he learned this was the result of a firmware bug known to the manufacturer. The battery was fine and would last on the order of a decade more. His new-model pacemaker’s firmware didn’t include a bug fix that was provided the year before to wearers of old-model.

Another friend owns a Land Rover LR2 SUV with back-up sensors. When the car is in reverse and nearing an obstacle or another car, the driver is alerted via a beeping sound. Except that the back-up sensors don’t always work. Some “reboots” of the SUV don’t seem to have this feature enabled. He suspects there is a “race condition” during the software startup sequence.

Yet another friend has driven a Toyota Prius hybrid over 100,000 miles. He reports that the brakes very occasionally have an odd/different feel. But his older model Prius is not expected to be subject to the 2010 model year recall.

These are just a few of the personal anecdotes behind the headlines. Embedded software is everywhere now, with over 4 billion new devices manufactured each year. Increasingly the quality and safety of products is a side-effect of the quality and safety of the software embedded inside.

One important question is, can we trust future software updates any more than we can trust the existing firmware? How do we know that the Toyota Prius hybrids with upgraded braking firmware will be safer than those with the factory firmware?

Is Toyota’s Accelerator Problem Caused by Embedded Software Bugs?

Thursday, January 28th, 2010 Michael Barr

Last month I received an interesting e-mail in response to a column I wrote for Embedded Systems Design called The Lawyers are Coming! My column was partly about the poor state of embedded software quality across all industries, and my correspondent was writing to say my observations were accurate from his perch within the automotive industry. Included in his e-mail was this interesting tidbit:

I read something about the big Toyota recall being related to floor mats interfering with the accelerator, but I was told that the problem appears to be software (firmware) for the control-by-wire pedal.  Me thinks somebody probably forgot to check ranges, overflows, or stability properly when implementing the “algorithm”.

As background for those of you who have been working in SCIFs or other labs, the “big Toyota recall” was first announced in September 2009. It was said to concern removable floor mats causing the accelerator pedal to be pressed down. Some 3.8 million Toyota and Lexus vehicles were involved and owners were told to remove floor mats immediately.

This week several related major news events have transpired, including:

But none of the articles I’ve read have talked about software being a cause. And it’s not clear if the affected models are drive-by-wire. However, at least one article I read yesterday suggested that one fix being worked on is a software interlock to ensure that if both the brake and the gas pedal are depressed, the brake will override the accelerator. On the one hand, that seems to mean that software is already in the middle; on the other, I would be extremely surprised to learn that such an interlock wasn’t already present in a drive-by-wire system.

So what’s the story? Are embedded software bugs to blame for this massive recall? Do you know? Have you found any helpful articles pointing at software problems? Please share what you know in the comments below, or e-mail me privately.