Toyota’s Embedded Software Image Problem

Friday, March 19th, 2010 by Michael Barr

It remains unclear whether Toyota’s higher-than-industry-average number of complaints regarding sudden unintended acceleration (SUA) is caused (in whole or in part) by an embedded software problem. But whether it is or it isn’t actually firmware, the company has clearly denied it and yet still developed an embedded software “image problem”. They’ve brought some of this on themselves.

Side Note: I think it is a net positive that journalists, the mass media, and a broader swath of the general public are increasingly aware that there is software embedded inside cars, airplanes, medical devices, and just about everything else with a power supply or batteries. Firmware has been inside these products for many years, of course. But as I wrote in a recent article in Electronic Design, my experience working with companies across many industries leads me to believe there is a looming firmware quality crisis. Greater public awareness is sure to bring litigation. This will force engineering management to care more about firmware quality than they currently do.

Toyota’s Firmware Image Problem

Long before the “floor-mat recall”, NHTSA had logged a higher rate of unintended acceleration complaints (4.51 complaints per 100,000 cars sold for the 2005 to 2010 model years) for Toyota than for any other company. (A recent Washington Post graphic has more data.) Apparently, NHTSA and Toyota were investigating the reports but hadn’t yet taken any action.

It seems that what set that first Toyota recall in motion was a high-profile fatal August 2009 crash involving an off-duty California Highway Patrol officer, his family, a runaway Lexus, and a disturbing 911 call. Given the context of that specific crash, I’m not convinced the floor mat recall made much sense. In particular, I find it hard to believe that a police officer with adrenaline pumping through his veins and his family’s life on the line wouldn’t just rip a stuck floor mat out of the way like the Incredible Hulk. (Or that he would choose running off the road at 125 mph vs. shutting the vehicle off entirely.) But I don’t have all the facts about either that specific accident or the reasoning behind the floor mat recall.

The broader recalls that have happened since have focused on also adding mechanical strength to the accelerator pedals in a number of different makes and models. To this day, Toyota categorically denies any sort of electrical problem.  Yet some cars that have been modified in this way have since been reported to experience unintended acceleration!  Besides which, mechanical parts generally fail visibly or entirely once they first fail–rather than intermittently.  Intermittent failures are far more common with electronics (think EMI) and firmware.

Toyota’s firmware image problem stems from two things. First, they have separately recalled the Prius for a braking-related firmware upgrade. Other possible Prius software issues have been identified by Steve Wozniak and Jim Sikes, but these have not yet been confirmed. Second, the continued reliance (by Toyota and NHTSA) on theories such as “we can’t reproduce the problem and we haven’t been able to see it during testing” as proof that there’s not a software bug is simply unbelievable.

Anyone who works with software knows from experience that lots of bugs can’t be easily reproduced. The fact that these incidents can’t be reproduced is not proof of anything.

Software in Cars: The Future

Don’t get me wrong. I want more software in my car, not less. I very much look forward to the day that an in-car computer takes over the driving for me. After all, some cars already have more sensor data to make decisions on than the driver does. Imagine what a car with an integrated GPS navigation system, auto-follow cruise control, and collision avoidance systems could do. While I guess that I should move left one lane to avoid a crash, the computer is capable of seeing in all directions at once and calculating the trajectories of all nearby cars, including instantaneous changes in their acceleration or deceleration.

Additionally, I suspect that even with bugs in a car’s drive-by-wire software the car may be much safer overall for its electronic traction control and anti-lock braking systems.

I just wish that Toyota would own up to the fact that the inability to reproduce a problem doesn’t rule out a software (or EMI) flaw.

16 Responses to “Toyota’s Embedded Software Image Problem”

  1. Jef Mangelschots says:

    Everybody in (embedded) software/system testing knows how difficult it can be to test for every possible situation. One reason is the distribution of knowledge: not everyone knows everything. Management expect engineers to have a broader spectrum of knowledge than is usually the case. Engineers rarely admit they do not know enough. Automotive engineers, test engineers, safety engineers, mechanical engineers, software engineers, hardware engineers, … all know some aspects of the whole, but nobody knows everything.
    Look back at how they succeeded in putting a man on the moon in the Apollo project: TEAMS of diverse engineers rehearsed every possible event they could come up with in brainstorming sessions, and then analyzed the outcome. After the simulations, each engineer described what he saw and how he interpreted his observations BEFORE they were given the big picture by the simulation team.
    I don’t know about the car industry. To my knowledge, this happens very infrequently, if at all.

    Another aspect is that there is no substitute for real-life data. Theoretical models, controlled test environments and simulations are a must, but they MUST be augmented with real-life data, and as much of it as possible. Modern cars now have ‘black box’ technology that records varying numbers of parameters. The auto industry should come up with a solution to collect everyday real-life data continuously (e.g. wireless, or download the logs upon each service, …) and continuously analyze them for trends (how close is a typical system operating to its uncomfort zone, what is the frequency of safety-mitigation systems activating, are there any long term drifts towards system unreliability, …).
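The trend analysis suggested in this comment could look roughly like the sketch below. Everything in it is an assumption made for illustration: the log format (activation count and distance per service visit), the function names, and the sample numbers are hypothetical and do not describe any real manufacturer's data recorder. It computes how often a safety system activates per 1000 km and fits a least-squares slope to see whether that rate is drifting upward over time.

```python
from statistics import mean


def activations_per_1000_km(activation_count, distance_km):
    """Rate of safety-system activations (e.g. ABS, traction control) per 1000 km."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return 1000.0 * activation_count / distance_km


def drift_slope(samples):
    """Least-squares slope of the samples against their index (change per interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2.0
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical logs downloaded at five consecutive services:
# (number of activations, kilometres driven since the previous service)
service_logs = [(3, 8000), (4, 8200), (6, 7900), (9, 8100), (13, 8000)]

rates = [activations_per_1000_km(a, d) for a, d in service_logs]
slope = drift_slope(rates)

print("activations per 1000 km:", [round(r, 2) for r in rates])
print(f"trend: {slope:+.3f} per service interval")
```

A positive slope on its own proves nothing about a defect; it only flags a vehicle or fleet whose mitigation systems are working harder over time, which is where a closer look would start.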

  2. John McCormick says:

    I heard on NPR that the NHTSA does not have a single software expert on staff!

    Seems like the time has come for laws requiring formal verification of embedded software that is life critical. If you can write a formal predicate for a safety or security property, we can now prove that the program implements that property. We now routinely use formal methods to verify programs on the order of 500 KSLOC and have seen that costs in the testing phase are greatly reduced. At 1.2 MSLOC, the new UK air traffic control system will soon be the largest formal verification ever done.

  3. Edgar says:

    When you test, it’s a lot easier to prove it’s broken than to prove that it’s working. Actually, I wouldn’t say “it’s working” but “I wasn’t able to make it fail”.

    • Sterling Eanes says:

      Edsger Dijkstra said it in the 70s: “Program testing can be used to show the presence of bugs, but never their absence.”

  4. Carle Henson says:

    For years, a running joke was “That can’t possibly happen! Maybe we imagined it.” Of course we realized that the problem we had seen was not imaginary and would not go away on its own and the fact that we could not immediately reproduce it meant nothing.

    I remember a problem which was reported by a customer involving a display that very briefly displayed the wrong values. We tried everything we could think of to reproduce the problem without success and finally concluded that the customer must have been mistaken. It was months later when the problem occurred again – this time when we were testing a new release of the software. Since I was now sure that there indeed was a software problem, I was able to find and fix the problem – not by reproducing it and capturing extra data or using a debugger but by reading the code and working through all of the possible causes until I found it.

    Toyota, and everyone else who has unexplained phenomena should shut up about not being able to reproduce the problem and get on with finding and fixing it. That doesn’t mean I think it is easy to solve these problems. Debugging complex real time systems can be incredibly difficult. And as systems become more and more complex and more and more dependent on operating systems and other software packages which we are supposed to just use, debugging can become all but impossible.

    My slogan is “Write it right the first time and then you don’t have to worry about all that testing and debugging.” That is just another way of saying that the best time to fix bugs is before they happen – during the design and implementation stage. Even that is not sufficient, though, unless you can think of everything, which is impossible. Thinking of everything is especially difficult when the system interacts with external devices which can react in totally unexpected ways – like in an automobile.

    One of our systems had a problem that would have been catastrophic for our customer had it occurred during normal operation. The problem was caused by cracked insulation on a wire. Further testing showed that we could simulate the problem by connecting a variable resistor between the wire and the point where it had made contact. The problem only occurred over a very narrow range of resistance. The probability that this combination of things would occur was nearly infinitesimal and, as far as I know, it was never seen again, but we quickly fixed the software to recognize the problem if it did occur again and work around it.

  5. Dave Telling says:

    I soooo identify with this issue! We had a product for which we would get reports of intermittent apparent shutdown for some fraction of a second, yet were never able to reproduce the problem on the bench or in dyno testing. We tried everything we could think of to force the problem, yet the product always ran without a hitch in our testing. That caused us to theorize that perhaps there was some kind of hardware problem (by this, I mean that there was something in the vehicle and/or the customer’s installation) that was contributing to the problem. At this point, there have been no complaints for several years, so perhaps it really was a “hardware” issue, but there is no simple way of knowing for certain. I can certainly see Toyota saying that they can find no indication that there is a firmware problem, but I certainly can’t believe that they could say with a straight face that just because they can’t reproduce the problem there can’t be any firmware issues, unless maybe their lawyers told them to say that.

  6. Scott H says:

    Good summary of how this raises the awareness of firmware issues, and the underwhelming statements about “can’t reproduce the problem.” That in itself wouldn’t be troubling if there were also an elaboration of how thorough the investigations were into the firmware and sensor systems.

    However, as both an electrical engineer and car enthusiast, I disagree generally about the desirability of increasing automation and electronic content in our automobiles, for several reasons. 1) If you try to keep a car running for more than five years, you’ll find that generally the sensors and wiring fail before anything else, 2) state laws are increasingly requiring “no OBDII” codes to pass state inspections, while usually it is the sensors and wiring that fail, and not anything safety or necessarily emissions related, and 3) increasing automation increases driver dependence on the systems, and decreases any motivation to actually require that anyone be trained to drive properly.

    Electronics and automation have their place, and without them we wouldn’t have anywhere near the level of low emissions and fuel efficiency we now have. But this comes at a price in long term operating costs and feeds an attitude of “the car companies” should “fix” our safety problems. Most accidents, if not all, are the result of poor driver judgment and mistakes that cut safety margins (driving too close, too fast for conditions or sightlines/visibility, etc.). I’d be interested to hear how many other electrical engineers are wary of these issues brought with increasing electronic content, which are in addition to the basic concern of the robustness of their design.

  7. Emil Fred says:

    There are several bugs that you see for a fleeting second during the development cycle. You may not care about it when you are trying to make the product work and it probably will never surface during product test cycle. These are the most dangerous ones especially if the process affected is a critical one. Bug tracking is needed right from R&D and not product testing.
    The need for quality firmware is all the more important these days due to the amount of critical processes controlled by firmware like in cars.

  8. Ian Rumley says:

    Sometimes these kinds of bugs are extremely hard to find. I have had two such memorable bugs in my career that shared some common characteristics: they rarely occurred (once every few weeks at the most), and then only in the field; they concerned communications (one parallel and one serial); and the root cause went back to hardware and was relatively easy to fix. One involved a glitch that was only a few nanoseconds wide and was only found after a fast enough ‘scope was used, and the other a line on the MCU that the data sheet claimed was pulled high but in reality was floating. In both cases the appearance to the developers was simply that data magically appeared at the receiver.

  9. Ayyappan Ramasamy says:

    They cannot say that it is not reproducible.
    I had an issue in a product I supported, and it was not at all easy to reproduce. My approach was to give random commands to the device under test. Finally, the defect was reproduced. The problem here is finding the test sequence, so they have to test with random parameters to reproduce it. This may not be a logical approach, but sometimes we need this kind of testing.
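As a minimal sketch of the random-command approach described in this comment: the `ToyDevice` class, its command names, and its planted defect are all hypothetical stand-ins for a real device under test. The point it illustrates is that a seeded random generator makes the failing command sequence recordable, so that once the defect shows up it can be replayed on demand.

```python
import random

COMMANDS = ["reset", "write", "read", "sleep", "wake"]


class ToyDevice:
    """Hypothetical device model with a planted defect.

    It reports a fault only when the last three commands received
    are exactly reset, write, read.
    """

    def __init__(self):
        self.history = []

    def send(self, command):
        self.history.append(command)
        if self.history[-3:] == ["reset", "write", "read"]:
            return "FAULT"
        return "OK"


def fuzz(seed, max_steps=10_000):
    """Send seeded random commands; return the sequence that caused a fault, or None."""
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    device = ToyDevice()
    sent = []
    for _ in range(max_steps):
        command = rng.choice(COMMANDS)
        sent.append(command)
        if device.send(command) == "FAULT":
            return sent
    return None


def replay(sequence):
    """Replay a recorded sequence on a fresh device and return the last response."""
    device = ToyDevice()
    response = None
    for command in sequence:
        response = device.send(command)
    return response


failing = fuzz(seed=1234)
if failing is not None:
    print(f"fault after {len(failing)} commands, ending in {failing[-3:]}")
    print("replay gives:", replay(failing))
```

Logging the seed (or the full command list) is the design choice that matters here: without it, random testing just produces one more failure that nobody can reproduce.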

  10. Sreenivasa Reddy B says:

    During my career with one of the world’s leading automotive parts suppliers, I came across a software design bug. The bug was: if someone jump-starts the car when it is parked with the key off, the car would start running (in a jump start the gear is engaged). This bug could also occur while towing the vehicle with a gear engaged, when parked on a downhill, etc. I was the owner of the software module which had this bug. Some real-world scenarios are next to impossible to anticipate while developing such software. Thorough verification and validation is the mantra; even then, bugs are part and parcel of software.
    I would still buy a car which has more software, though of course not the one which has my code :)

  11. Jon Willoughby says:

    My Camry’s user manual has details on the event data recorder and the circumstances in which they will turn over the data to safety and law enforcement. This article suggests that, in reality, Toyota has been reluctant to do so.

    http://www.ecnmag.com/News/2010/03/Toyota-black-box-data/

    So what is going on?

  12. Tony Leigh says:

    I’d be wary of having too much firmware in my car. It’s already hard enough to find car mechanics with a good enough understanding of mechanical engineering to fix difficult problems. Asking them to diagnose and fix problems like those above will only compound the problem. A colleague at work has problems with his BMW occasionally going into ‘limp’ mode for no discernible reason. The garage have had the car for a week, and still haven’t been able to fix it.

    Introducing steer-by-wire and brake-by-wire could be a minefield. Even if the systems are duplicated, how many cars do you see driving around with just one headlight? And let’s say the steer-by-wire system contains an algorithm to stop the driver turning too sharply and rolling the car. What happens if he swerves to avoid a pedestrian, but the system judges he might roll it and makes him run into the pedestrian instead? Whose fault is it? The driver’s? Or the steer-by-wire system’s? The lawyers would have a field day.

  13. Parag Barangale says:

    I had a tricky problem which could not be detected; it appeared just once, and that was while the software was in the development stage. I tried to recreate it numerous times but it never came up. The device went all the way to the delivery line. During prototype assembly line testing, the test engineer accidentally set up the test bench wrongly and the problem came up.
    Logically the code was doing the math operations correctly, but I had to rearrange/break up the functions for calculation and that resolved the bug.
    Sometimes even code review does not reveal the problem. It gets really tough with embedded devices.
    For Toyota, I suggest the first thing to do is to accept. Unless you accept the problem, it’s difficult to reach for solutions.

  14. A good friend of mine (Dr. Brian Kirk), a leading expert in software and system safety, recently was part of a panel (including electrical engineers, a Toyota sudden acceleration car crash survivor, and car safety experts) providing a briefing on the myths Toyota uses to cover up the truth about sudden acceleration. Here’s a video of that:
    http://www.youtube.com/watch?v=UJnN8IyIumg

  15. [...] President of Netrino and a well respected expert in the embedded community discusses this in his blog post regarding Toyota’s woes. Although this is only a side note in his post, it carries a good [...]
