Posts Tagged ‘realtime’

Beer and Boards at ESC Silicon Valley

Wednesday, April 6th, 2011 Michael Barr

It really looks like I’ve picked the wrong year to miss ESC Silicon Valley (due to a schedule conflict). (The last time I wasn’t at ESC, it was 1997 and White Zombie was still together. The first thing I’d really liked to have seen is Steve Wozniak‘s keynote speech. The second thing I’m really sad to miss is the just announced “Beer and Boards” party/giveaway.

Beer and Boards sounds really fun. Here’s how it works: Every “All Access” attendee will get to choose one of three free development kits to take home:

Once you select your preferred kit you will receive information on the time and place for the relevant Beer and Boards party, at which you will get to drink free beer at a special meet-and-greet with one of your kit’s designers to talk about your new kit and its capabilities. Three boards spread out over three days.

Nerds drinking beer! I love it. What will they think of next?

Before you register for this year’s ESC, be sure to check out my earlier post Save Big on Embedded Systems Conference Registration. Also, remember to use the promo code BARR20 to save an additional 20% off registration and be entered to win a free seat at a future Embedded Software Boot Camp or one of 20 free copies of the Embedded C Coding Standard.

What NHTSA/NASA Didn’t Consider re: Toyota’s Firmware

Wednesday, March 2nd, 2011 Michael Barr

In a blog post yesterday (Unintended Acceleration and Other Embedded Software Bugs), I wrote extensively on the report from NASA’s technical team regarding their analysis of the embedded software in Toyota’s ETCS-i system. My overall point was that it is hard to judge the quality of their analysis (and thereby the overall conclusion that the software isn’t to blame for unintended accelerations) given the large number of redactions.

I need to put the report down and do some other work at this point, but I have a few other thoughts and observations worth writing down.

Insufficient Explanations

First, some of the explanations offered by Toyota, and apparently accepted by NASA, strike me as insufficent. For example, at pages 129-132 of Appendix A to the NASA Report there is a discussion of recursion in the Toyota firmware. “The question then is how to verify that the indirect recursion in the ETCS-i does in fact terminate (i.e., has no infinite recursion) and does not cause a stack overflow.”

“For the case of stack overflow, [redacted phrase], and therefore a stack overflow condition cannot be detected precisely. It is likely, however, that overflow would cause some form of memory corruption, which would in turn cause some bad behavior that would then cause a watchdog timer reset. Toyota relies on this assumption to claim that stack overflow does not occur because no reset occurred during testing.” (emphasis added)

I have written about what really happens during stack overflow before (Firmware-Specific Bug #4: Stack Overflow) and this explains why a reset may not result and also why it is so hard to trace a stack overflow back to that root cause. (From page 20, in NASA’s words: “The system stack is limited to just 4096 bytes, it is therefore important to secure that no execution can exceed the stack limit. This type of check is normally simple to perform in the absence of recursive procedures, which is standard in safety critical embedded software.”)

Similarly, “Toyota designed the software with a high margin of safety with respect to deadlines and timeliness. … [but] documented no formal verification that all tasks actually meet this deadline requirement.” and “All verification of timely behavior is accomplished with CPU load measurements and other measurement-based techniques.” It’s not clear to me if the NASA team is saying it buys those Toyota explanations or merely wanted to write them down. However, I do not see a sufficient explanation in this wording from page 132:

“The [worst case execution time] analysis and recursion analysis involve two distinctly different problems, but they have one thing in common: Both of their failure modes would result in a CPU reset. … These potential malfunctions, and many others such as concurrency deadlocks and CPU starvation, would eventually manifest as a spontaneous system reset.” (emphasis added)

Might not a deadlock, starvation, priority inversion, or infinite recursion be capable of producing a bit of “bad behavior” (perhaps even unintended acceleration) before that “eventual” reset? Or might not a stack overflow just corrupt one or a few important variables a little bit and that result in bad behavior rather than or before a result? These kinds of possibilities, even at very low probabilities, are important to consider in light of NASA’s calculation that the U.S.-owned Camry 2002-2007 fleet alone is running this software a cumulative one billion hours per year.

Paths Not Taken

My second observation is based upon reflection on the steps NASA might have taken in its review of Toyota’s ETCS-i firmware, but apparently did not. Specifically, there is no mention anywhere (unless it was entirely redacted) of:

  • rate monotonic analysis, which is a technique that Toyota could have used to validate the critical set of tasks with deadlines and higher priority ISRs (and that NASA could have applied in its review),
  • cyclomatic complexity, which NASA might have used as an additional winnowing tool to focus its limited time on particularly complex and hard to test routines,
  • hazard analysis and mitigation, as those terms are defined by FDA guidelines regarding software contained in medical devices, nor
  • any discussion or review of Toyota’s specific software testing regimen and bug tracking system.

Importantly, there is also a complete absence of discussion of how Toyota’s ETCS-i firmware versions evolved over time. Which makes and models (and model years) had which versions of that firmware? (Presumably there were also hardware changes worthy of note.) Were updates or patches ever made to cars once they were sold, say while at the dealer during official recalls or other types of service?

Unintended Acceleration and Other Embedded Software Bugs

Tuesday, March 1st, 2011 Michael Barr

Last month, NHTSA and the NASA Engineering and Safety Center (NESC) published reports of their joint investigation into the causes of unintended acceleration in Toyota vehicles. NASA’s multi-disciplinary NESC technical team was asked, by Congress, to assist NHTSA by performing a review of Toyota’s electronic throttle control and the associated embedded software. In carefully worded concluding statement, NASA stated that it “found no electronic flaws in Toyota vehicles capable of producing the large throttle openings required to create dangerous high-speed unintended acceleration incidents.” (The official reports and a number of supporting files are available for download at http://www.nhtsa.gov/UA.)

The first thing you will notice if you join me in trying to judge the technical issues for yourself are the redactions: pages and pages of them. In parts and entirely for unexplained reasons, this report on automotive electronics reads like the public version of a CIA Training Manual. I’ve observed that approximately 193 of the 1,061 pages released so far feature some level of redaction (via black boxes, which obscure from a single number, word, or phrase to a full table, page, or section). The redactions are at their worst in NASA’s Appendix A, which describes NASA’s review of Toyota’s embedded software in detail. More than half of all the pages with redactions (including the vast majority of fully redacted tables, pages, and sections) are in that Appendix.

Despite the redactions, we can still learn some interesting facts about Toyota’s embedded software and NASA’s technical review of the same. The bulk of the below outlines what I’ve been able to make sense of in about two days of reading. Throughout, my focus is on embedded software inside the electronic throttle control, so I’m leaving out considerations of other potential causes, including EMI (which NASA also investigated). First a little background on the investigation.

Background

Although the inquiry was taken to examine unintended acceleration reports across all Toyota, Scion, and Lexus models, NASA focused its technical inquiry almost entirely on Toyota Camry models equipped with the Electronic Throttle Control System, Intelligent (ETCS-i). The Camry has long been among the top cars bought in the U.S., so this choice probably made finding relevant complaint data and affected vehicles easier for NHTSA. (BTW, NASA says the voluntary complaint database shows both that unintended accelerations were reported before the introduction of electronic throttle control and that press coverage and Congressional hearings can increase the volume of complaints.)

According to a press release by the company made upon publication of the NHTSA and NASA reports, Toyota’s ETCS-i has been installed in “more than 40 million cars and trucks sold around the world, including more than 16 million in the United States.” Undoubtedly, ETCS-i has also “made possible significant safety advances such as vehicle stability control and traction control.” But as with any other embedded system there have been refinements made through the years to both the electronics and the embedded software.

Though Toyota apparently made available, under agreed terms and via its attorneys, schematics, design documents, and source code “for multiple Camry years and versions” (Appendix A, p. 9) as well as many of the Japanese engineers involved in its design and evolution, NASA only closely examined one version. In NASA’s words, “The area of emphasis will be the 2005 Toyota Camry because this vehicle has a consistently high rate of reported ‘UA events’ over all Toyota models and all years, when normalized to the number of each model and year, according to NHTSA data.” (p. 7) Except as otherwise stated, everything else in this column concerns the electronics and firmware found in that year, make, and model.

Event Data Recorders

Event Data Recorder (EDR) is the generic term for the automotive equivalent of an aircraft black box flight data recorder. EDRs were first installed in cars in the early 1990s and have increased in use as well as sophistication in the years since. Generally speaking, the event data recorder is an embedded system residing within the airbag control module located in the front center of the engine compartment. The event data recorder is connected to other parts of the car’s electronics via the CAN bus and is always monitoring vehicle speed, the position of the brake and accelerator pedals, and other key parameters.

In the event of an impossibly high (for the vehicle operating normally) acceleration or deceleration sensor reading, Toyota’s latest event data recorders save the prior five 1Hz samples of these parameters in a non-volatile memory area. Once saved, an event record can be read over the car’s On-Board Diagnostics (OBD) port (or, in the event of a more severe accident, directly from the airbag control module) via a special cable and PC software. If the airbag actually deploys, the event record will be permanently locked. The last 2 or 3 (depending on version) lesser “bump” records are also stored, but may be overwritten in a FIFO manner.

This investigation of Toyota’s unintended acceleration marked the first time that anyone from NHTSA had ever read data from a Toyota event data recorder. (Toyota representatives apparently testified in Congress that there had previously just been one copy of the necessary PC software in the U.S.) As part of this study, NHTSA validated and used tools provided by Toyota to extract historical data from 52 vehicles involved in incidents of unintended acceleration, with acknowledged bias toward geographically reachable recent events. After reviewing driver and other witness statements and examining said black box data, NHTSA concluded that 39 of these 52 events were explainable as “pedal misapplications.” That’s a very nice way of saying that whenever the driver reported “stepping on the brake” he or she had pressed the accelerator pedal by mistake. Figure 5 of a supplemental report describing these facts portrays an increasing likelihood of such incidents with driver age vs. the bell curve of Camry ownership by age.

Note that no record is apparently ever made, in the event data recorder or elsewhere, of any events or state changes within the ETCS-i firmware. So-called “Diagnostic Trouble Codes” concerning sensor and other hardware failures are recorded in non-volatile memory and the presence of one or more such codes enables the “Check Engine” light on the dashboard. But no logging is done of significant software faults, including but not limited to watchdog-initiated resets.

Engine Control Module

ETCS-i is a collection of components and features that was changed in the basic engine design when Toyota switched from mechanical to electronic throttle control. (Electronic throttle control is also known as “throttle-by-wire”.) Toyota has used two different types of pedal sensors in the ETCS-i system, always in a redundant fashion. The earlier design, pre-2007, using potentiometers was susceptible to current leakage via growth of tin whiskers. Though this type of failure was not known to cause sudden high-speed behaviors, it did seem to be associated with a higher number of warranty claims. The newer pedal sensor design uses Hall effect sensors.

Importantly, the brakes are not a part of the ETCS-i system. In the 2005 Camry, Toyota’s brake pedal was mechanically controlled. (It may still be.) It appears this is one of the reasons the NASA team felt comfortable with their conclusion that driver reports of wide open throttle behavior that could not be stopped with the brakes were not caused by software failures (alone). “The NESC team did not find an electrical path from the ETCS-i that could disable braking.” (NASA Report, p. 15) It is clear, though, that power assisted brakes lose the enabling vacuum pressure when the throttle is wide open and the driver subsequently pumps the brakes; thus any system failure that opened the throttle could indirectly make bringing the vehicle to a stop considerably harder.

The Engine Control Module at the heart of the ETCS-i consists of a Main-CPU and a Sub-CPU located within a pair of ASICs. The Sub-CPU contains a set of A/D converters that translates raw sensor inputs, such as voltages VPA and VPA1 from the accelerator pedal, into digital position values and sends them to the Main-CPU via a serial interface. In addition, the Sub-CPU monitors the outputs of the Main-CPU and is able to reset (in the manner of a watchdog timer) the Main-CPU.

The Main-CPU is reported to be a V850E1 microcontroller, which is “a 32-bit RISC CPU core for ASIC” designed by Renesas (nee NEC). The V850E1 processor has a 64MB program address space, which is part of an overall 4GB linear address space. The Main-CPU also keeps tabs on the Sub-CPU and can reset it if anything is found wrong.

NASA reports that the embedded software in the Main-CPU is written (mostly) in ANSI C and compiled using a GreenHills C compiler (Appendix A, p. 14). Furthermore, an OSEK-compliant real-time operating system with fixed-priority preemptive scheduling is used to manage a redacted (but apparently larger than ten, based on the size of the redaction) number of real-time tasks. The actual firmware development (design, coding and unit testing) was outsourced to Denso (p. 19). Toyota apparently performed integration testing and ran several commercial and in-house static analysis tools, including QAC (p. 20). The code was written in English, with Japanese comments and design documents, and follows a proprietary Toyota naming convention/coding standard that predates but half overlaps with the 1998 version of MISRA-C.

Are There Bugs in Toyota’s Firmware?

In the NASA Report’s executive summary it is made clear that “because proof that the ETCS-i caused the reported UAs was not found does not mean it could not occur.” (NASA Report, p. 17) The report also states that NASA’s analysis was time-limited and top-down, remarking “The Toyota Electronic Throttle Control (ETC) was far more complex than expected involving hundreds of thousands of lines of software code” and that this affected the quality of a planned peer review.

It’s stated that “Reported [Unintended Accelerations (UAs)] are rare events. Typically, the reporting of UAs is about 1/100,000 vehicles/year.” But there are millions of cars on the road, and so NHTSA has collected some “831 UA reports for Camry” alone. “Over one-half of the reported events described large (greater than 25 degrees) high-throttle opening UAs of unknown cause” (NASA Report, p. 14), the causes of which are never fully explained in these reports.

The NASA apparently identified some lesser firmware bugs themselves, saying “[our] logic model verifications identified a number of potential issues. All of these issues involved unrealistic timing delays in the multiprocessing, asynchronous software control flow.” (Appendix A, p. 11) NASA also spent time simulating possible race conditions due to worrisome “recursively nested interrupt masking” (pp, 44-46); note, though, that simulation success is not a sufficient proof of lack of races. As well, the NASA team seems to recommend “reducing the amount of global data” (p. 38) and eliminating “dead code” (p. 40).

Additionally, the redacted text in other parts of Appendix A seems to be obscuring that:

  • The standard gcc compiler version 4″ generated a redacted number of warnings (probably larger than 100) about the code, in 11 different warning categories. (p. 25)
  • Coverity version 4.2″ generated a redacted number of warnings (probably larger than 154) about the code, in 10 different warning categories. (p. 27)
  • Codesonar version 3.6p1″ generated a redacted number of warnings (probably larger than 136) about the code, in 10 different warning categories.
  • Uno version 2.12″ generated a redacted number of warnings (probably larger than 72) about the code, in 9 different warning categories.
  • The code contained at least 347 deviations from a subset of 14 of the MISRA-C rules.
  • The code contained at least 243 violations of a subset of 9 of the 10 “Power of 10–Rules for Developing Safety Critical Code,” which was published in IEEE Computer in 2006 by NASA team member Gerard Holzmann.

It looks to me like Figure 6.2.3-1 of the NASA Report (p. 30) shows that UA complaints filed with NHTSA increased in the year of introduction of electronic throttle control for the vast majority of Toyota, Scion, and Lexus models–and that complaint counts have remained higher but generally declined over time since those transitions years. Such a complaint data pattern is perhaps consistent with firmware bugs. (Note to NHTSA: It would be helpful to see this same chart normalized by number of vehicles sold by model year and with the rows sorted by the year of ETC introduction. It would also be nice to see a chart of ETCS-i firmware versions and updates, which vehicles they apply to, and the dates on which each was put into new production vehicles or distributed through dealers.)

Final Thoughts

I am not privy to all of the facts considered by the NHTSA or NASA review teams and thus cannot say if I agree or disagree with their overall conclusion that embedded software bugs are not to blame for reports of unintended acceleration in Toyota vehicles. How about you? If you’ve spotted something I missed in the reports from NHTSA or NASA, please send me an e-mail or leave a comment below. Let’s keep the conversation going.

Embedded Software Boot Camp in a Box

Wednesday, December 15th, 2010 Michael Barr

Whether you are new to embedded software development in C or looking for ways to improve your skills, the Embedded Software Boot Camp in a Box will provide you the hands-on education you need. Exercises are based around an ARM processor board (shown below), the MicroC/OS-II real-time operating system, and the IAR Embedded Workbench compiler/debugger, all of which are included in the box.

STR912-SK

Learn Embedded Programming on an ARM Processor

Netrino’s popular Embedded Software Boot Camp (see upcoming dates), on which this kit is based, is an intense in-person training experience that requires attendees to be able to check out of normal work and life routines for a week—sometimes also travelling a great distance. The Embedded Software Boot Camp in a Box is a way to learn the same skills at your own pace. You’ll do the same exercises and have access to the same materials, just won’t have a “drill instructor” or the clock to prod you.

Here’s how you’ll use the Embedded Software Boot Camp in a Box to learn embedded programming:

  • Read the 350 page “Field Manual” book, which contains the slides from the in-person Boot Camps, in order.
  • If you want to dig deeper, watch the video of Michael Barr‘s acclaimed “How to Prioritize RTOS Tasks and Why it Matters” lecture on DVD, or read the three books and numerous articles provided as PDFs on the USB drive.
  • As you read, you will come to slides titled “Exercise: …”. These slides mark the best points to attempt each exercise.
  • In all there are ten programming exercises: one to test your compiler/debugger/board setup; two concerning hardware interfacing in C; six concerning multithreaded programming with uC/OS-II; and one capstone project to build a scuba dive computer. These involve hardware interactions such as blinking LEDs, debouncing pushbuttons, reading A/D converters, working with programmable timer/counters, and generating audio tones via PWM signals.
  • Detailed instructions for each exercise can be found in the printed “Exercise Manual”.
  • Solutions for each of the exercises are provided on the USB drive.
  • After you finish with the included exercises, you’ll know your way around most of your ARM processor board and be ready to explore the rest of its hardware (RS-232, CAN, Ethernet, USB, etc.) on your own.

For more details or to order your kit now, browse on over to http://www.netrino.com/Boot-Camp-Box.

Firmware-Specific Bug #10: Jitter

Thursday, December 2nd, 2010 Michael Barr

Some real-time systems demand not only that a set of deadlines be always met but also that additional timing constraints be observed in the process. Such as managing jitter.

An example of jitter is shown in Figure 1. Here a variable amount of work (blue boxes) must be completed before every 10 ms deadline. As illustrated in the figure, the deadlines are all met. However, there is considerable timing variation from one run of this job to the next. This jitter is unacceptable in some systems, which should either start or end their 10 ms runs more precisely.

Jitter Figure 1

If the work to be performed involves sampling a physical input signal, such as reading an analog-to-digital converter, it will often be the case that a precise sampling period will lead to higher accuracy in derived values. For example, variations in the inter-sample time of an optical encoder’s pulse count will lower the precision of the velocity of an attached rotation shaft.

Best Practice: The most important single factor in the amount of jitter is the relative priority of the task or ISR that implements the recurrent behavior. The higher the priority the lower the jitter. The periodic reads of those encoder pulse counts should thus typically be in a timer tick ISR rather than in an RTOS task.

Figure 2 shows how the interval of three different 10 ms recurring samples might be impacted by their relative priorities. At the highest priority is a timer tick ISR, which executes precisely on the 10 ms interval. (Unless there are higher priority interrupts, of course.) Below that is a high-priority task (TH), which may still be able to meet a recurring 10-ms start time precisely. At the bottom, though, is a low priority task (TL) that has its timing greatly affected by what goes on at higher priority levels. As shown, the interval for the low priority task is 10 ms +/- approximately 5 ms.

Jitter Figure 2

Firmware-Specific Bug #9