Posts Tagged ‘engineering’

What NHTSA/NASA Didn’t Consider re: Toyota’s Firmware

Wednesday, March 2nd, 2011 Michael Barr

In a blog post yesterday (Unintended Acceleration and Other Embedded Software Bugs), I wrote extensively on the report from NASA’s technical team regarding their analysis of the embedded software in Toyota’s ETCS-i system. My overall point was that it is hard to judge the quality of their analysis (and thereby the overall conclusion that the software isn’t to blame for unintended accelerations) given the large number of redactions.

I need to put the report down and do some other work at this point, but I have a few other thoughts and observations worth writing down.

Insufficient Explanations

First, some of the explanations offered by Toyota, and apparently accepted by NASA, strike me as insufficient. For example, at pages 129-132 of Appendix A to the NASA Report there is a discussion of recursion in the Toyota firmware. “The question then is how to verify that the indirect recursion in the ETCS-i does in fact terminate (i.e., has no infinite recursion) and does not cause a stack overflow.”

“For the case of stack overflow, [redacted phrase], and therefore a stack overflow condition cannot be detected precisely. It is likely, however, that overflow would cause some form of memory corruption, which would in turn cause some bad behavior that would then cause a watchdog timer reset. Toyota relies on this assumption to claim that stack overflow does not occur because no reset occurred during testing.” (emphasis added)

I have written before about what really happens during a stack overflow (Firmware-Specific Bug #4: Stack Overflow); that post explains why a reset may not result and also why it is so hard to trace the resulting misbehavior back to a stack overflow as its root cause. (From page 20, in NASA’s words: “The system stack is limited to just 4096 bytes, it is therefore important to secure that no execution can exceed the stack limit. This type of check is normally simple to perform in the absence of recursive procedures, which is standard in safety critical embedded software.”)
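For readers wondering what a measurement-based stack check looks like, one common technique is “stack painting”: fill a task’s stack with a known pattern at startup, then later scan for the deepest word that was overwritten. The C sketch below simulates this on an ordinary array. The names (`STACK_WORDS`, `PAINT_PATTERN`, `stack_paint()`, `stack_high_water()`) are hypothetical, and it assumes a descending stack, which is common but processor-specific. Note what it cannot do: it reports the deepest usage observed during the runs that actually happened, not a proof that no execution path (such as an untested recursion) can exceed the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task stack size: 1024 32-bit words (4096 bytes). */
#define STACK_WORDS   1024u
/* Hypothetical fill value, chosen to be unlikely in real stack data. */
#define PAINT_PATTERN 0xDEADBEEFu

/* Fill the whole stack region with the pattern before the task starts. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = PAINT_PATTERN;
    }
}

/*
 * Return the number of words that have ever been used.
 * Assumes a descending stack: usage starts at stack[words - 1] and grows
 * toward stack[0], so the first non-pattern word found when scanning up
 * from stack[0] marks the deepest point reached so far.
 */
size_t stack_high_water(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == PAINT_PATTERN) {
        untouched++;
    }
    return words - untouched;
}
```

In practice the high-water mark is read after long test runs and compared against the stack size; a small margin is a warning sign, but a large one still says nothing about paths the tests never exercised.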

Similarly, “Toyota designed the software with a high margin of safety with respect to deadlines and timeliness. … [but] documented no formal verification that all tasks actually meet this deadline requirement.” and “All verification of timely behavior is accomplished with CPU load measurements and other measurement-based techniques.” It’s not clear to me if the NASA team is saying it buys those Toyota explanations or merely wanted to write them down. However, I do not see a sufficient explanation in this wording from page 132:

“The [worst case execution time] analysis and recursion analysis involve two distinctly different problems, but they have one thing in common: Both of their failure modes would result in a CPU reset. … These potential malfunctions, and many others such as concurrency deadlocks and CPU starvation, would eventually manifest as a spontaneous system reset.” (emphasis added)

Might not a deadlock, starvation, priority inversion, or infinite recursion be capable of producing a bit of “bad behavior” (perhaps even unintended acceleration) before that “eventual” reset? Or might not a stack overflow corrupt just one or a few important variables slightly, resulting in bad behavior instead of, or before, a reset? These kinds of possibilities, even at very low probabilities, are important to consider in light of NASA’s calculation that the U.S.-owned Camry 2002-2007 fleet alone is running this software a cumulative one billion hours per year.
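To make the “eventual” concrete, here is a minimal C sketch of one common watchdog scheme (an assumption for illustration; the report does not describe Toyota’s): each monitored task sets a bit as it completes a loop, and a periodic monitor services the hardware watchdog only if every bit is set. The task names, `watchdog_checkin()`, `watchdog_monitor()`, and the kick counter standing in for the hardware register are all hypothetical. Even in this scheme, a starved task is noticed only at the end of a watchdog window, and the reset follows only after the hardware timer then expires. Whatever the other tasks did in the meantime has already happened, and a task that keeps checking in while computing wrong outputs is never detected at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitored tasks; each owns one check-in bit. */
enum { TASK_THROTTLE, TASK_PEDAL, TASK_DIAG, NUM_TASKS };

#define ALL_TASKS_MASK ((uint32_t)((1u << NUM_TASKS) - 1u))

static volatile uint32_t g_checkins;  /* bits set during the current window */
static uint32_t g_hw_kicks;           /* stand-in for servicing the real WDT */

/* Called by each task once per pass through its main loop.
 * On a real target this read-modify-write must be made atomic
 * (e.g., by briefly disabling interrupts). */
void watchdog_checkin(int task_id)
{
    assert(task_id >= 0 && task_id < NUM_TASKS);
    g_checkins |= (1u << task_id);
}

/* Called once per watchdog window, e.g., from a periodic timer.
 * Services the hardware watchdog only if every task checked in;
 * otherwise the hardware timer is left to expire and reset the CPU. */
bool watchdog_monitor(void)
{
    bool all_alive = (g_checkins & ALL_TASKS_MASK) == ALL_TASKS_MASK;
    if (all_alive) {
        g_hw_kicks++;   /* real code would write the WDT service register */
    }
    g_checkins = 0;     /* begin a new observation window */
    return all_alive;
}
```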

Paths Not Taken

My second observation is based upon reflection on the steps NASA might have taken in its review of Toyota’s ETCS-i firmware, but apparently did not. Specifically, there is no mention anywhere (unless it was entirely redacted) of:

  • rate monotonic analysis, which is a technique that Toyota could have used to validate the critical set of tasks with deadlines and higher priority ISRs (and that NASA could have applied in its review),
  • cyclomatic complexity, which NASA might have used as an additional winnowing tool to focus its limited time on particularly complex and hard-to-test routines,
  • hazard analysis and mitigation, as those terms are defined by FDA guidelines regarding software contained in medical devices, nor
  • any discussion or review of Toyota’s specific software testing regimen and bug tracking system.
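As an illustration of the first item, the C sketch below implements the response-time test commonly used in rate monotonic analysis. It assumes independent periodic tasks (no blocking or release jitter), deadlines equal to periods, priorities assigned by period, and worst-case execution times (WCETs) that are already known; obtaining trustworthy WCETs is exactly the part the report says was done only by measurement. The `task_t` type and `rma_schedulable()` function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One periodic task; both fields use the same time unit. */
typedef struct {
    uint32_t wcet;    /* C_i: worst-case execution time */
    uint32_t period;  /* T_i: period, assumed equal to the deadline */
} task_t;

/*
 * Response-time test for fixed-priority, rate-monotonic scheduling.
 * tasks[] must be sorted by period, shortest (highest priority) first.
 * Iterates R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j
 * until it stops changing or exceeds the deadline.
 */
bool rma_schedulable(const task_t *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period > 0);
        uint64_t r = tasks[i].wcet;              /* first guess: R = C_i */
        for (;;) {
            uint64_t next = tasks[i].wcet;
            for (size_t j = 0; j < i; j++) {     /* higher-priority interference */
                uint64_t releases = (r + tasks[j].period - 1) / tasks[j].period;
                next += releases * tasks[j].wcet;
            }
            if (next > tasks[i].period) {
                return false;                    /* deadline can be missed */
            }
            if (next == r) {
                break;                           /* fixed point: R_i found */
            }
            r = next;
        }
    }
    return true;
}
```

For example, tasks with (C, T) of (1, 4), (2, 6), and (3, 12) pass, with the lowest-priority task finishing by time 10; tasks (2, 4) and (3, 6) fail even though their total CPU utilization is exactly 100%.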

Importantly, there is also a complete absence of discussion of how Toyota’s ETCS-i firmware versions evolved over time. Which makes and models (and model years) had which versions of that firmware? (Presumably there were also hardware changes worthy of note.) Were updates or patches ever made to cars once they were sold, say while at the dealer during official recalls or other types of service?

Social Networking for Engineers

Friday, February 4th, 2011 Michael Barr

Would your best friend describe you as a particularly “social” person? Do you like to “network” and meet new people? If you’re an engineer, your answer is probably something like,

“Um, no and no. Now can I slink back to my cube, Mr. Nosy McSales Guy?”

The growth of “social networking” in its many forms is a remarkable phenomenon that’s proving powerful enough to reshape the economic landscape and trouble despotic regimes. For example, if (6-year-old!) Facebook were a country, it would already be the world’s 3rd most populous.

That we the engineers, who ultimately make stuff like this possible, are mostly a loose band of individuals self-selected for our lack of people skills (a key trait that allows us to sit in cubes all day focusing deep-deep-deep on new technology) may explain why so many of us are Luddites when it comes to using this “social” technology.

Some of us rationalize that we don’t like connecting with people offline, so why would we do so online? Others argue that reading status updates from other people will take valuable time away from more important stuff. This fun video sums it all up:

“Until recently, wasting time on computers was the domain of engineers alone. Now even my Nana wants to keep me up to date on the status of her cats!”

But there’s a lot of value in social networking for engineers. Here’s how I use three social networking websites and why you should join them too.

LinkedIn: my cloud-based self-updating address book

Every user on LinkedIn creates a “public profile page”, which is something like a resume. Your profile gives your current job title, the name of your employer, and the nearest big city. If you want, your public profile also has space for you to expand on what you do in your current job or in your career generally. You can also list where you went to university, what you majored in, and your past employment history, complete with praise quotes from former colleagues and managers.

When you “connect” to another LinkedIn user, they get to see your private information too. This includes (by default) your e-mail address and phone number, as well as the names of your other connections. The majority of LinkedIn users seem to have on the order of 100 connections once they get set up. Your “in” list consists mostly of current and past colleagues, perhaps some classmates or other chums, etc.

Although it is not specifically advertised this way and has many other valuable features, I think of LinkedIn as primarily my cloud-based self-updating address book. It’s an address book in that I can easily search for your phone number or e-mail address once we connect. If I can’t remember or spell your last name, I can search by first name and anything else I can remember about you, like the name of an employer. And, as long as you take the few minutes to update your profile page and contact info each time you change jobs, we’ll never lose touch with each other. Wow!

I’ve used LinkedIn to easily reconnect with old friends as well as to stay connected to colleagues, friends, and pretty much anyone who hands me their business card. Although I also have an offline address book, that’s now much smaller than it used to be–and just for tracking those phone numbers and e-mail addresses that I use on a weekly or monthly basis.

There are smartphone apps for LinkedIn and I have one on my iPhone, but I rarely use it. I don’t visit LinkedIn every day or even every week. Instead I visit the LinkedIn website in little bursts–such as just after a conference–or when I want to find someone’s phone number. I’ve also turned off most of their automatic e-mails at this point, though those can be useful prompts when you’re just getting started.

You can view my public profile at http://linkedin.com/in/netrinomike. If we’ve met somewhere (online or off), feel free to send me an invitation.

Twitter: my own private specialized news service

Twitter is something completely different. In fact, it is hard to describe what Twitter is, partly because it is many different things to many different people. For example, I often hear people say they don’t use Twitter because they don’t want to know what their friend Joe had for lunch. But I’ve been using Twitter for almost two years and have never learned what anyone had for lunch there.

Thus, rather than try to describe Twitter or its capabilities, I’ll just tell you how I use it as an engineer. I currently “follow” 276 Twitter users. Just a handful of these are “friends”, though a larger set are “acquaintances”; most I’ve never met. When one of the users that I follow writes something (in the lingo, “tweets”), I see it in a timeline of recent posts. All of the posts are short text (maximum 140 characters). I usually check in on this timeline once or twice a day, at which point I scan it for interesting bits of information; except for sometimes following links to longer articles, this activity takes 15 minutes a day, tops.

I DON’T follow users who tweet a lot (say, more than ten times per day) for long; a quick look suggests that the users I do follow average less than one tweet per day. And I DON’T follow users who tweet what they ate for lunch. In fact, I ONLY follow users who typically include a link in every tweet. That is, what they are doing is feeding me a headline of possible interest; if it is of interest and I have time, then I follow the link to read more.

The vast majority of the users I follow are in the embedded systems design community. Some are engineers. Some are marketers. Some sell tools that I use. Some are just in software or engineering more broadly. A few cover hobby interests of mine. The best tweeters always stay on topic, in their area of expertise–just as I try to do by posting from a narrower topic area than I read.

From reading these streams of short headlines I stay vastly more up to date on the technologies and products and subjects of most interest to me than was ever possible before. I’ve basically stopped reading newspaper websites and some blogs and read twitter instead. (But just like printed newspapers, when you don’t have time to keep up, the old stuff just drifts to the bottom of the stack where you may never get to it.)

By the way, I read and post Twitter messages almost exclusively from an app (Twitterrific) on my iPhone. I hardly ever visit the Twitter website directly. I prefer the user experience of the app and can easily find spare minutes to read from my phone while away from my desk.

You can view a timeline of my tweets at http://twitter.com/netrinomike. If you find the kinds of links I post there interesting, feel free to “follow” me. Unlike most other social networking services, you can follow anyone on Twitter just for knowing their handle.

Delicious: my Internet memory book

Delicious is an Internet bookmarking service that can be social if you want it to be. By bookmarking service I mean that it’s an alternative to the long list of bookmarks you’ve probably been keeping in your browser.

Instead, as I come across interesting web pages during Internet research, I save to Delicious those I think I may want to come back to later. There are a number of advantages to keeping bookmarks this way:

  • you can add notes to each bookmark
  • you can categorize (“tag”) each bookmark in as many ways as you want (e.g., “embedded” + “bloggers”)
  • you can search for a previous bookmark by keyword or tag
  • your bookmarks are not tied to a specific browser on a specific computer

After using Delicious for more than five years, I now keep just 12 bookmarks in my web browser. These are links that I use daily or weekly. One of those is a shortcut to add the page I’m on to Delicious; another opens my Delicious history.

Delicious can be social in that you can easily share links with friends, see what’s popular across all users, and things like that. I never use any of those features. (For one thing, what’s popular on the whole site never includes the stuff about embedded software that I’m most passionate about.) But although I don’t connect with other Delicious users much, I do make the majority of my bookmarks public, so you can browse or search them too.

You can see my public bookmarks at http://www.delicious.com/frappucino.

Please share your experiences with social networking and suggestions for other useful services in the comments below. (I do use Facebook, by the way, but not for professional purposes.)

Embedded Software is the Future of Product Quality and Safety

Monday, February 8th, 2010 Michael Barr

Last year a friend had a St. Jude pacemaker attached to his heart. When he reported an unexpected low-battery reading (displayed on an associated digital watch) to his doctor a month later, he learned it was the result of a firmware bug known to the manufacturer. The battery was fine and would last on the order of a decade more. His new-model pacemaker’s firmware didn’t include a bug fix that had been provided a year earlier to wearers of the older model.

Another friend owns a Land Rover LR2 SUV with back-up sensors. When the car is in reverse and nearing an obstacle or another car, the driver is alerted via a beeping sound. Except that the back-up sensors don’t always work. Some “reboots” of the SUV don’t seem to have this feature enabled. He suspects there is a “race condition” during the software startup sequence.
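The actual defect is unknown, and the startup-race diagnosis is only a suspicion. Purely as a hypothetical illustration of how a startup ordering problem can disable a feature for a whole drive, the C sketch below contrasts a task that decides once at boot whether the sensors are ready with one that checks every cycle. All names (`g_sensors_ready`, `alert_task_start_latched()`, and so on) are invented, and the “unlucky” boot order is forced deterministically rather than produced by a real scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag set by a hypothetical parking-sensor driver when its
 * hardware init completes. On a real RTOS this is written by one task and
 * read by another, so it would need proper synchronization (an event flag,
 * atomics, etc.); volatile alone is not enough there. */
static volatile bool g_sensors_ready = false;

void sensor_driver_init_done(void)
{
    g_sensors_ready = true;
}

/* Fragile pattern: the alert task decides once, at its own startup. */
static bool g_alert_enabled = false;

void alert_task_start_latched(void)
{
    /* If this runs before the driver finishes init, the feature stays off
     * until the next power cycle: the result depends on startup order. */
    g_alert_enabled = g_sensors_ready;
}

bool alert_active_latched(bool in_reverse)
{
    return in_reverse && g_alert_enabled;
}

/* Sturdier pattern: nothing is latched at boot; readiness is checked
 * every cycle, so a late driver init only delays the feature. */
bool alert_active_checked(bool in_reverse)
{
    return in_reverse && g_sensors_ready;
}
```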

Yet another friend has driven a Toyota Prius hybrid over 100,000 miles. He reports that the brakes very occasionally have an odd, different feel. But his older-model Prius is not expected to be subject to the 2010 model-year recall.

These are just a few of the personal anecdotes behind the headlines. Embedded software is everywhere now, with over 4 billion new devices manufactured each year. Increasingly, the quality and safety of products are a side effect of the quality and safety of the software embedded inside.

One important question is, can we trust future software updates any more than we can trust the existing firmware? How do we know that the Toyota Prius hybrids with upgraded braking firmware will be safer than those with the factory firmware?

Firmware Wall of Shame: Welch Allyn Defibrillator Recall

Tuesday, March 17th, 2009 Michael Barr

The FDA has just announced a Class I (the most serious human risk category) recall of the Welch Allyn AED 10 automatic external defibrillator.

Among the reasons for the recall are the following problems, each of which is either caused by an embedded software bug or is a hardware problem that can be fixed entirely through a firmware upgrade:

  • “Units serviced in 2007 and upgraded with software version 02.06.00 have a remote possibility of shut down during use in cold environmental conditions. There are no known injuries or deaths associated with this issue. The units will be updated with the current version of software.”
  • “All of the recalled units will be upgraded with software that corrects [another] unexpected shutdown problem. In the meantime … it is vital to follow the step 1-2-3 operating procedure which directs attachment of the pads after the device has been turned on. This procedure is described on the back of your device and also in the Quick Reference material inside the AED 10 case. Some pages in the user’s manual may erroneously describe or show illustrations of [a different] operating procedure… Please disregard these erroneous instructions.”

There has been at least one death at a time when the second type of unexpected software shutdown occurred. Are bugs in the embedded software to blame? Of what sort? Could the authors of that firmware be sued in relation to the death? Were they negligent? Are they sure that there are Zero Bugs (or even just fewer bugs) in the “current version of the software”?

Expect more of this type of firmware-involved death as embedded systems continue to proliferate.

Requirements vs. Design

Wednesday, February 4th, 2009 Michael Barr

Over the years, I have found that many engineers (as well as their managers) struggle to separate the various elements or layers of firmware engineering. For example, we are barraged with requests for “design reviews” that turn out to be “code reviews” because the customer is confused about the meaning of “design”.

In the hopes of clearing this up, I propose a concise set of definitions and an architectural analogy.

Requirements
The requirements are the WHAT of the system. A set of requirements is a list of statements each of which begins “The system shall…” Each such statement must be objective and testable. The requirements should not unnecessarily restrict the HOW of the architecture, design, or implementation.
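As a hedged illustration of “objective and testable,” the C sketch below pairs an invented requirement (REQ-017, not from any real project) with one possible implementation and a requirement-based test. The test exercises only externally visible behavior, including both sides of the threshold, so it would still apply if the HOW were completely redesigned. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical requirement REQ-017, invented for illustration:
 *   "The system shall sound the back-up alert whenever the transmission is
 *    in reverse and the nearest obstacle is closer than 100 cm."
 * It states WHAT must happen and can be checked objectively; it says
 * nothing about tasks, sensor drivers, or algorithms (the HOW).
 */
#define REQ017_ALERT_DISTANCE_CM 100

/* One possible implementation; the test below does not depend on its internals. */
bool backup_alert_on(bool in_reverse, int obstacle_cm)
{
    return in_reverse && obstacle_cm < REQ017_ALERT_DISTANCE_CM;
}

/* Requirement-based test: observable behavior only, both sides of the limit. */
void test_req017(void)
{
    assert(backup_alert_on(true, 99));    /* reverse, closer than 100 cm */
    assert(!backup_alert_on(true, 100));  /* reverse, not closer than 100 cm */
    assert(!backup_alert_on(false, 10));  /* not in reverse */
}
```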

Architecture
The architecture of a system is the outermost layer of HOW. The architecture is a block diagram. The architecture of a system describes dataflow and workflow partitioning at the hardware vs. software level. The architecture of firmware features subsystem-level blocks such as device drivers, middleware, RTOS, etc. The architecture does not include function or variable names. It should be extensible in the direction of anticipated future changes.

Analogy: An architect describes a new building very broadly. A scale model and drawings show the outer dimensions, foundation, and number of floors. The number of rooms and their specific uses are not included at this level.

Design
The design of a system is the middle layer of HOW. A firmware design document identifies finer structural details, such as the names and responsibilities of tasks within the specific subsystems or device drivers, the brand of RTOS (if one is used), and the various interfaces between subsystems. The design does include class, task, function, and variable names that must be agreed upon by all implementers.

Analogy: A designer describes the interior and exterior of the new building in finer detail than the architect. He locates and names the rooms and gives them purposes. The locations of pipes, vents, and outlets are not included at this level.

Implementation
An implementation is the lowest layer of HOW. There is no document, other than the source code or schematics, to describe the implementation details. If the interfaces are defined sufficiently at the design level above, individual engineers are able to begin implementation in parallel.

Analogy: The carpenter, plumber, and electrician work in parallel and apply their own judgement about the finer details of component placement.
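To make the design/implementation boundary concrete, here is a hypothetical C sketch. The enum and the two prototypes are the design-level names that every implementer must agree on (the named rooms); the ADC scaling, threshold, and helper function are implementation details left to one engineer (the pipes and outlets). The module name `battmon`, the 12-bit ADC, the 3300 mV reference, and the 2600 mV threshold are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ===== Design level: names agreed on by all implementers =====
 * (In a real project this part would live in a shared header.) */
typedef enum { BATT_OK, BATT_LOW } batt_status_t;

uint16_t      battmon_read_mv(void);        /* latest battery voltage in mV */
batt_status_t battmon_status(uint16_t mv);  /* classify one reading */

/* ===== Implementation level: details chosen by one engineer ===== */
#define ADC_MAX_COUNTS        4095u   /* assumed 12-bit ADC */
#define ADC_VREF_MV           3300u   /* assumed 3.3 V reference */
#define BATT_LOW_THRESHOLD_MV 2600u   /* assumed threshold */

/* Stand-in for a real ADC driver call; returns a fixed sample here. */
static uint16_t adc_raw_sample(void)
{
    return 3500u;
}

uint16_t battmon_read_mv(void)
{
    uint32_t raw = adc_raw_sample();
    assert(raw <= ADC_MAX_COUNTS);
    return (uint16_t)((raw * ADC_VREF_MV) / ADC_MAX_COUNTS);
}

batt_status_t battmon_status(uint16_t mv)
{
    return (mv < BATT_LOW_THRESHOLD_MV) ? BATT_LOW : BATT_OK;
}
```

Other engineers can code against `battmon_read_mv()` and `battmon_status()` as soon as those two prototypes are agreed, while the ADC details are still being worked out.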

Constructive feedback is welcome via the blog comments or e-mail.