Posts Tagged ‘ethics’

Embedded Software is the Future of Product Quality and Safety

Monday, February 8th, 2010 Michael Barr

Last year a friend had a St. Jude pacemaker attached to his heart. When he reported an unexpected low battery reading (displayed on an associated digital watch) to his doctor a month later, he learned this was the result of a firmware bug known to the manufacturer. The battery was fine and would last on the order of a decade more. His new-model pacemaker’s firmware didn’t include a bug fix that was provided the year before to wearers of old-model.

Another friend owns a Land Rover LR2 SUV with back-up sensors. When the car is in reverse and nearing an obstacle or another car, the driver is alerted via a beeping sound. Except that the back-up sensors don’t always work. Some “reboots” of the SUV don’t seem to have this feature enabled. He suspects there is a “race condition” during the software startup sequence.

Yet another friend has driven a Toyota Prius hybrid over 100,000 miles. He reports that the brakes very occasionally have an odd/different feel. But his older model Prius is not expected to be subject to the 2010 model year recall.

These are just a few of the personal anecdotes behind the headlines. Embedded software is everywhere now, with over 4 billion new devices manufactured each year. Increasingly the quality and safety of products is a side-effect of the quality and safety of the software embedded inside.

One important question is, can we trust future software updates any more than we can trust the existing firmware? How do we know that the Toyota Prius hybrids with upgraded braking firmware will be safer than those with the factory firmware?

Embedded Programmers Worldwide Earn Failing Grades in C and C++

Tuesday, November 24th, 2009 Michael Barr

In industry surveys, over 80% of embedded software developers report using C or C++ as their primary programming language. Yet as a group, these programmers earned a failing grade on a multiple-choice quiz testing firmware-related C programming skills. A scary result, considering that embedded software inside medical devices, industrial controls, anti-lock brakes, and cockpits place human lives at risk every day.

In a February 2008 blog post, I examined the first few hundred results from the “Embedded C Quiz” on the Netrino website. That analysis compared the performance of programmers in the U.S. and India with the rest of the world (the only three data sets large enough for meaningful analysis). I concluded that the average embedded programmers in the U.S. and India don’t know C very well, but do know it better than programmers in the rest of the world.

Two years now since launching the quiz, we have collected thousands of data points, so it’s time for an update on programmer performance. In total, 3,870 programmers have taken the short 10-question multiple-choice C skills test. A few (a bit less than 3%) didn’t answer all of the questions; the analysis below is based on just the 3,755 completed quizzes. (Note that each website user can only take the quiz once.)

Across all countries, the mean result was 60.8%–a grade of ‘D-’ at best. That is to say that the average embedded programmer answered just 6 out of 10 multiple-choice questions correctly. A rather scary fact, given that C is the language of choice for most embedded projects and that C++ is even harder to master.

Programmers in the United States scored slightly above average. But they still earned a failing grade of 61.8%. Programmers in India scored slightly below the worldwide average, at 58.9%. Together, programmers from these two large English-speaking countries accounted for the majority of all quiz takers.

The number of completed quizzes, mean scores, and standard deviations for all countries with more than 20 completed quizzes are shown in the table below, sorted by average score. In general, programmers from European countries scored best.

Country Completed Mean Std Dev
Poland 23 68.7 19.2
Sweden 26 67.7 15.8
Australia 45 67.3 22.3
Germany 57 67.2 17.2
France 35 66.9 24.0
United Kingdom 109 66.1 22.8
Spain 24 65.0 18.3
Canada 114 64.5 19.3
China 51 64.1 23.4
Israel 22 62.3 21.7
United States 1346 61.8 20.4
Egypt 28 59.3 22.8
India 1288 58.9 22.4
Romania 45 58.9 23.0
Singapore 24 58.3 20.1
Italy 44 56.4 20.8
Turkey 57 55.6 23.3
Brazil 47 55.1 24.1
Pakistan 25 44.0 21.7

How are your embedded C programming skills? Test them by taking the Embedded C Quiz yourself now at http://www.netrino.com/Embedded-Systems/Embedded-C-Quiz?

P.S.  We recently launched an Embedded C++ Quiz and the results so far look downright abysmal.  I’ll write something about that in a future post.  Do you have a few minutes to take that one too?

Breathalyzer Source Code Analysis

Thursday, November 5th, 2009 Michael Barr

Firmware bugs seem to be everywhere these days. So much so that firmware source code analysis is even entering the courtroom in criminal cases involving data collection devices with software inside. Consider the precedent-setting case of the Alcotest 7110. After a two-year legal fight, seven defendants in New Jersey DUI cases successfully won the right to have their experts review the source code for the Alcotest firmware.

The state and the defendants both ultimately produced expert reports evaluating the quality of the firmware source code. Though each side’s experts reached divergent opinions as to the overall code quality, several facts seem to have emerged as a result of the analysis:

- Of the available 12-bits of A/D precision, just 4-bits (most-significant) are used in the actual calculation. This sorts each raw blood-alcohol reading into one of 16 buckets. (I wonder how they biased the rounding on that.)
- Out of range A/D readings are forced to the high or low limit. This must happen with at least 32 consecutive readings before any flags are raised.
- There is no feedback mechanism for the software to ensure that actuated devices, such as an air pump and infrared sensor, are actually on or off when they are supposed to be.
- The software first averages the initial two readings. Then it averages the third reading with that average. Then the fourth reading is averaged in, etc. No comments or documentation explains the use of this formula, which causes the final reading to have a weight of 0.5 in the final value and the one before that to have a weight of 0.25, etc.
- Out of range averages are forced to the high or low limit too.
- Static analysis with lint produced over 19,000 warnings about the code (that’s about three errors for ever five lines of source code).

What would you infer about the reliability of a defendant’s blood-alcohol level if you were on that jury? If you’re so inclined you can read the full expert reports for yourself: at defendants and state.

This Code Stinks! The Worst Embedded Code Ever

Thursday, November 5th, 2009 Michael Barr

At the Embedded Systems Conference Boston in September, I gave a popular ESC Theater talk titled “This Code Stinks! The Worst Embedded Code Ever” that used lousy code from real products as a teaching tool. The example code was gathered by a number of engineers from a broad swath of companies over several years. (Minor details, including variable names and function names, were changed as needed to hide the specifics of applications, companies, or programmers.)

Here’s just one example of the bad code in that presentation:

y = (x + 305) / 146097 * 400 + (x + 305) % 146097 / 36524 * 100 + (x + 305) % 146097 % 36524 / 1461 * 4 + (x + 305) % 146097 % 36524 % 1461 / 365;

I don’t know if the above snippet contains any bugs, as most of the other examples were found to. And that’s a problem. Where are we supposed to begin an analysis of the above? What is this code supposed to do when it works? What range of input values is appropriate to test? What are the correct output values for a given input? Is this code responsible for handling out of range inputs gracefully? In the original listing, there were no comments on or around this line to help.

I eventually learned that this code computes the year, with accounting for extra days in leap years, given the number of days since a known reference date (e.g., January 1, 1970). But I note that we still don’t know if it works in all cases; despite it being present in an FDA-regulated medical device. I note too that the Microsoft Zune Bug was buried in a much better formatted snippet of code that performed a very similar calculation.

Here’s another example, this time in C++, with the bug finding left as an exercise for the reader:

bool Probe::getParam(uint32_t param_id, int32_t idx)
{
int32_t val = 0;
int32_t ret = 0;

ret = m_pParam->readParam(param_id, idx, &val);

if (!ret)
{
logMsg(“attempt to read parameter failed\n”);
exit(1);
}
else …

Hint: This code was embedded in a piece of factory automation equipment.

I’ve placed the full set of slides online at http://bit.ly/badcode.

Firmware Disasters

Tuesday, June 23rd, 2009 Michael Barr

First, an Airbus A330 fell out of the sky. Then two D.C. Metro trains collided. Several hundred people have been killed and injured in these disastrous system failures. Did bugs in embedded software play a role in either or both disasters?

An incident on an earlier (October 2006) Airbus A330 flight may offer clues to the crash of Air France 447:

Qantas Flight 72 had been airborne for three hours, flying uneventfully on autopilot from Singapore to Perth, Australia. But as the in-flight dinner service wrapped up, the aircraft’s flight-control computer went crazy. The plane abruptly entered a smooth 650-ft. dive (which the crew sensed was not being caused by turbulence) that sent dozens of people smashing into the airplane’s luggage bins and ceiling. More than 100 of the 300 people on board were hurt, with broken bones, neck and spinal injuries, and severe lacerations splattering blood throughout the cabin. (Article, Time Magazine, June 3, 2009)

Authorities have blamed a pair of simultaneous computer failures for that event in the fly-by-wire A330. First, one of three redundant air data inertial reference units (ADIRUs) began giving bad angle of attack (AOA) data. Simultaneously, a voting algorithm intended to handle precisely such a failure in 1 of the 3 units by relying only on the other matching data failed to work as designed; the flight computer instead made decisions only on the basis of the one failed ADIRU!

(A later analysis by Airbus “found data fingerprints suggesting similar ADIRU problems had occurred on a total of four flights. One of the earlier instances, in fact, included a September 2006 event on the same [equipment] that entered the uncommanded dive in October [2006].” Ibid.)

Much of the attention in the publicly disclosed details of the Air France 447 crash has focused on the failure of one of several air speed indicators. Were there three of those as well? If so, was the same flight computer to blame for failing to recognize which to trust and which was unreliable?

It is very early in the investigation of yesterday’s collision between two D.C. Metro red line trains, in which a stopped train was rear-ended and heavily damaged by a moving train on the same track, to place blame. But a WashingtonPost.com article headlined “Collision was Supposed to be Impossible” says it all:

Metro was designed with a fail-safe computerized signal system that is supposed to prevent trains from colliding.

and

During morning and afternoon rush hours, all trains except longer eight-car trains typically operate in automatic mode, meaning their movements are controlled by computerized systems and the central Operations Control Center. Both trains in yesterday’s crash [about 5pm] were six-car trains. (Article, Washington Post, June 23, 2009)

Are bugs in embedded software to blame for these two disasters? You can bet the lawyers are already looking into it.