
Real Men [Still] Program in C

Wednesday, March 29th, 2017 Michael Barr

It’s hard for me to believe, but it’s been nearly 8 years since I wrote the popular “Real Men Program in C” blog post (turned article). That post was prompted by a conversation with a couple of younger programmers who told me: “C is too hard for programmers of our generation to bother mastering.”

I ended then:

If you accept [] that C shall remain important for the foreseeable future and that embedded software is of ever-increasing importance, then you’ll begin to see trouble brewing. Although they are smart and talented computer scientists, [younger engineers] don’t know how to competently program in C. And they don’t care to learn.

But someone must write the world’s ever-increasing quantity of embedded software. New languages could help, but will never be retrofitted onto all the decades-old CPU architectures we’ll continue to use for decades to come. As turnover is inevitable, our field needs to attract a younger generation of C programmers.

What is the solution? What will happen if these trends continue to diverge?

Now that a substantial number of years has elapsed, I’d like to revisit two key questions raised by that quote: Is C still important? And is there a younger generation of C programmers? (There’s still no obvious sign of any popular “new language”, nor of any diminution of embedded systems.)

Is C Still Important?

The original post used survey data from 1997-2009 to establish that C was (through that entire era) the dominant programming language for embedded systems. The “primary” programming languages used in the final year were C (62%), C++ (24%), and Assembly (5%).

As the figure below shows (data from Barr Group’s 2017 Embedded Systems Safety & Security Survey), C has consolidated its dominance as the lingua franca of embedded programmers, now at 71%. Use of C++ remains at about the same level (22%), while use of assembly as the primary language has basically disappeared.

[Figure: Primary Programming Language]

Conclusion: Obviously, C is still important in embedded systems.

Is There a Younger Generation of C Programmers?

The next figure shows the years of paid, professional experience of embedded systems designers (data from the same source). Unfortunately, I don’t have data from the earlier period about the average ages of embedded programmers. What looks potentially telling, though, is that the average experience of American designers (two decades) is much higher than the averages in Europe (14 years) and Asia (11 years). I dug into the data on the U.S. engineers a bit and found that their experience curve was essentially flat, with no larger cohort of younger engineers like the one visible in the worldwide data.

[Figure: Years of Experience]

Conclusion: The jury is still out. It’s possible there is already a missing younger generation in the U.S., but there also seem to be younger engineers entering our field, at least in Asia.

It should be really interesting to see how this all plays out in the next 8 years. I’m putting a tickler in my to-do list to blog about this topic again then!

Footnote: Same as last time, I’m not excluding women. There are plenty of great embedded systems designers who are women–and they mostly program in C too, I presume.

Lethal Software Defects: Patriot Missile Failure

Thursday, March 13th, 2014 Michael Barr

During the Gulf War, twenty-eight U.S. soldiers were killed and almost one hundred others were wounded when a nearby Patriot missile defense system failed to properly track a Scud missile launched from Iraq. The cause of the failure was later found to be a programming error in the computer embedded in the Patriot’s weapons control system.

On February 25, 1991, Iraq successfully launched a Scud missile that hit a U.S. Army barracks near Dhahran, Saudi Arabia. The 28 deaths from that one Scud made it the single deadliest incident of the war for American soldiers. Interestingly, the “Dhahran Scud”, which killed more people than all 70 or so earlier Scud launches combined, was apparently the last Scud fired in the Gulf War.

Unfortunately, the “Dhahran Scud” succeeded where the other Scuds failed because of a defect in the software embedded in the Patriot missile defense system. This same bug was latent in all of the Patriots deployed in the region. However, the presence of the bug was masked by the fact that a particular Patriot weapons control computer had to run continuously for several days before the bug could manifest as a failure to track a Scud.

There is a nice, concise write-up of the problem, with prefatory background on how the Patriot system is designed to work, in the official post-failure analysis report by the U.S. General Accounting Office (GAO IMTEC-92-26), entitled “Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia”.

The hindsight explanation is that:

a software problem “led to an inaccurate tracking calculation that became worse the longer the system operated” and that “at the time of the incident, the [Patriot] had been operating continuously for over 100 hours”, by which time “the inaccuracy was serious enough to cause the system to look in the wrong place [in the radar data] for the incoming Scud.”

Detailed Analysis

The GAO report does not go into the technical details of the specific programming error. However, I believe we can infer the following from the information and data provided about the incident and the defect.

A first important observation is that the CPU was a 24-bit integer-only CPU “based on a 1970s design”. Befitting the time, the code was written in assembly language.

A second important observation is that real numbers (i.e., those with fractions) were apparently manipulated as a whole number in binary in one 24-bit register plus a binary fraction in a second 24-bit register. In this fixed-point numerical system, the real number 3.25 would be represented as binary 000000000000000000000011:010000000000000000000000, in which the : is my marker for the separator between the whole and fractional portions of the real number. The first half of that binary represents the whole number 3 (i.e., bits are set for 2 and 1, the sum of which is 3). The second portion represents the fraction 0.25 (i.e., 0/2 + 1/4 + 0/8 + …).
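To make that fixed-point format concrete, here is a minimal C sketch (my illustration, not Patriot code) that packs 3.25 into such a whole/fraction register pair:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* Illustration only: pack 3.25 into a pair of 24-bit "registers",
       one holding the whole part, the other the binary fraction. */
    double   value = 3.25;
    uint32_t whole = (uint32_t)value & 0xFFFFFFu;             /* 0x000003 */
    uint32_t frac  = (uint32_t)((value - (uint32_t)value)
                                * (1UL << 24)) & 0xFFFFFFu;   /* 0x400000 */

    /* 0x000003 is binary ...011 (the whole number 3); 0x400000 is
       binary 0100... (the fraction 1/4 = 0.25), as described above. */
    printf("whole: 0x%06" PRIX32 "  frac: 0x%06" PRIX32 "\n", whole, frac);
    return 0;
}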

A third important observation is that system [up]time was “kept continuously by the system’s internal clock in tenths of seconds [] expressed as an integer.” This is important because the fraction 1/10 cannot be represented exactly in a 24-bit binary fraction: its binary expansion (0.000110011001100…) repeats forever and never terminates.

I understand that the missile-interception algorithm that failed that day works approximately as follows (see the sketch after the list):

  1. Consider each object that might be a Scud missile in the 3-D radar sweep data.
  2. For each, calculate an expected next location at the known speed of a Scud (+/- an acceptable window).
  3. Check the radar sweep data again at a future time to see if the object is in the location a Scud would be.
  4. If it is a Scud, engage and fire missiles.
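Here is a minimal C sketch of step 3’s check. To be clear, this is my paraphrase, not Patriot code: the speed constant, the acceptance window, and all of the names (pos_t, confirm_scud, distance) are illustrative assumptions.

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { double x, y, z; } pos_t;

static double distance(pos_t a, pos_t b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* If dt comes from a drifted clock, the predicted travel distance is
   wrong and a real Scud falls outside the window: the system "looks
   in the wrong place" in the radar data. */
static bool confirm_scud(pos_t first, pos_t later, double dt)
{
    const double SCUD_SPEED_MPS = 1676.0;  /* approximate; assumed */
    const double WINDOW_M       = 500.0;   /* acceptance window; assumed */

    return fabs(distance(first, later) - SCUD_SPEED_MPS * dt) <= WINDOW_M;
}

int main(void)
{
    pos_t a = { 0.0, 0.0, 20000.0 };
    pos_t b = { 1600.0, 0.0, 19500.0 };  /* ~1676 m from a */

    printf("dt = 1.00 s: %s\n", confirm_scud(a, b, 1.00) ? "Scud" : "miss");
    printf("dt = 0.66 s: %s\n", confirm_scud(a, b, 0.66) ? "Scud" : "miss");
    return 0;
}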

Furthermore, the GAO reports that the problem was an accumulating linear error of 0.003433 seconds per 1 hour of uptime, which affected every deployed Patriot equally; this was not a clock-specific or system-specific issue.
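That figure is consistent with the constant 1/10 having been chopped to about 23 bits of binary fraction. Here is a small C sketch that reproduces the GAO’s numbers; the 23-bit chop is my reconstruction, since the report gives only the resulting drift:

#include <stdio.h>

int main(void)
{
    /* 1/10 chopped to 23 fractional bits (a plausible reconstruction). */
    double chopped      = (double)(long)(0.1 * (1L << 23)) / (1L << 23);
    double err_per_tick = 0.1 - chopped;          /* ~9.5e-8 s per 0.1 s tick */
    double drift_hour   = err_per_tick * 36000.0; /* 36,000 tenths per hour */

    printf("chopped 1/10:   %.10f\n", chopped);             /* 0.0999999046 */
    printf("drift per hour: %.6f s\n", drift_hour);         /* ~0.003433 */
    printf("drift at 100 h: %.2f s\n", drift_hour * 100.0); /* ~0.34 */
    return 0;
}

At Scud closing speeds of roughly 1,700 m/s, a timing error of about a third of a second displaces the expected target position by several hundred meters: easily enough to “look in the wrong place” in the radar data.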

Given all of the above, I reason that the problem was that one part of the Scud-interception calculation used time in its decimal representation while another used the fixed-point binary representation. While uptime was still low, the discrepancy was small: targets were still found where and when they were expected, and the latent software bug stayed hidden.

Of course, all of the above detail is specific to the Patriot hardware and software design that was in use at the time of the Gulf War. As the Patriot system has since been modernized by Raytheon, many details like these will have likely changed.

According to the GAO report:

Army officials [] believed the Israeli experience was atypical [and that] other Patriot users were not running their systems for 8 or more hours at a time. However, after analyzing the Israeli data and confirming some loss in targeting accuracy, the officials made a software change which compensated for the inaccurate time calculation. This change allowed for extended run times and was included in the modified software version that was released [9 days before the Dhahran Scud incident]. However, Army officials did not use the Israeli data to determine how long the Patriot could operate before the inaccurate time calculation would render the system ineffective.

Four days before the deadly Scud attack, the “Patriot Project Office [in Huntsville, Alabama] sent a message to Patriot users stating that very long run times could cause [targeting problems].” That was about the time of the last reboot of the Patriot missile that failed.

Note that if the time samples had all been in the decimal timebase, or all in the binary timebase, the errors in the two compared radar samples would have largely cancelled, and the discrepancy would not have accumulated with uptime. That is the likely fix that was implemented.
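A small C sketch of that cancellation (again, my reconstruction, not Patriot code): converting both timestamps with the same constant, even a slightly wrong one, yields an accurate difference, while mixing conversions yields an error that grows with absolute uptime:

#include <stdio.h>

int main(void)
{
    double chopped  = (double)(long)(0.1 * (1L << 23)) / (1L << 23);
    double accurate = 0.1;

    long t1 = 100L * 3600L * 10L;  /* 100 hours of uptime, in tenths */
    long t2 = t1 + 5;              /* second radar sample, 0.5 s later */

    /* Mixed timebases: the error grows with absolute uptime. */
    double dt_mixed = (double)t2 * accurate - (double)t1 * chopped;

    /* Consistent timebase: the per-tick errors cancel in the difference. */
    double dt_same = (double)(t2 - t1) * chopped;

    printf("dt mixed: %.6f s\n", dt_mixed);  /* ~0.843 instead of 0.5 */
    printf("dt same:  %.6f s\n", dt_same);   /* ~0.500 */
    return 0;
}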

Firmware Updates

Here are a few tangentially interesting tidbits from the GAO report:

  • “During the [Gulf War] the Patriot’s software was modified six times.”
  • “Patriots had to be shut down for at least 1 to 2 hours to install each software modification.”
  • “Rebooting[] takes about 60 to 90 seconds” and sets the “time back to zero.”
  • The “[updated] software, which compensated for the inaccurate time calculation, arrived in Dhahran” the day after the deadly attack.

Public Statements

In hindsight, there are some noteworthy quotes from the 1991 news articles initially reporting on this incident. For example,

Brig. Gen. Neal, United States Command (2 days after):

The Scud apparently fragmented above the atmosphere, then tumbled downward. Its warhead blasted an eight-foot-wide crater into the center of the building, which is three miles from a major United States air base … Our investigation looks like this missile broke apart in flight. On this particular missile it wasn’t in the parameters of where it could be attacked.

U.S. Army Col. Garnett, Patriot Program Director (4 months after):

The incident was an anomaly that never showed up in thousands of hours of testing and involved an unforeseen combination of dozens of variables — including the Scud’s speed, altitude and trajectory.

Importantly, the GAO report states that, a few weeks before the Dhahran Scud, Israeli soldiers reported to the U.S. Army that their Patriot had a noticeable “loss in accuracy after … 8 consecutive hours.” Thus, apparently, all of those “thousands of hours” of testing involved frequent reboots. (I can envision the test documentation now: “Step 1: Power up the Patriot. Step 2: Check that everything is perfect. Step 3: Fire the dummy target.”) The GAO reported that “an endurance test has [since] been conducted to ensure that extended run times do not cause other system difficulties.”

Note too that the “thousands of hours of testing” quote was misleading in another way: according to the GAO report, the Patriot software had been hurriedly modified in the months leading up to the Gulf War to track Scud missiles traveling about 2.5 times faster than the aircraft and cruise missiles the system was originally designed to intercept. Improvements to the Scud-specific tracking and engagement algorithms were apparently still being made during the war.

These specific theories and statements about what went wrong, or why it must have been a problem outside the Patriot itself, were fully discredited once the source code was examined. When computer systems may have misbehaved in a lethal manner, it is important to remember that newspaper quotes from those on the side of the designers are not scientific evidence. Indeed, the humans who offer those quotes often have conscious and/or subconscious motives and blind spots that incline them toward false overconfidence in the computer systems. A thorough source code review takes time, but it is the scientific way to find the root cause.

As a New York Times editorial dated 4 months after the incident explained:

The Pentagon initially explained that Patriot batteries had withheld their fire in the belief that Dhahran’s deadly Scud had broken up in midflight. Only now does the truth about the tragedy begin to emerge: A computer software glitch shut down the Patriot’s radar system, blinding Dhahran’s anti-missile batteries. It’s not clear why, even after Army investigators had reached this conclusion, the Pentagon perpetuated its fiction

At least in this case, it was only a few months before the U.S. Army admitted, to itself and to the public, the truth about what happened. That is to the U.S. Army’s credit. Other actors in other lethal software defect cases have been far more reluctant to admit what later became clear about their systems.

Apple’s #gotofail SSL Security Bug was Easily Preventable

Monday, March 3rd, 2014 Michael Barr

If the programmers at Apple had simply followed a couple of the rules in the Embedded C Coding Standard, they could have prevented the very serious #gotofail SSL bug from entering the iOS and OS X operating systems. Here’s a look at the programming mistakes involved and the easy-to-follow coding standard rules that could easily have prevented the bug.

In case you haven’t been following the computer security news, Apple last week posted security updates for users of devices running iOS 6, iOS 7, and OS X 10.9 (Mavericks). This was prompted by a critical bug in Apple’s implementation of the SSL/TLS protocol, which has apparently been lurking for over a year.

In a nutshell, the bug is that a bunch of important C source code lines containing digital signature certificate checks were never being run because an extraneous goto fail; statement in a portion of the code was always forcing a jump. This is a bug that put millions of people around the world at risk for man-in-the-middle attacks on their apparently-secure encrypted connections. Moreover, Apple should be embarrassed that this particular bug also represents a clear failure of software process at Apple.

There is debate about whether this may have been a clever insider-enabled security attack against all of Apple’s users, e.g., by a certain government agency. However, whether it was an innocent mistake or an attack designed to look like an innocent mistake, Apple could have and should have prevented this error by writing the relevant portion of code in a simple manner that would have always been more reliable as well as more secure. And thus, in my opinion, Apple was clearly negligent.

Here are the lines of code at issue (from Apple’s open source code server), with the extraneous goto flagged by a comment:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, ...)
{
    OSStatus  err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;    /* BUG: unconditional duplicate; always jumps, skipping the final check */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The code above violates at least two rules from Barr Group’s Embedded C Coding Standard book. Importantly, had Apple followed the first of these rules in particular, this dangerous bug almost certainly would have been prevented from ever getting into even a single device.

Rule 1.3.a

Braces shall always surround the blocks of code (a.k.a., compound statements), following if, else, switch, while, do, and for statements; single statements and empty statements following these keywords shall also always be surrounded by braces.

Had Apple not violated this always-braces rule in the SSL/TLS code above, there would have been either just one set of curly braces after each if test or a very odd-looking, hard-to-miss chunk of code with two sets of curly braces (and two gotos) after a single if. Either way, this bug was preventable by following this rule and performing code review.
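For illustration, here is how the same checks read with mandatory braces. This is a sketch based on Apple’s published snippet, not a proposed patch; note how the duplicated goto becomes either harmless or glaring:

if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
{
    goto fail;
}
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
{
    goto fail;
    goto fail;  /* duplicate is now harmless: it only runs on error */
}
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
{
    goto fail;
}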

Rule 1.7.c

The goto keyword shall not be used.

Had Apple not violated this never-goto rule in the SSL/TLS code above, there would not have been a double goto fail; line to create the unreachable-code situation. And if eliminating goto had forced each error check to span more than one line of code, that in turn would have pushed the programmers toward curly braces.
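As one illustration, here is a minimal goto-free restructuring of the same checks. It is a sketch, not Apple’s code: each hash step runs only if every prior step succeeded, and the buffers are freed exactly once on every path:

OSStatus err = SSLHashSHA1.update(&hashCtx, &serverRandom);

if (err == 0)
{
    err = SSLHashSHA1.update(&hashCtx, &signedParams);
}
if (err == 0)
{
    err = SSLHashSHA1.final(&hashCtx, &hashOut);
}

/* Cleanup runs unconditionally; err still reports the first failure. */
SSLFreeBuffer(&signedHashes);
SSLFreeBuffer(&hashCtx);
return err;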

On a final note, Apple should be asking its engineers and engineering managers about the failures of process (at several layers) that must have occurred for this bug to reach end users’ devices. Specifically:

  • Where was the peer code review that should have spotted this, or how did the reviewers fail to spot this?
  • Why wasn’t a coding standard rule adopted to make such bugs easier to spot during peer code reviews?
  • Why wasn’t a static analysis tool, such as Klocwork, used, or how did it fail to detect the unreachable code that followed? Or was it users of such a tool, at Apple, who failed to act?
  • Where was the regression test case for a bad SSL certificate signature, or how did that test fail?

Dangerous bugs, like this one from Apple, often result from a combination of accumulated errors in the face of flawed software development processes. Too few programmers recognize that many bugs can be kept entirely out of a system simply by adopting (and rigorously enforcing) a coding standard that is designed to keep bugs out.

An Update on Toyota and Unintended Acceleration

Saturday, October 26th, 2013 Michael Barr

In early 2011, I wrote a couple of blog posts (here and here) as well as a later article (here) describing my initial thoughts on skimming NASA’s official report on its analysis of Toyota’s electronic throttle control system. Half a year later, I was contacted and retained by attorneys for numerous parties involved in suing Toyota for personal injuries and economic losses stemming from incidents of unintended acceleration. As a result, I got to look at Toyota’s engine source code directly and judge for myself.

Since January 2012, I have led a team of seven experienced engineers, including three others from Barr Group, in reviewing Toyota’s electronic throttle source code (and some other source code), as well as related documents, in a secure room near my home in Maryland. This work proceeded in two rounds: a first round of expert reports and depositions, issued in July 2012, led to a billion-dollar economic-loss settlement as well as an undisclosed settlement of the first personal injury case set for trial in U.S. Federal Court. The second round began with my formal written expert report of over 750 pages, in April 2013, and culminated this week in an Oklahoma jury’s decision that the multiple defects in Toyota’s engine software directly caused a September 2007 single-vehicle crash that injured the driver and killed her passenger.

It is significant that this was the first and only jury so far to hear any opinions about Toyota’s software defects. Earlier cases either predated our source code access, applied a non-software theory, or were settled by Toyota for undisclosed sums.

In our analysis of Toyota’s source code, we built upon the prior analysis by NASA. First, we looked more closely at more lines of the source code, for more vehicles, over more man-months. We also did a lot of things that NASA didn’t have time to do, including reviewing the internals of Toyota’s operating system, reviewing the source code for Toyota’s “monitor CPU”, performing an independent worst-case stack depth analysis, running portions of the main CPU software (including the RTOS) in a processor simulator, and demonstrating, in 2005 and 2008 Toyota Camry vehicles, a link between loss of throttle control and the numerous defects we found in the software.

In a nutshell, the team led by Barr Group found what the NASA team sought but couldn’t find: “a systematic software malfunction in the Main CPU that opens the throttle without operator action and continues to properly control fuel injection and ignition” that is not reliably detected by any fail-safe. To be clear, NASA never concluded software wasn’t at least one of the causes of Toyota’s high complaint rate for unintended acceleration; they just said they weren’t able to find the specific software defect(s) that caused unintended acceleration. We did.

Now it’s your turn to judge for yourself. Though I don’t think you can find my expert report outside the Court system, here are links to the trial transcript of my expert testimony to the Oklahoma jury and a (redacted) copy of the slides I shared with the jury in Bookout, et al. v. Toyota.

Note that the jury in Oklahoma found that Toyota owed each victim $1.5 million in compensatory damages and also found that Toyota acted with “reckless disregard”. The latter legal standard meant the jury was headed toward deliberations on additional punitive damages when Toyota called the plaintiffs to settle (for yet another undisclosed amount). It has been reported that an additional 400+ personal injury cases are still working their way through various courts.


Updates

On December 13, 2013, Toyota settled the case that was set for the next trial, in West Virginia in January 2014, and announced an “intensive” settlement process to try to resolve approximately 300 of the remaining personal injury cases, which are consolidated in U.S. and California courts.

Toyota continues to publicly deny there is a problem and seems to have no plans to address the unsafe design and inadequate fail-safes in its drive-by-wire vehicles, whose electronics and software design is similar across most of the Toyota and Lexus (and possibly Scion) vehicles manufactured over roughly the last ten model years. Meanwhile, incidents of unintended acceleration continue to be reported in these vehicles (see also the NHTSA complaint database), and these new incidents, when injuries are severe, continue to result in new personal injury lawsuits against Toyota.

In March 2014, the U.S. Department of Justice announced a $1.2 billion settlement in a criminal case against Toyota. As part of that settlement, Toyota admitted to past lying to NHTSA, Congress, and the public about unintended acceleration and also to putting its brand before public safety. Yet Toyota still has made no safety recalls for the defective engine software.

On April 1, 2014, I gave a keynote speech at the EE Live conference, which touched on the Toyota litigation in the context of lethal embedded software failures of the past and the coming era of self-driving vehicles. The slides from that presentation are available for download at http://www.barrgroup.com/killer-apps/.

On September 18, 2014, Professor Phil Koopman, of Carnegie Mellon University, presented a talk about his public findings in these Toyota cases entitled “A Case Study of Toyota Unintended Acceleration and Software Safety”.

On October 30, 2014, Italian computer scientist Roberto Bagnara presented a talk entitled “On the Toyota UA Case and the Redefinition of Product Liability for Embedded Software” at the 12th Workshop on Automotive Software & Systems, in Milan.

Intellectual Property Protections for Embedded Software: A Primer

Tuesday, June 11th, 2013 Michael Barr

My experiences as a testifying expert witness in numerous lawsuits involving software and source code have taught me a thing or two about the various intellectual property protections that are available to the creators of software. These are areas of the law that you, as an embedded software engineer, should probably know at least a little about. Hence, this primer.

Broadly speaking, software is protectable under three areas of intellectual property law: patent law, copyright law, and trade secret law. Each of these areas of the law protects your software in a different way and you may choose to rely on none, some, or all three such protections. (The name of your product may also be protectable by trademark law, though that has nothing specifically to do with software.)

Embedded Software and Patent Law

Patent law can be used to protect one or more innovative IDEAS that your product uses to get the job done. If you successfully patent a mathematical algorithm specific to your product domain (e.g., an algorithm for detecting or handling a specific arrhythmia used in your pacemaker) then you own a (time-limited) monopoly on that idea. If you believe another company is using the same algorithm in their product then you have the right to bring an infringement suit (e.g., in the ITC or U.S. District Court).

In the process of such a suit, the competitor’s schematics, source code, and design documents will generally be made available to independent expert witnesses (i.e., not to you directly). The expert(s) will then spend time reviewing the competitor’s source code to determine if one or more of the claims of the asserted patent(s) is infringed. It is a useful analogy to think of the claims of a patent as a description of the boundaries of real property and of infringement of the patent as trespassing.

Patents protect ideas regardless of how they are expressed. For example, you may have heard about (purely) “software patents” being new and somewhat controversial. However, the patents that protect most embedded systems typically cover a combination of at least electronics and software. Patent protection is typically broad enough to cover purely hardware, purely software, and mixed hardware-software implementations. Thus the protection can span a range of hardware/software decompositions and applies within software even when the programming languages and/or function and variable names differ.

To apply for a patent on your work you must file certain paperwork with and pay registration fees to the U.S. Patent and Trademark Office. This process generally begins with a prior art search conducted by an attorney and takes at least several years to complete. You should expect the total cost (not including your own time), per patent, to be measured in the tens of thousands of dollars.

Embedded Software and Copyright Law

Copyright law can be used to protect one or more creative EXPRESSIONS that the authors of the source code employed to get the job done. Unlike patent law, copyright law cannot be used to protect ideas or algorithms; it protects only the specific, creative ways in which you choose to implement those ideas. Indeed, if there are only one or a handful of ways to implement a particular algorithm, or only one way to do so efficiently or in your chosen language, you may not be able to protect that aspect of your software with copyright.

The attorneys in a source code copyright infringement lawsuit wind up arguing over two primary issues. First, they argue which individual parts of the source code (e.g., function prototypes in an API) are protectable because they are sufficiently creative. The judge generally decides this issue, based on expert analysis. Second, they argue how the selection and arrangement of these individually protectable “islands” together shows a pattern of “substantial similarity”. The jury decides that.

Source code copyright infringement is easiest to prove when the two programs have source code that looks similar in some important way. That is, when the programming languages are the same and the function and variable names are similar. However, it is rare that the programs are identical in every detail. Thus, due to the possibility of the accused software developers independently creating something similar by coincidence rather than malfeasance, the legal standard for proving copyright infringement is much higher when it cannot be shown that the defendants had “access” to some version of the source code.

Unlike patents, copyrights need not be applied for or awarded. You, or your employer, own a copyright in your work merely by creating it, whether or not you write “Copyright (c) 2013 by MyCompany, Inc.” at the top of every source code file. However, there are some advantages to registering your copyright in a work of software (by submitting a sample) with the U.S. Copyright Office before any alleged infringement occurs. Even if you outsource it to an attorney, the cost of registering a copyright should be about a thousand dollars at most.

As source code frequently changes and new versions will inevitably be released, you should be reassured that a single copyright extends to “derivative works”, which generally include later versions of the software. You don’t have to keep registering every minor release with the Copyright Office. And, very importantly, the binary executable version of your software (e.g., the contents of flash memory or a library of object code) also receives copyright protection as a derivative work of the source code. Thus someone who copies your binary can be found to have infringed your copyright.

Interestingly, both patent law and copyright law are called for in the U.S. Constitution. However, of course, the extension of these areas of law to software is a modern development.

Embedded Software and Trade Secret Law

Unlike patent and copyright law, which each at best protects only a portion (“islands”) of your source code, trade secret law can be used to protect the entirety of the SECRETS within the source code. Secrets need not be innovative ideas nor creative expressions. The key requirement for this area of law to apply is that you take reasonable steps to keep the source code “secret”. So, for example, though open source software may be protectable by patent law and copyright law it cannot be protected by trade secret law due to the lack of secrecy.

You may think that there is a fundamental conflict between registering the copyright in your software, which requires submitting a copy to the government, and keeping your source code secret. However, the U.S. Copyright Office only requires that a small portion of the source code of your program be filed to successfully identify the copyrighted software and its owner; the vast majority of the source code need not be submitted.

Preserving this secrecy is one of the reasons for the inconveniences software developers often encounter at the companies that employ them (e.g., not being able to take source code home), as well as for certain terms of their employment agreements. Protecting software like the secret formula for Coca-Cola or Krabby Patties helps an owner prove that the source code is a trade secret, and thus opens the door to this additional legal basis for bringing a lawsuit against a competitor. The trade secrets cases I have been involved with as an expert have involved allegations that one or more insiders left a company and subsequently misappropriated its software secrets to compete via a startup or an existing competitor.

Final Thoughts

In my work as an expert, I always look to the attorneys for more precise definitions of legal terms. Importantly, there are many terms and concepts I have purposefully avoided using here to keep this at an introductory level of detail. You should, of course, always consult with an attorney about your specific situation. You should never simply rely on what you read on the Internet. Hopefully, there is enough information in this primer to help you at least understand the types of protections potentially available to you and to find a lawyer who specializes in the right field.