Posts Tagged ‘opensource’

Apple’s #gotofail SSL Security Bug was Easily Preventable

Monday, March 3rd, 2014 Michael Barr

If programmers at Apple had simply followed a couple of the rules in the Embedded C Coding Standard, they could have prevented the very serious Gotofail SSL bug from entering the iOS and OS X operating systems. Here’s a look at the programming mistakes involved and the easy-to-follow coding standard rules that could easily have prevented the bug.

In case you haven’t been following the computer security news, Apple last week posted security updates for users of devices running iOS 6, iOS 7, and OS X 10.9 (Mavericks). This was prompted by a critical bug in Apple’s implementation of the SSL/TLS protocol, which has apparently been lurking for over a year.

In a nutshell, the bug is that a bunch of important C source code lines containing digital signature certificate checks were never being run because an extraneous goto fail; statement in a portion of the code was always forcing a jump. This is a bug that put millions of people around the world at risk for man-in-the-middle attacks on their apparently-secure encrypted connections. Moreover, Apple should be embarrassed that this particular bug also represents a clear failure of software process at Apple.

There is debate about whether this may have been a clever insider-enabled security attack against all of Apple’s users, e.g., by a certain government agency. However, whether it was an innocent mistake or an attack designed to look like an innocent mistake, Apple could have and should have prevented this error by writing the relevant portion of code in a simple manner that would have always been more reliable as well as more secure. And thus, in my opinion, Apple was clearly negligent.

Here are the lines of code at issue (from Apple’s open source code server), with the extraneous goto marked by a comment:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, ...)
{
    OSStatus  err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;    /* extraneous duplicate line */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The code above violates at least two rules from Barr Group’s Embedded C Coding Standard book. Importantly, had Apple followed just the first of these rules, this dangerous bug almost certainly would have been prevented from ever getting into even a single device.

Rule 1.3.a

Braces shall always surround the blocks of code (a.k.a., compound statements) following if, else, switch, while, do, and for statements; single statements and empty statements following these keywords shall also always be surrounded by braces.

Had Apple not violated this always-braces rule in the SSL/TLS code above, there would have been either just one set of curly braces after each if test or a very odd-looking, hard-to-miss chunk of code with two sets of curly braces after the if with two gotos. Either way, this bug was preventable by following this rule and performing code review.
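
To make the difference concrete, here is a minimal standalone sketch, with hypothetical step_one/step_two/step_three helpers standing in for the hash calls above (this is not Apple’s actual code), of how the always-braces rule defuses an accidentally duplicated line:

#include <stdio.h>

/* Hypothetical helpers standing in for the SSLHashSHA1 calls above. */
static int step_one(void)   { return 0; }
static int step_two(void)   { return 0; }
static int step_three(void) { return 0; }

static int verify(void)
{
    int err;

    if ((err = step_one()) != 0) {
        goto fail;
    }
    if ((err = step_two()) != 0) {
        goto fail;
        goto fail;   /* the duplicated line now sits harmlessly inside the braces */
    }
    if ((err = step_three()) != 0) {
        goto fail;
    }

    printf("all signature checks ran\n");

fail:
    /* cleanup would go here */
    return err;
}

int main(void)
{
    return verify();
}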

Rule 1.7.c

The goto keyword shall not be used.

Had Apple not violated this never-goto rule in the SSL/TLS code above, there would not have been a duplicated goto fail; line to create the unreachable-code situation in the first place. And if eliminating the gotos had forced each error check to grow to more than one line of code, that in turn would have pushed the programmers toward curly braces.
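
Here is one minimal goto-free shape the same logic can take (again using hypothetical step_* helpers, not Apple’s actual code): each check runs only if every earlier one succeeded, and the cleanup is always reached exactly once.

#include <stdio.h>

/* Hypothetical helpers standing in for the hash calls and buffer frees. */
static int  step_one(void)   { return 0; }
static int  step_two(void)   { return 0; }
static int  step_three(void) { return 0; }
static void cleanup(void)    { /* SSLFreeBuffer() calls would go here */ }

static int verify(void)
{
    int err = step_one();

    /* No jumps: each later step is skipped automatically once err is nonzero. */
    if (err == 0) {
        err = step_two();
    }
    if (err == 0) {
        err = step_three();
    }

    cleanup();   /* single exit path; always runs */
    return err;
}

int main(void)
{
    int err = verify();
    printf("verify returned %d\n", err);
    return err;
}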

On a final note, Apple should be asking its engineers and engineering managers about the failures of process (at several layers) that must have occurred for this bug to have gone into end users’ devices. Specifically:

  • Where was the peer code review that should have spotted this, or how did the reviewers fail to spot this?
  • Why wasn’t a coding standard rule adopted to make such bugs easier to spot during peer code reviews?
  • Why wasn’t a static analysis tool, such as Klocwork, used, or how did it fail to detect the unreachable code that followed? Or was it users of such a tool, at Apple, who failed to act?
  • Where was the regression test case for a bad SSL certificate signature, or how did that test fail?

Dangerous bugs, like this one from Apple, often result from a combination of accumulated errors in the face of flawed software development processes. Too few programmers recognize that many bugs can be kept entirely out of a system simply by adopting (and rigorously enforcing) a coding standard that is designed to keep bugs out.

Intellectual Property Protections for Embedded Software: A Primer

Tuesday, June 11th, 2013 Michael Barr

My experiences as a testifying expert witness in numerous lawsuits involving software and source code have taught me a thing or two about the various intellectual property protections that are available to the creators of software. These are areas of the law that you, as an embedded software engineer, should probably know at least a little about. Hence, this primer.

Broadly speaking, software is protectable under three areas of intellectual property law: patent law, copyright law, and trade secret law. Each of these areas of the law protects your software in a different way and you may choose to rely on none, some, or all three such protections. (The name of your product may also be protectable by trademark law, though that has nothing specifically to do with software.)

Embedded Software and Patent Law

Patent law can be used to protect one or more innovative IDEAS that your product uses to get the job done. If you successfully patent a mathematical algorithm specific to your product domain (e.g., an algorithm for detecting or handling a specific arrhythmia used in your pacemaker) then you own a (time-limited) monopoly on that idea. If you believe another company is using the same algorithm in their product then you have the right to bring an infringement suit (e.g., in the ITC or U.S. District Court).

In the process of such a suit, the competitor’s schematics, source code, and design documents will generally be made available to independent expert witnesses (i.e., not to you directly). The expert(s) will then spend time reviewing the competitor’s source code to determine if one or more of the claims of the asserted patent(s) is infringed. It is a useful analogy to think of the claims of a patent as a description of the boundaries of real property and of infringement of the patent as trespassing.

Patents protect ideas regardless of how they are expressed. For example, you may have heard about (purely) “software patents” being new and somewhat controversial. However, the patents that protect most embedded systems typically cover a combination of at least electronics and software. Patent protection is typically broad enough to cover purely hardware, purely software, as well as hardware-software. Thus the protection can span a range of hardware vs. software decompositions and provides protection within software even when the programming languages and/or function and variable names differ.

To apply for a patent on your work you must file certain paperwork with and pay registration fees to the U.S. Patent and Trademark Office. This process generally begins with a prior art search conducted by an attorney and takes at least several years to complete. You should expect the total cost (not including your own time), per patent, to be measured in the tens of thousands of dollars.

Embedded Software and Copyright Law

Copyright law can be used to protect one or more creative EXPRESSIONS that the authors of the source code employed to get the job done. Unlike patent law, copyright law cannot be used to protect ideas or algorithms. Rather, copyright can only protect the way that you specifically creatively choose to implement those ideas. Indeed if there is only one or a handful of ways to implement a particular algorithm, or only one way to do so efficiently or in your chosen language, you may not be able to protect that aspect of your software with copyright.

The attorneys in a source code copyright infringement lawsuit wind up arguing over two primary issues. First, they argue which individual parts of the source code (e.g., function prototypes in an API) are protectable because they are sufficiently creative. The judge generally decides this issue, based on expert analysis. Second, they argue how the selection and arrangement of these individually protectable “islands” together shows a pattern of “substantial similarity”. The jury decides that.

Source code copyright infringement is easiest to prove when the two programs have source code that looks similar in some important way. That is, when the programming languages are the same and the function and variable names are similar. However, it is rare that the programs are identical in every detail. Thus, due to the possibility of the accused software developers independently creating something similar by coincidence rather than malfeasance, the legal standard for proving copyright infringement is much higher when it cannot be shown that the defendants had “access” to some version of the source code.

Unlike patents, copyrights do not need to be awarded. You, or your employer, own a copyright in your work merely by creating it. (Whether you write “Copyright (c) 2013 by MyCompany, Inc.” at the top of every source code file or not.) However, there are some advantages to registering your copyright (by submitting a sample) in a work of software with the U.S. Copyright Office before any alleged infringement occurs. Even if you outsource it to an attorney, the cost of registering a copyright should only be about a thousand dollars at most.

As source code frequently changes and new versions will inevitably be released, you should be reassured that a single copyright extends to “derivative works”, which generally includes later versions of the software. You don’t have to keep registering every minor release with the Copyright Office. And, very importantly, the binary executable version of your software (e.g., the contents of Flash or a library of object code) is also afforded copyright protection as a derivative work of the source code. Thus someone who copies your binary can also be found to have infringed your copyright.

Interestingly, both patent law and copyright law are called for in the U.S. Constitution. However, of course, the extension of these areas of law to software is a modern development.

Embedded Software and Trade Secret Law

Unlike patent and copyright law, which each at best protects only a portion (“islands”) of your source code, trade secret law can be used to protect the entirety of the SECRETS within the source code. Secrets need not be innovative ideas or creative expressions. The key requirement for this area of law to apply is that you take reasonable steps to keep the source code “secret”. So, for example, though open source software may be protectable by patent law and copyright law, it cannot be protected by trade secret law due to the lack of secrecy.

You may think that there is a fundamental conflict between registering the copyright in your software, which requires submitting a copy to the government, and keeping your source code secret. However, the U.S. Copyright Office only requires that a small portion of the source code of your program be filed to successfully identify the copyrighted software and its owner; the vast majority of the source code need not be submitted.

Preserving this secrecy is one of the reasons for the inconveniences software developers often encounter at the companies that employ them (e.g., not being able to take source code home) and for certain terms of their employment agreements. Protecting software like the secret formula for Coca-Cola or Krabby Patties helps an owner prove that the source code is a trade secret and thus opens the door to this additional legal basis for bringing a lawsuit against a competitor. Trade secrets cases I have been involved with as an expert have involved allegations that one or more insiders left a company and subsequently misappropriated its software secrets to compete via a startup or an existing competitor.

Final Thoughts

In my work as an expert, I always look to the attorneys for more precise definitions of legal terms. Importantly, there are many terms and concepts I have purposefully avoided using here to keep this at an introductory level of detail. You should, of course, always consult with an attorney about your specific situation. You should never simply rely on what you read on the Internet. Hopefully, there is enough information in this primer to help you at least understand the types of protections potentially available to you and to find a lawyer who specializes in the right field.

Dead Code, the Law, and Unintended Consequences

Wednesday, February 6th, 2013 Michael Barr

Dead code is source code that is not executed in the final system. It comes in two forms. First, there is dead code that is commented out or removed via #ifdefs; that dead code has no corresponding form in the binary. Other dead code is present in the binary but cannot be, or simply never is, invoked. Either way, dead code is a vestigial or unnecessary part of the product.
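
A contrived C illustration of both forms (the function names are hypothetical):

#include <stdio.h>

/* Form 1: dead at compile time -- excluded by the preprocessor (or commented
   out), so it has no corresponding form in the binary. */
#if 0
static void old_checksum(void)
{
    /* superseded implementation kept "just in case" */
}
#endif

/* Form 2: dead at run time -- compiled and linked into the binary (its code
   and strings ship in the image), yet nothing ever calls it. */
void legacy_diagnostics(void)
{
    printf("legacy diagnostic text ends up in the shipped image\n");
}

int main(void)
{
    printf("only this path ever executes\n");
    return 0;
}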

One of the places I have seen a lot of dead code is in my work as an expert witness. And I’ve observed that the presence of dead code can have unintended legal consequences. In at least one case I was involved in, it is likely that strings in certain dead code in the binary were a major cause of a lawsuit being brought against a maker of embedded system products. I have also observed several scenarios in which dead code (at least in part) heightened the probability of a loss in court.

One way that dead code can increase the probability of a loss in court is if the dead code implements part (or all) of a patented algorithm. When a patent infringement suit is brought against your company, one or more versions of your source code, when potentially relevant, must be produced to the other side’s legal team. The patent owner’s expert(s) will pore over this source code for many hours, seeking to identify portions of the code that implement each part of the algorithm. If one of those parts is implemented in dead code that becomes part of the binary, the product may still infringe an asserted claim of the patent, even if that code is never invoked. (I’m not a lawyer and am not sure whether dead code legally infringes, but consider it at least possible that neither side’s expert will notice it is dead, or that the judge or jury won’t be convinced by a dead-code defense.)

Another potential consequence of dead code is that an expert assessing the quality of your source code (e.g., in a product liability suit involving injury or death) may cite, as one basis for an opinion of poor quality, that the source code she examined is overly complex and riddled with commented-out code and/or preprocessor directives. As you know, source code that is hard to read is harder than it needs to be to maintain. And, I think most experts would agree, code that is hard to read and maintain is more likely to contain bugs. In such a scenario, your engineering team may come off as sloppy or incompetent to the jury, which is not exactly the first impression you want to make when your product is alleged to have caused injury or death. Note that overly complex code also increases the cost of litigation, as both sides’ experts will need to spend more time reviewing the source code to understand it fully.

In a source code copyright (or copyleft) suit, the mere presence of another party’s source code may be sufficient to prove infringement, even if it isn’t actually built into the binary! Consider the risk that your code contains files or functions of open source software that, by their mere existence in your source code, attach an open source license to all of your proprietary code.

Bottom line advice: If source code is dead, remove it. If you think you might need to refer to that code again later, well, that is what version control systems are for; make a searchable comment about what you’ve removed in that check-in. Do this as soon as you are certain it won’t be in a release version of your firmware.

Trends in Embedded Software Design

Wednesday, April 18th, 2012 Michael Barr

In many ways, the story of my career as an embedded software developer is intertwined with the history of the magazine Embedded Systems Design. When it was launched in 1988, under the original title Embedded Systems Programming (ESP), I was finishing high school. Like the vast majority of people at that time, I had never heard the term “embedded system” or thought much about the computers hidden away inside other kinds of products. Six years later I was a degreed electrical engineer who, like many EEs by that time in the mid-90’s, had a job designing embedded software rather than hardware. Shortly thereafter I discovered the magazine on a colleague’s desk, and became a subscriber and devotee.

The Early Days

In the early 1990s, as now, the specialized knowledge needed to write reliable embedded software was mostly not taught in universities. The only class I’d had in programming was in FORTRAN; I’d taught myself to program in assembly and C through a pair of hands-on labs that were, in hindsight, my only formal education in writing embedded software. It was on the job and from the pages of the magazine, then, that I first learned the practical skills of writing device drivers, porting and using operating systems, meeting real-time deadlines, implementing finite state machines, the pros and cons of languages other than C and assembly, remote debugging and JTAG, and so much more.

In that era, my work as a firmware developer involved daily interactions with Intel hex files, device programmers, tubes of EPROMs with mangled pins, UV erasers, mere kilobytes of memory, 8- and 16-bit processors, in-circuit emulators, and ROM monitors. Databooks were actual books; collectively, they took up whole bookshelves. I wrote and compiled my firmware programs on an HP-UX workstation on my desk, but then had to go downstairs to a lab to burn the chips, insert them into the prototype board, and test and debug via an attached ICE. I remember that on one especially daunting project eight miles separated my compiler and device programmer from the only instance of the target hardware; a single red LED and a dusty oscilloscope were the extent of my debugging toolbox.

Like you I had the Internet at my desk in the mid-90s, but it did not yet provide much useful or relevant information to my work other than via certain FTP sites (does anyone else remember FTPing into sunsite.unc.edu? or Gopher?). The rest was mostly blinking headlines and dancing hamsters, and Amazon was merely the world’s biggest river. There was not yet an Embedded.com or EETimes.com. To learn about software and hardware best practices, I pursued an MSEE and CS classes at night and traveled to the Embedded Systems Conferences.

At the time, I wasn’t aware of any books about embedded programming. And every book that I had found on C started with “Hello, World”, only went up in abstraction from there, and ended without ever once addressing peripheral control, interrupt service routines, interfacing to assembly language routines, and operating systems (real-time or other). For reasons I couldn’t explain years later when Jack Ganssle asked me, I had the gumption to think I could write that missing book for embedded C programmers, got a contract from O’Reilly, and did–ending, rather than starting, mine with “Hello, World” (via an RS-232 port).

In 1998, a series of at least three twists of fate spanning four years found me taking a seat next to an empty chair at the speaker’s lunch at an Embedded Systems Conference. The chair’s occupant turned out to be Lindsey Vereen, who was then well into his term as the second editor-in-chief of the magazine. In addition to the book, I’d written an article or two for ESP by that time and Lindsey had been impressed with my ability to explain technical nuances. When he told me that he was looking for someone to serve as a technical editor, I didn’t realize it was the first step towards my role in that position.

Future Trends

Becoming and then staying involved with the magazine, first as technical editor and later as editor-in-chief and contributing editor, has been a highlight of my professional life. I had been a huge fan of ESP and of its many great columnists and other contributors in its first decade. And now, looking back, I believe my work helped make it an even more valuable forum for the exchange of key design ideas, best practices, and industry learning in its second decade. And, though I understand the move away from print towards online publishing and advertising, I am nonetheless saddened to see the magazine come to an end.

Reflecting back on these days long past reminds me that a lot truly has changed about embedded software design. Assembly language is used far less frequently today; C and C++ much more. EPROMs with their device programmers and UV erasers have been supplanted by flash memory and bootloaders. Bus widths and memory sizes have increased dramatically. Expensive in-circuit emulators and ROM monitors have morphed into inexpensive JTAG debug ports. ROM-DOS has been replaced with whatever Microsoft is branding embedded Windows this year. And open-source Linux has done so well that it has limited the growth of the RTOS industry as a whole–and become a piece of technology we all want to master if only for our resumes.

So what does the future hold? What will the everyday experiences of embedded programmers be like in 2020, 2030, or 2040? I see three big trends that will affect us all over those timeframes, each of which has already begun to unfold.

Trend 1: Volumes Finally Shift to 32-bit CPUs

My first prediction is that inexpensive, low-power, highly-integrated microcontrollers–as best exemplified by today’s ARM Cortex-M family–will bring 32-bit CPUs into even the highest volume application domains. The volumes of 8- and 16-bit CPUs will finally decline as these parts become truly obsolete.

Though you may be programming for a 32-bit processor already, it’s still true that 8- and 16-bit processors drive CPU chip sales volumes. I’m referring, of course, to microcontrollers such as those based on the 8051, PIC, and other instruction set architectures dating back 30 to 40 years. These older architectures remain popular today only because certain low-margin, high-volume applications of embedded processing demand squeezing every penny out of BOM cost.

The limitations of 8- and 16-bit architectures impact embedded programmers in a number of ways. First, there are the awkward memory limitations resulting from limited address bus widths, along with the memory banks, segmenting techniques, and other workarounds for getting beyond those limitations. Second, these CPUs are much better at decision making than mathematics: they lack the ability to manipulate large integers efficiently and have no floating-point capability. Finally, these older processors frequently lack modern development tools, are unable to run larger Internet-enabled operating systems, such as Linux, and don’t feature the security and reliability protections afforded by an MMU.

There will, of course, always be many applications that are extremely cost-conscious, so my prediction is not that 8- and 16-bit parts will disappear completely, but that 32-bit microcontrollers, with their improved instruction set architectures and smaller transistor geometries, will increasingly win on overall price (BOM cost as well as power consumption). That will put the necessary amount of computing power into designers’ hands, make our work easier, and help programmers accomplish more in less time.

Trend 2: Complexity Forces Programmers Beyond C

My second prediction is that the days of the C programming language’s dominance in embedded systems are numbered.

Don’t get me wrong, C is a language I know and love. But, as you may know firsthand, C is simply not up to the task of building systems requiring over a million lines of code. Nonetheless, the complexity demanded of embedded software keeps pushing our systems toward, and past, the million-line mark. At this level of complexity, something has to give.

Additionally, our industry is facing a crisis: the average age of an embedded developer is rapidly increasing and C is generally not taught in universities anymore. Thus, even as the demand for embedded intelligence in every industry continues to increase, the population of skilled and experienced C programmers is on the decline. Something has to give on this front too.

But what alternative language can be used to build real-time software, manipulate hardware directly, and be quickly ported to numerous instruction set architectures? It’s not going to be C++ or Ada or Java, for sure–as those have already been tried and found lacking. A new programming language is probably not the answer either, across so many CPU families and with so many other languages already tried.

Thus I predict that tools that are able to reliably generate those millions of lines of C code automatically for us, based on system specifications, will ultimately take over. As an example of a current tool of this sort that could be part of the trend, I direct your attention to Miro Samek’s dandy open source Quantum Platform (QP) framework for event-driven programs and his (optional) free Quantum Modeler (QM) graphical modeling tool. You may not like the idea of auto-generated code today, but I guarantee that once you push a button to generate consistent and correct code from an already expressive statechart diagram, you will see the benefits of the overall structure and be ready to move up a level in programming efficiency.
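
To make the idea concrete, here is a hand-written miniature of the switch-based, event-driven C that statechart-to-code generators typically emit. It is a generic sketch with illustrative names, not actual QP/QM output:

#include <stdio.h>

/* A two-state "blinky" statechart, hand-coded in the style such tools emit. */
typedef enum { EVT_TIMEOUT } Event;
typedef enum { STATE_OFF, STATE_ON } State;

typedef struct { State state; } Blinky;

static void blinky_init(Blinky *me)
{
    me->state = STATE_OFF;           /* initial transition */
}

static void blinky_dispatch(Blinky *me, Event e)
{
    switch (me->state) {
    case STATE_OFF:
        if (e == EVT_TIMEOUT) {
            printf("LED on\n");      /* entry action of ON */
            me->state = STATE_ON;    /* state transition   */
        }
        break;

    case STATE_ON:
        if (e == EVT_TIMEOUT) {
            printf("LED off\n");     /* entry action of OFF */
            me->state = STATE_OFF;
        }
        break;
    }
}

int main(void)
{
    Blinky b;
    blinky_init(&b);

    for (int i = 0; i < 4; ++i) {
        blinky_dispatch(&b, EVT_TIMEOUT);   /* events drive all behavior */
    }
    return 0;
}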

I view C as a reasonable common output language for such tools (given that C can manipulate hardware registers directly and that every processor ever invented has a C compiler). Note that I do expect there to be continued demand for those of us with the skills and interest to fine tune the performance of the generated code or write device drivers to integrate it more closely to the hardware.

Trend 3: Connectivity Drives Importance of Security

We’re increasingly connecting embedded systems–to each other and to the Internet. You’ve heard the hype (e.g., “Internet of things” and “ubiquitous computing”) and you’ve probably already also put TCP/IP into one or more of your designs. But connectivity has a lot of implications that we are only starting to come to terms with. The most obvious of these is security.

A connected device cannot hide for long behind “security through obscurity” and, so, we must design security into our connected devices from the start. In my travels around our industry I’ve observed that the majority of embedded designers are largely unfamiliar with security. Sure some of you have read about encryption algorithms and know the names of a few. But mostly the embedded community is shooting in the dark as security designers, within organizations that aren’t of much help. And security is only as strong as the weakest link in the chain.

This situation must change. Just as Flash memory has supplanted UV-erasable EPROM, so too will over-the-net patches and upgrades take center stage as a download mechanism in coming years and decades. We must architect our systems first to be secure and then to accept trusted downloads, so that our products can keep up in the inevitable arms race against hackers and attackers.
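
As a sketch of what “accept trusted downloads” means in practice, consider the shape of an update check like the one below. The image_header_t layout, FW_MAGIC value, and verify_signature() stub are all hypothetical; a real design would use a vetted cryptographic library and a key anchored in hardware.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical downloaded-image layout: header followed by payload. */
typedef struct {
    uint32_t magic;            /* identifies a firmware image   */
    uint32_t payload_len;      /* bytes of firmware that follow */
    uint8_t  signature[64];    /* signature over the payload    */
} image_header_t;

#define FW_MAGIC 0x46574D47u   /* arbitrary illustrative value */

/* Placeholder for a real public-key signature check (e.g., from a vetted
   crypto library); always rejecting here keeps the sketch honest. */
static bool verify_signature(const uint8_t *data, size_t len,
                             const uint8_t sig[64])
{
    (void)data; (void)len; (void)sig;
    return false;
}

/* Accept a downloaded image only if it is well-formed AND its signature
   checks out against the manufacturer's public key. */
bool firmware_update_is_trusted(const uint8_t *image, size_t image_len)
{
    image_header_t hdr;

    if (image_len < sizeof(hdr)) {
        return false;                        /* too short to be valid */
    }
    memcpy(&hdr, image, sizeof(hdr));

    if (hdr.magic != FW_MAGIC ||
        hdr.payload_len > image_len - sizeof(hdr)) {
        return false;                        /* malformed header      */
    }

    return verify_signature(image + sizeof(hdr), hdr.payload_len,
                            hdr.signature);  /* reject unsigned images */
}

int main(void)
{
    uint8_t bogus[128] = { 0 };
    /* Expected to reject this unsigned buffer. */
    return firmware_update_is_trusted(bogus, sizeof(bogus)) ? 0 : 1;
}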

And That’s a Wrap

Whatever the future holds, I am certain that embedded software development will remain an engaging and challenging career. And you’ll still find me writing about the field at https://embeddedgurus.com/barr-code and http://twitter.com/embeddedbarr.

Tools to Detect Software Copyright Infringement

Thursday, September 23rd, 2010 Michael Barr

An emerging class of tools makes it easy to automatically detect copying of copyrighted software source code, even if it came from one of the hundreds of thousands of open source packages.

I am presently providing litigation support in a case of alleged software copyright infringement.  In a nutshell, the plaintiff brought suit against the defendant for allegedly continuing to use plaintiff’s copyrighted software source code in defendant’s products after termination of a license agreement between the parties.  Fortunately, automated tools are making it easier than ever to quickly and inexpensively detect copying of software source code.

Some of the most powerful tools for doing direct comparisons between a pair of source code sets are from S.A.F.E. Their CodeMatch tool works by comparing each file of source code in the first set with every file of code in the second set.  Results are presented in a table that is sorted by the relative amount of matching code in the files.  And CodeMatch is clever enough to detect copying in which variable and function names and other details were subsequently changed; CodeMatch can even detect code that was copied from one programming language into another.  The only weakness of CodeMatch is that you have to have the source code for each product, which is not always possible early in litigation.

Other tools from S.A.F.E. provide additional help.  For example, BitMatch can compare a pair of executable binary programs or one party’s source code against another’s executable code.  It works by matching strings that appear in both programs.  Meanwhile, SourceDetective helps rule out that the two programs are only similar because they both borrowed from some third program—by automatically searching the Internet for hundreds or thousands of matching phrases.  CodeMatch, BitMatch, and SourceDetective are part of a suite of related tools called CodeSuite.  CodeSuite is a free download that runs on Microsoft Windows, with license keys sold based on the amount of code to be compared.

Of course, sometimes code may be copied from open source software.  Much open source software is subject to so-called copyleft licenses, a special type of copyright license that keeps the source code open to the public.  Copyleft language is drafted to ensure that the source code for certain categories of derived works is also open to the public.  This creates problems for companies that wish to keep their source code private but also rely upon open source software.

Fortunately, there are also tools to detect the presence of part or all of an open source software package within a proprietary program.  I have used such tools from Black Duck Software and Protecode.  Both work similarly: each company maintains a database of hundreds of thousands of known open source packages against which the source code you provide is tested. Results are presented as a list of open source packages from which code may have been copied. This testing can be done entirely on a personal computer running Microsoft Windows, so that proprietary source code need not be sent outside a trusted network.  Both tools are generally licensed for an expected level of use on an annual basis.

Unfortunately, the precision of CodeMatch is lost in trying to cast such a broad net for potential copying.  The tools from Black Duck and Protecode don’t actually compare your code against each and every one of the millions of source code files in their databases.  Instead, they reduce each file of your source code to a simpler representation of its structure and then compute a mathematical signature (a fingerprint) for that representation.  This signature is then compared against similar representations of the files in their database.  In plain English, this means that you get lots of false positives: some open source packages that weren’t actually copied usually turn up in the results list.
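
As a toy illustration of the fingerprinting idea (not how Black Duck or Protecode actually work), one can normalize away details that don’t matter and hash what remains; the cruder the normalization, the more unrelated files collide, which is exactly where the false positives come from:

#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Crude structural normalization: drop whitespace and lower-case everything,
   so differently formatted copies of the same code collapse to one string.
   (Real tools do far more, e.g., stripping comments and identifiers.) */
static uint64_t fingerprint(const char *src)
{
    uint64_t hash = 1469598103934665603ULL;       /* FNV-1a offset basis */

    for (const char *p = src; *p != '\0'; ++p) {
        if (isspace((unsigned char)*p)) {
            continue;                              /* ignore formatting  */
        }
        hash ^= (uint64_t)(unsigned char)tolower((unsigned char)*p);
        hash *= 1099511628211ULL;                  /* FNV-1a prime       */
    }
    return hash;
}

int main(void)
{
    /* Reformatting alone does not change the fingerprint. */
    const char *a = "if (err != 0) { goto fail; }";
    const char *b = "if(err!=0){goto fail;}";

    printf("%016llx\n%016llx\n",
           (unsigned long long)fingerprint(a),
           (unsigned long long)fingerprint(b));
    return 0;
}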

When searching for potential copying of open source code, I recommend searching the database from Black Duck or Protecode first.  Then, to eliminate the false positives, a more thorough analysis should be performed by obtaining the listed open source packages and using CodeMatch to compare the proprietary code against them file by file.

With the help of tools like those mentioned here, it is possible to quickly ascertain whether source code copying has taken place.  Prior to the appearance of these tools, it was necessary for an expert in software development to manually perform dozens of searching and comparison steps.  This strategy can be used early in litigation, with the benefit of dramatically reducing the cost of such analysis.  The same tools can also be employed proactively by companies seeking to reduce their risk of copyright infringement litigation.