Posts Tagged ‘standards’

Proposed Rule Changes for Embedded C Coding Standard

Wednesday, June 20th, 2018 Michael Barr

My book Embedded C Coding Standard began as an internal coding standard of a consulting company and was first published in 2008 by that company (Netrino). In 2013, the book’s cover was given a new look and the standard became known as “Barr Group‘s Embedded C Coding Standard“. A 2018 update to the book will be released soon and this will be the first time the substance of the standard has changed in over a decade.

The primary motivation for making changes is to better harmonize the rules with the MISRA-C Guidelines. The older Barr Group standard made reference to MISRA C:2004, which was superseded by MISRA C:2012. The few known direct conflicts between BARR-C:2013 and MISRA C (stemming from our earlier embrace of ISO C99) were effectively eliminated with their 2012 update. But we wanted to do more than just remove those noted conflicts and push even further in our embrace of harmonization.

In a sentence, MISRA C is a safer subset of the C language plus a set guidelines for using what is left of the language more safely. In about as many words, BARR-C is a style guide for the C language that reduces the number of defects introduced during the coding phase by increasing readability and portability. It turns out there is quite a bit of value in combining rules from both standards. For example, MISRA’s guidelines do not provide stylistic advice and applying BARR-C’s stylistic rules to the C subset further increases safety.

The rest of this post is a preview of the specific rule changes and additions we will make in BARR-C:2013. (Omitted from this list are rules reworded simply for greater clarity.)

Rule 1.1.d (new)

The following rule will be added:

Preprocessor directive #define shall not be used to alter or rename any keyword or other aspect of the programming language.

Rule 1.6.b (new)

The following rule will be added:

Conversion from a pointer to void to a pointer to another type shall be cast.

Rule 1.7.c (change)

The earlier rule:

The goto keyword shall not be used.

will be replaced with:

It is a preferred practice to avoid all use of the goto keyword. If goto is used it shall only jump to a label declared later and in the same or an enclosing block.

This is being done so that the BARR-C standard does not restrict uses of goto that are permissible under MISRA C.

Rule 1.7.d (change)

The earlier rule:

The continue keyword shall not be used.

will be replaced with:

It is a preferred practice to avoid all use of the continue keyword.

This is being done so that the BARR-C standard does not restrict uses of continue that are permissible under MISRA C.

Rule 1.7.e (eliminated)

There will no longer be any restriction on the use of the break keyword. This is being done so that the BARR-C standard does not restrict uses of break that are permissible under MISRA C.

Rule 4.2.d (change)

The earlier rule:

No header file shall contain a #include statement.

will be replaced with:

No public header file shall contain a #include of any private header file.

This is being done because apparently no one (other than me) actually valued the old rule. 🙂

Rule 5.1.c (new)

The following rule will be added:

The name of all public data types shall be prefixed with their module name and an underscore.

Rule 5.4.b.v (new)

The following rule will be added:

Always invoke the isfinite() macro to check that prior calculations have resulted in neither infinity nor NaN.

Rule 5.6.a (new)

The following rule will be added:

Boolean variables shall be declared as type bool.

Rule 5.6.b (new)

The following rule will be added:

Non-Boolean values shall be converted to Boolean via use of relational operators (e.g., < or !=), not via casts.

Rule 6.2.c (changed)

The earlier rule:

All functions shall have just one exit point and it shall be at the bottom of the function. That is, the keyword return shall appear a maximum of once.

will be replaced with:

It is a preferred practice that all functions shall have just one exit point and it shall be via a return at the bottom of the function.

Rule 6.3.b.iv (new)

The following rule will be added:

Never include a transfer of control (e.g., return keyword).

Rule 7.1.n (new)

The following rule will be added:

The names of all variables representing non-pointer handles for objects, e.g., file handles, shall begin with the letter ‘h’.

Rule 7.1.o (new)

The following rule will be added:

In the case of a variable name requiring multiple of the above prefixes, the order of their inclusion before the first underscore shall be [g][p|pp][b|h].

Rule 7.2.c (new)

The following rule will be added:

If project- or file-global variables are used, their definitions shall be grouped together and placed at the top of a source code file.

Rule 7.2.d (new)

The following rule will be added:

Any pointer variable lacking an initial address shall be initialized to NULL.

Rule 8.2.a (changed)

The earlier rule:

The shortest (measured in lines of code) of the if and else if clauses should be placed first.

will be replaced with:

It is a preferred practice that the shortest (measured in lines of code) of the if and else if clauses should be placed first.

Rule 8.3.c (new)

The following rule will be added:

Any case designed to fall through to the next shall be commented to clearly explain the absence of the corresponding break.

Rule 8.5.b (new)

The following rule will be added:

The C Standard Library functions abort(), exit(), and setjmp()/longjmp() shall not be used.

If you have questions about any of these draft changes or suggestions for better or other changes, please comment below.

C’s goto Keyword: Should we Use It or Lose It?

Wednesday, June 6th, 2018 Michael Barr

In the 2008 and 2013 editions of my bug-killing Embedded C Coding Standard, I included this rule:

Rule 1.7.c. The goto keyword shall not be used.

Despite this coding standard being followed by about 1 in 8 professional embedded systems designers, none of us at Barr Group have heard any complaints that this rule is too strict. (Instead we hear lots of praise for our unique focus on reducing intra-team stylistic arguments by favoring above all else C coding rules that prevent bugs.)

However, there exist other coding standards with a more relaxed take on goto. I’ve thus been revisiting this (and related) rules for an updated 2018 edition of the BARR-C standard.

Specifically, I note that the current (2012) version of the MISRA-C Guidelines for the Use of the C Programming Language in Critical Systems merely advises against the use of goto:

Rule 15.1 (Advisory): The goto statement should not be used

though at least the use of goto is required to be appropriately narrowed:

Rule 15.2 (Required): The goto statement shall jump to a label declared later in the same function

Rule 15.3 (Required): Any label referenced by a goto statement shall be declared in the same block, or in any block enclosing the goto statement

Generally speaking, the rules of the MISRA-C:2012 standard are either the same as or stricter than those in my BARR-C standard. In addition to overlaps, they have more and stricter coding rules and we add stylistic advice. It is precisely because MISRA-C’s rules are stricter and only BARR-C includes stylistic rules that these two most popular embedded programming standards frequently and easily combined.

Which all leads to the key question:

Should the 2018 update to BARR-C relax the prior ban on all use of goto?

According to the authors of the C programming language, “Formally, [the goto keyword is] never necessary” as it is “almost always easy to write code without it”. They go on to recommend that goto “be used rarely, if at all.”

Having, in recent days, reviewed the arguments for and against goto across more than a dozen C programming books as well as other C/C++ coding standards and computer science papers, I believe there is just one best bug-killing argument for each side of this rule. And I’ll have to decide between them this week to keep the new edition on track for a June release.

Allow goto: For Better Exception Handling

There seems to be universal agreement that goto should never be used to branch UP to an earlier point in a function. Likewise, branching INTO a deeper level of nesting is a universal no-no.

However, branching DOWN in the function and OUT to an outer level of nesting are often described as powerful uses of the goto keyword. For example, a single goto statement can be used to escape from two or more levels of nesting. And this is not a behavior that can be done with a single break or continue. (To accomplish the same behavior without using goto, one could, e.g., set a flag variable in the inner block and check it in each outer block.)

Given this, the processing of exceptional conditions detected inside nested blocks is a potentially valuable application of goto that could be implemented in a manner compliant with the three MISRA-C:2012 rules quoted above. Such a jump could even proceed from two or more points in a function to a common block of error recovery code.

Because good exception handling is a property of higher reliabilty software and is therefore a potential bug killer, I believe I must consider relaxing Rule 1.7.c in BARR-C:2018 to permit this specific use of goto. A second advantage of relaxing the rule would be increasing the ease of combining rules from MISRA-C with those from BARR-C (and vice versa), which is a primary driver for the 2018 update.

Ban goto: Because It’s Even Riskier in C++

As you are likely aware, there are lots of authors who opine that use of goto “decreases readability” and/or “increases potential for bugs”. And none other than Dijkstra (50 years ago now!) declared that the “quality of programmers is [inversely proportional to their number] of goto statements”.

But–even if all true–none of these assertions is a winning argument for banning the use of goto altogether vs. the above “exception handling exception” suggested by the above.

The best bug-killing argument I have heard for continued use of the current Rule 1.7.c is that the constructors and destructors of C++ create a new hazard for goto statements. That is, if a goto jumps over a line of code on which an object would have been constructed or destructed then that step would never occur. This could, for example, lead to a very subtle and difficult type of bug to detect–such as a memory leak.

Given that C++ is a programming language intentionally backward compatible with legacy C code and that elsewhere in BARR-C there is, e.g., a rule that variables (which could be objects) should be declared as close as possible to their scope of use, relaxing the existing ban on all use of goto could forseeably create new hazards for the followers of the coding standard who later migrate to C++. Indeed, we already have many programmers following our C coding rules while crafting C++ programs.

So what do you think? What relevant experiences can bring to bear on this issue in the comments area below? Should I relax the ban on goto or maintain it? Are there any better (bug-killing) arguments for or against goto?

An Update on Toyota and Unintended Acceleration

Saturday, October 26th, 2013 Michael Barr

In early 2011, I wrote a couple of blog posts (here and here) as well as a later article (here) describing my initial thoughts on skimming NASA’s official report on its analysis of Toyota’s electronic throttle control system. Half a year later, I was contacted and retained by attorneys for numerous parties involved in suing Toyota for personal injuries and economic losses stemming from incidents of unintended acceleration. As a result, I got to look at Toyota’s engine source code directly and judge for myself.

From January 2012, I’ve led a team of seven experienced engineers, including three others from Barr Group, in reviewing Toyota’s electronic throttle and some other source code as well as related documents, in a secure room near my home in Maryland. This work proceeded in two rounds, with a first round of expert reports and depositions issued in July 2012 that led to a billion-dollar economic loss settlement as well as an undisclosed settlement of the first personal injury case set for trial in U.S. Federal Court. The second round began with an over 750 page formal written expert report by me in April 2013 and culminated this week in an Oklahoma jury’s decision that the multiple defects in Toyota’s engine software directly caused a September 2007 single vehicle crash that injured the driver and killed her passenger.

It is significant that this was the first and only jury so far to hear any opinions about Toyota’s software defects. Earlier cases either predated our source code access, applied a non-software theory, or was settled by Toyota for an undisclosed sum.

In our analysis of Toyota’s source code, we built upon the prior analysis by NASA. First, we looked more closely at more lines of the source code for more vehicles for more man months. And we also did a lot of things that NASA didn’t have time to do, including reviewing Toyota’s operating system’s internals, reviewing the source code for Toyota’s “monitor CPU”, performing an independent worst-case stack depth analysis, running portions of the main CPU software including the RTOS in a processor simulator, and demonstrating–in 2005 and 2008 Toyota Camry vehicles–a link between loss of throttle control and the numerous defects we found in the software.

In a nutshell, the team led by Barr Group found what the NASA team sought but couldn’t find: “a systematic software malfunction in the Main CPU that opens the throttle without operator action and continues to properly control fuel injection and ignition” that is not reliably detected by any fail-safe. To be clear, NASA never concluded software wasn’t at least one of the causes of Toyota’s high complaint rate for unintended acceleration; they just said they weren’t able to find the specific software defect(s) that caused unintended acceleration. We did.

Now it’s your turn to judge for yourself. Though I don’t think you can find my expert report outside the Court system, here are links to the trial transcript of my expert testimony to the Oklahoma jury and a (redacted) copy of the slides I shared with the jury in Bookout, et.al. v. Toyota.

Note that the jury in Oklahoma found that Toyota owed each victim $1.5 million in compensatory damages and also found that Toyota acted with “reckless disregard”. The latter legal standard meant the jury was headed toward deliberations on additional punitive damages when Toyota called the plaintiffs to settle (for yet another undisclosed amount). It has been reported that an additional 400+ personal injury cases are still working their way through various courts.

Related Stories

Updates

On December 13, 2013, Toyota settled the case that was set for the next trial, in West Virginia in January 2014, and announced an “intensive” settlement process to try to resolve approximately 300 of the remaining personal injury case, which are consolidated in U.S. and California courts.

Toyota continues to publicly deny there is a problem and seems to have no plans to address the unsafe design and inadequate fail safes in its drive-by-wire vehicles–the electronics and software design of which is similar in most of the Toyota and Lexus (and possibly Scion) vehicles manufactured over at least about the last ten model years. Meanwhile, incidents of unintended acceleration continue to be reported in these vehicles (see also the NHTSA complaint database) and these new incidents, when injuries are severe, continue to result in new personal injury lawsuits against Toyota.

In March 2014, the U.S. Department of Justice announced a $1.2 billion settlement in a criminal case against Toyota. As part of that settlement, Toyota admitted to past lying to NHTSA, Congress, and the public about unintended acceleration and also to putting its brand before public safety. Yet Toyota still has made no safety recalls for the defective engine software.

On April 1, 2014, I gave a keynote speech at the EE Live conference, which touched on the Toyota litigation in the context of lethal embedded software failures of the past and the coming era of self-driving vehicles. The slides from that presentation are available for download at http://www.barrgroup.com/killer-apps/.

On September 18, 2014, Professor Phil Koopman, of Carnegie Mellon University, presented a talk about his public findings in these Toyota cases entitled “A Case Study of Toyota Unintended Acceleration and Software Safety“.

On October 30, 2014, Italian computer scientist Roberto Bagnara presented a talk entitled “On the Toyota UA Case
and the Redefinition of Product Liability for Embedded Software
” at the 12th Workshop on Automotive Software & Systems, in Milan.

Trends in Embedded Software Design

Wednesday, April 18th, 2012 Michael Barr

In many ways, the story of my career as an embedded software developer is intertwined with the history of the magazine Embedded Systems Design. When it was launched in 1988, under the original title Embedded Systems Programming (ESP), I was finishing high school. Like the vast majority of people at that time, I had never heard the term “embedded system” or thought much about the computers hidden away inside other kinds of products. Six years later I was a degreed electrical engineer who, like many EEs by that time in the mid-90’s, had a job designing embedded software rather than hardware. Shortly thereafter I discovered the magazine on a colleague’s desk, and became a subscriber and devotee.

The Early Days

In the early 1990s, as now, the specialized knowledge needed to write reliable embedded software was mostly not taught in universities. The only class I’d had in programming was in FORTRAN; I’d taught myself to program in assembly and C through a pair of hands-on labs that were, in hindsight, my only formal education in writing embedded software. It was on the job and from the pages of the magazine, then, that I first learned the practical skills of writing device drivers, porting and using operating systems, meeting real-time deadlines, implementing finite state machines, the pros and cons of languages other than C and assembly, remote debugging and JTAG, and so much more.

In that era, my work as a firmware developer involved daily interactions with Intel hex files, device programmers, tubes of EPROMs with mangled pins, UV erasers, mere kilobytes of memory, 8- and 16-bit processors, in-circuit emulators, and ROM monitors. Databooks were actual books; collectively, they took up whole bookshelves. I wrote and compiled my firmware programs on an HP-UX workstation on my desk, but then had to go downstairs to a lab to burn the chips, insert them into the prototype board, and test and debug via an attached ICE. I remember that on one especially daunting project eight miles separated my compiler and device programmer from the only instance of the target hardware; a single red LED and a dusty oscilloscope were the extent of my debugging toolbox.

Like you I had the Internet at my desk in the mid-90s, but it did not yet provide much useful or relevant information to my work other than via certain FTP sites (does anyone else remember FTPing into sunsite.unc.edu? or Gopher?). The rest was mostly blinking headlines and dancing hamster; and Amazon was merely the world’s biggest river. There was not yet an Embedded.com or EETimes.com. To learn about software and hardware best practices, I pursued an MSEE and CS classes at night and traveled to the Embedded Systems Conferences.

At the time, I wasn’t aware of any books about embedded programming. And every book that I had found on C started with “Hello, World”, only went up in abstraction from there, and ended without ever once addressing peripheral control, interrupt service routines, interfacing to assembly language routines, and operating systems (real-time or other). For reasons I couldn’t explain years later when Jack Ganssle asked me, I had the gumption to think I could write that missing book for embedded C programmers, got a contract from O’Reilly, and did–ending, rather than starting, mine with “Hello, World” (via an RS-232 port).

In 1998, a series of at least three twists of fate spanning four years found me taking a seat next to an empty chair at the speaker’s lunch at an Embedded Systems Conference. The chair’s occupant turned out to be Lindsey Vereen, who was then well into his term as the second editor-in-chief of the magazine. In addition to the book, I’d written an article or two for ESP by that time and Lindsey had been impressed with my ability to explain technical nuances. When he told me that he was looking for someone to serve as a technical editor, I didn’t realize it was the first step towards my role in that position.

Future Trends

Becoming and then staying involved with the magazine, first as technical editor and later as editor-in-chief and contributing editor, has been a highlight of my professional life. I had been a huge fan of ESP and of its many great columnists and other contributors in its first decade. And now, looking back, I believe my work helped make it an even more valuable forum for the exchange of key design ideas, best practices, and industry learning in its second decade. And, though I understand the move away from print towards online publishing and advertising, I am nonetheless saddened to see the magazine come to an end.

Reflecting back on these days long past reminds me that a lot truly has changed about embedded software design. Assembly language is used far less frequently today; C and C++ much more. EPROMs with their device programmers and UV erasers have been supplanted by flash memory and bootloaders. Bus widths and memory sizes have increased dramatically. Expensive in-circuit emulators and ROM monitors have morphed into inexpensive JTAG debug ports. ROM-DOS has been replaced with whatever Microsoft is branding embedded Windows this year. And open-source Linux has done so well that it has limited the growth of the RTOS industry as a whole–and become a piece of technology we all want to master if only for our resumes.

So what does the future hold? What will the everyday experiences of embedded programmers be like in 2020, 2030, or 2040? I see three big trends that will affect us all over those timeframes, each of which has already begun to unfold.

Trend 1: Volumes Finally Shift to 32-bit CPUs

My first prediction is that inexpensive, low-power, highly-integrated microcontrollers–as best exemplified by today’s ARM Cortex-M family–will bring 32-bit CPUs into even the highest volume application domains. The volumes of 8- and 16-bit CPUs will finally decline as these parts become truly obsolete.

Though you may be programming for a 32-bit processor already, it’s still true that 8- and 16-bit processors still drive CPU chip sales volumes. I’m referring, of course, to microcontrollers such as those based on 8051, PIC, and other instruction set architectures dating back 30-40 years. These older architectures remain popular today only because certain low-margin, high-volume applications of embedded processing demand squeezing every penny out of BOM cost.

The limitations of 8- and 16-bit architectures impact the embedded programmers in a number of ways. First, there are the awkward memory limitations resulting from limited address bus widths–and the memory banks, segmenting techniques, and other workarounds to going beyond those limitations. Second, these CPUs are much better at decision making than mathematics–they lack the ability to manipulate large integers efficiently and have no floating-point capability. Finally, these older processors frequently lack modern development tools, are unable to run larger Internet-enabled operating systems, such as Linux, and don’t feature the security and reliabiltiy protections afforded by an MMU.

There will, of course, always be many applications that are extremely cost-conscious, so my prediction is not that they will disappear completely, but that the overall price (including BOM cost as well as power consumption) of 32-bit micro controllers, with their improved instruction set architectures and transistor geometries, will win on price. That will put the necessary amount of computing power into the hands of some designers and make our work easier for all of us. It also helps programmers accomplish more in less time.

Trend 2: Complexity Forces Programmers Beyond C

My second prediction is that the days of the C programming language’s dominance in embedded systems are numbered.

Don’t get me wrong, C is a language I know and love. But, as you may know firsthand, C is simply not up to the task of building systems requiring over a million lines of code. Nonetheless, the demanded complexity of embedded software has been driving our systems towards more than a million lines of code. At this level of complexity, something has to give.

Additionally, our industry is facing a crisis: the average age of an embedded developer is rapidly increasing and C is generally not taught in universities anymore. Thus, even as the demand for embedded intelligence in every industry continues to increase, the population of skilled and experienced C programmers is on the decline. Something has to give on this front too.

But what alternative language can be used to build real-time software, manipulate hardware directly, and be quickly ported to numerous instruction set architectures? It’s not going to be C++ or Ada or Java, for sure–as those have already been tried and found lacking. A new programming language is probably not the answer either, across so many CPU families and with so many other languages already tried.

Thus I predict that tools that are able to reliably generate those millions of lines of C code automatically for us, based on system specifications, will ultimately take over. As an example of a current tool of this sort that could be part of the trend, I direct your attention to Miro Samek’s dandy open source Quantum Platform (QP) framework for event-driven programs and his (optional) free Quantum Modeler (QM) graphical modeling tool. You may not like the idea of auto-generated code today, but I guarantee that once you push a button to generate consistent and correct code from an already expressive statechart diagram, you will see the benefits of the overall structure and be ready to move up a level in programming efficiency.

I view C as a reasonable common output language for such tools (given that C can manipulate hardware registers directly and that every processor ever invented has a C compiler). Note that I do expect there to be continued demand for those of us with the skills and interest to fine tune the performance of the generated code or write device drivers to integrate it more closely to the hardware.

Trend 3: Connectivity Drives Importance of Security

We’re increasingly connecting embedded systems–to each other and to the Internet. You’ve heard the hype (e.g., “Internet of things” and “ubiquitous computing”) and you’ve probably already also put TCP/IP into one or more of your designs. But connectivity has a lot of implications that we are only starting to come to terms with. The most obvious of these is security.

A connected device cannot hide for long behind “security through obscurity” and, so, we must design security into our connected devices from the start. In my travels around our industry I’ve observed that the majority of embedded designers are largely unfamiliar with security. Sure some of you have read about encryption algorithms and know the names of a few. But mostly the embedded community is shooting in the dark as security designers, within organizations that aren’t of much help. And security is only as strong as the weakest link in the chain.

This situation must change. Just as Flash memory has supplanted UV-erasable EPROM, so too will over-the-net patches and upgrades take center stage as a download mechanism in coming years and decades. We must architect our systems first to be secure and then to accepted trusted downloads so that our products can keep up in the inevitable arms race against hackers and attackers.

And That’s a Wrap

Whatever the future holds, I am certain that embedded software development will remain an engaging and challenging career. And you’ll still find me writing about the field at https://embeddedgurus.com/barr-code and http://twitter.com/embeddedbarr.

Building Reliable and Secure Embedded Systems

Tuesday, March 13th, 2012 Michael Barr

In this era of 140 characters or less, it has been well and concisely stated that, “RELIABILITY concerns ACCIDENTAL errors causing failures, whereas SECURITY concerns INTENTIONAL errors causing failures.” In this column I expand on this statement, especially as regards the design of embedded systems and their place in our network-connected and safety-concious modern world.

As the designers of embedded systems, the first thing we must accomplish on any project is to make the hardware and software work. That is to say we need to make the system behave as it was designed to. The first iteration of this is often flaky; certain uses or perturbations of the system by testers can easily dislodge the system into a non-working state. In common parlance, “expect bugs.”

Given time, tightening cycles of debug and test can get us past the bugs and through to a shippable product. But is a debugged system good enough? Neither reliability nor security can be tested into a product. Each must be designed in from the start. So let’s take a closer look at these two important design aspects for modern embedded systems and then I’ll bring them back together at the end.

Reliable Embedded Systems

A product can be stable yet lack reliability. Consider, for example, an anti-lock braking computer installed in a car. The software in the anti-lock brakes may be bug-free, but how does it function if a critical input sensor fails?

Reliable systems are robust in the face of adverse run-time environments. Reliable systems are able to work around errors encountered as they occur to the system in the field–so that the number and impact of failures are minimized. One key strategy for building reliable systems is to eliminate single-points-of-failure. For example, redundancy could be added around that critical input sensor–perhaps by adding a second sensor in parallel with the first.

Another aspect of reliability that is under the complete control of designers (at least when they consider it from the start) are the “fail-safe” mechanisms. Perhaps a suitable but lower-cost alternative to a redundant sensor is detection of the failed sensor with a fall back to mechanical braking.

Failure Mode and Effect Analysis (FMEA) is one of the most effective and important design processes used by engineers serious about designing reliability into their systems. Following this process, each possible failure point is traced from the root failure outward to its effects. In an FMEA, numerical weights can be applied to the likelihoods of each failure as well as the seriousness of consequences. An FMEA can thus help guide you to a cost effective but higher reliability design by highlighting the most valuable places to insert the redundancy, fail-safes, or other elements that reinforce the system’s overall reliability.

In certain industries, reliability is a key driver of product safety. And that is why you see these techniques and FMEA and other design for reliability processes being applied by the designers of safety-critical automotive, medical, avionics, nuclear, and industrial systems. The same techniques can, of course, be used to make any type of embedded system more reliable.

Regardless of your industry, it is typically difficult or impossible to make your product as reliable via patches. There’s no way to add hardware like that redundant sensor, so your options may reduce to a fail-safe that is helpful but less reliable overall. Reliability cannot be patched or tested or debugged into your system. Rather, reliability must be designed in from the start.

Secure Embedded Systems

A product can also be stable yet lack security. For example, an office printer is the kind of product most of us purchase and use without giving a minute of thought to security. The software in the printer may be bug-free, but is it able to prevent a would-be eavesdropper from capturing a remote electronic copy of everything you print, including your sensitive financial documents?

Secure systems are robust in the face of persistent attack. Secure systems are able to keep hackers out by design. One key strategy for building secure systems is to validate all inputs, especially those arriving over an open network connection. For example, security could be added to a printer by ensuring against buffer overflows and encrypting and digitally signing firmware updates.

One of the unfortunate facts of designing secure embedded systems is that the hackers who want to get in only need to find and exploit a single weakness. Adding layers of security is good, but if even any one of those layers remains fundamentally weak, a sufficiently motivated attacker will eventually find and breach that defense. But that’s not an excuse for not trying.

For years, the largest printer maker in the world apparently gave little thought to the security of the firmware in its home/office printers, even as it was putting tens of millions of tempting targets out into the world. Now the security of those printers has been breached by security researchers with a reasonable awareness of embedded systems design. Said one of the lead researchers, “We can actually modify the firmware of the printer as part of a legitimate document. It renders correctly, and at the end of the job there’s a firmware update. … In a super-secure environment where there’s a firewall and no access — the government, Wall Street — you could send a résumé to print out.”

Security is a brave new world for many embedded systems designers. For decades we have relied on the fact that the microcontrollers and Flash memory and real-time operating systems and other less mainstream technologies we use will protect our products from attack. Or that we can gain enough “security by obscurity” by keeping our communications protocols and firmware upgrade processes secret. But we no longer live in that world. You must adapt.

Consider the implications of an insecure design of an automotive safety system that is connected to another Internet-connected computer in the car via CAN; or the insecure design of an implanted medical device; or the insecure design of your product.

Too often, the ability to upgrade a product’s firmware in the field is the very vector that’s used to attack. This can happen even when a primary purpose for including remote firmware updates is motivated by security. For example, as I’ve learned in my work as an expert witness in numerous cases involving reverse engineering of the techniques and technology of satellite television piracy, much of that piracy has been empowered by the same software patching mechanism that allowed the broadcasters to perform security upgrades and electronic countermeasures. Ironically, had the security smart cards in those set-top boxes had only masked ROM images the overall system security may have been higher. This was certainly not what the designers of the system had in mind. But security is also an arms race.

Like reliability, security must be designed in from the start. Security can’t be patched or tested or debugged in. You simply can’t add security as effectively once the product ships. For example, an attacker who wished to exploit a current weakness in your office printer or smart card might download his hack software into your device and write-protect his sectors of the flash today so that his code could remain resident even as you applied security patches.

Reliable and Secure Embedded Systems

It is important to note at this point that reliable systems are inherently more secure. And that, vice versa, secure systems are inherently more reliable. So, although, design for reliability and design for security will often individually yield different results–there is also an overlap between them.

An investment in reliability, for example, generally pays off in security. Why? Well, because a more reliable system is more robust in its handling of all errors, whether they are accidental or intentional. An anti-lock braking system with a fall back to mechanical braking for increased reliability is also more secure against an attack against that critical hardware input sensor. Similarly, those printers wouldn’t be at risk of fuser-induced fire in the case of a security breach if they were never at risk of fire in the case of any misbehavior of the software.

Consider, importantly, that one of the first things a hacker intent on breaching the security of your embedded device might do is to perform a (mental, at least) fault tree analysis of your system. This attacker would then target her time, talents, and other resources at one or more single points of failure she considers most likely to fail in a useful way.

Because a fault tree analysis starts from the general goal and works inward deductively toward the identification of one or more choke points that might produce the desired erroneous outcome, attention paid to increasing reliability such as via FMEA usually reduces choke points and makes the attacker’s job considerably more difficult. Where security can break down even in a reliable system is where the possibility of an attacker’s intentionally induced failure is ignored in the FMEA weighting and thus possible layers of protection are omitted.

Similarly, an investment in security may pay off in greater reliability–even without a directed focus on reliability. For example, if you secure your firmware upgrade process to accept only encrypted and digitally signed binary images you’ll be adding a layer of protection against an inadvertently corrupted binary causing an accidental error and product failure. Anything you do to improve the security of communications (i.e., checksums, prevention of buffer overflows, etc.) can have a similar effect on reliability.

The Only Way Forward

Each year it becomes increasingly important for all of us in the embedded systems design community to learn to design reliable and secure products. If you don’t, it might be your product making the wrong kind of headlines and your source code and design documents being poured over by lawyers. It is no longer acceptable to stick your head in the sand on these issues.