Posts Tagged ‘realtime’

Trends in Embedded Software Design

Wednesday, April 18th, 2012 Michael Barr

In many ways, the story of my career as an embedded software developer is intertwined with the history of the magazine Embedded Systems Design. When it was launched in 1988, under the original title Embedded Systems Programming (ESP), I was finishing high school. Like the vast majority of people at that time, I had never heard the term “embedded system” or thought much about the computers hidden away inside other kinds of products. Six years later I was a degreed electrical engineer who, like many EEs by that time in the mid-90′s, had a job designing embedded software rather than hardware. Shortly thereafter I discovered the magazine on a colleague’s desk, and became a subscriber and devotee.

The Early Days

In the early 1990s, as now, the specialized knowledge needed to write reliable embedded software was mostly not taught in universities. The only class I’d had in programming was in FORTRAN; I’d taught myself to program in assembly and C through a pair of hands-on labs that were, in hindsight, my only formal education in writing embedded software. It was on the job and from the pages of the magazine, then, that I first learned the practical skills of writing device drivers, porting and using operating systems, meeting real-time deadlines, implementing finite state machines, the pros and cons of languages other than C and assembly, remote debugging and JTAG, and so much more.

In that era, my work as a firmware developer involved daily interactions with Intel hex files, device programmers, tubes of EPROMs with mangled pins, UV erasers, mere kilobytes of memory, 8- and 16-bit processors, in-circuit emulators, and ROM monitors. Databooks were actual books; collectively, they took up whole bookshelves. I wrote and compiled my firmware programs on an HP-UX workstation on my desk, but then had to go downstairs to a lab to burn the chips, insert them into the prototype board, and test and debug via an attached ICE. I remember that on one especially daunting project eight miles separated my compiler and device programmer from the only instance of the target hardware; a single red LED and a dusty oscilloscope were the extent of my debugging toolbox.

Like you I had the Internet at my desk in the mid-90s, but it did not yet provide much useful or relevant information to my work other than via certain FTP sites (does anyone else remember FTPing into sunsite.unc.edu? or Gopher?). The rest was mostly blinking headlines and dancing hamster; and Amazon was merely the world’s biggest river. There was not yet an Embedded.com or EETimes.com. To learn about software and hardware best practices, I pursued an MSEE and CS classes at night and traveled to the Embedded Systems Conferences.

At the time, I wasn’t aware of any books about embedded programming. And every book that I had found on C started with “Hello, World”, only went up in abstraction from there, and ended without ever once addressing peripheral control, interrupt service routines, interfacing to assembly language routines, and operating systems (real-time or other). For reasons I couldn’t explain years later when Jack Ganssle asked me, I had the gumption to think I could write that missing book for embedded C programmers, got a contract from O’Reilly, and did–ending, rather than starting, mine with “Hello, World” (via an RS-232 port).

In 1998, a series of at least three twists of fate spanning four years found me taking a seat next to an empty chair at the speaker’s lunch at an Embedded Systems Conference. The chair’s occupant turned out to be Lindsey Vereen, who was then well into his term as the second editor-in-chief of the magazine. In addition to the book, I’d written an article or two for ESP by that time and Lindsey had been impressed with my ability to explain technical nuances. When he told me that he was looking for someone to serve as a technical editor, I didn’t realize it was the first step towards my role in that position.

Future Trends

Becoming and then staying involved with the magazine, first as technical editor and later as editor-in-chief and contributing editor, has been a highlight of my professional life. I had been a huge fan of ESP and of its many great columnists and other contributors in its first decade. And now, looking back, I believe my work helped make it an even more valuable forum for the exchange of key design ideas, best practices, and industry learning in its second decade. And, though I understand the move away from print towards online publishing and advertising, I am nonetheless saddened to see the magazine come to an end.

Reflecting back on these days long past reminds me that a lot truly has changed about embedded software design. Assembly language is used far less frequently today; C and C++ much more. EPROMs with their device programmers and UV erasers have been supplanted by flash memory and bootloaders. Bus widths and memory sizes have increased dramatically. Expensive in-circuit emulators and ROM monitors have morphed into inexpensive JTAG debug ports. ROM-DOS has been replaced with whatever Microsoft is branding embedded Windows this year. And open-source Linux has done so well that it has limited the growth of the RTOS industry as a whole–and become a piece of technology we all want to master if only for our resumes.

So what does the future hold? What will the everyday experiences of embedded programmers be like in 2020, 2030, or 2040? I see three big trends that will affect us all over those timeframes, each of which has already begun to unfold.

Trend 1: Volumes Finally Shift to 32-bit CPUs

My first prediction is that inexpensive, low-power, highly-integrated microcontrollers–as best exemplified by today’s ARM Cortex-M family–will bring 32-bit CPUs into even the highest volume application domains. The volumes of 8- and 16-bit CPUs will finally decline as these parts become truly obsolete.

Though you may be programming for a 32-bit processor already, it’s still true that 8- and 16-bit processors still drive CPU chip sales volumes. I’m referring, of course, to microcontrollers such as those based on 8051, PIC, and other instruction set architectures dating back 30-40 years. These older architectures remain popular today only because certain low-margin, high-volume applications of embedded processing demand squeezing every penny out of BOM cost.

The limitations of 8- and 16-bit architectures impact the embedded programmers in a number of ways. First, there are the awkward memory limitations resulting from limited address bus widths–and the memory banks, segmenting techniques, and other workarounds to going beyond those limitations. Second, these CPUs are much better at decision making than mathematics–they lack the ability to manipulate large integers efficiently and have no floating-point capability. Finally, these older processors frequently lack modern development tools, are unable to run larger Internet-enabled operating systems, such as Linux, and don’t feature the security and reliabiltiy protections afforded by an MMU.

There will, of course, always be many applications that are extremely cost-conscious, so my prediction is not that they will disappear completely, but that the overall price (including BOM cost as well as power consumption) of 32-bit micro controllers, with their improved instruction set architectures and transistor geometries, will win on price. That will put the necessary amount of computing power into the hands of some designers and make our work easier for all of us. It also helps programmers accomplish more in less time.

Trend 2: Complexity Forces Programmers Beyond C

My second prediction is that the days of the C programming language’s dominance in embedded systems are numbered.

Don’t get me wrong, C is a language I know and love. But, as you may know firsthand, C is simply not up to the task of building systems requiring over a million lines of code. Nonetheless, the demanded complexity of embedded software has been driving our systems towards more than a million lines of code. At this level of complexity, something has to give.

Additionally, our industry is facing a crisis: the average age of an embedded developer is rapidly increasing and C is generally not taught in universities anymore. Thus, even as the demand for embedded intelligence in every industry continues to increase, the population of skilled and experienced C programmers is on the decline. Something has to give on this front too.

But what alternative language can be used to build real-time software, manipulate hardware directly, and be quickly ported to numerous instruction set architectures? It’s not going to be C++ or Ada or Java, for sure–as those have already been tried and found lacking. A new programming language is probably not the answer either, across so many CPU families and with so many other languages already tried.

Thus I predict that tools that are able to reliably generate those millions of lines of C code automatically for us, based on system specifications, will ultimately take over. As an example of a current tool of this sort that could be part of the trend, I direct your attention to Miro Samek’s dandy open source Quantum Platform (QP) framework for event-driven programs and his (optional) free Quantum Modeler (QM) graphical modeling tool. You may not like the idea of auto-generated code today, but I guarantee that once you push a button to generate consistent and correct code from an already expressive statechart diagram, you will see the benefits of the overall structure and be ready to move up a level in programming efficiency.

I view C as a reasonable common output language for such tools (given that C can manipulate hardware registers directly and that every processor ever invented has a C compiler). Note that I do expect there to be continued demand for those of us with the skills and interest to fine tune the performance of the generated code or write device drivers to integrate it more closely to the hardware.

Trend 3: Connectivity Drives Importance of Security

We’re increasingly connecting embedded systems–to each other and to the Internet. You’ve heard the hype (e.g., “Internet of things” and “ubiquitous computing”) and you’ve probably already also put TCP/IP into one or more of your designs. But connectivity has a lot of implications that we are only starting to come to terms with. The most obvious of these is security.

A connected device cannot hide for long behind “security through obscurity” and, so, we must design security into our connected devices from the start. In my travels around our industry I’ve observed that the majority of embedded designers are largely unfamiliar with security. Sure some of you have read about encryption algorithms and know the names of a few. But mostly the embedded community is shooting in the dark as security designers, within organizations that aren’t of much help. And security is only as strong as the weakest link in the chain.

This situation must change. Just as Flash memory has supplanted UV-erasable EPROM, so too will over-the-net patches and upgrades take center stage as a download mechanism in coming years and decades. We must architect our systems first to be secure and then to accepted trusted downloads so that our products can keep up in the inevitable arms race against hackers and attackers.

And That’s a Wrap

Whatever the future holds, I am certain that embedded software development will remain an engaging and challenging career. And you’ll still find me writing about the field at http://embeddedgurus.com/barr-code and http://twitter.com/embeddedbarr.

Building Reliable and Secure Embedded Systems

Tuesday, March 13th, 2012 Michael Barr

In this era of 140 characters or less, it has been well and concisely stated that, “RELIABILITY concerns ACCIDENTAL errors causing failures, whereas SECURITY concerns INTENTIONAL errors causing failures.” In this column I expand on this statement, especially as regards the design of embedded systems and their place in our network-connected and safety-concious modern world.

As the designers of embedded systems, the first thing we must accomplish on any project is to make the hardware and software work. That is to say we need to make the system behave as it was designed to. The first iteration of this is often flaky; certain uses or perturbations of the system by testers can easily dislodge the system into a non-working state. In common parlance, “expect bugs.”

Given time, tightening cycles of debug and test can get us past the bugs and through to a shippable product. But is a debugged system good enough? Neither reliability nor security can be tested into a product. Each must be designed in from the start. So let’s take a closer look at these two important design aspects for modern embedded systems and then I’ll bring them back together at the end.

Reliable Embedded Systems

A product can be stable yet lack reliability. Consider, for example, an anti-lock braking computer installed in a car. The software in the anti-lock brakes may be bug-free, but how does it function if a critical input sensor fails?

Reliable systems are robust in the face of adverse run-time environments. Reliable systems are able to work around errors encountered as they occur to the system in the field–so that the number and impact of failures are minimized. One key strategy for building reliable systems is to eliminate single-points-of-failure. For example, redundancy could be added around that critical input sensor–perhaps by adding a second sensor in parallel with the first.

Another aspect of reliability that is under the complete control of designers (at least when they consider it from the start) are the “fail-safe” mechanisms. Perhaps a suitable but lower-cost alternative to a redundant sensor is detection of the failed sensor with a fall back to mechanical braking.

Failure Mode and Effect Analysis (FMEA) is one of the most effective and important design processes used by engineers serious about designing reliability into their systems. Following this process, each possible failure point is traced from the root failure outward to its effects. In an FMEA, numerical weights can be applied to the likelihoods of each failure as well as the seriousness of consequences. An FMEA can thus help guide you to a cost effective but higher reliability design by highlighting the most valuable places to insert the redundancy, fail-safes, or other elements that reinforce the system’s overall reliability.

In certain industries, reliability is a key driver of product safety. And that is why you see these techniques and FMEA and other design for reliability processes being applied by the designers of safety-critical automotive, medical, avionics, nuclear, and industrial systems. The same techniques can, of course, be used to make any type of embedded system more reliable.

Regardless of your industry, it is typically difficult or impossible to make your product as reliable via patches. There’s no way to add hardware like that redundant sensor, so your options may reduce to a fail-safe that is helpful but less reliable overall. Reliability cannot be patched or tested or debugged into your system. Rather, reliability must be designed in from the start.

Secure Embedded Systems

A product can also be stable yet lack security. For example, an office printer is the kind of product most of us purchase and use without giving a minute of thought to security. The software in the printer may be bug-free, but is it able to prevent a would-be eavesdropper from capturing a remote electronic copy of everything you print, including your sensitive financial documents?

Secure systems are robust in the face of persistent attack. Secure systems are able to keep hackers out by design. One key strategy for building secure systems is to validate all inputs, especially those arriving over an open network connection. For example, security could be added to a printer by ensuring against buffer overflows and encrypting and digitally signing firmware updates.

One of the unfortunate facts of designing secure embedded systems is that the hackers who want to get in only need to find and exploit a single weakness. Adding layers of security is good, but if even any one of those layers remains fundamentally weak, a sufficiently motivated attacker will eventually find and breach that defense. But that’s not an excuse for not trying.

For years, the largest printer maker in the world apparently gave little thought to the security of the firmware in its home/office printers, even as it was putting tens of millions of tempting targets out into the world. Now the security of those printers has been breached by security researchers with a reasonable awareness of embedded systems design. Said one of the lead researchers, “We can actually modify the firmware of the printer as part of a legitimate document. It renders correctly, and at the end of the job there’s a firmware update. … In a super-secure environment where there’s a firewall and no access — the government, Wall Street — you could send a résumé to print out.”

Security is a brave new world for many embedded systems designers. For decades we have relied on the fact that the microcontrollers and Flash memory and real-time operating systems and other less mainstream technologies we use will protect our products from attack. Or that we can gain enough “security by obscurity” by keeping our communications protocols and firmware upgrade processes secret. But we no longer live in that world. You must adapt.

Consider the implications of an insecure design of an automotive safety system that is connected to another Internet-connected computer in the car via CAN; or the insecure design of an implanted medical device; or the insecure design of your product.

Too often, the ability to upgrade a product’s firmware in the field is the very vector that’s used to attack. This can happen even when a primary purpose for including remote firmware updates is motivated by security. For example, as I’ve learned in my work as an expert witness in numerous cases involving reverse engineering of the techniques and technology of satellite television piracy, much of that piracy has been empowered by the same software patching mechanism that allowed the broadcasters to perform security upgrades and electronic countermeasures. Ironically, had the security smart cards in those set-top boxes had only masked ROM images the overall system security may have been higher. This was certainly not what the designers of the system had in mind. But security is also an arms race.

Like reliability, security must be designed in from the start. Security can’t be patched or tested or debugged in. You simply can’t add security as effectively once the product ships. For example, an attacker who wished to exploit a current weakness in your office printer or smart card might download his hack software into your device and write-protect his sectors of the flash today so that his code could remain resident even as you applied security patches.

Reliable and Secure Embedded Systems

It is important to note at this point that reliable systems are inherently more secure. And that, vice versa, secure systems are inherently more reliable. So, although, design for reliability and design for security will often individually yield different results–there is also an overlap between them.

An investment in reliability, for example, generally pays off in security. Why? Well, because a more reliable system is more robust in its handling of all errors, whether they are accidental or intentional. An anti-lock braking system with a fall back to mechanical braking for increased reliability is also more secure against an attack against that critical hardware input sensor. Similarly, those printers wouldn’t be at risk of fuser-induced fire in the case of a security breach if they were never at risk of fire in the case of any misbehavior of the software.

Consider, importantly, that one of the first things a hacker intent on breaching the security of your embedded device might do is to perform a (mental, at least) fault tree analysis of your system. This attacker would then target her time, talents, and other resources at one or more single points of failure she considers most likely to fail in a useful way.

Because a fault tree analysis starts from the general goal and works inward deductively toward the identification of one or more choke points that might produce the desired erroneous outcome, attention paid to increasing reliability such as via FMEA usually reduces choke points and makes the attacker’s job considerably more difficult. Where security can break down even in a reliable system is where the possibility of an attacker’s intentionally induced failure is ignored in the FMEA weighting and thus possible layers of protection are omitted.

Similarly, an investment in security may pay off in greater reliability–even without a directed focus on reliability. For example, if you secure your firmware upgrade process to accept only encrypted and digitally signed binary images you’ll be adding a layer of protection against an inadvertently corrupted binary causing an accidental error and product failure. Anything you do to improve the security of communications (i.e., checksums, prevention of buffer overflows, etc.) can have a similar effect on reliability.

The Only Way Forward

Each year it becomes increasingly important for all of us in the embedded systems design community to learn to design reliable and secure products. If you don’t, it might be your product making the wrong kind of headlines and your source code and design documents being poured over by lawyers. It is no longer acceptable to stick your head in the sand on these issues.

Embedded Software Training in a Box

Friday, May 6th, 2011 Michael Barr

Embedded Software Training in a BoxI am beaming with pride. I think we have finally achieved the holy grail of firmware training: Embedded Software Training in a Box. Priced at just $599, the kit includes Everything-You-Need-to-Know-to-Develop-Quality-Reliable-Firmware-in-C, including software for real-time safety-critical systems such as medical devices.

In many ways, this product is the culmination of about the last fifteen years of my career. The knowledge and skills imparted in the kit are drawn from my varied experiences as:

This kit also–at long last–answers the question I’ve been receiving from around the world since I first started writing articles and books about embedded programming: “Where/How can I learn to be a great embedded programmer?” I believe the answer is now as easy as: “Embedded Software Boot Camp in a Box!”

Beer and Boards at ESC Silicon Valley

Wednesday, April 6th, 2011 Michael Barr

It really looks like I’ve picked the wrong year to miss ESC Silicon Valley (due to a schedule conflict). (The last time I wasn’t at ESC, it was 1997 and White Zombie was still together. The first thing I’d really liked to have seen is Steve Wozniak‘s keynote speech. The second thing I’m really sad to miss is the just announced “Beer and Boards” party/giveaway.

Beer and Boards sounds really fun. Here’s how it works: Every “All Access” attendee will get to choose one of three free development kits to take home:

Once you select your preferred kit you will receive information on the time and place for the relevant Beer and Boards party, at which you will get to drink free beer at a special meet-and-greet with one of your kit’s designers to talk about your new kit and its capabilities. Three boards spread out over three days.

Nerds drinking beer! I love it. What will they think of next?

Before you register for this year’s ESC, be sure to check out my earlier post Save Big on Embedded Systems Conference Registration. Also, remember to use the promo code BARR20 to save an additional 20% off registration and be entered to win a free seat at a future Embedded Software Boot Camp or one of 20 free copies of the Embedded C Coding Standard.

What NHTSA/NASA Didn’t Consider re: Toyota’s Firmware

Wednesday, March 2nd, 2011 Michael Barr

In a blog post yesterday (Unintended Acceleration and Other Embedded Software Bugs), I wrote extensively on the report from NASA’s technical team regarding their analysis of the embedded software in Toyota’s ETCS-i system. My overall point was that it is hard to judge the quality of their analysis (and thereby the overall conclusion that the software isn’t to blame for unintended accelerations) given the large number of redactions.

I need to put the report down and do some other work at this point, but I have a few other thoughts and observations worth writing down.

Insufficient Explanations

First, some of the explanations offered by Toyota, and apparently accepted by NASA, strike me as insufficent. For example, at pages 129-132 of Appendix A to the NASA Report there is a discussion of recursion in the Toyota firmware. “The question then is how to verify that the indirect recursion in the ETCS-i does in fact terminate (i.e., has no infinite recursion) and does not cause a stack overflow.”

“For the case of stack overflow, [redacted phrase], and therefore a stack overflow condition cannot be detected precisely. It is likely, however, that overflow would cause some form of memory corruption, which would in turn cause some bad behavior that would then cause a watchdog timer reset. Toyota relies on this assumption to claim that stack overflow does not occur because no reset occurred during testing.” (emphasis added)

I have written about what really happens during stack overflow before (Firmware-Specific Bug #4: Stack Overflow) and this explains why a reset may not result and also why it is so hard to trace a stack overflow back to that root cause. (From page 20, in NASA’s words: “The system stack is limited to just 4096 bytes, it is therefore important to secure that no execution can exceed the stack limit. This type of check is normally simple to perform in the absence of recursive procedures, which is standard in safety critical embedded software.”)

Similarly, “Toyota designed the software with a high margin of safety with respect to deadlines and timeliness. … [but] documented no formal verification that all tasks actually meet this deadline requirement.” and “All verification of timely behavior is accomplished with CPU load measurements and other measurement-based techniques.” It’s not clear to me if the NASA team is saying it buys those Toyota explanations or merely wanted to write them down. However, I do not see a sufficient explanation in this wording from page 132:

“The [worst case execution time] analysis and recursion analysis involve two distinctly different problems, but they have one thing in common: Both of their failure modes would result in a CPU reset. … These potential malfunctions, and many others such as concurrency deadlocks and CPU starvation, would eventually manifest as a spontaneous system reset.” (emphasis added)

Might not a deadlock, starvation, priority inversion, or infinite recursion be capable of producing a bit of “bad behavior” (perhaps even unintended acceleration) before that “eventual” reset? Or might not a stack overflow just corrupt one or a few important variables a little bit and that result in bad behavior rather than or before a result? These kinds of possibilities, even at very low probabilities, are important to consider in light of NASA’s calculation that the U.S.-owned Camry 2002-2007 fleet alone is running this software a cumulative one billion hours per year.

Paths Not Taken

My second observation is based upon reflection on the steps NASA might have taken in its review of Toyota’s ETCS-i firmware, but apparently did not. Specifically, there is no mention anywhere (unless it was entirely redacted) of:

  • rate monotonic analysis, which is a technique that Toyota could have used to validate the critical set of tasks with deadlines and higher priority ISRs (and that NASA could have applied in its review),
  • cyclomatic complexity, which NASA might have used as an additional winnowing tool to focus its limited time on particularly complex and hard to test routines,
  • hazard analysis and mitigation, as those terms are defined by FDA guidelines regarding software contained in medical devices, nor
  • any discussion or review of Toyota’s specific software testing regimen and bug tracking system.

Importantly, there is also a complete absence of discussion of how Toyota’s ETCS-i firmware versions evolved over time. Which makes and models (and model years) had which versions of that firmware? (Presumably there were also hardware changes worthy of note.) Were updates or patches ever made to cars once they were sold, say while at the dealer during official recalls or other types of service?