embedded software boot camp

Boeing Dreamliner ‘Bug’

May 1st, 2015 by Nigel Jones

There’s an all too familiar story in the press today. The headline at the Guardian reads “US aviation authority: Boeing 787 bug could cause ‘loss of control’. As usual with these kinds of stories, it’s light on technical details other than to reveal that the Dreamliner’s generators will fall into a fail safe mode if kept continuously powered for 248 days. In this fail-safe mode, the generator doesn’t apparently generate power. Thus if all four of the planes generators were powered on at the same time and kept powered for 248 days, then suddenly – no power. That’s what I’d call an unfortunate state of affairs if you were at 40,000 feet over the Atlantic.

So what’s special about 248 days? Well 248 days = 248 * 24 * 3600 = 21427200 seconds. Hmm that number looks familiar. Sure Enough, 2^31 / 100 ~= 21427200. From this I can deduce the following.

Boeing’s generators likely contain a signed 32 bit timer tick counter that is being incremented every 10ms (or possibly an unsigned 32 bit counter being incremented every 5ms – but that would be unusual). On the 248th day after start up, this counter overflows. What happens next is unclear, but Boeing must have some sort of error detection in place to detect that something bad has happened – and thus takes what on the face of it is a reasonable action and shuts down the generator.

However, what has really happened here is that Boeing has fallen into the trap of assuming that redundancy (i.e. four generators) leads to increased reliability. In this line of thinking, if the probability of a system failing is q, then the probability of all four systems failing is q*q*q*q – a number that is multiple orders of magnitude smaller than the probability of just one system failing. For example if q is 0.001, then q^4 is 1,000,000,000 times smaller. At this point, the probability of a complete system failure is essentially zero. However, this is only the case if the systems are statistically independent. If they are not, then you can have as many redundant systems as you like, and the probability of failure will be stuck at q. Or to put it another way, you may as well eschew having any redundancy at all.

Now in the days of purely electro-mechanical systems, you could be excused for arguing that there’s a reasonable amount of statistical independence between redundant systems. However, once computer control comes into the mix, the degree of statistical dependence skyrockets (if you’d pardon the pun). By having four generators presumably running the same software, then Boeing made itself  completely vulnerable to this kind of failure.

Anyway, I gave a talk on this very topic on life support systems at a conference a few years ago – so if you have half an hour to waste have a watch. If you’d rather read something that’s a bit more coherent than the presentation, then the proceedings are available here. My paper starts at page 193.

The bottom line is this. If you are trying to design highly reliable systems and redundancy is an important part of your safety case, then you’d damn well better give a lot of thought to common mode failures of the software system. In this case, Boeing clearly had not, resulting in a supposedly fail-proof system that actually has a trivially simple catastrophic failure mode.



Freescale customer service

March 10th, 2015 by Nigel Jones

I have to admit to having a soft spot for Freescale microprocessors. The first micro I ever used was a Motorola 6809 and for the first few years of my career I worked exclusively on 6800’s, 68HC11’s and 68000 processors.  Times changed and I largely moved away from the product range, although I did return periodically as projects dictated. Well such a project has recently come up. The project requires me to make some modifications to an existing code base and as is often the case, the original compiler and its license file have been lost to the winds of time. Accordingly, I downloaded an evaluation copy from Freescale’s web site and got to work.  After convincing myself that there were no significant issues with moving to the latest compiler version, it was time to purchase a license. And as the joke goes, that’s when the trouble started…

Freescale offers various versions of the compiler, and in addition offers various optional software components that can be purchased. Trying to work out which components I needed to purchase was incredibly hard. Anyway, after considerable time, I came up with what I thought was needed and had my client purchase the requisite licenses. Downloading and installing the licenses was ridiculously complicated (as in it took about an hour to wade through all the documents), but I eventually got there. I then invoked CodeWarrior and got a wonderfully obtuse licensing error message that seemed to be saying I needed to purchase an additional component. However the component wasn’t for sale on Freescale’s website…

Accordingly I called customer support. Here’s the gist of the conversation:

Freescale: This is is unusual. It shouldn’t do that.

Me: OK.

Freescale: We don’t offer support for licensing issues over the phone. You’ll have to send an email to technical support detailing the problem.

Me: OK. How long is the response time?

Freescale: 48 – 72 hours.

Me: Do I have this right. Your product that I’ve paid for doesn’t work as advertised, you don’t offer telephone support for licensing issues, you require me to send you an email and it will then take you up to 72 hours to get me an answer?

Freescale: Yes.

I’m not sure what planet Freescale resides on, but this level of service simply doesn’t comport with what’s needed in the embedded space today. I think I understand now why I see so few Freescale designs. Is my experience unusual or is this the norm for Freescale today?


February 17th, 2015 by Nigel Jones

There’s a fascinating story from Reuters (with a far more detailed report from Kaspersky) about how a very sophisticated hacking operation, presumably the NSA, has been targeting computers by reflashing the firmware of hard drives such that the attacker controls what is loaded at boot time. If you think this has shades of Stuxnet about it, then you aren’t alone.

Why am I posting this? Well I think in the embedded community there’s been a certain amount of nonchalance concerning malware attacks on firmware, aka Firmalware. I see a lot of shrugs – it’s firmware, so it’s not modifiable, or who wants to take control of X, or to attack it they will need the source code and so on. Well if you read the articles – and I strongly recommend you do, then you’ll read that the attacker almost certainly did get hold of the source code for the disk drives, that they exploited undocumented commands, that they reprogrammed the disk drive firmware and that they proceeded to take complete control of the victim’s computer.

So what’s this to do with you? Well I strongly urge you to consider the consequences of what would happen if an attacker took control of the gadget you are working on. For example, sitting on my desk right now is a USB dongle used to receive over the air digital TV broadcasts. It doesn’t sound like it’s a great avenue for exploitation. However, an attacker could easily do the following if they had control of this device.

  1. On broadcast (i.e. over the air) command, switch the dongle into acting like a USB drive. USB drives are a major source of malware infection.
  2. Again on broadcast command, force the dongle to tune to a specific frequency resulting in the user being exposed to whatever the attacker wishes them to.

Although I’m not exactly the paranoid type, it really doesn’t take much imagination to work out how legions of embedded devices could be made to do some rather nasty things to their users.

The bottom line. If you haven’t thought about what happens if an attacker gets control of your gadget then you aren’t doing your job. Some things to ponder:

  1. How secure is your source code?
  2. If you have a bootstrap loader, how secure is it?
  3. When you distribute new firmware for installation on your gadget, is it distributed in encrypted form?
  4. Even if it’s distributed in encrypted form, is it downloaded in encrypted form?
  5. How do you protect the encryption keys?
  6. Are you setting the lock bits correctly so as to at least make binary extraction more challenging. (However don’t get too cocky – see this site )

The list could go on – but I think you get the idea.

Shifting Styles

November 27th, 2014 by Nigel Jones

To say it’s been some time since I last posted is an understatement! I won’t bore you with the details other than to note that sometimes there just aren’t enough hours in a day.

Anyway, today’s post is about a stylistic issue I’ve noticed in just about all code I’ve ever looked at. Unless you are a closeted BASIC programmer, you probably don’t ever write something like this:

foo = foo + 6;

While there’s nothing particularly wrong with this, other than looking rather odd from a mathematical perspective, just about every C programmer would use the += operator, i.e.

foo += 6;

Indeed this is true for all the arithmetic and logical operators. I.e.

foo *= 6;
foo /= 6;
foo -= 6;
foo ^= 6;
foo |= 6;
foo &= 6;

However, when it comes to the shift operators, something odd seems to happen. Almost no one writes:

foo >>= 6;

or even rarer:

foo <<= 6;

Instead folks resort to the syntax of BASIC and use:

foo = foo >> 6;
foo = foo << 6;

Why exactly is this? This thought was triggered by me looking at some of my own code from about ten years ago. Sure enough right in the middle of what was an otherwise well written piece of code (in the sense that ten years later it was easily followed and was a breeze to adapt to my latest project) I found a:

foo = foo << 6;

I have no real explanation other than we are all creatures of habit and sometimes get into inconsistent programming styles. While I wouldn’t fault someone for doing this, I do think that if you quite happily use += but not >>= then you should ponder your rationale for being inconsistent. Perhaps it will trigger a bigger introspection?


The engineering – marketing divide

April 6th, 2014 by Nigel Jones

We have all sat in surreal meetings with the sales and marketing folks. This video captures the dynamic perfectly (caution – you won’t know whether to laugh or cry):

The Expert Video

I actually have some sympathy for the marketing people portrayed here, as it must be very hard when you’re so far out of your depth. The person I can’t stand is the smarmy sales guy who’ll promise anything to make a sale, regardless of the consequences. I have to admit to having chewed out a few sales guys in my time that have pulled stunts like this one.

Anyway, I don’t have any particular insights on this other than to let you all know that you’re not alone when it comes to meetings like these.