embedded software boot camp

How to lockup the in-flight entertainment system on a Boeing 777

June 18th, 2011 by Nigel Jones

I have recently returned from a short trip to the UK. I flew both ways on what appeared to be a relatively new Boeing 777 courtesy of United Airlines. As is now commonplace on trans-Atlantic wide-body aircraft, my seat came with its own in-flight entertainment system. After I took my seat to fly to London, I was a little surprised to see the in-flight entertainment system suddenly reboot. How did I know this? Well, there was a cute Linux penguin in the top left-hand corner, plus I (and the rest of the plane) was treated to the always delightful task of watching hundreds of lines of startup script scrolling across the screen. After a minute or two the reboot came to an end and I was presented with the now standard touch screen user interface. This should have given me a hint that the in-flight entertainment system wasn’t the most stable of applications.

Now flights from the east coast of the USA to Europe tend to be night flights, and this flight was no exception. Being a seasoned traveler I have my trans-Atlantic night flight routine down pat. Get settled in, have something to eat, put on the noise cancelling headphones, select a classical music channel and then try and sleep for the rest of the flight. I followed my routine, and after selecting the appropriate music channel, I selected the volume control icon. This resulted in a pop-up window from which I adjusted the volume to something reasonable. I then made the fateful mistake. Rather than exiting the volume adjust window, I simply hit the ‘on/off’ button. This of course doesn’t turn the system off, it merely blanks the display. I then settled in for my nap. Several hours later, I woke up and wondered how far into the flight we were and thus decided to take a look at the real-time route map. Accordingly I turned the display ‘on’ and was rewarded with what I expected to see, namely the volume control screen. A photograph is shown below:

When I touched the Back button, nothing happened. Hmmm thought I, has my system died? Answer – no. I could still adjust the volume, and I could still mute and un-mute the audio. I could also turn the system ‘on’ and ‘off’. Clearly the problem was that there was no ‘back’ associated with the Back button. Needless to say, United Airlines wasn’t going to reboot the entire system so that I could experiment some more, plus I was tired. Anyway I resolved to experiment some more on the return flight.

Thus a week later I’m on a Boeing 777 with the same in-flight entertainment system. I brought up the volume control page and did nothing. After about 30 seconds, the pop-up window auto-cleared. This was the clue I needed. I thus brought up the volume control page again and immediately turned the unit ‘off’ and then a few seconds later ‘on’. When I did this the Back button worked correctly. I repeated the exercise a few more times, and it always worked. I then repeated the exercise, but this time I waited longer than the pop-up timeout period before turning the unit back ‘on’. Voila – lockup. Clearly this was a case where there was a gaping hole in the state machine that was driving the user interface. At this point I found some interesting thoughts crossing my mind:

  • Idiot. Now you can’t watch any movies.
  • I bet the designer didn’t use a formal state machine tool such as visualSTATE, or QP
  • Whenever there are two distinct ways of exiting a state (in this case, user action or the passage of time), life gets complicated
  • Preserving state across a ‘power down’ is always difficult
  • I hope the guy that wrote the in-flight entertainment system had nothing to do with the flight control systems on the plane!
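
The "two ways of exiting a state" point can be made concrete with a small sketch. I have no knowledge of how the real system is implemented, so every name below (the states, the events, `ui_dispatch()`) is hypothetical. The sketch only shows one way to close this kind of hole: keep the display on/off flag orthogonal to the UI state, and handle every state/event pair explicitly, so that a pop-up timeout which fires while the screen is blanked still takes the normal exit path.

```c
#include <stdbool.h>

/* Hypothetical states and events; none of these names come from the real system. */
typedef enum { UI_MENU, UI_VOLUME_POPUP } ui_state_t;
typedef enum { EV_VOLUME_ICON, EV_BACK, EV_POPUP_TIMEOUT, EV_ON_OFF } ui_event_t;

typedef struct {
    ui_state_t state;
    bool       display_on;
} ui_t;

void ui_init(ui_t *ui)
{
    ui->state = UI_MENU;
    ui->display_on = true;
}

/* Dispatch one event. The display flag is kept separate from the state,
 * so blanking the screen never changes which state we are in. */
void ui_dispatch(ui_t *ui, ui_event_t ev)
{
    if (ev == EV_ON_OFF) {
        ui->display_on = !ui->display_on;
        return;
    }

    switch (ui->state) {
    case UI_MENU:
        if (ev == EV_VOLUME_ICON && ui->display_on) {
            ui->state = UI_VOLUME_POPUP;
        }
        /* EV_BACK and EV_POPUP_TIMEOUT are deliberately ignored here. */
        break;

    case UI_VOLUME_POPUP:
        if (ev == EV_POPUP_TIMEOUT) {
            ui->state = UI_MENU;        /* timed exit, display on or off */
        } else if (ev == EV_BACK && ui->display_on) {
            ui->state = UI_MENU;        /* user exit */
        }
        break;
    }
}
```

With this arrangement, the sequence volume icon, ‘off’, timeout, ‘on’ leaves the UI back at the menu rather than stranded in a pop-up that no longer has anywhere to go back to.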

Anyway if you find yourself on a plane with an in-flight entertainment system in the near future, see if you too can crash the system – and let us know how you did it.

P.S. I woke up this morning to read that a computer ‘glitch’ effectively grounded United Airlines yesterday. See this for thoughts on United’s computer system.

Sanity checking data

May 29th, 2011 by Nigel Jones

One of the major differences between embedded systems engineers and the general public is that we tend to notice embedded systems a lot more – both when they do something very well, and also of course when they do things not so well. The latter happened to me recently when I was pursuing one of my hobbies, namely riding my bicycle. I’m quite a keen cyclist, in part because I happen to live in a part of the world which has some terrific riding country. (To see what I mean, check out the photos of the ride in question. They were taken by my cycling (and skiing) buddy Bill – who I think you’ll agree is a very fine photographer. Click here to see more of his published work. If you hunt carefully, you’ll find some pictures of yours truly).

Anyway at about 25 miles into the ride, I noticed that my Incite bike computer was showing that I had ridden 34 miles. This struck me as unlikely, and so I asked Bill what mileage he had on his computer – 25 being the answer. I thus cycled through the computer screens, until I came to this one (photo courtesy of Bill Tan).

Apparently the computer thought that I had hit 132.6 mph at some point – and obviously sustained it for quite a while for me to gain about 9 miles. Now to understand how this could come about, you need to know a little about my bike computer. The computer consists of two parts, the User Interface (shown above) and the pickup. The pickup senses wheel rotation by a magnet passing close to what I assume is a Hall effect sensor. Now whereas many bike computers transfer the signal along a cable to the display, mine transmits it using an RF link – and this I suspect was the root cause of the problem. My guess is that at some point in the ride, I rode into an area of RF interference that the display interpreted as signals from the pickup. The firmware in the bike computer appears to have blithely accepted the RF data as valid and thus produced the ridiculous result shown here.

Now I have often been faced with this kind of problem – and the solution is not easy. However IMHO Incite really fell down on the job here. If I had been writing the code, I suspect that I’d have done the following:

  1. Median filter the data to remove random outliers. (Incite may be doing this).
  2. Sanity check the output of the median filter. If the answer is ‘impossible’ (like a human pedaling a bicycle at 132.6 mph), then reject the data and let the user know that something is amiss. Incite did neither of these things.
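
A minimal sketch of these two steps, under stated assumptions: speeds are held in tenths of a mph, the filter is a simple median of three, and the plausibility limit of 80.0 mph is an illustrative number I picked, not anything taken from Incite's firmware. The function names are likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed plausibility limit: 80.0 mph, in tenths of a mph. Illustrative only. */
#define MAX_PLAUSIBLE_SPEED_TENTHS_MPH  (800u)

/* Median of three readings; a single outlier cannot reach the output. */
uint16_t median3(uint16_t a, uint16_t b, uint16_t c)
{
    if (a > b) { uint16_t t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) { b = c; }                          /* b = min(b, c) */
    return (a > b) ? a : b;                        /* max(a, min(b, c)) */
}

/* Step 1: median filter. Step 2: sanity check the filter output.
 * Returns true and writes *speed_out when the result is plausible.
 * Returns false and leaves *speed_out untouched when it is not, so the
 * caller can substitute a value and flag the problem to the user. */
bool filter_speed(const uint16_t last3[3], uint16_t *speed_out)
{
    uint16_t m = median3(last3[0], last3[1], last3[2]);

    if (m > MAX_PLAUSIBLE_SPEED_TENTHS_MPH) {
        return false;
    }
    *speed_out = m;
    return true;
}
```

A lone spike such as {152, 1326, 155} is absorbed by the median filter, whereas a burst such as {1326, 1326, 150} gets through the filter and is caught by the sanity check.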

Rejecting the data is actually a little harder than it sounds. If you reject data, what do you replace it with? Common choices are:

  1. Zero
  2. The most recent valid data
  3. The average of the last N readings.

Each of these has its place and is application dependent.
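
The three choices can be sketched as a single substitution routine. This is only an illustration: the type and function names are hypothetical, N is fixed at 4 for brevity, and I have assumed that the average is taken over the last N *valid* readings, since averaging in rejected values would defeat the purpose.

```c
#include <stddef.h>
#include <stdint.h>

#define HISTORY_LEN (4u)    /* N, chosen arbitrarily for this sketch */

typedef enum { REPLACE_ZERO, REPLACE_LAST_VALID, REPLACE_AVERAGE } replace_policy_t;

/* Ring buffer of the most recent valid readings. Must start zeroed. */
typedef struct {
    uint16_t history[HISTORY_LEN];
    size_t   count;     /* number of valid entries, at most HISTORY_LEN */
    size_t   next;      /* index that the next valid reading will occupy */
} history_t;

/* Record a reading that passed the sanity check. */
void history_add(history_t *h, uint16_t value)
{
    h->history[h->next] = value;
    h->next = (h->next + 1u) % HISTORY_LEN;
    if (h->count < HISTORY_LEN) {
        h->count++;
    }
}

/* Value to use in place of a rejected reading. */
uint16_t substitute(const history_t *h, replace_policy_t policy)
{
    if (h->count == 0u || policy == REPLACE_ZERO) {
        return 0u;                      /* nothing better is available */
    }
    if (policy == REPLACE_LAST_VALID) {
        size_t last = (h->next + HISTORY_LEN - 1u) % HISTORY_LEN;
        return h->history[last];
    }

    /* REPLACE_AVERAGE: mean of the stored valid readings */
    uint32_t sum = 0u;
    for (size_t i = 0u; i < h->count; ++i) {
        sum += h->history[i];
    }
    return (uint16_t)(sum / h->count);
}
```

For a bike computer, holding the last valid speed is probably the least surprising choice; zero would make the displayed speed drop out, and an average lags behind real changes.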

Letting the user know that something is amiss is usually straightforward – flashing the erroneous value is the normal solution.

Anyway, the bottom line is that a wise embedded engineer always sanity checks the incoming (and outgoing) data. If you do, you are less likely to end up as the subject of a blog posting.

A personal note

I apologize for the abysmal rate at which I have been posting. I moved house this spring, which, coupled with me being extremely busy, has resulted in there simply not being enough hours in the day. When that happens, something has to give. I hope to return to my normal blog posting rate in July.

The N_ELEMENTS macro

March 18th, 2011 by Nigel Jones

Many years ago I came across a simple macro that has proven to be quite useful. Its usual definition looks something like this:

#define N_ELEMENTS(X)           (sizeof(X)/sizeof(*(X)))

Its nominal use is to determine the number of elements in an array that is declared without an explicit size, so that the compiler works out the size from the initializer. For example:

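#include <stdint.h>   /* uint8_t */

/* txc() is assumed to be declared elsewhere; it transmits a single byte */
void foo(void)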
{
 uint8_t bar[] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

Clearly this is quite useful. However, once you have this macro in your arsenal you will eventually run into a conundrum. To illustrate what I mean, consider the following code:

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < BUF_SIZE; ++i)
 {
  txc(bar[i]);
 }
}

This uses the classic approach of defining a manifest constant (BUF_SIZE) and then using it to define the array size and also as the loop limit. The conundrum is this: is one better off using N_ELEMENTS in this case as well? In other words, is the following better code?

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

This code is guaranteed to operate on every element of the array bar[] regardless of what is done to the array declaration. For example:

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE + 1] = {0, 1, 2, 3, 4, 5};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

In this case I have changed the array declaration. The code that uses N_ELEMENTS would still work, while the code that used BUF_SIZE would have failed: it would silently skip the last element. So from this perspective the N_ELEMENTS code is more robust. However, I don’t think the N_ELEMENTS-based code is as easy to read. As a result I have oscillated back and forth over the years as to which approach is better. My current view is that the N_ELEMENTS approach is indeed the better way. I’d be interested in your opinion.
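
The difference between the two loop limits can be checked with a small, self-contained sketch. The `txc()` here is a stand-in that only counts calls, and the two wrapper function names are hypothetical; the array and macros are those from the examples above.

```c
#include <stddef.h>
#include <stdint.h>

#define N_ELEMENTS(X)   (sizeof(X) / sizeof(*(X)))
#define BUF_SIZE        (5)

static size_t tx_count;     /* bytes "transmitted" so far */

/* Stand-in for the real txc(): counts calls instead of sending a byte. */
static void txc(uint8_t c)
{
    (void)c;
    ++tx_count;
}

/* Loop limit tied to the manifest constant: misses the sixth element. */
size_t send_with_buf_size(void)
{
    uint8_t bar[BUF_SIZE + 1] = {0, 1, 2, 3, 4, 5};
    uint8_t i;

    tx_count = 0;
    for (i = 0; i < BUF_SIZE; ++i)
    {
        txc(bar[i]);
    }
    return tx_count;
}

/* Loop limit tied to the array itself: covers every element. */
size_t send_with_n_elements(void)
{
    uint8_t bar[BUF_SIZE + 1] = {0, 1, 2, 3, 4, 5};
    uint8_t i;

    tx_count = 0;
    for (i = 0; i < N_ELEMENTS(bar); ++i)
    {
        txc(bar[i]);
    }
    return tx_count;
}
```

Here `send_with_buf_size()` transmits five bytes and `send_with_n_elements()` transmits all six. One caveat worth keeping in mind: the macro only works on a true array. Applied to a pointer, such as an array received as a function parameter, it divides the size of the pointer by the size of one element and gives the wrong answer.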

An embedded systems hardware test – a collaborative effort

February 25th, 2011 by Nigel Jones

Regular readers will probably be aware that back in 2000 I wrote an article for Embedded Systems Programming magazine entitled A ‘C’ Test: The 0x10 Best Questions for Would-be Embedded Programmers. In the intervening years I have often thought that it would be entertaining / useful to come up with a similar test, except this time I would be testing someone’s hardware knowledge. As a result, over the years I have collected together a number of fun questions, which I intend to use in the forthcoming article. However it occurred to me that I have a lot of very smart readers and that collectively we could put together a far better test than I could on my own. Thus I’m looking for your hardware questions! Before you flood me with your suggestions, here are the ground rules:

  1. Embedded systems design, not hardware design
    The test is intended to test the hardware knowledge of persons writing embedded code. It is NOT a test for persons that will be designing hardware. Thus questions about the minutiae of hardware filter design are not what I’m looking for.
  2. Traps
    The best questions will be examples from your past where someone got into trouble because they didn’t understand something about the hardware that you thought they should have.
  3. Why
    As well as posing the question (and giving the answer!), please explain why you think it’s important that someone should know what you are asking.
  4. Oscilloscope and logic analyzer
    I expect that the questions will cover circuits, processor architectures and tools. While I’m interested in all three, I’m particularly interested in elegant questions that will allow the questioner to determine if the candidate knows how to use an oscilloscope or logic analyzer.
  5. Original
    Please don’t send me any copyrighted or plagiarized material. Links are of course fine. (I mention this because not only is it legally and morally wrong – but I’m also tired of people ripping off my work and claiming it as their own).
  6. Attribution
    If I choose to use your suggestion, then tell me how you’d like it attributed. Full name + email address through anonymous are all fine.
  7. Early bird…
    If I get multiple similar suggestions, then the first one received gets the credit.
  8. Fame
    By sending me something you are agreeing to let me publish it. Other than attribution (and the accompanying fame 🙂 ), no other compensation will be given.

Anyway, if you’d like to participate, then contact me.

Thanks! I expect that I will publish the article in a few weeks.

Consulting as a leading economic indicator – update #2

February 25th, 2011 by Nigel Jones

I have written before about consulting being a leading economic indicator. My hypothesis is that when companies need engineering help, but are unsure whether to take on employees, then they turn to consultants. Conversely when companies need to cut costs, the first to go are consultants and contractors. In short, consultants are the first to go in bad times and the first to be retained in good times. I posted an update in October 2010 where I reported that the consultants I know were seeing an increase in interest level – but not yet any real increase in actual work. So where are we 5 months later? Well my informal survey of other consultants confirms that the interest seen back in October has translated into a lot of work today. All the consultants I know are very busy; indeed their biggest problem seems to be managing demand. On this basis I’m quite confident that the embedded systems industry will see robust hiring here in the USA in the coming months. If you are looking to change jobs, it’s a good time to start dusting off the resume.