embedded software boot camp

Firmalware

February 17th, 2015 by Nigel Jones

There’s a fascinating story from Reuters (with a far more detailed report from Kaspersky) about how a very sophisticated hacking operation, presumably the NSA, has been targeting computers by reflashing the firmware of hard drives such that the attacker controls what is loaded at boot time. If you think this has shades of Stuxnet about it, then you aren’t alone.

Why am I posting this? Well I think in the embedded community there’s been a certain amount of nonchalance concerning malware attacks on firmware, aka Firmalware. I see a lot of shrugs – it’s firmware, so it’s not modifiable, or who wants to take control of X, or to attack it they will need the source code and so on. Well if you read the articles – and I strongly recommend you do, then you’ll read that the attacker almost certainly did get hold of the source code for the disk drives, that they exploited undocumented commands, that they reprogrammed the disk drive firmware and that they proceeded to take complete control of the victim’s computer.

So what’s this to do with you? Well I strongly urge you to consider the consequences of what would happen if an attacker took control of the gadget you are working on. For example, sitting on my desk right now is a USB dongle used to receive over the air digital TV broadcasts. It doesn’t sound like it’s a great avenue for exploitation. However, an attacker could easily do the following if they had control of this device.

  1. On broadcast (i.e. over the air) command, switch the dongle into acting like a USB drive. USB drives are a major source of malware infection.
  2. Again on broadcast command, force the dongle to tune to a specific frequency resulting in the user being exposed to whatever the attacker wishes them to.

Although I’m not exactly the paranoid type, it really doesn’t take much imagination to work out how legions of embedded devices could be made to do some rather nasty things to their users.

The bottom line. If you haven’t thought about what happens if an attacker gets control of your gadget then you aren’t doing your job. Some things to ponder:

  1. How secure is your source code?
  2. If you have a bootstrap loader, how secure is it?
  3. When you distribute new firmware for installation on your gadget, is it distributed in encrypted form?
  4. Even if it’s distributed in encrypted form, is it downloaded in encrypted form?
  5. How do you protect the encryption keys?
  6. Are you setting the lock bits correctly so as to at least make binary extraction more challenging. (However don’t get too cocky – see this site )

The list could go on – but I think you get the idea.

Shifting Styles

November 27th, 2014 by Nigel Jones

To say it’s been some time since I last posted is an understatement! I won’t bore you with the details other than to note that sometimes there just aren’t enough hours in a day.

Anyway, today’s post is about a stylistic issue I’ve noticed in just about all code I’ve ever looked at. Unless you are a closeted BASIC programmer, you probably don’t ever write something like this:

foo = foo + 6;

While there’s nothing particularly wrong with this, other than looking rather odd from a mathematical perspective, just about every C programmer would use the += operator, i.e.

foo += 6;

Indeed this is true for all the arithmetic and logical operators. I.e.

foo *= 6;
foo /= 6;
foo -= 6;
foo ^= 6;
foo |= 6;
foo &= 6;

However, when it comes to the shift operators, something odd seems to happen. Almost no one writes:

foo >>= 6;

or even rarer:

foo <<= 6;

Instead folks resort to the syntax of BASIC and use:

foo = foo >> 6;
foo = foo << 6;

Why exactly is this? This thought was triggered by me looking at some of my own code from about ten years ago. Sure enough right in the middle of what was an otherwise well written piece of code (in the sense that ten years later it was easily followed and was a breeze to adapt to my latest project) I found a:

foo = foo << 6;

I have no real explanation other than we are all creatures of habit and sometimes get into inconsistent programming styles. While I wouldn’t fault someone for doing this, I do think that if you quite happily use += but not >>= then you should ponder your rationale for being inconsistent. Perhaps it will trigger a bigger introspection?

 

The engineering – marketing divide

April 6th, 2014 by Nigel Jones

We have all sat in surreal meetings with the sales and marketing folks. This video captures the dynamic perfectly (caution – you won’t know whether to laugh or cry):

The Expert Video

I actually have some sympathy for the marketing people portrayed here, as it must be very hard when you’re so far out of your depth. The person I can’t stand is the smarmy sales guy who’ll promise anything to make a sale, regardless of the consequences. I have to admit to having chewed out a few sales guys in my time that have pulled stunts like this one.

Anyway, I don’t have any particular insights on this other than to let you all know that you’re not alone when it comes to meetings like these.

Replacing nested switches with multi-dimensional arrays of pointers to functions

March 17th, 2014 by Nigel Jones

It’s been way too long since I’ve written a blog post. To those kind souls that have written to inquire if I’m still alive and kicking – thank you.  The bottom line is that there simply aren’t enough hours in the day. Anyway in an effort to get back in the groove so to speak, I thought I’d answer an email from Francois Alibert who wrote to ask how to replace a nested switch statement with a multi-dimensional array of pointers to functions. Here’s a program that illustrates his conundrum:

#include <stdint.h>
#include <stdio.h>

typedef enum 
{State1, State2, State3, Last_State}
MainState_t;

typedef enum 
    {SubState1, SubState2, SubState3, SubState4, SubState5, SubState6, Last_SubState}
SubState_t;

void demo(MainState_t State,  SubState_t SubState);

/*     Functions called from nested switch statement. 
    First digit is main state, second digit is substate */

void fn11(void);
void fn16(void);

void fn24(void);

void fn32(void);
void fn33(void);
void fn35(void);

void main(void)
{
    MainState_t main_state;
    SubState_t sub_state;
    
    for (main_state = State1; main_state < Last_State; main_state++)
    {
        for(sub_state = SubState1; sub_state < Last_SubState; sub_state++)
        {
            demo(main_state, sub_state);
        }
    }
}

void demo(MainState_t State,  SubState_t SubState)
{
    switch (State)
    {
        case State1:
            switch (SubState)
            {
                case SubState1:
                fn11();
                break;
                
                case SubState6:
                fn16();
                break;
            
                default:
                break;
            }
        break;

        case State2:
            switch (SubState)
            {
                case SubState4:
                fn24();
                break;
    
                default:
                break;
            }
        break;
    
        case State3:
        {
            switch (SubState)
            {
                case SubState2:
                fn32();
                break;
                
                case SubState3:
                fn33();
                break;                
                
                case SubState5:
                fn35();
                break;
    
                default:
                break;
            }
        }
        break;
        
        default:
        break;
    }
}

void fn11(void)
{
    puts("State 1, substate 1");
}

void fn16(void)
{
    puts("State 1, substate 6");
}

void fn24(void)
{
    puts("State 2, substate 4");
}

void fn32(void)
{
    puts("State 3, substate 2");
}

void fn33(void)
{
    puts("State 3, substate 3");
}

void fn35(void)
{
    puts("State 3, substate 5");
}

The key points are that we have nested switch statements and the substate is sparse. That is the number of substates for main state 1 is different to that of the substates for main state 2 and so on. If you’ve ever been in the situation of having to write a nested state machine like this, you’ll rapidly find that the code becomes very unwieldy. In particular functions many of hundreds of lines long with break statements all over the place are the norm. The result can be a maintenance nightmare. Of course if you end up going to three levels, then the problem compounds. Anyway, before looking at a pointer to function implementation, here’s the output from the above code:

State 1, substate 1
State 1, substate 6
State 2, substate 4
State 3, substate 2
State 3, substate 3
State 3, substate 5

In addition, using IAR’s AVR compiler, the code size with full size optimization is 574 bytes and the execution time is  2159 cycles, with the bulk of the execution time taken up by the puts() call.

Let’s now turn this into a pointer to function implementation. The function demo becomes this:

void demo(MainState_t State,  SubState_t SubState)
{
    static void (* const pf[Last_State][Last_SubState])(void) = 
    {
        {fn11, fnDummy, fnDummy, fnDummy, fnDummy, fn16},
        {fnDummy, fnDummy, fnDummy, fn24, fnDummy, fnDummy},
        {fnDummy, fn32, fn33, fnDummy, fn35, fnDummy}
    };
    
    if ((State < Last_State) && (SubState < Last_SubState))
    {
        (*pf[State][SubState])();
    }
}

Note that the empty portions of the array are populated with a call to fnDummy(), which as its name suggests is a dummy function that does nothing. You can of course put a NULL pointer in the array, and then extract the pointer, check to see if its non-NULL and call the function, However in my experience its always faster to just call a dummy function.

So how does this stack up to the nested switch statements? Well as written, the code size has increased to 628 bytes and cycles to 2846. This is a significant increase in overhead. However the code is a lot more compact, and in my opinion dramatically more maintainable. Furthermore, if you can guarantee by design that the parameters passed to demo() are within the array bounds (as is the case with this example), then you can arguably dispense with the bounds checking code. In which case the code size becomes 618 bytes and the execution time 2684 cycles. It’s your call as to whether the tradeoff is worth it.

 

Idling along, (or what to do in the idle task)

April 14th, 2013 by Nigel Jones

If you are using an RTOS in your latest design then no doubt you have an idle task. (Most of the time, the idle task is explicit and is the user task with the lowest priority; sometimes it’s built into the RTOS). It’s been my experience that the idle task is an interesting beast. On the one hand it is what gets executed when there’s nothing else to do, and thus inherently contains nothing directly related the product. On the other hand it’s this wonderful resource that you can exploit to do all sorts of interesting things to improve your product without having to worry too much about it impacting the running of your product.

With that being said, here are some suggestions for what your idle task can do, starting with one thing it shouldn’t do.

Do Nothing

If your idle task consists of a do-nothing loop, then you are almost certainly missing an opportunity. Hopefully the suggestions below will serve to spark your creative juices.

Watchdog

In any RTOS based design, the idle task should play a role in the overall watchdog supervisor. Without going into a treatise on watchdog design, suffice it to say if the idle task isn’t being run frequently then you’ve got a problem. Thus if the idle task doesn’t feature in your watchdog supervisor then you are doing something wrong. Note that if the idle task is featured in your watchdog, then it doesn’t necessarily mean you are doing it right either!

Power Save

For power constrained systems, the idle task is usually the place to put the microprocessor to sleep and / or indulge in other power saving strategies. I often find that my idle tasks consist of some of the features described here, plus power save. In other words, the idle task takes care of some housekeeping and then takes a nap.

Load calculation

Used in conjunction with hardware timers and the task switch hook function, it’s normally possible to construct a system that gives a decent indication of both overall CPU load and also the CPU utilization of each of the tasks. The idle task isn’t a bad place to do all the calculations. I find this a very useful diagnostic aid as I’m developing a system. Once you are done using it as a development aid, with a bit more work it can be modified to be part of your overall watchdog strategy, in that it can provide useful information about how tasks are (mis)behaving.

Flash Check

Just about every embedded system I’ve looked at in the last decade or two performs a CRC check on program memory on start up. However, if you are designing a system that is safety critical and / or expected to run a long time between power cycles, then you should seriously consider running a Flash CRC check in the idle task. Because no writing of memory is involved and one is instead reading memory that is supposed to be constant, there are are no real race-conditions to worry about and thus there’s no need to be entering critical sections to perform the reads. Of course if you are using an MMU or MPU then things might get a little more challenging. Naturally such a challenge is nothing for a reader of your ability! [As an aside, one of my electrical engineering professors used to say to me, “Nigel, this is nothing for a man of your ability!” One is of course simultaneously flattered, irritated and motivated. I’ve never forgotten it.]

RAM Check

This is of the course the evil twin to the Flash check. However this time you need to perform both reads and writes from locations that are being used by higher priority tasks and interrupts. You can of course only do this safely by executing a suitable lock / unlock procedure on each RAM location. Now doing this in the idle task could seriously change your system’s response time, so you need to think very seriously about how to structure such a test. A good starting point is to do just one locked read / write per idle task invocation. Of course if that results in a 10 year RAM test on your system, then you’ll need to rethink the strategy.

If you have other good ideas for idle task work then please leave them in the comments.