embedded software boot camp

EEPROM Wear Leveling

July 5th, 2017 by Nigel Jones

A problem that occurs from time to time is the need for an embedded system to keep track of the number of doodads it has processed since it was put into service. Clearly you can’t keep this value in RAM, as it will be lost when power is removed. Thus the obvious solution is to store the count in EEPROM. However, common EEPROM specifications are 10^5 guaranteed write cycles, with 10^6 typical write cycles. If you expect to process, say, 20 million doodads during the life of your product, you clearly can’t do something like this:

__eeprom uint32_t doodad_cntr;

The above assumes that the __eeprom memory space qualifier is sufficient for the compiler to generate the requisite code sequences to access EEPROM.

The first part of the solution is to define an array in EEPROM for holding the doodad counter. Thus:

__eeprom uint32_t doodad_cntr[DOODAD_ARRAY_SIZE];

Since the number of doodads processed increases monotonically, on power up one simply searches the array looking for the last entry, i.e. the entry in the array with the largest value, and records its offset in a static variable. Thus:

static uint8_t doodad_array_offset;
doodad_array_offset = find_last_used_entry();
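The post doesn’t show find_last_used_entry(), but a minimal sketch might look like the following. Note that this is written as plain, host-compilable C — a RAM array stands in for the __eeprom array, since the __eeprom qualifier is compiler specific — and it assumes the plain monotonic counts described so far, not the 1’s complement encoding introduced below.

```c
#include <stdint.h>

#define DOODAD_ARRAY_SIZE 16U

/* RAM stand-in for the __eeprom array, so the sketch compiles anywhere */
uint32_t doodad_cntr[DOODAD_ARRAY_SIZE];

/* Return the offset of the entry holding the largest (i.e. most recent) count.
   Ties go to the later slot, so the next write lands beyond every copy. */
uint8_t find_last_used_entry(void)
{
    uint8_t  offset  = 0U;
    uint32_t largest = 0U;

    for (uint8_t i = 0U; i < DOODAD_ARRAY_SIZE; ++i)
    {
        if (doodad_cntr[i] >= largest)
        {
            largest = doodad_cntr[i];
            offset  = i;
        }
    }
    return offset;
}
```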

The next time a doodad is processed, the write occurs at the next location beyond doodad_array_offset in doodad_cntr[]. This simple technique immediately gives one a guaranteed factor of DOODAD_ARRAY_SIZE more writes than using a single variable. Whilst this is nice, there are a number of other things that one can do to improve the situation. These are all based on the observation that the erased state of EEPROM is a ‘1’ and not a zero. Thus a ‘blank’ EEPROM actually consists of 0xFF …. 0xFF, without a zero in sight. To take advantage of this, rather than writing the actual doodad count value to the array, instead write the 1’s complement of the value. This means that rather than writing, for example, the value 0x00000001 to EEPROM, you’ll instead write 0xFFFFFFFE. In this case the actual number of bits that have to change state is just 1 rather than 31, resulting in considerably less stress on the cells and potentially increasing the life of the EEPROM. Note that this technique is equivalent to initializing the EEPROM to 0xFFFFFFFF and then decrementing.
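To make the complementing concrete, here is a small host-compilable sketch. The helper names are mine, not part of the original code; the point is simply that storing ~count leaves most bits in the erased ‘1’ state for small counts.

```c
#include <stdint.h>

/* Store the count as its 1's complement, so low counts leave most bits
   in the erased '1' state. (Helper names are illustrative.) */
uint32_t count_to_eeprom(uint32_t count)  { return ~count;  }
uint32_t eeprom_to_count(uint32_t stored) { return ~stored; }

/* Number of bits that must change state to go from 'from' to 'to' */
uint8_t bits_changed(uint32_t from, uint32_t to)
{
    uint32_t diff = from ^ to;
    uint8_t  n    = 0U;

    while (diff != 0U)
    {
        diff &= diff - 1U;   /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

Starting from an erased (0xFFFFFFFF) location, storing a count of 1 as 0xFFFFFFFE flips a single bit, whereas storing it directly as 0x00000001 would flip 31.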

Writing the 1’s complement also opens up another potential improvement. EEPROM is often byte or word addressable. Furthermore, to program the EEPROM is usually a 2-step process consisting of erasing the memory location (i.e. erase it to 0xFF) and then programming the erased location with the desired value. Using 1’s complement values, it should be apparent that for most of the time, many of the bytes in the EEPROM will be at 0xFF. If we combine this with the fact that most of the time only one out of the four bytes in a uint32_t will change value when a uint32_t value is incremented, then we can dramatically minimize the actual number of erases / writes performed on the EEPROM. The code to do this looks a bit like this:

typedef union {
    uint8_t cnt_bytes[4U];
    uint32_t counter;
} eeprom_union_t;

__eeprom eeprom_union_t doodad_cntr[DOODAD_ARRAY_SIZE];

void increment_doodad(void) {
    eeprom_union_t current_value;

    current_value.counter = doodad_cntr[doodad_array_offset].counter;
    if (++doodad_array_offset >= DOODAD_ARRAY_SIZE) {
        doodad_array_offset = 0U;
    }
    --current_value.counter;  //1's complement increment
    for (uint8_t i = 0U; i < 4U; ++i) {
        //Are the bytes different? If not, then do nothing
        if (doodad_cntr[doodad_array_offset].cnt_bytes[i] != current_value.cnt_bytes[i]) {
            //Need to erase this byte (a device-specific operation that returns it to 0xFF)
            //See if we need to actually program it now
            if (current_value.cnt_bytes[i] != 0xFFU) {
                doodad_cntr[doodad_array_offset].cnt_bytes[i] = current_value.cnt_bytes[i];
            }
        }
    }
}

Clearly this is quite a bit more work. However, given that EEPROM erase / write times are in the 2 to 10 ms range, the time taken to execute the above code is insignificant by comparison. Given that it will also save millions of EEPROM writes over the life of the product, the complexity is well worth it.
Finally, if your product needs to keep track of an enormous number of doodads, then you’ll likely have no choice other than to keep track of when EEPROM cells go bad. This is typically done by assigning another area of EEPROM that has DOODAD_ARRAY_SIZE bits, e.g. __eeprom uint8_t bad_cell[DOODAD_ARRAY_SIZE / 8U]. These bits are erased to 1. Once you detect that a write has failed at a certain cell in doodad_cntr[], you change the corresponding bit in bad_cell[] from ‘1’ to ‘0’ and the cell is considered bad for all time. Obviously you then have to interrogate the bad_cell[] array to determine whether the code should use a specific cell.
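A sketch of that bookkeeping might look like this. Again it is host-compilable C with a RAM array standing in for the EEPROM bitmap, and the helper names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

#define DOODAD_ARRAY_SIZE 16U

/* RAM stand-in for the __eeprom bitmap; erased EEPROM reads as all '1's */
uint8_t bad_cell[DOODAD_ARRAY_SIZE / 8U] = { 0xFFU, 0xFFU };

/* Retire a cell permanently by clearing its bit from '1' to '0' */
void mark_cell_bad(uint8_t cell)
{
    bad_cell[cell / 8U] &= (uint8_t)~(1U << (cell % 8U));
}

/* A cell is usable while its bit is still in the erased '1' state */
bool cell_is_good(uint8_t cell)
{
    return (bad_cell[cell / 8U] & (1U << (cell % 8U))) != 0U;
}
```

Before writing to doodad_cntr[n], the code simply skips ahead past any n for which cell_is_good(n) returns false.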

Hamming distances

June 5th, 2017 by Nigel Jones

My guess is that most readers of this blog have at least a vague idea of what “Hamming distance” is. At the most abstract level it is a measure of how different two equal length “strings” are. In the case where the “strings” are actually binary integers, the Hamming distance is simply the number of bit positions that differ. Thus if we are comparing 8-bit integers, then the values 0x00 and 0x01 have a Hamming distance of 1, whereas 0xFF and 0x00 have a Hamming distance of 8.

So what, you ask? Well, over the years I’ve noticed a recurring “problem” in the industry. The scenario is the following. A design calls for some sort of distributed processing architecture, often with a master processor and a number of slaves. These slaves communicate with the master over some sort of serial bus, and each slave has a unique address. In just about every application I have ever seen, the slaves are assigned sequential addresses starting at 1 (0 is usually reserved for the master or for broadcasts). So what’s wrong with this, you ask? Well, if you do this you are increasing the probability that the wrong processor will be addressed. For example, if two processors have addresses of 0x02 and 0x03, then a single bit flip in the LSB will cause the wrong processor to be addressed. Conversely, if the two addresses are e.g. 0x55 and 0xAA, then you’ll need 8 bit flips to cause the wrong processor to be addressed. In short, you can minimize the probability of incorrect addressing by maximizing the Hamming distance between the addresses.
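For the avoidance of doubt, the Hamming distance between two bytes is just the population count of their XOR. A trivial implementation:

```c
#include <stdint.h>

/* Hamming distance between two bytes: the number of bit positions that differ */
uint8_t hamming_distance(uint8_t a, uint8_t b)
{
    uint8_t diff = a ^ b;
    uint8_t n    = 0U;

    while (diff != 0U)
    {
        diff &= (uint8_t)(diff - 1U);   /* clear the lowest set bit */
        ++n;
    }
    return n;
}
```

With this you can verify the examples above: 0x02 and 0x03 are a single bit flip apart, while 0x55 and 0xAA differ in all eight positions.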

Now before you rush to the comment section to tell me that you always protect your address bytes with a block check character or a CRC, go and read this article I wrote a few years back (or more to the point read the referenced article by Phil Koopman). Unless you have chosen the optimal CRC, the chances are you aren’t getting the level of protection you think you are. Even if you are using a good CRC, employing the concept of maximizing the Hamming distance essentially comes for free and so why not use it?

I’ll assume for the sake of argument that I’ve convinced you that this is a good idea. Now if you have just two processors on the bus, then it’s easy to choose addresses that give a Hamming distance of 8. For example if the first processor has an address of A, then just assign the second processor an address of A XOR 0xFF. E.g. 0x55 and 0xAA.

However, what happens when you have three processors on the bus? In this case you want to maximize the Hamming distance between all three. The questions now become:

  1. What is the maximum Hamming distance obtainable when choosing three unique values out of an 8 bit byte?
  2. What are example values I can use?

Now I’m sure that some mathematician (Hamming?) has worked out a generalized solution to this problem. In my case I wrote a quick program to explore the solution space and discovered that the maximum Hamming distance achievable between three 8-bit values is 5. An example is {0xFF, 0xE0, 0x1C}. To see that this is the case: 0xFF XOR 0xE0 = 0x1F (i.e. five ones); 0xFF XOR 0x1C = 0xE3 (i.e. five ones); 0xE0 XOR 0x1C = 0xFC (i.e. six ones). Note that the fact that the last pair yields a Hamming distance of 6 is nice, but does not alter the fact that the minimum pairwise Hamming distance is 5. FWIW my program tells me that there are something like 143,360 different combinations that will yield a Hamming distance of 5 for this particular set of requirements.
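For the curious, a brute-force search equivalent to the quick program described above is easy to write. This is a reconstruction, not the original program; it exhaustively tries every triple of 8-bit addresses and returns the best achievable minimum pairwise Hamming distance.

```c
#include <stdint.h>

static uint8_t hd8(uint8_t a, uint8_t b)
{
    uint8_t diff = a ^ b, n = 0U;
    while (diff != 0U) { diff &= (uint8_t)(diff - 1U); ++n; }
    return n;
}

/* Exhaustively search all {a, b, c} triples of 8-bit addresses and return the
   best achievable minimum pairwise Hamming distance */
uint8_t best_min_distance_3(void)
{
    uint8_t best = 0U;

    for (uint16_t a = 0U; a < 256U; ++a)
    {
        for (uint16_t b = a + 1U; b < 256U; ++b)
        {
            uint8_t dab = hd8((uint8_t)a, (uint8_t)b);
            if (dab <= best) continue;      /* this pair cannot beat the current best */
            for (uint16_t c = b + 1U; c < 256U; ++c)
            {
                uint8_t d   = dab;
                uint8_t dac = hd8((uint8_t)a, (uint8_t)c);
                uint8_t dbc = hd8((uint8_t)b, (uint8_t)c);
                if (dac < d) d = dac;
                if (dbc < d) d = dbc;
                if (d > best) best = d;
            }
        }
    }
    return best;
}
```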

What about when you have four devices on the bus? In this case the best achievable minimum pairwise Hamming distance drops to 4. An example is {0xFF, 0xF0, 0xCC, 0x0F}. For five devices the best achievable minimum pairwise Hamming distance is also 4. An example is {0xFF, 0xF0, 0xCC, 0xC3, 0xAA}.

So to summarize: with an 8-bit address field, the maximum Hamming distance achievable by device count is:

  • Two devices. Maximum Hamming distance = 8.
  • Three devices. Maximum Hamming distance = 5.
  • Four devices. Maximum Hamming distance = 4.
  • Five devices. Maximum Hamming distance = 4.

Beyond 5 devices I’ll need a better algorithm than the one I’m using to find a solution. However, given that in most of my work I rarely go beyond 4 devices, I’ve never felt the need to put the effort in to find out. If anyone out there has a closed form solution or an efficient algorithm for determining an arbitrary solution (e.g. I have 13 devices sitting on a CAN bus, which uses 11-bit identifiers; what addresses should I assign to maximize the Hamming distance?), then I’d be very interested in hearing about it.
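In the absence of a closed-form solution, a simple greedy heuristic is one obvious starting point: seed the set with one address, then repeatedly add the candidate whose minimum distance to the chosen set is largest. Be warned that it is not optimal — for three devices it settles for a minimum pairwise distance of 4, whereas, as noted above, 5 is achievable. The sketch below is mine, not from the post.

```c
#include <stdint.h>

static uint8_t hdist(uint8_t a, uint8_t b)
{
    uint8_t diff = a ^ b, n = 0U;
    while (diff != 0U) { diff &= (uint8_t)(diff - 1U); ++n; }
    return n;
}

/* Greedy heuristic: start from one seed address and repeatedly add the
   candidate whose worst-case (minimum) distance to the chosen set is largest.
   Fast for any device count, but NOT guaranteed optimal.
   Returns the minimum pairwise distance of the chosen set. */
uint8_t greedy_addresses(uint8_t *out, uint8_t n_devices, uint8_t seed)
{
    out[0] = seed;
    for (uint8_t chosen = 1U; chosen < n_devices; ++chosen)
    {
        uint8_t best_min = 0U;
        uint8_t best_v   = 0U;
        for (uint16_t v = 0U; v < 256U; ++v)
        {
            uint8_t min_d = 8U;
            for (uint8_t i = 0U; i < chosen; ++i)
            {
                uint8_t d = hdist((uint8_t)v, out[i]);
                if (d < min_d) min_d = d;
            }
            if (min_d > best_min)
            {
                best_min = min_d;
                best_v   = (uint8_t)v;
            }
        }
        out[chosen] = best_v;
    }

    uint8_t min_d = 8U;
    for (uint8_t i = 0U; i < n_devices; ++i)
    {
        for (uint8_t j = (uint8_t)(i + 1U); j < n_devices; ++j)
        {
            uint8_t d = hdist(out[i], out[j]);
            if (d < min_d) min_d = d;
        }
    }
    return min_d;
}
```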

Note that I have concentrated here on address bytes. The same principle applies to any other parameter over which you have control. Typical examples are command bytes and also enumerated values. In short, if a typical message that you send looks something like this: <STX> 01 00 00 02 00 [BCC] <ETX>, then try applying the Hamming distance principles such that a typical message looks more like <STX> FC 73 00 42 D9 [BCC] <ETX>. Not only will you get more protection, you’ll also find that the messages are easier to interpret on a protocol analyzer since you aren’t dealing with a sea of zeros and ones.

Finally, as in most things in engineering, there is another way of looking at the problem. See this article I wrote a few years back.

Peak detection of a time series

September 18th, 2015 by Nigel Jones

I’ve been doing embedded work for so long now that it’s rare that I come across a need that I haven’t run into before. Well, it happened the other day, so I thought I’d share it with you.

Here’s the situation. I have a transducer whose role is to determine a binary condition (essentially X is present / absent). The transducer is AC coupled and band-pass filtered, such that what I’m looking for is the presence of a noisy AC signal. My first inclination was to take the filtered signal, rectify it and average it over time. While this worked reasonably well, the delta between X being present / absent was not as large as I’d have liked. Accordingly I investigated other metrics. Somewhat to my surprise I found that the peak signal values (i.e. the ratio of peak X present to peak X absent) gave a far better separation. Thus I found myself needing to find the peak (i.e. maximum value) of a moving time series of data values.

To prove the concept, I used a brute force approach. That is, every time I received a new reading I’d store it in a buffer, the size of which was determined by the amount of time I wished to look back over. I’d then search the entire buffer looking for the maximum value. This approach of course worked fine – but it was hopelessly inefficient. Given that I am receiving a new value to process every few milliseconds, this inefficiency is unacceptable in the real product.  Accordingly, I needed a better algorithm.

My first port of call was of course the internet. If you do a search for peak detector algorithms then you’ll find a plethora of algorithms. However they are all way too sophisticated for what I need, as they are aimed at finding *all* the local maxima within an N-Dimensional array (apparently an important problem in physics, image processing etc).  I just want to know what’s the maximum value in a buffer of values – and I want to know it as efficiently as possible.

After pondering the problem for a while, my thoughts first turned to an efficient median filtering algorithm. I discussed median filtering here. My rationale was quite simple: the efficient median filtering algorithm inherently keeps the time series data sorted, and so I should be able to easily adapt it to return a maximum value rather than a median value. Well, it turned out to be quite trivial. Indeed it’s so trivial that I wonder why I haven’t modified it before. Here’s the code (note: as explained in my original posting on median filtering, this algorithm and code are mostly courtesy of Phil Ekstrom):

#define STOPPER 0                       /* Smaller than any datum */
#define MEDIAN_FILTER_SIZE    (31)

void median_filter(uint16_t datum, uint16_t *calc_median, uint16_t *calc_peak)
{
    struct pair
    {
        struct pair   *point;           /* Pointers forming list linked in sorted order */
        uint16_t  value;                /* Values to sort */
    };
    static struct pair buffer[MEDIAN_FILTER_SIZE] = {0}; /* Buffer of nwidth pairs */
    static struct pair *datpoint = buffer;   /* Pointer into circular buffer of data */
    static struct pair small = {NULL, STOPPER};          /* Chain stopper */
    static struct pair big = {&small, 0}; /* Pointer to head (largest) of linked list.*/
    struct pair *successor;         /* Pointer to successor of replaced data item */
    struct pair *scan;              /* Pointer used to scan down the sorted list */
    struct pair *scanold;           /* Previous value of scan */
    struct pair *median;            /* Pointer to median */
    uint16_t i;

    if (datum == STOPPER)
    {
        datum = STOPPER + 1;                             /* No stoppers allowed. */
    }
    if ( (++datpoint - buffer) >= MEDIAN_FILTER_SIZE)
    {
        datpoint = buffer;                  /* Increment and wrap data in pointer.*/
    }
    datpoint->value = datum;                /* Copy in new datum */
    successor = datpoint->point;            /* Save pointer to old value's successor */
    median = &big;                          /* Median initially to first in chain */
    scanold = NULL;                         /* Scanold initially null. */
    scan = &big;              /* Points to pointer to first (largest) datum in chain */

    /* Handle chain-out of first item in chain as special case */
    if (scan->point == datpoint)
    {
        scan->point = successor;
    }
    scanold = scan;                                     /* Save this pointer and   */
    scan = scan->point;                                 /* step down chain */

    /* Loop through the chain, normal loop exit via break. */
    for (i = 0; i < MEDIAN_FILTER_SIZE; ++i)
    {
        /* Handle odd-numbered item in chain  */
        if (scan->point == datpoint)
        {
            scan->point = successor;                      /* Chain out the old datum.*/
        }
        if (scan->value < datum)         /* If datum is larger than scanned value,*/
        {
            datpoint->point = scanold->point;             /* Chain it in here.  */
            scanold->point = datpoint;                    /* Mark it chained in. */
            datum = STOPPER;
        }
        /* Step median pointer down chain after doing odd-numbered element */
        median = median->point;                       /* Step median pointer.  */
        if (scan == &small)
        {
            break;                                      /* Break at end of chain  */
        }
        scanold = scan;                               /* Save this pointer and   */
        scan = scan->point;                           /* step down chain */

        /* Handle even-numbered item in chain.  */
        if (scan->point == datpoint)
        {
            scan->point = successor;
        }
        if (scan->value < datum)
        {
            datpoint->point = scanold->point;
            scanold->point = datpoint;
            datum = STOPPER;
        }
        if (scan == &small)
        {
            break;                                      /* Break at end of chain  */
        }
        scanold = scan;
        scan = scan->point;
    }
    *calc_median = median->value;
    *calc_peak = big.point->value;
}

To use this code, simply set the MEDIAN_FILTER_SIZE to your desired value, and then call median_filter every time you have a new value to process. The advantage of this code is that it gives you two parameters – the median and the maximum.
However, when I benchmarked it against the brute force approach I discovered that the brute force algorithm was actually faster — by a reasonable amount. After reflecting upon this for a while I realized that the median filtering approach was of course keeping all of the data sorted, which was way more than I needed for a simple peak detector. Thus it was time for another approach.

After thinking about it for a while, it was clear that the only datum I cared about in my buffer was the current maximum value. Thus upon receipt of a new value, I essentially had to decide:

  1. Is this new value >= current maximum value? If it is then it’s the new maximum value. Note that I really do mean >= and not >. The reason is explained below.
  2. Otherwise, is this new value about to overwrite the existing maximum value? If so, overwrite it and use a brute force search to find the new maximum value.

In order to make the algorithm as efficient as possible, we need to minimize the number of times a brute force search is required. Now if your buffer contains only one instance of the maximum value, then you’ll simply have to do the brute force search when you replace the maximum value. However, if your buffer contains multiple copies of the maximum value, then you’ll minimize the number of brute force searches by declaring the operative maximum value to be the one furthest away from where we will write to in the buffer next. It’s for this reason that we use a >= comparison in step 1.

In a similar vein, if we have to do a brute force search and there are multiple copies of the maximum value, then once again we want to choose the copy furthest away from where we will write to in the buffer next.

Anyway, here’s the code:

#define WINDOW_SIZE 31

uint16_t efficient_peak_detector(uint16_t value)
{
    static uint16_t wbuffer[WINDOW_SIZE] = {0U};
    static uint16_t wr_idx = 0U;
    static uint16_t peak_value = 0U;
    static uint16_t peak_value_idx = 0U;

    if (value >= peak_value)
    {
        peak_value = value;            /* New peak value, so record it */
        peak_value_idx = wr_idx;
        wbuffer[wr_idx] = value;
    }
    else
    {
        wbuffer[wr_idx] = value;        /* Not a new peak value, so just store it */
        if (wr_idx == peak_value_idx)    /* Have we overwritten the peak value? */
        {
            /*  Yes, so we need to do a brute force search to find the new
                maximum. Note that for efficiency reasons, if there are multiple
                copies of the new peak value, then we want to choose the one
                whose index value is as far away as possible from the current index */
            uint16_t idx;
            uint16_t cnt;

            for (cnt = 0U, idx = wr_idx, peak_value = 0U; cnt < WINDOW_SIZE; ++cnt)
            {
                if (wbuffer[idx] >= peak_value)
                {
                    peak_value = wbuffer[idx];    /* Record new peak */
                    peak_value_idx = idx;
                }
                if (++idx >= WINDOW_SIZE)
                {
                    idx = 0U;
                }
            }
        }
    }
    if (++wr_idx >= WINDOW_SIZE)
    {
        wr_idx = 0U;
    }
    return peak_value;
}

Obviously, if you make your buffer size a power of two then you can optimize the buffer wrapping code. There are a couple of other minor optimizations that can be made. For example, on eight-bit processors, making the various indices and counters (wr_idx, peak_value_idx, idx, cnt) 8 bits will speed things up a lot. Similarly, on a 32-bit processor you will likely gain a bit by using 32-bit values.
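For example, the power-of-two wrap can be done with a mask instead of a compare and branch. The names below are illustrative, and the buffer size must genuinely be a power of two for the mask to work.

```c
#include <stdint.h>

#define PWR2_WINDOW_SIZE 32U            /* must be a power of two */

static uint16_t pwr2_buffer[PWR2_WINDOW_SIZE];
static uint16_t pwr2_idx = 0U;

/* Store a value and wrap the index with a mask rather than a compare and branch */
void pwr2_store(uint16_t value)
{
    pwr2_buffer[pwr2_idx] = value;
    pwr2_idx = (pwr2_idx + 1U) & (PWR2_WINDOW_SIZE - 1U);
}
```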

Notwithstanding the aforementioned optimizations, this code is on average considerably faster than the brute force approach and not much larger. Its main limitation is that its run time is highly variable depending upon whether we have to determine a new maximum value using the brute force approach or not.

Clearly it’s trivial to modify this code to perform a ‘trough detector’ (aka minimum detector), or indeed have it do both functions.


Boeing Dreamliner ‘Bug’

May 1st, 2015 by Nigel Jones

There’s an all too familiar story in the press today. The headline at the Guardian reads “US aviation authority: Boeing 787 bug could cause ‘loss of control’”. As usual with these kinds of stories, it’s light on technical details other than to reveal that the Dreamliner’s generators will fall into a fail-safe mode if kept continuously powered for 248 days. In this fail-safe mode, the generator apparently doesn’t generate power. Thus if all four of the plane’s generators were powered on at the same time and kept powered for 248 days, then suddenly — no power. That’s what I’d call an unfortunate state of affairs if you were at 40,000 feet over the Atlantic.

So what’s special about 248 days? Well, 248 days = 248 * 24 * 3600 = 21,427,200 seconds. Hmm, that number looks familiar. Sure enough, 2^31 / 100 = 21,474,836, which is within a quarter of one percent. From this I can deduce the following.

Boeing’s generators likely contain a signed 32 bit timer tick counter that is being incremented every 10ms (or possibly an unsigned 32 bit counter being incremented every 5ms – but that would be unusual). On the 248th day after start up, this counter overflows. What happens next is unclear, but Boeing must have some sort of error detection in place to detect that something bad has happened – and thus takes what on the face of it is a reasonable action and shuts down the generator.
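That arithmetic is easy to check. The 10 ms tick is of course my deduction from the numbers, not a published Boeing detail, and the function name below is mine.

```c
#include <stdint.h>

/* A signed 32-bit tick counter incremented every 10 ms overflows after
   INT32_MAX hundredths of a second. Convert that to whole days. */
int32_t days_to_overflow_10ms_ticks(void)
{
    int32_t seconds = INT32_MAX / 100;      /* ~21,474,836 seconds */
    return seconds / (24 * 3600);           /* whole days until overflow */
}
```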

However, what has really happened here is that Boeing has fallen into the trap of assuming that redundancy (i.e. four generators) leads to increased reliability. In this line of thinking, if the probability of a system failing is q, then the probability of all four systems failing is q*q*q*q – a number that is multiple orders of magnitude smaller than the probability of just one system failing. For example if q is 0.001, then q^4 is 1,000,000,000 times smaller. At this point, the probability of a complete system failure is essentially zero. However, this is only the case if the systems are statistically independent. If they are not, then you can have as many redundant systems as you like, and the probability of failure will be stuck at q. Or to put it another way, you may as well eschew having any redundancy at all.

Now in the days of purely electro-mechanical systems, you could be excused for arguing that there’s a reasonable amount of statistical independence between redundant systems. However, once computer control comes into the mix, the degree of statistical dependence skyrockets (if you’ll pardon the pun). By having four generators presumably running the same software, Boeing made itself completely vulnerable to this kind of failure.

Anyway, I gave a talk on this very topic as it applies to life support systems at a conference a few years ago, so if you have half an hour to waste, have a watch. If you’d rather read something that’s a bit more coherent than the presentation, then the proceedings are available here. My paper starts at page 193.

The bottom line is this. If you are trying to design highly reliable systems and redundancy is an important part of your safety case, then you’d damn well better give a lot of thought to common mode failures of the software system. In this case, Boeing clearly had not, resulting in a supposedly fail-proof system that actually has a trivially simple catastrophic failure mode.



Freescale customer service

March 10th, 2015 by Nigel Jones

I have to admit to having a soft spot for Freescale microprocessors. The first micro I ever used was a Motorola 6809, and for the first few years of my career I worked exclusively on 6800’s, 68HC11’s and 68000 processors. Times changed and I largely moved away from the product range, although I did return periodically as projects dictated. Well, such a project has recently come up. It requires me to make some modifications to an existing code base, and as is often the case, the original compiler and its license file have been lost to the winds of time. Accordingly, I downloaded an evaluation copy from Freescale’s web site and got to work. After convincing myself that there were no significant issues with moving to the latest compiler version, it was time to purchase a license. And as the joke goes, that’s when the trouble started…

Freescale offers various versions of the compiler, and in addition offers various optional software components that can be purchased. Trying to work out which components I needed to purchase was incredibly hard. Anyway, after considerable time, I came up with what I thought was needed and had my client purchase the requisite licenses. Downloading and installing the licenses was ridiculously complicated (as in it took about an hour to wade through all the documents), but I eventually got there. I then invoked CodeWarrior and got a wonderfully obtuse licensing error message that seemed to be saying I needed to purchase an additional component. However the component wasn’t for sale on Freescale’s website…

Accordingly I called customer support. Here’s the gist of the conversation:

Freescale: This is unusual. It shouldn’t do that.

Me: OK.

Freescale: We don’t offer support for licensing issues over the phone. You’ll have to send an email to technical support detailing the problem.

Me: OK. How long is the response time?

Freescale: 48 – 72 hours.

Me: Do I have this right. Your product that I’ve paid for doesn’t work as advertised, you don’t offer telephone support for licensing issues, you require me to send you an email and it will then take you up to 72 hours to get me an answer?

Freescale: Yes.

I’m not sure what planet Freescale resides on, but this level of service simply doesn’t comport with what’s needed in the embedded space today. I think I understand now why I see so few Freescale designs. Is my experience unusual or is this the norm for Freescale today?