embedded software boot camp

Median Filter Performance Results

November 9th, 2010 by Nigel Jones

In my earlier post on median filtering I made the claim that for filter sizes of 3, 5 or 7 that using a simple insertion sort is ‘better’ than using Phil Ekstrom’s technique.  It occurred to me that this claim was based upon my testing with 8 bit processors quite a few years ago, and that the results might not be valid for 32 bit processors with their superior pointer manipulation.  Accordingly I ran some bench marks comparing an insertion sort based approach with Ekstrom’s method.

The procedure was as follows:

  1. I generated an array of random integers on the interval 900 – 1000. The idea is that these would represent data from a typical 10 bit ADC found on many microcontrollers.
  2. I then put together a base line project which performed all the basic house keeping functions, but without performing any filtering. The idea was to try and get a feel for the non-algorithm specific overhead.
  3. I then put together a project which median filtered using an insertion sort, for sizes, 3, 5, 7, 9, 11, and 13. Note that I elected to take a copy of the data prior to sorting. See this comment thread for a discussion of whether this is necessary or not.
  4. I put together another project which median filtered using Ekstrom’s method.
  5. I compiled the above for an ARM Cortex M3 target using an IAR compiler with full speed optimization.

The results were a clear win for Ekstrom. His code size was 132 bytes versus 224. His code was 5%, 32%, 61%, 89%,113% and 146% faster than the insertion sort for filters sizes of 3, 5, 7, 9, 11 and 13 respectively. To be fair to the insertion sort technique, I have made no effort to optimize it. Notwithstanding this, I think I can say that for 32 bit targets, you may as well just use Ekstrom’s approach for all filter sizes.

I’ll endeavor to update this post with results for a 16 bit target (MSP430) in the next few days.

Well I finally got around to running the tests on an MSP430 target. In this case Ekstrom’s method produced a larger code size (186 bytes versus 160). Much to my surprise, Ekstrom’s method was dramatically superior to the insertion sort approach, with speeds of 69% faster for a filter size of 3, going up to a whopping 250% faster with a filter size of 13.  The bottom line: I think my original claim is bunk. Use Ekstrom’s method by default!

DigiView Logic Analyzer

October 6th, 2010 by Nigel Jones

Today is one of those rare days on which I recommend a product. I only do this when I find a product that has genuinely made my life easier, and which by extension I think will also make your life easier. The product in question is a  DigiView logic analyzer. Now the fact that logic analyzers are useful tools should not be news to you. Indeed if you have been in this business long enough you will no doubt remember the bad old days of debugging code by decoding execution traces on a logic analyzer. That being said, I almost stopped using logic analyzers because they were big, expensive, difficult to set up and highly oriented towards bus-based systems. Given that I had my own consulting company with limited cash, limited space and a propensity to work on non-bus based systems (i.e. single chip microcontrollers), it’s hardly surprising that a logic analyzer wasn’t part of my toolbox.

This state of affairs persisted for a number of years until I obtained via a convoluted route a DigiView DV1-100. This is a USB powered, hand-sized box, with 18 channels at 100 MHz. It’s successor (The DV3100) sells for $499. The device sat on my shelf for a while until I decided to give it a spin one day. Since then I have found it to be an indispensable tool. Interestingly I find I use it the most when implementing the myriad of synchronous protocols that seem to exist on peripheral ICs today. While it is of course very useful for getting the interfaces working, I also find it extremely useful in fine tuning the interfaces. Via the use of the logic analyzer I can really examine set-up and hold times, clock frequencies, transmission latencies and so on. Doing so has allowed me to dramatically improve the performance of these interfaces in many cases. Indeed, I have had such success in this area that I now routinely hook the analyzer up, even when the interface works first time. If nothing else it gives me a nice warm fuzzy feeling that the interface is working the way it was designed – and not by luck.

Another area where I find it very useful is when I need to reverse engineer a product. I do this a lot as part of my expert witness work – and it is really quite remarkable how much you can learn from looking at a logic analyzer trace.

Anyway, the bottom line is this. $499 gets you an 18 channel 100 MHz personal logic analyzer that can handle most of the circuitry most of us see on a daily basis. If you value your time at all, then the DigiView will pay for itself the first time you use it. Go hassle your boss to get one.

Median filtering

October 2nd, 2010 by Nigel Jones

NOTE: I have heavily edited this blog post since it was originally published, based on some recent testing

If your engineering education was anything like mine then I’m sure that you learned all about different types of linear filters whose essential objective was  to pass signals within a certain frequency band and to reject as far as possible all others. These filters are of course indispensable for many types of ‘noise’. However in the real world of embedded systems it doesn’t take one too long to realize that these classical linear filters are useless against  burst noise. This kind of noise typically arises from a quasi-random event. For example a 2-way radio may be keyed next to your product or an ESD event may occur close to your signal. Whenever this happens your input signal may transiently go to a ridiculous value. For example I have often seen A2D readings that look something like this: 385, 389, 388, 388, 912, 388, 387. The 912 value is presumably anomalous and as such should be rejected. If you try and use a classical linear filter then you will almost certainly find that the 912 reading actually ends up having a significant impact on the output. The ‘obvious’ answer in this case is to use a median filter. Despite the supposed obviousness of this, it’s my experience that median filters are used remarkably infrequently in embedded systems. I don’t know why this is, but my guess is that it is a combination of a lack of knowledge of their existence, coupled with difficulty of implementation. Hopefully this post will go some way to rectifying both issues.

As its name suggests, a median filter is one which takes the middle of a group of readings. It’s normal for the group to have an odd number of members such that there is no ambiguity about the middle value.  Thus the general idea is that one buffers a certain number of readings and takes the middle reading.

Now Until recently I recognized three classes of median filter, based purely on the size of the filter. They were:

  • Filter size of 3 (i.e. the smallest possible).
  • Filter size of 5, 7 or 9 (the most common).
  • Filter size of 11 or more.

However, I now espouse a simple dichotomy

  • Filter size of 3
  • Filter size > 3

Filter size of 3

The filter size of three is of course the smallest possible filter. It’s possible to find the middle value simply via a few if statements. The code below is based on an algorithm described here. Clearly this is small and fast code.

uint16_t middle_of_3(uint16_t a, uint16_t b, uint16_t c)
{
 uint16_t middle;

 if ((a <= b) && (a <= c))
 {
   middle = (b <= c) ? b : c;
 }
 else if ((b <= a) && (b <= c))
 {
   middle = (a <= c) ? a : c;
 }
 else
 {
   middle = (a <= b) ? a : b;
 }
 return middle;
}

Filter size > 3

For filter sizes greater than 3 I suggest you turn to an algorithm described by Phil Ekstrom in the November 2000 edition of Embedded Systems Programming magazine. With the recent hatchet job on embedded.com I can’t find the original article. However there is a copy here. Ekstrom’s approach is to use a linked list. The approach works essentially by observing that once an array is sorted, the act of removing the oldest value and inserting the newest value doesn’t result in the array being significantly unsorted. As a result his approach works well – particularly for large filter sizes.

Be warned that there are some bugs in the originally published code (which Ekstrom corrected). However given the difficulty of finding anything on embedded.com nowadays I have opted to publish my implementation of his code. Be warned that the code below was originally written in Dynamic C and has been ported to standard C for this blog posting. It is believed to work. However it would behoove you to check it thoroughly before use!

#define STOPPER 0                                      /* Smaller than any datum */
#define    MEDIAN_FILTER_SIZE    (13)

uint16_t median_filter(uint16_t datum)
{
 struct pair
 {
   struct pair   *point;                              /* Pointers forming list linked in sorted order */
   uint16_t  value;                                   /* Values to sort */
 };
 static struct pair buffer[MEDIAN_FILTER_SIZE] = {0}; /* Buffer of nwidth pairs */
 static struct pair *datpoint = buffer;               /* Pointer into circular buffer of data */
 static struct pair small = {NULL, STOPPER};          /* Chain stopper */
 static struct pair big = {&small, 0};                /* Pointer to head (largest) of linked list.*/

 struct pair *successor;                              /* Pointer to successor of replaced data item */
 struct pair *scan;                                   /* Pointer used to scan down the sorted list */
 struct pair *scanold;                                /* Previous value of scan */
 struct pair *median;                                 /* Pointer to median */
 uint16_t i;

 if (datum == STOPPER)
 {
   datum = STOPPER + 1;                             /* No stoppers allowed. */
 }

 if ( (++datpoint - buffer) >= MEDIAN_FILTER_SIZE)
 {
   datpoint = buffer;                               /* Increment and wrap data in pointer.*/
 }

 datpoint->value = datum;                           /* Copy in new datum */
 successor = datpoint->point;                       /* Save pointer to old value's successor */
 median = &big;                                     /* Median initially to first in chain */
 scanold = NULL;                                    /* Scanold initially null. */
 scan = &big;                                       /* Points to pointer to first (largest) datum in chain */

 /* Handle chain-out of first item in chain as special case */
 if (scan->point == datpoint)
 {
   scan->point = successor;
 }
 scanold = scan;                                     /* Save this pointer and   */
 scan = scan->point ;                                /* step down chain */

 /* Loop through the chain, normal loop exit via break. */
 for (i = 0 ; i < MEDIAN_FILTER_SIZE; ++i)
 {
   /* Handle odd-numbered item in chain  */
   if (scan->point == datpoint)
   {
     scan->point = successor;                      /* Chain out the old datum.*/
   }

   if (scan->value < datum)                        /* If datum is larger than scanned value,*/
   {
     datpoint->point = scanold->point;             /* Chain it in here.  */
     scanold->point = datpoint;                    /* Mark it chained in. */
     datum = STOPPER;
   };

   /* Step median pointer down chain after doing odd-numbered element */
   median = median->point;                       /* Step median pointer.  */
   if (scan == &small)
   {
     break;                                      /* Break at end of chain  */
   }
   scanold = scan;                               /* Save this pointer and   */
   scan = scan->point;                           /* step down chain */

   /* Handle even-numbered item in chain.  */
   if (scan->point == datpoint)
   {
     scan->point = successor;
   }

   if (scan->value < datum)
   {
     datpoint->point = scanold->point;
     scanold->point = datpoint;
     datum = STOPPER;
   }

   if (scan == &small)
   {
     break;
   }

   scanold = scan;
   scan = scan->point;
 }
 return median->value;
}

To use this code, simply call the function every time you have a new input value. It will return the median of the last MEDIAN_FILTER_SIZE readings. This approach can consume a fair amount of RAM as one has to store both the values and the pointers. However if this isn’t a problem for you then it really is a nice algorithm that deserves to be in your tool box as it is dramatically faster than algorithms based upon sorting.

Median filtering based on sorting

In the original version of this article I espoused using a sorting based approach to median filtering when the filter size was 5, 7 or 9. I no longer subscribe to this belief. However for those of you that want to do it, here’s the basic outline:

 if (ADC_Buffer_Full)
 {
   uint_fast16_t adc_copy[MEDIAN_FILTER_SIZE];
   uint_fast16_t filtered_cnts;

   /* Copy the data */
   memcpy(adc_copy, ADC_Counts, sizeof(adc_copy));
   /* Sort it */
   shell_sort(adc_copy, MEDIAN_FILTER_SIZE);
   /* Take the middle value */
   filtered_cnts = adc_copy[(MEDIAN_FILTER_SIZE - 1U) / 2U];
   /* Convert to engineering units */
   ...
 }

Final Thoughts

Like most things in embedded systems, median filters have certain costs associated with them. Clearly median filters introduce a delay to a step change in value which can be problematic at times. In addition median filters can completely clobber frequency information in the signal. Of course if you are only interested in DC values then this is not a problem. With these caveats I strongly recommend that you consider incorporating median filters in your next embedded design.

A volatile tempest

September 27th, 2010 by Nigel Jones

Regular readers will know that I often comment on the use of volatile in embedded systems. As a result I am occasionally contacted about my opinion on whether a compiler is generating correct code – particularly when hardware is being accessed. Well I was contacted last week by Ratish Punoose who had a classic problem in that his code compiled okay on GCC but not on IAR. He had contacted IAR, who in turn basically said the compiler is correct – and here is the explanation. Ratish turned to me and John Regehr for our opinions. Well John and I came to similar opinions – namely:

  1. Ratish’s code was a bit weird, but not dramatically so.
  2. The explanation from IAR made no sense.
  3. It did indeed appear to be a compiler bug.

Ratish then posted his issue to the Msp430 forum on Yahoo. You can read his post and the responses here.

I’m sure many of you are at this point thinking that IAR is in for another round of bashing from me. Well you’d be wrong. One of the first responders to Ratish’s post was Paul Curtis of Rowley compilers. Paul gives an admirable explanation as to why Ratish’s code is wrong (and by extension so am I). Now I’m sure that IAR and Rowley are fierce competitors, and so Paul is also to be commended for leaping to the defense of IAR.

Furthermore, later in the thread Anders Lindgren of IAR chimes in and adds his detailed and compelling explanation.

Having read the posts from Paul and Anders I think they are right and I’m wrong. So thanks Gentlemen for:

  1. Setting me straight
  2. Proving that in the wonderful world of volatile accesses, there is always something more to learn.

I think there are several other lessons to be learned from this episode. However I think I’ll save them for another post.

#include “includes.h”

September 9th, 2010 by Nigel Jones

I am sure that the title of this blog posting is familiar to most of the readers of this blog, in that you have opened up a C source file and found a single #include statement that references a file that is typically called ‘includes.h’. On opening ‘includes.h’ one invariably finds an enormous list of other header files. Furthermore, as you go through other source files in the project, you will find that they all use ‘includes.h’. I suspect that by this point, the readers of this blog are divided into camps, namely:

  1. Either: So what, I do it all the time because it makes my life a lot easier.
  2. Or: I want to scream whenever I see this done.

I’m one of the screamers – and this is my rationale.

Back in the dark ages when one had to compile on computers with extremely limited resources, the compilation time of a module was a major issue. One of the things that significantly affected compile time was the number and the size of the header files that a module opened. As a result, most of us took steps to ensure that we only included the header files that were needed. However, as processor speeds increased and compilers started using pre-compiled header files, this became less of an issue, such that today I seriously doubt if you’d notice much difference in compilation times regardless of the number of header files that are included. I don’t know but I suspect that this was the enabler that caused people to start using ‘includes.h’.

So if compilation time is no longer an issue, what’s the big deal? After all we have all had the hassle of compiling a file only to be told that we are missing a prototype or a data type. At which point we have to hunt down the requisite header file, include it and recompile. If you do this half a dozen times in a new module, then it takes you say 15 minutes before everything is OK – and who has 15 minutes to waste on such irritating details? Well, in my opinion it’s time well spent. Here’s my case:

Coupling Indication

The number of header files a module needs to use is a crude but effective indicator of coupling. A module that needs to include almost no header files is clearly a module that is extremely self contained. That is it isn’t relying upon the outside world. Modules like this are typically easier to maintain and also more immune from changes made elsewhere in the system. In short I like modules that don’t have to include a lot of header files. Indeed when I have finished writing a module, I take a look at its include list. If the list is long then it really makes me wonder whether I should be breaking the module apart in some way so as to reduce the degree of coupling between it and the outside world.

Maintenance – understanding coupling

This is related to the first point. If I need to do some maintenance on a module, then a quick look at the include list can tell me how this module interacts with the rest of the code. This can be extremely useful if one is trying to understand how a program is put together.

Maintenance – understanding functionality

If I look at the include list and I see ‘math.h’, then I know that the module is using transcendental functions, which in turn implies complex floating point operations, which in turn implies potentially long execution times. In a similar manner, if it includes the header for the hardware interrupt handler, then I know I’m dealing with something related to the chip. I can get all this sort of information in a two second scan of the include list.

Documentation

If you use an automated documentation tool such as Doxygen, then only including the header files that are needed by a module ensures that Doxygen generates a meaningful documentation set for you, rather than including hyperlinks to useless files.

Not getting what you want

I have left what is probably the biggest problem to last. By including an enormous number of header files you lay yourself wide open to problems like this:

Header1.h

#define FALSE 0
#define TRUE  !FALSE

Header17.h

#ifndef FALSE
#define FALSE 0UL
#define TRUE  1UL
#endif

Header26.h

#pragma(IGNORE_REDEFINITION_OF_MACROS)
#define FALSE NULL
#define TRUE !FALSE
#pragma(ERROR_ON_REDEFINITION_OF_MACROS)

Trust me when I tell you I have seen this done! In other words the more files you include, the more likely it is that the macro that you are blithely using does not in fact have the value you think it does. Time to debug problems such as these – a lot longer than 15 minutes!

Remedial Action

On the off chance that I have convinced an ‘includes.h’ fan of the error of their ways, it would be remiss of me to not tell you how to quickly find out just the header files needed by a module.

  1. Paste the include list of includes.h into the module.
  2. Delete the entry for includes.h
  3. Compile the code to make sure you haven’t broken anything.
  4. Lint the file. Lint will tell you all the header files that aren’t being used.
  5. Delete the unnecessary include statements.
  6. Repeat from step 3 until Lint is happy.

Of course the chances are that if you use ‘includes.h’ you aren’t using Lint. If you do start using Lint then it will do a lot more for you than just telling you about unnecessary includes.