<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stack Overflow &#187; Algorithms</title>
	<atom:link href="http://embeddedgurus.com/stack-overflow/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://embeddedgurus.com/stack-overflow</link>
	<description>Thoughts on embedded systems by Nigel Jones</description>
	<lastBuildDate>Sun, 25 Mar 2012 12:30:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Efficient C Tip #13 &#8211; use the modulus (%) operator with caution</title>
		<link>http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 14:21:40 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Efficient C/C++]]></category>
		<category><![CDATA[General C issues]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/stack-overflow/?p=628</guid>
		<description><![CDATA[This is the thirteenth in a series of tips on writing efficient C for embedded systems.  As the title suggests, if you are interested in writing efficient C, you need to be cautious about using the modulus operator.  Why is this? Well a little thought shows that C = A % B is equivalent to [...]]]></description>
			<content:encoded><![CDATA[<p>This is the thirteenth in a <a href="../2008/06/efficient-c-tips-1-choosing-the-correct-integer-size/">series</a> of tips on writing efficient C for embedded systems.  As the  title suggests, if you are interested in writing efficient C, you need  to be cautious about using the modulus operator.  Why is this? Well a little thought shows that C = A % B is equivalent to C = A &#8211; B * (A / B). In other words the modulus operator is functionally equivalent to three operations. As a result it&#8217;s hardly surprising that code that uses the modulus operator can take a long time to execute. Now in some cases you absolutely have to use the modulus operator. However in many cases it&#8217;s possible to restructure the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to how this blog posting came about.</p>
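<p>As a quick sanity check of the identity above, here is a minimal sketch that computes a remainder the long way. The function name remainder_via_divide is my own invention for illustration, and the code assumes unsigned operands and a non-zero divisor.</p>

```c
#include <stdint.h>

/* Remainder computed the "long way": one divide, one multiply, one subtract.
 * Hypothetical helper for illustration; b must be non-zero. */
uint32_t remainder_via_divide(uint32_t a, uint32_t b)
{
    uint32_t quotient = a / b;

    return a - (b * quotient);
}
```

<p>For any non-zero b, the result should match a % b. The point is simply that the quotient is computed along the way, whether or not the calling code makes use of it.</p>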
<h2>Converting seconds to days, hours, minutes and seconds</h2>
<p>In <a href="http://www.rmbconsulting.us/embedded-systems-design">Embedded Systems Design</a> there is an increasing need for some form of real time clock. The designer typically implements the time as a 32 bit variable containing the number of seconds since a particular date. Once that is in place, it&#8217;s not usually long before one has to convert the &#8216;time&#8217; into days, hours, minutes and seconds. I found myself in just such a situation recently, so I thought a quick internet search was in order to find the &#8216;best&#8217; way of converting &#8216;time&#8217; to days, hours, minutes and seconds. The code I <a href="http://techsupt.winbatch.com/TS/T000001012F7.html">found</a> wasn&#8217;t <a href="http://www.daniweb.com/forums/thread23621.html">great</a> and as usual was highly PC centric. I thus sat down to write my own code.</p>
<h3>Attempt #1 &#8211; Using the modulus operator</h3>
<p>My first attempt used the &#8216;obvious&#8217; algorithm and employed the modulus operator. The relevant code fragment appears below.</p>
<pre>void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 seconds = time % 60UL;
 time /= 60UL;
 minutes = time % 60UL;
 time /= 60UL;
 hours = time % 24UL;
 time /= 24UL;
 days = time;  
}</pre>
<p>This approach has a nice looking symmetry to it.  However, it contained three divisions and three modulus operations. I thus was rather concerned about its performance and so I measured its speed for three different architectures &#8211; AVR (8 bit), MSP430 (16 bit), and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The number of cycles quoted are for 10 invocations of the test code and include the test harness overhead:</p>
<p>AVR:  29,825 cycles</p>
<p>MSP430: 27,019 cycles</p>
<p>ARM Cortex: 390 cycles</p>
<p>No, that isn&#8217;t a misprint. The ARM needed roughly 70 times fewer cycles than the MSP430 and AVR, which is nearly two orders of magnitude. Thus my claim that the modulus operator can be very inefficient is true for some architectures &#8211; but not all. If you are using the modulus operator on an ARM processor then it&#8217;s probably not worth worrying about. However, if you are working on smaller processors then clearly something needs to be done &#8211; and so I investigated some alternatives.</p>
<h3>Attempt #2 &#8211; Replace the modulus operator</h3>
<p>As mentioned in the introduction,  C = A % B is equivalent to C = A &#8211; B * (A / B). If we compare this to the code in attempt 1, then it should be apparent that the intermediate value (A/B) computed as part of the modulus operation is in fact needed in the next line of code. Thus this suggests a simple optimization to the algorithm.</p>
<pre>void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 days = time / (24UL * 3600UL);    
 time -= days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = time / 3600UL;
 time -= (hours * 3600UL);
 /* time now contains the number of seconds in the last hour */
 minutes = time / 60U;
 seconds = time - minutes * 60U;
}</pre>
<p>In this case I have replaced three mods with three subtractions and three multiplications. Thus although I have replaced a single operator (%) with two operations (- *) I still expect an increase in speed because the modulus operator is actually three operators in one (- * /).  Thus effectively I have eliminated three divisions and so I expected a significant improvement in speed. The results however were a little surprising:</p>
<p>AVR:  18,720 cycles</p>
<p>MSP430: 14,805 cycles</p>
<p>ARM Cortex: 384 cycles</p>
<p>Thus while this technique cut the cycle count by roughly 37% on the AVR and 45% on the MSP430, it had essentially no impact on the ARM code.  Presumably this is because the ARM core has a hardware divide instruction, which makes the modulus operation cheap to begin with. Notwithstanding the ARM results, it&#8217;s clear that at least in this example, it&#8217;s possible to significantly speed up an algorithm by eliminating the modulus operator.</p>
<p>I could of course just stop at this point. However, examination of attempt 2 shows that further optimizations are possible by observing that if time is a 32 bit variable, then days needs at most 16 bits (2<sup>32</sup> seconds is roughly 49,710 days). Furthermore, hours, minutes and seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to use smaller data types.</p>
<h3>Attempt #3 &#8211; Data type size reduction</h3>
<p>My naive implementation of the code looked like this:</p>
<pre>void compute_time(uint32_t time)
{
 uint16_t    days;
 uint8_t     hours, minutes, seconds;
 uint16_t    stime;

 days = (uint16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}</pre>
<p>All I have done is change the data types and add casts where appropriate. The results were interesting:</p>
<p>AVR:  14,400 cycles</p>
<p>MSP430: 11,457 cycles</p>
<p>ARM Cortex: 434 cycles</p>
<p>Thus while this resulted in a significant improvement for the AVR &amp; MSP430, it resulted in a significant worsening for the ARM. Clearly the ARM doesn&#8217;t like working with non 32 bit variables. Thus this suggested an improvement that would make the code a lot more portable &#8211; and that is to use the <a href="http://embeddedgurus.com/stack-overflow/2008/06/efficient-c-tips-1-choosing-the-correct-integer-size/">C99 fast types</a>. Doing this gives the following code:</p>
<h3>Attempt #4 &#8211; Using the C99 fast data types</h3>
<pre>void compute_time(uint32_t time)
{
 uint_fast16_t    days;
 uint_fast8_t    hours, minutes, seconds;
 uint_fast16_t    stime;

 days = (uint_fast16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint_fast8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}</pre>
<p>All I have done is change the data types to the C99 fast types. The results were encouraging:</p>
<p>AVR:  14,400 cycles</p>
<p>MSP430: 11,595 cycles</p>
<p>ARM Cortex: 384 cycles</p>
<p>Although the MSP430 time increased very slightly, the AVR and ARM stayed at their fastest speeds. Thus attempt #4 is both fast and portable.</p>
<h3>Conclusion</h3>
<p>Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for further optimizations. As a result with the AVR &amp; MSP430 I was able to more than halve the execution time.</p>
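<p>If you want to convince yourself that the rewritten code gives the same answers as the modulus version, something along these lines may help. It is only a sketch: the elapsed_time_t struct and the function names split_time_mod, split_time_fast and split_time_agrees are hypothetical additions of mine, introduced so that the results can be returned and compared rather than discarded.</p>

```c
#include <stdint.h>

/* Hypothetical container for the result; not part of the original code. */
typedef struct
{
    uint_fast16_t days;
    uint_fast8_t  hours;
    uint_fast8_t  minutes;
    uint_fast8_t  seconds;
} elapsed_time_t;

/* Attempt #1 style: repeated modulus and divide. */
elapsed_time_t split_time_mod(uint32_t time)
{
    elapsed_time_t t;

    t.seconds = (uint_fast8_t)(time % 60UL);
    time /= 60UL;
    t.minutes = (uint_fast8_t)(time % 60UL);
    time /= 60UL;
    t.hours = (uint_fast8_t)(time % 24UL);
    time /= 24UL;
    t.days = (uint_fast16_t)time;
    return t;
}

/* Attempt #4 style: divide, then multiply and subtract. */
elapsed_time_t split_time_fast(uint32_t time)
{
    elapsed_time_t t;
    uint_fast16_t  stime;

    t.days = (uint_fast16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)t.days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    t.hours = (uint_fast8_t)(time / 3600UL);
    stime = (uint_fast16_t)(time - ((uint32_t)t.hours * 3600UL));
    /* stime now contains the number of seconds in the last hour */
    t.minutes = (uint_fast8_t)(stime / 60U);
    t.seconds = (uint_fast8_t)(stime - (t.minutes * 60U));
    return t;
}

/* Returns 1 if both versions agree for the given input, else 0. */
int split_time_agrees(uint32_t time)
{
    elapsed_time_t a = split_time_mod(time);
    elapsed_time_t b = split_time_fast(time);

    return (a.days == b.days) && (a.hours == b.hours) &&
           (a.minutes == b.minutes) && (a.seconds == b.seconds);
}
```

<p>Calling split_time_agrees() over the boundary values (0, 59, 60, 86,399, 86,400 and the largest 32 bit value) is a cheap way of checking the arithmetic on a PC before benchmarking on the target.</p>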
<h2>Converting Integers for Display</h2>
<p>A similar problem (with a similar solution) occurs when one wants to show integers on a display. For example, if you are using a custom LCD panel with, say, a 3 digit numeric field, then the problem arises as to how to determine the value of each digit. The obvious way, using the modulus operator, is as follows:</p>
<pre>void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value &gt; 999)
 {
 value = 999;
 }

 lsd = value % 10;
 value /= 10;
 nsd = value % 10;
 value /= 10;
 msd = value;

 /* Now display the digits */
}</pre>
<p>However, using the technique espoused above, we can rewrite this much more efficiently as:</p>
<pre>void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value &gt; 999U)
 {
  value = 999U;
 }

 msd = value / 100U;
 value -= msd * 100U;

 nsd = value / 10U;
 value -= nsd * 10U;

 lsd = value;

 /* Now display the digits */
}</pre>
<p>If you benchmark this you should find it considerably faster than the modulus based approach.</p>
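<p>The functions above compute the digits and then leave the display step to you. For testing on a PC, a variant that hands the digits back to the caller can be handy. The sketch below assumes a hypothetical function name (split_digits) and an output array of my own choosing; the arithmetic is the same divide, multiply and subtract sequence.</p>

```c
#include <stdint.h>

/* Splits a value (clamped to 999) into digits[0] = hundreds,
 * digits[1] = tens and digits[2] = units, without using %.
 * Hypothetical helper for illustration. */
void split_digits(uint16_t value, uint8_t digits[3])
{
    if (value > 999U)
    {
        value = 999U;
    }

    digits[0] = (uint8_t)(value / 100U);
    value -= (uint16_t)(digits[0] * 100U);

    digits[1] = (uint8_t)(value / 10U);
    value -= (uint16_t)(digits[1] * 10U);

    digits[2] = (uint8_t)value;
}
```

<p>With this in place, an input of 407 should yield the digits 4, 0 and 7, and anything above 999 should be clamped to 9, 9 and 9.</p>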
<p><a href="http://embeddedgurus.com/stack-overflow/2010/04/efficient-c-tip-12-be-wary-of-switch-statements/">Previous Tip</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/feed/</wfw:commentRss>
		<slash:comments>47</slash:comments>
		</item>
		<item>
		<title>Median Filter Performance Results</title>
		<link>http://embeddedgurus.com/stack-overflow/2010/11/median-filter-performance-results/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2010/11/median-filter-performance-results/#comments</comments>
		<pubDate>Wed, 10 Nov 2010 01:59:46 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Median filter]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/stack-overflow/?p=567</guid>
		<description><![CDATA[In my earlier post on median filtering I made the claim that for filter sizes of 3, 5 or 7 that using a simple insertion sort is &#8216;better&#8217; than using Phil Ekstrom&#8217;s technique.  It occurred to me that this claim was based upon my testing with 8 bit processors quite a few years ago, and [...]]]></description>
			<content:encoded><![CDATA[<p>In my earlier <a href="http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/">post</a> on median filtering I made the claim that for filter sizes of 3, 5 or 7, using a simple insertion sort is &#8216;better&#8217; than using Phil Ekstrom&#8217;s technique.  It occurred to me that this claim was based upon my testing with 8 bit processors quite a few years ago, and that the results might not be valid for 32 bit processors with their superior pointer manipulation.  Accordingly I ran some benchmarks comparing an insertion sort based approach with Ekstrom&#8217;s method.</p>
<p>The procedure was as follows:</p>
<ol>
<li>I generated an array of random integers on the interval 900 &#8211; 1000. The idea is that these would represent data from a typical 10 bit ADC found on many microcontrollers.</li>
<li>I then put together a baseline project which performed all the basic housekeeping functions, but without performing any filtering. The idea was to try and get a feel for the non-algorithm specific overhead.</li>
<li>I then put together a project which median filtered using an insertion sort, for sizes 3, 5, 7, 9, 11 and 13. Note that I elected to take a copy of the data prior to sorting. See this <a href="http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/#comment-2232">comment thread</a> for a discussion of whether this is necessary or not.</li>
<li>I put together another project which median filtered using Ekstrom&#8217;s method.</li>
<li>I compiled the above for an ARM Cortex M3 target using an IAR compiler with full speed optimization.</li>
</ol>
<p>The results were a clear win for Ekstrom. His code size was 132 bytes versus 224. His code was 5%, 32%, 61%, 89%, 113% and 146% faster than the insertion sort for filter sizes of 3, 5, 7, 9, 11 and 13 respectively. To be fair to the insertion sort technique, I have made no effort to optimize it. Notwithstanding this, I think I can say that for 32 bit targets, you may as well just use Ekstrom&#8217;s approach for all filter sizes.</p>
<p>I&#8217;ll endeavor to update this post with results for a 16 bit target (MSP430) in the next few days.</p>
<p>Well I finally got around to running the tests on an MSP430 target. In this case Ekstrom&#8217;s method produced a larger code size (186 bytes versus 160). Much to my surprise, Ekstrom&#8217;s method was dramatically superior to the insertion sort approach, with speeds of 69% faster for a filter size of 3, going up to a whopping 250% faster with a filter size of 13.  The bottom line: I think my original claim is bunk. Use Ekstrom&#8217;s method by default!</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2010/11/median-filter-performance-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Median filtering</title>
		<link>http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 13:53:26 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Median filter]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/stack-overflow/?p=543</guid>
		<description><![CDATA[NOTE: I have heavily edited this blog post since it was originally published, based on some recent testing If your engineering education was anything like mine then I&#8217;m sure that you learned all about different types of linear filters whose essential objective was  to pass signals within a certain frequency band and to reject as [...]]]></description>
			<content:encoded><![CDATA[<p>NOTE: I have heavily edited this blog post since it was originally published, based on <a href="http://embeddedgurus.com/stack-overflow/2010/11/median-filter-performance-results/">some recent testing</a></p>
<p>If your engineering education was anything like mine then I&#8217;m sure that you learned all about different types of linear filters whose essential objective was  to pass signals within a certain frequency band and to reject as far as possible all others. These filters are of course indispensable for many types of &#8216;noise&#8217;. However in the real world of embedded systems it doesn&#8217;t take one too long to realize that these classical linear filters are useless against  burst noise. This kind of noise typically arises from a quasi-random event. For example a 2-way radio may be keyed next to your product or an ESD event may occur close to your signal. Whenever this happens your input signal may transiently go to a ridiculous value. For example I have often seen A2D readings that look something like this: 385, 389, 388, 388, 912, 388, 387. The 912 value is presumably anomalous and as such should be rejected. If you try and use a classical linear filter then you will almost certainly find that the 912 reading actually ends up having a significant impact on the output. The &#8216;obvious&#8217; answer in this case is to use a median filter. Despite the supposed obviousness of this, it&#8217;s my experience that median filters are used remarkably infrequently in embedded systems. I don&#8217;t know why this is, but my guess is that it is a combination of a lack of knowledge of their existence, coupled with difficulty of implementation. Hopefully this post will go some way to rectifying both issues.</p>
<p>As its name suggests, a median filter is one which takes the middle of a group of readings. It&#8217;s normal for the group to have an odd number of members such that there is no ambiguity about the middle value.  Thus the general idea is that one buffers a certain number of readings and takes the middle reading.</p>
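<p>To make the difference concrete, here is a small sketch that runs the seven readings quoted above through a simple mean (standing in for a linear filter) and through a median. The names WINDOW_SIZE, window_mean and window_median are assumptions of mine, and the median is found by naively sorting a copy, which is fine for illustration but is not the approach recommended later in this post.</p>

```c
#include <stdint.h>

#define WINDOW_SIZE 7U  /* Hypothetical window length for this example */

/* Integer mean of the window: the linear-filter style answer. */
uint16_t window_mean(const uint16_t *samples)
{
    uint32_t sum = 0U;
    uint8_t  i;

    for (i = 0U; i < WINDOW_SIZE; ++i)
    {
        sum += samples[i];
    }
    return (uint16_t)(sum / WINDOW_SIZE);
}

/* Median of the window: build a sorted copy, then take the middle element. */
uint16_t window_median(const uint16_t *samples)
{
    uint16_t sorted[WINDOW_SIZE];
    uint8_t  i;
    uint8_t  j;

    for (i = 0U; i < WINDOW_SIZE; ++i)
    {
        uint16_t key = samples[i];

        /* Insert key into the already sorted part, sorted[0] to sorted[i - 1]. */
        for (j = i; (j > 0U) && (sorted[j - 1U] > key); --j)
        {
            sorted[j] = sorted[j - 1U];
        }
        sorted[j] = key;
    }
    return sorted[(WINDOW_SIZE - 1U) / 2U];
}
```

<p>For the readings 385, 389, 388, 388, 912, 388, 387 the mean comes out at 462, which is well away from anything the signal actually did, whereas the median is 388.</p>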
<p>Until recently I recognized three classes of median filter, based purely on the size of the filter. They were:</p>
<ul>
<li>Filter size of 3 (i.e. the smallest possible).</li>
<li>Filter size of 5, 7 or 9 (the most common).</li>
<li>Filter size of 11 or more.</li>
</ul>
<p>However, I now espouse a simple dichotomy:</p>
<ul>
<li>Filter size of 3</li>
<li>Filter size &gt; 3</li>
</ul>
<h2>Filter size of 3</h2>
<p>The filter size of three is of course the smallest possible filter. It&#8217;s possible to find the middle value simply via a few if statements. The code below is based on an algorithm described <a href="http://www.cs.mtu.edu/~shene/COURSES/cs201/NOTES/chap03/sort.html">here</a>. Clearly this is small and fast code.</p>
<pre>uint16_t middle_of_3(uint16_t a, uint16_t b, uint16_t c)
{
 uint16_t middle;

 if ((a &lt;= b) &amp;&amp; (a &lt;= c))
 {
   middle = (b &lt;= c) ? b : c;
 }
 else if ((b &lt;= a) &amp;&amp; (b &lt;= c))
 {
   middle = (a &lt;= c) ? a : c;
 }
 else
 {
   middle = (a &lt;= b) ? a : b;
 }
 return middle;
}</pre>
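<p>The function above only picks the middle of three values; you still need to feed it the last three readings. One way of doing this is sketched below. The wrapper median3_filter and its two static history variables are hypothetical names of mine, and the history is assumed to start at zero, so the first result or two will be pulled towards zero until real readings have filled it.</p>

```c
#include <stdint.h>

/* Same logic as middle_of_3() above, repeated so that the sketch compiles alone. */
static uint16_t middle_of_3(uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t middle;

    if ((a <= b) && (a <= c))
    {
        middle = (b <= c) ? b : c;
    }
    else if ((b <= a) && (b <= c))
    {
        middle = (a <= c) ? a : c;
    }
    else
    {
        middle = (a <= b) ? a : b;
    }
    return middle;
}

/* Hypothetical wrapper: call once per new reading. It remembers the two
 * previous readings and returns the median of the latest three. */
uint16_t median3_filter(uint16_t datum)
{
    static uint16_t older = 0U;   /* Reading from two calls ago */
    static uint16_t newer = 0U;   /* Reading from the previous call */
    uint16_t result;

    result = middle_of_3(older, newer, datum);
    older = newer;
    newer = datum;
    return result;
}
```

<p>Feeding this the sequence 388, 388, 912, 388, 387 should give 0, 388, 388, 388, 388. In other words, once the history has filled, the 912 spike never reaches the output.</p>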
<h2>Filter size &gt; 3</h2>
<p>For filter sizes greater than 3 I suggest you turn to an algorithm described by Phil Ekstrom in the November 2000 edition of Embedded Systems Programming magazine. With the recent hatchet job on embedded.com I can&#8217;t find the original article. However there is a copy <a href="http://www.eetindia.co.in/STATIC/PDF/200011/EEIOL_2000NOV03_EMS_EDA_TA.pdf?SOURCES=DOWNLOAD">here</a>. Ekstrom&#8217;s approach is to use a linked list. The approach works essentially by observing that once an array is sorted, the act of removing the oldest value and inserting the newest value doesn&#8217;t result in the array being significantly unsorted. As a result his approach works well &#8211; particularly for large filter sizes.</p>
<p>Be warned that there are some bugs in the originally published code (which Ekstrom corrected). However given the difficulty of finding anything on embedded.com nowadays I have opted to publish my implementation of his code. Be warned that the code below was originally written in Dynamic C and has been ported to standard C for this blog posting. It is believed to work. However it would behoove you to check it thoroughly before use!</p>
<pre>#include &lt;stddef.h&gt;                                    /* NULL */
#include &lt;stdint.h&gt;                                    /* uint16_t */

#define STOPPER 0                                      /* Smaller than any datum */
#define    MEDIAN_FILTER_SIZE    (13)

uint16_t median_filter(uint16_t datum)
{
 struct pair
 {
   struct pair   *point;                              /* Pointers forming list linked in sorted order */
   uint16_t  value;                                   /* Values to sort */
 };
 static struct pair buffer[MEDIAN_FILTER_SIZE] = {0}; /* Buffer of MEDIAN_FILTER_SIZE pairs */
 static struct pair *datpoint = buffer;               /* Pointer into circular buffer of data */
 static struct pair small = {NULL, STOPPER};          /* Chain stopper */
 static struct pair big = {&amp;small, 0};                /* Pointer to head (largest) of linked list.*/

 struct pair *successor;                              /* Pointer to successor of replaced data item */
 struct pair *scan;                                   /* Pointer used to scan down the sorted list */
 struct pair *scanold;                                /* Previous value of scan */
 struct pair *median;                                 /* Pointer to median */
 uint16_t i;

 if (datum == STOPPER)
 {
   datum = STOPPER + 1;                             /* No stoppers allowed. */
 }

 if ( (++datpoint - buffer) &gt;= MEDIAN_FILTER_SIZE)
 {
   datpoint = buffer;                               /* Increment and wrap data in pointer.*/
 }

 datpoint-&gt;value = datum;                           /* Copy in new datum */
 successor = datpoint-&gt;point;                       /* Save pointer to old value's successor */
 median = &amp;big;                                     /* Median initially to first in chain */
 scanold = NULL;                                    /* Scanold initially null. */
 scan = &amp;big;                                       /* Points to pointer to first (largest) datum in chain */

 /* Handle chain-out of first item in chain as special case */
 if (scan-&gt;point == datpoint)
 {
   scan-&gt;point = successor;
 }
 scanold = scan;                                     /* Save this pointer and   */
 scan = scan-&gt;point ;                                /* step down chain */

 /* Loop through the chain, normal loop exit via break. */
 for (i = 0 ; i &lt; MEDIAN_FILTER_SIZE; ++i)
 {
   /* Handle odd-numbered item in chain  */
   if (scan-&gt;point == datpoint)
   {
     scan-&gt;point = successor;                      /* Chain out the old datum.*/
   }

   if (scan-&gt;value &lt; datum)                        /* If datum is larger than scanned value,*/
   {
     datpoint-&gt;point = scanold-&gt;point;             /* Chain it in here.  */
     scanold-&gt;point = datpoint;                    /* Mark it chained in. */
     datum = STOPPER;
   }

   /* Step median pointer down chain after doing odd-numbered element */
   median = median-&gt;point;                       /* Step median pointer.  */
   if (scan == &amp;small)
   {
     break;                                      /* Break at end of chain  */
   }
   scanold = scan;                               /* Save this pointer and   */
   scan = scan-&gt;point;                           /* step down chain */

   /* Handle even-numbered item in chain.  */
   if (scan-&gt;point == datpoint)
   {
     scan-&gt;point = successor;
   }

   if (scan-&gt;value &lt; datum)
   {
     datpoint-&gt;point = scanold-&gt;point;
     scanold-&gt;point = datpoint;
     datum = STOPPER;
   }

   if (scan == &amp;small)
   {
     break;
   }

   scanold = scan;
   scan = scan-&gt;point;
 }
 return median-&gt;value;
}
</pre>
<p>To use this code, simply call the function every time you have a new input value. It will return the median of the last MEDIAN_FILTER_SIZE readings. This approach can consume a fair amount of RAM as one has to store both the values and the pointers. However, if this isn&#8217;t a problem for you then it really is a nice algorithm that deserves to be in your toolbox as it is dramatically <a href="http://embeddedgurus.com/stack-overflow/2010/11/median-filter-performance-results/">faster</a> than algorithms based upon sorting.</p>
<h2>Median filtering based on sorting</h2>
<p>In the original version of this article I espoused using a sorting based approach to median filtering when the filter size was 5, 7 or 9. I no longer subscribe to this belief. However for those of you that want to do it, here&#8217;s the basic outline:</p>
<pre> if (ADC_Buffer_Full)
 {
   uint_fast16_t adc_copy[MEDIAN_FILTER_SIZE];
   uint_fast16_t filtered_cnts;

   /* Copy the data */
   memcpy(adc_copy, ADC_Counts, sizeof(adc_copy));
   /* Sort it */
   shell_sort(adc_copy, MEDIAN_FILTER_SIZE);
   /* Take the middle value */
   filtered_cnts = adc_copy[(MEDIAN_FILTER_SIZE - 1U) / 2U];
   /* Convert to engineering units */
   ...
 }</pre>
<h2>Final Thoughts</h2>
<p>Like most things in embedded systems, median filters have certain costs associated with them. Clearly median filters introduce a delay to a step change in value which can be problematic at times. In addition median filters can completely clobber frequency information in the signal. Of course if you are only interested in DC values then this is not a problem. With these caveats I strongly recommend that you consider incorporating median filters in your next embedded design.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2010/10/median-filtering/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Sorting (in) embedded systems</title>
		<link>http://embeddedgurus.com/stack-overflow/2009/03/sorting-in-embedded-systems/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2009/03/sorting-in-embedded-systems/#comments</comments>
		<pubDate>Sun, 15 Mar 2009 19:30:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Sorting]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2009/03/15/sorting-in-embedded-systems/</guid>
		<description><![CDATA[Although countless PhD&#8217;s have been awarded on sorting algorithms, it&#8217;s not a topic that seems to come up much in embedded systems (or at least the kind of embedded systems that I work on). Thus it was with some surprise recently that I found myself needing to sort an array of integers. The array wasn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Although countless PhD&#8217;s have been awarded on sorting algorithms, it&#8217;s not a topic that seems to come up much in embedded systems (or at least the kind of embedded systems that I work on). Thus it was with some surprise recently that I found myself needing to sort an array of integers. The array wasn&#8217;t very large (about twenty entries) and I was eager to move on to the real problem at hand and so I just dropped in a call to the standard C routine qsort(). I didn&#8217;t give it a great deal of thought because I &#8216;knew&#8217; that a &#8216;Quick Sort&#8217; algorithm is in general fast and well behaved and that with sorting so few entries I wasn&#8217;t too concerned about it being &#8216;optimal&#8217;. Anyway, with the main task at hand solved, on a whim I decided to take another look at qsort(), just to make sure that I wasn&#8217;t being too cavalier in my approach. Boy did I get a shock! My call to qsort() was increasing my code size by 1500 bytes and it wasn&#8217;t giving very good sort times either. For those of you programming big systems, this may seem acceptable. In my case, the target processor had 16K of memory and so 1500 bytes was a huge hit.</p>
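<p>For reference, a call to qsort() for an integer array looks something like the sketch below. The element type (int16_t) and the names compare_int16 and sort_with_qsort are assumptions made for illustration; they are not taken from the project in question. Part of the code size cost comes from the fact that qsort() has to work through a void pointer and a comparison callback, since it knows nothing about the data being sorted.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback required by qsort(): returns a negative value,
 * zero or a positive value for less than, equal and greater than. */
static int compare_int16(const void *lhs, const void *rhs)
{
    int16_t a = *(const int16_t *)lhs;
    int16_t b = *(const int16_t *)rhs;

    return (a > b) - (a < b);
}

/* Hypothetical wrapper: sorts count elements of data into ascending order. */
void sort_with_qsort(int16_t *data, size_t count)
{
    qsort(data, count, sizeof(data[0]), compare_int16);
}
```

<p>The comparison is written as (a &gt; b) - (a &lt; b) rather than a - b so that it can never overflow.</p>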
<p>Surely there had to be a better solution? Well there&#8217;s always a better solution, but in my case in particular, and for embedded systems in general, what is the optimal sorting algorithm?</p>
<p>Well, after thinking about it for a while, I think the optimal sorting algorithm for embedded systems has these characteristics:</p>
<ol>
<li>It must sort in place.</li>
<li>The algorithm must not be recursive.</li>
<li>Its best, average and worst case running times should be of similar magnitude.</li>
<li>Its code size should be commensurate with the problem.</li>
<li>Its running time should grow gently with the number of elements to be sorted (ideally no worse than N Log N).</li>
<li>Its implementation must be &#8216;clean&#8217; &#8211; i.e. free of breaks and returns in the middle of a loop.</li>
</ol>
<h5>Sort In Place</h5>
<p>This is an important criterion not just because it saves memory, but most importantly because it obviates the need for dynamic memory allocation. In general dynamic memory allocation should be avoided in embedded systems because of problems with heap fragmentation and allocation performance. If you aren&#8217;t aware of this issue, then read <a href="http://www.embedded.com/design/207402546?pgno=2">this</a> series of articles by Dan Saks on the issue.</p>
<h5>Recursion</h5>
<p>Recursion is beautiful and solves certain problems amazingly elegantly. However, it&#8217;s not fast and it can easily lead to problems of stack overflow. As a result, it should never be used in embedded systems.</p>
<h5>Running Time Variability</h5>
<p>Even the softest of real time systems have some time constraints that need to be met. As a result a function whose execution time varies enormously with the input data can often be problematic. Thus I prefer code whose execution time is nicely bounded.</p>
<h5>Code Size</h5>
<p>This is often a concern. Suffice to say that the code size should be reasonable for the target system.</p>
<h5>Data Size Dependence</h5>
<p>Sorting algorithms are usually classified using &#8216;Big O notation&#8217; to denote how sensitive they are to the amount of data to be sorted. If N is the number of elements to be sorted, then an algorithm whose running time is N Log N is usually preferred to one whose running time is N<sup>2</sup>. However, as you shall see, for small N the advantage of the more sophisticated algorithms can be lost to the overhead of the sophistication.</p>
<h5>Clean Implementation</h5>
<p>I&#8217;m a great proponent of &#8216;clean&#8217; code. Thus code where one exits from the middle of a loop isn&#8217;t as acceptable as code where everything proceeds in an orderly fashion. Although this is a personal preference of mine, it is also codified in, for example, the MISRA C requirements, to which many embedded systems are built.</p>
<p>Anyway, to determine the optimal sorting algorithm, I went to the Wikipedia <a href="http://en.wikipedia.org/wiki/Sorting_algorithm">page</a> on sorting algorithms and initially selected the following for comparison to the built in qsort: Comb, Gnome, Selection, Insertion, Shell &amp; Heap sorts. All of these are sort in place algorithms. I originally eschewed the Bubble &amp; Cocktail sorts as they really have nothing to commend them. However, several people posted comments asking that I include them &#8211; so I did. As predicted they have nothing to commend them. In all cases, I used the Wikipedia code pretty much as is, optimized for maximum speed. (I recognize that the implementations in Wikipedia may not be optimal &#8211; but they are the best I have).</p>
<p>For each algorithm, I sorted arrays of 8, 32 &amp; 128 signed integers. In every case I sorted the same random array, together with a sorted array and an inverse sorted array.  First the code sizes in bytes:</p>
<pre>qsort()      1538
Gnome()        76
Selection()   130
Insertion()   104
Shell()       242
Comb()        190
Heap()        200
Bubble()      104
Cocktail()    140</pre>
<p>Clearly, anything is a lot better than the built in qsort(). However, this is not an apples-to-apples comparison, because qsort() is a general purpose routine, whereas the others are designed explicitly to sort integers. Leaving aside qsort(), the Gnome, Insertion and Bubble sorts are clearly the code size leaders. Having said that, in most embedded systems, 100 bytes here or there is irrelevant and so we are free to choose based upon other criteria.</p>
<h4>Execution times for the 8 element array</h4>
<pre>Name        Random  Sorted  Inverse Sorted
qsort()     3004     832    2765
Gnome()     1191     220    2047
Selection() 1120    1120    1120
Insertion()  544     287     756
Shell()     1233    1029    1425
Comb()      2460    1975    2480
Heap()      1265    1324    1153
Bubble()     875     208    1032
Cocktail()  1682     927    2056</pre>
<p>In this case, the Insertion sort is the clear winner. Not only is it dramatically faster in almost all cases, it also has reasonable variability and it has almost the smallest code size. Notice that the bubble sort for all its vaunted simplicity consumes as much code and runs considerably slower. Notice that the Selection sort&#8217;s running time is completely consistent &#8211; and not too bad when compared to other methods.</p>
<h4>Execution times for the 32 element array</h4>
<pre>Name        Random  Sorted  Inverse Sorted
qsort()     23004    3088   19853
Gnome()     17389     892   35395
Selection() 14392   14392   14392
Insertion()  5588    1179   10324
Shell()      6589    4675    6115
Comb()      10217    8638   10047
Heap()       8449    8607    7413
Bubble()    13664     784   16368
Cocktail()  17657    3807   27634</pre>
<p>In this case, the winner isn&#8217;t so clear cut. Although the insertion sort still performed well, it&#8217;s showing a very large variation in running time now. By contrast the shell sort has got decent times with small variability. The Gnome, Bubble and Cocktail sorts are showing huge variability in execution times (with a very bad worst case), while the Selection sort shows consistent execution time. On balance, I&#8217;d go with the shell sort in most cases.</p>
<h4>Execution times for the 128 element array</h4>
<pre>Name         Random  Sorted  Inverse Sorted
qsort()      120772   28411   77896
Gnome()      316550    3580  577747
Selection()  217420  217420  217420
Insertion()   88475    4731  158020
Shell()       41661   25611   34707
Comb()        50858   43523   48568
Heap()        46959   49215   43314
Bubble()     231294    3088  262032
Cocktail()   271821   15327  422266</pre>
<p>In this case the winner is either the shell sort or the heap sort, depending on whether you care more about raw performance or about variability in performance: the shell sort is faster, while the heap sort is more consistent. The Gnome, Bubble and Cocktail sorts are hopelessly outclassed.  So what to make of all this? Well in any comparison like this there are a myriad of variables that one should take into account, and so I don&#8217;t believe these data should be treated as gospel. What is clear to me is that:</p>
<ol>
<li>Being a general purpose routine, qsort() is unlikely to be the optimal solution for an embedded system.</li>
<li>For many embedded applications, a shell sort has a lot to commend it &#8211; decent code size, fast running time, well behaved and a clean implementation. Thus if you don&#8217;t want to bother with this sort of investigation every time you need to sort an array, then a shell sort should be your starting point. It will be for me henceforth.</li>
</ol>
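<p>For reference, here is a minimal sketch of a shell sort for signed integers. It is not the code that was measured above: the function name shell_sort and the simple gap-halving sequence are assumptions made for illustration, and other gap sequences may well perform better.</p>

```c
#include <stddef.h>

/* Hypothetical helper, not the routine measured above: sorts an array
   of signed integers in place using a shell sort. */
void shell_sort(int *data, size_t count)
{
    size_t gap;

    /* Assumed gap sequence: count/2, count/4, ..., 1. */
    for (gap = count / 2; gap > 0; gap /= 2)
    {
        size_t i;

        /* Gapped insertion sort. Every loop runs to its natural end;
           there is no exit from the middle of a loop. */
        for (i = gap; i < count; ++i)
        {
            int value = data[i];
            size_t j = i;

            while ((j >= gap) && (data[j - gap] > value))
            {
                data[j] = data[j - gap];
                j -= gap;
            }
            data[j] = value;
        }
    }
}
```

<p>A call such as shell_sort(buffer, 32) sorts the first 32 entries of buffer in place, so no additional RAM is needed beyond a few local variables.</p>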
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2009/03/sorting-in-embedded-systems/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Horner&#8217;s rule addendum</title>
		<link>http://embeddedgurus.com/stack-overflow/2009/02/horners-rule-addendum/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2009/02/horners-rule-addendum/#comments</comments>
		<pubDate>Sun, 15 Feb 2009 12:28:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2009/02/15/horners-rule-addendum/</guid>
		<description><![CDATA[A few weeks ago I wrote about using Horner&#8217;s rule to evaluate polynomials. Well today I&#8217;m following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one&#8217;s mistakes than one&#8217;s successes, I thought I&#8217;d share it with you. First, some background. I [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I <a href="http://embeddedgurus.com/stack-overflow/2009/01/horners-rule-and-related-thoughts/">wrote</a> about using Horner&#8217;s rule to evaluate polynomials. Well today I&#8217;m following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one&#8217;s mistakes than one&#8217;s successes, I thought I&#8217;d share it with you. First, some background. I had some experimental data on the behavior of a sensor against temperature. I needed to be able to fit a regression curve through the data, and so after some experimentation I settled on a quadratic polynomial fit. This is what the data and the curve looked like:</p>
<p><a href="http://embeddedgurus.com/stack-overflow/files/2009/02/Curve12.jpg"><img class="aligncenter size-full wp-image-595" src="http://embeddedgurus.com/stack-overflow/files/2009/02/Curve12.jpg" alt="" width="891" height="425" /></a></p>
<p>On the face of it, everything looks OK. However, if you look carefully, you will notice two things:</p>
<ul>
<li>The bulk of the experimental data cover the temperature range of 5 &#8211; 48 degrees.</li>
<li>There is a very slight hook on the right hand side of the graph</li>
</ul>
<p>So where&#8217;s the mistake? Well actually I made two mistakes:</p>
<ul>
<li>I assumed that my experimental data covered the entire expected operating temperature range.</li>
<li>I failed to check at run time that the temperature was indeed bounded to the experimental input range.</li>
</ul>
<p>Why is this important? Well, what happened, was that in some circumstances the sensor would experience temperatures somewhat higher than I expected when the experimental data was gathered, e.g. 55 degrees. Well that doesn&#8217;t sound too bad &#8211; until you take the polynomial and extend it out a bit. This is what it looks like:</p>
<p><a href="http://embeddedgurus.com/stack-overflow/files/2009/02/Curve2.jpg"><img class="aligncenter size-full wp-image-596" src="http://embeddedgurus.com/stack-overflow/files/2009/02/Curve2.jpg" alt="" width="905" height="485" /></a></p>
<p>You can see that at 55 degrees, the polynomial generates a value which is about the same as at 25 degrees. Needless to say, things didn&#8217;t work too well! So what advice can I offer?</p>
<ul>
<li>Ensure that when fitting a polynomial to experimental data, that the experimental data covers all the possible range of values that can be physically realized.</li>
<li>Always plot the polynomial to see how it performs outside your range of interest. In particular, if it &#8216;takes off&#8217; in a strange manner, then treat it very warily.</li>
<li>At run time, ensure that the data that you are feeding into the polynomial is bounded to the range over which the polynomial is known to be valid.</li>
</ul>
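<p>The last point can be sketched as follows. This is only an illustration: the limits are taken from the 5 to 48 degree range mentioned above, while the coefficient values and the function name evaluate_fit are hypothetical placeholders, not the ones from my application. Clamping is also just one possible policy; flagging an out of range reading as an error may be more appropriate in some systems.</p>

```c
/* Range over which the fit is assumed to be valid (from the text above). */
#define TEMP_MIN   5.0f
#define TEMP_MAX  48.0f

/* Placeholder quadratic coefficients, for illustration only. */
#define COEFF2   (-0.02f)
#define COEFF1     1.5f
#define COEFF0    10.0f

/* Hypothetical helper: evaluates the quadratic fit with Horner's rule,
   after first bounding the input to the range covered by the data. */
float evaluate_fit(float temperature)
{
    float y;

    if (temperature < TEMP_MIN)
    {
        temperature = TEMP_MIN;
    }
    else if (temperature > TEMP_MAX)
    {
        temperature = TEMP_MAX;
    }

    y = temperature * COEFF2;
    y += COEFF1;
    y *= temperature;
    y += COEFF0;

    return y;
}
```

<p>With this in place, a reading of 55 degrees produces the same result as a reading of 48 degrees, rather than whatever the polynomial happens to do outside the fitted range.</p>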
<p>The maddening thing about this for me was that I &#8216;learned&#8217; this lesson about polynomial fits many years ago. I just chose to ignore it this time. Before I leave this topic, I&#8217;d like to offer one other insight. If you search for Horner&#8217;s rule, you&#8217;ll find a plethora of articles. The more detailed ones will opine on topics such as evaluation stability, numeric overflow issues and so on. However, it&#8217;s rare that you&#8217;ll find this sort of information on polynomial evaluation posted. I think it&#8217;s because we tend to get wrapped up in the details of the algorithm while losing sight of the underlying mathematics of what is going on. The bottom line: the next time you find a neat algorithm posted on the web for &#8216;solving&#8217; your problem, take a big step back and think hard about what is really going on and what the inherent weaknesses are in what you are doing. <a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2009/02/horners-rule-addendum/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Horner&#039;s rule and related thoughts</title>
		<link>http://embeddedgurus.com/stack-overflow/2009/01/horners-rule-and-related-thoughts/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2009/01/horners-rule-and-related-thoughts/#comments</comments>
		<pubDate>Mon, 05 Jan 2009 22:09:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Efficient C/C++]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2009/01/05/horners-rule-and-related-thoughts/</guid>
		<description><![CDATA[Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors&#8217; performance. Using the regression tools built into Excel, I was looking at the various models, from a [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors&#8217; performance. Using the regression tools built into Excel, I was looking at the various models, from a &#8216;goodness of fit&#8217; perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.</p>
<p>Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner&#8217;s rule. I say relatively well known, because I&#8217;d say about half the time I see a polynomial evaluated, it doesn&#8217;t use Horner&#8217;s rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner&#8217;s rule, I thought I&#8217;d mention it here.</p>
<p>OK, so what is it? Well it&#8217;s based on simply refactoring a polynomial expression:</p>
<p>a<sub>n</sub>x<sup>n</sup> + a<sub>(n-1)</sub>x<sup>(n-1) </sup>+ &#8230; + a<sub>0</sub>=((a<sub>n</sub>x + a<sub>(n-1)</sub>)x +&#8230;)x + a<sub>0</sub>.</p>
<p>Thus a polynomial of order n requires exactly n multiplications and n additions.</p>
<p>For example:</p>
<p>23.1x<sup>2</sup> &#8211; 45.6x + 12.3 = (23.1x -45.6)x + 12.3</p>
<p>In this case, a quadratic polynomial of order 2, Horner&#8217;s rule requires 2 multiplications and 2 additions to evaluate the polynomial, versus the direct approach, which requires 3 multiplications (one to form x<sup>2</sup> and one for each of the two coefficients) and 2 additions.</p>
<p>For those of you that are looking for code to just use, then this snippet will work. This is for a cubic polynomial. COEFFN is the coefficient of x<sup>N</sup>.</p>
<pre>y = x * COEFF3;
y += COEFF2;
y *= x;
y += COEFF1;
y *= x;
y += COEFF0;
</pre>
<p>The recurrence relationship for higher order polynomials should be obvious. Note that unlike most implementations, I perform the code in line, rather than using a loop.</p>
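<p>For comparison, a loop based version might look something like the sketch below. The function name horner and the coefficient layout (coeff[i] holds the coefficient of x to the power i) are assumptions made for this example.</p>

```c
#include <stddef.h>

/* Hypothetical helper: evaluates a polynomial of the given order at x
   using Horner's rule. coeff[i] is the coefficient of x^i, so coeff
   must hold order + 1 entries. */
double horner(const double *coeff, size_t order, double x)
{
    double y = coeff[order];
    size_t i;

    /* One multiplication and one addition per remaining coefficient. */
    for (i = order; i > 0; --i)
    {
        y = y * x + coeff[i - 1];
    }

    return y;
}
```

<p>The loop is more general, but for a fixed, low order polynomial the in line form avoids the loop overhead and the array accesses.</p>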
<p>It should be noted that as well as being more computationally efficient, Horner&#8217;s rule is also more accurate. This comes about in two ways:</p>
<ul>
<li>The very act of using fewer floating point operations leads to fewer rounding errors</li>
<li>Higher order polynomials generate very large numbers in a hurry. Horner&#8217;s method significantly reduces the magnitude of the intermediate values, thus minimizing problems associated with adding / subtracting floating point numbers that differ in magnitude</li>
</ul>
<p>Although Horner&#8217;s rule is a nice tool to have at one&#8217;s disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a superior method than the obvious direct method of evaluation. Sometimes it requires algebraic manipulation such as for Horner&#8217;s rule. Other times, it&#8217;s an approximation method, and other times it&#8217;s just a flat out really neat algorithm (see for example my <a href="http://embeddedgurus.com/stack-overflow/2007/04/crest-factor-square-roots-neat-algorithms/">posting </a>on Crenshaw&#8217;s square root code). The bottom line. Next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You&#8217;ll probably be glad you did.</p>
<h4>Update</h4>
<p>There is a highly relevant addendum to this posting <a href="http://embeddedgurus.com/stack-overflow/2009/02/horners-rule-addendum/">here</a>.</p>
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2009/01/horners-rule-and-related-thoughts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Modulo Means (reprised)</title>
		<link>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means-reprised/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means-reprised/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 00:03:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2008/11/30/modulo-means-reprised/</guid>
		<description><![CDATA[In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian co-ordinates and averaging the resulting (X, Y) data, and then converting the means of X &#38; Y back into a phase angle. Well kudos to [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian co-ordinates and averaging the resulting (X, Y) data, and then converting the means of X &amp; Y back into a phase angle. Well kudos to Bruno because this is exactly what I ended up doing. However, as Bruno observed, it&#8217;s not exactly an efficient process. It is however robust, and in my application, the robustness counts for a lot.</p>
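<p>To give a flavor of the Cartesian approach, here is a rough sketch. It is not the code from my application. It assumes phase readings in the range 0 to 0xF, and, instead of calling an arctangent routine to convert the summed (X, Y) vector back into a phase, it simply picks the phase step whose unit vector lines up best with that vector. The names phase_mean, cos_table and sin_lookup are made up for this example.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define PHASE_STEPS 16

/* 127 * cos(2 * pi * k / 16), rounded to the nearest integer. */
static const int8_t cos_table[PHASE_STEPS] =
{
     127,  117,  90,  49, 0, -49, -90, -117,
    -127, -117, -90, -49, 0,  49,  90,  117
};

/* sin(k) equals cos(k - 4 steps), i.e. a quarter turn back in the table. */
static int8_t sin_lookup(uint8_t k)
{
    return cos_table[(k + PHASE_STEPS - 4) % PHASE_STEPS];
}

/* Hypothetical helper: returns the phase step (0..15) closest to the
   mean direction of the readings. The sums are not divided by count
   because only the direction of the summed vector matters. 32 bit
   accumulators are assumed to be ample for the buffer sizes in use. */
uint8_t phase_mean(const uint8_t *readings, size_t count)
{
    int32_t sum_x = 0;
    int32_t sum_y = 0;
    int32_t best_dot = INT32_MIN;
    uint8_t best = 0;
    uint8_t k;
    size_t i;

    /* Convert each reading to Cartesian form and accumulate. */
    for (i = 0; i < count; ++i)
    {
        uint8_t r = (uint8_t)(readings[i] % PHASE_STEPS);

        sum_x += cos_table[r];
        sum_y += sin_lookup(r);
    }

    /* Convert back: choose the step with the largest dot product. */
    for (k = 0; k < PHASE_STEPS; ++k)
    {
        int32_t dot = sum_x * cos_table[k] + sum_y * sin_lookup(k);

        if (dot > best_dot)
        {
            best_dot = dot;
            best = k;
        }
    }

    return best;
}
```

<p>For readings that straddle the zero line, such as 0xF, 0, 1, this returns 0 rather than a value near the middle of the range. Note that if the readings cancel out exactly (for example 0 and 8), the mean direction is undefined and this sketch simply returns 0.</p>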
<p>The suggestion that I average the inputs to the phase comparator has its merits. However for reasons that would take too long to explain, I&#8217;m not really able to do this in my application.</p>
<p>Finally, I&#8217;d like to mention the second solution that Kyle had proposed. First a caveat. I haven&#8217;t fully thought through this solution, and I most certainly have not implemented and tested it. With that in mind, here&#8217;s another approach to contemplate.</p>
<p>You&#8217;ll remember that we can compute the average of the phase angle by using the simple arithmetic mean, provided that we do not cross back and fore across the zero phase line. Well Kyle&#8217;s insight was that as well as computing the arithmetic mean of the phase angle, we also do the same for the quadrature angle. The idea is that while it is possible that the phase could alternate across the zero degree line, it would not simultaneously alternate across the 90 degree line (or indeed the 180 degree line).  Thus, the method then becomes one of computing two means and choosing the correct one. If I get the time I&#8217;ll develop this into a fully fledged algorithm and publish it for you all to, ahem, enjoy. I&#8217;m fairly sure that this method is not as robust as the Cartesian method. However, it is dramatically more efficient and thus is deserving of greater investigation. Bruno &#8211; perhaps you&#8217;d care to do the analysis in your CFT (Copious Free Time)?</p>
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means-reprised/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Modulo means</title>
		<link>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 19:56:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2008/11/21/modulo-means/</guid>
		<description><![CDATA[Normally on this blog I&#8217;m either giving my opinions on embedded matters, or offering tips on how to do things better. Well today I&#8217;m turning the tables, as I&#8217;d like your help. Yesterday I ran into a rather perplexing problem, which I&#8217;d be interested to see if any of my readers can solve. In a [...]]]></description>
			<content:encoded><![CDATA[<p>Normally on this blog I&#8217;m either giving my opinions on embedded matters, or offering tips on how to do things better. Well today I&#8217;m turning the tables, as I&#8217;d like your help. Yesterday I ran into a rather perplexing problem, which I&#8217;d be interested to see if any of my readers can solve.</p>
<p>In a product I am working on, there is a phase comparator generating difference readings in the range 0 &#8211; 0xF. The phase comparator is somewhat noisy and so I want to obtain a moving average of the phase differences. Now typically to perform a moving average filter, one sums the elements in a buffer and divides by the number of elements to obtain the arithmetic mean. Indeed we can do this here, provided that we don&#8217;t flip back and fore across the zero line. If we do cross the zero line then the method breaks down. For example, if successive phase differences are 0, F, 0, F, 0, F &#8230;. 0, F, then the simple arithmetic mean of these numbers will be 7.5 (i.e. 7 or 8 in integer arithmetic) instead of some value between F and 0.</p>
<p>You may think that the answer is to switch to signed arithmetic and operate over the range -8 &#8230; +7. However, a little thought will show that you have now merely shifted the problem as to what happens when the system is close to -8 such that the values alternate between -8, 7, -8, 7 &#8230; -8, 7.</p>
<p>Thus, can you come up with a robust, efficient solution to compute the mean of an array of modulo numbers?</p>
<p>The problem is solvable as one of the Engineers that I&#8217;m working with hit upon not one, but two possible solutions (nice work Kyle). However, I&#8217;d be interested in other possible approaches.</p>
<p>I&#8217;ll publish Kyle&#8217;s method(s) next week.</p>
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2008/11/modulo-means/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Integer Log functions</title>
		<link>http://embeddedgurus.com/stack-overflow/2008/05/integer-log-functions/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2008/05/integer-log-functions/#comments</comments>
		<pubDate>Sun, 11 May 2008 22:44:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Efficient C/C++]]></category>
		<category><![CDATA[Log]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2008/05/11/integer-log-functions/</guid>
		<description><![CDATA[A few months ago I wrote about a very nifty square root function in Jack Crenshaw&#8217;s book &#8220;Math Toolkit for Real-time Programming&#8221;. As elegant as the square root function is, it pales in comparison to what Crenshaw calls his &#8216;bitlog&#8217; function. This is some code that computes the log (to base 2 of course) of [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago I <a href="http://embeddedgurus.com/stack-overflow/2007/04/crest-factor-square-roots-neat-algorithms/">wrote</a> about a very nifty square root function in Jack Crenshaw&#8217;s book &#8220;Math Toolkit for Real-time Programming&#8221;. As elegant as the square root function is, it pales in comparison to what Crenshaw calls his &#8216;bitlog&#8217; function. This is some code that computes the log (to base 2 of course) of an integer &#8211; and does it in amazingly few cycles and with amazing accuracy. The code in the book is for a 32 bit integer; the code I present here is for a 16 bit integer. Although you are of course free to use this code as is, I strongly suggest you buy Crenshaw&#8217;s book and read about this function. You&#8217;ll see it truly is a work of art. BTW, one of the things I really like about Crenshaw is that he takes great pains to note that he didn&#8217;t invent this algorithm. Rather he credits Tom Lehman. Kudos to Lehman.</p>
<pre>/**
 FUNCTION: bitlog

 DESCRIPTION:
 Computes 8 * (log(base 2)(x) -1).

 PARAMETERS:
 -    The uint16_t value whose log we desire

 RETURNS:
 -    An approximation to 8 * (log(base 2)(x) - 1)

 NOTES:
 -   

**/
uint16_t bitlog(uint16_t x)
{
    uint8_t    b;
    uint16_t res;

    if (x &lt;=  8 ) /* Shorten computation for small numbers */
    {
        res = 2 * x;
    }
    else
    {
        b = 15; /* Find the highest non zero bit in the input argument */
        while ((b &gt; 2) &amp;&amp; ((int16_t)x &gt; 0))
        {
            --b;
            x &lt;&lt;= 1;
        }
        x &amp;= 0x7000;
        x &gt;&gt;= 12;

        res = x + 8 * (b - 1);
    }

    return res;
}</pre>
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2008/05/integer-log-functions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Continued Fractions</title>
		<link>http://embeddedgurus.com/stack-overflow/2007/05/continued-fractions/</link>
		<comments>http://embeddedgurus.com/stack-overflow/2007/05/continued-fractions/#comments</comments>
		<pubDate>Sat, 19 May 2007 23:50:00 +0000</pubDate>
		<dc:creator>Nigel Jones</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2007/05/19/continued-fractions/</guid>
		<description><![CDATA[Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don&#8217;t know what these are, then check out this link. As entertaining as the link is, let me cut [...]]]></description>
			<content:encoded><![CDATA[<p>Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don&#8217;t know what these are, then check out <a href="http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/cfINTRO.html">this </a>link.</p>
<p>As entertaining as the link is, let me cut to the chase as to why you need to know this technique. In a nutshell, in the embedded world we often need to perform fixed point arithmetic for cost / performance reasons. Although this is not a problem in many cases, what happens when you need to multiply something by say 1.2764? The naive way to do this might be:</p>
<pre>uint16_t scale(uint8_t x)
{
 uint16_t y;
 y = (x * 12764) / 10000;
 return y;
}</pre>
<p>As written, this will fail on a target with a 16 bit int (such as most 8 bit processors) because of numeric overflow in the expression (x * 12764). Thus it&#8217;s necessary to throw in a very expensive cast. E.g.</p>
<pre>uint16_t scale(uint8_t x)
{
 uint16_t y;
 y = ((uint32_t)x * 12764) / 10000;
 return y;
}</pre>
<p>Our speedy integer arithmetic isn&#8217;t looking so good now is it?</p>
<p>What we really want to do is to use a fraction (a/b) that is a close approximation to 1.2764 &#8211; but (in this case) has a numerator that doesn&#8217;t exceed 255 (so that we can do the calculation in 16 bit arithmetic).</p>
<p>Enter continued fractions. One of the many uses for this technique is finding fractions (a/b) that are approximations to real numbers. In this case using the calculator <a href="http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/cfCALC.html">here</a>, we get the following results:</p>
<p>Convergents:<br />
1: 1/1 = 1<br />
3: 4/3 = 1.3333333333333333<br />
1: 5/4 = 1.25<br />
1: 9/7 = 1.2857142857142858<br />
1: 14/11 = 1.2727272727272727<br />
1: 23/18 = 1.2777777777777777<br />
1: 37/29 = 1.2758620689655173<br />
1: 60/47 = 1.2765957446808511<br />
1: 97/76 = 1.2763157894736843<br />
1: 157/123 = 1.2764227642276422<br />
2: 411/322 = 1.2763975155279503<br />
1: 1801/1411 = 1.2763997165131113<br />
1: 3191/2500 = 1.2764</p>
<p>We get higher accuracy as we go down the list. In this case, I chose the approximation (157 / 123) because it&#8217;s the highest accuracy fraction that has a numerator less than 255. Thus my code now becomes:</p>
<pre>uint16_t scale(uint8_t x)
{
 uint16_t y;
 y = ((uint16_t)x * 157) / 123;
 return y;
}</pre>
<p>The error is less than 0.002% &#8211; but the calculation speed is dramatically improved because I don&#8217;t need to resort to 32 bit arithmetic.  [On an ATmega88 processor, calling scale() for every value from 0-255 took 148,677 cycles for the naive approach and 53,300 cycles for the continued fraction approach.]</p>
<p>Incidentally, you might be wondering if there are other fractions that give better results than the ones generated by this technique. The mathematicians tell us no.</p>
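<p>If you would rather not rely on an online calculator, the convergents are easy to generate yourself. The following is a minimal sketch intended to be run on a PC at design time, not on the target. It assumes the constant is supplied as a ratio of integers (here 12764 / 10000), and the names fraction_t and best_convergent are invented for this example.</p>

```c
#include <stdint.h>

typedef struct
{
    uint32_t num;
    uint32_t den;
} fraction_t;

/* Hypothetical design-time helper: returns the last continued fraction
   convergent of p / q whose numerator does not exceed max_num.
   Assumes q is non-zero. Returns 0/1 if no convergent qualifies. */
fraction_t best_convergent(uint32_t p, uint32_t q, uint32_t max_num)
{
    uint32_t h_prev = 0;    /* numerator of convergent n-2   */
    uint32_t k_prev = 1;    /* denominator of convergent n-2 */
    uint32_t h = 1;         /* numerator of convergent n-1   */
    uint32_t k = 0;         /* denominator of convergent n-1 */
    fraction_t best = { 0, 1 };

    while (q != 0)
    {
        uint32_t a = p / q;             /* next partial quotient */
        uint32_t r = p % q;
        uint32_t h_next = a * h + h_prev;
        uint32_t k_next = a * k + k_prev;

        if (h_next > max_num)
        {
            break;                      /* numerator is now too large */
        }

        best.num = h_next;
        best.den = k_next;

        h_prev = h;
        k_prev = k;
        h = h_next;
        k = k_next;

        /* Euclid's algorithm step. */
        p = q;
        q = r;
    }

    return best;
}
```

<p>Calling best_convergent(12764, 10000, 255) steps through 1/1, 4/3, 5/4 and so on, and stops at 157/123, because the next convergent (411/322) has a numerator that is too large.</p>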
<p>So there you have it. A nifty technique that once you know about it will make you wonder how you got along without it for all these years.</p>
<p><a href="http://www.embeddedgurus.com/stack-overflow/">Home</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/stack-overflow/2007/05/continued-fractions/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

