Archive for the ‘Algorithms’ Category

Sorting (in) embedded systems

Sunday, March 15th, 2009 Nigel Jones

Although countless PhD’s have been awarded on sorting algorithms, it’s not a topic that seems to come up much in embedded systems (or at least the kind of embedded systems that I work on). Thus it was with some surprise recently that I found myself needing to sort an array of integers. The array wasn’t very large (about twenty entries) and I was eager to move on to the real problem at hand and so I just dropped in a call to the standard C routine qsort(). I didn’t give it a great deal of thought because I ‘knew’ that a ‘Quick Sort’ algorithm is in general fast and well behaved and that with sorting so few entries I wasn’t too concerned about it being ‘optimal’. Anyway, with the main task at hand solved, on a whim I decided to take another look at qsort(), just to make sure that I wasn’t being too cavalier in my approach. Boy did I get a shock! My call to qsort() was increasing my code size by 1500 bytes and it wasn’t giving very good sort times either. For those of you programming big systems, this may seem acceptable. In my case, the target processor had 16K of memory and so 1500 bytes was a huge hit.

Surely there had to be a better solution? Well there’s always a better solution, but in my case in particular, and for embedded systems in general, what is the optimal sorting algorithm?

Well, after thinking about it for a while, I think the optimal sorting algorithm for embedded systems has these characteristics:

  1. It must sort in place.
  2. The algorithm must not be recursive.
  3. Its best, average and worst case running times should be of similar magnitude.
  4. Its code size should be commensurate with the problem.
  5. Its running time should increase linearly or logarithmically with the number of elements to be sorted.
  6. Its implementation must be ‘clean’ – i.e. free of breaks and returns in the middle of a loop.
Sort In Place

This is an important criterion not just because it saves memory, but most importantly because it obviates the need for dynamic memory allocation. In general dynamic memory allocation should be avoided in embedded systems because of problems with heap fragmentation and allocation performance. If you aren’t aware of this issue, then read this series of articles by Dan Saks on the issue.

Recursion

Recursion is beautiful and solves certain problems amazingly elegantly. However, it’s not fast and it can easily lead to problems of stack overflow. As a result, it should never be used in embedded systems.

Running Time Variability

Even the softest of real time systems have some time constraints that need to be met. As a result a function whose execution time varies enormously with the input data can often be problematic. Thus I prefer code whose execution time is nicely bounded.

Code Size

This is often a concern. Suffice to say that the code size should be reasonable for the target system.

Data Size Dependence

Sorting algorithms are usually classified using ‘Big O notation’ to denote how sensitive they are to the amount of data to be sorted. If N is the number of elements to be sorted, then an algorithm whose running time is N Log N is usually preferred to one whose running time is N2. However, as you shall see, for small N the advantage of the more sophisticated algorithms can be lost by the the overhead of the sophistication.

Clean Implementation

I’m a great proponent of ‘clean’ code. Thus code where one exits from the middle of a loop isn’t as acceptable as code where everything proceeds in an orderly fashion. Although this is a personal preference of mine, it is also codified in for example the MISRA C requirements, to which many embedded systems are built. Anyway to determine the optimal sorting algorithm, I went to the Wikipedia page on sorting algorithms and initially selected the following for comparison to the built in qsort: Comb, Gnome, Selection, Insertion, Shell & Heap sorts. All of these are sort in place algorithms. I originally eschewed the Bubble & Cocktail sorts as they really have nothing to commend them. However, several people posted comments asking that I include them – so I did. As predicted they have nothing to commend them. In all cases, I used the Wikipedia code pretty much as is, optimized for maximum speed. (I recognize that the implementations in Wikipedia may not be optimal – but they are the best I have). For each algorithm, I sorted arrays of 8, 32 & 128 signed integers. In every case I sorted the same random array, together with a sorted array and an inverse sorted array. First the code sizes in bytes:

qsort()      1538
Gnome()        76
Selection()   130
Insertion()   104
Shell()       242
Comb()        190
Heap()        200
Bubble()      104
Cocktail()    140

Clearly, anything is a lot better than the built in qsort(). However, we are not comparing apples and oranges, because qsort() is a general purpose routine, whereas the others are designed explicitly to sort integers. Leaving aside qsort(), the Gnome sort Insertion sort and Bubble sorts are clearly the code size leaders. Having said that, in most embedded systems, a 100 bytes here or there is irrelevant and so we are free to choose based upon other criteria.

Execution times for the 8 element array

Name        Random  Sorted  Inverse Sorted
qsort()     3004     832    2765
Gnome()     1191     220    2047
Selection() 1120    1120    1120
Insertion() 544      287    756
Shell()     1233    1029    1425
Comb()      2460    1975    2480
Heap()      1265    1324    1153
Bubble()     875     208    1032
Cocktail()  1682     927    2056

In this case, the Insertion sort is the clear winner. Not only is it dramatically faster in almost all cases, it also has reasonable variability and it has almost the smallest code size. Notice that the bubble sort for all its vaunted simplicity consumes as much code and runs considerably slower. Notice that the Selection sort’s running time is completely consistent – and not too bad when compared to other methods.

Execution times for the 32 element array

Name        Random  Sorted  Inverse Sorted
qsort()     23004    3088   19853
Gnome()     17389     892   35395
Selection() 14392   14392   14392
Insertion()  5588    1179   10324
Shell()      6589    4675    6115
Comb()      10217    8638   10047
Heap()       8449    8607    7413
Bubble()    13664     784   16368
Cocktail()  17657    3807   27634

In this case, the winner isn’t so clear cut. Although the insertion sort still performed well, it’s showing a very large variation in running time now. By contrast the shell sort has got decent times with small variability. The Gnome, Bubble and Cocktail sorts are showing huge variability in execution times (with a very bad worst case), while the Selection sort shows consistent execution time. On balance, I’d go with the shell sort in most cases.

Execution times for the 128 element array

Name         Random  Sorted  Inverse Sorted
qsort()      120772   28411   77896
Gnome()      316550    3580  577747
Selection()  217420  217420  217420
Insertion()   88475    4731  158020
Shell()       41661   25611   34707
Comb()        50858   43523   48568
Heap()        46959   49215   43314
Bubble()     231294    3088  262032
Cocktail()   271821   15327  422266

In this case the winner is either the shell sort or the heap sort depending on whether you want raw performance more or less when compared to performance variability. The Gnome, Bubble and Cocktail sorts are hopelessly outclassed. So what to make of all this? Well in any comparison like this there are a myriad of variables that one should take into account, and so I don’t believe these data should be treated as gospel. What is clear to me is that:

  1. Being a general purpose routine, qsort() is unlikely to be the optimal solution for an embedded system.
  2. For many embedded applications, a shell sort has a lot to commend it – decent code size, fast running time, well behaved and a clean implementation. Thus if you don’t want to bother with this sort of investigation every time you need to sort an array, then a shell sort should be your starting point. It will be for me henceforth.

Home

Horner's rule addendum

Sunday, February 15th, 2009 Nigel Jones

A few weeks ago I wrote about using Horner’s rule to evaluate polynomials. Well today I’m following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one’s mistakes than one’s successes, I thought I’d share it with you. First, some background. I had some experimental data on the behavior of a sensor against temperature. I needed to be able to fit a regression curve through the data, and so after some experimentation I settled on a quadratic polynomial fit. This is what the data and the curve looked like: On the face of it, everything looks OK. However, if you look carefully, you will notice two things:

  • The bulk of the experimental data cover the temperature range of 5 – 48 degrees.
  • There is a very slight hook on the right hand side of the graph

So where’s the mistake? Well actually I made two mistakes:

  • I assumed that my experimental data covered the entire expected operating temperature range.
  • I failed to check at run time that the temperature was indeed bounded to the experimental input range.

Why is this important? Well, what happened, was that in some circumstances the sensor would experience temperatures somewhat higher than I expected when the experimental data was gathered, e.g. 55 degrees. Well that doesn’t sound too bad – until you take the polynomial and extend it out a bit. This is what it looks like: You can see that at 55 degrees, the polynomial generates a value which is about the same as at 25 degrees. Needless to say, things didn’t work too well! So what advice can I offer?

  • Ensure that when fitting a polynomial to experimental data, that the experimental data covers all the possible range of values that can be physically realized.
  • Always plot the polynomial to see how it performs outside your range of interest. In particular, if it ‘takes off’ in a strange manner, then treat it very warily.
  • At run time, ensure that the data that you are feeding into the polynomial is bounded to the range over which the polynomial is known to be valid.

The maddening thing about this for me, was that I ‘learned’ this lesson about polynomial fits many years ago. I just chose to ignore it this time. Before I leave this topic, I’d like to offer one other insight. If you search for Horner’s rule, you’ll find a plethora of articles. The more detailed ones will opine on topics such as evaluation stability, numeric overflow issues and so on. However, it’s rare that you’ll find this sort of information on polynomial evaluation posted. I think it’s because we tend to get wrapped up in the details of the algorithm while losing sight of the underlying mathematics of what is going on. The bottom line, the next time you find a neat algorithm posted on the web for ’solving’ your problem, take a big step back and think hard about what is really going on and what are the inherent weaknesses in what you are doing. Home

Horner's rule and related thoughts

Monday, January 5th, 2009 Nigel Jones

Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors’ performance. Using the regression tools built into Excel, I was looking at the various models, from a ‘goodness of fit’ perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.

Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner’s rule. I say relatively well known, because I’d say about half the time I see a polynomial evaluated, it doesn’t use Horner’s rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner’s rule, I thought I’d mention it here.

OK, so what is it? Well it’s based on simply refactoring a polynomial expression:

anxn + a(n-1)x(n-1) + … + a0=((anx + a(n-1))x +…)x + a0.

Thus a polynomial of order n, requires exactly n multiplications and n additions.

For example:

23.1x2 – 45.6x + 12.3 = (23.1x -45.6)x + 12.3

In this case a quadratic equation or order 2, using Horner’s rule requires 2 multiplications and two additions to evaluate the polynomial, versus the direct approach which requires 5 multiplications and 2 additions.

For those of you that are looking for code to just use, then this snippet will work. This is for a cubic polynomial. COEFFN is the coefficient of xN.

y = x * COEFF3;
y += COEFF2;
y *= x
y += COEFF1;
y *= x
y += COEFF0;

The recurrence relationship for higher order polynomials should be obvious. Note that unlike most implementations, I perform the code in line, rather than using a loop.

It should be noted that as well as being more computationally efficient, Horner’s rule is also more accurate. This comes about in two ways:

  • The very act of using less floating point operations leads to less rounding errors
  • Higher order polynomials generate very large numbers in a hurry. Horner’s method significantly reduces the magnitude of the intermediate values, thus minimizing problems associated with adding / subtracting floating point numbers that differ in magnitude

Although Horner’s rule is a nice tool to have at one’s disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a superior method than the obvious direct method of evaluation. Sometimes it requires algebraic manipulation such as for Horner’s rule. Other times, it’s an approximation method, and other times it’s just a flat out really neat algorithm (see for example my posting on Crenshaw’s square root code). The bottom line. Next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You’ll probably be glad you did.

Update

There is a highly relevant addendum to this posting here.

Home

Modulo Means (reprised)

Sunday, November 30th, 2008 Nigel Jones

In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian co-ordinates and averaging the resulting (X, Y) data, and then converting the means of X & Y back into a phase angle. Well kudos to Bruno because this is exactly what I ended up doing. However, as Bruno observed, it’s not exactly an efficient process. It is however robust, and in my application, the robustness counts for a lot.

The suggestion that I average the inputs to the phase comparator has its merits. However for reasons that would take too long to explain, I’m not really able to do this in my application.

Finally, I’d like to mention the second solution that Kyle had proposed. First a caveat. I haven’t fully thought through this solution, and I most certainly have not implemented and tested it. With that in mind, here’s another approach to contemplate.

You’ll remember that we can compute the average of the phase angle by using the simple arithmetic mean, provided that we do not cross back and fore across the zero phase line. Well Kyle’s insight was that as well as computing the arithmetic mean of the phase angle, we also do the same for the quadrature angle. The idea is that while it is possible that the phase could alternate across the zero degree line, it would not simultaneously alternate across the 90 degree line (or indeed the 180 degree line). Thus, the method then becomes one of computing two means and choosing the correct one. If I get the time I’ll develop this into a fully fledged algorithm and publish it for you all to, ahem, enjoy. I’m fairly sure that this method is not as robust as the Cartesian method. However, it is dramatically more efficient and thus is deserving of greater investigation. Bruno – perhaps you’d care to do the analysis in your CFT (Copious Free Time)?

Home

Modulo means

Friday, November 21st, 2008 Nigel Jones

Normally on this blog I’m either giving my opinions on embedded matters, or offering tips on how to do things better. Well today I’m turning the tables, as I’d like your help. Yesterday I ran into a rather perplexing problem, which I’d be interested to see if any of my readers can solve.

In a product I am working on, there is a phase comparator generating difference readings in the range 0 – 0xF. The phase comparator is somewhat noisy and so I want to obtain a moving average of the phase differences. Now typically to perform a moving average filter, one sums the elements in a buffer and divides by the number of elements to obtain the arithmetic mean. Indeed we can do this here, provided that we don’t flip back and fore across the zero line. If we do cross the zero line then the method breaks down. For example, if successive phase differences are 0, F, 0, F, 0, F …. 0, F, then the simple arithmetic mean of these numbers will be 8 instead of some value between F and 0.

You may think that the answer is to switch to signed arithmetic and operate over the range -8 … +7. However, a little thought will show that you have now merely shifted the problem as to what happens when the system is close to -8 such that the values alternate between -8, 7, -8, 7 … -8, 7.

Thus, can you come up with a robust, efficient solution to compute the mean of an array of modulo numbers?

The problem is solvable as one of the Engineers that I’m working with hit upon not one, but two possible solutions (nice work Kyle). However, I’d be interested in other possible approaches.

I’ll publish Kyle’s method(s) next week.

Home