Archive for the ‘Algorithms’ Category

Horner's rule and related thoughts

Monday, January 5th, 2009 Nigel Jones

Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors’ performance. Using the regression tools built into Excel, I was looking at the various models, from a ‘goodness of fit’ perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.

Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner’s rule. I say relatively well known, because I’d say about half the time I see a polynomial evaluated, it doesn’t use Horner’s rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner’s rule, I thought I’d mention it here.

OK, so what is it? Well it’s based on simply refactoring a polynomial expression:

a_n x^n + a_(n-1) x^(n-1) + … + a_1 x + a_0 = ( … (a_n x + a_(n-1)) x + … ) x + a_0

Thus a polynomial of order n requires exactly n multiplications and n additions.

For example:

23.1x^2 − 45.6x + 12.3 = (23.1x − 45.6)x + 12.3

In this case, for a quadratic (order 2), Horner’s rule requires two multiplications and two additions to evaluate the polynomial, versus the direct term-by-term approach, which requires three multiplications and two additions; the saving grows with the order of the polynomial.

For those of you that are looking for code to just use, this snippet will work. It is for a cubic polynomial; COEFFn is the coefficient of x^n.

y = x * COEFF3;
y += COEFF2;
y *= x;
y += COEFF1;
y *= x;
y += COEFF0;

The recurrence relationship for higher-order polynomials should be obvious. Note that, unlike most implementations, I write the code in line rather than using a loop.
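For reference, the loop form I just declined to use might look like this — a sketch of my own, with the (assumed) convention that coefficients are stored highest power first:

```c
#include <stddef.h>

/* Horner's rule as a loop. coeff[0] is the coefficient of the highest
   power and coeff[order] is the constant term. An order-n polynomial
   costs exactly n multiplications and n additions. */
float poly_eval(float const coeff[], size_t order, float x)
{
    float y = coeff[0];

    for (size_t i = 1; i <= order; i++)
    {
        y = (y * x) + coeff[i];
    }

    return y;
}
```

For the quadratic above you would call it as poly_eval((float[]){23.1f, -45.6f, 12.3f}, 2, x).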

It should be noted that as well as being more computationally efficient, Horner’s rule is also more accurate. This comes about in two ways:

  • The very act of using fewer floating-point operations leads to fewer rounding errors
  • Higher-order polynomials generate very large intermediate values in a hurry. Horner’s method significantly reduces the magnitude of the intermediate values, thus minimizing the problems associated with adding / subtracting floating-point numbers that differ greatly in magnitude

Although Horner’s rule is a nice tool to have at one’s disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a method superior to the obvious direct evaluation. Sometimes it requires algebraic manipulation, as with Horner’s rule. Other times it’s an approximation method, and other times it’s just a flat-out really neat algorithm (see for example my posting on Crenshaw’s square root code). The bottom line: next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You’ll probably be glad you did.

Update

There is a highly relevant addendum to this posting here.


Modulo Means (reprised)

Sunday, November 30th, 2008 Nigel Jones

In my previous post I had asked for some input on how to compute the mean of a phase comparator. Bruno Santiago suggested converting the phase readings to their Cartesian co-ordinates, averaging the resulting (X, Y) data, and then converting the means of X & Y back into a phase angle. Well, kudos to Bruno, because this is exactly what I ended up doing. However, as Bruno observed, it’s not exactly an efficient process. It is, however, robust, and in my application the robustness counts for a lot.

The suggestion that I average the inputs to the phase comparator has its merits. However for reasons that would take too long to explain, I’m not really able to do this in my application.

Finally, I’d like to mention the second solution that Kyle had proposed. First a caveat. I haven’t fully thought through this solution, and I most certainly have not implemented and tested it. With that in mind, here’s another approach to contemplate.

You’ll remember that we can compute the average of the phase angle by using the simple arithmetic mean, provided that we do not flip back and forth across the zero phase line. Well, Kyle’s insight was that as well as computing the arithmetic mean of the phase angle, we also do the same for the quadrature angle. The idea is that while it is possible for the phase to alternate across the zero degree line, it cannot simultaneously alternate across the 90 degree line (or indeed the 180 degree line). Thus, the method becomes one of computing two means and choosing the correct one. If I get the time I’ll develop this into a fully fledged algorithm and publish it for you all to, ahem, enjoy. I’m fairly sure that this method is not as robust as the Cartesian method. However, it is dramatically more efficient and is thus deserving of greater investigation. Bruno – perhaps you’d care to do the analysis in your CFT (Copious Free Time)?
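Purely as food for thought — and emphatically not Kyle’s actual algorithm, which I haven’t developed yet — here is one possible reading of the two-mean idea, sketched in C. The quarter-turn rotation and the spread-based selection heuristic are my own assumptions:

```c
#include <stdint.h>
#include <stddef.h>

/* A sketch of a two-mean modulo average over the range 0..0xF: compute
   the arithmetic mean of the raw readings and of the readings rotated
   by a quarter turn (4 steps), then pick whichever domain shows less
   spread, since clustered data cannot straddle both the zero line and
   the 90-degree line at once. Precondition: n >= 1. */
uint8_t modulo_mean(uint8_t const readings[], size_t n)
{
    uint16_t sum_raw = 0, sum_rot = 0;
    uint8_t  min_raw = 0x0F, max_raw = 0;
    uint8_t  min_rot = 0x0F, max_rot = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint8_t raw = readings[i] & 0x0F;
        uint8_t rot = (uint8_t)((raw + 4) & 0x0F);   /* rotate 90 degrees */

        sum_raw += raw;
        sum_rot += rot;

        if (raw < min_raw) min_raw = raw;
        if (raw > max_raw) max_raw = raw;
        if (rot < min_rot) min_rot = rot;
        if (rot > max_rot) max_rot = rot;
    }

    if ((max_raw - min_raw) <= (max_rot - min_rot))
    {
        return (uint8_t)(sum_raw / n);               /* raw domain is fine */
    }

    return (uint8_t)(((sum_rot / n) + 12) & 0x0F);   /* un-rotate by -4 */
}
```

Note the caveats: the truncating integer division biases the result downward, and the spread heuristic fails if the data straddles both lines at once. The real analysis is left to Bruno’s CFT.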


Modulo means

Friday, November 21st, 2008 Nigel Jones

Normally on this blog I’m either giving my opinions on embedded matters, or offering tips on how to do things better. Well today I’m turning the tables, as I’d like your help. Yesterday I ran into a rather perplexing problem, which I’d be interested to see if any of my readers can solve.

In a product I am working on, there is a phase comparator generating difference readings in the range 0 – 0xF. The phase comparator is somewhat noisy, and so I want to obtain a moving average of the phase differences. Typically, to perform a moving average filter, one sums the elements in a buffer and divides by the number of elements to obtain the arithmetic mean. Indeed we can do this here, provided that we don’t flip back and forth across the zero line. If we do cross the zero line then the method breaks down. For example, if successive phase differences are 0, F, 0, F, 0, F … 0, F, then the simple arithmetic mean of these numbers will be about 8, instead of some value at the 0 / 0xF boundary.

You may think that the answer is to switch to signed arithmetic and operate over the range -8 … +7. However, a little thought will show that you have merely shifted the problem: what happens when the system sits close to -8, such that the values alternate between -8, 7, -8, 7 … -8, 7?

Thus, can you come up with a robust, efficient solution to compute the mean of an array of modulo numbers?

The problem is solvable as one of the Engineers that I’m working with hit upon not one, but two possible solutions (nice work Kyle). However, I’d be interested in other possible approaches.

I’ll publish Kyle’s method(s) next week.


Integer Log functions

Sunday, May 11th, 2008 Nigel Jones

A few months ago I wrote about a very nifty square root function in Jack Crenshaw’s book “Math Toolkit for Real-time Programming”. As elegant as the square root function is, it pales in comparison to what Crenshaw calls his ‘bitlog’ function. This is code that computes the log (to base 2, of course) of an integer – and does it in amazingly few cycles and with amazing accuracy. The code in the book is for a 32 bit integer; the code I present here is for a 16 bit integer. Although you are of course free to use this code as is, I strongly suggest you buy Crenshaw’s book and read about this function. You’ll see it truly is a work of art. BTW, one of the things I really like about Crenshaw is that he takes great pains to note that he didn’t invent this algorithm. Rather he credits Tom Lehman. Kudos to Lehman.

/**
 FUNCTION: bitlog

 DESCRIPTION:
 Computes 8 * (log2(x) - 1), i.e. a scaled integer approximation
 to the base-2 logarithm.

 PARAMETERS:
 -    x: The uint16_t value whose log we desire

 RETURNS:
 -    An approximation to 8 * (log2(x) - 1)

 NOTES:
 -

**/
uint16_t bitlog(uint16_t x)
{
    uint8_t    b;
    uint16_t res;

    if (x <=  8 ) /* Shorten computation for small numbers */
    {
        res = 2 * x;
    }
    else
    {
        b = 15; /* Find the highest non-zero bit in the input argument */
        while ((b > 2) && ((int16_t)x > 0)) /* Shift left until the MSB is set */
        {
            --b;
            x <<= 1;
        }
        x &= 0x7000; /* Keep the three bits below the MSB ... */
        x >>= 12;    /* ... as the fractional part of the result */

        res = x + 8 * (b - 1);
    }

    return res;
}


Continued Fractions

Saturday, May 19th, 2007 Nigel Jones

Once in a while something happens that makes me realize that techniques that I routinely use are simply not widely known in the embedded world. I had such an epiphany recently concerning continued fractions. If you don’t know what these are, then check out this link.

As entertaining as the link is, let me cut to the chase as to why you need to know this technique. In a nutshell, in the embedded world we often need to perform fixed-point arithmetic for cost / performance reasons. Although this is not a problem in many cases, what happens when you need to multiply something by, say, 1.2764? The naive way to do this might be:

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = (x * 12764) / 10000;

    return y;
}

As written, this will fail on the typical small targets where int is 16 bits, because of numeric overflow in the expression (x * 12764). Thus it’s necessary to throw in some very expensive casts, e.g.:

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = ((uint32_t)x * 12764) / 10000;

    return y;
}

Our speedy integer arithmetic isn’t looking so good now is it?

What we really want to do is to use a fraction (a/b) that is a close approximation to 1.2764 – but (in this case) has a numerator that doesn’t exceed 255 (so that we can do the calculation in 16 bit arithmetic).

Enter continued fractions. One of the many uses for this technique is finding fractions (a/b) that are approximations to real numbers. In this case using the calculator here, we get the following results:

Convergents:
1: 1/1 = 1
3: 4/3 = 1.3333333333333333
1: 5/4 = 1.25
1: 9/7 = 1.2857142857142858
1: 14/11 = 1.2727272727272727
1: 23/18 = 1.2777777777777777
1: 37/29 = 1.2758620689655173
1: 60/47 = 1.2765957446808511
1: 97/76 = 1.2763157894736843
1: 157/123 = 1.2764227642276422
2: 411/322 = 1.2763975155279503
1: 1801/1411 = 1.2763997165131113
1: 3191/2500 = 1.2764

We get higher accuracy as we go down the list. In this case, I chose the approximation (157 / 123) because it’s the highest-accuracy fraction whose numerator doesn’t exceed 255. Thus my code now becomes:

uint16_t scale(uint8_t x)
{
    uint16_t y;

    y = ((uint16_t)x * 157) / 123;

    return y;
}

The error is less than 0.002% – but the calculation speed is dramatically improved because I don’t need to resort to 32 bit arithmetic. [On an ATmega88 processor, calling scale() for every value from 0-255 took 148,677 cycles for the naive approach and 53,300 cycles for the continued fraction approach.]

Incidentally, you might be wondering whether there are other fractions that give better results than the ones generated by this technique. The mathematicians tell us no: the convergents of a continued fraction are the best rational approximations for their size of denominator.

So there you have it. A nifty technique that once you know about it will make you wonder how you got along without it for all these years.
