Archive for the ‘Efficient C/C++’ Category

Efficient C Tips #8 – Use const

Monday, March 30th, 2009 Nigel Jones

One of the easiest ways to make your code more efficient is to use const wherever feasible. Just like declaring local functions as static, this is one of those changes that makes your code more robust, more maintainable and faster – a true win-win situation. So how does this work? Well you get the most benefit when passing pointers as parameters to functions. Here’s an example of a function whose job it is to compute the sum of an array of integers. The naive implementation would look something like this:

uint32_t sum(uint16_t *ptr, uint16_t n_elements)
{
 uint16_t lpc;
 uint32_t sum = 0;

 for (lpc = 0; lpc < n_elements; lpc++)
 {
  sum += *ptr++;
 }
 return sum;
}

I’ll ignore the issues of post increment and counting up (for now). Instead, consider the declaration of ptr. As it stands, the caller of this function has no idea whether sum() will modify the data or not, and hence must assume that it does. This has obvious implications for the compiler when it comes to optimization. To overcome this, it is necessary to declare ptr as pointing to const. The function prototype for sum() now becomes:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements);

You’ll notice that I prefer to use what I call Saks notation for where I place the const modifier. The more conventional, albeit less sensible way of writing the declaration is:

uint32_t sum(const uint16_t *ptr, uint16_t n_elements);

Regardless of the style, by doing this you are indicating to the compiler that you will not be modifying the data that ptr points to. As a result, the optimizer can make assumptions that will typically lead to tighter code.

Before I give you the final code, I’d like to make a few other observations.

  • As well as potentially making your code more efficient, use of const also makes your code more readable and maintainable. That is, someone examining your code will know something extra about the function simply by looking at the prototype. Personally I find this very useful.
  • If you examine the C standard library, you’ll find very liberal use of the const modifier. You should take this as a strong hint that it’s a good idea.
  • PC-Lint will very helpfully tell you if a pointer can be declared as pointing to const. Yet another reason for using Lint!

So what does my sum() function look like? Well, incorporating my previous hints on post increment and counting down, it looks something like this:

uint32_t sum(uint16_t const *ptr, uint16_t n_elements)
{
 uint32_t sum = 0;

 for (; n_elements != 0; --n_elements)
 {
  sum += *ptr;
  ++ptr;
 }
 return sum;
}

Next Tip
Previous Tip
Home

Editorial Note

I’ve been following my own advice and have been on a short vacation. As a result I’ve been tardy in responding to some of your comments. I’ll try and rectify this over the next few days.

Efficient C Tips #7 – Fast loops

Thursday, March 5th, 2009 Nigel Jones

Every program at some point requires some set of actions to be taken a fixed number of times. Indeed this is such a common occurrence that we typically code it without really giving it much thought. For example, if I asked you to call a function foo() ten times, I’m sure that most of you would write something like this:

for (uint8_t lpc = 0; lpc < 10; ++lpc)
{
 foo();
}

While there is nothing wrong with this, per se, it is sub optimal on just about every processor. Instead you are better off using a construct which counts down to zero. Here are two alternative ways of doing this:

for(uint8_t lpc = 10; lpc != 0; --lpc)
{
 foo();
}

uint8_t lpc = 10;
do
{
 foo();
} while (--lpc);

Which one you think is more natural is entirely up to you.

So how does this efficiency arise? Well in the count up case, the assembly language generated by the compiler typically looks something like this:

INC lpc      ; Increment loop counter
SUB lpc, #10 ; Compare loop counter to 10
BNZ loop     ; Branch if loop counter not equal to 10

By contrast, in the count down case the assembly language looks something like this

DEC lpc   ; Decrement loop counter
BNZ loop  ; Branch if non zero

Evidently, because of the ‘specialness’ of zero, more efficient code can be generated.

So why don’t you see C programs littered with these count down constructs? Well counting down has a major limitation. If you need to use the loop variable as an index into an array then you have a problem. For example, let’s say I wanted to zero the elements of an array. Using the count down technique you might be tempted to do this:

uint8_t bar[10];
uint8_t lpc;

do
{
 bar[lpc] = 0; // Error! First time through results in index beyond end of array
} while (--lpc);

Evidently it doesn’t work. You can of course modify the code to make it work. However doing so typically loses you all the efficiency gains, such that you are better off with a standard up-counting for loop.

As a parting thought, concepts such as these are second nature to assembly language programmers – all of whom do this sort of thing instinctively. As a result, if you are really interested in getting the best out of your C compiler, you could do a lot worse than learning how to program your target processor in assembly language. Does this defeat one of the objectives in programming in a high level language – yes. However, for giving you insight in terms of what is going on under the hood it cannot be beaten.

Next Tip
Previous Tip
Home

Efficient C Tips #6 – Don’t use the ternary operator

Wednesday, February 18th, 2009 Nigel Jones

I have to confess that I like the ternary operator. K&R obviously liked it, as it is heavily featured in their seminal work. However after running experiments on a wide range of compilers I have concluded that with the optimizer turned on, you are better off with a simple if-else statement. Thus next time you write something like this:

y = (a > b) ? c : d;

be aware that as inelegant as it is in comparison, this will usually compile to better code:

if (a > b)
{
 y = c;
}
else
{
 y = d;
}

I find this frustrating, as I’ve consumed 8 lines doing what is more easily and elegantly performed in 1 line.

I can’t say that I have any particular insight as to why the ternary operator performs so poorly. Perhaps if there is a compiler writer out there, they could throw some light on the matter?

Next Tip
Previous Tip
Home

Horner's rule and related thoughts

Monday, January 5th, 2009 Nigel Jones

Recently I was examining some statistical data on the performance of a sensor against temperature. The data were from a number of sensors and I was interested in determining a mathematical model that most closely described the sensors’ performance. Using the regression tools built into Excel, I was looking at the various models, from a ‘goodness of fit’ perspective. After playing around for a while, I came to the conclusion that a quadratic polynomial really was the best fit, and should be the model to adopt. At this point, I turned to the issue of computational efficiency.

Now, it turns out that there is a relatively well known algorithm for evaluating polynomials, called Horner’s rule. I say relatively well known, because I’d say about half the time I see a polynomial evaluated, it doesn’t use Horner’s rule, but instead evaluates the polynomial directly. Thus in an effort to increase the use of Horner’s rule, I thought I’d mention it here.

OK, so what is it? Well it’s based on simply refactoring a polynomial expression:

anxn + a(n-1)x(n-1) + … + a0=((anx + a(n-1))x +…)x + a0.

Thus a polynomial of order n, requires exactly n multiplications and n additions.

For example:

23.1x2 – 45.6x + 12.3 = (23.1x -45.6)x + 12.3

In this case a quadratic equation or order 2, using Horner’s rule requires 2 multiplications and two additions to evaluate the polynomial, versus the direct approach which requires 5 multiplications and 2 additions.

For those of you that are looking for code to just use, then this snippet will work. This is for a cubic polynomial. COEFFN is the coefficient of xN.

y = x * COEFF3;
y += COEFF2;
y *= x
y += COEFF1;
y *= x
y += COEFF0;

The recurrence relationship for higher order polynomials should be obvious. Note that unlike most implementations, I perform the code in line, rather than using a loop.

It should be noted that as well as being more computationally efficient, Horner’s rule is also more accurate. This comes about in two ways:

  • The very act of using less floating point operations leads to less rounding errors
  • Higher order polynomials generate very large numbers in a hurry. Horner’s method significantly reduces the magnitude of the intermediate values, thus minimizing problems associated with adding / subtracting floating point numbers that differ in magnitude

Although Horner’s rule is a nice tool to have at one’s disposal, I think there is a larger point to be made here. Whenever you need to perform any sort of calculation, there is nearly always a superior method than the obvious direct method of evaluation. Sometimes it requires algebraic manipulation such as for Horner’s rule. Other times, it’s an approximation method, and other times it’s just a flat out really neat algorithm (see for example my posting on Crenshaw’s square root code). The bottom line. Next time you write code to perform some sort of numerical calculation, take a step back and investigate possibilities other than direct computation. You’ll probably be glad you did.

Update

There is a highly relevant addendum to this posting here.

Home

Efficient C Tips #5 – Make ‘local’ functions ‘static’

Saturday, December 13th, 2008 Nigel Jones

In my humble opinion, one of the biggest mistakes the designers of the ‘C’ language made, was to make the scope of all functions global by default. In other words, whenever you write a function in ‘C’, by default any other function in the entire application may call it. To prevent this from happening, you can declare a function as static, thus limiting its scope to typically the module it resides in. Thus a typical declaration looks like this:

static void function_foo(int a)
{
}

Now I’d like to think that the benefits of doing this to code stability are so obvious that everyone would do it as a matter of course. Alas, my experience is that those of us that do this are in a minority. Thus in an effort to persuade more of you to do this, I’d like to give you another reason – it can lead to much more efficient code. To illustrate how this comes about, let’s consider a module called adc.c This module contains a number of public functions (i.e. functions designed to be called by the outside world), together with a number of functions that are intended to be called only by functions within adc.c. Our module might look something like this:

void adc_Process(void)
{
 ...
 fna();
 ...
 fnb(3);
}
...
void fna(void)
{
  ...
}

void fnb(uint8_t foo)
{
 ...
}

At compile time, the compiler will treat fna() and fnb() like any other function. Furthermore, the linker may link them ‘miles’ away from adc_Process(). However, if you declare fna() and fnb() as ‘static’, then something magical happens. The code would now look like this:

static void fna(void);
static void fnb(uint8_t foo);

void adc_Process(void)
{
 ...
 fna();
 ...
 fnb(3);
}

...

static void fna(void)
{
 ...
}

static void fnb(uint8_t foo)
{
 ...
}

In this case, the compiler will know all the possible callers of fna() and fnb(). With this information to hand, the compiler / linker will potentially do all of the following:

  • Inline the functions, thus avoiding the overhead of a function call.
  • Locate the static functions close to the callers such that a ‘short’ call or jump may be performed rather than a ‘long’ call or jump.
  • Look at registers used by the local functions and thus only stack the required scratch registers rather than stacking all of the registers required by the compiler’s calling convention

Together these can add up to a significant reduction in code size and a commensurate increase in execution speed.

Thus making all non public functions not only makes for better code quality, it also leads to more compact and faster code. A true win-win situation! Thus if you are not already doing this religiously, I suggest you go through your code and do it now. I guarantee you’ll be very pleased with the results.

Next Tip
Previous Tip

A Request …

If I’m to believe the statistics for this blog, it appears that I’m gradually building a decent sized readership. Furthermore many of you choose to come back and read the latest postings which tells me that I’m doing something of value. Anyway, if this describes you, I’d be obliged if you’d encourage your colleagues to read the blog and also to post comments / questions. Why do I ask this? Well, an increased readership has several benefits, for both me and you the readers.

  • I believe quite passionately about improving the quality of embedded systems. Those of us that are working in this field collectively have an enormous impact on the world. Thus anything that helps improve the quality of embedded systems in turn helps improve the world. (I appreciate that this is a little melodramatic. It is, however, true).
  • Writing about something is the best way to I know to find out if I truly understand it. Thus, the very act of publishing a blog causes me to improve my skills and knowledge.
  • Some of the (too few) comments I get are quite profound and often instructive. Thus I also learn in this way.
  • The bigger the readership I have, the more inclined I am to publish. If I’m publishing things of value, then presumably the readers benefit.

Anyway, if you concur, then please encourage your colleagues. If you don’t, then that’s OK as well.

Thanks for reading.

Home