Archive for the ‘General C issues’ Category

The absolute truth about abs()

Wednesday, February 1st, 2012 Nigel Jones

One of the more depressing things about the C language is how often the results of various operations are undefined. A prime example of this is the abs() function that I’m fairly sure is liberally dispersed throughout your code (it is through mine). The undefined operation of the abs() function comes about if you have the temerity to use a compiler that represents negative numbers using 2′s complement notation. In this case, the most negative representable number is always numerically larger than the most positive representable number. In plain English, for 16 bit integers, the range is -32768 … + 32767. Thus if you pass -32768 to the abs() function, the result is undefined.

The problem of course in an embedded system is that undefined operations are just dangerous, so surely an embedded compiler will do something sensible, like return +32767 if you pass -32768 to abs? To test this hypothesis I whipped up the following code for my favourite 8 bit compiler (IAR’s AVR compiler).

#include <stdlib.h>
#include <inttypes.h>
#include <stdio.h>

void main(void)
{
    int16_t i;
    int16_t absi;

    i = INT16_MIN + 1;        /* Set i to one more than the most negative number */
    absi = abs(i);

    printf("Argument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);

    i--;                    /* i should now equal INT16_MIN */
    absi = abs(i);
    printf("\nArgument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);
}

If you don’t understand the printf strings, see this article. Here’s the output of this code:

Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = -32768

Clearly abs(-32768) = -32768 is not a very useful result! If I look in <stdlib.h> I find that abs() is implemented as

  int abs(int i)
  {      /* compute absolute value of int argument */
    return (i < 0 ? -i : i);
  }

Clearly no check is being made on the bounds of the parameter, and so the result that we get depends upon how the negation of the argument is performed. Thus this leads me to my first suggestion, namely write your own abs function. I’ll call this function sabs() for safe abs(). The first pass implementation for sabs() looks like this (note you’ll have to include <limits.h> to get INT_MIN and INT_MAX):

int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

Here’s the output:

Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = 32767

I think for most embedded systems this is a better result. However what happens if you use a smaller integer than the native integer size (e.g. a 16 bit integer on a 32 bit system, or an 8 bit integer on a 16 bit system?). To test this question, I modified the code thus:

#include <stdlib.h>
#include <limits.h>
#include <inttypes.h>
#include <stdio.h>

int sabs(int i);

void main(void)
{
    int8_t i;
    int8_t absi;

    i = INT8_MIN + 1;        /* Set i to one more than the most negative number */
    absi = sabs(i);

    printf("Argument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);

    i--;                    /* i should now equal INT8_MIN */
    absi = sabs(i);
    printf("\nArgument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);
}

int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

So in this case, the native integer size is 16 bits and I’m passing an 8 bit integer. Here’s the output:

Argument = -127. Absolute value of argument = 127
Argument = -128. Absolute value of argument = -128

Clearly, I still haven’t solved the problem, as I’d really like abs(-128) to be 127 when using 8 bit integers.  I suspect that I could come up with some fancy expression that handles all integer types.  However I’m a great believer in simple code, and so  my recommendation is that you write sabs() functions for all your integer types. Thus:

/* Safe 8 bit absolute function */
int8_t sabs8(int8_t i)
{
    int8_t res;

    if (INT8_MIN == i)
    {
        res = INT8_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
/* Safe 16 bit absolute function */
int16_t sabs16(int16_t i)
{
    int16_t res;

    if (INT16_MIN == i)
    {
        res = INT16_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
/* Safe 32 bit absolute function */
int32_t sabs32(int32_t i)
{
    int32_t res;

    if (INT32_MIN == i)
    {
        res = INT32_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

 

The above approach is all well and good, but let’s face it, I’ve added a lot of overhead for a very rare condition. So this raises the question as to whether there is a better way of doing things? Well in many cases we use the abs() function to check for some limit. For example

    if (abs(i) > SOME_LIMIT)
    {
        printf("\nLimit exceeded");
    }

In cases like these, you can use what I call a negative absolute function, aka nabs(). nabs() works with negative absolutes and so can’t overflow. To demonstrate, here’s the code:

int nabs(int i);

void main(void)
{
    int i;
    int absi;

    i = INT_MIN + 1;        /* Set i to one more than the most negative number */
    absi = nabs(i);

    printf("Argument = %d Negative absolute value of argument = %d", i, absi);

    i--;                    /* i should now equal INT_MIN */
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);

    i = INT_MAX;
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);
}

int nabs(int i)
{
    return i > 0 ? -i : i;
}

The output looks like this:

Argument = -32767 Negative absolute value of argument = -32767
Argument = -32768 Negative absolute value of argument = -32768
Argument = 32767 Negative absolute value of argument = -32767

Armed with this function, you merely flip your tests around, such that you have:

    if (nabs(i) < SOME_NEGATIVE_LIMIT)
    {
        printf("\nLimit exceeded");
    }

I’ll leave it to you to decide whether gaining the efficiency is worth it for the rather strange looking code.

As a final note, do a search for the abs() function on the Internet. You’ll find that most references don’t mention the undefined behavior of abs with INT_MIN as an argument. The notable exception is the always excellent GNU reference . It’s thus hardly surprising that most embedded systems use abs().

 

 

Simplify, then add lightness

Saturday, December 31st, 2011 Nigel Jones

As 2011 draws to a close I have reason to be thinking about things automotive. As part of my musings, I was reminded of the famous mantra espoused by the late Colin Chapman of Lotus Cars – ‘Simplify, then add lightness’. I have always liked this aphorism for its idea of ‘adding lightness’ rather than ‘subtracting weight’.

So what’s this got to do with embedded systems you ask? Well, when it comes to embedded systems programming, I think Chapman’s concept is spot on. While most people would probably agree that simplifying an embedded system is a good thing, I wonder though how many actively work so as to achieve this? In my case I regularly hold conversations with my clients which can be summarized as: Yes I can do that, but it comes at the expense of complexity – and complexity leads to bugs – and bugs are very expensive. I have found that clients are quite appreciative of this as it forces them to evaluate what is truly important about the product. Incidentally, when I first started in the consulting business, I thought that it was my job to give the client whatever it was they asked for. The fact that in doing so I was often giving them a serious disservice was quite a revelation when the penny finally dropped.

So what about the ‘adding lightness’ directive? Well clearly this is an interesting concept when it comes to firmware. I take it to mean that the code should be simple, easy to read, and fast in execution – in other words not unlike a Lotus sports car. If I have cause to re-read the code many months or years later and I find that the code has stood the test of time and is a pleasure to read, then I think I have achieved the required lightness. If by contrast the code is out of date and ugly to read then I know I have failed. In short it’s the difference between designing a Lotus Elan and an AMC Pacer.

 

 

The N_ELEMENTS macro

Friday, March 18th, 2011 Nigel Jones

Many years ago I came across a simple macro that has proven to be quite useful. Its usual definition looks something like this:

#define N_ELEMENTS(X)           (sizeof(X)/sizeof(*(X)))

Its nominal use is to determine the number of elements in an incomplete array declaration. For example

void foo(void)
{
 uint8_t bar[] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

Clearly this is quite useful. However, once you have this macro in your arsenal you will eventually run into a conundrum. To illustrate what I mean consider the following code:

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < BUF_SIZE; ++i)
 {
  txc(bar[i]);
 }
}

This uses the classic approach of  defining a manifest constant  (BUF_SIZE) and then using it to define the array size and also as the loop limit. The conundrum is this: is one better off using N_ELEMENTS in this case as well. In other words, is the following better code?

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE] = {0, 1, 2, 3, 4};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

This code is guaranteed to operate on every element of the array bar[] regardless of what is done to the array declaration. For example:

#define BUF_SIZE    (5)
void foo(void)
{
 uint8_t bar[BUF_SIZE + 1] = {0, 1, 2, 3, 4, 5};
 uint8_t    i;

 /* Transmit each byte in bar[] */
 for (i = 0; i < N_ELEMENTS(bar); ++i)
 {
  txc(bar[i]);
 }
}

In this case I have changed the array declaration. The code that uses N_ELEMENTS would still work while the code that used BUF_SIZE would have failed. So from this perspective the N_ELEMENTS code is more robust. However I don’t think the N_ELEMENTS based code is as easy to read. As a result I have oscillated back and fore over the years as to which approach is better. My current view is that the N_ELEMENTS approach is indeed the better way. I’d be interested in your opinion.

Efficient C Tip #13 – use the modulus (%) operator with caution

Tuesday, February 8th, 2011 Nigel Jones

This is the thirteenth in a series of tips on writing efficient C for embedded systems.  As the title suggests, if you are interested in writing efficient C, you need to be cautious about using the modulus operator.  Why is this? Well a little thought shows that C = A % B is equivalent to C = A – B * (A / B). In other words the modulus operator is functionally equivalent to three operations. As a result it’s hardly surprising that code that uses the modulus operator can take a long time to execute. Now in some cases you absolutely have to use the modulus operator. However in many cases it’s possible to restructure the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to how this blog posting came about.

Converting seconds to days, hours, minutes and seconds

In Embedded Systems Design there is an increasing need for some form of real time clock. When this is done, the designer typically implements the time as a 32 bit variable containing the number of seconds since a particular date. When this is done, it’s not usually long before one has to convert the ‘time’ into days, hours, minutes and seconds. Well I found myself in just such a situation recently. As a result, I thought a quick internet search was in order to find the ‘best’ way of converting ‘time’ to days, hours, minutes and seconds. The code I found wasn’t great and as usual was highly PC centric. I thus sat down to write my own code.

Attempt #1 – Using the modulus operator

My first attempt used the ‘obvious’ algorithm and employed the modulus operator. The relevant code fragment appears below.

void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 seconds = time % 60UL;
 time /= 60UL;
 minutes = time % 60UL;
 time /= 60UL;
 hours = time % 24UL;
 time /= 24UL;
 days = time;  
}

This approach has a nice looking symmetry to it.  However, it contained three divisions and three modulus operations. I thus was rather concerned about its performance and so I measured its speed for three different architectures – AVR (8 bit), MSP430 (16 bit), and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The number of cycles quoted are for 10 invocations of the test code and include the test harness overhead:

AVR:  29,825 cycles

MSP430: 27,019 cycles

ARM Cortex: 390 cycles

No that isn’t a misprint. The ARM was nearly two orders of magnitude more cycle efficient than the MSP430 and AVR. Thus my claim that the modulus operator can be very inefficient is true for some architectures – but not all.  Thus if you are using the modulus operator on an ARM processor then it’s probably not worth worrying about. However if you are working on smaller processors then clearly something needs to be done  – and so I investigated some alternatives.

Attempt #2 – Replace the modulus operator

As mentioned in the introduction,  C = A % B is equivalent to C = A – B * (A / B). If we compare this to the code in attempt 1, then it should be apparent that the intermediate value (A/B) computed as part of the modulus operation is in fact needed in the next line of code. Thus this suggests a simple optimization to the algorithm.

void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 days = time / (24UL * 3600UL);    
 time -= days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = time / 3600UL;
 time -= (hours * 3600UL);
 /* time now contains the number of seconds in the last hour */
 minutes = time / 60U;
 seconds = time - minutes * 60U;
 }

In this case I have replaced three mods with three subtractions and three multiplications. Thus although I have replaced a single operator (%) with two operations (- *) I still expect an increase in speed because the modulus operator is actually three operators in one (- * /).  Thus effectively I have eliminated three divisions and so I expected a significant improvement in speed. The results however were a little surprising:

AVR:  18,720 cycles

MSP430: 14,805 cycles

ARM Cortex: 384 cycles

Thus while this technique yielded a roughly order of two improvements for the AVR and MSP430 processors, it had essentially no impact on the ARM code.  Presumably this is because the ARM has native support for the modulus operation. Notwithstanding the ARM results, it’s clear that at least in this example, it’s possible to significantly speed up an algorithm by eliminating the modulus operator.

I could of course just stop at this point. However examination of attempt 2 shows that further optimizations are possible by observing that if seconds is a 32 bit variable, then days can be at most a 16 bit variable. Furthermore, hours, minutes and seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to use smaller data types.

Attempt #3 – Data type size reduction

My naive implementation of the code looked like this:

void compute_time(uint32_t time)
{
 uint16_t    days;
 uint8_t     hours, minutes, seconds;
 uint16_t    stime;

 days = (uint16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}

All I have done is change the data types and to add casts where appropriate. The results were interesting:

AVR:  14,400 cycles

MSP430: 11,457 cycles

ARM Cortex: 434 cycles

Thus while this resulted in a significant improvement for the AVR & MSP430, it resulted in a significant worsening for the ARM. Clearly the ARM doesn’t like working with non 32 bit variables. Thus this suggested an improvement that would make the code a lot more portable – and that is to use the C99 fast types. Doing this gives the following code:

Attempt #4 – Using the C99 fast data types

void display_time(uint32_t time)
{
 uint_fast16_t    days;
 uint_fast8_t    hours, minutes, seconds;
 uint_fast16_t    stime;

 days = (uint_fast16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint_fast8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}

All I have done is change the data types to the C99 fast types. The results were encouraging:

AVR:  14,400 cycles

MSP430: 11,595 cycles

ARM Cortex: 384 cycles

Although the MSP430 time increased very slightly, the AVR and ARM stayed at their fastest speeds. Thus attempt #4 is both fast and portable.

Conclusion

Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for further optimizations. As a result with the AVR & MSP430 I was able to more than halve the execution time.

Converting Integers for Display

A similar problem (with a similar solution) occurs when one wants to display integers on a display. For example if you are using a custom LCD panel with say a 3 digit numeric field, then the problem arises as to how to determine the value of each digit. The obvious way, using the modulus operator is as follows:

void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value > 999)
 {
 value = 999;
 }

 lsd = value % 10;
 value /= 10;
 nsd = value % 10;
 value /= 10;
 msd = value;

 /* Now display the digits */
}

However, using the technique espoused above, we can rewrite this much more efficiently as:

void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value > 999U)
 {
  value = 999U;
 }

 msd = value / 100U;
 value -= msd * 100U;

 nsd = value / 10U;
 value -= nsd * 10U;

 lsd = value;

 /* Now display the digits */
}

If you benchmark this you should find it considerably faster than the modulus based approach.

Previous Tip

Formatted output when using C99 data types

Tuesday, February 1st, 2011 Nigel Jones

Regular readers of this blog will know that I am a proponent of using the C99 data types. They will also know that I’m no fan of formatted output. Notwithstanding this, I do use formatted output (particularly vsprintf) on larger systems. Well if you use the C99 data types and you use formatted output, you will quickly run into a problem – namely what modifier do you give printf()  to print say a uint16_t variable? Now if you are working on an 8 or 16 bit architecture, then you’d probably be OK guessing that %u would work quite nicely. However if you are working on a 32 bit architecture, what would you use for say a uint_fast8_t variable? Well it so happens that the C99 folks were aware of this problem and came up with just about the ugliest solution imaginable.

inttypes.h

In order to solve this problem, you first of all need to #include a file inttypes.h. This header file in turn includes stdint.h so that you have access to the C99 data types. If you examine this file, you will find that it consists of a large number of definitions. An example definition might look like this:

#define PRId16 __INT16_SIZE_PREFIX__ "d"

If you are like me, when I first saw this I was a little puzzled. How exactly was this supposed to help? Well I’ll give you an example of its usage, and then explain how it works.

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %" PRId16, value);
}

So what’s going on here? Well let’s assume for now that __INT16_SIZE_PREFIX__ is in turn defined to be “h”.  Our code is converted by the preprocessor into the following:

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %" "h" "d", value);
}

At compile time, the successive strings “Value = %” “h” “d” are concatenated into the single string “Value = %hd”, so that we end up with:

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %hd", value);
}

This is legal syntax for printf. More importantly, the correct format string for this implementation is now being passed to printf () for an int16_t data type.

Thus the definitions in inttypes.h allow one to write portable formatted IO while still using the C99 data types.

Naming Convention

Examination of inttypes.h shows that a consistent naming convention has been used. For output, the constant names are constructed thus:

<PRI><printf specifier><C99 modifier><number of bits> where

<PRI> is the literal characters PRI.

<printf specifier> is the list of integer specifiers we all know so well {d, i, o, u, x, X}

<C99 modifier> is one of {<empty>, LEAST, FAST, MAX, PTR}

<number of bits> is one of {8, 16, 32,64 <empty>}. <empty> only applies to the MAX and PTR C99 modifiers.

Examples:

To print a uint_fast8_t in lower case hexadecimal you would use PRIxFAST8.

To print a int_least64_t in octal you would use PRIoLEAST64.

Formatted Input

For formatted input, simply replace PRI with SCN.

Observations

While I applaud the C99 committee for providing this functionality, it can result in some dreadful looking format statements. For example here’s a string from a project I’m working on:

wr_vstr(1, 0, MAX_STR_LEN, "%-+*" PRId32 "%-+4" PRId32 "\xdf", tap_str_len, tap, angle);

Clearly a lot of this has to do with the inherently complex formatted IO syntax. The addition of the C99 formatters just makes it even worse.

Personally I’d have liked the C99 committee to have bitten the bullet and introduced a formatted IO function that had the following characteristics:

  1. Explicit support for the C99 data types.
  2. No support for octal. Does anyone ever use the octal formatter?
  3. Support for printing binary – this I do need to do from time to time.
  4. A standard defined series of reduced functionality formatted IO subsets. This way I’ll know that if I restrict myself to a particular set of format types I can use the smallest version of the formatted IO function.

PC Lint

Regular readers will also know that I’m a major proponent of using PC-Lint from Gimpel. I was surprised to discover that while Lint is smart enough to handle string concatenation with printf() etc, it doesn’t do it with user written functions that are designed to accept format strings. For example, the function wr_vstr() referenced above looks like this:

static void wr_vstr(uint_fast8_t row, uint_fast8_t col, uint_fast8_t width, char const * format, ...)
{
 va_list  args;
 char  buf[MAX_STR_LEN];

 va_start(args, format);
 (void)vsnprintf(buf, MAX_STR_LEN, format, args);     /* buf contains the formatted string */

 wr_str(row, col, buf, width);    /* Call the generic string writer */

 va_end(args);                    /* Clean up. Do NOT omit */
}

I described this technique here. Anyway, if you use the inttypes.h constants like I did above, then you will find that PC-Lint complains loudly.

Final Thoughts

Inttypes.h is very useful for writing portable formatted IO with the C99 data types. It’s ugly – but it beats the alternative. I recommend you add it to your bag of tricks.