embedded software boot camp

The absolute truth about abs()

Wednesday, February 1st, 2012 by Nigel Jones

One of the more depressing things about the C language is how often the results of various operations are undefined. A prime example of this is the abs() function that I’m fairly sure is liberally dispersed throughout your code (it is through mine). The undefined operation of the abs() function comes about if you have the temerity to use a compiler that represents negative numbers using 2′s complement notation. In this case, the most negative representable number is always numerically larger than the most positive representable number. In plain English, for 16 bit integers, the range is -32768 … + 32767. Thus if you pass -32768 to the abs() function, the result is undefined.

The problem of course in an embedded system is that undefined operations are just dangerous, so surely an embedded compiler will do something sensible, like return +32767 if you pass -32768 to abs? To test this hypothesis I whipped up the following code for my favourite 8 bit compiler (IAR’s AVR compiler).

#include <stdlib.h>
#include <inttypes.h>
#include <stdio.h>

void main(void)
{
    int16_t i;
    int16_t absi;

    i = INT16_MIN + 1;        /* Set i to one more than the most negative number */
    absi = abs(i);

    printf("Argument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);

    i--;                    /* i should now equal INT16_MIN */
    absi = abs(i);
    printf("\nArgument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);
}

If you don’t understand the printf strings, see this article. Here’s the output of this code:

Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = -32768

Clearly abs(-32768) = -32768 is not a very useful result! If I look in <stdlib.h> I find that abs() is implemented as

  int abs(int i)
  {      /* compute absolute value of int argument */
    return (i < 0 ? -i : i);
  }

Clearly no check is being made on the bounds of the parameter, and so the result that we get depends upon how the negation of the argument is performed. Thus this leads me to my first suggestion, namely write your own abs function. I’ll call this function sabs() for safe abs(). The first pass implementation for sabs() looks like this (note you’ll have to include <limits.h> to get INT_MIN and INT_MAX):

int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

Here’s the output:

Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = 32767

I think for most embedded systems this is a better result. However what happens if you use a smaller integer than the native integer size (e.g. a 16 bit integer on a 32 bit system, or an 8 bit integer on a 16 bit system?). To test this question, I modified the code thus:

#include <stdlib.h>
#include <limits.h>
#include <inttypes.h>
#include <stdio.h>

int sabs(int i);

void main(void)
{
    int8_t i;
    int8_t absi;

    i = INT8_MIN + 1;        /* Set i to one more than the most negative number */
    absi = sabs(i);

    printf("Argument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);

    i--;                    /* i should now equal INT8_MIN */
    absi = sabs(i);
    printf("\nArgument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);
}

int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

So in this case, the native integer size is 16 bits and I’m passing an 8 bit integer. Here’s the output:

Argument = -127. Absolute value of argument = 127
Argument = -128. Absolute value of argument = -128

Clearly, I still haven’t solved the problem, as I’d really like abs(-128) to be 127 when using 8 bit integers.  I suspect that I could come up with some fancy expression that handles all integer types.  However I’m a great believer in simple code, and so  my recommendation is that you write sabs() functions for all your integer types. Thus:

/* Safe 8 bit absolute function */
int8_t sabs8(int8_t i)
{
    int8_t res;

    if (INT8_MIN == i)
    {
        res = INT8_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
/* Safe 16 bit absolute function */
int16_t sabs16(int16_t i)
{
    int16_t res;

    if (INT16_MIN == i)
    {
        res = INT16_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
/* Safe 32 bit absolute function */
int32_t sabs32(int32_t i)
{
    int32_t res;

    if (INT32_MIN == i)
    {
        res = INT32_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

 

The above approach is all well and good, but let’s face it, I’ve added a lot of overhead for a very rare condition. So this raises the question as to whether there is a better way of doing things? Well in many cases we use the abs() function to check for some limit. For example

    if (abs(i) > SOME_LIMIT)
    {
        printf("\nLimit exceeded");
    }

In cases like these, you can use what I call a negative absolute function, aka nabs(). nabs() works with negative absolutes and so can’t overflow. To demonstrate, here’s the code:

int nabs(int i);

void main(void)
{
    int i;
    int absi;

    i = INT_MIN + 1;        /* Set i to one more than the most negative number */
    absi = nabs(i);

    printf("Argument = %d Negative absolute value of argument = %d", i, absi);

    i--;                    /* i should now equal INT_MIN */
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);

    i = INT_MAX;
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);
}

int nabs(int i)
{
    return i > 0 ? -i : i;
}

The output looks like this:

Argument = -32767 Negative absolute value of argument = -32767
Argument = -32768 Negative absolute value of argument = -32768
Argument = 32767 Negative absolute value of argument = -32767

Armed with this function, you merely flip your tests around, such that you have:

    if (nabs(i) < SOME_NEGATIVE_LIMIT)
    {
        printf("\nLimit exceeded");
    }

I’ll leave it to you to decide whether gaining the efficiency is worth it for the rather strange looking code.

As a final note, do a search for the abs() function on the Internet. You’ll find that most references don’t mention the undefined behavior of abs with INT_MIN as an argument. The notable exception is the always excellent GNU reference . It’s thus hardly surprising that most embedded systems use abs().

 

 

31 Responses to “The absolute truth about abs()”

  1. Boudewijn Dijkstra says:

    (unsigned int)abs(i) problem solved?

    • Nigel Jones says:

      Alas no. Since the result of abs(INT_MIN) is undefined, you can’t make any assumptions. For example, one compiler might return zero, while another might return 0xFFFF

  2. Lundin says:

    Strictly speaking, i = INT_MIN + 1; is undefined behavior too, and could theoretically result in a “trap representation” etc. Though of course in practice, most (all?) embedded systems performs integer overflow in a predictable way and turns the overflowing int into a negative two’s complement number.

    In order to get predictable, safe code I think the software needs to have coded upper and lower limits for an integer variable. Each time the embedded programmer declares and int, they make a decision. What is the largest number that this variable will ever contain? Is there a case where it must be signed? Can I afford to declare it as a 32 bit variable to block evil, bug-generating mechanics like implicit integer promotions?

    I believe that if you declared a variable as int16_t and it gets the value -32768 at some point during execution, in most of the cases it is because the programmer made the wrong decision when they picked the type, they should probably have used int32_t instead. The bug in that case is in the variable declaration and not in the abs() function.

    Now of course this 16 bit variable could be a reading from a timer, ADC etc, ie hardware that gives results over the whole 16-bit span. But in that case, it doesn’t make sense to declare it as signed, it should have been uint16_t in such a case.

    So… I can’t really come up with a real-world scenario where I would ever like to have a int16_t variable with values over the full span between -32768 and 32767. Does such a scenario exist?

    • Nigel Jones says:

      I don’t understand why you think INT_MIN +1 is undefined. Surely INT_MIN – 1 is undefined, but not INT_MIN + 1?

      While I generally agree with your point, I think the issue in the real world is more along these lines. I normally expect a variable to lie between A & B, where presumably A & B are well within the limits of +/- INT_MAX. I have code that checks for an error condition. The code looks some thing like this if (abs(foo) < SOME_LIMIT). However, if something goes wrong where the variable ends up at INT_MIN, then it's possible that the error check fails. Indeed it's precisely the rareness of the condition that makes this failure mode so insidious.

      I'd also like to mention that many ADCs produce an output in 2's complement format. If you happen to be using a 16 bit ADC, then a value of INT_MIN is highly likely.

      • Jörg Seebohn says:

        INT_MIN +1 is defined for sure.

        But I think Boudewijn Dijkstra is right, defining abs as:
        static inline unsigned abs(int i)
        {
        /* compute absolute value of int argument */
        return (unsigned) (i < 0 ? -i : i);
        }
        solves the problem. This should work in the case of two’s complement arithmetic if there is no special "trap" value
        (ones’ complement, and sign + magnitude representations not considered).

        • Nigel Jones says:

          I’m inclined to agree. I have always found it curious that abs() returns a signed value. However, it seems like the authors of the C library considered unsigned integers an annoyance, and thus used them as sparingly as possible.

          • david collier says:

            Nigel – leaving aside the lack of range-checking – is my point about unary minus valid, and not amenable to the strategies suggested for ABS ?

          • Nigel Jones says:

            I’m not sure what you are asking David. Regarding – INT_MIN. If INT_MIN is defined correctly with parentheses, then -INT_MIN becomes on a 16 bit processor -(-32768) = + 32768 , which in turn overflows a signed 16 bit integer. If the definer of INT_MIN has been sloppy and omitted the parentheses, then -INT_MIN becomes – -32768! I shudder to think what happens in this case.

          • david collier says:

            Nigel – that’s eactly what I mean. I’m trying to ask whether pointing out the risk with abs is actually moaning about a level 2 problem when there is a level one problem which is just as serious and harder to fix.

            Seriously if -x is sometimes undefined, shouldn’t we be pointing that out and waving it about, rather than looking at the more sophisticated, and therefore less ubiquitous ( but essentially identical and derivative ) issues with abs?

            After all I can write and use myabs() throughout my programs, and so fixing your problem is a simple question of a gloabl exchange and some error-handling.

            But I can’t intercept the use of a minus sign in any neat and tidy way, and I can’t detect and handle the error, at least not in C.

            So that imples to me that the real issue lies with the asymmetric value range, and it’s more difficult example is around minus, not abs.

            Not that I think you’re wrong to bring the subject up, just that I think you’ve picked the soft target, rather than flagging up the hard one.

            All comes back to the fact that C is a structred assembler, and anyone not trained to work in raw assembler isn’t trained to work in C. Trouble is that covers 90% of the people writing in C.

          • Nigel Jones says:

            You seem to be asking me to wage war with the industry’s choice of languages, rather than engage in a minor tactical skirmish! I have actually given serious thought over the years to what I’d expect to see in an ideal embedded systems language. Perhaps I should start a series of blog postings on the topic?

        • pozz says:

          Is it sufficient to write

          return (unsigned) (i < 0 ? -i : i);

          or have I to write:

          return (i < 0 ? -(unsigned)i : i);

          ??

  3. david collier says:

    pardon my ignorance…. but what is the value of -INT_MIN? Surely that is a bigger, and less obvious, source of problems than the one you so rightly outline. You can’t define your own substitute for unary minus.

    I do just want to cry when I see grown men wrestling with such trivia caused by us all being forced to use C instead of a programming language.

    VAR x INTEGER [ -32767 .. +32767 ]

    Now granted Modula2 didn’t define well how you trapped and dealt with the exception when the value turned out to be -32768… but at least you had the semblence of a controlled situation.

  4. Jeff Gros says:

    I agree with the above discussion that INT_MIN – 1 is undefined. I also agree that abs(-32768) is undefined if the input is 16 bits. I don’t agree with Lundin that INT_MIN + 1 is undefined. Perhaps that was a typo.

    Based on the comment above, I would assume that you are using abs to detect that a difference has been exceeded. For example, perhaps you are checking power supplies for validity, or checking an ADC input with two different references to make sure that the reading is valid.

    diff = A – B;
    if (abs(diff) < SOME_LIMIT)
    ScreamLoudly;

    It seems to me that you should know the size of your input parameters for which you want to perform a difference and then plug them into the abs function. For example, they may be the output of a 16 bit ADC converter.

    I would think that in this case, the most simple solution would just to move up to the next variable size. If A and B are unsigned 16 bits coming from a 16 bit ADC, then diff should be signed 32 bits (a typecast will be required during the assignment to diff, of course).

    Perhaps this is what Lundin was trying to say.

    Using this method, you have no input constraints on A or B: they can be any ADC value that the ADC can produce. But you also have no possibility of taking the absolute value of the "weird number" (-32768 for 16 bits).

    In the example that you provided with an ADC, using the functions sabs8, sabs16, etc is perfectly acceptable, even though they actually change the difference result. A one count error (INT_MAX vs. INT_MAX + 1) will not matter compared to ADC noise.

    However, the idea of it does leave a bad taste in my mouth. I just FEEL better going up to the next variable size for the difference (moving from u16 ADCs to a s32 difference). I suppose it is a matter of personal preference.

    • david collier says:

      Jeff – what’s more you have to work carefully to ensure that your FFFF in 16 bits turns into FFFFFFFF in 32 – it’s quite easy to end up with 0000FFFF.

  5. Ken says:

    I don’t see a problem with the current definition of abs. It returns a signed integer. It seems obvious to me that if a negative number is returned, there is a problem. Return values should always be checked. I believe that you are making a big deal out of something trivial.

    However, I am willing to play the game. Let’s say it was explicitly checking for INT_MIN. How should it alert the user? The only option for its prototype is to return a negative number; upon further inspection, it does just that when improper input is tried. The only way I could advocate the duplication of abs functionality is if you wanted to force the user to check for error conditions by changing the interface of the function to something along the lines of the following.

    int abs(int input, int *output);
    Where a return value of zero implies success; otherwise errno is returned.

    • Nigel Jones says:

      Thanks for comment Ken. While I agree about checking return values, in the case I’m describing it is *not* guaranteed to work. The result of passing INT_MIN to abs() is *undefined*. With the one compiler I tried it returned a negative number. Another compiler may return 0, or INT_MAX, or indeed any value between INT_MIN and INT_MAX. Thus blithely checking the return value is non-negative is pointless unless you know for sure what your compiler does and are prepared to run the risk of the code being ported to a different platform down the road. This really is a case where you need to check your inputs.

      • Ken says:

        I apologize. It is explicitly undefined per the standard. I didn’t realize that. I will start to unit test any new compiler to make sure that abs(INT_MIN) returns a negative value.

        I wonder why returning a negative number was not written into the standard. Do you have any idea as to why it wasn’t? It would certainly provide a simple solution to this issue.

        • Nigel Jones says:

          I think I know. Negation of an integer is typically handled very efficiently by the instruction set of a CPU. However the exact methodology varies such that negation of INT_MIN is literally CPU dependent. Rather than hamstring 99.9999% of the code out there, the committee took the easy way out.

        • Eric Miller says:

          Ken,

          The vast majority of processor architectures you’re likely to run across use two’s complement representation for integers. For 16-bit integers, INT_MIN = -32768 = 0×8000. The typical implementation of abs() sees a negative number and negates it by “flipping the bits and adding one.” So 0×8000 becomes (0x7FFF + 1), or 0×8000, giving us -32768 when viewed as a signed 16-bit integer.

          There’s a small chance that you’ll see a one’s complement machine out there somewhere, in which case INT_MIN = -32767 = 0×8000 (this isn’t a typo, 0×8000 is INT_MIN for both one’s complement and two’s complement, but the number represented by 0×8000 differs between the two). Negating a number on a one’s complement machine requires only “flipping the bits,” so when we negate INT_MIN on a one’s complement machine, we get 0x7FFF = 32767 = INT_MAX.

          Nigel hits the nail on the head when he alludes to efficient handling of negation by CPUs. While compiler writers are permitted by the C standard to follow their whims when implementing undefined behavior, I’ve never seen one that did anything completely whacky like making abs(INT_MIN) = 42, which, while permitted by the standard, would require extra code to implement.

          In both the two’s complement and one’s complement cases (unsigned)abs(INT_MIN) gives a sensible result.

          • That would be one interesting compiler to mess around with, if it treated every “undefined behavior” case as a time to hit the random number generator.

          • Nigel Jones says:

            It would. I’ve often wondered what it would be like to write a completely conforming compiler. However where ever the behavior is undefined, or compiler specified, for the compiler to do the wackiest thing possible. Then see how much code it breaks…

  6. Bruce Moreland says:

    Returning an undefined value for abs(-32768) in a 16-bit environment is no more wrong than it is that -32768 – 1 is a positive number.

    When you use a data type, you are agreeing to live within the constraints of that data type. Not every operation on valid values produces results you’d expect if you had more bits. You can’t add 1 to 32767 and get a value that is larger, which is a pretty fundamental violation of mathematics, but we all understand that this is a limitation of the data type, and nobody will argue that 32767 + 1 should return 32767.

    You can’t negate -32768 and get a large positive value, either. This is just another limitation of the data type.

    The solution is not to try to expect to be able to do math that results in integer overflows, i.e. simply live within the constraints of the data type, or pick a different data type.

    Returning +32767 from abs(-32768) may work for a specific application but it is extra code in the normal case, and just turns one bug into a different bug in the edge case, and this is therefore a terrible solution.

    • Nigel Jones says:

      While I’d agree it isn’t a fantastic solution, I’m not sure I agree it’s a terrible solution. I guess the question is, what do you propose? As I see it, your choices are as follows:
      1. Ignore the problem.
      2. Check the input prior to calling abs() and handle INT_MIN on a case by case basis. For what it is worth, I have *never* seen this done.
      3. Arrange things such that the argument cannot be INT_MIN. In my experience this is a lot harder than it sounds.
      4. Handle the condition the way I illustrated.

  7. Bruce Moreland says:

    The correct answer is to ignore the problem at a standard library level (we’re talking about the language), in order to allow those who operate on data that will not have this overflow problem to have full efficiency.

    I can imagine cases where you have user input that cannot be changed, and where fudging this would be correct. In that case, writing a wrapper around the function is the right way to go. You understand your data, you understand your problem, and therefore you’ve fixed it correctly.

    If you are really just having problems with program logic that causes integer limits to be exceeded, it would make more sense to have that wrapper assert, then understand and fix the underlying bug.

    The reason the proposed solution is terrible (sorry, but it is), is that it adds code to the mainline case in order to mutate a bug into a different bug.

    • david collier says:

      I do so love people who fret about “have full efficiency”

      99% o f the work we do is efficient if it works first time and keeps working under all circumstances. The speed of the code is practically irrelevent for most people on most tasks.

      It’s this stuff from die-hard C advocates that makes me despair. My efficiency is measured by the number of nearly-perfectly-and-fast-enough working facilities I can deleiver in a year. ( which isn’t helped by discussing on here :-) )

      OK if you’re writing graphics drivers, strip it down. But letting your average punter programmer drive without the seat belt of nummeric range checking, the ABS of array-bound checking, and the airbag of exception trapping and handling is sheer commercial suicide.

      • Nigel Jones says:

        I agree completely with you about efficiency, with one caveat. I do a lot of portable electronics design. In this case efficiency isn’t about speed, it’s about energy consumption. Of course if a function only gets called once in a blue moon, then even this argument is specious.

  8. coderguy says:

    what about…

    int sabs(int i)
    {
    int res;

    if (i == 1<< ( 8 * sizeof(int)) ) //shift 8*x bits where x is size in bytes
    {
    //we hit something like 0×80000000
    res = i-1; //possible UB while switching to 0x7FFF…. might require some memory hacking

    }
    else
    res = i < 0 ? -i : i;

    return res;
    }

    what about this option?

    inline unsigned int test(int x)
    {
    //see if its a positive number, clear sign bit or return number;
    if ( x & 1<<8*sizeof(int) == 0)
    return x; //it's a positive number

    //x is at least 0×8….
    int mask = ~0; //transforms into 0xFFFFFF
    mask &= 1 <<< (8*sizeof(int)-1) ; //so even 0xFFFF comes out as 0x7FFF

    return x&mask; //worst-case returns 0x7F…
    }

    I wouldn't rely on ANY preprocessor / compiler inbuilts because of porting reasons/issues.

  9. James Kosin says:

    Maybe we liked it better when we had a value for 0 and -0, and the range would have been +32767 to -32767. But, then we would all be arguing about weather 0 and -0 were real numbers; since 0 is neither positive or negative.

    Saying that the abs(-32768) = 32767 is just as wrong.

  10. Tony Leigh says:

    Nigel, I suspect the reason why abs() doesn’t return an unsigned value is the confusion that mixing signed and unsigned integers can cause. For example, the following code:

    #include

    unsigned uabs(int x)
    {
    return (unsigned) (x >= 0 ? x : -x);
    }

    int main(int argc, char **argv)
    {
    int i = -4;

    if (i/uabs(3 – 5) >= 0)
    {
    printf(“Answer is +ve\n”);
    }
    else
    {
    printf(“Answer is -ve\n”);
    }

    return 0;
    }

    prints “Answer is +ve”, although you might expect -4/2 to be negative. The usual arithmetic conversions promote i to unsigned int, meaning the expression is never negative.

  11. mspike says:

    Hi everyone,
    I do understand most of the comments, but I have to reply for this, and others like this:
    “”" The reason the proposed solution is terrible (sorry, but it is), is that it adds code to the mainline case in order to mutate a bug into a different bug. “”"

    You see it in a wrong way. What he achieved with this is converting a non defined behavior into a defined one. So with the proposed own implementation of the abs we can trust in the result. It will work in the same way on every compiler. As the author mentioned, it would anyway doing “something.” And If we stuck to the example, boundary checking, we will now, that we don’t allow the INT_MIN and that’s all. But we can be sure, that what will happen in case of INT_MIN. I think in embed world it’s very common to have critical systems, in which no one want to have undefined behavior :)

    Br,
    P.

Leave a Reply