Posts Tagged ‘unsigned’

A tutorial on signed and unsigned integers

Wednesday, August 5th, 2009 Nigel Jones

One of the interesting things about writing a blog is looking at the search terms that drive traffic to your blog. In my case, after I posted these thoughts on signed versus unsigned integers, I was amazed to see how many people were ending up here looking for basic information concerning signed and unsigned integers. In an effort to make these folks visits more successful, I thought I’d put together some basic information on this topic. I’ve done it in a question and answer format.

All of these questions have been posed to a search engine which has driven traffic to this blog. For regular readers of this blog looking for something a bit more advanced, you will find the last section more satisfactory.

Are integers signed or unsigned?

A standard C integer data type (‘int’) is signed. However, I strongly recommend that you do not use the standard ‘int’ data type and instead use the C99 data types. See here for an explanation.

How do I convert a signed integer to an unsigned integer?

This is in some ways a very elementary question and in other ways a very profound question. Let’s consider the elementary issue first. To convert a signed integer to an unsigned integer, or to convert an unsigned integer to a signed integer you need only use a cast. For example:

int  a = 6;
unsigned int b;
int  c;

b = (unsigned int)a;

c = (int)b;

Actually in many cases you can dispense with the cast. However many compilers will complain, and Lint will most certainly complain. I recommend you always explicitly cast when converting between signed and unsigned types.

OK, well what about the profound part of the question? Well if you have a variable of type int, and it contains a negative value such as -9 then how do you convert this to an unsigned data type and what exactly happens if you perform a cast as shown above? Well the basic answer is – nothing. No bits are changed, the compiler just treats the bit representation as unsigned. For example, let us assume that the compiler represents signed integers using 2’s complement notation (this is the norm – but is *not* mandated by the C language). If our signed integer is a 16 bit value, and has the value -9, then its binary representation will be 1111111111110111. If you now cast this to an unsigned integer, then the unsigned integer will have the value 0xFFF7 or 6552710. Note however that you cannot rely upon the fact that casting -9 to an unsigned type will result in the value 0xFFF7. Whether it does or not depends entirely on how the compiler chooses to represent negative numbers.

What’s more efficient – a signed integer or an unsigned integer?

The short answer – unsigned integers are more efficient. See here for a more detailed explanation.

When should I use an unsigned integer?

In my opinion, you should always use unsigned integers, except in the following cases:

  • When the entity you are representing with your variable is inherently a signed value.
  • When dealing with standard C library functions that required an int to be passed to them.
  • In certain weird cases such as I documented here.

Now be advised that many people strongly disagree with me on this topic. Naturally I don’t find their arguments persuasive.

Why should I use an unsigned integer?

Here are my top reasons:

  • By using an unsigned integer, you are conveying important information to a reader of your code concerning the expected range of values that a variable may take on.
  • They are more efficient.
  • Modulus arithmetic is completely defined.
  • Overflowing an unsigned data type is defined, whereas overflowing a signed integer type could result in World War 3 starting.
  • You can safely perform shift operations.
  • You get a larger dynamic range.
  • Register values should nearly always be treated as unsigned entities – and embedded systems spend a lot of time dealing with register values.

What happens when I mix signed and unsigned integers?

This is the real crux of the problem with having signed and unsigned data types. The C standard has an entire section on this topic that only a compiler writer could love – and that the rest of us read and wince at. Having said that, it is important to know that integers that are signed get promoted to unsigned integers. If you think about it, this is the correct thing to happen. However, it can lead to some very interesting and unexpected results. A number of years ago I wrote an article “A ā€˜Cā€™ Test:The 0x10 Best Questions for Would-be Embedded Programmers” that was published in Embedded Systems Programming magazine. You can get an updated and corrected copy at my web site. My favorite question from this test is question 12 which is reproduced below – together with its answer: What does the following code output and why?

void foo(void)
{
 unsigned int a = 6;
 int b = -20;
 (a+b > 6) ? puts("> 6") : puts("<= 6");
}

This question tests whether you understand the integer promotion rules in C – an area that I find is very poorly understood by many developers. Anyway, the answer is that this outputs “> 6”. The reason for this is that expressions involving signed and unsigned types have all operands promoted to unsigned types. Thus -20 becomes a very large positive integer and the expression evaluates to greater than 6. This is a very important point in embedded systems where unsigned data types should be used frequently (see reference 2). If you get this one wrong, then you are perilously close to not being hired.

This is all well and good, but what should one do about this? Well you can pore over the C standard, run tests on your compiler to make sure it really does conform to the standard, and then write conforming code, or you can do the following: Never mix signed and unsigned integers in an expression. I do this by the use of intermediate variables. To show how to do this, consider a function that takes an int ‘a’ and an unsigned int ‘b’. Its job is to return true if b > a, otherwise it returns false. As you shall see, this is a surprisingly difficult problem… To solve this problem, we need to consider the following:

  • The signed integer a can be negative.
  • The unsigned integer b can be numerically larger than the largest possible value representable by a signed integer
  • The integer promotion rules can really screw things up if you are not careful.

With these points in mind, here’s my stab at a robust solution

bool foo(int a, unsigned int b)
{
 bool res;

 if (a < 0)
 {
  res = true; /* If a is negative, it must be less than b */
 }
 else
 {
  unsigned int c;
  c = (unsigned int) a; /* Since a is positive, this cast is safe */
  if (b > c)            /* Now I'm comparing the same data types */
  {
   res = true;
  }
  else
  {
   res = false;
  }
 }
 return res;
}

Is this a lot of work – yes. Could I come up with a more compact implementation that is guaranteed to work for all possible values of a and b – probably. Would it be as clear – I doubt it. Perhaps regular readers of this blog would like to take a stab at producing a better implementation?

Home

Efficient C Tips #10 – Use unsigned integers

Friday, July 31st, 2009 Nigel Jones

This is the tenth in a series of tips on writing efficient C for embedded systems. Today I consider the topic of whether one should use signed integers or unsigned integers in order to produce faster code. Well the short answer is that unsigned integers nearly always produce faster code. Why is this you ask? Well there are several reasons:

Lack of signed integer support at the op code level

Many low end microprocessors lack instruction set support (i.e. op codes) for signed integers. The 8051 is a major example. I believe low end PICs are also another example. The Rabbit processor is sort of an example in that my recollection is that it lacks support for signed 8 bit types, but does have support for signed 16 bit types! Furthermore some processors will have instructions for performing signed comparisons, but only directly support unsigned multiplication.

Anyway, so what’s the implication of this? Well lacking direct instruction set support, use of a signed integer forces the compiler to use a library function or macro to perform the requisite operation. Clearly this is not very efficient. But what if you are programming a processor that does have instruction set support for signed integers? Well for most basic operations such as comparison and addition you should find no difference. However this is not the case for division…

Shift right is not the same as divide by two for signed integers

I doubt there is a compiler in existence that doesn’t recognize that division by 2N is equivalent to a right shift N places for unsigned integers. However this is simply not the case for signed integers, since the issue of what to do with the sign bit always arises. Thus when faced with performing a division by 2N on a signed integer, the compiler has no choice other than to invoke a signed divide routine rather than a simple shift operation. This holds true for every microprocessor I have ever looked at in detail.

There is a third area where unsigned integers offer a speed improvement over signed integers – but it comes about by a different mechanism…

Unsigned integers can often save you a comparison

From time to time I find myself writing a function that takes as an argument an index into an array or a file. Naturally to protect against indexing beyond the bounds of the array or file, I add protection code. If I declare the function as taking a signed integer type, then the code looks like this:

void foo(int offset)
{
 if ((offset >= 0) && (offset < ARRAY_SIZE))
 {
  //Life is good...
 }
}

However, if I declare the function as taking an unsigned integer type, then the code looks like this:

void foo(unsigned int offset)
{
 if (offset < ARRAY_SIZE)
 {
  //Life is good...
 }
}

Clearly it’s nonsensical to check whether an unsigned integer is >=0 and so I can dispense with a check. The above are examples of where unsigned integer types are significantly more efficient than signed integer types. In most other cases, there isn’t usually any difference between the types. That’s not to say that you should choose one over the other on a whim. See this for a discussion of some of the other good reasons to use an unsigned integer. Before I leave this topic, it’s worth asking whether there are situations in which a signed integer is more efficient than an unsigned integer? Off hand I can’t think of any. There are situations where I could see the possibility of this occurring. For example when performing pointer arithmetic, the C standard requires that subtraction of two pointers return the data type ptrdiff_t. This is a signed integral type (since the result may be negative). Thus if after subtracting two pointers, you needed to add an offset to the result, it’s likely that you’ll get better code if the offset is a signed integral type. Of course this touches upon the nasty topic of mixing signed and unsigned integral types in an expression. I’ll address this another day.

Next Tip

Previous Tip

Home

Signed versus unsigned integers

Saturday, May 9th, 2009 Nigel Jones

If you are looking for some basic information on signed versus unsigned integers, you may also find this post useful. That being said, on to the original post…

Jack Ganssle’s latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here’s the entire extract – I apologize for its length – but you really have to read it all to understand my horror.

“Suppose you have a “Real World (TM)” always and forever positive value. Should you represent it as unsigned?

“Well, that’s actually a bit of a step that we tend to gloss over…

“As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are…

” * unsigned has a larger range.

” * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

” * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand.

“For example I have a bit of code that generates code … and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.

“Thus I must do integer arithmetic with __LINE__ include subtraction of offsets and multiplication.

“* I do not care if my intermediate values go negative.

“* It’s hard to debug (and frightening) if they suddenly go huge.

“* the constraint is the final values must be positive.

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works. I.e. Line numbers are constrained to be positive, but that has nothing to do representation. Use the most convenient representation.

“C’s “unsigned” representation is useless as a “constrain this value to be positive” tool. E.g. A device that can only go faster or slower, never backwards:

unsigned int speed; // Must be positive.
unsigned int brake(void)
{
–speed;
}

“Was using “unsigned” above any help to creating robust error free code? NO! “speed” may now _always_ be positive… but not necessarily meaningful!

“The main decider in using “unsigned” is storage. Am I going to double my storage requirements by using int16_t’s or pack them all in an array of uint8_t’s?

“My recommendation is this…

” * For scalars use a large enough signed value. eg. int_fast32_t
” * Treat “unsigned” purely as a storage optimization.
” * Use typedef’s (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.

typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization

I read this, and quite frankly my jaw dropped. Now the statements made by Carter / Jones concerning differences between signed and unsigned are correct – but to call them the real differences is completely wrong. To make my point, I’ll first of all address his specific points – and then I’ll show you where the real differences are:

Unsigned has a larger range

Yes it does. However, if this is the reason you are using an unsigned type you’ve probably got other problems.

Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

Yes it does, and au contraire – this is frequently what I want (see for example this). However, far more importantly is the question – what does a signed integer do on overflow? The answer is that it is undefined. That is if you overflow a signed integer, the generated code is at liberty to do anything – including deleting your program or starting world war 3. I found this out the hard way many years ago. I had some PC code written for Microsoft’s Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom’s compiler (Version 10 I think) and the code failed. I was really ticked at Watcom until I realized what I had done and that Watcom was perfectly within their rights to do what they did.

Note that this was not a case of porting code to a different target. This was the same target – just a different compiler.

Now let’s address his comment about modulo arithmetic. Consider the following code fragment:

uint16_t a,b,c, res;

a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow
res -= c;

Does res end up with the expected value of 0xFFFE? Yes it does – courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.

Now if we repeat the exercise using signed data types.

int16_t a,b,c, res;

a = 32767; //Max value for a int16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow - WW3 starts
res -= c;

What happens now? Who knows? On your system you may or may not get the answer you expect.

Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand

Well whether I understand them or not is really between me and Lint. However, the key thing to know is that if you use signed integers by default, then it is really hard to avoid combining signed and unsigned operands. How is this you ask? Well consider the following partial list of standard ‘functions’ that return an unsigned integral type:

  • sizeof()
  • offsetof()
  • strcspn()
  • strlen()
  • strpsn()

In addition memcpy(), memset(), strncpy() and others also use unsigned integral types in their parameter lists. Furthermore in embedded systems, most compiler vendors typedef IO registers as unsigned integral types. Thus any expression involving a register also includes unsigned quantities. Thus if you use any of these in your code, then you run a very real risk of running into signed / unsigned arithmetic conversions. Thus IMHO the usual arithmetic conversions issue is actually an argument for avoiding signed types – not the other way around! So what are the real reasons to use unsigned data types? I think these reasons are high on my list:

  • Modulus operator
  • Shifting
  • Masking

Modulus Operator

One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, using the modulus operator on signed integers when one or both of the operands is negative produces an implementation defined result. Here’s a great example in which they purport to show how to use the modulus operator to determine if a number is odd or even. The code is reproduced below:

int main(void)
{
 int i;

 printf("Enter a number: ");
 scanf("%d", &i);

 if( ( i % 2 ) == 0) printf("Even");
 if( ( i % 2 ) == 1) printf("Odd");

 return 0;
}

When I run it on one of my compilers, and enter -1 as the argument, nothing gets printed, because on my system -1 % 2 = -1. The bottom line – using the modulus operator with signed integral types is a disaster waiting to happen.

Shifting

Performing a shift right on a signed integer is implementation dependent. What this means is that when you shift right you have no idea whether the sign bit is preserved or if it is propagated. The implications of this are quite profound. For example, if foo is an unsigned integral type, then a shift right is equivalent to a divide by 2. However, if foo is a signed type, then a shift right is most certainly not the same as a divide by 2 – and will generate different code. It’s for this reason that Lint, MISRA and most good coding standards will reject any attempt to right shift a signed integral type. BTW while left shifts on signed types are safer, I really don’t recommend them either.

Masking

A similar class of problems occur if you attempt to perform masking operations on a signed data type.

Finally…

Before I leave this post, I just have to comment on this quote from Carter

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works”.

Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he’d rather switch to a large signed data type – and trust that everything works out OK. He obviously thinks he’s on safe ground. However consider the case where he has a 50,000 line file (actually 46342 to be exact). Is this an unreasonably large file – well yes for a human generated file. However for a machine generated file (e.g. an embedded image file), it is not unreasonable at all. Furthermore let’s assume that his computations involve for some reason a squaring of the number of lines in the file: i.e. we get something like this:

int32_t lines, result;

lines = 46342;
result = lines * lines + some_other_expression;

Well 46342 * 46342 overflows a signed 32 bit type – and the result is undefined. The bottom line – using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.

Home