If you are looking for some basic information on signed versus unsigned integers, you may also find this post useful. That being said, on to the original post…
Jack Ganssle’s latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here’s the entire extract – I apologize for its length – but you really have to read it all to understand my horror.
“Suppose you have a “Real World (TM)” always and forever positive value. Should you represent it as unsigned?
“Well, that’s actually a bit of a step that we tend to gloss over…
“As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are…
” * unsigned has a larger range.
” * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)
” * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand.
“For example I have a bit of code that generates code … and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.
“Thus I must do integer arithmetic with __LINE__ include subtraction of offsets and multiplication.
“* I do not care if my intermediate values go negative.
“* It’s hard to debug (and frightening) if they suddenly go huge.
“* the constraint is the final values must be positive.
“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works. I.e. Line numbers are constrained to be positive, but that has nothing to do representation. Use the most convenient representation.
“C’s “unsigned” representation is useless as a “constrain this value to be positive” tool. E.g. A device that can only go faster or slower, never backwards:
unsigned int speed; // Must be positive.
void brake(void)
{
--speed;
}
“Was using “unsigned” above any help to creating robust error free code? NO! “speed” may now _always_ be positive… but not necessarily meaningful!
“The main decider in using “unsigned” is storage. Am I going to double my storage requirements by using int16_t’s or pack them all in an array of uint8_t’s?
“My recommendation is this…
” * For scalars use a large enough signed value. eg. int_fast32_t
” * Treat “unsigned” purely as a storage optimization.
” * Use typedef’s (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.
typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization
I read this, and quite frankly my jaw dropped. Now the statements made by Carter / Jones concerning differences between signed and unsigned are correct – but to call them the real differences is completely wrong. To make my point, I’ll first of all address his specific points – and then I’ll show you where the real differences are:
Unsigned has a larger range
Yes it does. However, if this is the reason you are using an unsigned type you’ve probably got other problems.
Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)
Yes it does, and au contraire – this is frequently what I want (see for example this). However, far more important is the question: what does a signed integer do on overflow? The answer is that the behavior is undefined. That is, if you overflow a signed integer, the generated code is at liberty to do anything – including deleting your program or starting World War 3. I found this out the hard way many years ago. I had some PC code written for Microsoft’s Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom’s compiler (Version 10 I think) and the code failed. I was really ticked at Watcom until I realized what I had done, and that Watcom was perfectly within their rights to do what they did.
Note that this was not a case of porting code to a different target. This was the same target – just a different compiler.
Now let’s address his comment about modulo arithmetic. Consider the following code fragment:
uint16_t a,b,c, res;
a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;
res = a;
res += b; //Overflow
res -= c;
Does res end up with the expected value of 0xFFFE? Yes it does – courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.
Now let’s repeat the exercise using signed data types:
int16_t a,b,c, res;
a = 32767; //Max value for an int16_t
b = 1;
c = 2;
res = a;
res += b; //Overflow - WW3 starts
res -= c;
What happens now? Who knows? On your system you may or may not get the answer you expect.
Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand
Well, whether I understand them or not is really between me and Lint. However, the key thing to know is that if you use signed integers by default, it is really hard to avoid combining signed and unsigned operands. How so, you ask? Well, consider the following partial list of standard ‘functions’ that return an unsigned integral type:
- sizeof()
- offsetof()
- strcspn()
- strlen()
- strspn()
In addition, memcpy(), memset(), strncpy() and others take unsigned integral types in their parameter lists. Furthermore, in embedded systems most compiler vendors typedef IO registers as unsigned integral types, so any expression involving a register also contains unsigned quantities. In short, if you use any of these in your code, you run a very real risk of signed / unsigned arithmetic conversions. Thus IMHO the usual arithmetic conversions issue is actually an argument for avoiding signed types – not the other way around! So what are the real reasons to use unsigned data types? These are high on my list:
- Modulus operator
- Shifting
- Masking
Modulus Operator
One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, using the modulus operator on signed integers when one or both of the operands is negative produces an implementation-defined result in C90; C99 nails the behavior down (division truncates toward zero), but the remainder then takes the sign of the dividend – which is rarely what the author intended. Here’s a great example in which they purport to show how to use the modulus operator to determine if a number is odd or even. The code is reproduced below:
#include <stdio.h>

int main(void)
{
    int i;

    printf("Enter a number: ");
    scanf("%d", &i);
    if ((i % 2) == 0) printf("Even");
    if ((i % 2) == 1) printf("Odd");
    return 0;
}
When I run it on one of my compilers and enter -1, nothing gets printed, because on my system -1 % 2 is -1 – and under C99’s truncation rules it always will be. The bottom line: using the modulus operator with signed integral types is a disaster waiting to happen.
Shifting
Performing a shift right on a negative signed integer is implementation-defined. What this means is that when you shift right you have no idea whether the vacated bits are filled with copies of the sign bit (an arithmetic shift) or with zeros (a logical shift). The implications of this are quite profound. If foo is an unsigned integral type, then a shift right is equivalent to a divide by 2. However, if foo is a signed type holding a negative value, then a shift right is most certainly not guaranteed to be the same as a divide by 2 – and will typically generate different code. It’s for this reason that Lint, MISRA and most good coding standards will reject any attempt to right shift a signed integral type. BTW, while left shifts on signed types may look safer, they can be outright undefined (shifting a one into or past the sign bit), so I really don’t recommend them either.
Masking
A similar class of problems occurs if you attempt to perform masking operations on a signed data type: integer promotion can sign-extend the value first, so high-order bits you never stored suddenly take part in the mask.
Finally…
Before I leave this post, I just have to comment on this quote from Carter
“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works”.
Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he’d rather switch to a large signed data type and trust that everything works out OK. He obviously thinks he’s on safe ground. However, consider the case where he has a 50,000 line file (46342 lines, to be exact). Is this an unreasonably large file? Well, yes, for a human-generated file. However, for a machine-generated file (e.g. an embedded image file), it is not unreasonable at all. Furthermore, let’s assume that his computations for some reason involve squaring the number of lines in the file, i.e. we get something like this:
int32_t lines, result;
lines = 46342;
result = lines * lines + some_other_expression;
Well, 46342 * 46342 overflows a signed 32-bit type – and the behavior is undefined. The bottom line: using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.