Archive for the ‘General C issues’ Category

Signed versus unsigned integers

Saturday, May 9th, 2009 Nigel Jones

If you are looking for some basic information on signed versus unsigned integers, you may also find this post useful. That being said, on to the original post…

Jack Ganssle’s latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here’s the entire extract – I apologize for its length – but you really have to read it all to understand my horror.

“Suppose you have a “Real World (TM)” always and forever positive value. Should you represent it as unsigned?

“Well, that’s actually a bit of a step that we tend to gloss over…

“As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are…

” * unsigned has a larger range.

” * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

” * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand.

“For example I have a bit of code that generates code … and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.

“Thus I must do integer arithmetic with __LINE__, including subtraction of offsets and multiplication.

“* I do not care if my intermediate values go negative.

“* It’s hard to debug (and frightening) if they suddenly go huge.

“* the constraint is the final values must be positive.

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32-bit signed int’s and it all just works. I.e. line numbers are constrained to be positive, but that has nothing to do with representation. Use the most convenient representation.

“C’s “unsigned” representation is useless as a “constrain this value to be positive” tool. E.g. A device that can only go faster or slower, never backwards:

unsigned int speed; // Must be positive.
unsigned int brake(void)
{
--speed;
}

“Was using “unsigned” above any help to creating robust error free code? NO! “speed” may now _always_ be positive… but not necessarily meaningful!

“The main decider in using “unsigned” is storage. Am I going to double my storage requirements by using int16_t’s or pack them all in an array of uint8_t’s?

“My recommendation is this…

” * For scalars use a large enough signed value. eg. int_fast32_t
” * Treat “unsigned” purely as a storage optimization.
” * Use typedef’s (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.

typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization

I read this, and quite frankly my jaw dropped. Now the statements made by Carter / Jones concerning differences between signed and unsigned are correct – but to call them the real differences is completely wrong. To make my point, I’ll first of all address his specific points – and then I’ll show you where the real differences are:

Unsigned has a larger range

Yes it does. However, if this is the reason you are using an unsigned type you’ve probably got other problems.

Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

Yes it does, and au contraire – this is frequently what I want (see for example this). However, far more important is the question – what does a signed integer do on overflow? The answer is that it is undefined. That is, if you overflow a signed integer, the generated code is at liberty to do anything – including deleting your program or starting World War 3. I found this out the hard way many years ago. I had some PC code written for Microsoft’s Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom’s compiler (Version 10 I think) and the code failed. I was really ticked at Watcom until I realized what I had done and that Watcom was perfectly within their rights to do what they did.

Note that this was not a case of porting code to a different target. This was the same target – just a different compiler.

Now let’s address his comment about modulo arithmetic. Consider the following code fragment:

uint16_t a, b, c, res;

a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow
res -= c;

Does res end up with the expected value of 0xFFFE? Yes it does – courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.

Now if we repeat the exercise using signed data types.

int16_t a, b, c, res;

a = 32767; //Max value for a int16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow - WW3 starts
res -= c;

What happens now? Who knows? On your system you may or may not get the answer you expect.
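If signed arithmetic really is required, the only safe approach is to test for overflow before the operation, not after. A minimal sketch (the function name is mine, not from the newsletter):

```c
#include <stdint.h>

/* Add two int16_t values only if the result is representable.
 * Returns 1 and stores the sum in *res on success; returns 0 if the
 * addition would overflow (i.e. would be undefined behavior). */
static int add_int16_checked(int16_t a, int16_t b, int16_t *res)
{
    if ((b > 0 && a > INT16_MAX - b) ||
        (b < 0 && a < INT16_MIN - b))
    {
        return 0;                /* would overflow - refuse */
    }
    *res = (int16_t)(a + b);     /* now guaranteed in range */
    return 1;
}
```

With a guard like this, the `res += b` step above fails loudly instead of silently invoking undefined behavior.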

Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand

Well, whether I understand them or not is really between me and Lint. However, the key thing to know is that if you use signed integers by default, then it is really hard to avoid combining signed and unsigned operands. How is this, you ask? Well, consider the following partial list of standard ‘functions’ that return an unsigned integral type:

  • sizeof()
  • offsetof()
  • strcspn()
  • strlen()
  • strspn()

In addition, memcpy(), memset(), strncpy() and others use unsigned integral types in their parameter lists. Furthermore, in embedded systems most compiler vendors typedef IO registers as unsigned integral types, so any expression involving a register also includes unsigned quantities. If you use any of these in your code, then you run a very real risk of running into signed / unsigned arithmetic conversions. Thus IMHO the usual arithmetic conversions issue is actually an argument for avoiding signed types – not the other way around! So what are the real reasons to use unsigned data types? These reasons are high on my list:

  • Modulus operator
  • Shifting
  • Masking

Modulus Operator

One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, using the modulus operator on signed integers when one or both of the operands is negative produces an implementation-defined result. Here’s a great example in which they purport to show how to use the modulus operator to determine if a number is odd or even. The code is reproduced below:

#include <stdio.h>

int main(void)
{
 int i;

 printf("Enter a number: ");
 scanf("%d", &i);

 if( ( i % 2 ) == 0) printf("Even");
 if( ( i % 2 ) == 1) printf("Odd");

 return 0;
}

When I run it on one of my compilers, and enter -1 as the argument, nothing gets printed, because on my system -1 % 2 = -1. The bottom line – using the modulus operator with signed integral types is a disaster waiting to happen.

Shifting

Performing a shift right on a signed integer is implementation-defined. What this means is that when you shift right you have no idea whether the sign bit is propagated (an arithmetic shift) or zeros are shifted in (a logical shift). The implications of this are quite profound. For example, if foo is an unsigned integral type, then a shift right is equivalent to a divide by 2. However, if foo is a signed type, then a shift right is most certainly not guaranteed to be the same as a divide by 2 – and will generate different code. It’s for this reason that Lint, MISRA and most good coding standards will reject any attempt to right shift a signed integral type. BTW, while left shifts on signed types are safer, I really don’t recommend them either.

Masking

A similar class of problems occur if you attempt to perform masking operations on a signed data type.

Finally…

Before I leave this post, I just have to comment on this quote from Carter

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works”.

Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he’d rather switch to a large signed data type – and trust that everything works out OK. He obviously thinks he’s on safe ground. However, consider the case where he has a 50,000 line file (46,342 lines, to be exact). Is this an unreasonably large file? Well, yes for a human-generated file. However, for a machine-generated file (e.g. an embedded image file), it is not unreasonable at all. Furthermore, let’s assume that his computations for some reason involve squaring the number of lines in the file; i.e. we get something like this:

int32_t lines, result;

lines = 46342;
result = lines * lines + some_other_expression;

Well, 46342 * 46342 overflows a signed 32 bit type – and the behavior is undefined. The bottom line – using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.


On the use of correct grammar in code comments

Saturday, April 11th, 2009 Nigel Jones

Back when I was in college, the engineering students were fond of dismissing the liberal arts majors by doing such witty things as writing next to the toilet paper dispenser “Liberal Arts degree – please take one”. One of the better retaliatory pieces of graffiti that I really liked was: “Four years ago I couldn’t spell Engineer – now I are one”. I think this appealed to me because there was more than a smidgen of truth in its sentiment. If you don’t believe me, just take a look at the comments found in most computer programs. I don’t think I’m being exactly controversial by noting that most comments:

  • Lack basic punctuation.
  • Contain numerous spelling errors.
  • Liberally use non-standard abbreviations.
  • Regularly omit verbs and / or other basic components of a sentence.

As a result, many comments are nonsensical. In fact I’ve been in situations where the comments are so badly written that it’s easier to read the code than it is the comments. Clearly this isn’t a good thing! When I question programmers about this, I typically get a shrug of the shoulders and a ‘what’s the big deal’ attitude. When pressed further, the honest ones will admit that they couldn’t be bothered to use correct grammar or spelling because it’s too much effort – and after all you can work out what they mean if you just try hard enough. They are of course correct. However taken to its logical conclusion, this is really an argument for not commenting at all – since with a bit (OK a lot) of effort it should be crystal clear what the code is doing (and why) simply by examining it.

I decided to write about this now since I recently heard from Brad Mosch concerning a pet peeve of his. He gave his permission for me to quote from his email:

I see all the time mixed occurrences of whether or not a space is used for things such as 3dB, 3 dB, 1MHz, 1 MHz, etc. I am hoping that someone in the embedded guru world will propose that a space is ALWAYS used between the number and the unit of measure. That is the documented standard that was used in our technical writing at United Space Alliance and NASA. The funny thing is, even though that standard existed out there at Kennedy Space Center, not a whole lot of people knew about it, because I saw the same problem in documents out there all the time. Anyway, my point is, isn’t “1 Hz” a lot more readable than “1Hz”?

I’m sure many of you may think that Brad is being overly picky. However I don’t. His real point (in the last sentence) concerns readability. If you are going to write a comment surely it should be as readable as possible? Now I consider myself a very conscientious commenter of code, and so as a matter of interest I did a quick search on my current project to see whether I was following Brad’s advice. Well I was – about 90 % of the time. I found that my style depended a bit on the units. For example, I always appended the % sign without a space, whereas mV, mA etc just about always had a space between the value and the units. You’ll be pleased to know Brad that I’ll be mending my ways!

Anyway, I’ll leave this topic for now. Next time I visit it I’ll tell you how I spell check my comments. Hopefully having read this post you’ll know why I do it.

Using volatile to achieve persistence!

Sunday, January 11th, 2009 Nigel Jones

Once in a while the real world and the arcane world of language standards collide, resulting in surprising results. To see what I mean, read on …

Many of the products I design incorporate a Bootstrap Loader, so that the application firmware may be updated in the field. In most cases, the bootstrap loader is a completely different program to the main application. Despite this, I find it useful for the main application to pass information to the bootstrap loader and vice versa. Thus the question arises, how best to do this? Well in the processor family I am using, although it is technically possible to store information in Flash, EEPROM or RAM, by far the easiest and most secure way of doing it is to place the information into EEPROM. Furthermore, in order to enter the bootstrap loader it is highly desirable to force a reset of the processor by allowing the watchdog timer to time out.

Thus, the code to enter the bootstrap loader looks something like this:

__eeprom uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
 /* Wait for watchdog to generate a reset and force entry in to the bootstrap loader */
}

Well, on the face of it, there is not much wrong with this code. However, if one turns on the optimizer, then the compiler examines the code, decides that no code may be executed beyond the infinite loop and thus concludes that the write to msg_for_bootloader is pointless, and promptly optimizes it away. (For a discussion on this topic, see my posting here)

Now, you will note that msg_for_bootloader was qualified with __eeprom. This is a compiler extension that allows one to inform the compiler that the variable msg_for_bootloader resides in a special memory space and should be treated accordingly. Now, I know that the compiler knows enough about the EEPROM space to generate the correct coding sequences such that reads and writes are performed correctly. However, in my naivete, I also assumed that the compiler knew something about the properties of EEPROM, such that it would realize that writing to EEPROM, even without ever reading the value back, is intrinsically useful in many applications.

Well, it does not. Furthermore, on balance I think the compiler writers got it right and the error was completely mine.

So what to do? Well, declaring msg_for_bootloader as volatile fixes the problem. Thus my code now looks like this:

__eeprom volatile uint8_t msg_for_bootloader;

...

msg_for_bootloader = 0x42;

...

for(;;)
{
 /* Wait for watchdog to generate a reset and force entry in to the bootstrap loader */
}

Thus I ended up in the rather bizarre situation of having to declare a variable as volatile in order to make it persistent!

Although I can appreciate the wry irony of this situation, I think it points to a larger problem. The fact is that we are all (ok, most of us) programming in a language (C) that was not designed for use in embedded systems. Indeed, when C was written, I’m not sure EEPROM even existed. As a result, the compiler vendors have added extensions to the C standard in an effort to overcome its shortcomings for embedded systems, while still desperately striving to achieve “full compliance with the standard”. Despite this, I find myself all too frequently falling into traps such as this one. What we really need is a language explicitly designed for embedded systems. It isn’t going to happen, but it doesn’t stop me wishing for it.


Knowing my weaknesses

Saturday, December 6th, 2008 Nigel Jones

A few weeks ago I published what appears to have been quite a popular blog on what I called the ‘Bug Cluster Phenomenon’. Today, I’m going to extend that concept somewhat by way of a mea culpa.

Earlier this week I had to eat some very humble pie. For the last six weeks or so I had received complaints that a temperature measurement wasn’t giving accurate results. The sensor in question is measuring approximately ambient temperature, and was returning values in the 18 – 26 Celsius range, which seemed reasonable to me. I just wrote off the complaints as being due to the fact that humans have a very poor perception of absolute temperature. Well finally, at my urging, someone dragged the device out into the Winter cold, where it promptly read 18 Celsius. Thus I was faced with proof that something was wrong.

I proceeded to investigate the code, and discovered that based on the current inputs to the code, the code was generating an output with an error of about 2 degrees. How was this possible, since it was nothing more than a series of multiplies, adds and shifts – not typically fodder for a 2 degree error?

Well, further investigation showed that at a certain point I was getting numeric overflow when two numbers were being multiplied together. Now typically, when this occurs, one gets answers that have huge ‘errors’. In my case I had the misfortune that the arithmetic worked out such that the error at room temperature was barely noticeable.

Anyway, I duly fixed the code. However, before moving on I took the time to reflect on this particular bug. Was this just one of those stupid coding errors that we all make from time to time, or was there more to it? I came to the conclusion that this was not just “one of those things”. Rather, I realized that this was at least the third time this year that I had written code that suffered from a numeric overflow problem. In short, I have a problem – a blind spot, if you will – for a particular class of problem.

Well, I’m told that recognizing one’s problems is the first step in solving them. So I proceeded to do a little bit more investigating and discovered that my numeric overflow bugs always occurred when I combined multiple operators on a line. For example:

y = a * a + c;

Thus the solution seems obvious to me – only one numeric operator per line. Thus in future, I will always code like this:

y = a * a;
y += c;

The bottom line. When you encounter a bug, as well as looking for other bugs nearby (as described in the bug cluster phenomenon post), also take the time to reflect on what caused the bug in the first place, and see if you can recognize any systemic problems in your approach to coding. When it comes down to it, this is nothing more than a process of ‘continuous quality improvement’. If it works for Toyota then it might just work in the embedded systems arena.


Dogging your watchdog

Tuesday, November 4th, 2008 Nigel Jones

Most embedded systems employ watchdog timers. It’s not my intention today to talk about why to use watchdog timers, or indeed how to use them. Rather I assume you know the answers to these questions. Instead, I’ll pass on some tips for how to track down those unexpected watchdog resets that can occur during the development process.

To help find these problems, it is essential to find out where the watchdog reset is occurring. Unfortunately, this isn’t easy, since by definition a watchdog reset will reset the processor, typically destroying all state information that could be used to debug the problem. To get around this problem, here are a few things you can try.

  1. Place a break point on the (watchdog) reset vector. Although this will typically not stop the processor from being reset, it will ensure that none of your variables get initialized by your start up code. As a result, you should be able to use your debugger to examine these variables – which may give you an insight into what is going wrong.
  2. Certain processor architectures allow the action of the watchdog timer to be changed between a classic watchdog (when the timer times out, the processor is reset), to a special form of timer, complete with its own interrupt vector. Although I rarely use this mode of operation in release code, it is very useful for debugging. Simply reconfigure the watchdog to generate an interrupt upon timeout, and place a break point in the watchdog’s ISR. Then when the watchdog times out, your debugger will stop at the break point. It’s then just a simple matter of stepping out of the ISR to return to the exact point in your code where the watchdog timeout occurred.
  3. If neither of the above methods is available to you, and you are genuinely clueless as to where to start looking, then a painful but workable solution is to ‘instrument’ entry into each function. This essentially consists of some code that is placed at the start of every function. The code’s job is to record the ID of the function into some form of storage that will not be affected by a watchdog reset, such that you can identify the offending function after a watchdog reset has occurred. This isn’t quite as bad as it sounds, provided you are good with macros and a scripting language such as Perl, and are aware of common compiler vendor extensions such as the macro __FUNCTION__. Of course, if you are that good, the chances are you won’t be clueless as to why you are taking a watchdog reset!

I’ll leave it to another post to talk about the sort of code that often causes watchdog timeouts.
