## Archive for the ‘General C issues’ Category

### What does 0x47u mean anyway?

Saturday, July 21st, 2012 Nigel Jones

In the last couple of years I have had a large number of folks end up on this blog as a result of search terms such as “what does 0X47u mean?” In an effort to make their visit more productive, I’ll explain and also offer some thoughts on the topic.

Back in the mists of time, it was considered perfectly acceptable to write code that looks like this:

`unsigned int foo = 6;`

Indeed I’m guessing that just about every C textbook out there has just such a construct somewhere in its first few chapters. So what’s wrong with this you ask? Well, according to the C90 semantics, an unsuffixed decimal constant such as 6 is of type ‘signed int’. Thus the above line of code takes a signed int and assigns it to an unsigned int. Now not so many years ago, most people would have just shrugged and got on with the task of churning out code. However, the folks at MISRA looked askance at this practice (and correctly so IMHO), and promulgated rule 10.6:

“Rule 10.6 (required): A “U” suffix shall be applied to all constants of unsigned type.”

Now in the world of computing, unsigned types don’t seem to crop up much. However in the embedded arena, unsigned integers are extremely common. Indeed IMHO you should use them. For information on doing so, see here.

Thus, as MISRA adoption has spread throughout the embedded world, you are starting to see code that looks like this:

`unsigned int foo = 6u;`

So this brings me to the answer to the question posed in the title – what does 0x47u mean? It means that it is an unsigned hexadecimal constant of value 47 hex = 71 decimal. If the ‘u’ is omitted then it is a signed hexadecimal constant of value 47 hex.
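To see that the suffix changes the type and not the value, here is a minimal sketch. It assumes a C11 compiler (for `_Generic`); the macro and function names are hypothetical helpers of my own and are not part of any standard header.

```c
/* Hypothetical helpers built on C11 _Generic: report the type of a constant. */
#define IS_INT(x)          _Generic((x), int: 1, default: 0)
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Signed comparison: -1 really is less than 0x47. */
static int below_signed(int value)
{
    return value < 0x47;
}

/* Mixed comparison: value is converted to unsigned int before comparing,
   so -1 becomes UINT_MAX and the test fails. */
static int below_unsigned(int value)
{
    return value < 0x47u;
}
```

Here `IS_INT(0x47)` and `IS_UNSIGNED_INT(0x47u)` both evaluate to 1, and both constants have the value 71. The difference only shows up in expressions: `below_signed(-1)` returns 1, whereas `below_unsigned(-1)` returns 0.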

### Some observations

You actually have three ways to satisfy rule 10.6. Here are examples of the three methods.

`unsigned int foo = 6u;`
`unsigned int foo = 6U;`
`unsigned int foo = (unsigned int)6;`

Let’s dispense with the third method first. I am not a fan of casting, mainly because casting makes code hard to read and can inadvertently cover up all sorts of coding mistakes. As a result, any methodology that results in increased casts in code is a bad idea. If that doesn’t convince you, then consider initializing an unsigned array using casts:

`unsigned int bar[42] = {(unsigned int)89, (unsigned int)56, (unsigned int)12, ... };`

The result is a lot of typing and a mess to read. Don’t do it!

What then of the first two methods? Should you use a lower case ‘u’ or an upper case ‘U’? Well I have reluctantly come down in favor of using an upper case ‘U’. Aesthetically I think that the lower case ‘u’ works better, in that the lower case letter is less intrusive and keeps your eye on the digits (which after all is what’s really important). Here’s what I mean:

`unsigned int bar[42] = {89u, 56u, 12u, ... };`
`unsigned int bar[42] = {89U, 56U, 12U, ... };`

So why do I use upper case ‘U’? Well it’s because ‘U’ isn’t the only suffix that one can append to an integer constant. One can also append an ‘L’ or ‘l’ meaning that the constant is of type ‘long’. They can also be combined as in ‘UL’, ‘ul’, ‘LU’ or ‘lu’, to signify an unsigned long constant. The problem is that a lower case ‘l’ looks an awful lot like a ‘1’ in most editors. Thus if you write this:

`long bar = 345l;`

Is that 345L or 3451? To really see what I mean, try these examples in a standard text editor. Anyway as a result, I always use upper case ‘L’ to signify a long constant – and thus to be consistent I use an upper case ‘U’ for unsigned. I could of course use ‘uL’ – but that just looks weird to me.

Incidentally based upon the code I have looked at over the last decade or so, I’d say that I’m in the minority on this topic, and that more people use the lower case ‘u’. I’d be interested to know what the readers of this blog do – particularly if they have a reason for doing so rather than whim!

### The Crap Code Conundrum

Friday, June 29th, 2012 Nigel Jones

Listed below are three statements. Based on my nearly thirty years in the embedded space I can confidently state that: One I have never heard stated. Another I have rarely heard stated, and the third I hear a lot. Here they are in order:

1. I write crap code.
2. You know so-and-so. (S)he writes really good code.
3. This code is complete crap.

If your experience comports with mine, then it leads to what I have coined the ‘crap code conundrum’. In short, crap code is everywhere – but no one admits to or realizes they are writing it! So how can this be? Well I see several possibilities:

1. In fact a lot of so called crap code is labeled as such because the author did things differently to the way the reader would have done it. I think it’s important to recognize this before summarily dismissing some code. Notwithstanding this, I all too often find myself saying ‘this code is complete crap’ – because it is!
2. This is related to point 1, and essentially comes down to different people having different ideas about what constitutes good code. For example I think wrapping code in complex macros is an invitation for disaster. Others see it as a perfectly good way of simplifying things. (I’m right :-))
3. The code started out being pretty good and has degenerated over time because the author hasn’t been allowed the time to perform the necessary refactoring. I think this does explain a lot of the bad code I see.
4. The folks that write crap code are completely oblivious to the fact they are doing it. Indeed it’s only the self aware / self critical types that would even bother to ask themselves the question ‘is my code any good?’ Indeed, the first step to improving one’s code is to ask oneself the question – how can I improve my code?

My gut feel is that point 4 is most likely the main cause. Now if you are so self-absorbed that you wouldn’t even dream of asking yourself the question ‘do I write crap code?’, then I seriously doubt whether you’d be reading this article. However if you have crossed this hurdle, then how can you determine if the code you are writing is any good? Well I took a stab at this a while back with this article. However some of the commenters pointed out that it’s quite easy to write code that has good metrics – yet is still complete crap. So clearly the code metrics approach is part of the story – but not the entire story.

So a couple of weeks ago I found myself in a bar in San Francisco having a beer with Michael Barr and a very smart guy, Steve Loudon. The topic of crap code came up and I posed the question ‘how can you tell code is crap?’ After all I think that crap code is a bit like pornography – you know it when you see it. After a spirited debate, the most pithy statement we could come up with is this:

If it’s hard to maintain, it’s crap.

Clearly there are all sorts of exceptions and qualifications, but at the end of the day I think this statement pretty much says it all. Thus if you are wondering if you write crap code, just ask yourself the question – how hard is this code to maintain? If you don’t like the answer, then it’s time to make a change.

### Optimizing for the CPU / compiler

Sunday, June 3rd, 2012 Nigel Jones

It is well known that standard C language features map horribly on to the architecture of many processors. While the mapping is obvious and appalling for some processors (low end PICs, 8051 spring to mind), it’s still not necessarily great at the 32 bit end of the spectrum where processors without floating point units can be hit hard with C’s floating point promotion rules. While this is all obvious stuff, it’s essentially about what those CPUs are lacking. Where it gets really interesting in the embedded space is when you have a processor that has all sorts of specialized features that are great for embedded systems – but which simply do not map on to the C language view of the world. Some examples will illustrate my point.
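As a small illustration of the floating point promotion point, the sketch below assumes a C11 compiler (for `_Generic`); the macro and function names are hypothetical. An unsuffixed constant such as `2.0` is a double, so multiplying a float by it is done in double, which can mean a call into a software floating point library on a CPU without an FPU.

```c
/* Hypothetical helpers: report the type of a floating point expression. */
#define IS_DOUBLE(x) _Generic((x), double: 1, default: 0)
#define IS_FLOAT(x)  _Generic((x), float: 1, default: 0)

/* The multiply is performed in double, then narrowed back to float. */
static float scale_promoted(float sample)
{
    return (float)(sample * 2.0);
}

/* The 'f' suffix keeps the whole expression of type float. */
static float scale_single(float sample)
{
    return sample * 2.0f;
}
```

Both functions return the same value for ordinary inputs; what differs is the type of the intermediate expression, which is what `IS_DOUBLE(1.5f * 2.0)` and `IS_FLOAT(1.5f * 2.0f)` report.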

### Shifting vs. rotation

The C language does of course have support for performing shift operations. However, these are strictly shifts. That is, when bits get shifted off the end of an integer type, they are simply lost. (For the record, a right shift of an unsigned type is a logical shift that fills with zeros, while a right shift of a negative signed value is implementation-defined and is usually an arithmetic, sign-preserving shift.) Rotation is different in that the bits that fall off one end get rotated back around to the other (often through the carry bit but not always). Now while shifting is great for, well, arithmetic operations, there are plenty of occasions in which I find myself wanting to perform a rotation. Now can I write a rotation function in C – sure – but it’s a real pain in the tuches.
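For reference, a portable 8 bit rotation can be sketched along the following lines. The names `rotl8` and `rotr8` are hypothetical, and this version does not rotate through the carry bit. Whether a given compiler recognizes the pattern and emits a single rotate instruction is something you would have to check in the listing file.

```c
#include <stdint.h>

/* Rotate an 8 bit value left by count bits (count is reduced modulo 8). */
static inline uint8_t rotl8(uint8_t value, unsigned int count)
{
    count &= 7u;
    /* The second mask makes a count of 0 shift right by 0 rather than by 8. */
    return (uint8_t)(((unsigned int)value << count) |
                     ((unsigned int)value >> ((8u - count) & 7u)));
}

/* Rotate an 8 bit value right by count bits (count is reduced modulo 8). */
static inline uint8_t rotr8(uint8_t value, unsigned int count)
{
    count &= 7u;
    return (uint8_t)(((unsigned int)value >> count) |
                     ((unsigned int)value << ((8u - count) & 7u)));
}
```

For example, `rotl8(0x81u, 1u)` moves the top bit around to the bottom, while a plain `0x81u << 1` stored in a `uint8_t` would simply lose it.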

### Saturating addition

If you have ever had to design and implement an integer digital filter, I am sure you found yourself yearning for an addition operator that will saturate rather than overflow. [In this form of arithmetic, if the integral type would overflow as the result of an operation, then the processor simply returns the minimum or maximum value as appropriate]. Processors that the designers think might be required to perform digital filtering will have this feature built directly into their instruction sets. By contrast the C language has zero direct support for such operations, which must be coded using nasty checks and masks.
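To give a flavor of those checks, here is one way a saturating 16 bit addition might be written in standard C. The name `sat_add16` is hypothetical, and the sketch assumes that a 32 bit intermediate is acceptable on your target.

```c
#include <stdint.h>

/* Add two 16 bit values, clamping the result to the int16_t range. */
static inline int16_t sat_add16(int16_t a, int16_t b)
{
    /* Do the addition in 32 bits so that the sum itself cannot overflow. */
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX)
    {
        sum = INT16_MAX;
    }
    else if (sum < INT16_MIN)
    {
        sum = INT16_MIN;
    }

    return (int16_t)sum;
}
```

On a processor with a saturating add instruction this is a single opcode; on an 8 bit CPU the widening to 32 bits alone costs more than that.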

### Nibble swapping

Swapping the upper and lower nibbles of a byte is a common operation in cryptography and related fields. As a result many processors include this ever so useful instruction in their instruction sets. While you can of course write C code to do it, it’s horrible looking and grossly inefficient when compared to the built in instruction.
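The plain C version is short enough, as the sketch below shows (the function name `swap_nibbles` is my own). The point is what it compiles to: unless the compiler spots the idiom, you get two shifts, a mask or two and an OR instead of one instruction.

```c
#include <stdint.h>

/* Exchange the upper and lower 4 bits of a byte. */
static inline uint8_t swap_nibbles(uint8_t value)
{
    /* The cast back to uint8_t discards the bits shifted above bit 7. */
    return (uint8_t)(((unsigned int)value << 4) | ((unsigned int)value >> 4));
}
```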

### Implications

If you look over the examples quoted I’m sure you noticed a theme:

1. Yes I can write C code to achieve the desired functionality.
2. The resultant C code is usually ugly and horribly inefficient when compared to the intrinsic function of the processor.

Now in many cases, C compilers simply don’t give you access to these intrinsic functions, other than resorting to the inline assembler. Unfortunately, using the inline assembler causes a lot of problems. For example:

1. It will often force the compiler to not optimize the enclosing function.
2. It’s really easy to screw it up.
3. It’s banned by most coding standards.

As a result, the intrinsic features can’t be used anyway. However, there are embedded compilers out there that support intrinsic functions. For example here’s how to swap nibbles using IAR’s AVR compiler:

`foo = __swap_nibbles(bar);`

Compared with the inline assembler, this has several advantages:

1. Because it’s a compiler intrinsic function, there are no issues with optimization.
2. Similarly because one works with standard variable names, there is no particular likelihood of getting this wrong.
3. Because it looks like a function call, there isn’t normally a problem with coding standards.
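If you want the intrinsic but are nervous about the day the code has to move, one option is to hide it behind a small wrapper. The sketch below is only that, a sketch: `nibble_swap` is a hypothetical name, and the use of the `__ICCAVR__` predefined macro and the `<intrinsics.h>` header are assumptions about IAR's AVR toolchain that you should check against your compiler's documentation.

```c
#include <stdint.h>

#if defined(__ICCAVR__)       /* assumed: predefined by IAR's AVR compiler */
#include <intrinsics.h>       /* assumed: header that declares __swap_nibbles */
#endif

/* Swap the nibbles of a byte, using the intrinsic where it is available. */
static inline uint8_t nibble_swap(uint8_t value)
{
#if defined(__ICCAVR__)
    return __swap_nibbles(value);
#else
    /* Portable fallback for every other compiler. */
    return (uint8_t)(((unsigned int)value << 4) | ((unsigned int)value >> 4));
#endif
}
```

The calling code then uses `nibble_swap()` everywhere, and the non-standard part is confined to one function.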

This then leads to one of the essential quandaries of embedded systems. Is it better to write completely standard (and hence presumably portable) C code, or should one take every advantage of the neat features that are offered by your CPU and (if it is any good) your compiler?

I made my peace with this decision many years ago and fall firmly into the camp of take advantage of every neat feature offered by the CPU / compiler – even if it is non-standard. My rationale for doing so is as follows:

1. Porting code from one CPU to another happens rarely. Thus to burden the bulk of systems with this mythical possibility seems weird to me.
2. End users do not care. When was the last time you heard someone extol the use of standard code in the latest widget? Instead end users care about speed, power and battery life. All things that can come about by having the most efficient code possible.
3. It seems downright rude not to use those features that the CPU designer built in to the CPU just because some purist says I should not.

Having said this, I do of course understand completely if you are in the business of selling software components (e.g. an AES library), where using intrinsic / specialized instructions could be a veritable pain. However for the rest of the industry I say use those intrinsic functions! As always, let the debate begin.

### The absolute truth about abs()

Wednesday, February 1st, 2012 Nigel Jones

One of the more depressing things about the C language is how often the results of various operations are undefined. A prime example of this is the abs() function that I’m fairly sure is liberally dispersed throughout your code (it is through mine). The undefined operation of the abs() function comes about if you have the temerity to use a compiler that represents negative numbers using 2’s complement notation. In this case, the most negative representable number is always larger in magnitude than the most positive representable number. In plain English, for 16 bit integers, the range is -32768 … +32767. Thus if you pass -32768 to the abs() function, the result is undefined.

The problem of course in an embedded system is that undefined operations are just dangerous, so surely an embedded compiler will do something sensible, like return +32767 if you pass -32768 to abs? To test this hypothesis I whipped up the following code for my favourite 8 bit compiler (IAR’s AVR compiler).

```
#include <stdlib.h>
#include <inttypes.h>
#include <stdio.h>

void main(void)
{
    int16_t i;
    int16_t absi;

    i = INT16_MIN + 1;      /* Set i to one more than the most negative number */
    absi = abs(i);

    printf("Argument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);

    i--;                    /* i should now equal INT16_MIN */
    absi = abs(i);
    printf("\nArgument = %" PRId16 ". Absolute value of argument = %" PRId16, i, absi);
}
```

If you don’t understand the printf strings, see this article. Here’s the output of this code:

```
Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = -32768
```

Clearly abs(-32768) = -32768 is not a very useful result! If I look in <stdlib.h> I find that abs() is implemented as

```
int abs(int i)
{   /* compute absolute value of int argument */
    return (i < 0 ? -i : i);
}
```

Clearly no check is being made on the bounds of the parameter, and so the result that we get depends upon how the negation of the argument is performed. Thus this leads me to my first suggestion, namely write your own abs function. I’ll call this function sabs() for safe abs(). The first pass implementation for sabs() looks like this (note you’ll have to include <limits.h> to get INT_MIN and INT_MAX):

```
int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
```

Here’s the output:

```
Argument = -32767. Absolute value of argument = 32767
Argument = -32768. Absolute value of argument = 32767
```

I think for most embedded systems this is a better result. However what happens if you use a smaller integer than the native integer size (e.g. a 16 bit integer on a 32 bit system, or an 8 bit integer on a 16 bit system?). To test this question, I modified the code thus:

```
#include <stdlib.h>
#include <limits.h>
#include <inttypes.h>
#include <stdio.h>

int sabs(int i);

void main(void)
{
    int8_t i;
    int8_t absi;

    i = INT8_MIN + 1;       /* Set i to one more than the most negative number */
    absi = sabs(i);

    printf("Argument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);

    i--;                    /* i should now equal INT8_MIN */
    absi = sabs(i);
    printf("\nArgument = %" PRId8 ". Absolute value of argument = %" PRId8, i, absi);
}

int sabs(int i)
{
    int res;

    if (INT_MIN == i)
    {
        res = INT_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
```

So in this case, the native integer size is 16 bits and I’m passing an 8 bit integer. Here’s the output:

```
Argument = -127. Absolute value of argument = 127
Argument = -128. Absolute value of argument = -128
```

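Clearly, I still haven’t solved the problem, as I’d really like abs(-128) to be 127 when using 8 bit integers. The reason is that sabs() takes and returns an int: -128 is nowhere near INT_MIN, so the function happily returns +128, which does not fit in an int8_t and ends up as -128 when it is assigned to absi. I suspect that I could come up with some fancy expression that handles all integer types. However I’m a great believer in simple code, and so my recommendation is that you write sabs() functions for all your integer types. Thus: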

```
/* Safe 8 bit absolute function */
int8_t sabs8(int8_t i)
{
    int8_t res;

    if (INT8_MIN == i)
    {
        res = INT8_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

/* Safe 16 bit absolute function */
int16_t sabs16(int16_t i)
{
    int16_t res;

    if (INT16_MIN == i)
    {
        res = INT16_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}

/* Safe 32 bit absolute function */
int32_t sabs32(int32_t i)
{
    int32_t res;

    if (INT32_MIN == i)
    {
        res = INT32_MAX;
    }
    else
    {
        res = i < 0 ? -i : i;
    }

    return res;
}
```
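With only 256 possible inputs, the 8 bit version can be checked exhaustively. The following is a sketch of how that might look; `sabs8_check_all` is a hypothetical helper, and `sabs8` is repeated here (with an explicit cast added) only so that the fragment stands on its own.

```c
#include <stdint.h>

/* Safe 8 bit absolute function, as above but with an explicit narrowing cast. */
int8_t sabs8(int8_t i)
{
    int8_t res;

    if (INT8_MIN == i)
    {
        res = INT8_MAX;
    }
    else
    {
        res = (int8_t)(i < 0 ? -i : i);
    }

    return res;
}

/* Returns 1 if sabs8() gives the expected answer for every int8_t value. */
int sabs8_check_all(void)
{
    int v;  /* int, so that the loop counter can pass INT8_MAX and terminate */

    for (v = INT8_MIN; v <= INT8_MAX; v++)
    {
        int expected = (v == INT8_MIN) ? INT8_MAX : (v < 0 ? -v : v);

        if (sabs8((int8_t)v) != expected)
        {
            return 0;
        }
    }

    return 1;
}
```

The same loop works for the 16 bit version; for 32 bits you would settle for the boundary values.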

The above approach is all well and good, but let’s face it, I’ve added a lot of overhead for a very rare condition. So this raises the question as to whether there is a better way of doing things. Well in many cases we use the abs() function to check for some limit. For example:

```
if (abs(i) > SOME_LIMIT)
{
    printf("\nLimit exceeded");
}
```

In cases like these, you can use what I call a negative absolute function, aka nabs(). nabs() works with negative absolutes and so can’t overflow. To demonstrate, here’s the code:

```
#include <limits.h>     /* INT_MIN, INT_MAX */
#include <stdio.h>      /* printf */

int nabs(int i);

void main(void)
{
    int i;
    int absi;

    i = INT_MIN + 1;        /* Set i to one more than the most negative number */
    absi = nabs(i);

    printf("Argument = %d Negative absolute value of argument = %d", i, absi);

    i--;                    /* i should now equal INT_MIN */
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);

    i = INT_MAX;
    absi = nabs(i);
    printf("\nArgument = %d Negative absolute value of argument = %d", i, absi);
}

int nabs(int i)
{
    return i > 0 ? -i : i;
}
```

The output looks like this:

```
Argument = -32767 Negative absolute value of argument = -32767
Argument = -32768 Negative absolute value of argument = -32768
Argument = 32767 Negative absolute value of argument = -32767
```

Armed with this function, you merely flip your tests around, such that you have:

```
if (nabs(i) < SOME_NEGATIVE_LIMIT)
{
    printf("\nLimit exceeded");
}
```

I’ll leave it to you to decide whether gaining the efficiency is worth it for the rather strange looking code.
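If the flipped comparison bothers you, one possibility is to bury it in a helper so that callers still think in terms of a positive limit. This is a sketch under the assumption that the limit is never negative; `exceeds_limit` is a hypothetical name.

```c
#include <limits.h>     /* INT_MIN and INT_MAX, used in the examples below */

/* Negative absolute value: cannot overflow, since -INT_MAX is representable. */
static int nabs(int i)
{
    return i > 0 ? -i : i;
}

/* Returns 1 if the magnitude of i is greater than limit.
   Assumes limit >= 0, so that -limit cannot overflow either. */
static int exceeds_limit(int i, int limit)
{
    return nabs(i) < -limit;
}
```

With this, `exceeds_limit(INT_MIN, 100)` is well defined and returns 1, where the equivalent `abs(INT_MIN) > 100` would be undefined.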

As a final note, do a search for the abs() function on the Internet. You’ll find that most references don’t mention the undefined behavior of abs() with INT_MIN as an argument. The notable exception is the always excellent GNU reference. It’s thus hardly surprising that most embedded systems use abs().