In the last couple of years I have had a large number of folks end up on this blog as a result of search terms such as “what does 0X47u mean?” In an effort to make their visit more productive, I’ll explain and also offer some thoughts on the topic.
Back in the mists of time, it was considered perfectly acceptable to write code that looks like this:
unsigned int foo = 6;
Indeed I’m guessing that just about every C textbook out there has just such a construct somewhere in its first few chapters. So what’s wrong with this you ask? Well, according to the C90 semantics, constants by default are of type ‘signed int’. Thus the above line of code takes a signed int and assigns it to an unsigned int. Now not so many years ago, most people would have just shrugged and got on with the task of churning out code. However, the folks at MISRA looked askance at this practice (and correctly so IMHO), and promulgated rule 10.6:
“Rule 10.6 (required): A “U” suffix shall be applied to all constants of unsigned type.”
Now in the world of computing, unsigned types don’t seem to crop up much. However in the embedded arena, unsigned integers are extremely common. Indeed IMHO you should use them. For information on doing so, see here.
Thus what has happened as MISRA adoption has spread throughout the embedded world, is you are starting to see code that looks like this:
unsigned int foo = 6u;
So this brings me to the answer to the question posed in the title – what does 0x47u mean? It means that it is an unsigned hexadecimal constant of value 47 hex = 71 decimal. If the ‘u’ is omitted then it is a signed hexadecimal constant of value 47 hex.
Some observations
You actually have three ways that to satisfy rule 10.6. Here are examples of the three methods.
unsigned int foo = 6u;
unsigned int foo = 6U;
unsigned int foo = (unsigned int)6;
Let’s dispense with the third method first. I am not a fan of casting, mainly because casting makes code hard to read and can inadvertently cover up all sorts of coding mistakes. As a result, any methodology that results in increased casts in code is a bad idea. If that doesn’t convince you, then consider initializing an unsigned array using casts:
unsigned int bar[42] = {(unsigned int)89, (unsigned int)56, (unsigned int)12, ... };
The result is a lot of typing and a mess to read. Don’t do it!
What then of the first methods? Should you use a lower case ‘u’ or an upper case ‘U’. Well I have reluctantly come down in favor of using an upper case ‘U’. Aesthetically I think that the lower case ‘u’ works better, in that the lower case letter is less intrusive and keeps your eye on the digits (which after all is what’s really important). Here’s what I mean:
unsigned int bar[42] = {89u, 56u, 12u, ... };
unsigned int bar[42] = {89U, 56U, 12U, ... };
So why do I use upper case ‘U’? Well it’s because ‘U’ isn’t the only modifier that one can append to an integer constant. One can also append an ‘L’ or ‘l’ meaning that the constant is of type ‘long’. They can also be combined as in ‘UL’, ‘ul’, ‘LU’ or ‘lu’, to signify an unsigned long constant. The problem is that a lower case ‘l’ looks an awful lot like a ‘1’ in most editors. Thus if you write this:
long bar = 345l;
Is that 345L or 3451? To really see what I mean, try these examples in a standard text editor. Anyway as a result, I always use upper case ‘L’ to signify a long constant – and thus to be consistent I use an upper case ‘U’ for unsigned. I could of course use ‘uL’ – but that just looks weird to me.
Incidentally based upon the code I have looked at over the last decade or so, I’d say that I’m in the minority on this topic, and that more people use the lower case ‘u’. I’d be interested to know what the readers of this blog do – particularly if they have a reason for doing so rather than whim!