embedded software boot camp

What does 0x47u mean anyway?

Saturday, July 21st, 2012 by Nigel Jones

In the last couple of years I have had a large number of folks end up on this blog as a result of search terms such as “what does 0X47u mean?” In an effort to make their visit more productive, I’ll explain and also offer some thoughts on the topic.

Back in the mists of time, it was considered perfectly acceptable to write code that looks like this:

unsigned int foo = 6;

Indeed I’m guessing that just about every C textbook out there has just such a construct somewhere in its first few chapters. So what’s wrong with this you ask? Well, according to the C90 semantics, constants by default are of type ‘signed int’. Thus the above line of code takes a signed int and assigns it to an unsigned int. Now not so many years ago, most people would have just shrugged and got on with the task of churning out code. However, the folks at MISRA looked askance at this practice (and correctly so IMHO), and promulgated rule 10.6:

“Rule 10.6 (required): A “U” suffix shall be applied to all constants of unsigned type.”

Now in the world of computing, unsigned types don’t seem to crop up much. However in the embedded arena, unsigned integers are extremely common. Indeed IMHO you should use them. For information on doing so, see here.

Thus, as MISRA adoption has spread throughout the embedded world, you are starting to see code that looks like this:

unsigned int foo = 6u;

So this brings me to the answer to the question posed in the title – what does 0x47u mean? It means that it is an unsigned hexadecimal constant of value 47 hex = 71 decimal. If the ‘u’ is omitted then it is a signed hexadecimal constant of value 47 hex.
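For anyone who wants to check this on their own compiler, here is a minimal sketch using C11 `_Generic` and `static_assert`. The helper macros `IS_INT` and `IS_UNSIGNED_INT` are made up for illustration, and the sketch assumes a C11 (or later) compiler.

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical helpers: evaluate to 1 when the expression has the named type. */
#define IS_INT(x)          _Generic((x), int: 1, default: 0)
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* 0x47 is 4 * 16 + 7 = 71, with or without the suffix. */
static_assert(0x47u == 71u, "0x47u has the value 71");
static_assert(0x47 == 71, "0x47 has the value 71");

/* The suffix changes only the type of the constant. */
static_assert(IS_UNSIGNED_INT(0x47u), "0x47u is an unsigned int");
static_assert(IS_INT(0x47), "0x47 is a (signed) int");
```

If any of these assumptions were wrong, the file would simply fail to compile, which makes this a cheap way to probe other constants too.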

Some observations

You actually have three ways to satisfy rule 10.6. Here are examples of the three methods.

unsigned int foo = 6u;
unsigned int foo = 6U;
unsigned int foo = (unsigned int)6;

Let’s dispense with the third method first. I am not a fan of casting, mainly because casting makes code hard to read and can inadvertently cover up all sorts of coding mistakes. As a result, any methodology that results in increased casts in code is a bad idea. If that doesn’t convince you, then consider initializing an unsigned array using casts:

unsigned int bar[42] = {(unsigned int)89, (unsigned int)56, (unsigned int)12, ... };

The result is a lot of typing and a mess to read. Don’t do it!
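As for casts covering up mistakes, here is a rough sketch of the sort of thing I mean. The function name is hypothetical, and the example assumes a platform that provides `uint8_t`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* With a suffix, most compilers and lint tools warn that 347 does not fit:
       uint8_t foo = 347U;
   With a cast, the narrowing looks intentional and is typically accepted
   without complaint. */
static_assert((uint8_t)347 == 91, "347 is reduced modulo 256 to 91");

uint8_t narrowed_by_cast(void)
{
    return (uint8_t)347;   /* yields 91, not 347 */
}
```

The value 347 was almost certainly a typing mistake, but the cast tells the tools that the truncation is deliberate.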

What then of the first two methods? Should you use a lower case ‘u’ or an upper case ‘U’? Well I have reluctantly come down in favor of using an upper case ‘U’. Aesthetically I think that the lower case ‘u’ works better, in that the lower case letter is less intrusive and keeps your eye on the digits (which after all is what’s really important). Here’s what I mean:

unsigned int bar[42] = {89u, 56u, 12u, ... };
unsigned int bar[42] = {89U, 56U, 12U, ... };

So why do I use upper case ‘U’? Well it’s because ‘U’ isn’t the only modifier that one can append to an integer constant. One can also append an ‘L’  or ‘l’ meaning that the constant is of type ‘long’. They can also be combined as in ‘UL’, ‘ul’, ‘LU’ or ‘lu’, to signify an unsigned long constant. The problem is that a lower case ‘l’ looks an awful lot like a ‘1’ in most editors. Thus if you write this:

long bar = 345l;

Is that 345L or 3451? To really see what I mean, try these examples in a standard text editor. Anyway as a result, I always use upper case ‘L’ to signify a long constant – and thus to be consistent I use an upper case ‘U’ for unsigned. I could of course use ‘uL’ – but that just looks weird to me.

Incidentally based upon the code I have looked at over the last decade or so, I’d say that I’m in the minority on this topic, and that more people use the lower case ‘u’. I’d be interested to know what the readers of this blog do – particularly if they have a reason for doing so rather than whim!

 

32 Responses to “What does 0x47u mean anyway?”

  1. Jeff Gros says:

    I’ve been using the exact same practices as you, and for the exact same reasons! We’re in perfect agreement, so keep preaching the truth to all those heathens out there! =p

  2. Miro Samek says:

    The MISRA-C:2004, Rule 10.6(required) says: A “U” suffix shall be applied to all constants of unsigned type.

    Please note that specifically the upper-case “U” is used in this rule and the whole document consistently uses only the uppercase “U” suffix in all examples (and never the lower-case suffix).

    Even though there is no rule about the “L” suffix, the MISRA-C:2004 document uses consistently only the upper-case “L” and never the lower case “l”. When the “U” and “L” suffixes are combined, all MISRA-C:2004 examples use the “UL” order.

    see: MISRA-C:2004 Guidelines for the Use of the C language in Critical Systems, MISRA, October 2004,
    ISBN: 978-0-9524156-2-6 paperback
    ISBN: 978-0-9524156-4-0 PDF

    • Nigel Jones says:

      Yes. However the MISRA checkers I have used all treat upper and lower case as equally acceptable. It is unclear to me from the wording whether lower case u does meet the consortium’s intent. Looking through the forums I have seen MISRA’s position that casting is also acceptable.

      • Miro Samek says:

        The issue of casting in numerical constants is very interesting and important. Frankly, I find myself struggling with MISRA compliance, and even more with the “strong typing” compliance of PC-Lint. I ended up constantly casting numerical constants. For example, how do you define a zero value that is specifically uint8_t? In C, this is the ugly cast (uint8_t)0, but in C++ it is even uglier static_cast<uint8_t>(0).

        So, how do you handle this mess? Perhaps another post?

        • Nigel Jones says:

          It is a tough issue. In general my approach is to avoid casting as I think (I know!) it creates more problems than it solves. Thus when I’m assigning constants to a uint8_t I simply append a ‘U’ and don’t worry about the length per se. I do this so that I get caught when I inadvertently try to assign too large a number to a uint8_t, for example uint8_t foo = 347U. Thankfully most compilers and Lint quite rightly complain. However if I use a cast, uint8_t foo = (uint8_t)347, then in too many cases the compiler just shrugs, says OK, and gives me the wrong answer. Thus I’m of the opinion that the cure is worse than the disease in this case.

    • Bob Paddock says:

      There is also this advice on the MISRA Bulletin Board about unsigned zeros:

      http://www.misra.org.uk/forum/viewtopic.php?f=66&t=1044&p=2033#p2033

      “Rule 10.1 addresses a different issue. It demands, among other things, that an expression which is assigned to an unsigned variable should itself be unsigned. This means that any constant or constant expression should itself be of “unsigned” type – including the constant ‘0’. The rationale behind this is that it is helpful to maintain consistent signedness when constructing arithmetic expressions, even if the omission of a ‘U’ suffix makes no difference to the result.”

    • Lundin says:

      In the new MISRA 2012 draft, it looks like the ‘l’ suffix will be banned, since it looks like a one on some fonts. So it is good practice to use ‘L’ if you are concerned about future MISRA compliance.

      Personally I avoid the upper case U, because it makes hex literals look strange, with some comic potential. 0xBULL, 0xDULL, 0xFULL…

    • Rich says:

      Who cares what MISRA says? They don’t dictate the C Standard.

  3. Anonymous says:

    OK Nigel, can you help me sort this one out?

    typedef unsigned long long u64; /* let’s establish that we mean 64-bit ints irrespective of the compiler */

    u64 fred = (u64)123456789012;

    I haven’t attempted to work out what the standard says, but in my fear and ignorance I can see the compiler looking at the digits and attempting to turn them into a system signed int (which will normally be 32 bits at most), then SUBSEQUENTLY converting that to an unsigned 64-bit value.

    Clearly that process will not give us what we want, since it will fail in the first stage. The only way it gives us what we hope is if the cast “influences” the processing of the literal. I can see that would be the “right” way for it to work, but I’ve no idea if it does.

    Now if that’s right, it rules out the use of casts in SOME circumstances, which to my mind means it rules them out in ALL, since we need consistency.

    SO – if that means we have to use literals with L or LL on the end, I’m now faced with the issue that I don’t know, across all compilers, what they mean. An L can mean 32 bits or 64 or, well even 16 on some toy C.

    I assume that this is obvious to anyone serious. I know C99 addresses some of this stuff, but does C99 include a way to represent numeric literals which guarantee their interpretation as a particular number of bits?

    Does MISRA have anything helpful to say about this area?

    • Jörg Seebohn says:

      The draft c11 standard says:

      6.4.4.1 Integer constants

      5 The type of an integer constant is the first of the corresponding list in which its value can
      be represented. … (list is omitted) …

      6 If an integer constant cannot be represented by any type in its list, it may have an
      extended integer type, if the extended integer type can represent its value. …

      So ((uint64_t) 0x1122334455667788) works as expected because, per 6.4.4.1.5, the type of
      0x1122334455667788 is at least of type int64_t (or extended), provided the compiler supports 64 bits,
      and casting from int64_t to uint64_t is safe.

    • Lundin says:

      If you use neither a suffix nor a typecast, the C compiler will do this:

      Does the integer literal fit in a (signed) int?
      If not, does it fit in an unsigned int?
      If not, does it fit in a signed long?
      If not, does it fit in an unsigned long?

      And so on. This is well-defined even on old C90 compilers.
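The lookup described above can be probed with C11 `_Generic`. The sketch below is illustrative only: `IS_TYPE` is a made-up helper, and the platform-dependent checks are guarded so that they only apply where `int` is 32 bits. One caveat: in C99 and later, the list that includes the unsigned types applies to octal and hexadecimal constants, while an unsuffixed decimal constant steps through the signed types only.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>

/* Hypothetical helper: 1 if expression x has exactly type T, else 0. */
#define IS_TYPE(x, T) _Generic((x), T: 1, default: 0)

/* Small constants: the first type in the list (int) already fits. */
static_assert(IS_TYPE(0x47, int), "small hex constant is int");
static_assert(IS_TYPE(71, int), "small decimal constant is int");

#if INT_MAX == 0x7FFFFFFF  /* the checks below assume a 32-bit int */
/* Largest int value: still fits in int. */
static_assert(IS_TYPE(0x7FFFFFFF, int), "fits in int");
/* One more than INT_MAX, in hex: the next candidate is unsigned int. */
static_assert(IS_TYPE(0x80000000, unsigned int), "hex falls through to unsigned int");
/* The same value in decimal skips the unsigned types (C99 and later). */
static_assert(!IS_TYPE(2147483648, unsigned int), "decimal is never unsigned int");
#endif
```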

  4. david collier says:

    Thanks Jörg,
    Which makes me wonder why we’d bother appending ULL to the thing ever….

    But it does mean it won’t fall in a heap, and will retain the significant bits for me to cast it to something else.

    TVM
    David

    • Lundin says:

      Assume that int is 16 bits and long 32 bits. If you don’t use any literal suffix and write code like this:

      long i = INT_MAX + 1;

      then you will get a weird negative number even though long is large enough to hold any int result. This is because INT_MAX is a signed int (defined as 0x7FFF in limits.h) and 1 is a signed int. Both operands are of the same type, so no implicit conversions are needed. The result will be in type int, which will overflow, then that result is stored inside a long.

      1U or 1L or 1UL would have prevented that bug, as it would have enforced an implicit type promotion of the other operand (through balancing, aka “the usual arithmetic conversions”).
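A minimal sketch of the fix follows. It uses `long long` and an `LL` suffix rather than `long`, so that it also holds on platforms where `long` is no wider than `int`; that substitution is an assumption added here, since the comment above assumes a 16-bit `int` and a 32-bit `long`. The function name is hypothetical.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>

/* The sketch assumes long long is wider than int. */
static_assert(LLONG_MAX > INT_MAX, "long long must be wider than int");

/* INT_MAX + 1 would be evaluated in int and overflow (undefined behaviour).
   The LL suffix makes the second operand a long long, so the usual
   arithmetic conversions convert INT_MAX to long long before the addition. */
long long one_past_int_max(void)
{
    return INT_MAX + 1LL;
}
```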

  5. justinx says:

    Out of interest what about the [U]INT[N]_C macros? Wouldn’t they be the correct tools to use when declaring a manifest constant or when assigning a numerical constant to a variable i.e.:

    #define MYCONSTANT UINT16_C(65532U)
    or
    uint16_t const myconstant = UINT16_C(65532U);

    This avoids the pitfalls of casting and also means that, from one platform to the next, the engineer does not have to worry about the size of a U, L, UL, ULL etc. changing.

    • Eric Miller says:

      justinx,

      I was thinking the same thing as I read through the comments.

      The only thing I’d change from your example is that I’d drop the U at the end of the constants.

      The two compilers I use most frequently differ in their implementations of UINT16_C():

      #define UINT16_C(c) c
      #define UINT16_C(x) (x ## u)

      So if you say “UINT16_C(123U)”, the compiler may expand it to “123Uu” (with an invalid double-U suffix).

      Jörg’s July 23 comment describes why it’s safe to drop the U.

    • Lundin says:

      I never use those macros but I believe that they would actually be incorrect, since according to the standard (C11 7.20.4.1), they expand into int_leastN_t. So in your case, the macro would be equivalent to:

      uint16_t const myconstant = (uint_least16_t)65532U;

      Which doesn’t solve anything, but potentially creates bugs if the constant fits in uint_least16_t but not in uint16_t.

      • Eric Miller says:

        The exact-width uintN_t types are optional. They’re only available on machines that support integers of width N with no padding.

        The minimum-width uint_leastN_t types are required, but can contain padding.

        If a system supports uint16_t, uint_least16_t will have the same representation as uint16_t, so the problem Lundin describes (constant fits in uint_least16_t but not in uint16_t) is impossible by definition.
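For reference, here is a small sketch of the macro in use, written without the extra suffix as suggested earlier in this thread. `UINT16_C` and `UINT16_MAX` come from `<stdint.h>`; the names `MY_CONSTANT` and `my_constant` are made up for illustration, and the example assumes the platform provides `uint16_t`.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical constant: no U suffix inside the macro, and no trailing
   semicolon on the #define. */
#define MY_CONSTANT UINT16_C(65532)

/* Catch a constant that is too large at compile time. */
static_assert(MY_CONSTANT <= UINT16_MAX, "constant must fit in 16 bits");

uint16_t my_constant(void)
{
    return MY_CONSTANT;
}
```

What the macro expands to is left to the implementation, which is why the sketch checks the range explicitly rather than relying on the macro to do so.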

  6. kalpak dabir says:

    Have not understood the impact of not using U (or u).
    With reference to the following
    uint8_t some_var = 6;

    what will get loaded in the variable some_var?
    If my LHS is clearly indicating that the variable is unsigned and of the size of 8 bits, then what can happen due to the missing U/ u?

  7. jeroen boonen says:

    We always use casts for constants, except for array indexing and the value 0.

    So it is always clear what the wanted type is. In the past we also used u and ul, … but ul means unsigned long, and what is long? 16, 32 or 64 bits?
    e.g.:
    Unsigned64 fr = (Unsigned64)0x12345678AABBCCDD;
    d = (Float32)-2*x + (Float32)12*y + (Float32)3;

    Also we never use int, long, short,…in our code, because it is compiler dependent. We use:
    typedef unsigned char Boolean;
    typedef unsigned char Unsigned8;
    typedef unsigned short Unsigned16;
    typedef unsigned long Unsigned32;
    typedef unsigned long long Unsigned64;

    I love C, but for this topic ADA is the way to go…

  8. Scott Whitney says:

    Nigel, it appears that PC-Lint is really picky about the MISRA rules, and will complain about the lower-case “u”, so I have gotten into the practice of using the upper-case “U”. It gets a little messier when we start working with 3rd-party code such as a particular OS kernel, which defines its constants with the lower-case “u”… then we have to start suppressing warnings there, too, or treat the entire kernel as if it’s library code.

    For example, Micrium has
    #define OS_ERR_NONE 0u

    and I have to use //lint -esym(1960, 2-13-4) in code that checks to make sure that the returned error code is OS_ERR_NONE.

    Hope this helps someone else!

    Scott

  9. Clark says:

    Somewhat arbitrarily, and for reasons unknown, I use ‘u’ for unsigned, but ‘L’ for long. I guess it was just something that I picked up from my training and books (different books, different authors, etc.). Kinda like the way some people say “alls”. I guess they think all means everything, and the extra ‘s’ makes it all-encompassing. That is a huge pet-peeve of mine. I guess the upper-case ‘L’ makes it stronger, more all-encompassing… 😉

  10. Jerry says:

    0x47u doesn’t seem to be of much use in the first place, given that while decimal-represented numerical constants are signed by default, non-decimal-represented (i.e. octal-represented or hexadecimal-represented) are UNsigned by default. One can easily verify that by observing that the expressions

    0x80000000 > 0x7FFFFFFF
    020000000000 > 017777777777

    both evaluate to TRUE (they clearly would evaluate to FALSE if hex constants were signed). Thusly, while the u suffix is useful for decimal int constants, it is completely superfluous for octal and hex ones.

    • Nigel Jones says:

      I agree. However I never put a ‘U’ on a hexadecimal constant until I had to write code that was MISRA compliant. I ran the code through two separate checkers and both complained about the ‘U’ missing on hexadecimal constants. Since then I do it.

      • Jerry says:

        Interestingly enough, it actually turns out that my above statement was incorrect (therefore I owe an apology and hope not to have misled anyone).

        As a matter of fact, hex and octal constants which fit into the (signed int) range (such as 0x7FFFFFFF) are of (signed int) type. A hex or octal constant has the type (unsigned int) only if the positive number it represents exceeds the range of (signed int), which is the case with 0x80000000. A way to verify the actual type of a numerical constant is to try to assign it oddishly (such as to a pointer-type variable without an explicit cast), the resulting compiler warning will reveal the constant’s type.

        The confusion which has even some arguably authoritative sources* believe hex- and octal-represented integer constants were always of a signed type may stem from the fact that they are never interpreted as negative numbers – if we have a hex or octal constant which _could_ represent a negative number (i.e. with the most significant bit set), it is still considered positive, and of unsigned type so that positive value can “fit”.

        Thusly:

        0x7FFFFFFFu

        makes sense, because without the ‘u’ suffix the constant would be of type (signed int), while in

        0x80000000u

        the ‘u’ suffix is superfluous in that the constant would be of type (unsigned int) without it anyway. Further, we can come up with an example use for the “signed” keyword (which is sometimes assumed to be entirely useless); namely, as we don’t have a suffix to specify a signed type, we can only do so by the means of an explicit cast, so for a constant such as 0x80000000 to be of strictly signed type we need to use

        (signed)0x80000000

        __
        * in Stephen Prata’s “C Primer Plus”, 5th ed., the answers to the chapter 3 review questions claim the constants 0xAA, 0x3, 012, and 0x44 to be of type (unsigned int), which, as per the above, they really aren’t.

        • Jerry says:

          Correction to the above post: “(…) some arguably authoritative sources* believe hex- and octal-represented integer constants were always of a signed type (…)” – obviously for anyone following along (I think), that sentence should say “unsigned type” rather than “signed type”, as the former is what some sources (incorrectly) believe (as I did too until just recently), and which is the whole point of the post. Whew.

  11. Jerry says:

    Also, if one wants to get anal about types to the point of considering

    unsigned int i = 6;

    incorrect, then one would have to consider

    char c = 'a';

    as incorrect as well (because “character constants” are really of type signed int) and would have to write

    char c = (char)'a';

    instead. Honestly I cannot say I have seen this done anywhere.
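The claim about character constants is easy to check in C (though not in C++, where `'a'` has type `char`). A short sketch, assuming a C11 compiler and using a made-up function name:

```c
#include <assert.h>  /* static_assert (C11) */

/* In C, a character constant such as 'a' has type int, not char. */
static_assert(sizeof('a') == sizeof(int), "character constants are int in C");
static_assert(_Generic('a', int: 1, default: 0), "'a' has type int");

/* The initialization therefore converts an int to char, just as
   "unsigned int i = 6;" converts an int to unsigned int. */
char letter_a(void)
{
    char c = 'a';
    return c;
}
```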

  12. Bruce Williamson says:

    The U or u is completely unnecessary.

    unsigned int someint = 1234u;

    Reads as unsigned integer someint is assigned the value 1234 unsigned. Well duh we’ve ALREADY stated that it was unsigned.

    It seems that people who like to write specifications like to make others type more. 🙂

  13. Leroy105 says:

    My god, you have no idea how much this has been bugging me as I am learning ARM Cortex M headers. “WHAT IS THE U FOR!!!!!!!!!!”

  14. snakedoctor says:

    After everything I have read on here, the *suffix* is ONLY really needed when you have an expression and an assignment on a single line. Otherwise it’s ONLY needed for people who don’t truly understand the range of the datatype they are using, and if that’s the case, they probably shouldn’t be writing code. I only came here to try to get a better understanding of this feature. We have a const int x = 0u; in our code, which I think is totally ridiculous. Most of what I read here is total overkill in most instances. I am beginning to lose confidence in the MISRA standard and beginning to think the mistakes made in source have little to do with what’s defined in the standard.
