Binary Literals in C

Wednesday, September 30th, 2009 by Michael Barr

A couple of years ago, Netrino engineer Dan Smith was writing stepper motor control firmware that interfaced to lots of registers with binary fields and sub-fields. After struggling a bit with the usual error-prone “off by 1 bit shift” masking and conversion from binary to hexadecimal literals in C, he happened across a useful post on a forum.

In a nutshell, the “binary literal” technique involves the following set of C preprocessor macros:

// Internal Macros
#define HEX__(n) 0x##n##LU
#define B8__(x) ((x&0x0000000FLU)?1:0) \
+((x&0x000000F0LU)?2:0) \
+((x&0x00000F00LU)?4:0) \
+((x&0x0000F000LU)?8:0) \
+((x&0x000F0000LU)?16:0) \
+((x&0x00F00000LU)?32:0) \
+((x&0x0F000000LU)?64:0) \
+((x&0xF0000000LU)?128:0)

// User-visible Macros
#define B8(d) ((unsigned char)B8__(HEX__(d)))
#define B16(dmsb,dlsb) (((unsigned short)B8(dmsb)<<8) + B8(dlsb))
#define B32(dmsb,db2,db3,dlsb) \
(((unsigned long)B8(dmsb)<<24) \
+ ((unsigned long)B8(db2)<<16) \
+ ((unsigned long)B8(db3)<<8) \
+ B8(dlsb))

Here are some examples of the usage of these macros:
B8(01010101) // 85
B16(10101010,01010101) // 43,605
B32(10000000,11111111,10101010,01010101) // 2,164,238,933
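The three commented values can be checked mechanically. The sketch below copies the macros verbatim from the post; `check_usage_examples` is a hypothetical helper name, not part of the original code, and is meant to be called from your own test harness.

```c
#include <assert.h>

/* Macros copied verbatim from the post. */
#define HEX__(n) 0x##n##LU
#define B8__(x) ((x&0x0000000FLU)?1:0) \
+((x&0x000000F0LU)?2:0) \
+((x&0x00000F00LU)?4:0) \
+((x&0x0000F000LU)?8:0) \
+((x&0x000F0000LU)?16:0) \
+((x&0x00F00000LU)?32:0) \
+((x&0x0F000000LU)?64:0) \
+((x&0xF0000000LU)?128:0)

#define B8(d) ((unsigned char)B8__(HEX__(d)))
#define B16(dmsb,dlsb) (((unsigned short)B8(dmsb)<<8) + B8(dlsb))
#define B32(dmsb,db2,db3,dlsb) \
(((unsigned long)B8(dmsb)<<24) \
+ ((unsigned long)B8(db2)<<16) \
+ ((unsigned long)B8(db3)<<8) \
+ B8(dlsb))

/* Hypothetical helper: confirms the three usage examples at run time. */
void check_usage_examples(void)
{
    assert(B8(01010101) == 85);
    assert(B16(10101010,01010101) == 43605);
    assert(B32(10000000,11111111,10101010,01010101) == 2164238933UL);
}
```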

So if you had a memory-mapped 8-bit control register of the format XXXYYZZZ (where XXX, YY, and ZZZ are subfields), you could initialize it like so:

*p_reg = ( (B8(010) << 5) | (B8(11) << 3) | (B8(101) << 0) );

which sets the XXX bits to 010, YY to 11, and ZZZ to 101. If you ever need to change XXX to 011, you just change a single 0 to a 1 in the source code, and the register value follows. Best of all, it’s all done at compile-time. No error-prone conversion to hexadecimal necessary, no figuring out which bits belong to which nibbles, etc.
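Here is a minimal sketch of the register initialization that can be built on a desktop compiler. It assumes a C11 compiler for `_Static_assert`, and it uses an ordinary variable (`fake_reg`, a made-up name) in place of a real memory-mapped address; `init_control_register` is likewise hypothetical.

```c
#include <assert.h>

/* Macros copied from the post (only the 8-bit ones are needed here). */
#define HEX__(n) 0x##n##LU
#define B8__(x) ((x&0x0000000FLU)?1:0) \
+((x&0x000000F0LU)?2:0) \
+((x&0x00000F00LU)?4:0) \
+((x&0x0000F000LU)?8:0) \
+((x&0x000F0000LU)?16:0) \
+((x&0x00F00000LU)?32:0) \
+((x&0x0F000000LU)?64:0) \
+((x&0xF0000000LU)?128:0)
#define B8(d) ((unsigned char)B8__(HEX__(d)))

/* Assumption: a plain variable stands in for the hardware register. */
static volatile unsigned char fake_reg;
static volatile unsigned char *const p_reg = &fake_reg;

/* The whole expression is an integer constant expression, so it can be
 * checked at compile time: 010 11 101 is 0101 1101, i.e. 0x5D. */
_Static_assert(((B8(010) << 5) | (B8(11) << 3) | (B8(101) << 0)) == 0x5D,
               "XXX=010, YY=11, ZZZ=101 should pack to 0x5D");

/* Hypothetical helper: writes the packed fields and reads them back. */
unsigned char init_control_register(void)
{
    *p_reg = (unsigned char)((B8(010) << 5) | (B8(11) << 3) | (B8(101) << 0));
    return *p_reg;
}
```

Changing `B8(010)` to `B8(011)` in this sketch would yield 0x7D, since only bit 5 of the packed value changes.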

What is that old saying? “Good programmers write good code; great programmers steal great code.”

13 Responses to “Binary Literals in C”

  1. Nigel Jones says:

    Great post (or perhaps steal :-)). I've always wanted a way to express binary numbers in C, and now I have one! I took the code and experimented a bit with it. If you turn optimization completely off, then my IAR ARM compiler did include some redundant code. However, the lowest level of optimization fixed that problem.

    The MISRA compliance checker did of course have an absolute fit over the use of these macros, generating a large number of complaints. I think this is yet another example where one has to make an intelligent trade-off. Should I use a macro that is unsafe by MISRA standards, yet allows me to eliminate an entire class of errors?

    Perhaps in my copious free time I'll see if I can restructure the macros to make MISRA compliance possible.

  2. Nick says:

    MISRA compliance allows exceptions for exactly this kind of thing. This is making your code safer and not less safe: the macros encapsulate individually unsafe constructs but package them up with a very safe interface.

    You just need a project-wide MISRA deviation specialised to these three macros and you can retain full MISRA compliance, both in the spirit and the letter of the law. Different MISRA checkers handle deviations in different ways but PC-Lint handles macro-specific deviations very nicely, for one example.

  3. Kenneth says:

    I'm trying to learn the finer points of C and how to be creative with it. :)

    I don't understand something about the binary macros.

    /* 8-bit conversion function */
    #define B8__(x) ((x&0x0000000FLU)?1:0) \
    +((x&0x000000F0LU)?2:0) \
    +((x&0x00000F00LU)?4:0) \
    +((x&0x0000F000LU)?8:0) \
    +((x&0x000F0000LU)?16:0) \
    +((x&0x00F00000LU)?32:0) \
    +((x&0x0F000000LU)?64:0) \
    +((x&0xF0000000LU)?128:0)

    If we take part of the first line:

    (x&0x0000000FLU)

    It would appear to me that x is being ANDed with the constant 0x0000000FLU. It's with this constant that I get lost. To me 0x0000000F = 15 (in decimal). The "LU", as I understand it, and I could be wrong, signifies the constant is an unsigned long.

    I understand how the macro is supposed to work. My problem, I guess, is with the syntax. Can someone shed some light?

    Thanks,
    Kenny
    kenl@anspach.com

  4. Nigel Jones says:

    Sure can. By way of example, let's look at B8(1011).

    First off, the HEX__ macro converts the argument 1011 into 0x1011LU. It does this via the token-pasting operator ##. (This is a pretty obscure part of the preprocessor – you'll find more information online.)

    Now let's see what happens with the B8__(x) macro. It consists of 8 lines, each of which effectively tests one bit position. The first line is (x & 0x0000000Flu) ? 1 : 0. (Note that I've added spaces and made the LU lower case, thus making things a little clearer IMHO.) Anyway, this becomes: 0x1011lu & 0x0000000Flu = 0x1. This evidently evaluates to true and so the ternary operator returns 1.

    The second line of the B8__() macro is (x & 0x000000F0lu) ? 2 : 0. Thus this expands to 0x1011 & 0x000000F0 = 0x10. This evidently evaluates to true and so the ternary operator returns 2, which is added to the previous result, giving a total of decimal 3.

    The third line of the B8__() macro is (x & 0x00000F00lu) ? 4 : 0. Thus this expands to 0x1011 & 0x00000F00 = 0x0. This evidently evaluates to false and so the ternary operator returns 0, which is added to the previous result, keeping our total at decimal 3.

    The fourth line of the B8__() macro is (x & 0x0000F000lu) ? 8 : 0. Thus this expands to 0x1011 & 0x0000F000 = 0x1000. This evidently evaluates to true and so the ternary operator returns 8, which is added to the previous result, giving a total of decimal 11. The remaining four lines test nibbles that are all zero, so they add nothing, and 11 is our final result.

    The B16 and B32 macros simply build upon this technique by using appropriate casts and shifts.

    Hope this helps. If not, post again and I'll try and expand upon the explanation.
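The walkthrough above can be replayed at run time with a small loop. This is only a sketch of the same nibble-by-nibble test, not the macro itself; `nibbles_to_bits` and `walkthrough_check` are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical run-time equivalent of B8__: every non-zero hex nibble of
 * hex_value contributes one bit to the result, lowest nibble first. */
unsigned nibbles_to_bits(unsigned long hex_value)
{
    unsigned result = 0u;
    for (unsigned bit = 0u; bit < 8u; ++bit) {
        unsigned long nibble_mask = 0xFUL << (4u * bit);
        if ((hex_value & nibble_mask) != 0UL) {
            result += 1u << bit;
        }
    }
    return result;
}

void walkthrough_check(void)
{
    assert(nibbles_to_bits(0x1011UL) == 11u);      /* B8(1011)     */
    assert(nibbles_to_bits(0x01010101UL) == 85u);  /* B8(01010101) */
    assert(nibbles_to_bits(0x2UL) == 1u);          /* any non-zero digit counts as 1 */
}
```

The last check points at a limitation worth knowing: because each line only tests whether a nibble is non-zero, the original macros would quietly accept a mistyped digit such as the 2 in B8(1021) and treat it as a 1.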

  5. Miro Samek says:

    Indeed, MISRA checker does not particularly like the macros, because they violate the following MISRA rules:

    macro HEX__(n) Violates MISRA Required Rule 98, Multiple use of '#/##' operators in macro definition
    macro B8__ Violates MISRA Rule 7, Trigraphs shall not be used
    macro B8__ Violates MISRA Rule 96, Expression-like macro '' not parenthesized

    But the macros can be restructured as follows to comply with MISRA:

    #define HEX__(n_) ((uint32_t)0x##n_)
    #define B8__(x_) \
        ( ((x_) & 0x1LU) \
        | (((x_) & 0x10LU) >> 3) \
        | (((x_) & 0x100LU) >> 6) \
        | (((x_) & 0x1000LU) >> 9) \
        | (((x_) & 0x10000LU) >> 12) \
        | (((x_) & 0x100000LU) >> 15) \
        | (((x_) & 0x1000000LU) >> 18) \
        | (((x_) & 0x10000000LU) >> 21))
    #define B8(b0_) ((uint8_t)B8__(HEX__(b0_)))
    #define B16(b1_,b0_) \
        ((uint16_t)(B8__(HEX__(b0_)) | (B8__(HEX__(b1_)) << 8)))
    #define B32(b3_,b2_,b1_,b0_) \
        ((uint32_t)(B8__(HEX__(b0_)) \
        | (B8__(HEX__(b1_)) << 8) \
        | (B8__(HEX__(b2_)) << 16) \
        | (B8__(HEX__(b3_)) << 24)))

    While the macros are MISRA compliant, the actual use can sometimes lead to violating MISRA rule 19 (octal constant used), if you start your binary literal with a zero.

    Miro Samek
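As a quick sanity check, the restructured macros can be run against the same three values used in the post. The macros below are copied from the comment above; `check_restructured_macros` is a hypothetical helper, and the sketch says nothing about what any particular MISRA checker will report.

```c
#include <assert.h>
#include <stdint.h>

/* Restructured macros copied from the comment above. */
#define HEX__(n_) ((uint32_t)0x##n_)
#define B8__(x_) \
    ( ((x_) & 0x1LU) \
    | (((x_) & 0x10LU) >> 3) \
    | (((x_) & 0x100LU) >> 6) \
    | (((x_) & 0x1000LU) >> 9) \
    | (((x_) & 0x10000LU) >> 12) \
    | (((x_) & 0x100000LU) >> 15) \
    | (((x_) & 0x1000000LU) >> 18) \
    | (((x_) & 0x10000000LU) >> 21))
#define B8(b0_) ((uint8_t)B8__(HEX__(b0_)))
#define B16(b1_,b0_) \
    ((uint16_t)(B8__(HEX__(b0_)) | (B8__(HEX__(b1_)) << 8)))
#define B32(b3_,b2_,b1_,b0_) \
    ((uint32_t)(B8__(HEX__(b0_)) \
    | (B8__(HEX__(b1_)) << 8) \
    | (B8__(HEX__(b2_)) << 16) \
    | (B8__(HEX__(b3_)) << 24)))

/* Hypothetical helper: same expected values as the post's usage examples. */
void check_restructured_macros(void)
{
    assert(B8(01010101) == 85u);
    assert(B16(10101010,01010101) == 43605u);
    assert(B32(10000000,11111111,10101010,01010101) == 2164238933UL);
}
```

One behavioral difference from the original: this version masks only the lowest bit of each nibble, so a stray digit such as 2 contributes 0 instead of 1.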

    • Lundin says:

      There are no trigraphs in this code. It would seem that your MISRA checker is a bit too trigger-happy (they all are). Upon manual code inspection, I can spot the following MISRA violations: use of function-like macros (19.7), only one occurrence of ## in one macro (19.12), ## should not be used (19.13), U suffix required on integer literals (10.6), no implicit type conversions (10.1), octal integer literals should not be used (7.1), uintn_t should be used instead of the default integer data types (6.3), C99 should not be used (1.1).

  6. […] topics, if you have not read Mike Barr’s recent posting on binary literals, then I strongly recommend that you do so. It would have fitted very nicely into […]

  7. Lundin says:

    I never quite understood the need of binary literals. Back in school they wouldn’t let us write our first “hello world” before we knew binary and hex. I think you should be able to safely assume that every programmer can read hex. Apart from that, I personally think that *p_reg = ( (B8(010) << 5) | (B8(11) << 3) | (B8(101) << 0) ) is far harder to read than *p_reg = (2<<5) | (3<<3) | (5<<0);, not to mention *p_reg = X | Y | Z; where X, Y and Z are bit mask constants. In a real application, those bit masks would of course have meaningful names instead.
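For comparison, here is one way the named-constant style suggested above might look for the XXXYYZZZ register. Every name in this sketch (`CTRL_*`, `ctrl_pack`, `ctrl_pack_check`) is invented for illustration; a real driver would take its names from the device's datasheet.

```c
#include <assert.h>

/* Hypothetical field layout for an 8-bit register of the form XXXYYZZZ. */
#define CTRL_XXX_SHIFT 5u
#define CTRL_YY_SHIFT  3u
#define CTRL_ZZZ_SHIFT 0u
#define CTRL_XXX_MASK  (0x7u << CTRL_XXX_SHIFT)
#define CTRL_YY_MASK   (0x3u << CTRL_YY_SHIFT)
#define CTRL_ZZZ_MASK  (0x7u << CTRL_ZZZ_SHIFT)

/* Hypothetical helper: packs three field values, masking each to its width. */
unsigned char ctrl_pack(unsigned xxx, unsigned yy, unsigned zzz)
{
    return (unsigned char)(((xxx << CTRL_XXX_SHIFT) & CTRL_XXX_MASK) |
                           ((yy  << CTRL_YY_SHIFT)  & CTRL_YY_MASK)  |
                           ((zzz << CTRL_ZZZ_SHIFT) & CTRL_ZZZ_MASK));
}

void ctrl_pack_check(void)
{
    /* Same field values as the post: XXX=010, YY=11, ZZZ=101. */
    assert(ctrl_pack(2u, 3u, 5u) == 0x5Du);
}
```

The trade-off is the one debated in this thread: the shifts and widths live in one place with names attached, but the field values are written in decimal or hex instead of as visible bit patterns.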

    • Peter P says:

      B32( 11111100 ,
      11000000 ,
      11000000 ,
      11111100 ),
      B32( 11000000 ,
      11000000 ,
      11000000 ,
      00000000 )

      obviously is an ‘F’ character image. Which you wouldn’t see if it were written “0xFCC0C0FC, 0xC0C0C000”.

      • Ricky says:

        If you want to do character images, you might want to use these macros:

        enum
        {
        ________,_______O,______O_,______OO,_____O__,_____O_O,_____OO_,_____OOO,
        ____O___,____O__O,____O_O_,____O_OO,____OO__,____OO_O,____OOO_,____OOOO,
        ___O____,___O___O,___O__O_,___O__OO,___O_O__,___O_O_O,___O_OO_,___O_OOO,
        ___OO___,___OO__O,___OO_O_,___OO_OO,___OOO__,___OOO_O,___OOOO_,___OOOOO,
        __O_____,__O____O,__O___O_,__O___OO,__O__O__,__O__O_O,__O__OO_,__O__OOO,
        __O_O___,__O_O__O,__O_O_O_,__O_O_OO,__O_OO__,__O_OO_O,__O_OOO_,__O_OOOO,
        __OO____,__OO___O,__OO__O_,__OO__OO,__OO_O__,__OO_O_O,__OO_OO_,__OO_OOO,
        __OOO___,__OOO__O,__OOO_O_,__OOO_OO,__OOOO__,__OOOO_O,__OOOOO_,__OOOOOO,
        _O______,_O_____O,_O____O_,_O____OO,_O___O__,_O___O_O,_O___OO_,_O___OOO,
        _O__O___,_O__O__O,_O__O_O_,_O__O_OO,_O__OO__,_O__OO_O,_O__OOO_,_O__OOOO,
        _O_O____,_O_O___O,_O_O__O_,_O_O__OO,_O_O_O__,_O_O_O_O,_O_O_OO_,_O_O_OOO,
        _O_OO___,_O_OO__O,_O_OO_O_,_O_OO_OO,_O_OOO__,_O_OOO_O,_O_OOOO_,_O_OOOOO,
        _OO_____,_OO____O,_OO___O_,_OO___OO,_OO__O__,_OO__O_O,_OO__OO_,_OO__OOO,
        _OO_O___,_OO_O__O,_OO_O_O_,_OO_O_OO,_OO_OO__,_OO_OO_O,_OO_OOO_,_OO_OOOO,
        _OOO____,_OOO___O,_OOO__O_,_OOO__OO,_OOO_O__,_OOO_O_O,_OOO_OO_,_OOO_OOO,
        _OOOO___,_OOOO__O,_OOOO_O_,_OOOO_OO,_OOOOO__,_OOOOO_O,_OOOOOO_,_OOOOOOO,
        O_______,O______O,O_____O_,O_____OO,O____O__,O____O_O,O____OO_,O____OOO,
        O___O___,O___O__O,O___O_O_,O___O_OO,O___OO__,O___OO_O,O___OOO_,O___OOOO,
        O__O____,O__O___O,O__O__O_,O__O__OO,O__O_O__,O__O_O_O,O__O_OO_,O__O_OOO,
        O__OO___,O__OO__O,O__OO_O_,O__OO_OO,O__OOO__,O__OOO_O,O__OOOO_,O__OOOOO,
        O_O_____,O_O____O,O_O___O_,O_O___OO,O_O__O__,O_O__O_O,O_O__OO_,O_O__OOO,
        O_O_O___,O_O_O__O,O_O_O_O_,O_O_O_OO,O_O_OO__,O_O_OO_O,O_O_OOO_,O_O_OOOO,
        O_OO____,O_OO___O,O_OO__O_,O_OO__OO,O_OO_O__,O_OO_O_O,O_OO_OO_,O_OO_OOO,
        O_OOO___,O_OOO__O,O_OOO_O_,O_OOO_OO,O_OOOO__,O_OOOO_O,O_OOOOO_,O_OOOOOO,
        OO______,OO_____O,OO____O_,OO____OO,OO___O__,OO___O_O,OO___OO_,OO___OOO,
        OO__O___,OO__O__O,OO__O_O_,OO__O_OO,OO__OO__,OO__OO_O,OO__OOO_,OO__OOOO,
        OO_O____,OO_O___O,OO_O__O_,OO_O__OO,OO_O_O__,OO_O_O_O,OO_O_OO_,OO_O_OOO,
        OO_OO___,OO_OO__O,OO_OO_O_,OO_OO_OO,OO_OOO__,OO_OOO_O,OO_OOOO_,OO_OOOOO,
        OOO_____,OOO____O,OOO___O_,OOO___OO,OOO__O__,OOO__O_O,OOO__OO_,OOO__OOO,
        OOO_O___,OOO_O__O,OOO_O_O_,OOO_O_OO,OOO_OO__,OOO_OO_O,OOO_OOO_,OOO_OOOO,
        OOOO____,OOOO___O,OOOO__O_,OOOO__OO,OOOO_O__,OOOO_O_O,OOOO_OO_,OOOO_OOO,
        OOOOO___,OOOOO__O,OOOOO_O_,OOOOO_OO,OOOOOO__,OOOOOO_O,OOOOOOO_,OOOOOOOO,
        };

        #define G8(n0) ((uint8_t) (n0))
        //!< Build a byte image

        #define G16(n1, n0) ((uint16_t) (((n1) << 8) | (n0)))
        //!< Build a halfword image

        #define G32(n3, n2, n1, n0) \
        ((uint32_t) (((uint32_t) G16 ((n3), (n2)) << 16) | G16 ((n1), (n0))))
        //!< Build a word image

        #define G64(n7, n6, n5, n4, n3, n2, n1, n0) \
        ((uint64_t) ((G32 ((n7), (n6), (n5), (n4)) * 0x100000000lu) \
        | G32 ((n3), (n2), (n1), (n0))))
        //!< Build a long image

        Now you can do an F like so:

        G8 (OOOOOO__);
        G8 (OO______);
        G8 (OO______);
        G8 (OOOOOO__);
        G8 (OO______);
        G8 (OO______);
        G8 (OO______);
        G8 (________);

        By the way, upper case Os are used in the macros, not zeros.

    • Peter Painter says:

      It of course doesn’t make sense in the examples you give. However, sometimes a bit pattern is exactly that: a pattern. Imagine font data. If written as binary literals, you can ‘see’ the font letters (and any mistakes) while hex values are meaningless letters.
      Whenever a value represents some visual pattern, binary literals are an invaluable help.
      Actually, that’s my main use for binary literals.
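To make the font-data argument concrete, here is a small sketch that stores the 'F' glyph from this thread as binary literals and renders one row as text. It reuses the post's 8-bit macros; `glyph_F` and `render_row` are hypothetical names, and the `#`/`.` characters are an arbitrary choice.

```c
#include <assert.h>
#include <string.h>

/* Macros copied from the post (only the 8-bit ones are needed here). */
#define HEX__(n) 0x##n##LU
#define B8__(x) ((x&0x0000000FLU)?1:0) \
+((x&0x000000F0LU)?2:0) \
+((x&0x00000F00LU)?4:0) \
+((x&0x0000F000LU)?8:0) \
+((x&0x000F0000LU)?16:0) \
+((x&0x00F00000LU)?32:0) \
+((x&0x0F000000LU)?64:0) \
+((x&0xF0000000LU)?128:0)
#define B8(d) ((unsigned char)B8__(HEX__(d)))

/* The 'F' glyph, one byte per row, top to bottom. */
const unsigned char glyph_F[8] = {
    B8(11111100),
    B8(11000000),
    B8(11000000),
    B8(11111100),
    B8(11000000),
    B8(11000000),
    B8(11000000),
    B8(00000000)
};

/* Hypothetical helper: writes 8 characters plus a terminator into out,
 * most significant bit first ('#' for a set pixel, '.' for a clear one). */
void render_row(unsigned char row, char out[9])
{
    for (int col = 0; col < 8; ++col) {
        out[col] = (row & (0x80u >> col)) ? '#' : '.';
    }
    out[8] = '\0';
}
```

Because each B8(...) is a constant expression, the table can sit in a `const` array without any start-up code to build it.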

  8. Julian Day says:

    Hmm, I’m not sure quite whether I like these or not. When working in C/C++, I’m very used to not having binary numeric literals and to suddenly see this slightly throws me. When working in Ada and other languages with binary numeric literals, I can hardly remember either using them or seeing them used.

    I’d be interested in seeing a strong case for using them, since I agree with Lundin above re the *p_reg example, but would actually use a macro or const for each of the fields rather than numeric literals.

    In any case, from a MISRA 2004 perspective, I think the biggest obstacle is rule 13.7, which states that “Boolean operations whose results are invariant shall not be permitted.” That’s the whole point of these macros, to take advantage of this invariance to generate a numeric literal from this binary representation at compile time. You can disable warnings about the macro definitions and make the expansions MISRA type compliant, but you would need to ensure that 13.7 is not in effect at the point where the macro is expanded, since the expansion itself is not MISRAble, if that is the correct adjective.

    For reference, here’s a version of the macro that could be used with the GHS compiler’s built in rules checking:

    #ifndef BIN_H_
    #define BIN_H_

    #include <stdint.h>

    /* Internal Macros */

    #pragma ghs startnomisra

    #define HEX__(n) 0x##n##LU
    #define B8__(x) ((uint8_t)(((x&0x0000000FLU)!=0UL)?1UL:0UL) \
    +(((x&0x000000F0LU)!=0UL)?2UL:0UL) \
    +(((x&0x00000F00LU)!=0UL)?4UL:0UL) \
    +(((x&0x0000F000LU)!=0UL)?8UL:0UL) \
    +(((x&0x000F0000LU)!=0UL)?16UL:0UL) \
    +(((x&0x00F00000LU)!=0UL)?32UL:0UL) \
    +(((x&0x0F000000LU)!=0UL)?64UL:0UL) \
    +(((x&0xF0000000LU)!=0UL)?128UL:0UL))

    /* User-visible Macros */
    #define B8(d) ((uint8_t)B8__(HEX__(d)))
    #define B16(dmsb,dlsb) ((uint16_t)((uint16_t)(B8(dmsb)<<8) + ((uint16_t)B8(dlsb))))
    #define B32(dmsb,db2,db3,dlsb) \
    ((uint32_t)((uint32_t)B16(dmsb,db2)<<16) + ((uint32_t)B16(db3,dlsb)))

    #pragma ghs endnomisra

    #endif /* BIN_H_ */

    The GHS compiler would evaluate the binary expressions as the equivalent numeric literal at any optimisation level, which after all is the point of them.
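One compiler-independent way to see that the expansions are true constants is to use them where C demands an integer constant expression, such as enumerators and case labels. The sketch below uses the 8-bit macros from this comment with the GHS pragmas left out; the `motor_mode` enum and `mode_name` function are purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-bit macros from the comment above, without the GHS-specific pragmas. */
#define HEX__(n) 0x##n##LU
#define B8__(x) ((uint8_t)(((x&0x0000000FLU)!=0UL)?1UL:0UL) \
+(((x&0x000000F0LU)!=0UL)?2UL:0UL) \
+(((x&0x00000F00LU)!=0UL)?4UL:0UL) \
+(((x&0x0000F000LU)!=0UL)?8UL:0UL) \
+(((x&0x000F0000LU)!=0UL)?16UL:0UL) \
+(((x&0x00F00000LU)!=0UL)?32UL:0UL) \
+(((x&0x0F000000LU)!=0UL)?64UL:0UL) \
+(((x&0xF0000000LU)!=0UL)?128UL:0UL))
#define B8(d) ((uint8_t)B8__(HEX__(d)))

/* Hypothetical two-bit mode field; enumerators must be constant expressions. */
enum motor_mode {
    MODE_IDLE = B8(00),
    MODE_STEP = B8(01),
    MODE_HOLD = B8(10)
};

/* Hypothetical helper: case labels must also be constant expressions. */
const char *mode_name(unsigned value)
{
    switch (value) {
    case B8(00): return "idle";
    case B8(01): return "step";
    case B8(10): return "hold";
    default:     return "unknown";
    }
}
```

If any of these expansions were not constant expressions, the code would be rejected at compile time, independent of the optimization level.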

    • Julian Day says:

      The #include line in the code above is meant to read #include <stdint.h>; my original posting lost the stdint.h part, probably because of the angle brackets.