embedded software boot camp

Three byte integers

Wednesday, June 10th, 2009 by Nigel Jones

One of the enduring myths about the C language is that it is good for use on embedded systems. I’ve always been puzzled by this. While it is true that many other languages are dreadful for use on embedded systems, this merely means that C is less dreadful rather than ‘good’. While I have a host of issues with C, the one that constantly galls me is the lack of 3 byte integers. Using C99 notation these would be the uint24_t and int24_t data types. Now a quick web search indicates that there may be the odd compiler out there that supports 3 byte integers – but the vast majority do not.

So why exactly do I want a 3 byte integer? Well, there are two main reasons:

Intermediate results

When I look through my code, I find a huge number of instances where I am performing an arithmetic operation on a 16 bit value, where intermediate values overflow 16 bits, yet the final value fits in 16 bits. For example:

uint16_t a, b;

a = (b * 51) / 64;

In this case, on a target where int is 16 bits, the code will fail whenever (b * 51) overflows 16 bits, which happens for any b greater than 1285. As a result, I am forced to write:

a = ((uint32_t)b * 51) / 64;

However, examination of this code shows that (b * 51) can never overflow 24 bits for any 16 bit b: the largest possible product is 65535 * 51 = 3342285, well below 2^24 = 16777216. Thus I’d much rather write:

a = ((uint24_t)b * 51) / 64;

Now obviously on a 32 bit processor there would be zero benefit to doing this (indeed there may be a penalty). However on an 8 bit (and probably a 16 bit) processor, there would be a dramatic benefit to such a construct.
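The difference between the first two forms can be sketched on any host using only standard C99 types. The helper names scale_wrapped16 and scale_widened are hypothetical, and the 16 bit int behaviour is simulated with an explicit cast rather than taken from a real 8 or 16 bit compiler:

```c
#include <stdint.h>

/* Simulates a 16-bit-int target: the product is truncated to 16 bits
 * before the division, as happens to (b * 51) / 64 on such a target. */
uint16_t scale_wrapped16(uint16_t b)
{
    uint16_t product = (uint16_t)((uint32_t)b * 51u); /* keep low 16 bits only */
    return (uint16_t)(product / 64u);
}

/* The workaround from the text: widen to 32 bits before multiplying. */
uint16_t scale_widened(uint16_t b)
{
    return (uint16_t)(((uint32_t)b * 51u) / 64u);
}
```

For b = 1000 both helpers return 796. For b = 65535 the widened version returns 52223, while the wrapped version returns 1023.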

Real world values

I regularly find myself needing a static variable that requires more than 16 bits of range. However when I look at these variables they almost never require the staggering range of a 32 bit variable. Instead 24 bits would do very nicely. Needless to say I am forced to allocate 32 bits even though I know that the most significant byte will never take on anything other than zero. This is particularly galling when these variables are stored in EEPROM – with its associated cost and long write times.
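Without a native type, one partial workaround for the storage side is to keep the variable as a uint32_t in RAM and write only three bytes to EEPROM. A rough sketch follows; the helper names u24_pack and u24_unpack are hypothetical, the byte order is an arbitrary choice, and the target-specific EEPROM driver calls are deliberately left out:

```c
#include <stdint.h>

/* Hypothetical helper: store the low 24 bits of v in three bytes,
 * least significant byte first. The top byte of v is discarded. */
void u24_pack(uint32_t v, uint8_t out[3])
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
}

/* Hypothetical helper: rebuild the value from three stored bytes. */
uint32_t u24_unpack(const uint8_t in[3])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16);
}
```

This saves the fourth byte in EEPROM, but not the 32 bit arithmetic in RAM, which is why it is only a workaround and not a substitute for a real 3 byte type.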

Taking these two together across all the 8/16 bit embedded systems out there, the cost in wasted instruction cycles, memory, stack size and energy must be truly staggering. We could probably save a power plant or two worldwide with all the energy being wasted!

So why don’t most compiler vendors support a 24 bit integer? I don’t know for sure, but I suspect it is some combination of:

  • No one has been asking for it.
  • They are more concerned with being C89 / C99 compliant than they are with being useful.
  • No one has ever implemented a compiler benchmark where support for a 3 byte integer would be useful.

If you happen to agree with me that a 3 byte integer would be very useful, then next time you see your friendly compiler vendor – complain (or at least point them to this blog). Who knows, change may yet come!

9 Responses to “Three byte integers”

  1. Uhmmmm says:

    In the first example of the temporary value overflowing, there's no reason (in theory) that the compiler couldn't determine that a 24 bit temporary is enough in many cases. Certainly for a constant coefficient, it should be able to determine it. Not that I know of any compiler that does it. But in theory it's possible without a new type.

  2. Tom Evans says:

    > Now a quick web search indicates that there may be the odd compiler out there that supports 3 byte integers – but the vast majority do not.

    I know of one that did, sort of. But they fixed that bug.

    The Renesas M32C/80 series has four 16-bit data registers (with R0 and R1 only supporting byte ops, and R0R2 and R1R3 supporting 32-bit ops) and two 24-bit “Address Registers”. It also has 24-bit SB, FP, PC and so on.

    One version of the compiler preferred to perform all pointer arithmetic in the 3-byte address registers. This was a good use of the resources (leaving the two 32-bit registers free for data ops). The next version of the compiler generated completely different code, performing all address arithmetic in the 4-byte R0R2 and R1R3 registers, and then copying to A0 and A1 for access. I don't know if this was to pass some “standard C test code” or to share compiler technology with other CPUs, but this is an example of “efficient use of 3-byte arithmetic” that was then removed.

    If a compiler did support “int24_t” then it would make the integral promotion rules a lot more interesting.

  3. Nigel Jones says:

    Very interesting, Tom. When I wrote the blog posting, I was thinking exclusively about pure integers. However, your comment reminds me that many processors have an address space that fits nicely into 24 bits – and thus that pointer types are often represented as 3 byte types. Thus if a compiler can support three byte pointer arithmetic, surely it would be easy enough to add support for three byte integers?

  4. hth313 says:

    I do not think it is all that easy. You need many more operations for integers compared to pointers. You need to add * / << >>, switch, signed compare and more cast rules.

    That is just to get it to work; then you may need to make it generate good code as well, and that may cost a lot more.

    I think you hit the real reasons with your original post: lack of people asking for it and lack of a code base (for benchmarking) that suggests it would be good. If done by a compiler vendor, it may just end up being regarded as something odd that is a potential source of portability problems.

    You will also end up in a funny situation with integral promotion. Will it be promoted to something smaller or bigger than (or even the same as) int24_t? Implementations will be all over the place.

    With short, you know int will be at least as large, and long may be equal to or larger than int. With int24_t, you would have no firm rule on which side of int it will be; add in standard integral promotion and you will be in for some interesting portability issues going between 16 and 32 bit ints.

    What about printf formatting? Should int24_t be promoted to int if it fits inside it, or long if it does not?

    But in general I agree with the article: a 24 bit size makes sense. 16 bits is sometimes too small, and you seldom need all the bits of a 32 bit type.

  5. Nigel Jones says:

    A very thoughtful comment hth313. I must say I'm always impressed by the quality of the comments I get on this blog. I take your point that adding support for 3 byte integers is non trivial. In fact I'm sure a compiler writer could probably come up with a few more objections. I guess my main thought is that an embedded C compiler is supposed to be a useful tool to allow us to get our jobs done. Instead it's sub-optimal for reasons that have virtually nothing to do with our industry. Given the size of the embedded market place, surely we deserve tools that are optimal for us? Why do we always have to play second fiddle to the general computing market place?

  6. Jens Bauer says:

    Another reason would be that working with 24-bit images would be straightforward.
    I often wish I had a 24-bit integer for this purpose, so you could index your image like this:

    uint32_t *destination;
    uint24_t *source;
    for(i = 0; i < width * height; i++)
    {
        destination[i] = source[i];
    }

    -That's a simple example. Of course I'm thinking about manipulating the data as well.
    But the above would make the source code a lot easier to look at, when comparing to the alternative!
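For comparison, the alternative without a 24-bit type might look roughly like the sketch below. The function name expand_rgb24 is hypothetical, and the code assumes that each packed pixel is stored least significant byte first, which the comment does not specify:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand n packed 3-byte pixels into 32-bit words.
 * Assumes the least significant byte of each pixel comes first. */
void expand_rgb24(uint32_t *destination, const uint8_t *source, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        const uint8_t *p = &source[3 * i];  /* start of pixel i */
        destination[i] = (uint32_t)p[0]
                       | ((uint32_t)p[1] << 8)
                       | ((uint32_t)p[2] << 16);
    }
}
```

The indexing by 3 * i and the manual shifts are the clutter that a native uint24_t pointer would hide.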

  7. Shameless coder says:

    Another good reason for (u)int24_t types is compression. You sometimes have a large amount of data which you know to be under 2^24 in value.

    This can happen in databases (where columns have certain constraints and can get really long), or in graphics (RGB) and so on.

  8. João Baptista says:

    Bitfields to the rescue! (Note: you NEED to use #pragma pack for this to have any effect.) Use this:

    #pragma pack(push,1)
    struct uint24_t { unsigned long v:24; };
    #pragma pack(pop)

    Well, it’s more clumsy (you need to append .v every time), but it seems to have some effect.
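A small sketch of how the bit-field wrapper behaves in arithmetic, assuming a compiler that accepts both #pragma pack and an unsigned long bit-field (each is implementation-defined in standard C). The helper u24_increment is hypothetical:

```c
#pragma pack(push, 1)
struct uint24_t { unsigned long v : 24; };  /* as suggested in the comment above */
#pragma pack(pop)

/* Hypothetical helper: increment a 24-bit counter held in the bit-field. */
void u24_increment(struct uint24_t *x)
{
    x->v = x->v + 1u;  /* the value stored back is reduced modulo 2^24 */
}
```

Incrementing a counter that holds 0xFFFFFF wraps it to 0, just as an unsigned 24-bit integer would. Whether sizeof(struct uint24_t) really comes out as 3, and whether the generated code is any better than for a 32-bit variable, depends on the compiler.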

  9. Emilia Sims says:

    It’s not portable, but the AVR port of GCC (of Arduino fame) supports 24-bit integers, as __int24 and __uint24.

    (This is just exposing something the compiler had to implement internally because code pointers are 3 bytes long on models with >64K of ROM.)
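As a sketch of how this could be used for the intermediate-result case from the article while keeping the code buildable elsewhere: the typedef name u24_t and the function scale_51_64 are hypothetical, and the fallback to uint32_t on other compilers is an assumption of this example, not something avr-gcc provides:

```c
#include <stdint.h>

#if defined(__AVR__) && !defined(__cplusplus)
typedef __uint24 u24_t;   /* 3-byte integer type offered by avr-gcc */
#else
typedef uint32_t u24_t;   /* fallback: wider than needed, same results here */
#endif

/* (b * 51) / 64 with a 24-bit intermediate where one is available. */
uint16_t scale_51_64(uint16_t b)
{
    return (uint16_t)(((u24_t)b * (u24_t)51u) / 64u);
}
```

Because the largest product, 65535 * 51, fits in 24 bits, both typedefs give the same answer; only the size of the intermediate differs.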
