Efficient C Tips #1 – Choosing the correct integer size

Sunday, June 15th, 2008 by Nigel Jones

From time to time I write articles for Embedded Systems Design magazine. A number of these articles have concentrated on how to write efficient C for an embedded target. Whenever I write these articles I always get emails from people asking me two questions:

1. How did you learn this stuff?
2. Is there somewhere I can go to learn more?

The answer to the first question is a bit long winded and consists of:
1. I read compiler manuals (yes, I do need a life).
2. I experiment.
3. Whenever I see a strange coding construct, I ask the author why they are doing it that way. From time to time I pick up some gems.
4. I think hard about what the compiler has to do in order to satisfy a particular coding construct. It’s really helpful if you know assembly language for this stage.

The answer to the second question is short: No!

To help rectify this, in my copious free time I’ll consider putting together a one day course on how to write efficient C for embedded systems. If this is of interest to you then please contact me.

In the interim, I’d like to offer up my first tip on how to choose the correct integer size.

In my experience in writing programs for both embedded systems and computers, I’d say that more than 95% of all the integers used by those programs could fit into an 8 bit variable. The question is, what sort of integer should one use in order to make the code the most efficient? Most computer programmers who use C will be puzzled by this question. After all, the data type ‘int’ is supposed to be an integer type of at least 16 bits that represents the natural word length of the target system. Thus, one should simply use the ‘int’ data type.

In the embedded world, however, such a trite answer will quickly get you into trouble – for at least three reasons.
1. For 8 bit microcontrollers, the natural word length is 8 bits. However, you can’t represent an ‘int’ data type in 8 bits and remain C99 compliant. Some compiler manufacturers eschew C99 compliance and make the ‘int’ type 8 bits (at least one PIC compiler does this), while others simply say: we are compliant, and if you are stupid enough to use an ‘int’ when another data type makes more sense then that’s your problem.
2. For some processors there is a difference between the natural word length of the CPU and the natural word length of the (external) memory bus. Thus the optimal integer type can actually depend upon where it is stored.
3. The ‘int’ data type is signed. Much, indeed most, of the embedded world is unsigned, and those of us that have worked in it for a long time have found that working with unsigned integers is a lot faster and a lot safer than working with signed integers, or even worse a mix of signed and unsigned integers. (I’ll make this the subject of another blog post).

Thus the bottom line is that using the ‘int’ data type can get you into a world of trouble. Most embedded programmers are aware of this, which is why when you look at embedded code, you’ll see a veritable maelstrom of user defined data types such as UINT8, INT32, WORD, DWORD etc. Although these should ensure that there is no ambiguity about the data type being used for a particular construct, they still don’t solve the problem of whether the data type is optimal or not. For example, consider the following simple code fragment for doing something 100 times:

TBD_DATATYPE i;

for (i = 0; i < 100; i++)
{
 // Do something 100 times
}

Please ignore all issues other than this one: what data type should the loop variable ‘i’ be? Well evidently, it needs to be at least 8 bits wide and so we would appear to have a choice of 8, 16, 32 or even 64 bits as our underlying data type. Now if you are writing code for a particular CPU then you should know whether it is an 8, 16, 32 or 64 bit CPU and thus you could make your choice based on this factor alone. However, is a 16 bit integer always the best choice for a particular 16 bit CPU? And what about if you are trying to write portable code that is supposed to be used on a plethora of targets? Finally, what exactly do we mean by ‘optimal’ or ‘efficient’ code?

I wrestled with these problems for many years before finally realizing that the C99 standards committee has solved this problem for us. Quite a few people now know that the C99 standard standardized the naming conventions for specific integer types (int8_t, uint8_t, int16_t etc). What isn’t so well known is that they also defined data types which are “minimum width” and also “fastest width”. To see if your compiler is C99 compliant, open up stdint.h. If it is compliant, as well as the uint8_t etc data types, you’ll also see at least two other sections – minimum width types and fastest minimum width types.

An example will help clarify the situation:

Fixed width unsigned 8 bit integer: uint8_t

Minimum width unsigned 8 bit integer: uint_least8_t

Fastest minimum width unsigned 8 bit integer: uint_fast8_t

Thus a uint8_t is guaranteed to be exactly 8 bits wide. A uint_least8_t is the smallest integer type guaranteed to be at least 8 bits wide. A uint_fast8_t is the fastest integer type guaranteed to be at least 8 bits wide.

So we can now finally answer our question. If we are trying to consume the minimum amount of data memory, then our TBD_DATATYPE should be uint_least8_t. If we are trying to make our code run as fast as possible then we should use uint_fast8_t.

Thus the bottom line is this. If you want to start writing efficient, portable embedded code, the first step you should take is to start using the C99 ‘least’ and ‘fast’ data types. If your compiler isn’t C99 compliant then complain until it is – or change vendors. If you make this change I think you’ll be pleasantly surprised at the improvements in code size and speed that you’ll achieve.

28 Responses to “Efficient C Tips #1 – Choosing the correct integer size”

  1. Uhmmmm says:

    You mentioned that a lot of programmers use their own defined types: UINT8, INT32, WORD, DWORD, and such. Before C99, I could understand that. But I see this still happen in projects that are using C99 compliant toolchains. Do you have any idea why people continue to do this instead of using the C99 types? Because I sure don’t – in some cases, they simply did “typedef uint32_t UINT32;”, so they certainly knew about the existence of the C99 type in the first place …

    • Francescomm says:

      If they have to port their code to a non-C99-compliant compiler, they just have to change their own UINT8, UINT32, etc. definitions. If they suddenly decide to make their UINT8 16 bits for some platform, it’s ugly and crazy but it may work. If they don’t want to include stdint.h, they don’t have to. It is like a layer of i-do-what-i-wantedness..

  2. Nigel Jones says:

    I think it’s partly in house coding standards and partly personal preference. The former is understandable; the latter is inexcusable. Personally I hate the syntax that the C99 committee came up with. However I still use it.

  3. Anonymous says:

    Hi

    Thanks for the tip, I didn't know about these optimized types. I like the C99 syntax; unfortunately many companies use their own data types. This makes code less readable, less portable and a source of bugs as well.

    Best Regards,
    Vlad

  4. Nigel Jones says:

    Indeed they do. The sooner that everyone starts using the C99 syntax the better IMHO.

  5. ashleigh says:

    I've written a lot of embedded code over the years, in some cases code that has to compile on many platforms (one case – the code compiles on at least 12 different processors, with toolchains ranging from 8 bit embedded, 16 bit, 32 bit, some using off the shelf compilers, some gcc, some borland c, some linux gcc…)

    In this case, C99 compliance can't be assured. The only way to go is a big fat file full of #if's that detect the particular compiler, and define the in-house standards int8u, int8s, int16u, etc etc (including a fast_int type as well).

    This works exceptionally well when you want extremely portable code and can't assume C99 for anything.

  6. Anonymous says:

    One thing overlooked in this enthusiasm for C99 is code validation: Let's say the compiler uses a two byte native operand for the "fast" type in the example. And let's say someone comes back and creates a bug by extending the loop limit to 300. Maybe the compiler notices this and generates a warning based on the compare, but probably it doesn't. And the code tests great. So now we think the code is solid. But, of course, it breaks as soon as we port it to "fast is 8 bits" – possibly in a non-obvious way.

    Mike Layton

    • Charles says:

      I would think the compiler should be able to generate a warning:
      we would be testing a variable of 8bit width against an out of bound constant
      shouldn’t it?

      • Kevin Granade says:

        One problem with this is that if you are targeting multiple platforms, it may compile on some and not on others at that point. You’ve asked for “at least 8 bits”, will it generate a warning or error on a given architecture even if uint_fast8_t is actually 16 bits? I’m pretty sure it wouldn’t, as the definition of uint_fast8_t on my system is “typedef unsigned char uint_fast8_t;”. If it happened to be a 16 bit variable instead, it would do bounds checking against a 16 bit variable, so could miss the constraint violation.

        Regardless, with gcc the following code did not generate an error or warning, and performed as expected, infinite loop:

        #include <stdint.h>

        int main()
        {
            uint_fast8_t i;

            for(i = 0; i < 3000; ++i)
            {
                ;
            }
            return 0;
        }

        • Anonymous says:

          The engineer who modified the hard coded constant should have been checking for side effects… however.

          The original cautious engineer probably would have written in the first place:

          // In some header file…
          #include <stdint.h> // for uint_fast8_t and UINT_FAST8_MAX
          #define MY_VALUE 3000 // It was 100. But I need more!

          int main()
          {
              uint_fast8_t i;

              // Pre condition
              #if MY_VALUE > UINT_FAST8_MAX
              #error "Houston, we have a problem."
              #endif

              for (i = 0; i < MY_VALUE; ++i) {

              }
              return 0;
          }

  7. Nigel Jones says:

    Mike – I like your observation. It is indeed a weakness of these C99 data types that I had not previously considered.

  8. Anonymous says:

    "Indeed they do. The sooner that everyone starts using the C99 syntax the better IMHO. 1/25/2010 2:33 PM"

    Nigel, I understand your enthusiasm, but C99 was adopted in 2000, your comment is about 10 years later – perhaps it's time to think of why everyone hasn't started using it and get back to coding.

    JW

  9. Steve Karg says:

    ashleigh: for my portable code using C99 standard integers, I try not to litter my code with #if's, but instead just have a ports/xx directory that includes a stdint.h for the compiler that is lacking C99 support for standard integers.

    Nigel: as for using fast/least standard integers in a project, I have two rules about optimizing that I learned while studying Extreme Programming:

    Rule 1: Don't do it.

    Rule 2: (for experts only). Don't do it yet – that is, not until you have a perfectly clear and unoptimized solution.

  10. Greg Nelson says:

    Steve Karg said (basically): "Extreme Programming teaches you 'don't optimize.'"

    In my opinion, this simply proves that Extreme Programming is for C# programmers on 64-bit 3GHz windows boxes with infinite swap space.

    In one of my most recent programs, I had to build my own version of "printf("%02d")" because the call to printf made my application no longer fit into the 256 bytes of available RAM. "Don't optimize" under these circumstances equates to "don't get the job done." Not an option.

    • Charles says:

      With 256 bytes, I’d go back to assembly.
      Like Nigel, I started with assembly and very little memory available and it was great fun; now our processors have evolved and our code base supports 3 (cached!) architectures, I find it ugly!
      What I found most difficult when writing assembly was that very often we were hitting undocumented bugs of the processor and had to resort to NOP insertion after lengthy discussion with the architecture team (“it’s the software!”, “it’s the hardware!”).

  11. Nigel Jones says:

    I couldn't agree more. Notwithstanding the issues you've raised, IMHO not using the optimizer leaves you vulnerable to not finding mistakes such as failing to declare appropriate variables as volatile, failure to notice that certain code is useless etc.

  12. KP says:

    As always nice post. Thanks.

    Btw, what does uint_fast8_t mean? I mean what does ‘fastest integer’ mean?

    • Nigel Jones says:

      In a nutshell, there are certain architectures in which there is an integer size which is faster to use than other sizes. For example, consider a 32 bit processor which handles 32 bit integers optimally and 8 bit integers inefficiently. If you declare a variable as being uint_fast8_t for that architecture, then the compiler will likely use a 32 bit integer. Conversely, if you specify the variable as being uint8_t, then you are forcing the compiler to use an inefficient data type. The bottom line is that use of the ‘fast’ types will usually give the best results. However, be aware of pitfalls. For example, suppose you implement code with a uint_fast8_t which actually requires the variable to have 16 bits of resolution. The code will work on one architecture where uint_fast8_t is implemented with at least 16 bits, but not on another architecture where it’s implemented as 8 bits.

  13. Markus says:

    Hello Nigel, thanks a lot for your articles!
    I’m currently brushing up on my C and didn’t know about the “least” and “fast” types before. I’m wondering what’s the point of having both uint8_t and uint_least8_t? uint8_t is supposed to give me an integer which definitely is 8 bits wide. uint_least8_t is supposed to give me the smallest type to store 8 bits. Are there architectures which can’t address single bytes and where uint_least8_t could be, say, a 16 bit word (in which case there couldn’t be a true uint8_t)?

    • Nigel Jones says:

      Apologies for the delay in approving your comment. I’ve been receiving bucket loads of spam and I’m afraid your comment got lost in the deluge. I understand you being perplexed. I regularly use uint8_t and uint_fast8_t but have struggled to understand the real value of uint_least8_t. The ‘least’ data types make a bit more sense with the larger word sizes. For example consider some code that you are writing for a 32 bit integer CPU. If you are concerned that it might be used at some point on a 16 bit integer machine then using uint_least32_t might make sense.

  14. Markus says:

    No worries. I see, so it’s more like a hint to the programmer. I like the idea. Thanks for explaining!

  15. mustafa eral says:

    Correct me if I am wrong, but assume my CPU is 32 bit (its word size is 4 bytes). For the for loop example you give, if I use an unsigned 8 bit type, then the compiler will try to mask the remaining 24 bits because the word size is 32 bits. Even if I use char it will consume 32 bits of memory with the upper 24 bits masked (zero padded), and this will require more instructions and reduce the code speed.

  16. […] All, Last night I read an article on choosing correct integer size https://embeddedgurus.com/stack-overf…-integer-size/. Before reading this article, I was unaware of these three keywords. These keywords are: 1) Fixed […]

  17. […] post is an article written by Nigel Jones on EmbeddedGurus. The original version of the article, from this link […]

  18. Honey says:

    Hello,

    Nice article. Dealing with fast data types makes so much sense. However, how should one deal with the type incompatibility that would require explicit casts in the code, where not adding them would result in MISRA violations?

    A simple example: for a 32 bit MCU, the compiler would define uint_fast8_t as a typedef of unsigned long, i.e. a 32 bit integer.

    So an assignment of a uint_fast8_t value to a uint8_t variable will require an explicit cast.

    Any suggestions on how to handle such a scenario?

    Thanks.

  19. Mikel says:

    Thanks a lot for such a helpful article!
    Would you please let me know how I can modify the code for a 2-D lookup table?
    Regards

  20. Richard Nelson says:

    I view fast_xx as internal data.
    Least_xx as internal product API.
    intxx_t as external to internal ABI, which has to be much more stringently enforced.
