embedded software boot camp

Computing your stack size

Sunday, March 1st, 2009 by Nigel Jones

Many of the folks that come to this blog by way of search engines do so because they are having problems with stack overflow. I’ve already given my take on the likely causes of a stack overflow. Today I’d like to offer some hints on a related topic – how to set about computing the stack size for your application. This is an extremely difficult problem, which can be approached in one of three ways – experimentally, analytically or randomly. The latter is by far the most common technique, which consists essentially of choosing a number and seeing whether it works! In an effort to reduce the use of the random approach, I’ll try and summarize the other two methods.

Experimentally

In the experimental method, a typically very large stack size is selected. The stack is then filled with an arbitrary bit pattern such as 0xFC. The code is then executed for a ‘suitable’ amount of time, and then the stack is examined to see how far it has grown. Most people will typically take this number, add a safety margin and call it the required stack size. The main advantage of this approach is that it’s easy to do (indeed many good debuggers have this feature built in to them). It also has the advantage of being ‘experimental data’. However, there are two big problems with this approach, which will catch the unwary.

The biggest single problem with the experimental approach is the implicit assumption that the experiment that is run is representative of the worst case conditions. What do I mean by the worst case conditions? Well, the maximum stack size occurs in an embedded system when an interrupt that uses the most stack size occurs at a point in the code that the foreground application is also using the maximum stack size. On the assumption that most interrupts are asynchronous to the foreground application, the problem should be clear. How exactly do you know after your testing whether or not the interrupt that uses the most stack size did indeed trigger at the worst (best?) possible moment? Thus even if your testing had 100% code coverage, it still isn’t possible to know for sure whether you have covered all possible scenarios. If, as is the normal case, you don’t even begin to approach full code coverage, it should be clear to you that testing tends to reveal the typical ‘worst-case’ condition, rather than the genuine worst case condition.

The second major issue with testing is that it tends to be done when the code is close to being completed, rather than when it is completed. The problem is that small changes in the source code can have a huge impact on the required stack size. For example, let’s say that during testing it is discovered that an interrupt service routine is taking so long to complete that another interrupt is being occasionally missed. A ‘quick fix’ is to simply enable interrupts in the long interrupt handler, so that the other interrupts can do their thing. This one line change can lead to a dramatic increase in stack usage. (If you aren’t cognizant of the stack usage of interrupt handlers, you should read this article I wrote).

Analytically

In the analytical approach, the idea is to examine the source code and from the analysis work out the maximum stack usage of the foreground application, and then to add to this the worst case interrupt handler usage. This is obviously a daunting task for anything but the simplest of applications. You will not be surprised to hear that computer programs have been written to perform this analysis. Indeed good quality linkers will now do this for you as a matter of course. Furthermore, my favorite third party tool, PC-Lint from Gimpel, will also now do this starting with version 9. However be warned that it takes a lot of work to set up PC-Lint to perform the analysis.

Although analysis can theoretically give an accurate answer, it does have several problems.

Recursion

It’s almost impossible for an analytical approach to compute the stack usage of a program that uses recursion. Indeed it’s because of the unbounded effect on stack size that recursion is a really bad idea in embedded systems. Indeed MISRA bans it, and I personally banned it about twenty years ago.

Indirect Function Calls

Pointers to functions are something that I use extensively and heartily recommend (for a discussion see this article I wrote). Although they don’t have a deleterious effect on stack size, they do make it quite difficult for analysis programs to track what is going on. Indeed PC-Lint cannot handle pointers to functions when it comes to computing stack usage. Thus if you use an analytical approach and you use pointers to functions, then make absolutely sure that the analysis program can track all the indirect calls.

Optimizers

Code optimization can play havoc with the stack usage. Some optimizations reduce stack usage (by e.g. placing function parameters in registers), while others can increase stack usage. I should note that it’s only third party tools that should be bamboozled by the optimizer. The linker that makes up part of the compilation package should be aware of everything that the compiler has done.

Complexity

Even if you have a linker that will compute stack usage, interpreting the output of the linker is always a daunting task. For example, the linker from IAR will compute your stack usage. However, it isn’t nice enough to simply say: You need 279 bytes of stack space. Instead you have to study the linker output carefully to glean the requisite information.

A Practical Approach

It’s clear from the above that it isn’t easy to determine the stack size for an application. So how exactly does one set about this in practice? Well here’s what I do.

  1. Locate the stack at the beginning / end of memory (depending upon how the stack grows) and place all variables at the other end of the memory. This essentially means that you are implicitly allowing the maximum amount of memory possible for the stack. Note that many good compilers / linkers will do this automatically for you.
  2. As a starting point, I allocate 10% of the available memory for stack use. If I know I will be using functions that are huge users of the stack (such as printf, scanf and their brethren), then I’ll typically set it to 20% of available memory.
  3. I set up the debug environment from day 1 to monitor and report stack usage. This way as I progress through the development process I get a very good feel for the application’s stack consumption. This also helps in spotting changes to the code that have big impacts on the required stack size.
  4. Once I have ‘all the code written’, I start to make use of the information in the linker report. The more tight I am on memory, the closer I examine the linker output. In particular, what I often find is that there is one and only one function call chain that leads to a stack usage that is much greater than all the other call chains. In which case, I look to see if I can restructure that call chain so as to bring the maximum stage usage more in line with the typical stack usage.

If you stumbled upon this blog courtesy of a search engine, then I hope you found the above useful. I invite you to check out some of my other posts, which you may find useful. If you are a regular reader, then as always, thanks for stopping by.

Home

15 Responses to “Computing your stack size”

  1. Michael Barr says:

    Since you can never “know” for certain how deep your stack might go (excepting simple programs with no recursion or pointers to functions), it is also smart to add stack checking to your watchdog task. I plan to blog about this at http://www.embeddedgurus.net/barr-code shortly.

  2. Nigel Jones says:

    I look forward to hearing what you have to say on this Mike.

  3. Alan Bowens says:

    Nice article on function pointers. I’ve been using exactly the same state machine approach for a while now, but you give a cogent explanation that I can hand to newbies to quickly get them up to speed.

  4. GregK says:

    My advise:Is worth to keep your compiler options constant during development. Small change from -O2 to -O3 could cause big different in stack(s) consumptions.

  5. Nigel Jones says:

    GregK is quite correct. I touched upon this topic here. Like many other developers, I build both a debug and a release target. With the debug target, I typically have optimization set to a minimum. With the release target, I always have optimization set to the maximum. One of the reasons is because changing it can dramatically affect your stack size. Incidentally, I have had the odd occasion in which I have had to switch from speed to size optimization. As and when you do this, it’s essential to recompute your stack size.

  6. Scotty says:

    Hey, thanks for the tips. I wish I would’ve come across this at the beginning of the project.

  7. Nigel Jones says:

    I'm glad you found them useful.

  8. balaji says:

    Hi Nigel,

    Can you give more information on examining the linker output when we are tight on memory.

    • Nigel Jones says:

      Sure. All good linkers allow you to request a call tree report. Different vendors call it different things. In IAR’s case, they call it a “static overlay map”. What this does is show you all the functions called by a function and the size of the stack at each stage of the call. Here is a link to an example of the output.

      By carefully examining this information it is possible to very accurately determine your stack usage. It’s also very useful when you are in a pinch for determining where to restructure your code so as to minimize stack usage. For example, you may decide to disable a stack consuming interrupt in the worst case function so as to minimize the peak stack usage.
      I hope this helps.

  9. GADDOUR zied says:

    I begin to develop a program for LPC2129 with IAR EWARM 5.4
    I found that there is a file. ICF when I must to configure for that the linker
    to the IAR generates. hex specific to my application.
    i have problem to understand how i can configure its parameters.

    i found an example of file .ICF in an example of project with IAR.

    program_in_flash.icf file in leds example provided with iar contains these
    details.

    /*-Memory Regions-*/
    define symbol __ICFEDIT_region_ROM_start__ = 0×00000044;
    define symbol __ICFEDIT_region_ROM_end__ = 0x0003FFFF;
    define symbol __ICFEDIT_region_RAM_start__ = 0×40000040;
    define symbol __ICFEDIT_region_RAM_end__ = 0x40003FFF;
    /*-Sizes-*/
    define symbol __ICFEDIT_size_cstack__ = 0×200;
    define symbol __ICFEDIT_size_svcstack__ = 0×0;
    define symbol __ICFEDIT_size_irqstack__ = 0×100;
    define symbol __ICFEDIT_size_fiqstack__ = 0×0;
    define symbol __ICFEDIT_size_undstack__ = 0×0;
    define symbol __ICFEDIT_size_abtstack__ = 0×0;
    define symbol __ICFEDIT_size_heap__ = 0×200;

    according datasheet ROM_start = 0×00000000. why is declared 0×00000044 ?
    same RAM_start = 0×40000000. I don’t understand why is declared 0×40000040 ?

    and how to initialize the sizes of the stacks?
    what criteria of choice are initialized these stacks?
    I find in other examples:

    they are different parameters of the first example

    /*-Sizes-*/
    define symbol __ICFEDIT_size_cstack__ = 0×2000;
    define symbol __ICFEDIT_size_svcstack__ = 0×100;
    define symbol __ICFEDIT_size_irqstack__ = 0×100;
    define symbol __ICFEDIT_size_fiqstack__ = 0×100;
    define symbol __ICFEDIT_size_undstack__ = 0×100;
    define symbol __ICFEDIT_size_abtstack__ = 0×100;
    define symbol __ICFEDIT_size_heap__ = 0×8000;

    I do not understand what I put as the size for each stack

    • Nigel Jones says:

      I haven’t worked with the LPC2129. I will guess that ROM_start__ is defined to be at 0×44 because the first 0×44 bytes are reserved for the interrupt vector table. I don’t know about the RAM – possibly that the first 0×40 bytes are actually consumed by special function registers? As for what to set the various stack sizes to. Well clearly in the first example a lot of the stacks aren’t being used. Based on their names it appears that one stack is reserved for a fast interrupt (for example). If you aren’t using a fast interrupt then you don’t need stack space for it. Conversely if you are using a fast interrupt then you will need to reserve stack space for it. The amount depends on what the fast interrupt handler does.

  10. Ashleigh says:

    In everything I do, I use the approach of (1) taking a best guess based on program structure and general knowledge of the program (which strange enough ends up around 10% to 20% of RAM). Then (2) examine linker output, though not quite as systematically as I should. Then (3) I use the method of filling RAM with a known value and checking RAM contents using the debugger after running a while, to see how much stack was actually used. And then finally, I place a WORD with a known value in it, located by linker directives directly below the stack (assuming stack grows down). My background executive task then examines this word, checking for the known correct value, and does this about once / second. The placement of a known value using a const __root __noinit declaration is very easy using IAR, as is the checking. This method gives an ongoign runtime check. A failure of the check leads to a reboot (and I usually also store a forced-reboot / stack-overflow-detected counter in non-volatile memory so that it can be checked later if needed). This last step might be paranoia but I regard it as a good practice in something that must run for years.

    • Dushyant says:

      I follow the similar approach but at the last i have run time check a bit differently as mentioned by Ashleigh. In my main loop I use to keep checking the SP against the (stack base address in RAM – required stack usage in current task). If it doesn’t match i request software reboot after logging the cause as stack mismatch. This log is monitored through the development cycle & ever after release in logging information. I find it easy way to know my stack usage.

      Ashleigh says:
      January 4, 2013 at 7:35 pm

      In everything I do, I use the approach of (1) taking a best guess based on program structure and general knowledge of the program (which strange enough ends up around 10% to 20% of RAM). Then (2) examine linker output, though not quite as systematically as I should. Then (3) I use the method of filling RAM with a known value and checking RAM contents using the debugger after running a while, to see how much stack was actually used.

  11. Anonymous says:

    What is CStack and Rstack in the call-tree ?

  12. [...] Nigel Jones, “Computing Your Stack Size: Stack Overflow” 2. John Regehr, “Say no to stack overflow,” EE Times Design, 2004.3. Carnegie Mellon [...]

Leave a Reply