embedded software boot camp

Computing your stack size

March 1st, 2009 by Nigel Jones

Many of the folks that come to this blog by way of search engines do so because they are having problems with stack overflow. I’ve already given my take on the likely causes of a stack overflow. Today I’d like to offer some hints on a related topic – how to set about computing the stack size for your application. This is an extremely difficult problem, which can be approached in one of three ways – experimentally, analytically or randomly. The latter is by far the most common technique, which consists essentially of choosing a number and seeing whether it works! In an effort to reduce the use of the random approach, I’ll try and summarize the other two methods.

Experimentally

In the experimental method, a typically very large stack size is selected. The stack is then filled with an arbitrary bit pattern such as 0xFC. The code is then executed for a ‘suitable’ amount of time, and then the stack is examined to see how far it has grown. Most people will typically take this number, add a safety margin and call it the required stack size. The main advantage of this approach is that it’s easy to do (indeed many good debuggers have this feature built into them). It also has the advantage of being ‘experimental data’. However, there are two big problems with this approach, which will catch the unwary.
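
The fill-and-measure idea can be sketched in C. This is a minimal illustration only, assuming a descending stack and using a static array as a stand-in for the real stack region; a real implementation would paint the actual stack from the startup code, before main() runs:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE  1024u
#define FILL_BYTE   0xFCu

/* Stand-in for the real stack region; normally this would be the
   linker-defined stack area. */
static uint8_t stack_area[STACK_SIZE];

/* Paint the whole stack with the fill pattern (run before main()). */
void stack_paint(void)
{
    for (size_t i = 0u; i < STACK_SIZE; ++i)
    {
        stack_area[i] = FILL_BYTE;
    }
}

/* High-water mark: count untouched fill bytes from the low end.
   A descending stack grows toward low addresses, so any byte there
   still holding the fill pattern has never been used. */
size_t stack_high_water(void)
{
    size_t unused = 0u;

    while ((unused < STACK_SIZE) && (stack_area[unused] == FILL_BYTE))
    {
        ++unused;
    }
    return STACK_SIZE - unused;    /* bytes actually used */
}
```

A debugger with this feature built in is doing essentially the same scan. Bear in mind that the number it reports is only the deepest excursion seen so far, not the true worst case.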

The biggest single problem with the experimental approach is the implicit assumption that the experiment that is run is representative of the worst case conditions. What do I mean by the worst case conditions? Well, the maximum stack usage in an embedded system occurs when the interrupt that consumes the most stack fires at precisely the point where the foreground application is also using the most stack. On the assumption that most interrupts are asynchronous to the foreground application, the problem should be clear. How exactly do you know after your testing whether or not the interrupt that uses the most stack did indeed trigger at the worst (best?) possible moment? Thus even if your testing had 100% code coverage, it still isn’t possible to know for sure whether you have covered all possible scenarios. If, as is the normal case, you don’t even begin to approach full code coverage, it should be clear to you that testing tends to reveal the typical ‘worst-case’ condition, rather than the genuine worst case condition.

The second major issue with testing is that it tends to be done when the code is close to being completed, rather than when it is completed. The problem is that small changes in the source code can have a huge impact on the required stack size. For example, let’s say that during testing it is discovered that an interrupt service routine is taking so long to complete that another interrupt is being occasionally missed. A ‘quick fix’ is to simply enable interrupts in the long interrupt handler, so that the other interrupts can do their thing. This one line change can lead to a dramatic increase in stack usage. (If you aren’t cognizant of the stack usage of interrupt handlers, you should read this article I wrote).

Analytically

In the analytical approach, the idea is to examine the source code and from the analysis work out the maximum stack usage of the foreground application, and then to add to this the worst case interrupt handler usage. This is obviously a daunting task for anything but the simplest of applications. You will not be surprised to hear that computer programs have been written to perform this analysis. Indeed good quality linkers will now do this for you as a matter of course. Furthermore, my favorite third party tool, PC-Lint from Gimpel, will also now do this starting with version 9. However be warned that it takes a lot of work to set up PC-Lint to perform the analysis.

Although analysis can theoretically give an accurate answer, it does have several problems.

Recursion

It’s almost impossible for an analytical approach to compute the stack usage of a program that uses recursion. It’s precisely this unbounded effect on stack size that makes recursion a really bad idea in embedded systems. MISRA bans it, and I personally banned it about twenty years ago.

Indirect Function Calls

Pointers to functions are something that I use extensively and heartily recommend (for a discussion see this article I wrote). Although they don’t have a deleterious effect on stack size, they do make it quite difficult for analysis programs to track what is going on. Indeed PC-Lint cannot handle pointers to functions when it comes to computing stack usage. Thus if you use an analytical approach and you use pointers to functions, then make absolutely sure that the analysis program can track all the indirect calls.

Optimizers

Code optimization can play havoc with the stack usage. Some optimizations reduce stack usage (by e.g. placing function parameters in registers), while others can increase stack usage. I should note that it’s only third party tools that may be bamboozled by the optimizer. The linker that makes up part of the compilation package should be aware of everything that the compiler has done.

Complexity

Even if you have a linker that will compute stack usage, interpreting the output of the linker is always a daunting task. For example, the linker from IAR will compute your stack usage. However, it isn’t nice enough to simply say: You need 279 bytes of stack space. Instead you have to study the linker output carefully to glean the requisite information.

A Practical Approach

It’s clear from the above that it isn’t easy to determine the stack size for an application. So how exactly does one set about this in practice? Well here’s what I do.

  1. Locate the stack at the beginning / end of memory (depending upon how the stack grows) and place all variables at the other end of the memory. This essentially means that you are implicitly allowing the maximum amount of memory possible for the stack. Note that many good compilers / linkers will do this automatically for you.
  2. As a starting point, I allocate 10% of the available memory for stack use. If I know I will be using functions that are huge users of the stack (such as printf, scanf and their brethren), then I’ll typically set it to 20% of available memory.
  3. I set up the debug environment from day 1 to monitor and report stack usage. This way as I progress through the development process I get a very good feel for the application’s stack consumption. This also helps in spotting changes to the code that have big impacts on the required stack size.
  4. Once I have ‘all the code written’, I start to make use of the information in the linker report. The tighter I am on memory, the closer I examine the linker output. In particular, what I often find is that there is one and only one function call chain that leads to a stack usage that is much greater than all the other call chains. In which case, I look to see if I can restructure that call chain so as to bring the maximum stack usage more in line with the typical stack usage.

If you stumbled upon this blog courtesy of a search engine, then I hope you found the above useful. I invite you to check out some of my other posts, which you may find useful. If you are a regular reader, then as always, thanks for stopping by.


Effective C Tips #2 – Defining buffer sizes

February 22nd, 2009 by Nigel Jones

This is the second in a series of tips on writing what I call effective C. Today I’m addressing something that just about every embedded system has – a buffer whose length is a power of two.

In order to make many buffer operations more efficient, it is common practice to make the buffer size a power of two so that simple masking operations may be performed on them, rather than explicit length checks. This is particularly true of communications buffers where data are received under interrupt. As a result, it is common to see code that looks something like this:

#define RX_BUF_SIZE (32)
static uint8_t Rx_Buf[RX_BUF_SIZE]; /* Receive buffer */

__interrupt void RX_interrupt(void)
{
 static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
 uint8_t rx_char;

 rx_char = HW_REG;          /* Get the received character */

 RxHead &= RX_BUF_SIZE - 1; /* Mask the offset into the buffer */
 Rx_Buf[RxHead] = rx_char;  /* Store the received char */
 ++RxHead;                  /* Increment offset */
}

The first thing I do to make this code more flexible, is to allow the size of the buffer to be overridden on the command line. Thus my declaration for the buffer size now looks like this:

#ifndef RX_BUF_SIZE
 #define RX_BUF_SIZE (32)
#endif

This is a useful extension because it allows me to control the resources used by the code without having to edit the code per se. However, this flexibility comes at a cost. What happens if someone were to inadvertently pass a non-power-of-2 buffer size on the command line? Well as it stands – disaster. However, the fix is quite easy.

#ifndef RX_BUF_SIZE
 #define RX_BUF_SIZE (32)
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
 #error Rx buffer size is not a power of 2
#endif

What I’ve done is define another manifest constant, RX_BUF_MASK, to be equal to one less than the buffer size. I then test using a bit-wise AND of the two manifest constants. If the result is non-zero, then evidently the buffer size is not a power of two and compilation is halted by use of the #error statement. If you aren’t familiar with the #error statement, you’ll find this article I wrote a few years back to be helpful.
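
The same bit trick works outside the preprocessor too. As a minimal sketch (my illustration, not part of the original tip), a run-time check for a usable power-of-2 size looks like this:

```c
#include <stdbool.h>

/* n & (n - 1) clears the lowest set bit, so the result is zero
   exactly when n has at most one bit set. Requiring n >= 2 also
   rejects the degenerate sizes 0 and 1. */
static bool is_power_of_two_size(unsigned int n)
{
    return (n >= 2u) && ((n & (n - 1u)) == 0u);
}
```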

Although this is evidently a big improvement, it still isn’t quite good enough. To see why, consider what happens if RX_BUF_SIZE is zero. Zero passes the check (0 & -1 is 0), yet a zero-length buffer is plainly useless. Now most C90 compliant compilers will complain about declaring an array with zero length. However, some compilers accept zero-length arrays as an extension, GNU compilers in particular. Thus, we also need to protect against this case. Furthermore, as Yevheniy was kind enough to point out in the comments, we also have to protect against a buffer size of 1 (as 1 & 0 = 0, so 1 passes the check as well). So we now get:

#ifndef RX_BUF_SIZE
 #define RX_BUF_SIZE (32)
#endif
#if RX_BUF_SIZE < 2
 #error Rx buffer must be a minimum length of 2
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)
#if ( RX_BUF_SIZE & RX_BUF_MASK )
 #error Rx buffer size is not a power of 2
#endif

As a final comment, note that the definition of RX_BUF_MASK has an additional benefit in that it can be used in the mask operation in place of (RX_BUF_SIZE - 1), so that my interrupt handler now becomes:

__interrupt void RX_interrupt(void)
{
 static uint8_t RxHead = 0; /* Offset into Rx_Buf[] where next character should be written */
 uint8_t rx_char;

 rx_char = HW_REG;          /* Get the received character */

 RxHead &= RX_BUF_MASK;     /* Mask the offset into the buffer */
 Rx_Buf[RxHead] = rx_char;  /* Store the received char */
 ++RxHead;                  /* Increment offset */
}

So is this effective C? I think so. It’s efficient, it’s flexible and it’s robustly protected against the sorts of bone-headed mistakes that we all make from time to time.
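
For completeness, the read side of such a buffer uses RX_BUF_MASK in exactly the same way. The sketch below is my addition rather than part of the original tip; the RxTail variable and the empty test are assumptions, and note that this simple head/tail scheme cannot distinguish a completely full buffer from an empty one:

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef RX_BUF_SIZE
 #define RX_BUF_SIZE (32)
#endif
#define RX_BUF_MASK  (RX_BUF_SIZE - 1)

static uint8_t Rx_Buf[RX_BUF_SIZE];
static volatile uint8_t RxHead = 0; /* advanced by the ISR */
static uint8_t RxTail = 0;          /* advanced by the reader */

/* Fetch one character if available. Returns false when empty. */
bool rx_get_char(uint8_t *p_char)
{
    RxTail &= RX_BUF_MASK;               /* mask, mirroring the ISR */
    if (RxTail == (RxHead & RX_BUF_MASK))
    {
        return false;                    /* nothing to read */
    }
    *p_char = Rx_Buf[RxTail];
    ++RxTail;
    return true;
}
```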


Efficient C Tips #6 – Don’t use the ternary operator

February 18th, 2009 by Nigel Jones

I have to confess that I like the ternary operator. K&R obviously liked it, as it is heavily featured in their seminal work. However after running experiments on a wide range of compilers I have concluded that with the optimizer turned on, you are better off with a simple if-else statement. Thus next time you write something like this:

y = (a > b) ? c : d;

be aware that as inelegant as it is in comparison, this will usually compile to better code:

if (a > b)
{
 y = c;
}
else
{
 y = d;
}

I find this frustrating, as I’ve consumed 8 lines doing what is more easily and elegantly performed in 1 line.

I can’t say that I have any particular insight as to why the ternary operator performs so poorly. Perhaps if there is a compiler writer out there, they could throw some light on the matter?


Horner’s rule addendum

February 15th, 2009 by Nigel Jones

A few weeks ago I wrote about using Horner’s rule to evaluate polynomials. Well today I’m following up on this posting because I made a classic mistake when I implemented it. On the premise that one learns more from one’s mistakes than one’s successes, I thought I’d share it with you. First, some background. I had some experimental data on the behavior of a sensor against temperature. I needed to be able to fit a regression curve through the data, and so after some experimentation I settled on a quadratic polynomial fit. This is what the data and the curve looked like:

On the face of it, everything looks OK. However, if you look carefully, you will notice two things:

  • The bulk of the experimental data cover the temperature range of 5 – 48 degrees.
  • There is a very slight hook on the right hand side of the graph

So where’s the mistake? Well actually I made two mistakes:

  • I assumed that my experimental data covered the entire expected operating temperature range.
  • I failed to check at run time that the temperature was indeed bounded to the experimental input range.

Why is this important? Well, what happened, was that in some circumstances the sensor would experience temperatures somewhat higher than I expected when the experimental data was gathered, e.g. 55 degrees. Well that doesn’t sound too bad – until you take the polynomial and extend it out a bit. This is what it looks like:

You can see that at 55 degrees, the polynomial generates a value which is about the same as at 25 degrees. Needless to say, things didn’t work too well! So what advice can I offer?

  • Ensure that when fitting a polynomial to experimental data, that the experimental data covers all the possible range of values that can be physically realized.
  • Always plot the polynomial to see how it performs outside your range of interest. In particular, if it ‘takes off’ in a strange manner, then treat it very warily.
  • At run time, ensure that the data that you are feeding into the polynomial is bounded to the range over which the polynomial is known to be valid.
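
Putting the last bullet into code, a clamped Horner evaluation might look like the sketch below. The coefficients and the 5 to 48 degree range are placeholders standing in for the real calibration data, not the actual fit from the post:

```c
/* Hypothetical quadratic coefficients: y = C2*t^2 + C1*t + C0.
   These are placeholders, not the real sensor calibration. */
#define C2  (-0.02f)
#define C1  (1.5f)
#define C0  (3.0f)

/* Range over which the fit is known to be valid. */
#define T_MIN  (5.0f)
#define T_MAX  (48.0f)

static float poly_eval(float t)
{
    /* Bound the input to the calibrated range first. */
    if (t < T_MIN)
    {
        t = T_MIN;
    }
    else if (t > T_MAX)
    {
        t = T_MAX;
    }

    /* Horner's rule: (C2 * t + C1) * t + C0 */
    return ((C2 * t) + C1) * t + C0;
}
```

Clamping is the simplest policy; depending on the system, flagging an out-of-range reading as a fault may be more appropriate.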

The maddening thing about this for me was that I ‘learned’ this lesson about polynomial fits many years ago. I just chose to ignore it this time.

Before I leave this topic, I’d like to offer one other insight. If you search for Horner’s rule, you’ll find a plethora of articles. The more detailed ones will opine on topics such as evaluation stability, numeric overflow issues and so on. However, it’s rare that you’ll find this sort of practical advice on polynomial evaluation posted. I think it’s because we tend to get wrapped up in the details of the algorithm while losing sight of the underlying mathematics of what is going on.

The bottom line: the next time you find a neat algorithm posted on the web for ‘solving’ your problem, take a big step back and think hard about what is really going on and what the inherent weaknesses are in what you are doing.

Effective C Tips #1 – Using vsprintf()

February 10th, 2009 by Nigel Jones

I’ve been running a series of tips on Efficient C for a while now. I thought I’d broaden the scope by also offering a series of tips on what I call Effective C. These will be tips that, while not necessarily allowing you to write tighter code, will allow you to write better code. I’m kicking the series off with the rarely used standard library function vsprintf(). First, some preamble…

One of the perverse things I tend to do is look through the C standard library and examine functions that on the face of it seem, well, useless. I do this because I think the folks that worked on this stuff were in general very smart and thus had a very good reason for including some of these ‘weird’ functions. One of these is the function vsprintf(). If you go and look up the definition of this function, e.g. here, then you’ll find a rather brain ache inducing description. Now back when I was a lad I’d look at descriptions such as this and simply shrug and walk away. However, about ten years ago I started to make a concerted effort to see if a function such as vsprintf() has a real benefit in embedded systems. Here’s what I discovered in this case:

If you are working on a product that contains a VFD or LCD, then you will almost certainly have code that contains a function for writing a string to the display at a specified position. For example:

static void display_Write(uint8_t row, uint8_t col, char const * buf)
{
 /* Send formatted string to display - hardware dependent*/
}

Then you will also have a plethora of functions that essentially do the same thing. That is, they accept some data, allocate a buffer on the stack, use sprintf() to write formatted data into the buffer, and then call the function that actually writes the buffer to the display at the required position. Here are some examples:

void display_Temperature(float ambient_temperature)
{
 char buf[10];

 sprintf(buf,"%5.2f", ambient_temperature);
 display_Write(6, 8, buf);
}

...

void display_Time(int hours, int minutes, int seconds)
{
 char buf[12];

 sprintf(buf,"%02d:%02d:%02d", hours, minutes, seconds);
 display_Write(3, 9, buf);
}

There’s nothing really wrong with this approach. However, there is a better way, courtesy of vsprintf().

What one does is to modify display_Write() to take a variable length argument list. Then within display_Write() use vsprintf() to process the variable length argument list and to generate the requisite string. The basic structure for the function is as follows:

void display_Write(uint8_t row, uint8_t column, char const * format, ...)
{
 va_list  args;
 char  buf[MAX_STR_LEN];

 va_start(args, format);
 vsprintf(buf, format, args); /* buf contains the formatted string */

 /* Send formatted string to display - hardware dependent */

 va_end(args);                /* Clean up. Do NOT omit */
}

My objective here is not to explain how to use variadic arguments or indeed how vsprintf() works – there are dozens of places on the web that will do that. Instead I’m interested in showing you the benefit of this approach. The display_Write() function has evidently become more complex; however, the functions that call display_Write() have become dramatically simpler, as they are now just:

void display_Temperature(float ambient_temperature)
{
 display_Write(6, 8, "%5.2f", ambient_temperature);
}

void display_Time(int hours, int minutes, int seconds)
{
 display_Write(3, 9, "%02d:%02d:%02d", hours, minutes, seconds);
}

Is this more Effective code? I think so, for the following reasons.

  • The higher level functions are now much cleaner and easier to follow.
  • All the heavy lifting is localized in one place, which typically dramatically reduces the probability of errors.

Finally, you’ll typically end up with a nice reduction in code size (even though this wasn’t my objective). All in all, not bad for one obscure function.
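
One caveat worth adding: vsprintf() performs no bounds checking, so a format that expands beyond MAX_STR_LEN will overrun buf. If your library supports C99, vsnprintf() bounds the write. Below is a sketch of the same function with that substitution; MAX_STR_LEN is an assumed value, and the hardware write is stubbed out into a buffer here purely so the behavior can be observed:

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_STR_LEN (21)    /* assumed: 20-column display plus NUL */

/* Stub for the hardware write: record the rendered text instead. */
static char last_line[MAX_STR_LEN];

void display_Write(uint8_t row, uint8_t column, char const * format, ...)
{
    va_list args;
    char    buf[MAX_STR_LEN];

    (void)row;
    (void)column;

    va_start(args, format);
    /* vsnprintf() truncates instead of overrunning buf */
    (void)vsnprintf(buf, sizeof buf, format, args);
    va_end(args);                /* Clean up. Do NOT omit */

    /* Send buf to the display at (row, column) - hardware dependent.
       Stubbed here: */
    strcpy(last_line, buf);
}
```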
