Archive for the ‘Compilers / Tools’ Category

Setting a bad example – part 2

Thursday, August 5th, 2010 Nigel Jones

This is the second part in a series of articles in which I’m ‘highlighting’ what I consider to be lousy code provided by IAR in their NXP LPC1768 development kit.

The following function is taken from the LCD_Demo project.

/*************************************************************************
 * Function Name: Dly100us
 * Parameters: void *arg
 * Return: void
 *
 * Description: Delay [100us]
 *        
 *************************************************************************/
void Dly100us(void *arg)
{
volatile Int32U Dly = (Int32U)arg, Dly100;
 for(;Dly;Dly--)
  for(Dly100 = 500; Dly100; Dly100--);
}

An example of the invocation of this function is:

#define GLCD_RESET_DLY                       50     // 5ms
...
Dly100us((void *)GLCD_RESET_DLY);

Quite frankly there is so much to offend one about this function that I’m not too sure where to begin.

Software Delay Loops

Anyone who works on embedded systems finds out very quickly that software delay loops are a disaster. Now for low-end processors, it is possible to craft a working delay function provided one knows the CPU clock frequency. However in this case, we are dealing with a Cortex-M3 processor complete with the usual programmable PLL clock multiplier. Throw in a programmable number of wait states for Flash access, a 3-stage pipeline and of course different compiler optimization settings, and it quickly becomes almost impossible to craft a software delay loop that can withstand changes in its environment. The author of this code has made no attempt to accommodate these sorts of changes, and as a result this code will fail as soon as almost anything gets changed.

What is particularly bizarre about this example is that the processor has a huge number of hardware timers available – and yet they are not used.
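To make the point concrete, here is a sketch (my code, not IAR's) of the usual alternative: poll a free-running counter rather than burning a calibrated number of loop iterations. On the target, ticks() would read one of the LPC17xx hardware timer registers; here it is simulated with a plain counter so the logic can be compiled and exercised anywhere.

```c
#include <stdint.h>

/* Simulated free-running counter.  On real hardware this function
 * would instead read a memory-mapped timer register (e.g. a timer
 * configured to tick every 100us), making the delay independent of
 * CPU clock, wait states, pipeline and optimizer settings. */
static uint32_t simulated_time;

static uint32_t ticks(void)
{
    return simulated_time++;
}

void delay_ticks(uint32_t n)
{
    uint32_t start = ticks();

    /* Unsigned subtraction yields the elapsed count even if the
     * counter wraps around. */
    while ((uint32_t)(ticks() - start) < n)
    {
        ;   /* idle; kick the watchdog here if the delay is long */
    }
}
```

The timer setup itself is omitted; the point is simply that the delay now depends on a hardware clock rather than on how the compiler happens to translate an empty loop.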

void * arguments

Examination of the code shows that the function is declared as taking a (void *) argument. In crude terms, the function expects to receive a pointer to anything. However as the example invocation shows, it is actually being passed an integer constant that has been cast to (void *). Furthermore, within the function, the code then casts the (void *) argument back to Int32U. One wonders why on earth Dly100us() isn’t simply defined to accept an Int32U argument. Worse, since just about anything can be cast to (void *), the author has ensured that almost any parameter coding construct will be accepted by the compiler – including egregious errors.
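For the avoidance of doubt, here is how the same interface could be declared without the pointer games. The body is instrumented with a counter (my addition, for illustration only) so the loop structure can be checked; the underlying delay technique is of course still the flawed busy loop.

```c
#include <stdint.h>

typedef uint32_t Int32U;           /* assuming IAR's Int32U is 32-bit unsigned */

#define GLCD_RESET_DLY  50u        /* 5 ms, as in the original */

static Int32U outer_iterations;    /* instrumentation for this sketch only */

/* Type-safe version of the same interface: no (void *), no casts. */
void Dly100us(Int32U count)
{
    volatile Int32U inner;

    for ( ; count != 0u; count--)
    {
        for (inner = 500u; inner != 0u; inner--)
        {
            ;   /* burn time (still calibration-dependent) */
        }
        outer_iterations++;
    }
}
```

The call site then loses its cast – Dly100us(GLCD_RESET_DLY); – and passing a pointer by mistake becomes a compile-time error instead of a silently accepted one.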

Formatting

Within the limits of HTML formatting I have attempted to recreate the indentation used by the author. That is, the ‘volatile’ line starts in the left-most column and subsequent lines are indented two spaces. There are no blank lines. Now while I’m aware that indentation and the use of white space are very much a question of personal preference, IMHO the author’s style leaves very much to be desired. In particular I found the code hard to read because of the lack of obvious indentation, and also the lack of blank lines between e.g. the variable declaration and the actual code.

Loops without braces

Neither of the for loops is enclosed in braces. While I’m aware this is perfectly legal syntax, just about every coding standard I have ever seen, together with MISRA, requires that all loops have braces. This isn’t a case of the language police enforcing their prejudices upon you: requiring braces prevents a well-documented class of maintenance errors. As such it’s the smart thing to do.
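The classic maintenance error in question looks like this (a contrived sketch of my own): a second statement is later added under an unbraced loop, indented to look like part of the body, when in fact only the first statement is repeated.

```c
int demo(void)
{
    int i;
    int total = 0;
    int count = 0;

    for (i = 0; i < 3; i++)
        total += i;
        count++;        /* indentation lies: this runs once, after the loop */

    (void)total;        /* silence unused-value warnings in this sketch */

    return count;       /* returns 1, not the 3 the indentation suggests */
}
```

Mandatory braces make the loop body unambiguous, so this edit either goes inside the braces or visibly outside them.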

Putting a semi-colon at the end of a for loop

In a similar vein to the previous remark, putting a semi-colon at the end of a for loop statement is just asking for trouble.  If you really can’t find the energy to add a couple of braces then at least do this:

for(Dly100 = 500; Dly100; Dly100--)
    ;

It’s ugly as anything – but at least its existence is obvious.

Poor function name

You will note that the function is called Dly100us(). This is a complete misnomer. The function is actually intended to delay N * 100us where N is the value of the argument passed to the function. Thus the function is incorrectly and misleadingly named.  Most coding standards require that functions have meaningful names.

Poor variable names

The author has used two variables that are very similarly named – Dly and Dly100. Maybe it’s just me, but I find code much harder to read when variable names are similar. Code that is hard to read is hard to maintain.

Poor variable data types

If we ignore the (void *) mess described above, it’s apparent that the argument to the function is intended to be an Int32U. However does this make sense? If the inner loop does indeed delay for 100 microseconds, then with an Int32U argument, the total delay possible is 2^32 * 100e-6 = 429496.7296 seconds or 119.3 hours. Personally, in the rare cases where I do use software delay loops, the delays are at most milliseconds – not hours. As a result it seems to me that it would make a lot more sense if the argument to Dly100us() were an 8 bit or possibly a 16 bit variable. By making this change you could help prevent stupid coding mistakes that result in ridiculously long delays.

What about the watchdog?

Most well designed embedded systems encompass a watchdog. This code as written completely ignores the implication of delaying beyond the watchdog interval. Thus a small change to the delay time could easily result in the system taking a watchdog reset.

Summary

All in all a dreadfully written function. I shudder to think that there are people out there using this code as a template for their systems.

Setting a bad example – part 1

Wednesday, August 4th, 2010 Nigel Jones

About a month ago I bought an ARM evaluation board from my favorite compiler vendor, IAR Systems. Like most people I bought the board in part because it comes with a lot of example code showing how to configure the peripherals on the NXP LPC1768. Upon receiving the board, I hooked it up and started browsing the example code that came with it. To my dismay I discovered a plethora of what I consider to be shoddy coding practices. Now I am well aware that one person’s shoddy practices are another person’s standard operating procedure, and so I intend to steer clear of areas that are just a matter of taste. Notwithstanding this, over the next few posts I will describe some of the things I found in the code – and more importantly why I think they are shoddy.

Before I describe the first example, I will comment on why I’m doing this. Three years ago, almost to the day, I wrote about the lousy code that hardware manufacturers supply as part of their application notes. When I wrote the post I had in the back of my mind the idea that hardware manufacturers aren’t really interested in code and so what should one expect? However when it comes to compiler vendors, what possible excuse do they have? To put it another way, where should engineers go to look for great examples of how to write embedded systems code if not the example code provided by compiler vendors? Do they not have a responsibility to publish code that uses best practices and sets a great example? Furthermore, even if they contracted the code out to some third party to write, I still think it is incumbent upon them to demand (and check) that the code is indeed the sort of code that they want to put their name on.

With all that being said, I will start with an example that is not particularly egregious!

Right associativity of the = operator

One of the lesser appreciated aspects of the C programming language is that the assignment (=) operator associates right to left. As a result the following is perfectly legal code.

uint16_t a,b,c;

a = b = c = 6;

This code has the effect of setting all three variables to the value of 6. With that as a preamble, I found that the author of the evaluation board code loves this construct – even when it is inappropriate. For example consider this excerpt.

/*************************************************************************
 * Function Name: GpioInit
 * Parameters: void
 * Return: void
 *
 * Description: Reset all GPIO pins to default: primary function
 *
 *************************************************************************/
void GpioInit(void)
{
 // Set to inputs
 FIO0DIR = \
 FIO1DIR = \
 FIO2DIR = \
 FIO3DIR = \
 FIO4DIR = 0;

 // clear mask registers
 FIO0MASK =\
 FIO1MASK =\
 FIO2MASK =\
 FIO3MASK =\
 FIO4MASK = 0;

 // Reset all GPIO pins to default primary function
 PINSEL0 =\
 PINSEL1 =\
 PINSEL2 =\
 PINSEL3 =\
 PINSEL4 =\
 PINSEL7 =\
 PINSEL8 =\
 PINSEL9 =\
 PINSEL10 = 0;
}

So why don’t I like this code? Well I have several complaints.  The main problem is that in my opinion one should only make use of the right associative property of the assignment operator if it is essential that multiple variables have the same starting value. That is, if I have a piece of code that can only possibly work if two variables start with the same value, then I will make use of this construct. For example:

uint16_t foo, bar;
foo = bar = 21;     /* foo and bar must be equal to each other and start at 21 */

If instead it is merely coincidental that two variables start with the same value, then I will put their assignment on separate lines.

uint16_t foo, bar;
foo = 14;
bar = 14;

In the IAR code there is no requirement that the various registers be locked together with a starting value.

My second complaint is that because there is no requirement that the various registers be locked together, this code is now a pain to maintain. For example, consider the case where it became necessary to change the value of FIO2DIR to 0x12345678. One would now have to do this:

// Set to inputs
 FIO0DIR = \
 FIO1DIR = 0;
 FIO2DIR = 0x12345678;
 FIO3DIR = \
 FIO4DIR = 0;

In other words I would have to edit an unrelated line (FIO1DIR) – which is a recipe for disaster. To drive home this point, consider the case where rather than having to edit FIO1DIR I wanted to change just FIO4DIR. I would say that the likelihood of someone doing this is very high:

 // Set to inputs
 FIO0DIR = \
 FIO1DIR = \
 FIO2DIR = \
 FIO3DIR = \
 FIO4DIR = 0x12345678;

In other words they might not notice the line continuation operator and simply change the value of the register they are interested in. The code would compile; the results would be disastrous.

While I am at it I will opine on the use of the line continuation character. While it exists for good reason, it almost never leads to easy-to-read, easy-to-maintain code. In short I never use it unless I absolutely have no choice. To use it in a casual manner such as this is not good.

As a final point, I am sure that many people that purchase the evaluation board are quite inexperienced. The GpioInit function excerpted above makes unnecessary use of two advanced constructs (right associativity and the line continuation character). Furthermore the use of these constructs doesn’t make the code better – it makes it worse! I am quite sure that a novice C programmer would look at this code and be baffled as to what is going on – possibly triggering a call to IAR technical support.
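For completeness, here is how the start of the same initialization reads with one assignment per line – no line continuations, no chained assignments, each register independently editable. Plain variables stand in for the memory-mapped LPC17xx registers so the sketch can be compiled anywhere.

```c
#include <stdint.h>

/* Stand-ins for the LPC17xx FIOxDIR registers (on target these are
 * memory-mapped and volatile). */
static uint32_t FIO0DIR, FIO1DIR, FIO2DIR, FIO3DIR, FIO4DIR;

void GpioInit(void)
{
    /* Set to inputs -- one register per line */
    FIO0DIR = 0u;
    FIO1DIR = 0u;
    FIO2DIR = 0u;
    FIO3DIR = 0u;
    FIO4DIR = 0u;

    /* The mask and pin-select registers would follow the same pattern. */
}
```

Changing any single register now touches exactly one line, and a novice reading it needs no knowledge of right associativity or line continuations.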

Tools to help lower power consumption

Tuesday, June 29th, 2010 Nigel Jones

Regular readers will know that low power designs are an interest of mine. Indeed one of the very first blog posts I made lamented how difficult it is to ascertain how much energy it takes to perform various tasks typical to an embedded system. Thus it was a pleasant surprise to receive an IAR newsletter today announcing a tool (‘Power debugging’) that is explicitly designed to help one lower a system’s power consumption. The tool isn’t available yet, but if the propaganda is to be believed it should be a very interesting adjunct to the debugging arsenal. The sign up procedure to beta test the tool doesn’t seem to work properly, but on the assumption that I made it onto the beta tester list I will post a review once I get my hands on it.

BTW I have to admit I found the name of the article / tool (‘Power debugging’) a bit confusing in the sense that I interpreted power in the vernacular sense (e.g. ‘power walking’, ‘power breakfast’) rather than the engineering sense. I guess I’m just a victim of so much marketing hyperbole that I can’t recognize plain talk any more. Oh well!

Efficient C Tip #12 – Be wary of switch statements

Saturday, April 10th, 2010 Nigel Jones

This is the twelfth in a series of tips on writing efficient C for embedded systems. Like the previous topic, I suspect that this will be a bit controversial. As the title suggests, if you are interested in writing efficient C, you need to be wary of switch statements. Before I explain why, a little background will be useful. I did all of my early embedded systems programming in assembly language. This wasn’t out of some sense of machismo, it was simply a reflection of the fact that there were no high level languages available (with the possible exception of PL/M). Naturally as well as programming embedded systems I also did computer programming, initially in Pascal and BASIC, and later in C. One of the major differences I found in using the HLL was the wonderful switch / case statement. I found it to be a beautiful tool – with a few lines of source code I could do all sorts of powerful things that were simply very difficult or tedious to do in assembly language. Fast forward a number of years and C compilers began to become available for small embedded systems and so I naturally started using them, together with of course the attendant switch statement. All was well in paradise until the day I used a switch statement in an interrupt service routine and found to my horror that the ISR was taking about ten times longer to execute than I thought was reasonable.

This precipitated an investigation into how exactly switch statements are implemented by the compiler. When I did this, I discovered a number of things that should give one pause.

Heuristic Algorithms

The first thing I discovered is that compilers typically have a number of ways of implementing a switch statement. They seem to be loosely divided into the following trichotomy:

  1. An if-else-if-else-if chain. In this implementation, the switch statement is treated as syntactic sugar for an if-else-if chain.
  2. Some form of jump or control table, or, as it is sometimes called, a computed goto. This is a favorite technique of assembly language programmers, and compiler writers can use it to great effect.
  3. A hybrid of 1 & 2.

Where it gets interesting is how the compiler decides which approach to use. If the case values are contiguous (e.g. zero through ten), then it’s likely the compiler will use some form of jump table. Conversely, if the case values are completely disjointed (e.g. zero, six, twenty, four hundred and a thousand) then an if-else implementation is likely. However what does the compiler do when, for example, you have a bifurcated set of ranges such as zero to ten and ninety to one hundred? Well the answer is that each compiler seems to have some form of heuristic algorithm for determining the ‘best’ way of implementing a given set of cases. Although some compilers allow you to force a particular implementation, for the most part you are at the mercy of the compiler.
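As an illustration, consider these two functionally similar switches (invented for this example). A typical compiler – and this is typical behaviour, not a guarantee – will turn the dense one into a jump table and the sparse one into an if-else-if chain, yet nothing in the C source hints at that difference.

```c
/* Dense, contiguous case values: a jump table candidate. */
int dense(unsigned int v)
{
    switch (v)
    {
        case 0u:  return 10;
        case 1u:  return 11;
        case 2u:  return 12;
        case 3u:  return 13;
        default:  return -1;
    }
}

/* Sparse, scattered case values: an if-else-if chain candidate. */
int sparse(unsigned int v)
{
    switch (v)
    {
        case 0u:    return 10;
        case 6u:    return 11;
        case 20u:   return 12;
        case 400u:  return 13;
        default:    return -1;
    }
}
```

Inspecting the generated assembly for functions like these is the only reliable way to find out which strategy your particular compiler picked.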

Comparative Execution Speeds

If you think about it, it should become apparent that a jump table approach is likely to give a highly consistent time of execution through the decision tree, whereas the if-else-if chain has a highly variable time of execution depending upon the particular value of the switched variable. Notwithstanding this, the jump table approach has a certain amount of execution overhead associated with it. This means that although its mean execution time (which is normally the same as its worst and best execution time) may be dramatically better than the mean execution time of the if-else-if chain, the if-else-if chain’s best execution time may be considerably better. So what, you say! Well in some cases, a particular value is far more likely to occur than the other values, thus it would be very nice if this value was tested first. However, as you will now see, this isn’t guaranteed…

Order of Execution

For many years I wrote switch statements under the assumption that the case values would be evaluated from top to bottom. That is, if the compiler chose to implement the switch statement as an if-else-if chain, then it would first test the first case, then the second case and so on down to the default case at the bottom of my source code. Well it turns out that my assumption was completely wrong. The compiler is under no such obligation, and indeed will often evaluate the values bottom to top. Furthermore, the compiler will often evaluate the default value first. For example consider a defaulted switch statement with contiguous case values in the range zero to ten. If the index variable is an unsigned int, then there are at least 65525 possible values handled by the default case, and so it makes sense to eliminate them first. Now if you know that the index variable can only possibly take on the values zero to ten, then you can of course eliminate the default statement – and then get excoriated by the coding standards / MISRA folks.

Maintenance

This is the area where I really get worried. Consider the case where you have a switch statement in an ISR. The code is working with no problems until one day it is necessary to make a change to the switch statement – by for example adding an additional case value. This simple change can cause the compiler to completely change the implementation of the switch statement. As a result, you may find that:

  1. The worst case execution time has jumped dramatically.
  2. The mean execution time has jumped dramatically.
  3. The stack space required by the ISR has jumped dramatically.

Any of these three possibilities can cause your program to fail catastrophically. Now of course one could argue ‘that’s why you test all changes’. However, in my opinion it’s far better to be proactive and to avoid putting yourself in this situation in the first place.

I’d also be remiss in not noting the dreaded missing break statement maintenance problem. However as a religious user of Lint, I’m not normally too concerned about this.

Switch statement alternatives

If performance and stability are your goals then I strongly recommend that you implement your code the way you want it executed. This means either explicitly use an if-else-if chain or use function pointers. If function pointers scare you, then you might want to read this article I wrote on the subject.
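For contiguous command codes, the function-pointer alternative can be sketched like this (the command names and state variable are invented for illustration). The lookup cost is fixed and explicit in the source, and no compiler heuristic can silently change it.

```c
#include <stdint.h>

typedef void (*handler_t)(void);

static int last_command;                /* demo state, for illustration */

static void cmd_start(void) { last_command = 1; }
static void cmd_stop(void)  { last_command = 2; }
static void cmd_reset(void) { last_command = 3; }

/* Index == command code; the table *is* the jump table, in plain sight. */
static const handler_t handlers[] = { cmd_start, cmd_stop, cmd_reset };

#define NUM_HANDLERS  (sizeof handlers / sizeof handlers[0])

void dispatch(uint8_t cmd)
{
    if (cmd < NUM_HANDLERS)             /* bounds check replaces 'default' */
    {
        handlers[cmd]();
    }
}
```

Adding a case is now a one-line table edit, and the worst-case execution time of dispatch() does not change when you do it – which is exactly the stability property the maintenance section above argues for.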

Recommendations

Based on my experience, I have a number of things that I do when it comes to switch statements. If you find my analysis compelling, you may want to adopt them:

  1. Switch statements should be the last language construct you reach for – and not the first.
  2. Learn how to use function pointers. Once you do you’ll find a lot of the reasons for using switch statements go away.
  3. Try to keep all the case values contiguous.
  4. If you can’t keep the case values contiguous, go to the other extreme and make them disparate – that way you are less likely to have the compiler change the algorithm on you.
  5. If your compiler supports it, consider using pragmas to lock in a particular implementation.
  6. Be very wary of using switch statements in interrupt service routines or any other performance critical code.
  7. Use Lint to guard against missing break statements.

Comments welcome!


Reading a register for its side effects in C and C++

Monday, March 15th, 2010 Nigel Jones

Although today’s post is the first real post on the new EmbeddedGurus, it’s special for another reason. This post is being jointly written with John Regehr. John is an Associate Professor of Computer Science at the University of Utah and maintains an excellent blog, Embedded in Academia which I heartily recommend. This blog posting grew out of a lengthy email exchange which started with John alerting me to some blatant plagiarism of my work and then evolved (dissolved?) into what you find here. John is also posting this article on his blog.

Anyway, enough preamble, on to the topic at hand.

Once in a while one finds oneself having to read a device register without needing or caring what the value of the register is. A typical scenario is as follows. You have written some sort of asynchronous communications driver. The driver is set up to generate an interrupt upon receipt of a character. In the ISR, the code first of all examines a status register to see if the character has been received correctly (e.g. no framing, parity or overrun errors). If an error has occurred, what should the code do? Well, in just about every system we have worked on, it is necessary to read the register that contains the received character — even though the character is useless. If you don’t perform the read, then you will almost certainly get an overrun error on the next character. Thus you find yourself in the position of having to read a register even though its value is useless. The question then becomes, how does one do this in C? In the following examples, assume that SBUF is the register holding the data to be discarded and that SBUF is understood to be volatile. The exact semantics of the declaration of SBUF vary from compiler to compiler.

If you are programming in C and if your compiler correctly supports the volatile qualifier, then this simple code suffices:

void cload_reg1 (void)
{
   SBUF;
}

This certainly looks a little strange, but it is completely legal C and should generate the requisite read, and nothing more. For example, at the -Os optimization level, the MSP430 port of GCC gives this code:

cload_reg1:
    mov &SBUF, r15
    ret

Unfortunately, there are two practical problems with this C code. First, quite a few C compilers incorrectly translate this code, although the C standard gives it an unambiguous meaning. We tested the code on a variety of general-purpose and embedded compilers, and present the results below. These results are a little depressing.

The second problem is even scarier. The problem is that the C++ standard is not 100% clear about what the code above means. On one hand, the standard says this:

In general, the semantics of volatile are intended to be the same in C++ as they are in C.

A number of C++ compilers, including GCC and LLVM, generate the same code for cload_reg1() when compiling in C++ mode as they do in C mode. On the other hand, several high-quality C++ compilers, such as those from ARM, Intel, and IAR, turn the function cload_reg1() into object code that does nothing. We discussed this issue with people from the compiler groups at Intel and IAR, and both gave essentially the same response. Here we quote (with permission) from the Intel folks:

The operation that turns into a load instruction in the executable code is what the C++ standard calls the lvalue-to-rvalue conversion; it converts an lvalue (which identifies an object, which resides in memory and has an address) into an rvalue (or just value; something whose address can’t be taken and can be in a register). The C++ standard is very clear and explicit about where the lvalue-to-rvalue conversion happens. Basically, it happens for most operands of most operators – but of course not for the left operand of assignment, or the operand of unary ampersand, for example. The top-level expression of an expression statement, which is of course not the operand of any operator, is not a context where the lvalue-to-rvalue conversion happens.

In the C standard, the situation is somewhat different. The C standard has a list of the contexts where the lvalue-to-rvalue conversion doesn’t happen, and that list doesn’t include appearing as the expression in an expression-statement.

So we’re doing exactly what the various standards say to do. It’s not a matter of the C++ standard allowing the volatile reference to be optimized away; in C++, the standard requires that it not happen in the first place.

We think the last sentence sums it up beautifully. How many readers were aware that the semantics for the volatile qualifier are significantly different between C and C++? The additional implication is that as shown below, GCC, the Microsoft compiler, and Open64, when compiling C++ code, are in error.

We asked about this on the GCC mailing list and received only one response, which was basically “Why should we change the semantics, since this will break working code?” This is a fair point. Frankly speaking, the semantics of volatile in C are a bit of a mess, and C++ makes the situation much worse by permitting reasonable people to interpret it in two totally different ways.

Experimental Results

To test C and C++ compilers, we compiled the following two functions to object code at a reasonably high level of optimization:

extern volatile unsigned char foo;
void cload_reg1 (void)
{
   foo;
}
void cload_reg2 (void)
{
   volatile unsigned char sink;
   sink = foo;
}

For embedded compilers that have built-in support for accessing hardware registers, we tested two additional functions where as above, SBUF is understood to be a hardware register defined by the semantics of the compiler under test:

void cload_reg3 (void)
{
   SBUF;
}

void cload_reg4 (void)
{
   volatile unsigned char sink;
   sink = SBUF;
}

The results were as follows.

GCC

We tested version 4.4.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2(). No warnings are generated. The C++ compiler shows the same behavior as the C compiler.

Intel Compiler

We tested icc version 11.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler emits code loading from foo for both cload_reg1() and cload_reg2(), without giving any warnings. The C++ compiler emits a warning “expression has no effect” for cload_reg1() and this function does not load from foo. cload_reg2() does load from foo and gives no warnings.

Sun Compiler

We tested suncc version 5.10, hosted on x86 Linux and also targeting x86 Linux, using optimization level -O. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

x86-Open64

We tested opencc version 4.2.3, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

LLVM / Clang

We tested subversion rev 98508, which is between versions 2.6 and 2.7, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2(). A warning about unused value is generated for cload_reg1(). The C++ compiler shows the same behavior as the C compiler.

CrossWorks for MSP430

We tested version 2.0.8.2009062500.4974, hosted on x86 Linux, using optimization level -O. This compiler supports only C. foo was not loaded in cload_reg1(), but it was loaded in cload_reg2().

IAR for AVR

We tested version 5.30.6.50191, hosted on Windows XP, using maximum speed optimization. The C compiler performed the load in all four cases. The C++ compiler did not perform the load for cload_reg1() or cload_reg3(), but did for cload_reg2() and cload_reg4().

Keil 8051

We tested version 8.01, hosted on Windows XP, using optimization level 8, configured to favor speed. The Keil compiler failed to generate the required load in cload_reg1() (but did at least give a warning), yet did perform the load in all other cases, including cload_reg3(), suggesting that for the Keil compiler, its IO register (SFR) semantics are treated differently from its volatile variable semantics.

HI-TECH for PIC16

We tested version 9.70, hosted on Windows XP, using global optimization level 9, configured to favor speed. This was very interesting in that the results were almost a mirror image of the Keil compiler’s: the load was performed in all cases except cload_reg3(). Thus the HI-TECH semantics for IO registers and volatile variables also appear to be different – just the opposite of Keil! No warnings were generated by the HI-TECH compiler when it failed to generate code.

Microchip Compiler for PIC18

We tested version 3.35, hosted on Windows XP, using full optimization level. This rounded out the group of embedded compilers quite nicely in that it didn’t perform the load in either cload_reg1() or cload_reg3() – but did in the rest. It also failed to warn about the statements having no effect. This was the worst performing of all the compilers we tested.

Summary

The level of non-conformance with the C compilers, together with the genuine uncertainty as to what the C++ compilers should do provides a real quandary. If you need the most efficient code possible, then you have no option other than to investigate what your compiler does. If you are looking for a generally reliable and portable solution, then the methodology in cload_reg2() is probably your best bet. However it would be just that: a bet. Naturally, we (and the other readers of this blog) would be very interested to hear what your compiler does. So if you have a few minutes, please run the sample code through your compiler and let us know the results.
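If you do settle on the cload_reg2() pattern, it can be wrapped in a macro (a sketch of ours, not something from any compiler vendor) so that the intent of a dummy read is explicit at every call site. The assignment to a volatile local is what forces the lvalue-to-rvalue conversion under both the C and C++ readings of the standard.

```c
/* Read a (volatile) register and deliberately discard the value.
 * The volatile temporary forces the load in both C and C++. */
#define DISCARD_READ(reg)                               \
    do {                                                \
        volatile unsigned char discard_tmp_ = (reg);    \
        (void)discard_tmp_;                             \
    } while (0)

/* Stand-in for a UART data register, so the sketch compiles
 * off-target; on real hardware this would be SBUF itself. */
volatile unsigned char SBUF_sim;

void flush_rx(void)
{
    DISCARD_READ(SBUF_sim);     /* read and discard to clear overrun */
}
```

As with everything in this post, verify in the generated assembly that your particular compiler actually emits the load.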

Acknowledgments

We’d like to thank Hans Boehm at HP, Arch Robison at Intel, and the compiler groups at both Intel and IAR for their valuable feedback that helped us construct this post. Any mistakes are, of course, ours.