Posts Tagged ‘volatile’

A volatile tempest

Monday, September 27th, 2010 Nigel Jones

Regular readers will know that I often comment on the use of volatile in embedded systems. As a result I am occasionally contacted about my opinion on whether a compiler is generating correct code – particularly when hardware is being accessed. Well I was contacted last week by Ratish Punoose who had a classic problem in that his code compiled okay on GCC but not on IAR. He had contacted IAR, who in turn basically said the compiler is correct – and here is the explanation. Ratish turned to me and John Regehr for our opinions. Well John and I came to similar opinions – namely:

  1. Ratish’s code was a bit weird, but not dramatically so.
  2. The explanation from IAR made no sense.
  3. It did indeed appear to be a compiler bug.

Ratish then posted his issue to the Msp430 forum on Yahoo. You can read his post and the responses here.

I’m sure many of you are at this point thinking that IAR is in for another round of bashing from me. Well you’d be wrong. One of the first responders to Ratish’s post was Paul Curtis of Rowley compilers. Paul gives an admirable explanation as to why Ratish’s code is wrong (and by extension so am I). Now I’m sure that IAR and Rowley are fierce competitors, and so Paul is also to be commended for leaping to the defense of IAR.

Furthermore, later in the thread Anders Lindgren of IAR chimes in and adds his detailed and compelling explanation.

Having read the posts from Paul and Anders I think they are right and I’m wrong. So thanks Gentlemen for:

  1. Setting me straight
  2. Proving that in the wonderful world of volatile accesses, there is always something more to learn.

I think there are several other lessons to be learned from this episode. However I think I’ll save them for another post.

Reading a register for its side effects in C and C++

Monday, March 15th, 2010 Nigel Jones

Although today’s post is the first real post on the new EmbeddedGurus, it’s special for another reason. This post is being jointly written with John Regehr. John is an Associate Professor of Computer Science at the University of Utah and maintains an excellent blog, Embedded in Academia which I heartily recommend. This blog posting grew out of a lengthy email exchange which started with John alerting me to some blatant plagiarism of my work and then evolved (dissolved?) into what you find here. John is also posting this article on his blog.

Anyway, enough preamble, on to the topic at hand.

Once in awhile one finds oneself having to read a device register, but without needing nor caring what the value of the register is. A typical scenario is as follows. You have written some sort of asynchronous communications driver. The driver is set up to generate an interrupt upon receipt of a character. In the ISR, the code first of all examines a status register to see if the character has been received correctly (e.g. no framing, parity or overrun errors). If an error has occurred, what should the code do? Well, in just about every system we have worked on, it is necessary to read the register that contains the received character — even though the character is useless. If you don’t perform the read, then you will almost certainly get an overrun error on the next character. Thus you find yourself in the position of having to read a register even though its value is useless. The question then becomes, how does one do this in C? In the following examples, assume that SBUF is the register holding the data to be discarded and that SBUF is understood to be volatile. The exact semantics of the declaration of SBUF vary from compiler to compiler.

If you are programming in C and if your compiler correctly supports the volatile qualifier, then this simple code suffices:

void cload_reg1 (void)

This certainly looks a little strange, but it is completely legal C and should generate the requisite read, and nothing more. For example, at the -Os optimization level, the MSP430 port of GCC gives this code:

    mov &SBUF, r15

Unfortunately, there are two practical problems with this C code. First, quite a few C compilers incorrectly translate this code, although the C standard gives it an unambiguous meaning. We tested the code on a variety of general-purpose and embedded compilers, and present the results below. These results are a little depressing.

The second problem is even scarier. The problem is that the C++ standard is not 100% clear about what the code above means. On one hand, the standard says this:

In general, the semantics of volatile are intended to be the same in C++ as they are in C.

A number of C++ compilers, including GCC and LLVM, generate the same code for cload_reg1() when compiling in C++ mode as they do in C mode. On the other hand, several high-quality C++ compilers, such as those from ARM, Intel, and IAR, turn the function cload_reg1() into object code that does nothing. We discussed this issue with people from the compiler groups at Intel and IAR, and both gave essentially the same response. Here we quote (with permission) from the Intel folks:

The operation that turns into a load instruction in the executable code is what the C++ standard calls the lvalue-to-rvalue conversion; it converts an lvalue (which identifies an object, which resides in memory and has an address) into an rvalue (or just value; something whose address can’t be taken and can be in a register). The C++ standard is very clear and explicit about where the lvalue-to-rvalue conversion happens. Basically, it happens for most operands of most operators – but of course not for the left operand of assignment, or the operand of unary ampersand, for example. The top-level expression of an expression statement, which is of course not the operand of any operator, is not a context where the lvalue-to-rvalue conversion happens.

In the C standard, the situation is somewhat different. The C standard has a list of the contexts where the lvalue-to-rvalue conversion doesn’t happen, and that list doesn’t include appearing as the expression in an expression-statement.

So we’re doing exactly what the various standards say to do. It’s not a matter of the C++ standard allowing the volatile reference to be optimized away; in C++, the standard requires that it not happen in the first place.

We think the last sentence sums it up beautifully. How many readers were aware that the semantics for the volatile qualifier are significantly different between C and C++? The additional implication is that as shown below, GCC, the Microsoft compiler, and Open64, when compiling C++ code, are in error.

We asked about this on the GCC mailing list and received only one response which was basically “Why should we change the semantics, since this will break working code?” This is a fair point. Frankly speaking, the semantics of volatile in C are a bit of mess and C++ makes the situation much worse by permitting reasonable people to interpret it in two totally different ways.

Experimental Results

To test C and C++ compilers, we compiled the following two functions to object code at a reasonably high level of optimization:

extern volatile unsigned char foo;
void cload_reg1 (void)
void cload_reg2 (void)
   volatile unsigned char sink;
   sink = foo;

For embedded compilers that have built-in support for accessing hardware registers, we tested two additional functions where as above, SBUF is understood to be a hardware register defined by the semantics of the compiler under test:

void cload_reg3 (void)

void cload_reg4 (void)
   volatile unsigned char sink;
   sink = SBUF;

The results were as follows.


We tested version 4.4.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2() . No warnings are generated. The C++ compiler shows the same behavior as the C compiler.

Intel Compiler

We tested icc version 11.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler emits code loading from foo for both cload_reg1() and cload_reg2(), without giving any warnings. The C++ compiler emits a warning “expression has no effect” for cload_reg1() and this function does not load from foo. cload_reg2() does load from foo and gives no warnings.

Sun Compiler

We tested suncc version 5.10, hosted on x86 Linux and also targeting x86 Linux, using optimization level -O. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.


We tested opencc version 4.2.3, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

LLVM / Clang

We tested subversion rev 98508, which is between versions 2.6 and 2.7, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2() .
A warning about unused value is generated for cload_reg1(). The C++ compiler shows the same behavior as the C compiler.

CrossWorks for MSP430

We tested version, hosted on x86 Linux, using optimization level -O. This compiler supports only C. foo was not loaded in cload_reg1(), but it was loaded in cload_reg2().


We tested version, hosted on Windows XP, using maximum speed optimization. The C compiler performed the load in all four cases. The C++ compiler did not perform the load for cload_reg1() or cload_reg3(),
but did for cload_reg2() and cload_reg4().

Keil 8051

We tested version 8.01, hosted on Windows XP, using optimization level 8, configured to favor speed. The Keil compiler failed to generate the required load in cload_reg1() (but did give at least give a warning), yet did perform the load in all other cases including cload_reg3() suggesting that for the Keil compiler, its IO register (SFR) semantics are treated differently to volatile variable semantics.


We tested version 9.70, hosted on Windows XP, using Global optimization level 9, configured to favor speed. This was very interesting in that the results were almost a mirror image to the Keil compiler. In this case the load was performed in all cases except cload_reg3(). Thus the HI-TECH semantics for IO registers and volatile variables also appears to be different – just the opposite to Keil! No warnings was generated by the Hi-TECH compiler when it failed to generate code.

Microchip Compiler for PIC18

We tested version 3.35, hosted on Windows XP, using full optimization level. This rounded out the group of embedded compilers quite nicely in that it didn’t perform the load in either cload_reg1() or cload_reg3() – but did in the rest. It also failed to warn about the statements having no effect. This was the worst performing of all the compilers we tested.


The level of non-conformance with the C compilers, together with the genuine uncertainty as to what the C++ compilers should do provides a real quandary. If you need the most efficient code possible, then you have no option other than to investigate what your compiler does. If you are looking for a generally reliable and portable solution, then the methodology in cload_reg2() is probably your best bet. However it would be just that: a bet. Naturally, we (and the other readers of this blog) would be very interested to hear what your compiler does. So if you have a few minutes, please run the sample code through your compiler and let us know the results.


We’d like to thank Hans Boehm at HP, Arch Robison at Intel, and the compiler groups at both Intel and IAR for their valuable feedback that helped us construct this post. Any mistakes are, of course, ours.

Using volatile to achieve persistence!

Sunday, January 11th, 2009 Nigel Jones

Once in a while the real world and the arcane world of language standards collide, resulting in surprising results. To see what I mean, read on …

Many of the products I design incorporate a Bootstrap Loader, so that the application firmware may be updated in the field. In most cases, the bootstrap loader is a completely different program to the main application. Despite this, I find it useful for the main application to pass information to the bootstrap loader and vice versa. Thus the question arises, how best to do this? Well in the processor family I am using, although it is technically possible to store information in Flash, EEPROM or RAM, by far the easiest and most secure way of doing it is to place the information into EEPROM. Furthermore, in order to enter the bootstrap loader it is highly desirable to force a reset of the processor by allowing the watchdog timer to time out.

Thus, the code to enter the bootstrap loader looks something like this:

__eeprom uint8_t msg_for_bootloader;


msg_for_bootloader = 0x42;


 /* Wait for watchdog to generate a reset and force entry in to the bootstrap loader */

Well, on the face of it, there is not much wrong with this code. However, if one turns on the optimizer, then the compiler examines the code, decides that no code may be executed beyond the infinite loop and thus concludes that the write to msg_for_bootloader is pointless, and promptly optimizes it away. (For a discussion on this topic, see my posting here)

Now you will note that msg_for_bootloader was qualified with __eeprom. This is a compiler extension that allows one to inform the compiler that the variable msg_for_bootloader resides in a special memory space and to be treated accordingly. Now I know that the compiler knows enough about the EEPROM space to generate the correct coding sequences such that reads and writes are performed correctly. However, in my naivete, I also assumed that the compiler knew something about the properties of EEPROM, such that it would realize writing to EEPROM without ostensibly reading it again is intrinsically useful in many applications.

Well it does not. Furthermore, on balance I think the compiler writer’s got it right and the error was completely mine.

So what to do? Well, declaring msg_for_bootloader as volatile fixes the problem. Thus my code now looks like this:

__eeprom volatile uint8_t msg_for_bootloader;


msg_for_bootloader = 0x42;


 /* Wait for watchdog to generate a reset and force entry in to the bootstrap loader */

Thus I ended up in the rather bizarre situation of having to declare a variable as volatile in order to make it persistent!

Although I can appreciate the wry irony of this situation, I think it points to a larger problem. The fact is that we are all (ok, most of us) programming in a language (C) that was not designed for use in embedded systems. Indeed, when C was written, I’m not sure EEPROM even existed. As a result, the compiler vendors have added extensions to the C standard in an effort to overcome its shortcomings for embedded systems, while still desperately striving to achieve “full compliance with the standard”. Despite this, I find myself all too frequently falling into traps such as this one. What we really need is a language explicitly designed for embedded systems. It isn’t going to happen, but it doesn’t stop me wishing for it.