
Reading a register for its side effects in C and C++

Monday, March 15th, 2010 by Nigel Jones

Although today’s post is the first real post on the new EmbeddedGurus, it’s special for another reason. This post is being jointly written with John Regehr. John is an Associate Professor of Computer Science at the University of Utah and maintains an excellent blog, Embedded in Academia, which I heartily recommend. This blog posting grew out of a lengthy email exchange which started with John alerting me to some blatant plagiarism of my work and then evolved (dissolved?) into what you find here. John is also posting this article on his blog.

Anyway, enough preamble, on to the topic at hand.

Once in a while one finds oneself having to read a device register without needing or caring what the value of the register is. A typical scenario is as follows. You have written some sort of asynchronous communications driver. The driver is set up to generate an interrupt upon receipt of a character. In the ISR, the code first examines a status register to see if the character has been received correctly (e.g. no framing, parity or overrun errors). If an error has occurred, what should the code do? Well, in just about every system we have worked on, it is necessary to read the register that contains the received character, even though the character is useless. If you don’t perform the read, then you will almost certainly get an overrun error on the next character. Thus you find yourself in the position of having to read a register even though its value is useless. The question then becomes, how does one do this in C? In the following examples, assume that SBUF is the register holding the data to be discarded and that SBUF is understood to be volatile. The exact semantics of the declaration of SBUF vary from compiler to compiler.
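
To make the scenario concrete, here is a rough host-side sketch of such an ISR. It is not taken from any real driver: the status register SSTAT, its error mask, and the receive buffer are hypothetical names invented for illustration, and the registers are modelled as plain volatile variables so the code compiles on a desktop. The discard is written as an assignment to a volatile local; whether a bare read statement is enough is the subject of the rest of this post.

```c
#include <stdint.h>

/* Plain volatile variables standing in for memory-mapped registers.
   SSTAT and its error mask are assumptions; only the name SBUF
   comes from the post. */
volatile uint8_t SSTAT;   /* receiver status register (hypothetical) */
volatile uint8_t SBUF;    /* received-character register */

#define SSTAT_ERROR_MASK 0x07u  /* assumed layout: framing | parity | overrun */

uint8_t  rx_buf[16];      /* characters accepted so far */
unsigned rx_count;

void uart_rx_isr(void)
{
    uint8_t status = SSTAT;

    if ((status & SSTAT_ERROR_MASK) != 0u || rx_count >= sizeof rx_buf) {
        /* Bad character, or no room left: the data register still has
           to be read so the receiver is drained, but the value is dropped. */
        volatile uint8_t sink;
        sink = SBUF;
    } else {
        rx_buf[rx_count++] = SBUF;  /* good character: keep it */
    }
}
```

Some compilers will warn that sink is set but never used; that warning is expected with this form.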

If you are programming in C and if your compiler correctly supports the volatile qualifier, then this simple code suffices:

void cload_reg1 (void)
{
   SBUF;
}

This certainly looks a little strange, but it is completely legal C and should generate the requisite read, and nothing more. For example, at the -Os optimization level, the MSP430 port of GCC gives this code:

cload_reg1:
    mov &SBUF, r15
    ret

Unfortunately, there are two practical problems with this C code. First, quite a few C compilers incorrectly translate this code, although the C standard gives it an unambiguous meaning. We tested the code on a variety of general-purpose and embedded compilers, and present the results below. These results are a little depressing.

The second problem is even scarier. The problem is that the C++ standard is not 100% clear about what the code above means. On one hand, the standard says this:

In general, the semantics of volatile are intended to be the same in C++ as they are in C.

A number of C++ compilers, including GCC and LLVM, generate the same code for cload_reg1() when compiling in C++ mode as they do in C mode. On the other hand, several high-quality C++ compilers, such as those from ARM, Intel, and IAR, turn the function cload_reg1() into object code that does nothing. We discussed this issue with people from the compiler groups at Intel and IAR, and both gave essentially the same response. Here we quote (with permission) from the Intel folks:

The operation that turns into a load instruction in the executable code is what the C++ standard calls the lvalue-to-rvalue conversion; it converts an lvalue (which identifies an object, which resides in memory and has an address) into an rvalue (or just value; something whose address can’t be taken and can be in a register). The C++ standard is very clear and explicit about where the lvalue-to-rvalue conversion happens. Basically, it happens for most operands of most operators – but of course not for the left operand of assignment, or the operand of unary ampersand, for example. The top-level expression of an expression statement, which is of course not the operand of any operator, is not a context where the lvalue-to-rvalue conversion happens.

In the C standard, the situation is somewhat different. The C standard has a list of the contexts where the lvalue-to-rvalue conversion doesn’t happen, and that list doesn’t include appearing as the expression in an expression-statement.

So we’re doing exactly what the various standards say to do. It’s not a matter of the C++ standard allowing the volatile reference to be optimized away; in C++, the standard requires that it not happen in the first place.

We think the last sentence sums it up beautifully. How many readers were aware that the semantics for the volatile qualifier are significantly different between C and C++? The additional implication is that as shown below, GCC, the Microsoft compiler, and Open64, when compiling C++ code, are in error.

We asked about this on the GCC mailing list and received only one response, which was basically “Why should we change the semantics, since this will break working code?” This is a fair point. Frankly speaking, the semantics of volatile in C are a bit of a mess, and C++ makes the situation much worse by permitting reasonable people to interpret it in two totally different ways.

Experimental Results

To test C and C++ compilers, we compiled the following two functions to object code at a reasonably high level of optimization:

extern volatile unsigned char foo;
void cload_reg1 (void)
{
   foo;
}
void cload_reg2 (void)
{
   volatile unsigned char sink;
   sink = foo;
}

For embedded compilers that have built-in support for accessing hardware registers, we tested two additional functions where, as above, SBUF is understood to be a hardware register defined by the semantics of the compiler under test:

void cload_reg3 (void)
{
   SBUF;
}

void cload_reg4 (void)
{
   volatile unsigned char sink;
   sink = SBUF;
}

The results were as follows.

GCC

We tested version 4.4.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2(). No warnings are generated. The C++ compiler shows the same behavior as the C compiler.

Intel Compiler

We tested icc version 11.1, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler emits code loading from foo for both cload_reg1() and cload_reg2(), without giving any warnings. The C++ compiler emits a warning “expression has no effect” for cload_reg1() and this function does not load from foo. cload_reg2() does load from foo and gives no warnings.

Sun Compiler

We tested suncc version 5.10, hosted on x86 Linux and also targeting x86 Linux, using optimization level -O. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

x86-Open64

We tested opencc version 4.2.3, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler does not load from foo in cload_reg1(), nor does it emit any warning. It does load from foo in cload_reg2(). The C++ compiler has the same behavior as the C compiler.

LLVM / Clang

We tested subversion rev 98508, which is between versions 2.6 and 2.7, hosted on x86 Linux and also targeting x86 Linux, using optimization level -Os. The C compiler loads from foo in both cload_reg1() and cload_reg2(). A warning about unused value is generated for cload_reg1(). The C++ compiler shows the same behavior as the C compiler.

CrossWorks for MSP430

We tested version 2.0.8.2009062500.4974, hosted on x86 Linux, using optimization level -O. This compiler supports only C. foo was not loaded in cload_reg1(), but it was loaded in cload_reg2().

IAR for AVR

We tested version 5.30.6.50191, hosted on Windows XP, using maximum speed optimization. The C compiler performed the load in all four cases. The C++ compiler did not perform the load for cload_reg1() or cload_reg3(), but did for cload_reg2() and cload_reg4().

Keil 8051

We tested version 8.01, hosted on Windows XP, using optimization level 8, configured to favor speed. The Keil compiler failed to generate the required load in cload_reg1() (but did at least give a warning), yet did perform the load in all other cases, including cload_reg3(). This suggests that the Keil compiler treats its IO register (SFR) semantics differently from volatile variable semantics.

HI-TECH for PIC16

We tested version 9.70, hosted on Windows XP, using Global optimization level 9, configured to favor speed. This was very interesting in that the results were almost a mirror image of the Keil compiler's. In this case the load was performed in all cases except cload_reg3(). Thus the HI-TECH semantics for IO registers and volatile variables also appear to be different, just the opposite to Keil! No warning was generated by the HI-TECH compiler when it failed to generate code.

Microchip Compiler for PIC18

We tested version 3.35, hosted on Windows XP, using full optimization level. This rounded out the group of embedded compilers quite nicely in that it didn’t perform the load in either cload_reg1() or cload_reg3() – but did in the rest. It also failed to warn about the statements having no effect. This was the worst performing of all the compilers we tested.

Summary

The level of non-conformance among the C compilers, together with the genuine uncertainty as to what the C++ compilers should do, presents a real quandary. If you need the most efficient code possible, then you have no option other than to investigate what your compiler does. If you are looking for a generally reliable and portable solution, then the methodology in cload_reg2() is probably your best bet. However it would be just that: a bet. Naturally, we (and the other readers of this blog) would be very interested to hear what your compiler does. So if you have a few minutes, please run the sample code through your compiler and let us know the results.
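
If you settle on the cload_reg2() approach, one way to keep it in a single place is a small helper along the following lines. This is only a sketch: the name discard_read_u8 is made up for this example, and it assumes the register can be reached through an ordinary pointer to volatile.

```c
#include <stdint.h>

/* Hypothetical helper, not part of any compiler's library: read an
   8-bit register and throw the value away, using the same
   assign-to-a-volatile-local pattern as cload_reg2(). */
void discard_read_u8(const volatile uint8_t *reg)
{
    volatile uint8_t sink;
    sink = *reg;   /* the value is needed by no one; only the read matters */
}
```

On a target where the register is an addressable volatile object, the call would look like discard_read_u8(&foo). Where the compiler declares registers with a special non-addressable type (as with 8051 SFRs), the address cannot be taken, and the assignment has to be written inline as in cload_reg4(). Either way, the generated code is still worth checking by hand.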

Acknowledgments

We’d like to thank Hans Boehm at HP, Arch Robison at Intel, and the compiler groups at both Intel and IAR for their valuable feedback that helped us construct this post. Any mistakes are, of course, ours.

26 Responses to “Reading a register for its side effects in C and C++”

  1. Colin says:

    Interesting article, in particular because I have, on occasion, compiled C-code with a C++-compiler in order to integrate into a C++ project … pitfalls galore! Two suggestions: (1) Present the results from the different compilers in table form (it’s currently a bit hard to get an overview), and (2) how does explicit pointer dereferencing stack up? E.g. “* static_cast( foo );”, or “* (volatile char *)foo” in C.

    • Nigel Jones says:

      Hi Colin. We tried putting the results in table form – and decided it was even worse than what we have right now. If other people report their results, then perhaps we will rethink this. Using a pointer de-reference is an interesting question. I do not know but I suspect that the differences between volatile variables and registers with Keil / HI-TECH might be explained in part by the fact that the underlying semantics for the registers may be based upon pointer de-referencing. I’ll look into this when I get some time.

  2. Tyler Doering says:

    John Regehr also has a paper regarding this worth the read. Volatiles Are Miscompiled, and What to Do about It.

  3. plinth says:

    I’m surprised you didn’t try
    unsigned char cload_reg() { return SBUF; }
    which, if not inlined, should force a read in all circumstances, no?

    • Nigel Jones says:

      Interesting suggestion. In general I’m not too fond of code that relies upon the compiler not doing something (e.g. not inlining). However, in this case, if this is what it takes to get your compiler to do the requisite read then I’m all for it.

  4. Keith says:

    For embedded systems it seems like cload_reg4() is the way to go. Were there any cases where this (or cload_reg2()) failed? It also makes the most sense. I find that short cutting operations in C, though they may be “allowed,” is almost always a bad habit to get into.

  5. Ken Smith says:

    I realize this is tangential to the nature of volatile that is under discussion here but I want to show you that there is light at the end of the MMIO tunnel.

    While C++ taketh away with one hand, C++ also giveth with the other. Both Martin’s and my articles are linked from his website.

    http://www.eld.leidenuniv.nl/~moene/Home/publications/accu/overload95-register/

    And here’s a direct link to mine. (Download the pdf at the end of the post.)

    http://yogiken.wordpress.com/2010/02/10/on-publishing/

    We both discuss new approaches of working with hardware registers that may help you skirt the issues with volatile. I only have experience with my technique with GCC which seems to do what I want all the time so far.

    My paper focuses on thinking about register subfields as independent things. If in the example under scrutiny in this article were defined as a subfield of a register, the compiler would have no choice but to read it because we shift and mask the contents and return the value from an inline function. Or at least this is what I have convinced myself happens by looking at the assembly that GCC generates. However, just reassigning the value to another volatile is almost certainly faster. Food for thought.

  6. xilun says:

    You can’t do register access with just volatiles in the general case. This is because the standards (C or C++) do not define what happens on the “bus” (even less on the various buses you can find in a modern architecture), do not care about the architecture of modern CPUs (memory model, OOO, etc.), and so on.

    This is a well known fact of operating system writers.

    See Documentation/volatile-considered-harmful.txt in Linux for example.

    Of course, register access through volatile is a non portable feature offered on some simple embedded architecture. But this does not matter much for GCC, Intel Compiler, etc…

  7. Aleš Svetek says:

    I made similar tests with KEIL uVision v4.00 with armcc compiler v 4.0.0.524. targeting ARM7 and Cortex-M3 microcontrollers. The results are not encouraging…

    The particular C and C++ compilers have 4 optimization levels, -O0, -O1,-O2 and -O3, each with an option to optimize for time or size.

    The compilation results are basically the same for both CPU architectures, ARM7 and CM3.

    Compiling code with C compiler at optimization level -O0 and -O1 did emit the instructions for all 4 functions where volatile data was accessed. At optimization level -O2 and -O3 volatile data was accessed only in functions cload_reg2() and cload_reg4(). Additionally, C compiler did not produce any warning for not accessing the data in functions cload_reg1() and cload_reg3(). Options to optimize for time or size did not make any change regarding the matter at hand.

    On the other hand, C++ compiler did emit the instructions to access volatile data only in functions cload_reg2() and cload_reg4(), irrespectively of optimization level. There was no data access in functions cload_reg1() and cload_reg3(). But, at least, it did produce the warning: “expression has no effect” and what is important IMHO it was consistent at various optimization levels.

    If anyone is interested in disassembled output from both compilers at various optimization levels,
    please let me know and I can send you the data via email.

    ~Aleš

  8. Juergen says:

    It would be interesting to see what happens when sink is not declared volatile. Intuitively I would still expect the reads to take place.

    • Ashleigh says:

      I think you will find it depends on compiler and optimisation level. In a number of cases I’ve seen the assignment optimised out of existence, for C compilers not C++!

  9. Amol says:

    I tried with “CodeWarrior C/C++ for ColdFire” Version 5.2 Build 26 with Optimization level 4 (Best optimization level). It generated load in all the four cases AND it also gave a warning in cload_reg1() and cload_reg3() that “expression has no side effect”

  10. Peter B says:

    This is why I do low-level code in assembly. I KNOW what it does.

    When calling assembly code from C++ I often must use extern “C” … because the assembler refuses to create/use externals like “fifoInit()” for the C++ “void fifoInit(void)”. At least the linker error is obvious.

    My thinking goes like this —

    Any change in chip vendor such as between Luminary Micro and ST for Cortex-M3 is far from a “recompile and you are done” effort. Thus microcontroller vendor unique code will have to be rethought anyway.

    My assembly language skills have other uses. Once I had to resolve a struct that appeared to act differently under C and C++. The problem was “typedef BOOL”: in C++ it turned into bool, while in C it was unsigned int. (This was 32-bit x86 code.) Nobody else realized that there was no reason for bool to be 32 bits. A quick check using sizeof in a C and C++ module confirmed my supposition. Changed int to char and all was well.

    Please allow a somewhat OT drift

    While employed at a disk drive company I worked on an embedded drive test system. I was responsible for buffer allocation. The system featured a script from which the actual tests were called. My buffer allocation ran as script pre-allocating one large chunk of RAM then splitting this RAM into various sized buffers. Script said how many buffers of each size. There was zero heap allocation/deallocation at run time.

    • Nigel Jones says:

      Although I agree using assembly language ensures you get what you wrote (if not what you wanted!), I’m not sure it’s the solution here (at least for C programs). The C language specification is quite clear about what the compiler should do in this circumstance. Thus the question becomes where do you draw the line, because in the limit one would have to write entirely in assembly language. Changing to your off topic comment. What scripting language did you use? I ask because in my CFT (copious free time) I’m intending to become proficient in a scripting language, with Perl being the current front runner.

  11. Ignacio G. T. says:

    Renesas C compiler for M32C

    I tested M32C/90,80,M16C/80,70 Series C Compiler V.5.41 Release 01, hosted on Windows XP, using maximum speed optimization (-O3 -Os). The C compiler performed the load in all four cases. No C++ compiler is available.

  12. Jim Sawyer says:

    Interesting response from the vendors.

    Because the premise:

    ” The operation that turns into a load instruction in the executable code is what the C++ standard calls
    the lvalue-to-rvalue conversion;… ”

    does not apply to the context:

    “volatile”

    since “volatile” values NEVER convert to rvalues:

    ” an rvalue (or just value; something whose address can’t be taken and can be in a register) ”

    The disconnect seems to occur at the phrase

    ” can be in a register ”

    which is of course still true for volatile values,
    but misleading, because the (missing?) phrase

    ” can be cached in a register ”

    is true for rvalues, but false for volatile values.

    Therefore, the spec is apparently silent
    on the issue of LOADing volatile values.

    That said, if I were talking to volatile registers,
    I’d be inclined to consider the C spec in error.

    I might have tried something like:

    void cload_reg (void)
    {
    register unsigned char sink;
    sink = SBUF;
    }

    expecting it to work in all cases.

    Truth be told, I’ve used “volatile” in proprietary languages,
    e.g. PLUS, because it was reliable — the compiler guy was
    right down the hall, and beatings were seldom required 😉

    For C, I’ve always punted, and done volatile access in ASM.

  13. Lundin says:

    I have some doubt in this article. Can you please cite where in the C standard the code given has an “unambiguous meaning”?

    In ISO C 9899:1999 5.1.2.3 “Program execution” we can read that accessing a volatile object is a “side effect” and that “Evaluation of an expression may produce side effects”.

    It says “may” and not “shall”.

    Then in 6.8.3 “Expression and null statements” we can read that an expression is evaluated for its side effects, although the result is ignored (void expression).

    So the expression will be -evaluated- but there is no guarantee for side effects, i.e actual access of the volatile. So I don’t think the C standard is “unambiguous”, on the contrary I think the standard is pretty clear that this is implementation-defined behavior. Feel free to cite the standard to prove me wrong.

    For your information, I tested cload_reg1() and cload_reg2() on:
    Freescale Codewarrior 4.7 for HCS12
    Freescale Codewarrior 6.0 for HCS08.

    – Optimization was enabled in both cases.
    – On both compilers code is generated to access the variable.
    – On both compilers I get a warning for “result not used”.

    • John Regehr says:

      Hi Daniel.

      You are arguing that it is implementation-defined whether a C compiler performs side-effects while evaluating an expression. I’ll first try to show you that this interpretation does not make sense, and second back this up using the standard.

      You are arguing that when a C implementation evaluates an expression like this:

      x;

      (where x is volatile), the implementation is not required to perform the volatile access. Let us look at a different expression:

      1+printf("hello");

      You are arguing that the compiler does not need to print anything when evaluating this expression. Clearly this is wrong.

      Next let’s look at the standard. I’m using N1425. You are interpreting “may” as “evaluation of a side-effecting expression (in the abstract machine) may or may not produce side effects in the actual computation.” This seems to be a misreading. I read it as “evaluating some expressions (like the examples above) produces side effects, whereas other expressions have no side effects.”

      If you look at the standard you’ll see that “shall” and “shall not” have a codified meaning, whereas “may” does not. It is plain English. For example the standard says “Attention is drawn to the possibility that some of the elements of this document MAY be the subject of patent rights.” Clearly this is a plain English “may” and not some codified requirement about a C implementation.

      If I haven’t convinced you, let me know, and we can continue to argue. However, since your interpretation leads to a useless language I cannot see how it could be a valid reading of the standard.

  14. David Brown says:

    I usually use the form:

    void cload_reg5 (void)
    {
    (void) SBUF;
    }

    To my mind, this is the clearest way to express the intent: you are explicitly saying you will throw away the value read. And I believe it forces an lvalue to rvalue conversion in C++, but I am not entirely sure.

    It also clearly expresses that intent to the compiler, and produces the expected code in all the tests I have tried without producing any warning messages. I use all the warning capabilities of my compilers, and a function like cload_reg2 will rightfully warn that “sink” is not used, while the cload_reg1 form will (on some compilers) complain the statement has no effect.

    Could you test this form on your compilers? The only C++ compiler I have conveniently available is gcc (in lots of versions, for lots of targets), and I already know it works as expected.

  15. Rico says:

    Superb post, as usual.

  16. […] many C compilers do not generate the load , so you should not rely on this construct, regardless of what is the right interpretation of the […]
