
Firmware-Specific Bug #3: Missing Volatile Keyword

Thursday, February 18th, 2010 by Michael Barr

Failing to tag certain types of variables with C’s ‘volatile’ keyword can cause a number of symptoms in a system that works properly only when the compiler’s optimizer is set to a low level or disabled. The volatile qualifier is used in variable declarations, where its purpose is to prevent the compiler from optimizing away reads and writes of that variable.

For example, if you write code that says:


    g_alarm = ALARM_ON;    // Patient dying--get nurse!
    // Other code; with no reads of g_alarm state.
    g_alarm = ALARM_OFF;   // Patient stable.

the optimizer will generally try to make your program both faster and smaller by eliminating the first assignment above, to the detriment of the patient. However, if g_alarm is declared volatile, this optimization will not take place.
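A minimal sketch of the declaration that prevents this, assuming an enum type and a wrapper function that are not in the original snippet (`alarm_state_t` and `handle_patient_event` are hypothetical names):

```c
/* Hypothetical alarm states; the constant names mirror the snippet above. */
typedef enum { ALARM_OFF = 0, ALARM_ON = 1 } alarm_state_t;

/* volatile: every write below must be emitted, even though nothing in
 * this file reads g_alarm between the two assignments. */
volatile alarm_state_t g_alarm = ALARM_OFF;

/* Hypothetical routine wrapping the two assignments from the example. */
void handle_patient_event(void)
{
    g_alarm = ALARM_ON;    /* May not be removed as a dead store. */
    /* Other code; with no reads of g_alarm state. */
    g_alarm = ALARM_OFF;   /* Patient stable. */
}
```

Without the qualifier, a compiler is entitled to treat the first store as dead because no code it can see reads the value before it is overwritten.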

Best Practice: The ‘volatile’ keyword should be used to declare any: (a) global variable shared by an ISR and any other code; (b) global variable accessed by two or more RTOS tasks (even when race conditions in those accesses have been prevented); (c) pointer to a memory-mapped peripheral register (or register set); or (d) delay loop counter.
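As a rough illustration of cases (c) and (d), the sketch below declares a pointer to a register set and a delay loop counter. The register layout, the `TIMER_ENABLE` bit, and all function names are assumptions made up for this example, and a RAM object stands in for the peripheral so the code can run on a host:

```c
#include <stdint.h>

/* Hypothetical register layout for an imaginary timer peripheral. */
typedef struct {
    uint32_t control;
    uint32_t count;
} timer_regs_t;

/* On real hardware the pointer would be a fixed address taken from the
 * datasheet. Here a RAM object stands in so the sketch runs anywhere. */
static timer_regs_t fake_timer;
static volatile timer_regs_t * const p_timer = &fake_timer;

#define TIMER_ENABLE 0x1u   /* hypothetical control bit */

/* (c) Every access through p_timer is a volatile access. */
void timer_start(void)
{
    p_timer->count = 0u;                /* this write is emitted first */
    p_timer->control |= TIMER_ENABLE;   /* then this read-modify-write */
}

uint32_t timer_control(void)
{
    return p_timer->control;
}

/* (d) Busy-wait delay: with a non-volatile counter, the optimizer may
 * delete the empty loop entirely. */
void delay_loops(uint32_t n)
{
    for (volatile uint32_t i = 0u; i < n; i++) {
        /* intentionally empty */
    }
}
```

Placing `volatile` before the pointed-to type (rather than after the `*`) is what makes the register accesses volatile; the `const` after the `*` only keeps the pointer itself from being reassigned.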

Note that in addition to ensuring all reads and writes take place for a given variable, the use of volatile also constrains the compiler's freedom to reorder code: accesses to multiple volatiles must be executed in the order they are written in the code. This ordering applies only among volatile accesses; accesses to ordinary variables may still be moved around them.
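The ordering rule can be sketched with two simulated device registers and one ordinary counter. All three names and the `send_byte` function are hypothetical, chosen only for this illustration:

```c
#include <stdint.h>

/* Hypothetical device registers, simulated here as volatile variables. */
volatile uint8_t g_tx_data;    /* byte to transmit */
volatile uint8_t g_tx_start;   /* writing 1 starts the transmission */

uint32_t g_bytes_sent;         /* ordinary (non-volatile) bookkeeping */

void send_byte(uint8_t b)
{
    g_tx_data  = b;    /* volatile access #1 */
    g_tx_start = 1u;   /* volatile access #2: always emitted after #1 */
    g_bytes_sent++;    /* non-volatile: the compiler may place this
                          before, between, or after the two writes */
}
```

If the position of the counter update mattered to the hardware or to an ISR, it would need to be volatile as well, or be separated from the register writes by a barrier.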


8 Responses to “Firmware-Specific Bug #3: Missing Volatile Keyword”

  1. Anonymous says:

    Michael-

    This comes from the Linux kernel but it’s very much an embedded sort of bug.

    In Linux 2.2.26 at arch/i386/kernel/smp.c:125 this code appears:

    volatile unsigned long ipi_count;

    However, the corresponding header file here include/asm-i386/smp.h:178 has dropped the volatile qualifier:

    extern unsigned long ipi_count;

    So one compilation unit treats this variable as volatile, but others do not. This sort of problem can obviously create any amount of trouble. The 2.3.x kernel series also contains this kind of bug, but at some point gcc started treating this as an error and the bugs seem to have been eradicated by 2.4. I’d imagine that plenty of embedded compilers such as those based on old versions of gcc will let this code slip through.

    John

    • David Brown says:

      There is no problem declaring a variable as volatile one place, and non-volatile somewhere else (though multiple declarations are a bad thing in general). The key to using “volatile” correctly is to understand that it is not /data/ that is volatile, it is /accesses/ that are volatile.

      Sometimes you have data that must have controlled volatile access at some times, and can be safely used as a normal variable at other times. Examples include data that is used within an interrupt routine – it may not need “volatile” inside the routine, but does need it outside of interrupt contexts. The correct way to handle this is through appropriate pointer casts, but it is legal (but lazy) to have different volatile qualifiers in two separate declarations.
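One way to read the pointer-cast suggestion above is sketched here, under the assumption of a tick counter that is incremented in an ISR and read from main-line code. The names `tick_count`, `tick_isr`, and `ticks_now` are hypothetical:

```c
#include <stdint.h>

/* Ordinary (non-volatile) object shared with a hypothetical ISR. */
static uint32_t tick_count;

/* Runs in interrupt context in this sketch; nothing can change
 * tick_count behind its back, so a plain access is used. */
void tick_isr(void)
{
    tick_count++;
}

/* Runs in main-line code: the cast makes this one access volatile,
 * so a fresh load is performed on every call. */
uint32_t ticks_now(void)
{
    return *(volatile uint32_t *)&tick_count;
}
```

The cast only controls what the compiler does with the access. On a target whose native word is narrower than 32 bits, the read may still be split into several loads and would need separate protection against the ISR.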

  2. Anonymous says:

    Michael,

    I’m not an ANSI C buff but I think the last statement of your post may be open to misinterpretation. You said: “All lines of code above the read or write of a volatile variable must be executed prior to that access; likewise, all lines of code below the access must be executed afterward.”

    But I have run into an issue where access to a location A followed by access to a location B would be re-ordered by the compiler unless **both** of them were declared as volatile. According to the vendor of the specific compiler, ordering is not guaranteed between volatile and non-volatile accesses.

    After reading your post I ran a quick search, and I’m under the impression other folks have come across the same type of issue. See for example: http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/

    Thanks,
    Antonio Arena

    • David Brown says:

      That’s correct. Volatile accesses only enforce order among other volatile accesses – there are no requirements for ordering other accesses. It is very common to see people write code like this in embedded systems:

      #include <stdbool.h>

      extern volatile bool interruptDisable; // This will be a real hardware flag

      void foo(void) {
          interruptDisable = true;
          // Code that must run atomically
          interruptDisable = false;
      }

      Typically you need a memory barrier before and after the critical code, as well as the interrupt disables.

      • James Vasil says:

        However, it seems to me that “lots” of the embedded systems code you are referring to is written for processors that don’t support out-of-order instruction execution and so there is no need for a memory barrier to prevent the processor causing a problem. Of course you do still need to make sure that the compiler doesn’t do this type of reordering in its efforts to optimize the code.
        -James
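A sketch of the compiler-side fix discussed in this thread, assuming C11 is available. It uses `atomic_signal_fence` from `<stdatomic.h>`, which GCC and Clang implement as a compiler-only barrier with no hardware fence instruction; whether that is enough on a given target is an assumption to check against the toolchain and CPU documentation. The flag is defined locally as a stand-in for the hardware register, and `shared_total` and `add_atomically` are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated stand-in for the hardware interrupt-disable flag. */
volatile bool interruptDisable;

/* Ordinary data that must only change while interrupts are off. */
int shared_total;

void add_atomically(int amount)
{
    interruptDisable = true;
    /* Compiler barrier: keeps the update below from being hoisted
     * above the flag write. */
    atomic_signal_fence(memory_order_seq_cst);

    shared_total += amount;   /* Code that must run atomically */

    /* Compiler barrier: keeps the update from sinking below the
     * write that re-enables interrupts. */
    atomic_signal_fence(memory_order_seq_cst);
    interruptDisable = false;
}
```

On processors that do reorder memory operations in hardware, a compiler-only barrier is not sufficient and the target's own barrier instruction or intrinsic would be needed as well.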

  3. Anonymous says:

    Hey,

    For the nastiest bug there is, I’d like to suggest the following one:

    Optimized non-volatile file scope variables

    The bug occurs when variables at file scope shared with an ISR (or thread) aren’t declared as volatile. The optimizing compiler will then detect that a particular variable is not used and possibly optimize it away. Example:

    static BOOL got_interrupt = FALSE;

    void main()
    {
        ...
        if (got_interrupt)
        {
            do_something();
        }
    }

    interrupt void isr (void)
    {
        got_interrupt = TRUE;
    }

    The optimizing compiler may in this example never execute do_something(). It notes that the variable is set in the function isr(), but since that function is never called anywhere, the compiler assumes that got_interrupt is always FALSE. The whole if-statement will get optimized away from the machine code.

    VERY common bug among (so-called) embedded programmers, and often very hard to detect. The compiler will usually never find it for you, and you often won't see anything in a debugger either. Even static analyzers will often fail to find it as they don't support the non-standard interrupt syntax. Recognized secure standards such as MISRA and CERT don't address the bug either.

    The best way of detecting this bug is to manually look at the disassembled C code. The consequences of the bug are always of random nature, and as interrupts are event-driven, the bug is often intermittent as well. The program can run for weeks before the bug appears. The bug is also very compiler-dependent, so even if the code is close to pure ANSI/ISO, it will still behave differently on another compiler, since the optimizing of C code isn't covered by any standards.

    Best regards,

    Daniel
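A corrected version of the example in this comment might look like the following sketch. The non-standard `interrupt` keyword is dropped so it builds on any compiler, and the `work_done` counter, the `poll_once` wrapper, and the clearing of the flag are additions made for illustration, not part of the original code:

```c
#include <stdbool.h>

/* volatile: the flag can change outside the normal flow of control,
 * so the compiler may not assume it stays false. */
static volatile bool got_interrupt = false;

static int work_done;   /* hypothetical counter so the effect is visible */

static void do_something(void)
{
    work_done++;
}

/* On a real target this would carry the compiler's interrupt keyword
 * or attribute; here it is a plain function. */
void isr(void)
{
    got_interrupt = true;
}

/* One pass of the main loop body. */
void poll_once(void)
{
    if (got_interrupt) {        /* read from memory on every call */
        got_interrupt = false;  /* added: acknowledge the event */
        do_something();
    }
}
```

With the qualifier in place, the test of `got_interrupt` can no longer be folded to a constant, so the call to `do_something()` survives optimization.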

  4. Amol says:

    Michael,

    Why should a “global variable accessed by two or more RTOS tasks (even when race conditions in those accesses have been prevented)” be declared as volatile?

  5. Michael Barr says:

    Amol,

    The problem is that the mutex may wind up protecting the wrong critical section code. Consider the following pseudocode:


    int foo;

    task_a() { ... take(mutex); foo += 1; give(mutex); ... }

    task_b() { foo = 0; ... take(mutex); if (foo == 0) ... else ...; give(mutex); }

    The optimization phase of the compiler may note that task_b() initializes foo to 0 and shortly thereafter tests foo against 0. In that case, it is permitted to make the program smaller and/or faster by eliminating both the if test and the else case.

    Thus, when RTOS tasks share global variables you must BOTH declare those variables volatile AND protect all accesses to them from race conditions.

    Hope this helps clarify the situation.

    Cheers,
    Mike
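The pseudocode in this reply could be fleshed out roughly as follows, applying both halves of the advice. The mutex type and its `take`/`give` functions are hypothetical stubs so the sketch builds on a host (a real port would call the RTOS's own primitives), and the task bodies are split into single-step functions for the same reason:

```c
#include <stdbool.h>

/* Hypothetical RTOS mutex API, stubbed out for a host build. */
typedef struct { volatile bool locked; } mutex_t;
static void take(mutex_t *m) { m->locked = true; }
static void give(mutex_t *m) { m->locked = false; }

static mutex_t mutex;

/* volatile: the test in task_b_check() must re-read memory rather than
 * reuse the 0 stored earlier by task_b_init(). */
volatile int foo;

/* Body of task_a's critical section. */
void task_a_step(void)
{
    take(&mutex);
    foo += 1;
    give(&mutex);
}

/* task_b's initialization, as in the pseudocode. */
void task_b_init(void)
{
    foo = 0;
}

/* task_b's test: returns true if foo was still 0 under the mutex. */
bool task_b_check(void)
{
    bool still_zero;

    take(&mutex);
    still_zero = (foo == 0);
    give(&mutex);
    return still_zero;
}
```

Because the stubs are visible to the compiler, this host build is exactly the situation where a non-volatile `foo` could have its test folded away.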
