embedded software boot camp

Jump Tables via Function Pointer Arrays in C/C++

December 17th, 2009 by Nigel Jones

Also available in PDF version.

Jump tables, also called branch tables, are an efficient means of handling similar events in software. Here’s a look at the use of arrays of function pointers in C/C++ as jump tables.

Examination of assembly language code that has been crafted by an expert will usually reveal extensive use of function “branch tables.” Branch tables (a.k.a. jump tables) are used because they offer a unique blend of compactness and execution speed, particularly on microprocessors that support indexed addressing. When one examines typical C/C++ code, however, the branch table (i.e., an array of function pointers) is a much rarer beast. The purpose of this article is to examine why branch tables are not used by C/C++ programmers and to make the case for their extensive use. Real-world examples of their use are included.

Function pointers

In talking to C/C++ programmers about this topic, three reasons are usually cited for not using function pointers. They are:

  • They are dangerous
  • A good optimizing compiler will generate a jump table from a switch statement, so let the compiler do the work
  • They are too difficult to code and maintain

Are function pointers dangerous?

This school of thought comes about because code that indexes into a table and then calls a function based on the index has the capability to end up just about anywhere. For instance, consider the following code fragment:

void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

void test(const INT jump_index)
{
    /* Call the function specified by jump_index */
    pf[jump_index]();
}

The above code declares pf[] to be an array of pointers to functions, each of which takes no arguments and returns void. The test() function simply calls the specified function via the array. As it stands, this code is dangerous for the following reasons.

  • pf[] is accessible by anyone
  • In test(), there is no bounds checking, such that an erroneous jump_index would spell disaster

A much better way to code this, which avoids these problems, is as follows:

void test(uint8_t const jump_index)
{
    static void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

    if (jump_index < sizeof(pf) / sizeof(*pf))
    {
        /* Call the function specified by jump_index */
        pf[jump_index]();
    }
}

The changes are subtle, yet important.

  • By declaring the array static within the function, no one else can access the jump table
  • Forcing jump_index to be an unsigned quantity means that we need only perform a one-sided test for our bounds checking
  • Setting jump_index to the smallest data type that will meet the requirements provides a little more protection (most jump tables are smaller than 256 entries)
  • An explicit test is performed prior to making the call, thus ensuring that only valid function calls are made. (For performance-critical applications, the if() statement could be replaced by an assert())

This approach to the use of a jump table is just as secure as an explicit switch statement; thus the idea that jump tables are dangerous may be rejected.
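To make the assert() alternative mentioned above concrete, here is a minimal compilable sketch; the handlers fna/fnb/fnc and the last_called bookkeeping are hypothetical stand-ins for real event handlers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handlers standing in for fna ... fnz */
static int last_called = -1;
static void fna(void) { last_called = 0; }
static void fnb(void) { last_called = 1; }
static void fnc(void) { last_called = 2; }

void test(uint8_t const jump_index)
{
    static void (* const pf[])(void) = { fna, fnb, fnc };

    /* In a debug build an out-of-range index halts immediately;
     * with NDEBUG defined, the check (and its cost) disappears. */
    assert(jump_index < sizeof(pf) / sizeof(*pf));
    pf[jump_index]();
}
```

The trade-off is that a release build silently performs no bounds check at all, so this variant is only appropriate where the index is already trusted.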

Leave it to the optimizer?

It is well known that many C compilers will attempt to convert a switch statement into a jump table. Thus, rather than use a function pointer array, many programmers prefer to use a switch statement of the form:

void test(uint8_t const jump_index)
{
    switch (jump_index)
    {
      case 0:
        fna();
        break;

      case 1:
        fnb();
        break;

        …

      case 26:
        fnz();
        break;

      default:
        break;
    }
}

Indeed, Jack Crenshaw advocated this approach in a September 1998 column in Embedded Systems Programming. Well, I have never found myself disagreeing with Dr. Crenshaw before, but there is always a first time for everything! A quick survey of the documentation for a number of compilers revealed some interesting variations. They all claimed to potentially perform conversion of a switch statement into a jump table. However, the criteria for doing so varied considerably. One vendor simply said that it would attempt to perform this optimization. A second claimed to use a heuristic algorithm to decide which was “better,” while a third permitted pragmas to let the user specify what they wanted. This sort of variation does not give one a warm fuzzy feeling!

In the case where one has, say, 26 contiguous indices, each associated with a single function call (such as the example above), the compiler will almost certainly generate a jump table. However, what about the case where you have 26 non-contiguous indices that vary in value from 0 to 1000? A jump table would have 975 null entries, or 1950 “wasted” bytes on the average microcontroller. Most compilers would deem this too high a penalty to pay and would eschew the jump table for an if-else sequence. However, if you have EPROM to burn, it actually costs nothing to implement this as a jump table, but buys you consistent (and fast) execution time. By coding this as a jump table, you ensure that the compiler does what you want.
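A sparse table of this kind can be coded quite compactly. The sketch below uses C99 designated initializers (which post-date the original article) to populate only the live slots; the handler names and the result bookkeeping are hypothetical:

```c
/* "EPROM to burn": fully populate a sparse 0..1000 index space so that
 * every lookup is a constant-time array access rather than an if-else
 * chain. Unlisted slots default to NULL. */

static int result = 0;
static void handle10(void)  { result = 10;  }
static void handle500(void) { result = 500; }

#define TABLE_SIZE 1001u

void dispatch(unsigned int const index)
{
    static void (* const pf[TABLE_SIZE])(void) = {
        [10]  = handle10,
        [500] = handle500,
        /* ... remaining live indices ... */
    };

    /* Bounds check, then skip the NULL filler entries */
    if ((index < TABLE_SIZE) && (pf[index] != 0))
    {
        pf[index]();
    }
}
```

Alternatively, the NULL test can be avoided entirely by filling the unused slots with a do-nothing function, as the keypad example later in the article does with fnNull().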

There is a further problem with large switch statements. Once a switch statement gets much beyond a screen length, it becomes harder to see the big picture, and thus the code is more difficult to maintain. A function pointer array declaration, adequately commented to explain the declaration, is much more compact, allowing one to see the overall picture. Furthermore, the function pointer array is potentially more robust. Who has not written a large switch statement and forgotten to add a break statement on one of the cases?

Complexities

Complexity associated with jump table declaration and use is the real reason they are not used more often. In embedded systems, where pointers normally have mandatory memory space qualifiers, the declarations can quickly become horrific. For instance, the example above would be highly undesirable on most embedded systems, since the pf[] array would probably end up being stored in RAM, instead of ROM. The way to ensure the array is stored in ROM varies somewhat between compiler vendors. However, a first step that is portable to all systems is to add const qualifiers to the declaration. Thus, our array declaration now becomes:

static void (* const pf[])(void) = {fna, fnb, fnc, …, fnz};

Like many users, I find these declarations cryptic and very daunting. However, over the years, I have built up a library of declaration templates that I simply refer to as necessary. A compilation of useful templates appears below.

A handy trick is to learn to read complex declarations like this backwards–i.e., from right to left. Doing this, here’s how I’d read the above: pf is an array of const pointers to functions that take no arguments and return void. The static keyword is needed only when the array is declared within the function that uses it; it keeps the table off the stack (and, as a bonus, invisible to everyone else).
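The same inside-out reading can be written down as a commented, compilable sketch (fna/fnb and the calls counter are hypothetical illustrations):

```c
static int calls = 0;
static void fna(void) { calls += 1; }
static void fnb(void) { calls += 2; }

int demo(void)
{
    /* Building the declaration up from the identifier:
     *   pf                            pf...
     *   pf[]                          ...is an array
     *   * const pf[]                  ...of const pointers
     *   (* const pf[])(void)          ...to functions taking no arguments
     *   void (* const pf[])(void)     ...and returning void
     * static then keeps the table off the stack and private to demo().
     */
    static void (* const pf[])(void) = { fna, fnb };

    pf[0]();
    pf[1]();
    return calls;
}
```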

Arrays of function pointers

Most books about C programming cover function pointers in less than a page (while devoting entire chapters to simple looping constructs). The descriptions typically say something to the effect that you can take the address of a function, and thus one can define a pointer to a function, and the syntax looks like such and such. At which point, most readers are left staring at a complex declaration, and wondering what exactly function pointers are good for. Small wonder that function pointers do not feature heavily in their work.

Well then, where are jump tables useful? In general, arrays of function pointers are useful whenever there is the potential for a range of inputs to a program that subsequently alters program flow. Some typical examples from the embedded world are given below.

Keypads

The most often cited example for uses of function pointers is with keypads. The general idea is obvious. A keypad is normally arranged to produce a unique keycode. Based on the value of the key pressed, some action is taken. This can be handled via a switch statement. However, an array of function pointers is far more elegant. This is particularly true when the application has multiple user screens, with some key definitions changing from screen to screen (i.e., the system uses soft keys). In this case, a two dimensional array of function pointers is often used.

#define N_SCREENS  16
#define N_KEYS     6

/* Prototypes for functions that appear in the jump table */
INT fnUp(void);
INT fnDown(void);
…
INT fnMenu(void);
INT fnNull(void);

INT keypress(uint8_t key, uint8_t screen)
{
    static INT (* const pf[N_SCREENS][N_KEYS])(void) =
    {
        {fnUp,   fnDown, fnNull, …, fnMenu},
        {fnMenu, fnNull, …, fnHome},
        …
        {fnF0,   fnF1,   …, fnF5}
    };

    assert(key < N_KEYS);
    assert(screen < N_SCREENS);

    /* Call the function and return result */
    return (*pf[screen][key])();
}

/* Dummy function - used as an array filler */
INT fnNull(void)
{
    return 0;
}

There are several points to note about the above example:

  • All functions to be named in a jump table should be prototyped. Prototyping is the best line of defense against including a function that expects the wrong parameters, or returns the wrong type.
  • As for earlier examples, the function table is declared within the function that makes use of it (and, thus, static)
  • The array is made const signifying that we wish it to remain unchanged
  • The indices into the array are unsigned, such that only single sided bounds checking need be done
  • In this case, I have chosen to use the assert() macro to provide the bounds checking. This is a good compromise between debugging ease and runtime efficiency.
  • A dummy function fnNull() has been declared. This is used where a keypress is undefined. Rather than explicitly testing to see whether a key is valid, the dummy function is invoked. This is usually the most efficient method of handling a function array that is only partially populated.
  • The functions that are called need not be unique. For example, a function such as fnMenu may appear many times in the same jump table.

Communication protocols

Although the keypad example is easy to appreciate, my experience in embedded systems is that communication links occur far more often than keypads. Communication protocols are a challenge ripe for a branch table solution. This is best illustrated by an example.

Last year, I worked on the design for an interface box to a very large industrial power supply. This interface box had to accept commands and return parameter values over an RS-232 link. The communications used a set of simple ASCII mnemonics to specify the action to be taken. The mnemonics consisted of a channel number (0, 1, or 2), followed by a two-character parameter. The code to handle a read request coming in over the serial link is shown below. The function process_read() is called with a pointer to a string fragment that is expected to consist of the three characters (null terminated) containing the required command.

const CHAR *fna(void);	// Example function prototype

static void process_read(const CHAR *buf)
{
    CHAR *cmdptr;
    UCHAR offset;
    const CHAR *replyptr;

    static const CHAR read_str[] =
        "0SV 0SN 0MO 0WF 0MT 0MP 0SW 1SP 1VO 1CC 1CA 1CB "
        "1ST 1MF 1CL 1SZ 1SS 1AZ 1AS 1BZ 1BS 1VZ 1VS 1MZ "
        "1MS 2SP 2VO 2CC 2CA 2CB 2ST 2MF 2CL 2SZ 2SS "
        "2AZ 2AS 2BZ 2BS 2VZ 2VS 2MZ 2MS ";

    static const CHAR *
        (* const readfns[sizeof(read_str)/4])(void) =
    {
        fna, fnb, fnc, …
    };

    cmdptr = strstr(read_str, buf);

    if (cmdptr != NULL)
    {
        /*
         * cmdptr points to the valid command, so compute the offset
         * in order to index into the function jump table
         */
        offset = (cmdptr - read_str) / 4;

        /* Call the function and get a pointer to the reply */
        replyptr = (*readfns[offset])();

        /* Rest of the code goes here */
    }
}

The code above is quite straightforward. A constant string, read_str, is defined, containing the list of all legal mnemonic combinations. Note the use of added spaces to aid clarity. Next, we have the array of function pointers, one pointer for each valid command. We determine whether we have a valid command sequence by making use of the standard library function strstr(). If a match is found, it returns a pointer to the matching substring; otherwise it returns NULL. We check for a valid pointer, compute the offset into the string, and then use the offset to call the appropriate handler function in the jump table. Thus, in four lines of code, we have determined whether the command is valid and called the appropriate function. Although the declaration of readfns[] is complex, the simplicity of the runtime code is tough to beat.

Timed task list

A third area where function pointers are useful is in timed task lists. In this case, the input to the system is the passage of time. Many projects cannot justify the use of an RTOS. Instead, all that is required is that a number of tasks run at predetermined intervals. This is very simply handled as shown below.

typedef struct
{
   UCHAR interval;      /* How often to call the task */
   void (*proc)(void);	/* pointer to function returning void */

} TIMED_TASK;

static const TIMED_TASK timed_task[] =
{
    { INTERVAL_16_MSEC,  fnA },
    { INTERVAL_50_MSEC,  fnB },
    { INTERVAL_500_MSEC, fnC },
    …
    { 0, NULL }
};

extern volatile UCHAR tick;

void main(void)
{
    const TIMED_TASK *ptr;
    UCHAR time;

    /* Initialization code goes here. Then enter the main loop */

    while (1)
    {
        if (tick)
        {
            /* Check timed task list */
            tick--;
            time = computeElapsedTime(tick);

            for (ptr = timed_task; ptr->interval != 0; ptr++)
            {
                if (!(time % ptr->interval))
                {
                    /* Time to call the function */
                    (ptr->proc)();
                }
            }
        }
    }
}

In this case, we define our own data type (TIMED_TASK) that consists simply of an interval and a pointer to a function. We then define an array of TIMED_TASK, and initialize it with the list of functions that are to be called and their calling intervals. In main(), we have the start-up code, which must enable a periodic timer interrupt that increments the volatile variable tick at a fixed interval. We then enter the infinite loop.

The infinite loop checks for a non-zero tick value, decrements the tick variable and computes the elapsed time since the program started running. The code then simply steps through each of the tasks, to see whether it is time for that one to be executed and, if so, calls it via the function pointer.

If your application only consists of two or three tasks, then this approach is probably overkill. However, if your project has a large number of timed tasks, or it is likely that you will have to add tasks in the future, then this approach is rather palatable. Note that adding tasks and/or changing intervals simply requires editing of the timed_task[] array. No code, per se, has to be changed.

Interrupt vector tables

The fourth application of function jump tables is the array of interrupt vectors. On most processors, the interrupt vectors are in contiguous locations, with each vector representing a pointer to an interrupt service routine function. Depending upon the compiler, the work may be done for you implicitly, or you may be forced to generate the function table. In the latter case, implementing the vectors via a switch statement will not work!

Here is the vector table from the industrial power supply project mentioned above. This project was implemented using a Whitesmiths’ compiler and a 68HC11 microcontroller.

IMPORT VOID _stext();  /* 68HC11-specific startup routine */

static VOID (* const _vectab[])() =
{
    SCI_Interrupt,	/* SCI              */
    badSPI_Interrupt,	/* SPI              */
    badPAI_Interrupt,	/* Pulse acc input  */
    badPAO_Interrupt, 	/* Pulse acc overf  */
    badTO_Interrupt,	/* Timer overf      */
    badOC5_Interrupt,	/* Output compare 5 */
    badOC4_Interrupt,	/* Output compare 4 */
    badOC3_Interrupt, 	/* Output compare 3 */
    badOC2_Interrupt,	/* Output compare 2 */
    badOC1_Interrupt,	/* Output compare 1 */
    badIC3_Interrupt,	/* Input capture 3  */
    badIC2_Interrupt,	/* Input capture 2  */
    badIC1_Interrupt,	/* Input capture 1  */
    RTI_Interrupt,	/* Real time        */
    Uart_Interrupt,	/* IRQ              */
    PFI_Interrupt,	/* XIRQ             */
    badSWI_Interrupt,	/* SWI              */
    IlOpC_Interrupt,	/* illegal          */
    _stext,		/* cop fail         */
    _stext,		/* cop clock fail   */
    _stext,		/* RESET            */
};

A couple of points are worth making:

  • The above is insufficient to locate the table correctly in memory. This has to be done via linker directives.
  • Note that unused interrupts still have an entry in the table. Doing so ensures that the table is correctly aligned and traps can be placed on unexpected interrupts.

If any of these examples have whetted your appetite for using arrays of function pointers, but you are still uncomfortable with the declaration complexity, then fear not! You will find below a variety of declarations, ranging from the straightforward to the downright appalling. The examples are all reasonably practical in the sense that the desired functionality is not outlandish (that is, there are no declarations for arrays of pointers to functions that take pointers to arrays of function pointers and so on).

Declaration and use hints

All of the examples below adhere to conventions that I have found to be useful over the years, specifically:

1. All of the examples are preceded by static. This is done on the assumption that the scope of a function table should be highly localized, ideally within an enclosing function.

2. In every example the array pf[] is also preceded with const. This declares that the pointers in the array cannot be modified after initialization. This is the normal (and safe) usage scenario.

3. There are two syntactically different ways of invoking a function via a pointer. If we have a function pointer with the declaration:

void (*fnptr)(int);	/* fnptr is a function pointer */

Then it may be invoked using either of these methods:

fnptr(3);	/* Method 1 of invoking the function */
(*fnptr)(3);	/* Method 2 of invoking the function */

The advantage of the first method is an uncluttered syntax. However, it makes it look as if fnptr is a function, as opposed to a function pointer. Someone maintaining the code may end up searching in vain for the function fnptr(). With method 2, it is much clearer that we are dereferencing a pointer. However, when the declarations get complex, the added (*) can be a significant burden. Throughout the examples, each syntax is shown. In practice, the latter syntax seems to be more popular, but whichever you choose, use it consistently.

4. In every example, the syntax for using a typedef is also given. It is quite permissible to use a typedef to define a complex declaration, and then use the new type like a simple type. If we stay with the example above, then an alternative declaration is:

typedef void (*PFV_I )(int);

/* Declare a PFV_I typed variable and init it */
PFV_I fnptr = fna;

/* Call fna with parameter 3 using method 1 */
fnptr(3);	

/* Call fna with parameter 3 using method 2 */
(*fnptr)(3);

The typedef declares the type PFV_I to be a pointer to a function that returns void and is passed an integer. We then simply declare fnptr to be a variable of this type, and use it. Typedefs are very useful when you regularly use a certain function pointer type, since they save you having to remember and retype the declaration. The downside of using a typedef is that it is not obvious that the variable being declared is a pointer to a function. Thus, just as with the two invocation methods above, you gain syntactical simplicity by hiding the underlying functionality.

In the typedefs, a consistent naming convention is used. Every type starts with PF (Pointer to Function) and is then followed with the return type, followed by an underscore, the first parameter type, underscore, second parameter type and so on. For void, boolean, char, int, long, float and double, the characters V, B, C, I, L, S, D are used. (Note the use of S(ingle) for float, to avoid confusion with F(unction)). For a pointer to a data type, the type is preceded with P. Thus PL is a pointer to a long. If a parameter is const, then a c appears in the appropriate place. Thus, cPL is a const pointer to a long, whereas a PcL is a pointer to a const long, and cPcL is a const pointer to a const long. For volatile qualifiers, v is used. For unsigned types, a u precedes the base type. For user defined data types, you are on your own!

An extreme example: PFcPcI_uI_PvuC. This is a pointer to a function that returns a const pointer to a const Integer that is passed an unsigned integer and a pointer to a volatile unsigned char.
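Applied to real code, the convention looks like this. The typedefs below follow the naming scheme just described; the function count_chars and the string passed to it are illustrative inventions, not from the article:

```c
#include <string.h>

/* Typedefs built with the naming convention:
 * PFV_I   - Pointer to Function returning Void, passed an Int
 * PFL_PcC - Pointer to Function returning Long, passed a
 *           Pointer to a const Char */
typedef void (*PFV_I)(int);
typedef long (*PFL_PcC)(const char *);

static long count_chars(const char *s)
{
    return (long)strlen(s);
}

long demo(void)
{
    PFL_PcC fnptr = count_chars;  /* reads like a simple variable declaration */
    return fnptr("jump table");   /* method 1 invocation */
}
```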

Function pointer templates

The first eleven examples are generic in the sense that they do not use memory space qualifiers and hence may be used on any target. Example 12 shows how to add memory space qualifiers, such that all the components of the declaration end up in the correct memory spaces.

Example 1

pf[] is a static array of pointers to functions that take an INT as an argument and return void.

void fna(INT);	// Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_I)(INT);
static PFV_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](a);	// Calling method 1
(*pf[jump_index])(a);	// Calling method 2

Example 2

pf [] is a static array of pointers to functions that take a pointer to an INT as an argument and return void.

void fna(INT *);	// Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_PI)(INT *);
static PFV_PI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT *) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](&a);	// Calling method 1
(*pf[jump_index])(&a);	// Calling method 2

Example 3

pf [] is a static array of pointers to functions that take an INT as an argument and return a CHAR

CHAR fna(INT); 	// Example prototype of a function to be called

// Declaration using typedef
typedef CHAR (* const PFC_I)(INT);
static PFC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR res;
res = pf[jump_index](a);	// Calling method 1
res = (*pf[jump_index])(a);	// Calling method 2

Example 4

pf [] is a static array of pointers to functions that take an INT as an argument and return a pointer to a CHAR.

CHAR *fna(INT);	// Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * (* const PFPC_I)(INT);
static PFPC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * res;
res = pf[jump_index](a); 	// Calling method 1
res = (*pf[jump_index])(a);	// Calling method 2

Example 5

pf [] is a static array of pointers to functions that take an INT as an argument and return a pointer to a const CHAR (i.e. the pointer may be modified, but what it points to may not).

const CHAR *fna(INT); 	// Example prototype of a function to be called

// Declaration using typedef
typedef const CHAR * (* const PFPcC_I)(INT);
static PFPcC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR * res;
res = pf[jump_index](a);	// Calling method 1
res = (*pf[jump_index])(a);	// Calling method 2

Example 6

pf [] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a CHAR (i.e. the pointer may not be modified, but what it points to may be modified).

CHAR * const fna(INT i);  // Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * const (* const PFcPC_I)(INT);
static PFcPC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * const res = pf[jump_index](a);	//Calling method 1
CHAR * const res = (*pf[jump_index])(a);	//Calling method 2

Example 7

pf [] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified)

const CHAR * const fna(INT i);  // Example function prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_I)(INT);
static PFcPcC_I pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR* const res = pf[jump_index](a); 	// Calling method 1
const CHAR* const res = (*pf[jump_index])(a); 	// Calling method 2

Example 8

pf [] is a static array of pointers to functions that take a pointer to a const INT as an argument (i.e. the pointer may be modified, but what it points to may not) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified)

const CHAR * const fna(const INT *i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_PcI)(const INT *);
static PFcPcC_PcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT *) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *aptr;
aptr = &a;
const CHAR* const res = pf[jump_index](aptr);	//Calling method 1
const CHAR* const res = (*pf[jump_index])(aptr);//Calling method 2

Example 9

pf [] is a static array of pointers to functions that take a const pointer to an INT as an argument (i.e. the pointer may not be modified, but what it points to may) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified)

const CHAR * const fna(INT *const i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPI)(INT * const);
static PFcPcC_cPI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT * const) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
INT *const aptr = &a;
const CHAR* const res = pf[jump_index](aptr);		//Method 1
const CHAR* const res = (*pf[jump_index])(aptr);		//Method 2

Example 10

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified)

const CHAR * const fna(const INT *const i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPcI)(const INT * const);
static PFcPcC_cPcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *const aptr = &a;

const CHAR* const res = pf[jump_index](aptr);		// Method 1
const CHAR* const res = (*pf[jump_index])(aptr);		// Method 2

This example manages to combine five incidences of const and one of static into a single declaration. For all of its complexity, however, this is not an artificial example. You could go ahead and remove all the const and static qualifiers and the code would still work. It would, however, be a lot less safe, and potentially less efficient.

Just to break up the monotony, here is the same declaration, but with a twist.

Example 11

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly)

volatile CHAR * const fna(const INT *const i);	// Example prototype

// Declaration using typedef
typedef volatile CHAR * const (* const PFcPvC_cPcI)(const INT * const);
static PFcPvC_cPcI pf[] = {fna, fnb, fnc, … fnz};

// Direct declaration
static volatile CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT * const aptr = &a;

volatile CHAR * const res = pf[jump_index](aptr);	// Method 1
volatile CHAR * const res = (*pf[jump_index])(aptr);	//Method 2

while (*res)
	;	//Wait for volatile register to clear

With memory space qualifiers, things can get even more hairy. For most vendors, the memory space qualifier is treated syntactically as a type qualifier (such as const or volatile) and thus follows the same placement rules. For consistency, I place type qualifiers to the left of the “thing” being qualified. Where there are multiple type qualifiers, alphabetic ordering is used. Since memory space qualifiers are typically compiler extensions, they are normally preceded by an underscore, and hence come first alphabetically. Thus, a nasty declaration may look like this:

_ram const volatile UCHAR status_register;

To demonstrate memory space qualifier use, here is example 11 again, except this time memory space qualifiers have been added. The qualifiers are named _m1 … _m5.

Example 12

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly). Each element of the declaration lies in a different memory space. In this particular case, it is assumed that you can even declare the memory space in which parameters passed by value appear. This is extreme, but is justified on pedagogical grounds.

/* An example prototype. This declaration reads as follows.
 * Function fna is passed a const pointer in _m5 space that points to a
 * const integer in _m4 space. It returns a const pointer in _m2 space to
 * a volatile character in _m1 space.
 */
_m1 volatile CHAR * _m2 const fna(_m4 const INT * _m5 const i);

/* Declaration using typedef. This declaration reads as follows.
 * PFcPvC_cPcI is a pointer to function data type, variables based
 * upon which lie in _m3 space. Each Function is passed a const
 * pointer in _m5 space that points to a const integer in _m4 space.
 * It returns a const pointer in _m2 space to a volatile character
 * in _m1 space.
 */
typedef _m1 volatile CHAR * _m2 const (* _m3 const PFcPvC_cPcI) (_m4 const INT * _m5 const);

static PFcPvC_cPcI pf[] = {fna, fnb, fnc, … fnz};

/* Direct declaration. This declaration reads as follows. pf[] is
 * a statically allocated constant array in _m3 space of pointers to functions.
 * Each Function is passed a const pointer in _m5 space that points to
 * a const integer in _m4 space. It returns a const pointer in _m2 space
 * to a volatile character in _m1 space.
 */
static _m1 volatile CHAR * _m2 const (* _m3 const pf[]) (_m4 const INT * _m5 const) = {fna, fnb, fnc, … fnz};

// Declare a const variable that lies in _m4 space
_m4 const INT a = 6;

// Now declare a const pointer in _m5 space that points to a const
// variable that is in _m4 space
_m4 const INT * _m5 const aptr = &a;

// Make the function call, and get back the pointer
volatile CHAR * const res = pf[jump_index](aptr); 	// Method 1
volatile CHAR * const res = (*pf[jump_index])(aptr); 	// Method 2

while (*res)
	;	// Wait for volatile register to clear

Acknowledgments

My thanks to Mike Stevens, not only for reading over this manuscript and making some excellent suggestions, but also for showing me, over the years, more ways to use function pointers than I ever dreamed were possible.


This article was published in the May 1999 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Arrays of Pointers to Functions.” Embedded Systems Programming, May 1999.

In Praise of the C Preprocessor’s #error Directive

December 17th, 2009 by Nigel Jones

Also available in PDF version.

One of the least used but potentially most useful C preprocessor directives is #error. Here’s a look at a couple of clever uses for #error that have proven invaluable in embedded software development.

#error is an ANSI-specified feature of the C preprocessor (cpp). Its syntax is very straightforward:

#error <writer supplied error message>

The <writer supplied error message> can consist of any printable text. You don’t even have to enclose the text in quotes. (Technically, the message is optional–though it rarely makes sense to omit it.)

When the C preprocessor encounters a #error statement, it causes compilation to terminate and the writer-supplied error message to be printed to stderr. A typical error message from a C compiler looks like this:

Filename(line_number): Error!
Ennnn: <writer supplied error message>

where Filename is the source file name, line_number is the line number where the #error statement is located, and Ennnn is a compiler-specific error number. Thus, the #error message is basically indistinguishable from ordinary compiler error messages.

“Wait a minute,” you might say. “I spend enough time trying to get code to compile and now he wants me to do something that causes more compiler errors?” Absolutely! The essential point is that code that compiles but is incorrect is worse than useless. I’ve found three general areas in which this problem can arise and #error can help. Read on and see if you agree with me.

Incomplete code

I tend to code using a step-wise refinement approach, so it isn’t unusual during development for me to have functions that do nothing, for loops that lack a body, and so forth. Consequently, I often have files that are compilable but lack some essential functionality. Working this way is fine, until I’m pulled off to work on something else (an occupational hazard of being in the consulting business). Because these distractions can occasionally run into weeks, I sometimes return to the job with my memory a little hazy about what I haven’t completed. In the worst-case scenario (which has occurred), I perform a make, which runs happily, and then I attempt to use the code. The program, of course, crashes and burns, and I’m left wondering where to start.

In the past, I’d comment the file to note what had been done and what was still needed. However, I found this approach to be rather weak because I then had to read all my comments (and I comment heavily) in order to find what I was looking for. Now I simply enter something like the following in an appropriate place in the file:

#error *** Nigel - Function incomplete. Fix before using ***

Thus, if I forget that I haven’t done the necessary work, an inadvertent attempt to use the file will result in just about the most meaningful compiler message I’ll ever receive. Furthermore, it saves me from having to wade through pages of comments, trying to find what work I haven’t finished.

Compiler-dependent code

As much as I strive to write portable code, I often find myself having to trade off performance for portability – and in the embedded world, performance tends to win. However, what happens if a few years later I reuse some code without remembering that the code has compiler-specific peculiarities? The result is a much longer debug session than is necessary. But a judicious #error statement can prevent a lot of grief. A couple of examples may help.

Example 1

Some floating-point code requires at least 12 digits of resolution to return the correct results. Accordingly, the various variables are defined as type long double. But ISO C only requires that a long double have 10 digits of resolution. Thus on certain machines, a long double may be inadequate to do the job. To protect against this, I would include the following:

#include <float.h>
#if (LDBL_DIG < 12)
	#error *** long doubles need 12 digit resolution - do not use this compiler! ***
#endif

This approach works by examining the value of an ANSI-mandated constant found in float.h.

Example 2

An amazing amount of code makes invalid assumptions about the underlying size of the various integer types. If you have code that has to use an int (as opposed to a user-specified data type such as int16), and the code assumes that an int is 16 bits, you can do the following:

#include <limits.h>
#if (INT_MAX != 32767)
	#error *** This file only works with 16-bit int - do not use this compiler! ***
#endif

Again, this works by checking the value of an ANSI-mandated constant. This time the constant is found in the file limits.h. This approach is a lot more useful than putting these limitations inside a big comment that someone may or may not read. After all, you have to read the compiler error messages.

Conditionally-compiled code

Since conditionally compiled code seems to be a necessary evil in embedded programming, it’s common to find code sequences such as the following:

#if defined OPT_1
	/* Do option_1 */
#else
	/* Do option_2 */
#endif

As it is written, this code means the following: if and only if OPT_1 is defined, we will do option_1; otherwise we’ll do option_2. The problem with this code is that a user of the code doesn’t know (without explicitly examining the code) that OPT_1 is a valid compiler switch. Instead, the naïve user will simply compile the code without defining OPT_1 and get the alternate implementation, irrespective of whether that is what’s required or not. A more considerate coder might be aware of this problem, and instead do the following:

#if defined OPT_1
	/* Do option 1 */
#elif defined OPT_2
	/* Do option 2*/
#endif

In this case, failure to define either OPT_1 or OPT_2 will typically result in an obscure compiler error at a point later in the code. The user of this code will then be stuck with trying to work out what must be done to get the module to compile. This is where #error comes in. Consider the following code sequence:

#if defined OPT_1
	/* Do option_1 */
#elif defined OPT_2
	/* Do option_2 */
#else
	#error *** You must define one of OPT_1 or OPT_2 ***
#endif

Now the compilation fails, but at least it tells the user explicitly what to do to make the module compile. I know that if this procedure had been adopted universally, I would have saved a lot of time over the years trying to reuse other people’s code.

So there you have it. Now tell me, don’t you agree that #error is a really useful part of the preprocessor, worthy of your frequent use and occasional praise?


This article was published in the September 1999 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “In Praise of the #error Directive.” Embedded Systems Programming, September 1999.

Efficient C Code for 8-bit Microcontrollers

December 17th, 2009 by Nigel Jones

Also available in PDF version.

The 8051, 68HC11, and Microchip PIC are popular microcontrollers, but they aren’t necessarily easy to program. This article shows how the use of ANSI C and compiler-specific constructs can help generate tighter code.

Getting the best possible performance out of the C compiler for an 8-bit microcontroller isn’t always easy. This article concentrates mainly on those microcontrollers that were never designed to support high-level languages, such as members of the 8051, 6800 (including the 68HC11), and Microchip PIC families of microcontrollers. Newer 8-bit machines such as the Philips 8051XA and the Atmel Atmega series were designed explicitly to support high-level languages and, as such, may not need all the techniques I describe here.

My emphasis is not on algorithm design, nor does it depend on a specific microcontroller or compiler. Rather, I describe general techniques that are widely applicable. In many cases, these techniques work on larger machines, although you may then decide that the trade-offs involved aren’t worthwhile.

Before jumping into the meat of the article, let’s briefly digress with a discussion of the philosophy involved. The microcontrollers I’ve named are popular for reasons of size, price, power consumption, peripheral mix, and so on. Notice that “ease of programming” is conspicuously missing from this list. Traditionally, these microcontrollers have been programmed in assembly language. In the last few years, many vendors have recognized the desire of users to increase their productivity, and have introduced C compilers for these machines—many of which are extremely good. However, it’s important to remember that no matter how good the compiler, the underlying hardware has severe limitations. Thus, to write efficient C for these targets, it’s essential that we be aware of what the compiler can do easily and what requires compiler heroics. In presenting these techniques, I have taken the attitude that I wish to solve a problem by programming a microcontroller, and that the C compiler is a tool, no different from an oscilloscope. In other words, C is a means to an end, and not an end in itself. As a result, many of my comments will seem heretical to the high-level language purists out there.

ANSI C

The first step to writing a realistic C program for an 8-bit computer is to dispense with the concept of writing 100% ANSI code. This concession is necessary because I don’t believe it’s possible, or even desirable, to write 100% ANSI code for any embedded system, particularly for 8-bit systems.

Some characteristics of 8-bit systems that prevent ANSI compliance are:

  • Embedded software interacts with hardware, yet ANSI C provides extremely crude tools for addressing registers at fixed memory locations
  • All nontrivial systems use interrupts, yet ANSI C doesn’t have a standard way of coding interrupt service routines
  • ANSI C has various type promotion rules that are absolute performance killers on an 8-bit computer
  • Many older microcontrollers feature multiple memory banks, which have to be hardware swapped in order to correctly address the desired variable
  • Many microcontrollers have no hardware support for C’s stack (i.e., they lack a stack pointer)

This is not to say that I advocate junking the entire ANSI C standard. I take the view that one should use standard C as much as possible. However, when it interferes with solving the problem at hand, do not hesitate to bypass it. Does this interfere with making code portable and reusable? Absolutely. But portable, reusable code that doesn’t get the job done isn’t much use.

I’ve also noticed that every compiler has a switch that strictly enforces ANSI C and disables all compiler extensions. I suspect that this is done purely so that a vendor can claim ANSI compliance, even though this feature is practically useless. I have also observed that vendors who strongly emphasize their ANSI compliance often produce inferior code (perhaps because the compiler has a generic front end that is shared among multiple targets) when compared to vendors that emphasize their performance and language extensions.

Enough about the ANSI standard. Let’s now discuss specific actions that can be taken to make your code run efficiently on an 8-bit microcontroller. The most important, by far, is the choice of data types.

Data types

Knowledge of the size of the underlying data types, together with careful data type selection, is essential for writing efficient code on eight-bit machines. Furthermore, understanding how the compiler handles expressions involving your data types can make a considerable difference in your coding decisions. These topics are discussed in the following paragraphs.

Data type size

In the embedded world, knowing the underlying representation of the various data types is usually essential. I have seen many discussions on this topic, none of which has been particularly satisfactory or portable. My preferred solution is to include a file, <types.h>, an excerpt from which appears below:

#ifndef TYPES_H
#define TYPES_H
#include <limits.h>

/* Assign a compiler-specific data type to BOOLEAN */
#ifdef _C51_
typedef bit BOOLEAN;
#define FALSE 0
#define TRUE 1
#else
typedef enum {FALSE=0, TRUE=1} BOOLEAN;
#endif

/* Assign an 8-bit signed type to CHAR. Note that CHAR_MAX (not
 * SCHAR_MAX, which is 127 by definition) tells us whether a plain
 * char is signed or unsigned on this compiler. */
#if (CHAR_MAX == 127)
typedef char CHAR;
#elif (CHAR_MAX == 255)
/* Implies that by default chars are unsigned */
typedef signed char CHAR;
#else
/* No eight bit data types */
#error Warning! Intrinsic data type char is not eight bits
#endif

/* Rest of the file goes here */
#endif

The concept is quite simple. The file types.h includes the ANSI-required file limits.h. It then explicitly tests each of the predefined data types for the smallest type that matches signed and unsigned 1-, 8-, 16-, and 32-bit variables. The result is that my data type UCHAR is guaranteed to be an 8-bit unsigned variable, INT is guaranteed to be a 16-bit signed variable, and so forth. In this manner, the following data types are defined: BOOLEAN, CHAR, UCHAR, INT, UINT, LONG, and ULONG.

Several points are worth making:

  • The definition of the BOOLEAN data type is difficult. Many 8-bit processors directly support single-bit data types, and I wish to take advantage of this if possible. Unfortunately, since ANSI is silent on this topic, it’s necessary to use compiler-specific code
  • Some compilers define a char as an unsigned quantity, such that if a signed 8-bit variable is required, one has to use the unusual declaration signed char
  • Note the use of the #error directive to force a compile error if I can’t achieve my goal of having unambiguous definitions of BOOLEAN, UCHAR, CHAR, UINT, INT, ULONG, and LONG

In all of the following examples, the types BOOLEAN, UCHAR, and so on will be used to specify unambiguously the size of the variable being used.
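To make the pattern concrete, here is a sketch of how the rest of types.h might continue for the 8-bit unsigned and 16-bit types, following the same approach the excerpt uses for CHAR: test the ANSI-mandated constants from limits.h and typedef the first intrinsic type that fits. The actual remainder of the file is elided above, so this is an illustrative guess rather than the author's code.

```c
#include <limits.h>

/* Assign an 8-bit unsigned type to UCHAR */
#if (UCHAR_MAX == 255)
typedef unsigned char UCHAR;
#else
#error Warning! No eight bit unsigned data type
#endif

/* Assign 16-bit types to INT and UINT, using the smallest
 * intrinsic type that matches */
#if (INT_MAX == 32767)
typedef int INT;
typedef unsigned int UINT;
#elif (SHRT_MAX == 32767)
typedef short INT;
typedef unsigned short UINT;
#else
#error Warning! No sixteen bit data type
#endif
```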

Data type selection

There are two basic guidelines for data type selection on 8-bit processors:

  • Use the smallest possible type to get the job done
  • Use an unsigned type whenever possible

The reasons for this are simply that many 8-bit processors have no direct support for manipulating anything more complicated than an unsigned 8-bit value. However, unlike large machines, eight-bitters often provide direct support for manipulation of bits. Thus, the fastest integer types to use on an 8-bit CPU are BOOLEAN and UCHAR. Consider the typical C code:

int is_positive(int a)
{
	return (a >= 0) ? 1 : 0;
}

The better implementation is:

BOOLEAN is_positive(int a)
{
	return (a >= 0) ? TRUE : FALSE;
}

On an 8-bit processor we can get a large performance boost by using the BOOLEAN return type because the compiler need only return a bit (typically via the carry flag), vs. a 16-bit value stored in registers. The code is also more readable.

Let’s take a look at a second example. Consider the following code fragment that is littered throughout most C programs:

int j;

for (j = 0; j < 10; j++)
{

}

This fragment produces horribly inefficient code on an 8051. A better way to code this for 8-bit CPUs is as follows:

UCHAR j;

for (j = 0; j < 10; j++)
{

}

The result is a huge boost in performance because we are now using an 8-bit unsigned variable (that can be manipulated directly) vs. a signed 16-bit quantity that will typically be handled by a library call. Note also that there is generally no penalty for coding this way on most big CPUs (with the exception of some RISC processors). Furthermore, a strong case exists for doing this on all machines. Those of you who know Pascal are aware that when declaring an integer variable, it’s possible, and normally desirable, to specify the allowable range that the integer can take on. For example:

type loopindex = 0..9;
var j: loopindex;

Upon rereading the code later, you’ll have additional information concerning the intended use of the variable. For our classical C code above, the variable int j may take on values of at least –32768 to +32767. For the case in which we have UCHAR j, we inform others that this variable is intended to have strictly positive values over a restricted range. Thus, this simple change manages to combine tighter code with improved maintainability—not a bad combination.

Enumerated types

The use of enumerated data types was a welcome addition to ANSI C. Unfortunately, the ANSI standard calls for the underlying data type of an enum to be an int. Thus, on many compilers, declaration of an enumerated type forces the compiler to generate 16-bit signed code, which, as I’ve mentioned, is extremely inefficient on an 8-bit CPU. This is unfortunate, especially as I have never seen an enumerated type list go over a few dozen elements; one could usually fit easily in a UCHAR. To overcome this limitation, several options exist, none of which is palatable:

  • Check your compiler documentation, which may show you how to specify via a (compiler-specific) command line switch that enumerated types be put into the smallest possible data type
  • Accept the inefficiency as an acceptable trade-off for readability
  • Dispense with enumerated types and resort to lists of manifest constants
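The third option can be sketched as follows. The state names and the next_state() helper are hypothetical, invented purely to illustrate replacing an enum with manifest constants so that the variable can live in an 8-bit type:

```c
typedef unsigned char UCHAR;

/* Instead of: typedef enum {STATE_IDLE, STATE_RUN, STATE_STOP} STATE;
 * use manifest constants, so the state variable can be a UCHAR. */
#define STATE_IDLE 0
#define STATE_RUN  1
#define STATE_STOP 2

static UCHAR state = STATE_IDLE;    /* one byte, not an int-sized enum */

UCHAR current_state(void)
{
	return state;
}

UCHAR next_state(UCHAR s)
{
	/* Trivial cyclic state advance, for illustration only */
	return (UCHAR)((s == STATE_STOP) ? STATE_IDLE : (s + 1));
}
```

The trade-off is exactly the one listed above: we lose the type checking and self-documentation of an enum in exchange for single-byte storage and arithmetic.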

Integer promotion

The integer promotion rules of ANSI C are probably the most heinous crime committed against those of us who labor in the 8-bit world. I have no doubt that the standard is quite detailed in this area. However, the two most important rules in practice are the following:

  • Any expression involving integral types smaller than an int has all of its variables automatically promoted to int
  • Any function call that passes an integral type smaller than an int automatically promotes the variable to an int, if the function is not prototyped

The key word here is automatically. Unless you take explicit steps, the compiler is unlikely to do what you want. Consider the following code fragment:

CHAR a,b,res;

res = a+b;

The compiler will promote a and b to integers, perform a 16-bit addition, and then assign the lower eight bits of the result to res. Several ways around this problem exist. First, many compiler vendors have seen the light and allow you to disable the ANSI automatic integer promotion rules. However, you’re then stuck with compiler-dependent code.

Alternatively, you can resort to very clumsy casting, and hope that the compiler’s optimizer works out what you really want to do. The extent of the casting required seems to vary among compiler vendors. As a result, I tend to go overboard:

res = (CHAR)((CHAR)a + (CHAR)b);

With complex expressions, the result can be hideous.
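To be fair, the promotion sometimes works in your favor: with ints, an intermediate result wider than eight bits survives, whereas pure 8-bit arithmetic would wrap. The hypothetical average() functions below (not from the article) show the two behaviors side by side:

```c
typedef unsigned char UCHAR;

/* With the ANSI rules, a + b is computed as an int, so the
 * intermediate value 300 does not wrap before the division. */
UCHAR average(UCHAR a, UCHAR b)
{
	return (UCHAR)((a + b) / 2);
}

/* Forcing the addition back to eight bits wraps the intermediate:
 * (UCHAR)(200 + 100) is 44, so the "average" becomes 22. */
UCHAR average_8bit(UCHAR a, UCHAR b)
{
	return (UCHAR)((UCHAR)(a + b) / 2);
}
```

The point is not that one is right and the other wrong, but that you must know which arithmetic the compiler is actually performing.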

More integer promotion rules

A third integer promotion rule that is often overlooked concerns expressions that contain both signed and unsigned integers. In this case, signed integers are promoted to unsigned integers. Although this makes sense, it can present problems in our 8-bit environment, where the unsigned integer rules. For example:

void demo(void)
{
	UINT a = 6;
	INT  b = -20;

	(a + b > 6) ? puts("More than 6") : puts("Less than or equal to 6");
}

If you run this program, you may be surprised to find that the output is “More than 6.” This problem is a very subtle one, and is even more difficult to detect when you use enumerated data types or other defined data types that evaluate to a signed integer data type. Using the result of a function call in an expression is also problematic.

The good news is that in the embedded world, the percentage of integral data types that must be signed is quite low, thus the potential number of expressions in which mixed types occur is also low. The time to be cautious is when reusing code that was written by someone who didn’t believe in unsigned data types.
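Here is a minimal sketch of the surprise, using plain unsigned int/int in place of the article's UINT/INT typedefs. The cast in signed_compare() is one hedge against it, usable when you know the unsigned value fits in a signed int:

```c
/* b is converted to unsigned before the addition, so a + b becomes a
 * huge unsigned value and the comparison is "surprisingly" true. */
int mixed_compare(unsigned int a, int b)
{
	return (a + b > 6u) ? 1 : 0;
}

/* Casting the unsigned operand forces a signed addition:
 * 6 + (-20) == -14, which is not greater than 6. */
int signed_compare(unsigned int a, int b)
{
	return ((int)a + b > 6) ? 1 : 0;
}
```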

Floating-point types

Floating-point arithmetic is required in many applications. However, since we’re normally dealing with real-world data whose representation rarely goes beyond 16 bits (a 20-bit A/D converter on an 8-bit machine is rare), the requirements for double-precision arithmetic are tenuous, except in the strangest of circumstances.

Again, the ANSI people have handicapped us by requiring that any floating-point expression be promoted to double before execution. Fortunately, a lot of compiler vendors have done the sensible thing and simply defined doubles to be the same as floats, so that this promotion is benign. Be warned, however, that many reputable vendors have made a virtue out of providing a genuine double-precision data type. The result is that unless you take great care, you may end up computing values with ridiculous levels of precision, and paying the price computationally. If you’re considering a compiler that offers double-precision math, study the documentation carefully to ensure that there is some way of disabling the automatic promotion of float to double. If there isn’t, look for another compiler.

While we’re on this topic, I’d like to air a pet peeve of mine. Years ago, before decent compiler support for 8-bit processors was available, I would code in assembly language using a bespoke floating-point library. This library was always implemented using 24-bit floats, with a long float consuming four bytes. I found that this was more than adequate for the real world. I’ve yet to find a compiler vendor that offers this as an option. My guess is that the marketing people insisted on a true ANSI floating-point library, the real world be damned. As a result, I can calculate hyperbolic sines on my 68HC11, but I can’t get the performance boost that comes from using just a 24-bit float.

Having moaned about the ANSI-induced problems, let’s turn to an area in which ANSI has helped a lot. I’m referring to the keywords const and volatile, which, together with static, allow the production of better code.

C’s static keyword

The keywords static, volatile, and const together allow one to write not only better code (in the sense of information hiding and so forth) but also tighter code.

Static variables

When applied to variables, static has two primary functions. The first and most common use is to declare a variable that doesn’t disappear between successive invocations of a function. For example:

void func(void)
{
	static UCHAR state = 0;

	switch (state)
	{
		…
	}
}

In this case, the use of static is mandatory for the code to work.

The second use of static is to limit the scope of a variable. A variable that is declared static at the module level is accessible by all functions in the module, but by no one else. This is important because it allows us to gain all the performance benefits of global variables, while severely limiting the well-known problems of globals. As a result, if I have a data structure which must be accessed frequently by a number of functions, I’ll put all of the functions into the same module and declare the structure static. Then all of the functions that need to can access the data without going through the overhead of an access function, while at the same time, code that has no business knowing about the data structure is prevented from accessing it. This technique is an admission that directly accessible variables are essential to gaining adequate performance on small machines.

A few other potential benefits can result from declaring module level variables static (as opposed to leaving them global). Static variables, by definition, may only be accessed by a specific set of functions. Consequently, the compiler and linker are able to make sensible choices concerning the placement of the variables in memory. For instance, with static variables, the compiler/linker may choose to place all of the static variables in a module in contiguous locations, thus increasing the chances of various optimizations, such as pointers being simply incremented or decremented instead of being reloaded. In contrast, global variables are often placed in memory locations that are designed to optimize the compiler’s hashing algorithms, thus eliminating potential optimizations.
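As a sketch of the technique (the names and the eight-channel A/D buffer are invented for illustration), the data structure is declared static at module level, and only the functions in this module touch it directly:

```c
typedef unsigned char UCHAR;

/* Visible to every function in this module, and to nothing else */
static UCHAR adc_readings[8];

void store_reading(UCHAR channel, UCHAR value)
{
	adc_readings[channel & 7u] = value;  /* direct access, no accessor overhead */
}

UCHAR get_reading(UCHAR channel)
{
	return adc_readings[channel & 7u];
}
```

Any function outside this module that tries to name adc_readings simply fails to link, which is exactly the access control we want.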

Static functions

A static function is only callable by other functions within its module. While the use of static functions is good structured programming practice, you may also be surprised to learn that static functions can result in smaller and/or faster code. This is possible because the compiler knows at compile time exactly what functions can call a given static function. Therefore, the relative memory locations of functions can be adjusted such that the static functions may be called using a short version of the call or jump instruction. For instance, the 8051 supports both an ACALL and an LCALL op code. ACALL is a two-byte instruction, and is limited to a 2K address block. LCALL is a three-byte instruction that can access the full 8051 address space. Thus, use of static functions gives the compiler the opportunity to use an ACALL where otherwise it might use an LCALL.

The potential improvements are even greater when the compiler is smart enough to replace calls with jumps. For example:

static void fb(void);

void fa(void)
{
	…
	fb();
}

static void fb(void)
{
	…
}

In this case, because function fb() is called on the last line of function fa(), the compiler can replace the call with a jump. Since fb() is static, and the compiler knows its exact distance from fa(), the compiler can use the shortest jump instruction. For the Dallas DS80C320, this is an SJMP instruction (two bytes, three cycles) vs. an LCALL (three bytes, four cycles).

On a recent project, rigorous application of the static modifier to functions resulted in about a 1% reduction in code size. When your ROM is 95% full, a 1% reduction is most welcome!

A final point concerning static variables and debugging: for reasons that I do not fully understand, with many in-circuit emulators that support source-level debug, static variables and/or automatic variables in static functions are not always accessible symbolically. As a result, I tend to use the following construct in my project-wide include file:

#ifndef NDEBUG
#define STATIC
#else
#define STATIC static
#endif

I then use STATIC instead of static to define static variables, so that while in debug mode, I can guarantee symbolic access to the variables.

C’s volatile keyword

A volatile variable is one whose value may be changed outside the normal program flow. In embedded systems, the two main ways this can happen are via an interrupt service routine, or as a consequence of hardware action (for instance, a serial port status register updates as a result of a character being received via the serial port). Most programmers are aware that the compiler will not attempt to optimize a volatile register, but rather will reload it every time. The case to watch out for is when compiler vendors offer extensions for accessing absolute memory locations, such as hardware registers. Sometimes these extensions have either an implicit or an explicit declaration of volatility and sometimes they don’t. The point is to fully understand what the compiler is doing. If you do not, you may end up accessing a volatile variable when you don’t want to, and vice versa. For example, the popular 8051 compiler from Keil offers two ways of accessing a specific memory location. The first uses a language extension, _at_, to specify where a variable should be located. The second method uses a macro such as XBYTE[] to dereference a pointer. The “volatility” of these two is different. For example:

UCHAR status_register _at_ 0xE000;

This method is simply a much more convenient way of accessing a specific memory location. However, volatile is not implied here. Thus, the following code is unlikely to work:

while (status_register); /* Wait for status register to clear */

Instead, one needs to use the following declaration:

volatile UCHAR status_register _at_ 0xE000;

The second method that Keil offers is the use of macros, such as the XBYTE macro, as in:

status_register = XBYTE[0xE000];

Here, however, examination of the XBYTE macro shows that volatile is assumed:

#define XBYTE ((unsigned char volatile xdata*) 0)

(The xdata is a memory space qualifier, which isn’t relevant to the discussion here and may be ignored.)

Thus, the code:

while (status_register); /* Wait for status register to clear */

will work as you would expect in this case. However, in the case in which you wish to access a variable at a specific location that is not volatile, the use of the XBYTE macro is potentially inefficient.

C’s const keyword

The keyword const, which is by the way the most badly named keyword in the C language, does not mean “constant”! Rather, it means “read only”. In embedded systems, there is a huge difference, which will become clear.

Many texts recommend that instead of using manifest constants, one should use a const variable. For instance:

const UCHAR nos_atod_channels = 8;

instead of

#define NOS_ATOD_CHANNELS 8

The rationale for this approach is that inside a debugger, you can examine a const variable (since it should appear in the symbol table), whereas a manifest constant isn’t accessible. Unfortunately, on many eight-bit machines you’ll pay a significant price for this benefit. The two primary costs are:

  • The compiler creates a genuine variable in RAM to hold the variable. On RAM-limited systems, this can be a significant penalty
  • Some compilers, recognizing that the variable is const, will store the variable in ROM. However, the variable is still treated as a variable and is accessed as such, typically using some form of indexed addressing. Compared to immediate addressing, this method is normally much slower

I recommend that you eschew the use of const variables on 8-bit micros, except in the following circumstances.

const function parameters

Declaring function parameters const whenever possible not only makes for better, safer code, but also has the potential for generating tighter code. This is best illustrated by an example:

void output_string(CHAR *cp)
{
	while (*cp)
		putchar(*cp++);
}

void demo(void)
{
	char *str = "Hello, world";

	output_string(str);
	if ('H' == str[0])
	{
		some_function();
	}
}

In this case, there is no guarantee that output_string() will not modify our original string, str. As a result, the compiler is forced to perform the test in demo(). If instead, output_string is correctly declared as follows:

void output_string(const char *cp)
{
	while (*cp)
		putchar(*cp++);
}

then the compiler knows that output_string() cannot modify the original string str, and as a result it can dispense with the test and invoke some_function() unconditionally. Thus, I strongly recommend liberal use of the const modifier on function parameters.

const volatile variables

We now come to an esoteric topic. Can a variable be both const and volatile, and if so, what does that mean and how might you use it? The answer is, of course, yes (why else would it have been asked?), and it should be used on any memory location that can change unexpectedly (hence the need for the volatile qualifier) and that is read-only (hence the const). The most obvious example of this is a hardware status register. Thus, returning to the status_register example above, a better declaration for our status register is:

const volatile UCHAR status_register _at_ 0xE000;
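The _at_ extension is Keil-specific, but the same read-only-yet-volatile idea can be sketched portably with a pointer to const volatile. Here an ordinary variable stands in for the hardware address; on a real target you would write something like (*(const volatile UCHAR *)0xE000) instead:

```c
typedef unsigned char UCHAR;

/* Stand-in for the hardware status register at 0xE000 */
static UCHAR fake_hw_register = 0x80;

/* const volatile: every read really goes to the "hardware", and any
 * attempted write is rejected by the compiler. */
static const volatile UCHAR *status_register = &fake_hw_register;

UCHAR read_status(void)
{
	return *status_register;
	/* *status_register = 0;  -- would not compile: read-only location */
}
```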

Typed data pointers

We now come to another area in which a major trade-off exists between writing portable code and writing efficient code—namely, the use of typed data pointers, which are pointers that are constrained in some way with respect to the type and/or size of memory that they can access. For example, those of you who have programmed the x86 architecture are undoubtedly familiar with the concept of using the __near and __far modifiers on pointers. These are examples of typed data pointers. Often the modifier is implied, based on the memory model being used. Sometimes the modifier is mandatory, such as in the prototype of an interrupt handler:

void __interrupt __far cntr_int7();

The requirement for the near and far modifiers comes about from the segmented x86 architecture. In the embedded eight-bit world, the situation is often far more complex. Microcontrollers typically require typed data pointers because they offer a number of disparate memory spaces, each of which may require the use of different addressing modes. The worst offender is the 8051 family, with at least five different memory spaces. However, even the 68HC11 has at least two different memory spaces (zero page and everything else), together with the EEPROM, pointers to which typically require an address space modifier.

The most obvious characteristic of typed data pointers is their inherent lack of portability. They also tend to lead to some horrific data declarations. For example, consider the following declaration from the Whitesmiths 68HC11 compiler:

@dir INT * @dir zpage_ptr_to_zero_page;

This declares a pointer to an INT. However, both the pointer and its object reside in the zero page (as indicated by the Whitesmith extension, @dir). If you were to add a const qualifier or two, such as:

@dir const INT * @dir const constant_zpage_ptr_to_constant_zero_page_data;

then the declarations can quickly become quite intimidating. Consequently, you may be tempted to simply ignore the use of typed pointers. Indeed, coding an application on a 68HC11 without ever using a typed data pointer is quite possible. However, by doing so the application’s performance will take an enormous hit because the zero page offers considerably faster access than the rest of memory.

This area is so critical to performance that all hope of portability is lost. For example, consider two leading 8051 compiler vendors, Keil and Tasking. Keil supports a three-byte generic pointer that may be used to point to any of the 8051 address spaces, together with typed data pointers that are strictly limited to a specific data space. Keil strongly recommends the use of typed data pointers, but doesn’t require it. By contrast, Tasking takes the attitude that generic pointers are so horribly inefficient that it mandates the use of typed pointers (an argument to which I am extremely sympathetic).

To get a feel for the magnitude of the difference, consider the following code, intended for use on an 8051:

void main(void)
{
    UCHAR array[16];          /* array is in the data space by default */
    UCHAR data * ptr = array; /* Note use of data qualifier */
    UCHAR i;

    for (i = 0; i < 16; i++)
    {
        *ptr++ = i;
    }
}

Using a generic pointer, this code requires 571 cycles and 88 bytes. Using a typed data pointer, it needs just 196 cycles and 52 bytes. (The memory sizes include the startup code, and the execution times are just those for executing main()).

With these sorts of performance increases, I recommend always using explicitly typed pointers, and paying the price in loss of portability and readability.

Implementing an assert() macro

The assert() macro is commonly used on PC platforms, but almost never used on small embedded systems. There are several reasons for this:

  • Many reputable compiler vendors don’t bother to supply an assert macro
  • Vendors that do supply the macro often provide it in an almost useless form
  • Most embedded systems don’t support a stderr to which the error may be printed

These limitations notwithstanding, it’s possible to gain the benefits of the assert() macro on even the smallest systems if you’re prepared to take a pragmatic approach.

Before I discuss possible implementations, mentioning why assert() is important (even in embedded systems) is worthwhile. Over the years, I’ve built up a library of drivers to various pieces of hardware such as LCDs, ADCs, and so on. These drivers typically require various parameters to be passed to them. For example, an LCD driver that displays a text string on a panel would expect the row, the column, a pointer to the string, and perhaps an attribute parameter. When writing the driver, it is obviously important that the passed parameters are correct. One way of ensuring this is to include code such as this:

void Lcd_Write_Str(UCHAR row, UCHAR column, CHAR *str, UCHAR attr)
{
    row &= MAX_ROW;
    column &= MAX_COLUMN;
    attr &= ALLOWABLE_ATTRIBUTES;

    if (NULL == str)
    {
        return;
    }

    /* The real work of the driver goes here */
}

This code clips the parameters to allowable ranges, checks for a null pointer assignment, and so on. However, on a functioning system, executing this code every time the driver is invoked is extremely costly. But if the code is discarded, reuse of the driver in another project becomes a lot more difficult because errors in the driver invocation are tougher to detect.

The preferred solution is the liberal use of an assert macro. For example:

void Lcd_Write_Str(UCHAR row, UCHAR column, CHAR *str, UCHAR attr)
{
    assert(row < MAX_ROW);
    assert(column < MAX_COLUMN);
    assert(attr < ALLOWABLE_ATTRIBUTES);
    assert(str != NULL);

    /* The real work of the driver goes here */
}

This is a practical approach if you’re prepared to redefine the assert macro. The level of resources in your system will control the sophistication of this macro, as shown in the examples below.

Assert 1

This example assumes that you have no spare RAM, no spare port pins, and virtually no ROM to spare. In this case, assert.h becomes:

#ifndef assert_h
#define assert_h

#ifndef NDEBUG
#define assert(expr) \
    if (!(expr))     \
    {                \
        while (1);   \
    }
#else
#define assert(expr)
#endif

#endif

Here, if the assertion fails, we simply enter an infinite loop. The only utility of this case is that, assuming you're running a debug session on an ICE, you will eventually notice that the system is no longer running, in which case breaking the emulator and examining the program counter will give you a good indication of which assertion failed. As a possible refinement, if your system is interrupt-driven, it may be necessary to insert a "disable all interrupts" command prior to the while(1), just to ensure that the system's failure is obvious.

Assert 2

This case is the same as assert #1, except that in example #2 you have a spare port pin on the microcontroller to which an error LED is attached. This LED is lit if an error occurs, thus giving you instant feedback that an assertion has failed. Assert.h now becomes:

#ifndef assert_h
#define assert_h

#define ERROR_LED_ON()   /* Put expression for turning LED on here */
#define INTERRUPTS_OFF() /* Put expression for interrupts off here */

#ifndef NDEBUG
#define assert(expr)      \
    if (!(expr))          \
    {                     \
        ERROR_LED_ON();   \
        INTERRUPTS_OFF(); \
        while (1);        \
    }
#else
#define assert(expr)
#endif

#endif

Assert 3

This example builds on assert #2. But in this case, we have sufficient RAM to define an error message buffer, into which the assert macro can sprintf() the exact failure. While debugging on an ICE, if a permanent watch point is associated with this buffer, then breaking the ICE will give you instant information on where the failure occurred. Assert.h for this case becomes:

#ifndef assert_h
#define assert_h

#define ERROR_LED_ON()   /* Put expression for turning LED on here */
#define INTERRUPTS_OFF() /* Put expression for interrupts off here */

#ifndef NDEBUG
extern char error_buf[80];

#define assert(expr)                               \
    if (!(expr))                                   \
    {                                              \
        ERROR_LED_ON();                            \
        INTERRUPTS_OFF();                          \
        sprintf(error_buf,                         \
                "Assert failed: " #expr            \
                " (file %s line %d)\n",            \
                __FILE__, (int)__LINE__);          \
        while (1);                                 \
    }
#else
#define assert(expr)
#endif

#endif

Obviously, this requires that you define error_buf[80] somewhere else in your code.

I don’t expect that these three examples will cover everyone’s needs. Rather, I hope they give you some ideas on how to create your own assert macros to get the maximum debugging information within the constraints of your embedded system.

Heretical comments

So far, all of my suggestions have been about actively doing things to improve the code quality. Now, let’s address those areas of the C language that should be avoided, except in highly unusual circumstances. For some of you, the suggestions that follow will border on heresy.

Recursion

Recursion is a wonderful technique that solves certain problems in an elegant manner. It has no place on an eight-bit microcontroller. The reasons for this are quite simple:

  • Recursion relies on a stack-based approach to passing variables. Many small machines have no hardware support for a stack. Consequently, either the compiler will simply refuse to support reentrancy, or else it will resort to a software stack in order to solve the problem, resulting in dreadful code quality
  • Recursion relies on a “virtual stack” that purportedly has no real memory constraints. How many small machines can realistically support virtual memory?

If you find yourself using recursion on a small machine, I respectfully suggest that you are either (a) doing something really weird, or (b) you don’t understand the sum total of the constraints with which you’re working. If it is the former, then please contact me, as I will be fascinated to see what you are doing.

Variable length argument lists

You should avoid variable length argument lists because they too rely on a stack-based approach to passing variables. What about sprintf() and its cousins, you all cry? Well, if possible, you should consider avoiding the use of these library functions. The reasons for this are as follows:

  • If you use sprintf(), take a look at the linker output and see how much library code it pulls in. On one of my compilers, sprintf(), without floating-point support, consumes about 1K. If you’re using a masked micro with a code space of 8K, this penalty is huge
  • On some compilers, use of sprintf() implies the use of a floating-point library, even if you never use the library. Consequently, the code penalty quickly becomes enormous
  • If the compiler doesn’t support a stack, but rather passes variables in registers or fixed memory locations, then use of variable length argument functions forces the compiler to reserve a healthy block of memory simply to provide space for variables that you may decide to use. For instance, if your compiler vendor assumes that the maximum number of arguments you can pass is 10, then the compiler will reserve 40 bytes (assuming four bytes per longest intrinsic data type)

Fortunately, many vendors are aware of these issues and have taken steps to mitigate the effects of using sprintf(). Notwithstanding these actions, taking a close look at your code is still worthwhile. For instance, writing my own wrstr() and wrint() functions (to output strings and ints respectively) generated half the code of using sprintf(). Thus, if all you need to format are strings and base 10 integers, then the roll-your-own approach is beneficial (while still being portable).
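The roll-your-own approach might look something like the following sketch. The names wrstr() and wrint() come from the article, but this implementation is mine, and int_to_str() is a hypothetical helper added so the conversion can be reused and tested separately:

```c
#include <stdio.h>

void wrstr(const char *s)
{
    while (*s)
    {
        putchar(*s++);
    }
}

/* Convert value to decimal text in buf (12 bytes is enough for a
 * 32-bit int, including sign and terminator); returns buf. */
char *int_to_str(int value, char *buf)
{
    char tmp[12];
    int  i = 0;
    int  j = 0;
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;

    if (value < 0)
    {
        buf[j++] = '-';
    }
    do
    {
        tmp[i++] = (char)('0' + (u % 10));  /* digits, least significant first */
        u /= 10;
    } while (u != 0);
    while (i > 0)
    {
        buf[j++] = tmp[--i];                /* reverse into place */
    }
    buf[j] = '\0';
    return buf;
}

void wrint(int value)
{
    char buf[12];
    wrstr(int_to_str(value, buf));
}
```

Because only base 10 integer conversion is supported, the linker pulls in nothing beyond putchar().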

Dynamic memory allocation

When you’re programming an application for a PC, using dynamic memory allocation makes sense. The characteristics of PCs that permit and/or require dynamic memory allocation include:

  • When writing an application, you may not know how much memory will be available. Dynamic allocation provides a way of gracefully handling this problem
  • The PC has an operating system, which provides memory allocation services
  • The PC has a user interface, such that if an application runs out of memory, it can at least tell the user and attempt a relatively graceful shutdown

In contrast, small embedded systems typically have none of these characteristics. Therefore, I think that the use of dynamic memory allocation on these targets is silly. First, the amount of memory available is fixed, and is typically known at design time. Thus static allocation of all the required and/or available memory may be done at compile time.

Second, the execution time overhead of malloc(), free(), and so on is not only quite high, but also variable, depending on the degree of memory fragmentation.

Third, use of malloc(), free(), and so on consumes valuable EPROM space. And lastly, dynamic memory allocation is fraught with danger (witness the series from P.J. Plauger on garbage collection in the January 1998, March 1998, and April 1998 issues of Embedded Systems Programming).

Consequently, I strongly recommend that you not use dynamic memory allocation on small systems.
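If a system genuinely needs run-time allocation, a fixed-block pool sidesteps most of these objections: all storage is reserved statically at compile time, and allocation and release are constant-time with no fragmentation. The following is a minimal sketch; the names and sizes are illustrative, not from the article:

```c
#include <stddef.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 8

/* Each free block doubles as a link in the free list. */
static union block
{
    union block  *next;
    unsigned char payload[BLOCK_SIZE];
} pool[NUM_BLOCKS];

static union block *free_list;

void pool_init(void)
{
    int i;

    /* Chain every block onto the free list. */
    free_list = &pool[0];
    for (i = 0; i < NUM_BLOCKS - 1; i++)
    {
        pool[i].next = &pool[i + 1];
    }
    pool[NUM_BLOCKS - 1].next = NULL;
}

void *pool_alloc(void)
{
    union block *b = free_list;

    if (b != NULL)
    {
        free_list = b->next;   /* pop the head of the free list */
    }
    return b;
}

void pool_free(void *p)
{
    union block *b = (union block *)p;

    b->next = free_list;       /* push back onto the free list */
    free_list = b;
}
```

Because every block is the same size, releasing a block can never fragment the pool, and the worst-case timing is a handful of instructions.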

Final thoughts

I have attempted to illustrate how judicious use of both ANSI constructs and compiler-specific constructs can help generate tighter code on small microcontrollers. Often, though, these improvements come at the expense of portability and/or readability. If you are in the fortunate position of being able to use less efficient code, then you can ignore these suggestions. If, however, you are severely resource-constrained, then give a few of these techniques a try. I think you’ll be pleasantly surprised.



This article was published in the November 1998 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Efficient C Code for Eight-Bit MCUs” Embedded Systems Programming, November 1998.

Use Strings to Internationalize C Programs

December 17th, 2009 by Nigel Jones

Also available in PDF version.

Products destined for use in multiple countries often require user interfaces that support several human languages. Sloppy string management in your programs could result in unintelligible babble.

A decade or two ago, most embedded systems had an extremely limited user interface. In most cases, the interface was either non-existent, or it consisted of a few LEDs and the odd jumper or push button. As the cost of display technology has plummeted, alphanumeric user interfaces have become increasingly common. Simultaneously, a variety of technological, economic, and political pressures have brought about the need for products to be sold in many countries. As a result, the need for an embedded system to support multiple languages has become apparent.

This problem is even more acute for the non-embedded computer world. Part of the solution for that marketplace was the introduction of Unicode, wide character types, and so on. Unfortunately, these techniques require storage capabilities and display resolutions rarely found in embedded systems. Instead, most embedded systems with displays typically use a low-resolution LCD or vacuum fluorescent display (VFD) with a built-in character generator. It’s this type of display that I’ll be concentrating on.

Lessons learned

A few years ago, I worked on an industrial measurement system. The product was to be sold in both North America and Europe. Consequently, one of the essential design requirements was to support multiple European languages. The product in question has a 240 x 128-pixel LCD panel with a built-in character generator. The character generator contains a subset of the ASCII 256-character set, including “specialized” characters such as ç, ü, and é.

Anyway, as I pondered possible approaches to the design, I looked back at my previous attempts to solve this problem. They weren’t pretty! At the risk of ruining what little reputation I may have gained, I think it’s instructive to look at these previous attempts.

Lesson 1

The first product that I designed that had an alphanumeric display was implemented with the typical arrogance of a native English speaker. Namely, it never occurred to me to even consider the rest of the world. As a result, my code was littered with text strings. That is, you’d see the assembly language (yes, it was that long ago) equivalent of:

WrStr("Jet Black");

To make matters worse, this construct would be found in many functions, split between several files.

The foolishness of this approach struck home when I was asked to produce a version of the product for the German market. I quickly realized that I had to edit source code files to implement the translation. This had several ramifications:

  • A separate make environment was needed for each language.
  • Every time a change had to be made to the code, the same modification had to be made to each language’s version of the file. In short, it was a maintenance nightmare.
  • The source code had to be given to the translator. I think you can imagine the problems this caused.

Lesson 2

The next product was a big improvement, because I did what should have been done in the first place, which was to place all the text strings into one file. That is, there were alternative string files called english.c, german.c, and so on for additional languages, each containing all of the strings for a particular language. Each string file contained a single array definition that looked something like this:

char const * const strings[] =
{
    "Jet Black",
    "Red",
    ......
};

Thus, to display a particular string, my code looked like this:

WrStr(strings[0]);

Now all I had to do to enable support for a new language was to hand a copy of english.c to the translator, and have him produce the equivalent strings for the new language. Unfortunately, it didn’t work out that way. It turns out that the English language is extraordinarily terse when compared with certain other languages. For example, the German equivalent of Jet Black is Rabenschwarz.

Working from english.c, the translator assumed there were just nine characters into which to fit the translation. Thus, the translator was forced into abbreviating the German. However, in many cases, there was actually more space available on the display such that the abbreviation looked awkward in the product. The only way to find out was to execute the code and look at the results. This is a lot harder than it sounds, because many strings are only displayed in exceptional conditions. Thus one has to generate all the external stimuli such that those exceptions occur. In short, the translation effort remained a Herculean task.

Once it was complete, I still wasn’t out of the woods, because the different length strings caused the memory allocation to change significantly. Although it did not happen, theoretically I could have run out of memory.

Lesson 3

By the time I was working on my third product requiring multiple language support, I was a lot wiser and memory capacities had increased dramatically. As a result, I now ensured that every string in my string file was padded with spaces out to the maximum allowed field width. Furthermore, I had also learned the intricacies of conditional compilation and passing command line arguments to make, such that I included every language into the same text file. Thus, strings.c looked something like Listing 1.

#if defined(ENGLISH)
    char const * const strings[] =
    {
        "Jet Black ",
        ...
        "Good Bye ",
        ...
        "Evening"
    };
#elif defined(GERMAN)
    char const * const strings[] =
    {
        "Rabenschwarz",
        ...
        "Auf Wiedersehen ",
        ...
        "Abend "
    };
#endif

Listing 1. Multiple languages in a single C module

This third solution worked well, except that at the same time, the size of the alphanumeric displays and the complexity of the user interface had increased. While my first product had just 30 or 40 strings, this latest product had around 500. Thus, the bulk of the user interface code ended up looking like this:

WrStr(strings[27]);
WrStr(strings[47]);
WrStr(strings[108]);

This code doesn’t make clear what string I was actually displaying. So I was beginning to long for the original:

WrStr("Jet Black");

I ran into another major problem at this time. As the product evolved, so did the strings. I found myself wanting to delete certain strings that were no longer needed. But I couldn’t do that without destroying my indexing scheme. Thus, I was forced into changing unwanted text into null strings, such that strings.c now contained sequences like this:

char const * const strings[] = {
    "Jet Black  ",
    ...
    "", /* deleted */
    ...
    "Evening"
};

Although this saved the space consumed by the string, I was still wasting ROM on the pointer. In addition, it looked ugly and had “kludge” written all over it. I also ran into a more serious problem. From a maintenance perspective, it would be very useful if related strings were in contiguous locations. Thus if a particular field could contain “Jet Black,” “Red,” or “Pale Grey,” I would place these together in the string file. However, two years later, when marketing asked for “Yellow” to be added to the list of selections, I was forced to place “Yellow” at the end of the string list, well away from its partners. This pained me greatly.

There was one final problem with this implementation (and all the others) and that was the fact that the strings array was a global. I’ve become quite paranoid about globals in the last few years, such that when I look back at the code now, I have to confess that I cringe.

I also discovered a neat feature that was missing from all of the previous disasters. A few years ago, I saw a product demonstration in which the language was changeable on the fly. That is, without changing the operating mode or cycling power, the entire user interface could be changed to another language. The demonstration was incredibly slick. (Consider the following scenario. Your product is being introduced at a trade show. Some native French speakers come to the booth to look at the product. With the push of a button, you switch the user interface to French. You’re already halfway to a sale.)

In addition to its value as a sales tool, the ability to change language on the fly is also a valuable tool for validating a new translation. It’s particularly useful when the correct translation of a word depends heavily upon its context. When working through the string file, the translator can’t see the context, so having the ability to operate the product and switch back and forth between languages is invaluable.

An international approach

Being quite a few years wiser than when Lesson #3 was learned, I was determined to come up with a scheme that would address as many of the aforementioned problems as possible and add the ability to switch languages on the fly. What follows is my solution. It's not perfect, but it is a lot better than any of the previous attempts.

The first decision I made was to separate the string retrieval mechanism and the string storage technique. There would be no more global strings array. Instead, strings would be accessed through a function call. This access function would take a string number as an argument and return a pointer to the desired string. Its provisional prototype is:

char const * strGet(int string_no);

This abstraction gave me freedom to implement the data storage part of the strings in whatever manner I saw fit. In particular, I realized that implementing the storage as an array of data structures would have considerable benefit. My data structure looked like this:

#define N_LANG 4

typedef struct
{
    /*
     * Maximum length
     */
    int const len; 

    /*
     * Array of pointers to language-specific string
     */
    char const * const text[N_LANG]; 

} STRING;

This arrangement offered some serious benefits. First, the maximum allowed string length for the field is stored with the text. Second, the original string and all of the various translations of it are together in one place. This makes life a lot easier for the translator. The downside is, of course, that this uses a lot more ROM than a preprocessor-based selection. In my case, I had the ROM to spare. With this data structure in hand, the strings array now looked like Listing 2.

static const STRING strings[] =
{
    {
        15,
        {
            "Jet Black ", 		/* English */
            "Rabenschwarz ",    /* German */
            ...
        }
    },
    {
        15,
        {
            "Red ",			/* English */
            "Rot ",			/* German */
            ...
        }
    },
    ...
};

Listing 2. A better string storage structure

The access function also had another valuable property. When the application requested a string, the access function could interrogate the relevant database to find out the current language, and return the requisite pointer. Voila! Language-independent code. The access function looks something like this:

char const * strGet(int str_no)
{
    return strings[str_no].text[GetLanguage()];
}

Further improvements

The previous approach certainly solved a few of my problems. However, it did nothing for the problems of indecipherable string numbers or adding and deleting strings in the middle of the array. I needed a means of referring to the strings in a meaningful manner, together with a way of automatically re-indexing the string table. My solution was to use an enumerated type. The members of the enumeration are given the same name as the strings to which they refer. An example should help clarify this.

Assume the first four strings that appear in the strings array are “Jet Black,” “Red,” “Pale Grey,” and “Yellow.” To display “Red,” I would have to call:

WrStr(strGet(1));

Instead, I define an enumerated list as follows:

typedef enum { JET_BLACK, RED, PALE_GREY, YELLOW, ... } STR;

I now change the prototype of strGet() to:

char const * strGet(STR str_no);

Thus, to display the string “Red,” the code becomes:

WrStr(strGet(RED));

Furthermore, by defining a macro, Wrstr(X) as follows,

#define Wrstr(X) WrStr(strGet((X)))

we can write:

Wrstr(RED);

This is just as readable as the original WrStr("Red"), but without any of the aforementioned problems. Furthermore, this technique allows one to insert or delete strings at will. For instance, I could insert "Pink" before "Red" in the strings array, do the same in the enumeration list, and recompile; none of the code would be broken.

This was a major step forward, because I now had a system that met most of my goals:

  • No global data
  • Easy to add additional languages
  • Language selection on the fly
  • Code is meaningful
  • Allows strings to be inserted and deleted at will

Gotchas of enumerated types

However, a couple of “gotchas” arise when using enumerated types in this way. The first, and most important, is portability. ANSI C requires only that the first 31 characters of an identifier be significant. If you can guarantee that all of your enumeration names are shorter than this, there is no problem. If you cannot make that guarantee, these are some of your options:

  • Switch to a compiler that allows unlimited identifier lengths. Many compilers do have this feature.
  • Ensure that all strings are unique within the first 31 characters. Note that if they aren’t, the compiler should issue a re-declaration warning.

The next issue to watch out for occurs when you have a large number of strings. ANSI allows the compiler writer to implement enumerated types using an implementation-defined integer type. Thus, technically speaking, a compiler could limit the number of items in an enumeration to 127 (the largest positive number that fits in a signed 8-bit integer). Thus, if you have rigid portability constraints, this technique may be problematic. However, practically speaking, most compilers appear to implement enumerations either as an int, or as the smallest integral type that can accommodate the enumerated list.

The third problem I ran into concerns the limited number of legal characters that can make up an identifier (that is, a-z, A-Z, 0-9, and _). For instance, it is impossible to exactly reproduce the string “3 Score & 10!”. In situations like this, I used _ wherever I couldn’t make the exact substitution. Thus, the enumerated value for “3 Score & 10!” would be _3_SCORE___10_, or possibly _3_SCORE_AND_10_. This isn’t perfect, but it’s still better than a meaningless identifier such as STRING_49.

The final issue was the absolute necessity of keeping the string file and the enumerated list synchronized. This proved to be quite difficult. To aid the process, I modified the string table slightly to include the enumerated type name in the comment field. Next, I ensured that the last entry in the enumerated list was always LAST_STR. This allowed the string array to be changed from an incomplete declaration to a complete declaration, so that the compiler will complain if the string array contains more initializers than the enumerated list has elements. This proved to be valuable in keeping the two files synchronized.
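A complete declaration only catches the case of too many initializers; C silently zero-fills an initializer list that is too short. If instead the array is left incompletely declared, a classic compile-time size check catches a mismatch in either direction. A sketch with an illustrative four-entry table of plain strings:

```c
typedef enum { JET_BLACK, RED, PALE_GREY, YELLOW, LAST_STR } STR;

static const char *const strings[] =
{
    "Jet Black",
    "Red",
    "Pale Grey",
    "Yellow"
};

/* This typedef has a negative array size (a compile error) whenever the
 * number of initializers above disagrees with LAST_STR. */
typedef char strings_size_check[
    (sizeof strings / sizeof strings[0] == LAST_STR) ? 1 : -1];
```

Adding or removing an entry in only one of the two lists now fails to compile, rather than silently shifting or zero-filling the table.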

The winning design

The final enumerated list and string table declarations are as shown in Listing 3.

typedef enum
{
    JET_BLACK, RED, PALE_GREY, YELLOW, ..., LAST_STR
} STR;

static const STRING strings[LAST_STR] =
{
    { /* JET_BLACK */
        15,
        {
            "Jet Black ",		/* English */
            "Rabenschwarz ",	/* German */
            ...
        }
    },
    { /* RED */
        15,
        {
            "Red ",			/* English */
            "Rot ",			/* German */
            ...
        }
    },
    ...
};

Listing 3: Final string storage structure

I did all of this manually, but you could certainly develop a script to automate the process of creating the enumerated list in the header file (from the contents of the string file).

Having gone through this exercise, I realized that I could make a bit more use of enumerated lists to make my code more readable and more maintainable. When the string data structure was introduced, a manifest constant N_LANG was used to specify the number of languages supported. A better approach is as follows:

typedef enum
    { ENGLISH, FRENCH, GERMAN, SPANISH, LAST_LANGUAGE }
    LANGUAGE;

Now, the STRING data structure is defined as:

typedef struct
{
    /*
     * Maximum length
     */
    int const len; 

    /*
     * Array of pointers to language-specific string
     */
    char const * const text[LAST_LANGUAGE]; 

} STRING;

This change may look minor, but it makes adding another language more intuitive.

So far, I haven’t mentioned the utility of storing the maximum string length in the STRING data structure. The use of this field arises when ROM is not plentiful, such that it is necessary to store strings without padding out to the maximum allowed field width. In cases like this, one has to be careful to clear the entire field before writing the string. This may be accomplished by using the string len parameter. If you can afford to pad all strings out to the allowed field width, it is permissible to drop the len parameter from the data structure.
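That clearing step might look like the following sketch; the WrChr() display primitive and the capture buffer are hypothetical stand-ins for real display driver code:

```c
/* Capture "display" output in a buffer so the padding is easy to
 * inspect; WrChr() is a hypothetical single-character display primitive. */
static char display[64];
static int  cursor;

static void WrChr(char c)
{
    display[cursor++] = c;
}

/* Write str into a field of width len, blank-padding the remainder so
 * that stale characters from a longer previous string are cleared. */
static void write_field(const char *str, int len)
{
    int i;

    for (i = 0; i < len; i++)
    {
        WrChr(*str ? *str++ : ' ');
    }
}
```

Calling write_field(strGet(str_no), strings[str_no].len) would then overwrite the whole field regardless of the current string's length.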

Well, that’s my fourth attempt at producing an international product. After three disasters, I’m reasonably confident that this latest attempt will at least make it into the “not bad” category. If you have any refinements that you would care to share, please e-mail me. In the meantime, I’m going to ponder how to elegantly and robustly support languages such as Chinese, Arabic and Russian in embedded systems. If I manage to find a reasonable solution, I’ll let you know.


This article was published in the February 2001 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Support Multiple Languages” Embedded Systems Programming, February 2001.

Uses for C’s offsetof() Macro

December 17th, 2009 by Nigel Jones

Also available in PDF version.

C’s seldom-used offsetof() macro can actually be a helpful addition to your bag of tricks. Here are a couple of places in embedded systems where the macro is indispensable, including packing data structures and describing how EEPROM data are stored.

If you browse through an ANSI C compiler’s header files, you’ll come across a very strange looking macro in stddef.h. The macro, offsetof(), has a horrid declaration. Furthermore, if you consult the compiler manuals, you’ll find an unhelpful explanation that reads something like this:

The offsetof() macro returns the offset of the element name within the struct or union composite. This provides a portable method to determine the offset.

At this point, your eyes start to glaze over, and you move on to something that’s more understandable and useful. Indeed, this was my position until about a year ago, when the macro’s usefulness finally dawned on me. I now kick myself for not realizing the benefits earlier; the macro could have saved me a lot of grief over the years. However, I console myself with the knowledge that I wasn’t alone, since I’d never seen this macro used in any embedded code. Offline and online searches confirmed that offsetof() is essentially unused. I even found compilers that had not bothered to define it.

How offsetof() works

Before delving into the three areas where I’ve found the macro useful, it’s necessary to discuss what the macro does, and how it does it.

The offsetof() macro is an ANSI-required macro that should be found in stddef.h. Simply put, the offsetof() macro returns the number of bytes of offset before a particular element of a struct or union.

The declaration of the macro varies from vendor to vendor and depends upon the processor architecture. Browsing through the compilers on my computer, I found the example declarations shown in Listing 1. As you can see, the definition of the macro can get complicated.

// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)

// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)

// Diab Coldfire compiler
#define offsetof(s,memb) \
    ((size_t)((char *)&((s *)0)->memb-(char *)0))

Listing 1. A representative set of offsetof() macro definitions

Regardless of the implementation, the offsetof() macro takes two parameters. The first parameter is the structure name; the second, the name of the structure element. (I apologize for using a term as vague as “structure name.” I’ll refine this shortly.) A straightforward use of the macro is shown in Listing 2.

typedef struct
{
    int   i;
    float f;
    char  c;

} SFOO;

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
    return 0;
}

Listing 2. A straightforward use of offsetof()

To better understand the magic of the offsetof() macro, consider the details of Keil’s definition. The various operators within the macro are evaluated in an order such that the following steps are performed:

  • ((s *)0): takes the integer zero and casts it as a pointer to s.
  • ((s *)0)->m: dereferences that pointer to point to structure member m.
  • &(((s *)0)->m): computes the address of m.
  • (size_t)&(((s *)0)->m): casts the result to an appropriate data type.

By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure. At this point, we can make several observations:

  • We can be a bit more specific about the term “structure name.” In a nutshell, if the structure name you use, call it s, results in a valid C expression when written as ((s *)0)->m, you can use s in the offsetof() macro. The examples shown in Listings 3 and 4 will help clarify that point.
  • The member expression, m, can be of arbitrary complexity; indeed, if you have nested structures, then the member field can be an expression that resolves to a parameter deeply nested within a structure
  • It’s easy enough to see why this macro also works with unions
  • The macro won’t work with bitfields, as you can’t take the address of a bitfield member of a structure or union

Listings 3 and 4 contain simple variations on the usage of this macro. These should help get you comfortable with the offsetof() basics.

struct sfoo
{
    int   i;
    float f;
    char  c;

};

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("Offset of 'f' is %zu\n", offsetof(struct sfoo, f));
    return 0;
}

Listing 3. A struct without a typedef

typedef struct
{
    long  l;
    short s;

} SBAR;

typedef struct
{
    int   i;
    float f;
    SBAR  b;

} SFOO;

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("Offset of 'l' is %zu\n", offsetof(SFOO, b.l));
    return 0;
}

Listing 4. Nested structs

Now that you understand the semantics of the macro, it’s time to take a look at a few use examples.

Struct padding bytes

Most 16-bit and larger processors require that data structures in memory be aligned on a multibyte (for example, 16-bit or 32-bit) boundary. Sometimes the requirement is absolute, and sometimes it’s merely recommended for optimal bus throughput. In the latter case, the flexibility is offered because the designers recognized that you may wish to trade off memory access time with other competing issues such as memory size and the ability to transfer (perhaps via a communications link or direct memory access) the memory contents directly to another processor that has a different alignment requirement.

For cases such as these, it’s often necessary to resort to compiler directives to achieve the required level of packing. As the C structure declarations can be quite complex, working out how to achieve this can be daunting. Furthermore, after poring over the compiler manuals, I’m always left with a slight sense of unease about whether I’ve really achieved what I set out to do.

The most straightforward solution to this problem is to write a small piece of test code. For instance, consider the moderately complex declaration given in Listing 5.

typedef union
{
    int   i;
    float f;
    char  c;

    struct
    {
        float  g;
        double h;
    } b;

} UFOO;

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("Offset of 'h' is %zu\n", offsetof(UFOO, b.h));
    return 0;
}

Listing 5. A union containing a struct

If you need to know where field b.h resides in the structure, then the simplest way to find out is to write some test code such as that shown in Listing 5.

This is all well and good, but what about portability? Writing code that relies on offsets into structures can be risky—particularly if the code gets ported to a new target at a later date. Adding a comment is of course a good idea—but what one really needs is a means of forcing the compiler to tell you if the critical members of a structure are in the wrong place. Fortunately, one can do this using the offsetof() macro and the technique in Listing 6.

typedef union
{
    int   i;
    float f;
    char  c;

    struct
    {
        float  g;
        double h;
    } b; 	   

} UFOO;

static union
{
    char wrong_offset_i[offsetof(UFOO, i) == 0];
    char wrong_offset_f[offsetof(UFOO, f) == 0];
    ...
    char wrong_offset_h[offsetof(UFOO, b.h) == 2]; // Error
} offset_check;

Listing 6. An anonymous union to check struct offsets

The technique works by attempting to declare a union of one-char arrays. If any test evaluates to false, its array will be of zero size, and a compiler error will result. One compiler I tested generated the specific error “Invalid dimension size [0]” on the line defining array wrong_offset_h[].

Thus the offsetof() macro can be used both to determine and to validate the packing of elements within C structs.

Nonvolatile memory layouts

Many embedded systems contain some form of nonvolatile memory, which holds configuration parameters and other device-specific information. One of the most common types of nonvolatile memory is serial EEPROM. Normally, such memories are byte addressable. The result is often a serial EEPROM driver that provides an API that includes a read function that looks like this:

void ee_rd(uint16_t offset, uint16_t nBytes, uint8_t * dest);

In other words, read nBytes from offset offset in the EEPROM and store them at dest. The problem is knowing what offset in EEPROM to read from and how many bytes to read (in other words, the underlying size of the variable being read). The most common solution to this problem is to declare a data structure that represents the EEPROM and then declare a pointer to that struct, initialized to address 0x0000000. This technique is shown in Listing 7.

typedef struct
{
    int   i;
    float f;
    char  c;

} EEPROM;

EEPROM * const pEE = (EEPROM *) 0x0000000;

ee_rd(&(pEE->f), sizeof(pEE->f), dest);

Listing 7. Accessing data in serial EEPROM via a pointer

This technique has been in use for years. However, I dislike it precisely because it does create an actual pointer to a variable that supposedly resides at address 0. In my experience, this can create a number of problems including:

  • Someone maintaining the code can get confused into thinking that the EEPROM data structure really does exist
  • You can write perfectly legal code (for example, pEE->f = 3.2) and get no compiler warnings that what you’re doing is disastrous
  • The code doesn’t describe the underlying hardware well

A far better approach is to use the offsetof() macro. An example is shown in Listing 8

typedef struct
{
    int   i;
    float f;
    char  c;

} EEPROM;

ee_rd(offsetof(EEPROM,f), 4, dest);

Listing 8. Use offsetof() to access data stored in serial EEPROM

However, there’s still a bit of a problem. The size of the parameter has been entered manually (in this case “4”). It would be a lot better if we could have the compiler work out the size of the parameter as well. No problem, you say, just use the sizeof() operator. However, the sizeof() operator doesn’t work the way we would like it to. That is, we cannot write sizeof(EEPROM.f) because EEPROM is a data type and not a variable.

Normally, one always has at least one instance of a data type so that this is not a problem. In our case, the data type EEPROM is nothing more than a template for describing how data are stored in the serial EEPROM. So, how can we use the sizeof() operator in this case? Well, we can simply use the same technique used to define the offsetof() macro. Consider the definition:

#define SIZEOF(s,m) ((size_t) sizeof(((s *)0)->m))

This looks a lot like the offsetof() definitions in Listing 1. We take the value 0 and cast it to “pointer to s.” This gives us a variable to point to. We then point to the member we want and apply the regular sizeof() operator. The net result is that we can get the size of any member of a typedef without having to actually declare a variable of that data type.

Thus, we can now refine our read from the serial EEPROM as follows:

ee_rd(offsetof(EEPROM, f), SIZEOF(EEPROM, f), dest);

At this point, we’re using two macros in the function call, with both macros taking the same two parameters. This leads to an obvious refinement that cuts down on typing and errors:

#define EE_RD(M,D) \
    ee_rd(offsetof(EEPROM,M), SIZEOF(EEPROM,M), D)

Now our call to the EEPROM driver becomes much more intuitive:

EE_RD(f, dest);

That is, read f from the EEPROM and store its contents at location dest. The location and size of the parameter is handled automatically by the compiler, resulting in a clean, robust interface.

Protect nonvolatile memory

Many embedded systems contain directly addressable nonvolatile memory, such as battery-backed SRAM. It’s usually important to detect if the contents of this memory have been corrupted. I usually group the data into a structure, compute a CRC (cyclic redundancy check) over that structure, and append it to the data structure. Thus, I often end up with something like this:

struct nv
{
    short    param_1;
    float    param_2;
    char     param_3;
    uint16_t crc;

} nvram;

The intent of the crc field is that it contain a CRC of all the parameters in the data structure with the exception of itself. This seems reasonable enough. Thus, the question is, how does one compute the CRC? If we assume we have a function, crc16( ), that computes the CRC-16 over an array of bytes, then we might be tempted to use the following:

nvram.crc =
    crc16((char *) &nvram, sizeof(nvram)-sizeof(nvram.crc));

This code is guaranteed to work only with compilers that pack all data on byte boundaries. For compilers that don’t do this, the code will almost certainly fail. To see why, consider this example structure for a compiler that aligns everything on a 32-bit boundary. The effective structure could look like that in Listing 9.

struct nv
{
    short    param_1; 	// offset = 0
    char     pad1[2]; 	// 2 byte pad
    float    param_2; 	// offset = 4
    char     param_3; 	// offset = 8
    char     pad2[3]; 	// 3 byte pad
    uint16_t crc; 	 // offset = 12
    char     pad3[2]; 	// 2 byte pad

} nvram;

Listing 9. An example struct for a compiler that aligns everything on a 32-bit boundary

The first two pads are expected. However, why is the compiler adding two bytes onto the end of the structure? It does this because it has to handle the case when you declare an array of such structures. Arrays are required to be contiguous in memory, too. So to meet this requirement and to maintain alignment, the compiler pads the structure out as shown.

On this basis, we can see that sizeof(nvram) is 16 bytes. Now our naive CRC computation covers sizeof(nvram) – sizeof(nvram.crc) = 16 – 2 = 14 bytes. Thus the stored crc field now includes its own previous value in its computation! We certainly haven’t achieved what we set out to do.

struct nv
{
    struct data
    {
        short param_1; 	// offset = 0
        float param_2; 	// offset = 4
        char  param_3; 	// offset = 8

    } data;

    uint16_t crc;	 // offset = 12

} nvram;

Listing 10. Nested data structures

The most common solution to this problem is to nest data structures as shown in Listing 10. Now we can compute the CRC using:

nvram.crc =
    crc16((uint8_t *) &nvram.data, sizeof(nvram.data));

This works well and is indeed the technique I’ve used over the years. However, it introduces an extra level of nesting within the structure—purely to overcome an artifact of the compiler. Another alternative is to place the CRC at the top of the structure. This overcomes most of the problems but feels unnatural to many people. On the basis that structures should always reflect the underlying system, a technique that doesn’t rely on artifice is preferable—and that technique is to use the offsetof() macro.

Using the offsetof() macro, one can simply use the following (assuming the original structure definition):

nvram.crc =
    crc16((uint8_t *) &nvram, offsetof(struct nv, crc));

Keep looking

I’ve provided a few examples where the offsetof() macro can improve your code. I’m sure that I’ll find further uses over the next few years. If you’ve found additional uses for the macro, I would be interested to hear about them.


This article was published in the March 2004 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Learn a new trick with the offsetof() macro” Embedded Systems Programming, March 2004.