RS-485 Transmit Enable Signal Control

Thursday, December 17th, 2009 Nigel Jones

Also available in PDF version.

Quite a few embedded systems include multiple processors. Sometimes these processors stand in isolation, but more often they’re required to communicate over a multidrop bus such as EIA RS-485 or RS-422.

The bus wiring for a typical RS-485 or RS-422 implementation is shown in Figure 1.

Figure 1. Typical RS-485 or RS-422 bus wiring

With this architecture, each station on the network takes turns talking, typically by employing some form of token-passing scheme, whereby the owner of the token has the right to talk. If the owner of the token doesn’t wish to talk, then it simply passes the token to the next station. In this article I’ll explain why it’s so difficult to ensure that only one processor talks at a time, and I’ll give you a number of possible solutions.

Multidrop transmit enable problem

A typical microcontroller will have some form of UART driver. The transmitter side of the driver’s job is to take characters out of a buffer and feed them to the transmitter of the UART. In its crudest form, this driver is polled, with the code looking something like this:

void tx(char *buf, int len)
{
    int j; 

    for (j = 0; j < len; j++)
    {
        while (STATUS_REG & TX_FULL);
          /* Wait for transmitter to empty */
        TX_REG = *buf++;
    }
}

This approach works fine with point-to-point protocols such as RS-232. However, when we move to multidropped systems, the problem is more complex because we must now also control the transmit enable line. The first time most people come up against this problem, they typically modify the above code to look like this:

void tx(char *buf, int len)
{
    int j;

    TX_ENABLE();

    for (j = 0; j < len; j++)
    {
        while (STATUS_REG & TX_FULL);
          /* Wait for transmitter to empty */
        TX_REG = *buf++;
    }

    TX_DISABLE();
}

It doesn’t normally take too much testing before they discover that the last character (or more) isn’t being transmitted. After scrupulously checking the buffer indexing code and so on, the poor software programmer eventually realizes that the transmitter is being turned off too early. That is, the last character is being shifted out of the UART but is never making it past the RS-485 transceiver because by that time the transceiver has been turned off.

The reason that this occurs is that the status register on most UARTs tells you when you’re free to write another byte to the transmitter. This isn’t the same as saying that the transmitter has finished shifting out all previous bytes. Even simple UARTs (such as those found on the 8051, 68HC11, and the like) are double buffered, while sophisticated UARTs may have transmitter FIFOs that can buffer up to 128 bytes. Thus, depending on the size of your transmitter’s FIFO and your communications speed, it may be many milliseconds between writing the last byte to the transmitter and it being safe to disable the transmitter. This gap presents a big problem!

Further thought shows that the corollary to this problem is equally bad. That is, failure to turn the transmitter off as soon as the last bit has been transmitted leads to the possibility of line contention. For example, consider the typical token-passing bus system I mentioned in the introduction. The owner of the token has just passed the token to the next station, which has been patiently waiting for its turn to talk. As soon as it receives the token, the recipient may start transmitting (on well-designed, high-performance systems, the delay from token receipt to start of transmission is well under the transmission time for a single character). If the passer of the token has not disabled its transmitter in time, then the network will crash and life will be generally miserable for everyone.
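To put rough numbers on the gap between the last write and the last bit leaving the UART: in the worst case a full FIFO plus the shift register must drain, which takes (FIFO depth + 1) character times. The sketch below works that out. The function name is hypothetical, and the frame length (for example 10 bits for 1 start, 8 data, and 1 stop bit) is something you must supply for your own link.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case time, in microseconds, for a full transmit FIFO plus the
 * shift register to drain. Hypothetical helper, not part of any UART API.
 * bits_per_char includes start, data, parity and stop bits.
 */
uint32_t tx_drain_time_us(uint32_t baud, uint32_t bits_per_char,
                          uint32_t fifo_depth)
{
    assert(baud > 0);

    /* Use 64-bit arithmetic so the intermediate product cannot overflow. */
    uint64_t bits = (uint64_t)bits_per_char * (fifo_depth + 1u);

    /* Round up: waiting slightly long is safer than truncating a byte. */
    return (uint32_t)((bits * 1000000u + baud - 1u) / baud);
}
```

With a 16-byte FIFO at 9600 baud and 10-bit characters this gives 17709 µs, i.e. almost 18 ms; a double-buffered UART at 500 kbps needs only 40 µs.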

Suggested solutions

Thus, to state the problem succinctly, we need to enable the transmitter just prior to transmission and disable it as soon as possible after the last bit of the last byte has been transmitted. A number of ways of doing this exist, varying from the downright crude to the fairly elegant. Here are your options.

Option 1: Delay loop

The crudest of all possible solutions is the simple software delay loop. A first pass at this code would look like this:

void tx(char * buf, int len)
{
    int j; 

    TX_ENABLE(); 

    for (j = 0; j < len; j++)
    {
        while(STATUS_REG & TX_FULL);
          /* Wait for tx fifo to have space */
        TX_REG = *buf++;
    }

    delay(TX_FIFO_LEN_PLUS_ONE_CHAR_TIMES);
    TX_DISABLE();
}

The concept is simple enough. After putting the last byte into the transmit FIFO, one delays the appropriate amount of time so that the byte has time to be shifted out. Incidentally, don’t assume that this approach is necessarily unacceptably inefficient. If your baud rate is high (say, 500 kbps), then your character time is only 20 µs. Twiddling your thumbs for a few tens of microseconds may be perfectly acceptable. Despite the simplicity of the solution, a little thought soon reveals a subtle bug in this approach. This problem is best illustrated with an example.

Consider a double-buffered transmitter (FIFO length = 1). If the above code were required to transmit a buffer of length n, then the following would happen, assuming the transmitter is in an idle condition:

  • The first byte would pass straight from the hold register to the shift register
  • The second byte would then be immediately written to the hold register
  • The third byte would wait for the first byte to be shifted out before being placed in the hold register, and so on until the nth byte was placed in the transmit hold register
  • The code would delay two character times (the time to shift out the (n – 1)th byte in the shift register plus the nth byte in the hold register)
  • The transmitter would be turned off

Thus, the code appears to do exactly what we require. However, what happens if we need to transmit just one byte, such as an ACK or a NAK to a previously received command? In this case, the single byte would be passed straight through to the shift register, and we would end up delaying for one character time too long. As I mentioned previously, this is a bad thing because the next person to talk on the link may be already trying to talk, leading to line contention. Thus, if your communications protocol ever calls for the transmission of message blocks shorter than the transmit FIFO length plus one, you’ll need to modify the above code to take this into account.
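One way to take short messages into account is to cap the delay at the number of bytes actually written. The helper below is a minimal sketch, assuming the transmitter is idle when the message starts and a FIFO length of 1 as in the example above; the names are illustrative.

```c
#include <assert.h>

#define TX_FIFO_LEN 1   /* assumed: double-buffered UART, one hold register */

/*
 * Number of character times to wait after the last write, for a message
 * of len bytes sent to an idle transmitter. Hypothetical helper.
 */
int tx_delay_char_times(int len)
{
    assert(len >= 0);

    /* At most the FIFO plus the shift register can still be pending... */
    int pending_max = TX_FIFO_LEN + 1;

    /* ...but a short message never fills them all. */
    return (len < pending_max) ? len : pending_max;
}
```

A one-byte ACK then waits one character time, while any message of two or more bytes waits two, which matches the walk-through above.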

Option 2: System call

If you’re using an RTOS, it almost certainly provides some sort of wait() or sleep() system call that allows the calling task to be suspended for the specified interval. Thus, the above code might be modified to replace the

delay(TX_FIFO_LEN_PLUS_ONE_CHAR_TIMES);

with

wait(TX_FIFO_LEN_PLUS_ONE_CHAR_TIMES);

This approach is acceptable because it dispenses with the inefficiency of the software delay. But you need to be aware of exactly how your RTOS works, particularly with regard to the granularity of timer intervals, whether this task has a high enough priority, and how long the RTOS takes to perform a task switch. Get these wrong, and the time interval you end up waiting could be much longer than you requested, such that line contention again becomes a problem.
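The granularity issue can be made concrete. The sketch below assumes a tick-based kernel whose wait() takes a whole number of ticks, and it adds one extra tick because the first tick may arrive almost immediately after the call. Both points are assumptions; some kernels add the extra tick themselves, so check your own RTOS documentation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks to request so that at least delay_us elapses (hypothetical helper). */
uint32_t ticks_for_delay(uint32_t delay_us, uint32_t tick_period_us)
{
    assert(tick_period_us > 0);

    /* Round up to whole ticks, written so the addition cannot overflow. */
    uint32_t ticks = delay_us / tick_period_us
                   + (delay_us % tick_period_us != 0u);

    /* One extra tick: the first tick may fire right after the call. */
    return ticks + 1u;
}

/* Worst-case time beyond delay_us that the task may sleep, ignoring
 * scheduling and task-switch latency. */
uint32_t worst_case_overshoot_us(uint32_t delay_us, uint32_t tick_period_us)
{
    return ticks_for_delay(delay_us, tick_period_us) * tick_period_us
           - delay_us;
}
```

For a two-character wait at 9600 baud (about 2084 µs) and a 1 ms tick, this asks for 4 ticks and can oversleep by 1916 µs, which is nearly two further character times on the bus.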

Option 3: Hardware timer

If you have a spare timer, kicking off the timer with the appropriate value is a method that avoids both tying up the CPU in a delay loop, and having to rely on the uncertainties of a wait() call. The code would look something like this:

void tx(char *buf, int len)
{
    int j;

    TX_ENABLE();

    for (j = 0; j < len; j++)
    {
        while (STATUS_REG & TX_FULL);
          /* Wait for tx fifo to have space */
        TX_REG = *buf++;
    }
    timer_enable(TX_FIFO_LEN_PLUS_ONE_CHAR_TIMES);
}

void timer_enable(uint delay)
{
    /* Program timer & start it running */
}

/* Timer ISR */
void timer_interrupt(void)
{
    TIMER_DISABLE();
    TX_DISABLE();
}

That is, the transmitter is turned off in the timer ISR. This approach works well. Its main drawbacks are the consumption of a timer and the fact that the code for controlling the transmitter is split between two disparate functions, making the code harder to maintain. Again, you have to make sure that you call the timer with the appropriate delay interval when the buffer length is less than the transmit FIFO length plus one.

Option 4: Transmission padding

This method is crude and has flaws, but can work if you’re stuck with a hardware/software architecture that precludes other options. The concept is simple. Take the message to be transmitted and append to it as many pad characters as the depth of your UART’s transmit FIFO plus one. That is, in the case where the transmitter is just double buffered (one byte FIFO), you add two innocuous characters to the end of your message. You can then use the first code example directly. The result looks something like this:

void demo()
{
    char buf[] = { 'h','e','l','p','\0','\0' };

    tx(buf, sizeof(buf));
} 

void tx(char *buf, int len)
{
    int j;

    TX_ENABLE(); 

    for (j = 0; j < len; j++)
    {
        while(STATUS_REG & TX_FULL);
         /* Wait for transmitter to empty */
        TX_REG = *buf++;
    }

    TX_DISABLE();
}

In this case, we add two nul characters onto the string “help.” When the “p” of “help” is moved into the transmit shift register, the first nul character is moved by the software into the transmit hold register of the UART. Once the “p” has been shifted out, the first nul character is moved into the shift register, while the second nul character is moved into the hold register. At this point, the for loop will terminate, and we disable the transmitter. It doesn’t take much thought to realize that the start bit and maybe even the first bit of the first nul byte might make it out onto the network before the transmitter is disabled. Whether this matters is system dependent. Note also that if your code is interrupt driven (which is the normal case), the time from the first nul byte being moved into the shift register to the transmitter being disabled will be the worst-case transmit interrupt latency of your system. This latency needs to be much shorter than the transmission bit time for this approach to be effective.
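If the pad characters are appended at run time rather than typed into each message, a small helper keeps the FIFO depth in one place. This is a sketch only: pad_message(), TX_FIFO_LEN, and PAD_CHAR are hypothetical names, and it assumes every receiver on the bus ignores the pad character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TX_FIFO_LEN 1      /* assumed FIFO depth (double-buffered UART) */
#define PAD_CHAR    '\0'   /* assumed to be ignored by every receiver */

/*
 * Copy msg into out and append TX_FIFO_LEN + 1 pad characters.
 * Returns the padded length, or 0 if out is too small.
 * Hypothetical helper for illustration.
 */
size_t pad_message(char *out, size_t out_size, const char *msg, size_t len)
{
    size_t padded = len + TX_FIFO_LEN + 1;

    assert(out != NULL && msg != NULL);
    if (padded > out_size)
    {
        return 0;
    }
    memcpy(out, msg, len);
    memset(out + len, PAD_CHAR, padded - len);
    return padded;
}
```

The padded buffer and returned length can then be handed to tx() unchanged.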

Option 5: “Sophisticated” UART

Most full-featured UARTs, such as the Exar ST16C550, now include a flag in the status register to indicate that the transmit shift register is empty. In this case, our example polled driver code now looks like this:

void tx(char *buf, int len)
{
    int j;

    TX_ENABLE();

    for (j = 0; j < len; j++)
    {
        while(STATUS_REG & TX_FULL);
          /* Wait for tx fifo to have space */
        TX_REG = *buf++;
    }

    while (STATUS_REG & TX_BUSY);
       /* Wait for shift register to empty */
    TX_DISABLE();
}

Although this looks like a panacea, you may end up bitterly disappointed with such a UART. A couple of months ago, I was working on a system that used this UART in an RS-485 multidropped system. The 16C550 has nice 16-byte TX and RX FIFOs, and all my messages were less than 16 bytes. Furthermore, the system was to be interrupt driven, as my CPU had a lot to do, and the baud rate was low (9600 baud and fixed by the other equipment). I had given the data sheet a cursory look merely to confirm that the chip did support a flag indicating when the shift register was empty. Thus, I was all set to easily control my RS-485 transmitter, with minimal overhead, when I discovered the gotcha. The 16C550 does not interrupt upon the shift register emptying. Thus, to use the line idle feature, one has to poll, completely defeating the benefit of the transmit FIFO, the transmit interrupts, and so on.

How this “feature” made it into the marketplace is beyond me. I also suspect that I’m not the first person to point out this shortcoming, since in newer UARTs, such as the XR16C850, the designers have implemented a complete RS-485 mode. In this case, the UART provides a control line that is asserted whenever it’s transmitting data. This control line may be connected to the transmit enable line on the RS-485 transceiver, and the problem is solved.
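For reference, on 16550-family parts the two conditions are separate bits in the Line Status Register: bit 5 (THRE) is set when the transmit holding register (or, in FIFO mode, the transmit FIFO) is empty, and bit 6 (TEMT) is set only when the shift register is empty as well. The sketch below shows the polled test with the register value passed in, so it can be exercised on a host. The function names and the read_lsr accessor are my own inventions, and the bounded poll count is an added safeguard rather than anything the UART requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LSR_THRE 0x20u  /* bit 5: holding register / TX FIFO empty */
#define LSR_TEMT 0x40u  /* bit 6: FIFO and shift register both empty */

/* True when the holding register or FIFO has emptied (illustrative name). */
bool uart_can_write(uint8_t lsr)
{
    return (lsr & LSR_THRE) != 0u;
}

/* True only when the FIFO and the shift register have both drained. */
bool uart_line_idle(uint8_t lsr)
{
    return (lsr & LSR_TEMT) != 0u;
}

/* Poll until the line is idle or max_polls reads have been made.
 * read_lsr is a caller-supplied register accessor (hypothetical). */
bool uart_wait_line_idle(uint8_t (*read_lsr)(void), uint32_t max_polls)
{
    assert(read_lsr != NULL);
    while (max_polls-- > 0u)
    {
        if (uart_line_idle(read_lsr()))
        {
            return true;
        }
    }
    return false;
}
```

A status value of 0x20 therefore means "safe to write, not yet safe to disable the transmitter", which is exactly the trap described earlier.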

In my case, however, I was stuck with the 16C550, so I had to fall back on the best software method for controlling the transmitter, namely the loop-back mode.

Option 6: Loop-back

This method is by far the most elegant and robust. It relies on your receiving the data that you transmit and using the received data to decide when to disable transmission. However, to use this approach, your RS-485 transceiver must be wired correctly. The typical RS-485 transceiver is based on the 75176. This chip’s block diagram is shown in Figure 2.

Figure 2. RS-485 transceiver block diagram

Note that the chip consists of a differential transmitter (“T”) and a differential receiver (“R”), each with its own enable line. The output of the transmitter is internally connected to the input of the receiver (a feature that we will exploit shortly). Careful examination of the enable lines shows that the transmit enable line is active high, while the receive enable is active low. This leads to one of two common hardware configurations. The first (Figure 3) shows the receiver and transmitter enable lines tied together, such that if the receiver is enabled, the transmitter is disabled and vice versa, while the second (Figure 4) shows the receiver permanently enabled.

Figures 3 and 4

Unfortunately, the arrangement shown in Figure 3 is the most common. To use the loop-back method, the 75176 must be wired such that its receiver is permanently enabled, as in Figure 4. (An acceptable alternative to the arrangement in Figure 4 is where the receiver enable line is under direct software control.) If you do not have Figure 4’s hardware arrangement, you must resort to one of the methods described earlier.

The loop-back method is at its most robust when all messages over the network start and end with unique characters (although variations on this theme are possible when this isn’t the case). To illustrate the approach, consider a communications protocol that starts all its messages with an STX character, ends them with ETX, and uses only printable characters in between. In this case, we set up the UART driver to be interrupt driven, and in the receiver’s interrupt service routine we unconditionally turn off the transmitter whenever we receive an ETX. The code looks something like this (minus all the buffer handling, interrupt turning on and off, and so on):

void tx_interrupt()
{
    // Determine if a byte is to be transmitted
    // If so, force the transmitter on
    TX_ENABLE();

    // Write the byte
    TX_REG = tx_buf[op++];

    // Rest of code goes here
}

void rx_interrupt()
{
    uchar res = RX_REG;     // Read byte
    if (res == ETX)         // Is it ETX?
    {
        TX_DISABLE();       // Force transmitter off
    }

    //Store byte in buffer etc
}

This approach offers two very clear benefits:

  • You are guaranteed not to turn the transmitter off too early (since reception of the terminating character requires that it had been transmitted out of the transceiver)
  • The latency from transmitting the last bit to turning off the transmitter is your own receiver interrupt latency, which is normally far less than the time it takes the next station to decide to start talking; thus the probability of line contention is virtually zero

The biggest drawback to this approach is the fact that you’re now burdening yourself with listening to your own transmission. If this condition is unacceptable, the code can be refined such that your UART’s receiver is only enabled immediately prior to transmission of the ETX byte. As an alternative, many microprocessors offer a nine-bit transmission mode, where the ninth bit can be used as a wake up bit. In this case, simply ensure that the STX and ETX bytes have their wake up bits set.
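The loop-back logic can be tried out on a host before any hardware exists. The sketch below replaces the transmit enable pin with a flag and feeds each "transmitted" byte straight back into the receive handler, as a permanently enabled receiver would. All names are illustrative, and the STX/ETX values are the standard ASCII codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STX 0x02u   /* ASCII start of text */
#define ETX 0x03u   /* ASCII end of text */

/* Stand-ins for the hardware: a flag for the transceiver's transmit
 * enable pin and a small receive buffer. */
static bool    tx_enabled;
static uint8_t rx_buf[32];
static size_t  rx_len;

/* Called before the first byte of a message is written to the UART. */
void start_transmission(void)
{
    tx_enabled = true;
    rx_len = 0;
}

/* Body of the receive ISR: with the receiver permanently enabled, every
 * byte we transmit comes back through here. */
void rx_byte(uint8_t byte)
{
    if (byte == ETX)
    {
        tx_enabled = false;     /* last byte is on the wire: release bus */
    }
    if (rx_len < sizeof(rx_buf))
    {
        rx_buf[rx_len++] = byte;
    }
}

bool transmitter_is_enabled(void)
{
    return tx_enabled;
}

/* Host-side simulation: "transmit" a frame and loop each byte back. */
void simulate_send(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    start_transmission();
    for (size_t i = 0; i < len; i++)
    {
        rx_byte(frame[i]);      /* what goes out comes back in */
    }
}
```

The transmitter stays enabled for every byte up to the ETX and is released as soon as the ETX has been received back, which is the behavior the two benefits above depend on.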

Final thoughts

The bottom line of the discussion is this: in the unlikely event that your hardware designer asks your opinion on a proposed design, you should beg, plead, and grovel to ensure that the receiver enable line is not disabled when you’re transmitting. If the hardware designer demurs, citing power consumption issues or other real world trivia, give that designer a copy of this article and ask him or her to write the driver. Then sit back and wait.


This article was published in the August 1999 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Controlling the Transmit Enable line on RS-485 Transceivers” Embedded Systems Programming, August 1999.

Jump Tables via Function Pointer Arrays in C/C++

Thursday, December 17th, 2009 Nigel Jones

Also available in PDF version.

Jump tables, also called branch tables, are an efficient means of handling similar events in software. Here’s a look at the use of arrays of function pointers in C/C++ as jump tables.

Examination of assembly language code that has been crafted by an expert will usually reveal extensive use of function “branch tables.” Branch tables (a.k.a., jump tables) are used because they offer a unique blend of compactness and execution speed, particularly on microprocessors that support indexed addressing. When one examines typical C/C++ code, however, the branch table (i.e., an array of function pointers) is a much rarer beast. The purpose of this article is to examine why branch tables are not used by C/C++ programmers and to make the case for their extensive use. Real world examples of their use are included.

Function pointers

In talking to C/C++ programmers about this topic, three reasons are usually cited for not using function pointers. They are:

  • They are dangerous
  • A good optimizing compiler will generate a jump table from a switch statement, so let the compiler do the work
  • They are too difficult to code and maintain

Are function pointers dangerous?

This school of thought comes about because code that indexes into a table and then calls a function based on the index has the capability to end up just about anywhere. For instance, consider the following code fragment:

void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

void test(const INT jump_index)
{
    /* Call the function specified by jump_index */
    pf[jump_index]();
}

The above code declares pf[] to be an array of pointers to functions, each of which takes no arguments and returns void. The test() function simply calls the specified function via the array. As it stands, this code is dangerous for the following reasons.

  • pf[] is accessible by anyone
  • In test(), there is no bounds checking, such that an erroneous jump_index would spell disaster

A much better way to code this that avoids these problems is as follows:

void test(uint8_t const jump_index)
{
    static void (*pf[])(void) = {fna, fnb, fnc, …, fnz};

    if (jump_index < sizeof(pf) / sizeof(*pf))
    {
        /* Call the function specified by jump_index */
        pf[jump_index]();
    }
}

The changes are subtle, yet important.

  • By declaring the array static within the function, no one else can access the jump table
  • Forcing jump_index to be an unsigned quantity means that we only need to perform a one sided test for our bounds checking
  • Setting jump_index to the smallest data type possible that will meet the requirements provides a little more protection (most jump tables are smaller than 256 entries)
  • An explicit test is performed prior to making the call, thus ensuring that only valid function calls are made. (For performance critical applications, the if() statement could be replaced by an assert())

This approach to the use of a jump table is just as secure as an explicit switch statement; thus, the idea that jump tables are dangerous may be rejected.
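For readers who want to try the pattern, here is a complete, compilable version with three stand-in handlers. The handlers, the last_called bookkeeping, and the boolean return value are additions for demonstration; the table and bounds check follow the listing above, with the const qualifier added to the pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int last_called = -1;    /* records which handler ran (demo only) */

static void fna(void) { last_called = 0; }
static void fnb(void) { last_called = 1; }
static void fnc(void) { last_called = 2; }

/* Returns true if jump_index was valid and a handler was called. */
bool dispatch(uint8_t const jump_index)
{
    static void (* const pf[])(void) = { fna, fnb, fnc };

    /* The table must stay small enough to be indexed by a uint8_t. */
    assert(sizeof(pf) / sizeof(*pf) <= 256u);

    if (jump_index < sizeof(pf) / sizeof(*pf))
    {
        pf[jump_index]();
        return true;
    }
    return false;
}

int last_handler(void)
{
    return last_called;
}
```

An out-of-range index such as 3 is simply rejected, and no handler runs.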

Leave it to the optimizer?

It is well known that many C compilers will attempt to convert a switch statement into a jump table. Thus, rather than use a function pointer array, many programmers prefer to use a switch statement of the form:

void test(uint8_t const jump_index)
{
    switch (jump_index)
    {
      case 0:
        fna();
        break;

      case 1:
        fnb();
        break;

        …

      case 26:
        fnz();
        break;

      default:
        break;
    }
}

Indeed, Jack Crenshaw advocated this approach in a September 1998 column in Embedded Systems Programming. Well, I have never found myself disagreeing with Dr. Crenshaw before, but there is always a first time for everything! A quick survey of the documentation for a number of compilers revealed some interesting variations. They all claimed to potentially perform conversion of a switch statement into a jump table. However, the criteria for doing so varied considerably. One vendor simply said that they would attempt to perform this optimization. A second claimed to use a heuristic algorithm to decide which was “better,” while a third permitted pragmas to let the user specify what they wanted. This sort of variation does not give one a warm fuzzy feeling!

In the case where one has, say, 26 contiguous indices, each associated with a single function call (such as the example above), the compiler will almost certainly generate a jump table. However, what about the case where you have 26 non-contiguous indices that vary in value from 0 to 1000? A jump table would need 1001 slots, of which 975 would be null entries, or 1950 “wasted” bytes on the average microcontroller (assuming two-byte function pointers). Most compilers would deem this too high a penalty to pay, and would eschew the jump table for an if-else sequence. However, if you have EPROM to burn, it actually costs nothing to implement this as a jump table, but buys you consistent (and fast) execution time. By coding this as a jump table, you ensure that the compiler does what you want.
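A sparse table of this kind is easy to write with C99 designated initializers, which leave every unnamed slot as a null pointer. The sketch below is C only (not C++), uses three made-up handlers at arbitrary indices rather than 26, and adds a small helper that reports how many bytes the null entries occupy on the machine it is compiled for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INDEX 1000u

typedef int (*handler_t)(void);

static int on_start(void) { return 1; }
static int on_stop(void)  { return 2; }
static int on_reset(void) { return 3; }

/* Sparse table: entries not named here are implicitly NULL. */
static handler_t const sparse_table[MAX_INDEX + 1u] =
{
    [0]    = on_start,
    [417]  = on_stop,
    [1000] = on_reset,
};

/* Returns the handler's result, or -1 if no handler is installed. */
int sparse_dispatch(uint16_t index)
{
    if (index > MAX_INDEX || sparse_table[index] == NULL)
    {
        return -1;
    }
    return sparse_table[index]();
}

/* Bytes occupied by unused (NULL) entries, given n_used handlers. */
size_t wasted_bytes(size_t n_used)
{
    size_t n_entries = sizeof(sparse_table) / sizeof(sparse_table[0]);

    assert(n_used <= n_entries);
    return (n_entries - n_used) * sizeof(handler_t);
}
```

Every lookup costs one bounds test, one null test, and one indirect call, regardless of which index is requested; that is the consistent execution time being bought with the unused entries.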

There is a further problem with large switch statements. Once a switch statement gets much beyond a screen length, it becomes harder to see the big picture, and thus the code is more difficult to maintain. A function pointer array declaration, adequately commented to explain the declaration, is much more compact, allowing one to see the overall picture. Furthermore, the function pointer array is potentially more robust. Who has not written a large switch statement and forgotten to add a break statement on one of the cases?

Complexities

Complexity associated with jump table declaration and use is the real reason they are not used more often. In embedded systems, where pointers normally have mandatory memory space qualifiers, the declarations can quickly become horrific. For instance, the example above would be highly undesirable on most embedded systems, since the pf[] array would probably end up being stored in RAM, instead of ROM. The way to ensure the array is stored in ROM varies somewhat between compiler vendors. However, a first step that is portable to all systems is to add const qualifiers to the declaration. Thus, our array declaration now becomes:

static void (* const pf[])(void) = {fna, fnb, fnc, …, fnz};

Like many users, I find these declarations cryptic and very daunting. However, over the years, I have built up a library of declaration templates that I simply refer to as necessary. A compilation of useful templates appears below.

A handy trick is to learn to read complex declarations like this backwards, i.e., from right to left. Doing this, here’s how I’d read the above: pf is an array of constant pointers to functions that return void. The static keyword is only needed if the array is declared privately within the function that uses it, which keeps it off the stack.
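Another way to tame these declarations, offered here as a suggestion rather than something the templates rely on, is to name the function pointer type once with a typedef. The table declaration then reads like any other const array. The op_fn type and the arithmetic handlers below are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: name the function pointer type once. */
typedef int (*op_fn)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Step 2: the table is now an ordinary const array of op_fn. This is
 * equivalent to: static int (* const ops[])(int, int) = { ... }; */
static op_fn const ops[] = { add, sub, mul };

#define N_OPS (sizeof(ops) / sizeof(ops[0]))

int apply(size_t op, int a, int b)
{
    assert(op < N_OPS);
    return ops[op](a, b);
}
```

Note where the const goes: after the type name, so that it qualifies the pointers stored in the array rather than anything they point to.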

Arrays of function pointers

Most books about C programming cover function pointers in less than a page (while devoting entire chapters to simple looping constructs). The descriptions typically say something to the effect that you can take the address of a function, and thus one can define a pointer to a function, and the syntax looks like such and such. At which point, most readers are left staring at a complex declaration, and wondering what exactly function pointers are good for. Small wonder that function pointers do not feature heavily in their work.

Well then, where are jump tables useful? In general, arrays of function pointers are useful whenever there is the potential for a range of inputs to a program that subsequently alters program flow. Some typical examples from the embedded world are given below.

Keypads

The most often cited use of function pointers is keypad handling. The general idea is obvious. A keypad is normally arranged to produce a unique keycode. Based on the value of the key pressed, some action is taken. This can be handled via a switch statement. However, an array of function pointers is far more elegant. This is particularly true when the application has multiple user screens, with some key definitions changing from screen to screen (i.e., the system uses soft keys). In this case, a two dimensional array of function pointers is often used.

#define N_SCREENS  16
#define N_KEYS     6

/* Prototypes for functions that appear in the jump table */
INT fnUp(void);
INT fnDown(void);
…
INT fnMenu(void);
INT fnNull(void);

INT keypress(uint8_t key, uint8_t screen)
{
    static INT (* const pf[N_SCREENS][N_KEYS])(void) =
    {
        {fnUp, fnDown, fnNull, …, fnMenu},
        {fnMenu, fnNull, …, fnHome},
        …
        {fnF0, fnF1, …, fnF5}
    };

    assert(key < N_KEYS);
    assert(screen < N_SCREENS);

    /* Call the function and return result */
    return (*pf[screen][key])();
}

/* Dummy function - used as an array filler */
INT fnNull(void)
{
    return 0;
}

There are several points to note about the above example:

  • All functions to be named in a jump table should be prototyped. Prototyping is the best line of defense against including a function that expects the wrong parameters, or returns the wrong type.
  • As in earlier examples, the function table is declared within the function that makes use of it (and, thus, static)
  • The array is made const signifying that we wish it to remain unchanged
  • The indices into the array are unsigned, such that only single sided bounds checking need be done
  • In this case, I have chosen to use the assert() macro to provide the bounds checking. This is a good compromise between debugging ease and runtime efficiency.
  • A dummy function fnNull() has been declared. This is used where a keypress is undefined. Rather than explicitly testing to see whether a key is valid, the dummy function is invoked. This is usually the most efficient method of handling a function array that is only partially populated.
  • The functions that are called need not be unique. For example, a function such as fnMenu may appear many times in the same jump table.

Communication protocols

Although the keypad example is easy to appreciate, my experience in embedded systems is that communication links occur far more often than keypads. Communication protocols are a challenge ripe for a branch table solution. This is best illustrated by an example.

Last year, I worked on the design for an interface box to a very large industrial power supply. This interface box had to accept commands and return parameter values over an RS-232 link. The communications used a set of simple ASCII mnemonics to specify the action to be taken. The mnemonics consisted of a channel number (0, 1, or 2), followed by a two character parameter. The code to handle a read request coming in over the serial link is shown below. The function process_read() is called with a pointer to a string fragment that is expected to consist of the three characters (null terminated) containing the required command.

#include <string.h>     /* strstr() */

const CHAR *fna(void);	// Example function prototype

static void process_read(const CHAR *buf)
{
    const CHAR *cmdptr;
    UCHAR offset;
    const CHAR *replyptr;

    /*
     * Adjacent string literals are concatenated by the compiler.
     * Every entry is three characters plus a space (a stride of 4).
     */
    static const CHAR read_str[] =
        "0SV 0SN 0MO 0WF 0MT 0MP 0SW 1SP 1VO 1CC 1CA 1CB "
        "1ST 1MF 1CL 1SZ 1SS 1AZ 1AS 1BZ 1BS 1VZ 1VS 1MZ "
        "1MS 2SP 2VO 2CC 2CA 2CB 2ST 2MF 2CL 2SZ 2SS "
        "2AZ 2AS 2BZ 2BS 2VZ 2VS 2MZ 2MS ";

    static const CHAR *
        (* const readfns[sizeof(read_str)/4])(void) =
        {
            fna,fnb,fnc, …
        };

    cmdptr = strstr(read_str, buf);

    if (cmdptr != NULL)
    {
        /*
         * cmdptr points to the valid command, so compute offset,
         * in order to get entry into function jump table
         */
        offset = (UCHAR)((cmdptr - read_str) / 4);

        /* Call function and get pointer to reply */
        replyptr = (*readfns[offset])();

        /* rest of the code goes here */
    }
}

The code above is quite straightforward. A constant string, read_str, is defined. The read_str contains the list of all legal mnemonic combinations. Note the use of added spaces to aid clarity. Next, we have the array of function pointers, one pointer for each valid command. We determine if we have a valid command sequence by making use of the standard library function strstr(). If a match is found, it returns a pointer to the matching substring, else it returns NULL. We check for a valid pointer, compute the offset into the string, and then use the offset to call the appropriate handler function in the jump table. Thus, in four lines of code, we have determined if the command is valid and called the appropriate function. Although the declaration of readfns[] is complex, the simplicity of the runtime code is tough to beat.
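A cut-down, compilable version of the same lookup is sketched below with three mnemonics and made-up reply strings. It also adds two checks that the listing above leaves to the caller: the command must be exactly three characters long, and the match must start on an entry boundary, so that a fragment such as "SV " cannot select a handler. These checks are my additions, not part of the original project code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *read_sv(void) { return "SV=1"; }
static const char *read_sn(void) { return "SN=2"; }
static const char *read_mo(void) { return "MO=3"; }

/* Returns the handler's reply, or NULL if cmd is not a known mnemonic. */
const char *lookup_read(const char *cmd)
{
    /* Each entry is three characters plus a space: a stride of four. */
    static const char read_str[] = "0SV 0SN 0MO ";
    static const char *(* const readfns[sizeof(read_str) / 4])(void) =
    {
        read_sv, read_sn, read_mo
    };
    const char *match;
    size_t pos;

    assert(cmd != NULL);
    if (strlen(cmd) != 3)
    {
        return NULL;            /* reject fragments and the empty string */
    }
    match = strstr(read_str, cmd);
    if (match == NULL)
    {
        return NULL;
    }
    pos = (size_t)(match - read_str);
    if (pos % 4 != 0)
    {
        return NULL;            /* matched across an entry boundary */
    }
    return readfns[pos / 4]();
}
```

Because sizeof(read_str) counts the terminating null, the table size works out to 13 / 4 = 3 entries here, just as 173 / 4 gives 43 entries for the 43 mnemonics in the full listing.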

Timed task list

A third area where function pointers are useful is in timed task lists. In this case, the input to the system is the passage of time. Many projects cannot justify the use of an RTOS. Instead, all that is required is that a number of tasks run at predetermined intervals. This is very simply handled as shown below.

typedef struct
{
   UCHAR interval;      /* How often to call the task */
   void (*proc)(void);	/* pointer to function returning void */

} TIMED_TASK;

static const TIMED_TASK timed_task[] =
{
    { INTERVAL_16_MSEC,  fnA },
    { INTERVAL_50_MSEC,  fnB },
    { INTERVAL_500_MSEC, fnC },
    …
    { 0, NULL }
};

extern volatile UCHAR tick;

void main(void)
{
    const TIMED_TASK *ptr;
    UCHAR time;

    /* Initialization code goes here. Then enter the main loop */

    while (1)
    {
        if (tick)
        {
            /* Check timed task list */
            tick--;
            time = computeElapsedTime(tick);
            for (ptr = timed_task; ptr->interval != 0; ptr++)
            {
                if (!(time % ptr->interval))
                {
                    /* Time to call the function */
                    (ptr->proc)();
                }
            }
        }
    }
}

In this case, we define our own data type (TIMED_TASK) that consists simply of an interval and a pointer to a function. We then define an array of TIMED_TASK, and initialize it with the list of functions that are to be called and their calling interval. In main(), we have the start up code which must enable a periodic timer interrupt that increments the volatile variable tick at a fixed interval. We then enter the infinite loop.

The infinite loop checks for a non-zero tick value, decrements the tick variable and computes the elapsed time since the program started running. The code then simply steps through each of the tasks, to see whether it is time for that one to be executed and, if so, calls it via the function pointer.

If your application only consists of two or three tasks, then this approach is probably overkill. However, if your project has a large number of timed tasks, or it is likely that you will have to add tasks in the future, then this approach is rather palatable. Note that adding tasks and/or changing intervals simply requires editing of the timed_task[] array. No code, per se, has to be changed.
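The scheduling logic can be checked on a host by replacing the timer interrupt with a loop that counts ticks. The sketch below is a simplified stand-in for the listing above: the two tasks, their intervals (in ticks rather than milliseconds), and the call counters are all invented for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t interval;       /* call the task every 'interval' ticks */
    void (*proc)(void);     /* task function */
} timed_task_t;

static unsigned calls_a, calls_b;   /* call counters (demo only) */

static void task_a(void) { calls_a++; }
static void task_b(void) { calls_b++; }

static const timed_task_t tasks[] =
{
    { 2, task_a },
    { 5, task_b },
    { 0, NULL }             /* sentinel: interval 0 ends the list */
};

/* Run every task whose interval divides the elapsed tick count. */
void run_due_tasks(uint32_t elapsed_ticks)
{
    const timed_task_t *ptr;

    for (ptr = tasks; ptr->interval != 0; ptr++)
    {
        assert(ptr->proc != NULL);
        if (elapsed_ticks % ptr->interval == 0)
        {
            ptr->proc();
        }
    }
}

/* Simulate n timer ticks, numbered 1..n. */
void simulate_ticks(uint32_t n)
{
    for (uint32_t t = 1; t <= n; t++)
    {
        run_due_tasks(t);
    }
}

unsigned task_a_calls(void) { return calls_a; }
unsigned task_b_calls(void) { return calls_b; }
```

Over ten ticks the every-2 task runs five times and the every-5 task runs twice. Adding a task is a one-line change to the table.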

Interrupt vector tables

The fourth application of function jump tables is the array of interrupt vectors. On most processors, the interrupt vectors are in contiguous locations, with each vector representing a pointer to an interrupt service routine function. Depending upon the compiler, the work may be done for you implicitly, or you may be forced to generate the function table. In the latter case, implementing the vectors via a switch statement will not work!

Here is the vector table from the industrial power supply project mentioned above. This project was implemented using a Whitesmiths’ compiler and a 68HC11 microcontroller.

IMPORT VOID _stext();  /* 68HC11-specific startup routine */

static VOID (* const _vectab[])() =
{
    SCI_Interrupt,	/* SCI              */
    badSPI_Interrupt,	/* SPI              */
    badPAI_Interrupt,	/* Pulse acc input  */
    badPAO_Interrupt, 	/* Pulse acc overf  */
    badTO_Interrupt,	/* Timer overf      */
    badOC5_Interrupt,	/* Output compare 5 */
    badOC4_Interrupt,	/* Output compare 4 */
    badOC3_Interrupt, 	/* Output compare 3 */
    badOC2_Interrupt,	/* Output compare 2 */
    badOC1_Interrupt,	/* Output compare 1 */
    badIC3_Interrupt,	/* Input capture 3  */
    badIC2_Interrupt,	/* Input capture 2  */
    badIC1_Interrupt,	/* Input capture 1  */
    RTI_Interrupt,	/* Real time        */
    Uart_Interrupt,	/* IRQ              */
    PFI_Interrupt,	/* XIRQ             */
    badSWI_Interrupt,	/* SWI              */
    IlOpC_Interrupt,	/* illegal          */
    _stext,		/* cop fail         */
    _stext,		/* cop clock fail   */
    _stext,		/* RESET            */
};

A couple of points are worth making:

  • The above is insufficient to locate the table correctly in memory. This has to be done via linker directives.
  • Note that unused interrupts still have an entry in the table. Doing so ensures that the table is correctly aligned and traps can be placed on unexpected interrupts.
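The second point can be tried out on a host machine. The sketch below is a simulation only, with hypothetical handler names and a four-entry table; on a real target the handlers would need the compiler's interrupt keyword or pragma, and the table would still have to be located by the linker.

```c
#include <assert.h>

#define NUM_VECTORS 4u

/* Records which unexpected vector fired; NUM_VECTORS means "none yet" */
static volatile unsigned int last_bad_vector = NUM_VECTORS;
static volatile unsigned int rti_count;

/* Trap handlers for interrupts the application never enables */
static void bad0_Interrupt(void) { last_bad_vector = 0u; }
static void bad1_Interrupt(void) { last_bad_vector = 1u; }
static void bad3_Interrupt(void) { last_bad_vector = 3u; }

/* The one interrupt this hypothetical application really uses */
static void RTI_Interrupt(void)  { rti_count++; }

/* Every slot is filled, so each handler sits at its correct index */
static void (* const vectab[NUM_VECTORS])(void) =
{
    bad0_Interrupt,
    bad1_Interrupt,
    RTI_Interrupt,
    bad3_Interrupt
};

/* Stand-in for the hardware fetching a vector and jumping through it */
static void simulate_interrupt(unsigned int vector)
{
    assert(vector < NUM_VECTORS);
    vectab[vector]();
}
```

If an unexpected interrupt ever fires, last_bad_vector identifies it, which is a convenient place for a breakpoint during debugging.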

If any of these examples has whetted your appetite for using arrays of function pointers, but you are still uncomfortable with the declaration complexity, then fear not! You will find a variety of declarations, ranging from the straightforward to the downright appalling, below. The examples are all reasonably practical in the sense that the desired functionality is not outlandish (that is, there are no declarations for arrays of pointers to functions that take pointers to arrays of function pointers and so on).

Declaration and use hints

All of the examples below adhere to conventions that I have found to be useful over the years, specifically:

1. All of the examples are preceded by static. This is done on the assumption that the scope of a function table should be highly localized, ideally within an enclosing function.

2. In every example the array pf[] is also preceded with const. This declares that the pointers in the array cannot be modified after initialization. This is the normal (and safe) usage scenario.

3. There are two syntactically different ways of invoking a function via a pointer. If we have a function pointer with the declaration:

void (*fnptr)(int);	/* fnptr is a function pointer */

Then it may be invoked using either of these methods:

fnptr(3);	/* Method 1 of invoking the function */
(*fnptr)(3);	/* Method 2 of invoking the function */

The advantage of the first method is an uncluttered syntax. However, it makes it look as if fnptr is a function, as opposed to being a function pointer. Someone maintaining the code may end up searching in vain for the function fnptr(). With method 2, it is much clearer that we are dereferencing a pointer. However, when the declarations get complex, the added (*) can be a significant burden. Throughout the examples, each syntax is shown. In practice, the latter syntax seems to be more popular, and you should use only one.

4. In every example, the syntax for using a typedef is also given. It is quite permissible to use a typedef to define a complex declaration, and then use the new type like a simple type. If we stay with the example above, then an alternative declaration is:

typedef void (*PFV_I )(int);

/* Declare a PFV_I typed variable and init it */
PFV_I fnptr = fna;

/* Call fna with parameter 3 using method 1 */
fnptr(3);	

/* Call fna with parameter 3 using method 2 */
(*fnptr)(3);

The typedef declares the type PFV_I to be a pointer to a function that returns void and is passed an integer. We then simply declare fnptr to be a variable of this type, and use it. Typedefs are very good when you regularly use a certain function pointer type, since they save you from having to remember and type in the declaration. The downside of using a typedef is that it is not obvious that the variable that has been declared is a pointer to a function. Thus, just as for the two invocation methods above, you can gain syntactical simplicity by hiding the underlying functionality.

In the typedefs, a consistent naming convention is used. Every type starts with PF (Pointer to Function) and is then followed with the return type, followed by an underscore, the first parameter type, underscore, second parameter type and so on. For void, boolean, char, int, long, float and double, the characters V, B, C, I, L, S, D are used. (Note the use of S(ingle) for float, to avoid confusion with F(unction)). For a pointer to a data type, the type is preceded with P. Thus PL is a pointer to a long. If a parameter is const, then a c appears in the appropriate place. Thus, cPL is a const pointer to a long, whereas a PcL is a pointer to a const long, and cPcL is a const pointer to a const long. For volatile qualifiers, v is used. For unsigned types, a u precedes the base type. For user defined data types, you are on your own!

An extreme example: PFcPcI_uI_PvuC. This is a pointer to a function that returns a const pointer to a const Integer that is passed an unsigned integer and a pointer to a volatile unsigned char.
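Spelled out in C, that type name corresponds to the declaration below. This is a sketch for illustration: the function pick() and the table lookup[] are hypothetical, and plain int and unsigned char are used in place of the sized types.

```c
#include <assert.h>

/* Pointer to Function returning a const Pointer to a const Integer,
 * passed an unsigned Integer and a Pointer to a volatile unsigned Char */
typedef const int * const (*PFcPcI_uI_PvuC)(unsigned int, volatile unsigned char *);

static const int lookup[4] = { 10, 20, 30, 40 };

/* Hypothetical function whose signature matches the typedef.
 * Note: some compilers warn that the top-level const on a returned
 * value has no effect; it is kept here to mirror the naming scheme. */
static const int * const pick(unsigned int index, volatile unsigned char *status)
{
    assert(index < 4u);
    *status = 1u;               /* flag that a lookup took place */
    return &lookup[index];
}
```

A variable of this type is then declared and used like any simple type: PFcPcI_uI_PvuC fp = pick; followed by a call such as fp(2u, &status).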

Function pointer templates

The first eleven examples are generic in the sense that they do not use memory space qualifiers and hence may be used on any target. Example 12 shows how to add memory space qualifiers, such that all the components of the declaration end up in the correct memory spaces.

Example 1

pf[] is a static array of pointers to functions that take an INT as an argument and return void.

void fna(INT);	// Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_I)(INT);
static PFV_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](a);	// Calling method 1
(*pf[jump_index])(a);	// Calling method 2

Example 2

pf [] is a static array of pointers to functions that take a pointer to an INT as an argument and return void.

void fna(INT *);	// Example prototype of a function to be called

// Declaration using typedef
typedef void (* const PFV_PI)(INT *);
static PFV_PI pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static void (* const pf[])(INT *) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
pf[jump_index](&a);	// Calling method 1
(*pf[jump_index])(&a);	// Calling method 2

Example 3

pf [] is a static array of pointers to functions that take an INT as an argument and return a CHAR.

CHAR fna(INT); 	// Example prototype of a function to be called

// Declaration using typedef
typedef CHAR (* const PFC_I)(INT);
static PFC_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static CHAR (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR res;
res = pf[jump_index](a);	// Calling method 1
res = (*pf[jump_index])(a);	// Calling method 2

Example 4

pf [] is a static array of pointers to functions that take an INT as an argument and return a pointer to a CHAR.

CHAR *fna(INT);	// Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * (* const PFPC_I)(INT);
static PFPC_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * res;
res = pf[jump_index](a); 	// Calling method 1
res = (*pf[jump_index])(a);	// Calling method 2

Example 5

pf [] is a static array of pointers to functions that take an INT as an argument and return a pointer to a const CHAR (i.e. the pointer may be modified, but what it points to may not).

const CHAR *fna(INT); 	// Example prototype of a function to be called

// Declaration using typedef
typedef const CHAR * (* const PFPcC_I)(INT);
static PFPcC_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static const CHAR * (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR * res;
res = pf[jump_index](a);		//Calling method 1
res = (*pf[jump_index])(a);	//Calling method 2

Example 6

pf [] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a CHAR (i.e. the pointer may not be modified, but what it points to may be modified).

CHAR * const fna(INT i);  // Example prototype of a function to be called

// Declaration using typedef
typedef CHAR * const (* const PFcPC_I)(INT);
static PFcPC_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
CHAR * const res = pf[jump_index](a);	//Calling method 1
CHAR * const res = (*pf[jump_index])(a);	//Calling method 2

Example 7

pf [] is a static array of pointers to functions that take an INT as an argument and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(INT i);  // Example function prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_I)(INT);
static PFcPcC_I pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
const CHAR* const res = pf[jump_index](a); 	// Calling method 1
const CHAR* const res = (*pf[jump_index])(a); 	// Calling method 2

Example 8

pf [] is a static array of pointers to functions that take a pointer to a const INT as an argument (i.e. the pointer may be modified, but what it points to may not) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(const INT *i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_PcI)(const INT *);
static PFcPcC_PcI pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT *) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *aptr;
aptr = &a;
const CHAR* const res = pf[jump_index](aptr);	//Calling method 1
const CHAR* const res = (*pf[jump_index])(aptr);//Calling method 2

Example 9

pf [] is a static array of pointers to functions that take a const pointer to an INT as an argument (i.e. the pointer may not be modified, but what it points to may) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(INT *const i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPI)(INT * const);
static PFcPcC_cPI pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(INT * const) = {fna, fnb, fnc, … fnz};

// Example use
INT a = 6;
INT *const aptr = &a;
const CHAR* const res = pf[jump_index](aptr);		//Method 1
const CHAR* const res = (*pf[jump_index])(aptr);		//Method 2

Example 10

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a const CHAR (i.e. neither the pointer nor what it points to may be modified).

const CHAR * const fna(const INT *const i);	// Example prototype

// Declaration using typedef
typedef const CHAR * const (* const PFcPcC_cPcI)(const INT * const);
static PFcPcC_cPcI pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static const CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT *const aptr = &a;

const CHAR* const res = pf[jump_index](aptr);		// Method 1
const CHAR* const res = (*pf[jump_index])(aptr);		// Method 2

This example manages to combine five incidences of const and one of static into a single declaration. For all of its complexity, however, this is not an artificial example. You could go ahead and remove all the const and static declarations and the code would still work. It would, however, be a lot less safe, and potentially less efficient.
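Because the listings above elide the function list, here is a small variant of Example 10 that can be compiled as it stands. It is a sketch under assumptions: INT and CHAR are plain stand-in typedefs, and the functions sign_name() and parity_name() and the wrapper describe() are hypothetical.

```c
#include <assert.h>

typedef int  INT;     /* assumption: stand-ins for the article's sized types */
typedef char CHAR;

/* Two hypothetical functions with the Example 10 signature */
static const CHAR * const sign_name(const INT * const i)
{
    return (*i < 0) ? "negative" : "zero-or-positive";
}

static const CHAR * const parity_name(const INT * const i)
{
    return (*i % 2 == 0) ? "even" : "odd";
}

/* Declaration using typedef, as in Example 10 */
typedef const CHAR * const (* const PFcPcC_cPcI)(const INT * const);
static PFcPcC_cPcI pf[] = { sign_name, parity_name };

/* Look up a description of 'value' through the jump table */
static const CHAR *describe(unsigned int jump_index, INT value)
{
    const INT * const aptr = &value;

    assert(jump_index < sizeof(pf) / sizeof(pf[0]));
    return (*pf[jump_index])(aptr);     /* Calling method 2 */
}
```

With this table, describe(0u, -3) yields the string "negative" and describe(1u, 6) yields "even". Note the range check on jump_index before the table is indexed.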

Just to break up the monotony, here is the same declaration, but with a twist.

Example 11

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly).

volatile CHAR * const fna(const INT *const i);	// Example prototype

// Declaration using typedef
typedef volatile CHAR * const (* const PFcPvC_cPcI)(const INT * const);
static PFcPvC_cPcI pf[] = {fna,fnb,fnc, … fnz};

// Direct declaration
static volatile CHAR * const (* const pf[])(const INT * const) = {fna, fnb, fnc, … fnz};

// Example use
const INT a = 6;
const INT * const aptr = &a;

volatile CHAR * const res = pf[jump_index](aptr);	// Method 1
volatile CHAR * const res = (*pf[jump_index])(aptr);	//Method 2

while (*res)
	;	//Wait for volatile register to clear

With memory space qualifiers, things can get even more hairy. For most vendors, the memory space qualifier is treated syntactically as a type qualifier (such as const or volatile) and thus follows the same placement rules. For consistency, I place type qualifiers to the left of the “thing” being qualified. Where there are multiple type qualifiers, alphabetic ordering is used. Since memory space qualifiers are typically compiler extensions, they are normally preceded by an underscore, and hence come first alphabetically. Thus, a nasty declaration may look like this:

_ram const volatile UCHAR status_register;

To demonstrate memory space qualifier use, here is example 11 again, except this time memory space qualifiers have been added. The qualifiers are named _m1 … _m5.

Example 12

pf [] is a static array of pointers to functions that take a const pointer to a const INT as an argument (i.e. neither the pointer nor what it points to may be modified) and return a const pointer to a volatile CHAR (i.e. the pointer may not be modified, but what it points to may change unexpectedly). Each element of the declaration lies in a different memory space. In this particular case, it is assumed that you can even declare the memory space in which parameters passed by value appear. This is extreme, but is justified on pedagogical grounds.

/* An example prototype. This declaration reads as follows.
 * Function fna is passed a const pointer in _m5 space that points to a
 * const integer in _m4 space. It returns a const pointer in _m2 space to
 * a volatile character in _m1 space.
 */
_m1 volatile CHAR * _m2 const fna(_m4 const INT * _m5 const i);

/* Declaration using typedef. This declaration reads as follows.
 * PFcPvC_cPcI is a pointer to function data type, variables based
 * upon which lie in _m3 space. Each Function is passed a const
 * pointer in _m5 space that points to a const integer in _m4 space.
 * It returns a const pointer in _m2 space to a volatile character
 * in _m1 space.
 */
typedef _m1 volatile CHAR * _m2 const (* _m3 const PFcPvC_cPcI) (_m4 const INT * _m5 const);

static PFcPvC_cPcI pf[] = {fna,fnb,fnc, … fnz};

/* Direct declaration. This declaration reads as follows. pf[] is
 * a statically allocated constant array in _m3 space of pointers to functions.
 * Each Function is passed a const pointer in _m5 space that points to
 * a const integer in _m4 space. It returns a const pointer in _m2 space
 * to a volatile character in _m1 space.
 */
static _m1 volatile CHAR * _m2 const (* _m3 const pf[]) (_m4 const INT * _m5 const) = {fna, fnb, fnc, … fnz};

// Declare a const variable that lies in _m4 space
_m4 const INT a = 6;

// Now declare a const pointer in _m5 space that points to a const
// variable that is in _m4 space
_m4 const INT * _m5 const aptr = &a;

// Make the function call, and get back the pointer
volatile CHAR * const  res = pf[jump_index](aptr); 	//Method 1
volatile CHAR * const  res = (*pf[jump_index])(aptr); 	//Method 2

while (*res)
	;	// Wait for volatile register to clear

Acknowledgments

My thanks to Mike Stevens, not only for reading over this manuscript and making some excellent suggestions, but also for showing me over the years more ways to use function pointers than I ever dreamed were possible.


This article was published in the May 1999 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Arrays of Pointers to Functions” Embedded Systems Programming, May 1999.

In Praise of the C Preprocessor’s #error Directive

Thursday, December 17th, 2009 Nigel Jones

Also available in PDF version.

One of the least used but potentially most useful C preprocessor directives is #error. Here’s a look at a couple of clever uses for #error that have proven invaluable in embedded software development.

#error is an ANSI-specified feature of the C preprocessor (cpp). Its syntax is very straightforward:

#error <writer supplied error message>

The <writer supplied error message> can consist of any printable text. You don’t even have to enclose the text in quotes. (Technically, the message is optional, though it rarely makes sense to omit it.)

When the C preprocessor encounters a #error statement, it causes compilation to terminate and the writer-supplied error message to be printed to stderr. A typical error message from a C compiler looks like this:

Filename(line_number): Error!
Ennnn: <writer supplied error message>

where Filename is the source file name, line_number is the line number where the #error statement is located, and Ennnn is a compiler-specific error number. Thus, the #error message is basically indistinguishable from ordinary compiler error messages.

“Wait a minute,” you might say. “I spend enough time trying to get code to compile and now he wants me to do something that causes more compiler errors?” Absolutely! The essential point is that code that compiles but is incorrect is worse than useless. I’ve found three general areas in which this problem can arise and #error can help. Read on and see if you agree with me.

Incomplete code

I tend to code using a step-wise refinement approach, so it isn’t unusual during development for me to have functions that do nothing, for loops that lack a body, and so forth. Consequently, I often have files that are compilable but lack some essential functionality. Working this way is fine, until I’m pulled off to work on something else (an occupational hazard of being in the consulting business). Because these distractions can occasionally run into weeks, I sometimes return to the job with my memory a little hazy about what I haven’t completed. In the worst-case scenario (which has occurred), I perform a make, which runs happily, and then I attempt to use the code. The program, of course, crashes and burns, and I’m left wondering where to start.

In the past, I’d comment the file to note what had been done and what was still needed. However, I found this approach to be rather weak because I then had to read all my comments (and I comment heavily) in order to find what I was looking for. Now I simply enter something like the following in an appropriate place in the file:

#error *** Nigel - Function incomplete. Fix before using ***

Thus, if I forget that I haven’t done the necessary work, an inadvertent attempt to use the file will result in just about the most meaningful compiler message I’ll ever receive. Furthermore, it saves me from having to wade through pages of comments, trying to find what work I haven’t finished.

Compiler-dependent code

As much as I strive to write portable code, I often find myself having to trade off performance for portability – and in the embedded world, performance tends to win. However, what happens if a few years later I reuse some code without remembering that the code has compiler-specific peculiarities? The result is a much longer debug session than is necessary. But a judicious #error statement can prevent a lot of grief. A couple of examples may help.

Example 1

Some floating-point code requires at least 12 digits of resolution to return the correct results. Accordingly, the various variables are defined as type long double. But ISO C only requires that a long double have 10 digits of resolution. Thus on certain machines, a long double may be inadequate to do the job. To protect against this, I would include the following:

#include <float.h>
#if (LDBL_DIG < 12)
	#error *** long doubles need 12 digit resolution. Do not use this compiler! ***
#endif

This approach works by examining the value of an ANSI-mandated constant found in float.h.

Example 2

An amazing amount of code makes invalid assumptions about the underlying size of the various integer types. If you have code that has to use an int (as opposed to a user-specified data type such as int16), and the code assumes that an int is 16 bits, you can do the following:

#include <limits.h>
#if (INT_MAX != 32767)
	#error *** This file only works with 16-bit int. Do not use this compiler! ***
#endif

Again, this works by checking the value of an ANSI-mandated constant. This time the constant is found in the file limits.h. This approach is a lot more useful than putting these limitations inside a big comment that someone may or may not read. After all, you have to read the compiler error messages.
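The same pattern extends to any assumption that limits.h can express. As a further sketch (not taken from the original projects), the hypothetical function rotate_left() below is only correct when a char is exactly eight bits wide, so the file refuses to compile otherwise. CHAR_BIT is the standard limits.h constant giving the number of bits in a char.

```c
#include <assert.h>
#include <limits.h>

#if (CHAR_BIT != 8)
    #error *** This file assumes 8-bit chars. Do not use this compiler! ***
#endif

/* Rotate a byte left by n bit positions (0 <= n < 8).
 * Both the constant 8 and the truncating cast rely on CHAR_BIT == 8. */
static unsigned char rotate_left(unsigned char b, unsigned char n)
{
    assert(n < 8u);
    return (unsigned char)((b << n) | (b >> (8 - n)));
}
```

On a conforming 8-bit-char compiler this builds silently, and rotate_left(0x81, 1) returns 0x03.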

Conditionally-compiled code

Since conditionally compiled code seems to be a necessary evil in embedded programming, it’s common to find code sequences such as the following:

#if defined OPT_1
	/* Do option_1 */
#else
	/* Do option_2 */
#endif

As it is written, this code means the following: if and only if OPT_1 is defined, we will do option_1; otherwise we’ll do option_2. The problem with this code is that a user of the code doesn’t know (without explicitly examining the code) that OPT_1 is a valid compiler switch. Instead, the naïve user will simply compile the code without defining OPT_1 and get the alternate implementation, irrespective of whether that is what’s required or not. A more considerate coder might be aware of this problem, and instead do the following:

#if defined OPT_1
	/* Do option 1 */
#elif defined OPT_2
	/* Do option 2*/
#endif

In this case, failure to define either OPT_1 or OPT_2 will typically result in an obscure compiler error at a point later in the code. The user of this code will then be stuck with trying to work out what must be done to get the module to compile. This is where #error comes in. Consider the following code sequence:

#if defined OPT_1
	/* Do option_1 */
#elif defined OPT_2
	/* Do option_2 */
#else
	#error *** You must define one of OPT_1 or OPT_2 ***
#endif

Now the compilation fails, but at least it tells the user explicitly what to do to make the module compile. I know that if this procedure had been adopted universally, I would have saved a lot of time over the years trying to reuse other people’s code.
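As a concrete sketch of this pattern, suppose the two options select the length of an averaging filter. The macro FILTER_LENGTH and the function average() are hypothetical, and OPT_1 is defined in the file here only so that the fragment is self-contained; normally it would come from the compiler command line or a project header.

```c
#include <assert.h>

#define OPT_1   /* assumption: normally supplied by the build, not the file */

#if defined OPT_1
    #define FILTER_LENGTH 4u    /* Option 1: short, fast filter */
#elif defined OPT_2
    #define FILTER_LENGTH 8u    /* Option 2: longer, smoother filter */
#else
    #error *** You must define one of OPT_1 or OPT_2 ***
#endif

/* Average the first FILTER_LENGTH samples of the array */
static unsigned int average(const unsigned int *samples)
{
    unsigned long sum = 0;
    unsigned int j;

    assert(samples != 0);

    for (j = 0; j < FILTER_LENGTH; j++)
    {
        sum += samples[j];
    }
    return (unsigned int)(sum / FILTER_LENGTH);
}
```

Remove the #define OPT_1 line without defining OPT_2 and the build stops at the #error line with an explicit instruction, rather than at some later use of FILTER_LENGTH.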

So there you have it. Now tell me, don’t you agree that #error is a really useful part of the preprocessor, worthy of your frequent use, and occasional praise?


This article was published in the September 1999 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “In Praise of the #error Directive” Embedded Systems Programming, September 1999.

Efficient C Code for 8-bit Microcontrollers

Thursday, December 17th, 2009 Nigel Jones

Also available in PDF version.

The 8051, 68HC11, and Microchip PIC are popular microcontrollers, but they aren’t necessarily easy to program. This article shows how the use of ANSI C and compiler-specific constructs can help generate tighter code.

Getting the best possible performance out of the C compiler for an 8-bit microcontroller isn’t always easy. This article concentrates mainly on those microcontrollers that were never designed to support high-level languages, such as members of the 8051, 6800 (including the 68HC11), and Microchip PIC families of microcontrollers. Newer 8-bit machines such as the Philips 8051XA and the Atmel ATmega series were designed explicitly to support high-level languages and, as such, may not need all the techniques I describe here.

My emphasis is not on algorithm design, nor does it depend on a specific microcontroller or compiler. Rather, I describe general techniques that are widely applicable. In many cases, these techniques work on larger machines, although you may then decide that the trade-offs involved aren’t worthwhile.

Before jumping into the meat of the article, let’s briefly digress with a discussion of the philosophy involved. The microcontrollers I’ve named are popular for reasons of size, price, power consumption, peripheral mix, and so on. Notice that “ease of programming” is conspicuously missing from this list. Traditionally, these microcontrollers have been programmed in assembly language. In the last few years, many vendors have recognized the desire of users to increase their productivity, and have introduced C compilers for these machines—many of which are extremely good. However, it’s important to remember that no matter how good the compiler, the underlying hardware has severe limitations. Thus, to write efficient C for these targets, it’s essential that we be aware of what the compiler can do easily and what requires compiler heroics. In presenting these techniques, I have taken the attitude that I wish to solve a problem by programming a microcontroller, and that the C compiler is a tool, no different from an oscilloscope. In other words, C is a means to an end, and not an end in itself. As a result, many of my comments will seem heretical to the high-level language purists out there.

ANSI C

The first step to writing a realistic C program for an 8-bit computer is to dispense with the concept of writing 100% ANSI code. This concession is necessary because I don’t believe it’s possible, or even desirable, to write 100% ANSI code for any embedded system, particularly for 8-bit systems.

Some characteristics of 8-bit systems that prevent ANSI compliance are:

  • Embedded software interacts with hardware, yet ANSI C provides extremely crude tools for addressing registers at fixed memory locations
  • All nontrivial systems use interrupts, yet ANSI C doesn’t have a standard way of coding interrupt service routines
  • ANSI C has various type promotion rules that are absolute performance killers on an 8-bit computer
  • Many older microcontrollers feature multiple memory banks, which have to be hardware swapped in order to correctly address the desired variable
  • Many microcontrollers have no hardware support for C’s stack (i.e., they lack a stack pointer)

This is not to say that I advocate junking the entire ANSI C standard. I take the view that one should use standard C as much as possible. However, when it interferes with solving the problem at hand, do not hesitate to bypass it. Does this interfere with making code portable and reusable? Absolutely. But portable, reusable code that doesn’t get the job done isn’t much use.

I’ve also noticed that every compiler has a switch that strictly enforces ANSI C and disables all compiler extensions. I suspect that this is done purely so that a vendor can claim ANSI compliance, even though this feature is practically useless. I have also observed that vendors who strongly emphasize their ANSI compliance often produce inferior code (perhaps because the compiler has a generic front end that is shared among multiple targets) when compared to vendors that emphasize their performance and language extensions.

Enough about the ANSI standard. Let’s now discuss specific actions that can be taken to make your code run efficiently on an 8-bit microcontroller. The most important, by far, is the choice of data types.

Data types

Knowledge of the size of the underlying data types, together with careful data type selection, is essential for writing efficient code on eight-bit machines. Furthermore, understanding how the compiler handles expressions involving your data types can make a considerable difference in your coding decisions. These topics are discussed in the following paragraphs.

Data type size

In the embedded world, knowing the underlying representation of the various data types is usually essential. I have seen many discussions on this topic, none of which has been particularly satisfactory or portable. My preferred solution is to include a file, <types.h>, an excerpt from which appears below:

#ifndef TYPES_H
#define TYPES_H
#include <limits.h>

/* Assign a compiler-specific data type to BOOLEAN */
#ifdef _C51_
typedef bit BOOLEAN;
#define FALSE 0
#define TRUE 1
#else
typedef enum {FALSE=0, TRUE=1} BOOLEAN;
#endif

/* Assign an 8-bit signed type to CHAR */
#if (CHAR_MAX == 127)
typedef char CHAR;
#elif (CHAR_MAX == 255)
/* Implies that by default chars are unsigned */
typedef signed char CHAR;
#else
/* No eight bit data types */
#error Warning! Intrinsic data type char is not eight bits
#endif

/* Rest of the file goes here */
#endif

The concept is quite simple. The file types.h includes the ANSI-required file limits.h. It then explicitly tests each of the predefined data types for the smallest type that matches signed and unsigned 1-, 8-, 16-, and 32-bit variables. The result is that my data type UCHAR is guaranteed to be an 8-bit unsigned variable, INT is guaranteed to be a 16-bit signed variable, and so forth. In this manner, the following data types are defined: BOOLEAN, CHAR, UCHAR, INT, UINT, LONG, and ULONG.

Several points are worth making:

  • The definition of the BOOLEAN data type is difficult. Many 8-bit processors directly support single-bit data types, and I wish to take advantage of this if possible. Unfortunately, since ANSI is silent on this topic, it’s necessary to use compiler-specific code
  • Some compilers define a char as an unsigned quantity, such that if a signed 8-bit variable is required, one has to use the unusual declaration signed char
  • Note the use of the #error directive to force a compile error if I can’t achieve my goal of having unambiguous definitions of BOOLEAN, UCHAR, CHAR, UINT, INT, ULONG, and LONG

In all of the following examples, the types BOOLEAN, UCHAR, and so on will be used to specify unambiguously the size of the variable being used.
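The excerpt above stops after CHAR. Purely as an illustration of how the remaining tests might be written in the same style (this is a guess at the shape, not the contents of the actual file), UCHAR, INT, and UINT could be selected as follows. The helper check_types() is hypothetical and simply cross-checks the result at run time.

```c
#include <assert.h>
#include <limits.h>

/* Assign an 8-bit unsigned type to UCHAR */
#if (UCHAR_MAX == 255u)
typedef unsigned char UCHAR;
#else
#error *** No 8-bit unsigned data type available ***
#endif

/* Assign a 16-bit signed type to INT */
#if (INT_MAX == 32767)
typedef int INT;
#elif (SHRT_MAX == 32767)
typedef short INT;
#else
#error *** No 16-bit signed data type available ***
#endif

/* Assign a 16-bit unsigned type to UINT */
#if (UINT_MAX == 65535u)
typedef unsigned int UINT;
#elif (USHRT_MAX == 65535u)
typedef unsigned short UINT;
#else
#error *** No 16-bit unsigned data type available ***
#endif

/* Run-time cross-check of what the preprocessor tests established */
static void check_types(void)
{
    assert((UCHAR)~0u == 255u);
    assert((UINT)~0u == 65535u);
    assert(sizeof(INT) == sizeof(UINT));
}
```

On a typical 32-bit host this picks short for INT and unsigned short for UINT, whereas on an 8051 compiler with 16-bit int the first branch of each test would be taken.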

Data type selection

There are two basic guidelines for data type selection on 8-bit processors:

  • Use the smallest possible type to get the job done
  • Use an unsigned type whenever possible

The reason for this is simply that many 8-bit processors have no direct support for manipulating anything more complicated than an unsigned 8-bit value. However, unlike large machines, eight-bitters often provide direct support for manipulation of bits. Thus, the fastest integer types to use on an 8-bit CPU are BOOLEAN and UCHAR. Consider the typical C code:

int is_positive(int a)
{
    return (a >= 0) ? 1 : 0;
}

The better implementation is:

BOOLEAN is_positive(int a)
{
    return (a >= 0) ? TRUE : FALSE;
}

On an 8-bit processor we can get a large performance boost by using the BOOLEAN return type because the compiler need only return a bit (typically via the carry flag), vs. a 16-bit value stored in registers. The code is also more readable.

Let’s take a look at a second example. Consider the following code fragment that is littered throughout most C programs:

int j;

for (j = 0; j < 10; j++)
{

}

This fragment produces horribly inefficient code on an 8051. A better way to code this for 8-bit CPUs is as follows:

UCHAR j;

for (j = 0; j < 10; j++)
{

}

The result is a huge boost in performance because we are now using an 8-bit unsigned variable (that can be manipulated directly) vs. a signed 16-bit quantity that will typically be handled by a library call. Note also that there is generally no penalty for coding this way on most big CPUs (with the exception of some RISC processors). Furthermore, a strong case exists for doing this on all machines. Those of you who know Pascal are aware that when declaring an integer variable, it’s possible, and normally desirable, to specify the allowable range that the integer can take on. For example:

type loopindex = 0..9;
var j : loopindex;

Upon rereading the code later, you’ll have additional information concerning the intended use of the variable. For our classical C code above, the variable int j may take on values of at least -32768 to +32767. For the case in which we have UCHAR j, we inform others that this variable is intended to have non-negative values over a restricted range. Thus, this simple change manages to combine tighter code with improved maintainability, which is not a bad combination.

Enumerated types

The use of enumerated data types was a welcome addition to ANSI C. Unfortunately, the ANSI standard gives enumeration constants the type int, and many compilers make the underlying type of the enum itself an int as well. Thus, on many compilers, declaration of an enumerated type forces the compiler to generate 16-bit signed code, which, as I’ve mentioned, is extremely inefficient on an 8-bit CPU. This is unfortunate, especially as I have never seen an enumerated type list go over a few dozen elements; such a list would easily fit in a UCHAR. To overcome this limitation, several options exist, none of which is palatable:

  • Check your compiler documentation, which may show you how to specify via a (compiler-specific) command line switch that enumerated types be put into the smallest possible data type
  • Accept the inefficiency as an acceptable trade-off for readability
  • Dispense with enumerated types and resort to lists of manifest constants
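The third option might look like the following sketch. The COLOR type and the COLOR_* names are hypothetical, and UCHAR is typedef'd locally only to keep the fragment self-contained.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

/* Instead of: enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE }; */
typedef UCHAR COLOR;
#define COLOR_RED    ((COLOR)0)
#define COLOR_GREEN  ((COLOR)1)
#define COLOR_BLUE   ((COLOR)2)
#define NUM_COLORS   3

/* Step to the next colour, wrapping around, using only 8-bit storage. */
COLOR next_color(COLOR c)
{
    c++;
    if (c >= NUM_COLORS)
    {
        c = COLOR_RED;
    }
    return c;
}
```

The price is that a debugger will typically show the raw number rather than a symbolic name.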

Integer promotion

The integer promotion rules of ANSI C are probably the most heinous crime committed against those of us who labor in the 8-bit world. I have no doubt that the standard is quite detailed in this area. However, the two most important rules in practice are the following:

  • Any expression involving integral types smaller than an int has all of its variables automatically promoted to int
  • Any function call that passes an integral type smaller than an int automatically promotes the variable to an int, if the function is not prototyped

The key word here is automatically. Unless you take explicit steps, the compiler is unlikely to do what you want. Consider the following code fragment:

CHAR a,b,res;

res = a+b;

The compiler will promote a and b to integers, perform a 16-bit addition, and then assign the lower eight bits of the result to res. Several ways around this problem exist. First, many compiler vendors have seen the light, and allow you to disable the ANSI automatic integer promotion rules. However, you’re then stuck with compiler-dependent code.

Alternatively, you can resort to very clumsy casting, and hope that the compiler’s optimizer works out what you really want to do. The extent of the casting required seems to vary among compiler vendors. As a result, I tend to go overboard:

res = (CHAR)((CHAR)a + (CHAR)b);

With complex expressions, the result can be hideous.
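The promotion can be observed directly on any hosted compiler, because sizeof reports the type of an expression without evaluating it. The sketch below uses made-up function names and a local typedef for UCHAR. It assumes 8-bit chars.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

/* Size of the type the compiler gives to a + b when both are 8-bit. */
unsigned int size_of_char_sum(void)
{
    UCHAR a = 0, b = 0;
    return (unsigned int)sizeof(a + b);   /* sizeof(int), not 1 */
}

/* 8-bit addition with the result explicitly narrowed back to 8 bits. */
UCHAR add8(UCHAR a, UCHAR b)
{
    return (UCHAR)(a + b);   /* the addition is done as int, then truncated */
}
```

Whether the generated code really performs only an 8-bit add after such a cast depends on the optimizer, so inspecting the assembly output is still worthwhile.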

More integer promotion rules

A third integer promotion rule that is often overlooked concerns expressions that contain both signed and unsigned integers. In this case, signed integers are promoted to unsigned integers. Although this makes sense, it can present problems in our 8-bit environment, where the unsigned integer rules. For example:

void demo(void)
{
    UINT a = 6;
    INT b = -20;

    (a + b > 6) ?
        puts("More than 6") :
        puts("Less than or equal to 6");
}

If you run this program, you may be surprised to find that the output is “More than 6.” This problem is a very subtle one, and is even more difficult to detect when you use enumerated data types or other defined data types that evaluate to a signed integer data type. Using the result of a function call in an expression is also problematic.

The good news is that in the embedded world, the percentage of integral data types that must be signed is quite low, thus the potential number of expressions in which mixed types occur is also low. The time to be cautious is when reusing code that was written by someone who didn’t believe in unsigned data types.
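One way to sidestep the surprise, when the unsigned value is known to fit in the signed type, is to cast it so that the arithmetic stays signed. The sketch below uses plain unsigned int and int rather than UINT and INT to stay self-contained, and the function names are hypothetical.

```c
/* Mixed comparison: b is converted to unsigned before the addition,
   so a small negative b turns into a very large positive value. */
int mixed_more_than_6(unsigned int a, int b)
{
    return (a + b > 6);
}

/* Same test with a cast so that the arithmetic stays signed.
   Only valid if a is known to fit in an int. */
int signed_more_than_6(unsigned int a, int b)
{
    return ((int)a + b > 6);
}
```

With a = 6 and b = -20, the first function reports "more than 6" and the second does not.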

Floating-point types

Floating-point arithmetic is required in many applications. However, since we’re normally dealing with real-world data whose representation rarely goes beyond 16 bits (a 20-bit A/D converter on an 8-bit machine is rare), the requirements for double-precision arithmetic are tenuous, except in the strangest of circumstances.

Again, the ANSI people have handicapped us by requiring that any floating-point expression be promoted to double before execution. Fortunately, a lot of compiler vendors have done the sensible thing, and simply defined doubles to be the same as floats, so that this promotion is benign. Be warned, however, that many reputable vendors have made a virtue out of providing a genuine double-precision data type. The result is that unless you take great care, you may end up computing values with ridiculous levels of precision, and paying the price computationally. If you’re considering a compiler that offers double-precision math, study the documentation carefully to ensure that there is some way of disabling the automatic promotion of float to double. If there isn’t, look for another compiler.
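Independently of any compiler switch, one place where double arithmetic can creep in unnoticed is an unsuffixed floating-point constant, which has type double. The following sketch (function names are made up) uses sizeof to show the type of each expression. It is a hosted-compiler illustration, not a statement about any particular 8-bit toolchain.

```c
/* The unsuffixed constant 0.5 has type double, so x * 0.5 is a
   double expression. */
unsigned int size_with_double_constant(float x)
{
    return (unsigned int)sizeof(x * 0.5);
}

/* The f suffix makes the constant a float, so the expression stays float. */
unsigned int size_with_float_constant(float x)
{
    return (unsigned int)sizeof(x * 0.5f);
}

float halve(float x)
{
    return x * 0.5f;
}
```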

While we’re on this topic, I’d like to air a pet peeve of mine. Years ago, before decent compiler support for 8-bit processors was available, I would code in assembly language using a bespoke floating-point library. This library was always implemented using 24-bit floats, with a long float consuming four bytes. I found that this was more than adequate for the real world. I’ve yet to find a compiler vendor that offers this as an option. My guess is that the marketing people insisted on a true ANSI floating-point library, the real world be damned. As a result, I can calculate hyperbolic sines on my 68HC11, but I can’t get the performance boost that comes from using just a 24-bit float.

Having moaned about the ANSI-induced problems, let’s turn to an area in which ANSI has helped a lot. I’m referring to the keywords const and volatile, which, together with static, allow the production of better code.

C’s static keyword

The keywords static, volatile, and const together allow one to write not only better code (in the sense of information hiding and so forth) but also tighter code.

Static variables

When applied to variables, static has two primary functions. The first and most common use is to declare a variable that doesn’t disappear between successive invocations of a function. For example:

void func(void)
{
    static UCHAR state = 0;

    switch (state)
    {
        …
    }
}

In this case, the use of static is mandatory for the code to work.

The second use of static is to limit the scope of a variable. A variable that is declared static at the module level is accessible by all functions in the module, but by no one else. This is important because it allows us to gain all the performance benefits of global variables, while severely limiting the well-known problems of globals. As a result, if I have a data structure which must be accessed frequently by a number of functions, I’ll put all of the functions into the same module and declare the structure static. Then all of the functions that need to can access the data without going through the overhead of an access function, while at the same time, code that has no business knowing about the data structure is prevented from accessing it. This technique is an admission that directly accessible variables are essential to gaining adequate performance on small machines.
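A minimal sketch of this pattern follows. The calibration structure and the cal_* function names are hypothetical. The point is only that the data is reachable by the functions in this file and by nothing else.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

/* Visible only inside this module (source file). */
static struct
{
    UCHAR gain;
    UCHAR offset;
} cal = { 1, 0 };

/* Both functions touch cal directly, with no access-function overhead. */
void cal_set(UCHAR gain, UCHAR offset)
{
    cal.gain = gain;
    cal.offset = offset;
}

UCHAR cal_apply(UCHAR raw)
{
    return (UCHAR)(raw * cal.gain + cal.offset);
}
```

Code in any other module that tries to name cal will fail to link, which is exactly the protection wanted.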

A few other potential benefits can result from declaring module level variables static (as opposed to leaving them global). Static variables, by definition, may only be accessed by a specific set of functions. Consequently, the compiler and linker are able to make sensible choices concerning the placement of the variables in memory. For instance, with static variables, the compiler/linker may choose to place all of the static variables in a module in contiguous locations, thus increasing the chances of various optimizations, such as pointers being simply incremented or decremented instead of being reloaded. In contrast, global variables are often placed in memory locations that are designed to optimize the compiler’s hashing algorithms, thus eliminating potential optimizations.

Static functions

A static function is only callable by other functions within its module. While the use of static functions is good structured programming practice, you may also be surprised to learn that static functions can result in smaller and/or faster code. This is possible because the compiler knows at compile time exactly what functions can call a given static function. Therefore, the relative memory locations of functions can be adjusted such that the static functions may be called using a short version of the call or jump instruction. For instance, the 8051 supports both an ACALL and an LCALL op code. ACALL is a two-byte instruction, and is limited to a 2K address block. LCALL is a three-byte instruction that can access the full 8051 address space. Thus, use of static functions gives the compiler the opportunity to use an ACALL where otherwise it might use an LCALL.

The potential improvements are even better in cases in which the compiler is smart enough to replace calls with jumps. For example:

static void fb(void);

void fa(void)
{
    …
    fb();
}

static void fb(void)
{
    …
}

In this case, because function fb() is called on the last line of function fa(), the compiler can replace the call with a jump. Since fb() is static, and the compiler knows its exact distance from fa(), the compiler can use the shortest jump instruction. For the Dallas DS80C320, this is an SJMP instruction (two bytes, three cycles) vs. an LCALL (three bytes, four cycles).

On a recent project, rigorous application of the static modifier to functions resulted in about a 1% reduction in code size. When your ROM is 95% full, a 1% reduction is most welcome!

A final point concerning static variables and debugging: for reasons that I do not fully understand, with many in-circuit emulators that support source-level debug, static variables and/or automatic variables in static functions are not always accessible symbolically. As a result, I tend to use the following construct in my project-wide include file:

#ifndef NDEBUG
#define STATIC
#else
#define STATIC static
#endif

I then use STATIC instead of static to define static variables, so that while in debug mode, I can guarantee symbolic access to the variables.

C’s volatile keyword

A volatile variable is one whose value may be changed outside the normal program flow. In embedded systems, the two main ways that this can happen are via an interrupt service routine, or as a consequence of hardware action (for instance, a serial port status register updates as a result of a character being received via the serial port). Most programmers are aware that the compiler will not attempt to optimize accesses to a volatile variable, but rather will reload it every time. The case to watch out for is when compiler vendors offer extensions for accessing absolute memory locations, such as hardware registers. Sometimes these extensions have either an implicit or an explicit declaration of volatility and sometimes they don’t. The point is to fully understand what the compiler is doing. If you do not, you may end up accessing a volatile variable when you don’t want to and vice versa. For example, the popular 8051 compiler from Keil offers two ways of accessing a specific memory location. The first uses a language extension, _at_, to specify where a variable should be located. The second method uses a macro such as XBYTE[] to dereference a pointer. The “volatility” of these two is different. For example:

UCHAR status_register _at_ 0xE000;

This method is simply a much more convenient way of accessing a specific memory location. However, volatile is not implied here. Thus, the following code is unlikely to work:

while (status_register); /* Wait for status register to clear */

Instead, one needs to use the following declaration:

volatile UCHAR status_register _at_ 0xE000;

The second method that Keil offers is the use of macros, such as the XBYTE macro, as in:

#define status_register XBYTE[0xE000]

Here, however, examination of the XBYTE macro shows that volatile is assumed:

#define XBYTE ((unsigned char volatile xdata*) 0)

(The xdata is a memory space qualifier, which isn’t relevant to the discussion here and may be ignored.)

Thus, the code:

while (status_register); /* Wait for status register to clear */

will work as you would expect in this case. However, in the case in which you wish to access a variable at a specific location that is not volatile, the use of the XBYTE macro is potentially inefficient.
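The interrupt service routine case mentioned at the start of this section can be sketched in portable C as follows. The function and variable names are hypothetical, and the "ISR" takes the received character as a parameter purely so that the fragment can be exercised on a host. A real handler would read it from the UART.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

/* Shared between the ISR and the foreground code, hence volatile. */
static volatile UCHAR rx_ready = 0;
static volatile UCHAR rx_char;

/* Called when a character arrives (stand-in for a real interrupt handler). */
void uart_rx_isr(UCHAR c)
{
    rx_char = c;
    rx_ready = 1;
}

/* Foreground code: without volatile, the compiler would be free to read
   rx_ready once and spin forever on the cached value. */
UCHAR wait_for_char(void)
{
    while (!rx_ready)
    {
        /* wait */
    }
    rx_ready = 0;
    return rx_char;
}
```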

C’s const keyword

The keyword const, which is by the way the most badly named keyword in the C language, does not mean “constant”! Rather, it means “read only”. In embedded systems, there is a huge difference, which will become clear.

Many texts recommend that instead of using manifest constants, one should use a const variable. For instance:

const UCHAR nos_atod_channels = 8;

instead of

#define NOS_ATOD_CHANNELS 8

The rationale for this approach is that inside a debugger, you can examine a const variable (since it should appear in the symbol table), whereas a manifest constant isn’t accessible. Unfortunately, on many eight-bit machines you’ll pay a significant price for this benefit. The two primary costs are:

  • The compiler creates a genuine variable in RAM to hold the value. On RAM-limited systems, this can be a significant penalty
  • Some compilers, recognizing that the variable is const, will store the variable in ROM. However, the variable is still treated as a variable and is accessed as such, typically using some form of indexed addressing. Compared to immediate addressing, this method is normally much slower

I recommend that you eschew the use of const variables on 8-bit micros, except in the following circumstances.

const function parameters

Declaring function parameters const whenever possible not only makes for better, safer code, but also has the potential for generating tighter code. This is best illustrated by an example:

void output_string(CHAR *cp)
{
    while (*cp)
        putchar(*cp++);
}

void demo(void)
{
    char *str = "Hello, world";

    output_string(str);

    if ('H' == str[0])
    {
        some_function();
    }
}

In this case, there is no guarantee that output_string() will not modify our original string, str. As a result, the compiler is forced to perform the test in demo(). If instead, output_string is correctly declared as follows:

void output_string(const char *cp)
{
    while (*cp)
        putchar(*cp++);
}

then the compiler knows that output_string() cannot modify the original string str, and as a result it can dispense with the test and invoke some_function() unconditionally. Thus, I strongly recommend liberal use of the const modifier on function parameters.

const volatile variables

We now come to an esoteric topic. Can a variable be both const and volatile, and if so, what does that mean and how might you use it? The answer is, of course, yes (why else would it have been asked?), and it should be used on any memory location that can change unexpectedly (hence the need for the volatile qualifier) and that is read-only (hence the const). The most obvious example of this is a hardware status register. Thus, returning to the status_register example above, a better declaration for our status register is:

const volatile UCHAR status_register _at_ 0xE000;

Typed data pointers

We now come to another area in which a major trade-off exists between writing portable code and writing efficient code, namely the use of typed data pointers, which are pointers that are constrained in some way with respect to the type and/or size of memory that they can access. For example, those of you who have programmed the x86 architecture are undoubtedly familiar with the concept of using the __near and __far modifiers on pointers. These are examples of typed data pointers. Often the modifier is implied, based on the memory model being used. Sometimes the modifier is mandatory, such as in the prototype of an interrupt handler:

void __interrupt __far cntr_int7();

The requirement for the near and far modifiers comes about from the segmented x86 architecture. In the embedded eight-bit world, the situation is often far more complex. Microcontrollers typically require typed data pointers because they offer a number of disparate memory spaces, each of which may require the use of different addressing modes. The worst offender is the 8051 family, with at least five different memory spaces. However, even the 68HC11 has at least two different memory spaces (zero page and everything else), together with the EEPROM, pointers to which typically require an address space modifier.

The most obvious characteristic of typed data pointers is their inherent lack of portability. They also tend to lead to some horrific data declarations. For example, consider the following declaration from the Whitesmiths 68HC11 compiler:

@dir INT * @dir zpage_ptr_to_zero_page;

This declares a pointer to an INT. However, both the pointer and its object reside in the zero page (as indicated by the Whitesmiths extension, @dir). If you were to add a const qualifier or two, such as:

@dir const INT * @dir const constant_zpage_ptr_to_constant_zero_page_data;

then the declarations can quickly become quite intimidating. Consequently, you may be tempted to simply ignore the use of typed pointers. Indeed, coding an application on a 68HC11 without ever using a typed data pointer is quite possible. However, by doing so the application’s performance will take an enormous hit because the zero page offers considerably faster access than the rest of memory.

This area is so critical to performance that all hope of portability is lost. For example, consider two leading 8051 compiler vendors, Keil and Tasking. Keil supports a three-byte generic pointer that may be used to point to any of the 8051 address spaces, together with typed data pointers that are strictly limited to a specific data space. Keil strongly recommends the use of typed data pointers, but doesn’t require it. By contrast, Tasking takes the attitude that generic pointers are so horribly inefficient that it mandates the use of typed pointers (an argument to which I am extremely sympathetic).

To get a feel for the magnitude of the difference, consider the following code, intended for use on an 8051:

void main(void)
{
    UCHAR array[16];          /* array is in the data space by default */
    UCHAR data * ptr = array; /* Note use of data qualifier */
    UCHAR i;

    for (i = 0; i < 16; i++)
        *ptr++ = i;
}

Using a generic pointer, this code requires 571 cycles and 88 bytes. Using a typed data pointer, it needs just 196 cycles and 52 bytes. (The memory sizes include the startup code, and the execution times are just those for executing main()).

With these sorts of performance increases, I recommend always using explicitly typed pointers, and paying the price in loss of portability and readability.

Implementing an assert() macro

The assert() macro is commonly used on PC platforms, but almost never used on small embedded systems. There are several reasons for this:

  • Many reputable compiler vendors don’t bother to supply an assert macro
  • Vendors that do supply the macro often provide it in an almost useless form
  • Most embedded systems don’t support a stderr to which the error may be printed

These limitations notwithstanding, it’s possible to gain the benefits of the assert() macro on even the smallest systems if you’re prepared to take a pragmatic approach.

Before I discuss possible implementations, mentioning why assert() is important (even in embedded systems) is worthwhile. Over the years, I’ve built up a library of drivers to various pieces of hardware such as LCDs, ADCs, and so on. These drivers typically require various parameters to be passed to them. For example, an LCD driver that displays a text string on a panel would expect the row, the column, a pointer to the string, and perhaps an attribute parameter. When writing the driver, it is obviously important that the passed parameters are correct. One way of ensuring this is to include code such as this:

void Lcd_Write_Str(UCHAR row, UCHAR column, CHAR *str, UCHAR attr)
{
    row &= MAX_ROW;
    column &= MAX_COLUMN;
    attr &= ALLOWABLE_ATTRIBUTES;

    if (NULL == str)
        return;

    /* The real work of the driver goes here */
}

This code clips the parameters to allowable ranges, checks for a null pointer assignment, and so on. However, on a functioning system, executing this code every time the driver is invoked is extremely costly. But if the code is discarded, reuse of the driver in another project becomes a lot more difficult because errors in the driver invocation are tougher to detect.

The preferred solution is the liberal use of an assert macro. For example:

void Lcd_Write_Str(UCHAR row, UCHAR column, CHAR *str, UCHAR attr)
{
    assert (row < MAX_ROW);
    assert (column < MAX_COLUMN);
    assert (attr < ALLOWABLE_ATTRIBUTES);
    assert (str != NULL);

    /* The real work of the driver goes here */
}

This is a practical approach if you’re prepared to redefine the assert macro. The level of resources in your system will control the sophistication of this macro, as shown in the examples below.

Assert 1

This example assumes that you have no spare RAM, no spare port pins, and virtually no ROM to spare. In this case, assert.h becomes:

#ifndef assert_h
#define assert_h

#ifndef NDEBUG
#define assert(expr) \
    if (!(expr)) {\
        while (1);\
    }
#else
#define assert(expr)
#endif

#endif

Here, if the assertion fails, we simply enter an infinite loop. The only utility of this case is that, assuming you’re running a debug session on an ICE, you will eventually notice that the system is no longer running. At that point, breaking the emulator and examining the program counter will give you a good indication of which assertion failed. As a possible refinement, if your system is interrupt-driven, inserting a “disable all interrupts” command prior to the while(1) may be necessary, just to ensure that the system’s failure is obvious.

Assert 2

This case is the same as assert #1, except that in example #2 you have a spare port pin on the microcontroller to which an error LED is attached. This LED is lit if an error occurs, thus giving you instant feedback that an assertion has failed. Assert.h now becomes:

#ifndef assert_h
#define assert_h

#define ERROR_LED_ON()   /* Put expression for turning LED on here */
#define INTERRUPTS_OFF() /* Put expression for interrupts off here */

#ifndef NDEBUG
#define assert(expr) \
    if (!(expr)) {\
        ERROR_LED_ON();\
        INTERRUPTS_OFF();\
        while (1);\
    }
#else
#define assert(expr)
#endif

#endif

Assert 3

This example builds on assert #2. But in this case, we have sufficient RAM to define an error message buffer, into which the assert macro can sprintf() the exact failure. While debugging on an ICE, if a permanent watch point is associated with this buffer, then breaking the ICE will give you instant information on where the failure occurred. Assert.h for this case becomes:

#ifndef assert_h
#define assert_h

#define ERROR_LED_ON()   /* Put expression for turning LED on here */
#define INTERRUPTS_OFF() /* Put expression for interrupts off here */

#ifndef NDEBUG
extern char error_buf[80];
#define assert(expr) \
    if (!(expr)) {\
        ERROR_LED_ON();\
        INTERRUPTS_OFF();\
        sprintf(error_buf, "Assert failed: " #expr " (file %s line %d)\n", __FILE__, (int) __LINE__);\
        while (1);\
    }
#else
#define assert(expr)
#endif

#endif

Obviously, this requires that you define error_buf[80] somewhere else in your code.

I don’t expect that these three examples will cover everyone’s needs. Rather, I hope they give you some ideas on how to create your own assert macros to get the maximum debugging information within the constraints of your embedded system.
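As one further variation, here is a sketch that moves the failure handling into a function and wraps the macro in do { } while (0), so that it behaves like a single statement inside an if/else. The names ASSERT, assert_failed(), and assert_failures are hypothetical. The macro is spelled in upper case only to avoid clashing with the standard one, and the handler returns instead of halting so that the fragment can be tried on a host.

```c
#include <stdio.h>

char error_buf[80];                 /* watch this from the debugger */
unsigned char assert_failures = 0;  /* hypothetical failure counter */

void assert_failed(const char *expr, const char *file, int line)
{
    /* The precision fields cap each string so that the whole message
       always fits in error_buf. */
    sprintf(error_buf, "Assert failed: %.25s (%.20s:%d)", expr, file, line);
    assert_failures++;
    /* On a real target: light the LED, disable interrupts, and spin here. */
}

#ifndef NDEBUG
#define ASSERT(expr) \
    do { \
        if (!(expr)) \
            assert_failed(#expr, __FILE__, (int)__LINE__); \
    } while (0)
#else
#define ASSERT(expr) ((void)0)
#endif
```

Putting the formatting in one function rather than in every expansion of the macro also keeps the ROM cost per assertion down to a test and a call.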

Heretical comments

So far, all of my suggestions have been about actively doing things to improve the code quality. Now, let’s address those areas of the C language that should be avoided, except in highly unusual circumstances. For some of you, the suggestions that follow will border on heresy.

Recursion

Recursion is a wonderful technique that solves certain problems in an elegant manner. It has no place on an eight-bit microcontroller. The reasons for this are quite simple:

  • Recursion relies on a stack-based approach to passing variables. Many small machines have no hardware support for a stack. Consequently, either the compiler will simply refuse to support reentrancy, or else it will resort to a software stack in order to solve the problem, resulting in dreadful code quality
  • Recursion relies on a “virtual stack” that purportedly has no real memory constraints. How many small machines can realistically support virtual memory?

If you find yourself using recursion on a small machine, I respectfully suggest that either (a) you are doing something really weird, or (b) you don’t understand the sum total of the constraints with which you’re working. If it is the former, then please contact me, as I will be fascinated to see what you are doing.
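Many textbook recursive routines can be recast as loops with a fixed, known stack cost. As a small illustration (my choice of example, with a local typedef for UCHAR), here is Euclid's greatest common divisor algorithm written iteratively:

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

/* Recursive form, shown for comparison only:
   UCHAR gcd(UCHAR a, UCHAR b) { return b ? gcd(b, a % b) : a; } */

/* Iterative form: stack use is the same whatever the inputs. */
UCHAR gcd(UCHAR a, UCHAR b)
{
    UCHAR t;

    while (b != 0)
    {
        t = (UCHAR)(a % b);
        a = b;
        b = t;
    }
    return a;
}
```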

Variable length argument lists

You should avoid variable length argument lists because they too rely on a stack-based approach to passing variables. What about sprintf() and its cousins, you all cry? Well, if possible, you should consider avoiding the use of these library functions. The reasons for this are as follows:

  • If you use sprintf(), take a look at the linker output and see how much library code it pulls in. On one of my compilers, sprintf(), without floating-point support, consumes about 1K. If you’re using a masked micro with a code space of 8K, this penalty is huge
  • On some compilers, use of sprintf() implies the use of a floating-point library, even if you never use the library. Consequently, the code penalty quickly becomes enormous
  • If the compiler doesn’t support a stack, but rather passes variables in registers or fixed memory locations, then use of variable length argument functions forces the compiler to reserve a healthy block of memory simply to provide space for variables that you may decide to use. For instance, if your compiler vendor assumes that the maximum number of arguments you can pass is 10, then the compiler will reserve 40 bytes (assuming four bytes per longest intrinsic data type)

Fortunately, many vendors are aware of these issues and have taken steps to mitigate the effects of using sprintf(). Notwithstanding these actions, taking a close look at your code is still worthwhile. For instance, writing my own wrstr() and wrint() functions (to output strings and ints respectively) generated half the code of using sprintf. Thus, if all you need to format are strings and base 10 integers, then the roll-your-own approach is beneficial (while still being portable).
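A minimal sketch of such a pair of functions is shown below. It is not the original code: only the names wrstr() and wrint() come from the text, while out_char(), out_buf, and out_len are hypothetical. Output goes to a small RAM buffer here so that the fragment can be tried on a host. On a target, out_char() would write to the display or UART instead.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

char  out_buf[32];             /* hypothetical output sink */
UCHAR out_len = 0;

static void out_char(char c)
{
    if (out_len < sizeof(out_buf) - 1)
    {
        out_buf[out_len++] = c;
        out_buf[out_len] = '\0';
    }
}

/* Write a NUL-terminated string. */
void wrstr(const char *s)
{
    while (*s)
        out_char(*s++);
}

/* Write a signed integer in base 10. */
void wrint(int value)
{
    char digits[12];           /* enough for a 32-bit int */
    UCHAR n = 0;
    unsigned int mag;

    if (value < 0)
    {
        out_char('-');
        mag = 0U - (unsigned int)value;   /* safe even for the most negative int */
    }
    else
    {
        mag = (unsigned int)value;
    }

    do                         /* digits are produced least significant first */
    {
        digits[n++] = (char)('0' + mag % 10U);
        mag /= 10U;
    } while (mag != 0U);

    while (n > 0)
        out_char(digits[--n]);
}
```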

Dynamic memory allocation

When you’re programming an application for a PC, using dynamic memory allocation makes sense. The characteristics of PCs that permit and/or require dynamic memory allocation include:

  • When writing an application, you may not know how much memory will be available. Dynamic allocation provides a way of gracefully handling this problem
  • The PC has an operating system, which provides memory allocation services
  • The PC has a user interface, such that if an application runs out of memory, it can at least tell the user and attempt a relatively graceful shutdown

In contrast, small embedded systems typically have none of these characteristics. Therefore, I think that the use of dynamic memory allocation on these targets is silly. First, the amount of memory available is fixed, and is typically known at design time. Thus static allocation of all the required and/or available memory may be done at compile time.

Second, the execution time overhead of malloc(), free(), and so on is not only quite high, but also variable, depending on the degree of memory fragmentation.

Third, use of malloc(), free(), and so on consumes valuable EPROM space. And lastly, dynamic memory allocation is fraught with danger (witness the series from P.J. Plauger on garbage collection in the January 1998, March 1998, and April 1998 issues of Embedded Systems Programming).

Consequently, I strongly recommend that you not use dynamic memory allocation on small systems.
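Where objects really do need to come and go at run time, one common alternative is a fixed pool that is sized at design time and allocated statically. The sketch below is only an illustration: MESSAGE, MAX_MESSAGES, and the message_* functions are hypothetical names.

```c
typedef unsigned char UCHAR;   /* stands in for the definition in types.h */

#define MAX_MESSAGES 4         /* hypothetical worst case, fixed at design time */
#define MESSAGE_LEN  8

typedef struct
{
    UCHAR in_use;
    UCHAR data[MESSAGE_LEN];
} MESSAGE;

static MESSAGE pool[MAX_MESSAGES];   /* RAM reserved at link time */

/* Return a free message, or a null pointer if all are in use.
   The search is bounded by MAX_MESSAGES, so the worst-case time is known. */
MESSAGE *message_get(void)
{
    UCHAR i;

    for (i = 0; i < MAX_MESSAGES; i++)
    {
        if (!pool[i].in_use)
        {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return 0;
}

void message_release(MESSAGE *m)
{
    m->in_use = 0;
}
```

Because every block is the same size, fragmentation cannot occur, and the linker map shows exactly how much RAM the pool consumes.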

Final thoughts

I have attempted to illustrate how judicious use of both ANSI constructs and compiler-specific constructs can help generate tighter code on small microcontrollers. Often, though, these improvements come at the expense of portability and/or readability. If you are in the fortunate position of being able to use less efficient code, then you can ignore these suggestions. If, however, you are severely resource-constrained, then give a few of these techniques a try. I think you’ll be pleasantly surprised.


This article was published in the November 1998 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Efficient C Code for Eight-Bit MCUs” Embedded Systems Programming, November 1998.

Use Strings to Internationalize C Programs

Thursday, December 17th, 2009 Nigel Jones

Also available in PDF version.

Products destined for use in multiple countries often require user interfaces that support several human languages. Sloppy string management in your programs could result in unintelligible babble.

A decade or two ago, most embedded systems had an extremely limited user interface. In most cases, the interface was either non-existent, or it consisted of a few LEDs and the odd jumper or push button. As the cost of display technology has plummeted, alphanumeric user interfaces have become increasingly common. Simultaneously, a variety of technological, economic, and political pressures have brought about the need for products to be sold in many countries. As a result, the need for an embedded system to support multiple languages has become apparent.

This problem is even more acute for the non-embedded computer world. Part of the solution for that marketplace was the introduction of Unicode, wide character types, and so on. Unfortunately, these techniques require storage capabilities and display resolutions rarely found in embedded systems. Instead, most embedded systems with displays typically use a low-resolution LCD or vacuum fluorescent display (VFD) with a built-in character generator. It’s this type of display that I’ll be concentrating on.

Lessons learned

A few years ago, I worked on an industrial measurement system. The product was to be sold in both North America and Europe. Consequently, one of the essential design requirements was to support multiple European languages. The product in question has a 240 x 128-pixel LCD panel with a built-in character generator. The character generator contains a subset of the ASCII 256-character set, including “specialized” characters such as ç, ü, and é.

Anyway, as I pondered possible approaches to the design, I looked back at my previous attempts to solve this problem. They weren’t pretty! At the risk of ruining what little reputation I may have gained, I think it’s instructive to look at these previous attempts.

Lesson 1

The first product that I designed that had an alphanumeric display was implemented with the typical arrogance of a native English speaker. Namely, it never occurred to me to even consider the rest of the world. As a result, my code was littered with text strings. That is, you’d see the assembly language (yes, it was that long ago) equivalent of:

WrStr("Jet Black");

To make matters worse, this construct would be found in many functions, split between several files.

The foolishness of this approach struck home when I was asked to produce a version of the product for the German market. I quickly realized that I had to edit source code files to implement the translation. This had several ramifications:

  • A separate make environment was needed for each language.
  • Every time a change had to be made to the code, the same modification had to be made to each language’s version of the file. In short, it was a maintenance nightmare.
  • The source code had to be given to the translator. I think you can imagine the problems this caused.

Lesson 2

The next product was a big improvement, because I did what should have been done in the first place, which was to place all the text strings into one file. That is, there were alternative string files called english.c, german.c, and so on for additional languages, each containing all of the strings for a particular language. Each string file contained a single array definition that looked something like this:

char const * const strings[] =
{
    "Jet Black",
    "Red",
    ......
};

Thus, to display a particular string, my code looked like this:

WrStr(strings[0]);

Now all I had to do to enable support for a new language was to hand a copy of english.c to the translator, and have him produce the equivalent strings for the new language. Unfortunately, it didn’t work out that way. It turns out that the English language is extraordinarily terse when compared with certain other languages. For example, the German equivalent of Jet Black is Rabenschwarz.

Working from english.c, the translator assumed there were just nine characters into which to fit the translation. Thus, the translator was forced into abbreviating the German. However, in many cases, there was actually more space available on the display such that the abbreviation looked awkward in the product. The only way to find out was to execute the code and look at the results. This is a lot harder than it sounds, because many strings are only displayed in exceptional conditions. Thus one has to generate all the external stimuli such that those exceptions occur. In short, the translation effort remained a Herculean task.

Once it was complete, I still wasn’t out of the woods, because the different length strings caused the memory allocation to change significantly. Although it did not happen, theoretically I could have run out of memory.

Lesson 3

By the time I was working on my third product requiring multiple language support, I was a lot wiser and memory capacities had increased dramatically. As a result, I now ensured that every string in my string file was padded with spaces out to the maximum allowed field width. Furthermore, I had also learned the intricacies of conditional compilation and passing command line arguments to make, such that I included every language into the same text file. Thus, strings.c looked something like Listing 1.

#if defined(ENGLISH)
    char const * const strings[] =
    {
        "Jet Black ",
        ...
        "Good Bye ",
        ...
        "Evening"
    };
#elif defined(GERMAN)
    char const * const strings[] =
    {
        "Rabenschwarz",
        ...
        "Auf Wiedersehen ",
        ...
        "Abend "
    };
#endif

Listing 1. Multiple languages in a single C module

This third solution worked well, except that at the same time, the size of the alphanumeric displays and the complexity of the user interface had increased. While my first product had just 30 or 40 strings, this latest product had around 500. Thus, the bulk of the user interface code ended up looking like this:

WrStr(strings[27]);
WrStr(strings[47]);
WrStr(strings[108]);

This code doesn’t make clear what string I was actually displaying. So I was beginning to long for the original:

WrStr("Jet Black");

I ran into another major problem at this time. As the product evolved, so did the strings. I found myself wanting to delete certain strings that were no longer needed. But I couldn’t do that without destroying my indexing scheme. Thus, I was forced into changing unwanted text into null strings, such that strings.c now contained sequences like this:

char const * const strings[] = {
    "Jet Black  ",
    ...
    "", /* deleted */
    ...
    "Evening"
};

Although this saved the space consumed by the string, I was still wasting ROM on the pointer. In addition, it looked ugly and had “kludge” written all over it. I also ran into a more serious problem. From a maintenance perspective, it would be very useful if related strings were in contiguous locations. Thus if a particular field could contain “Jet Black,” “Red,” or “Pale Grey,” I would place these together in the string file. However, two years later, when marketing asked for “Yellow” to be added to the list of selections, I was forced to place “Yellow” at the end of the string list, well away from its partners. This pained me greatly.

There was one final problem with this implementation (and all the others) and that was the fact that the strings array was a global. I’ve become quite paranoid about globals in the last few years, such that when I look back at the code now, I have to confess that I cringe.

I also discovered a neat feature that was missing from all of the previous disasters. A few years ago, I saw a product demonstration in which the language was changeable on the fly. That is, without changing the operating mode or cycling power, the entire user interface could be changed to another language. The demonstration was incredibly slick. (Consider the following scenario. Your product is being introduced at a trade show. Some native French speakers come to the booth to look at the product. With the push of a button, you switch the user interface to French. You’re already halfway to a sale.)

In addition to its value as a sales tool, the ability to change language on the fly is also a valuable tool for validating a new translation. It’s particularly useful when the correct translation of a word depends heavily upon its context. When working through the string file, the translator can’t see the context, so having the ability to operate the product and switch back and forth between languages is invaluable.

An international approach

Being quite a few years wiser than when Lesson #3 was learned, I was determined to come up with a scheme that would address as many of the aforementioned problems as possible and add the ability to switch languages on the fly. What follows is my solution. It’s not perfect, but it is a lot better than any of the previous attempts.

The first decision I made was to separate the string retrieval mechanism and the string storage technique. There would be no more global strings array. Instead, strings would be accessed through a function call. This access function would take a string number as an argument and return a pointer to the desired string. Its provisional prototype is:

char const * strGet(int string_no);

This abstraction gave me freedom to implement the data storage part of the strings in whatever manner I saw fit. In particular, I realized that implementing the storage as an array of data structures would have considerable benefit. My data structure looked like this:

#define N_LANG 4

typedef struct
{
    /*
     * Maximum length
     */
    int const len; 

    /*
     * Array of pointers to language-specific string
     */
    char const * const text[N_LANG]; 

} STRING;

This arrangement offered some serious benefits. First, the maximum allowed string length for the field is stored with the text. Second, the original string and all of the various translations of it are together in one place. This makes life a lot easier for the translator. The downside is, of course, that this uses a lot more ROM than a preprocessor-based selection. In my case, I had the ROM to spare. With this data structure in hand, the strings array now looked like Listing 2.

static const STRING strings[] =
{
    {
        15,
        {
            "Jet Black ",       /* English */
            "Rabenschwarz ",    /* German */
            ...
        }
    },
    {
        15,
        {
            "Red ",			/* English */
            "Rot ",			/* German */
            ...
        }
    },
    ...
};

Listing 2. A better string storage structure
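Because each entry carries its maximum length alongside the translations, an over-long translation can be caught mechanically instead of by inspecting the display. The following is a minimal sketch under stated assumptions: the table is cut down to two languages, the sample entries are illustrative, and strCheckLengths() is a hypothetical helper that is not part of the original design.

```c
#include <stddef.h>
#include <string.h>

#define N_LANG 2    /* assumption: English and German only, for brevity */

typedef struct
{
    int const len;                      /* maximum field width */
    char const * const text[N_LANG];    /* one string per language */
} STRING;

/* Illustrative entries only; a real table would be far longer. */
static const STRING strings[] =
{
    { 15, { "Jet Black", "Rabenschwarz" } },
    { 15, { "Red",       "Rot"          } },
};

#define N_STRINGS (sizeof strings / sizeof strings[0])

/*
 * Hypothetical helper: returns the number of translations that are
 * missing (NULL) or longer than the field width stored in their entry.
 */
int strCheckLengths(void)
{
    int errors = 0;
    size_t i;
    int lang;

    for (i = 0; i < N_STRINGS; i++)
    {
        for (lang = 0; lang < N_LANG; lang++)
        {
            char const *s = strings[i].text[lang];

            if (s == NULL || strlen(s) > (size_t)strings[i].len)
            {
                errors++;
            }
        }
    }
    return errors;
}
```

A check like this could run once at start-up in a debug build, or on the host as part of the build, so that a translator's over-long string is reported before anyone has to provoke the exceptional condition that displays it.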

The access function also had another valuable property. When the application requested a string, the access function could interrogate the relevant database to find out the current language, and return the requisite pointer. Voila! Language-independent code. The access function looks something like this:

char const * strGet(int str_no)
{
    return strings[str_no].text[GetLanguage()];
}
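
To make the on-the-fly switching concrete, here is a small self-contained sketch. It assumes a two-language table, a plain static variable standing in for the "relevant database", and a hypothetical SetLanguage() function; the range checks are also an addition and are not part of the access function shown above.

```c
#include <stddef.h>

#define N_LANG 2    /* assumption: 0 = English, 1 = German */

typedef struct
{
    int const len;
    char const * const text[N_LANG];
} STRING;

static const STRING strings[] =
{
    { 15, { "Jet Black", "Rabenschwarz" } },
    { 15, { "Red",       "Rot"          } },
};

#define N_STRINGS ((int)(sizeof strings / sizeof strings[0]))

/* Stand-in for the product's configuration database (hypothetical). */
static int current_language = 0;

int GetLanguage(void)
{
    return current_language;
}

/* Hypothetical setter: out-of-range requests are ignored. */
void SetLanguage(int lang)
{
    if (lang >= 0 && lang < N_LANG)
    {
        current_language = lang;
    }
}

/* Same idea as the access function above, with an added range check. */
char const * strGet(int str_no)
{
    if (str_no < 0 || str_no >= N_STRINGS)
    {
        return "";
    }
    return strings[str_no].text[GetLanguage()];
}
```

With this in place, calling SetLanguage(1) and then redrawing the screen is all that is needed for every subsequent strGet() call to return German text; none of the calling code changes.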

Further improvements

The previous approach certainly solved a few of my problems. However, it did nothing for the problems of indecipherable string numbers or adding and deleting strings in the middle of the array. I needed a means of referring to the strings in a meaningful manner, together with a way of automatically re-indexing the string table. My solution was to use an enumerated type. The members of the enumeration are given the same name as the strings to which they refer. An example should help clarify this.

Assume the first four strings that appear in the strings array are “Jet Black,” “Red,” “Pale Grey,” and “Yellow.” To display “Red,” I would have to call:

WrStr(strGet(1));

Instead, I define an enumerated list as follows:

typedef enum { JET_BLACK, RED, PALE_GREY, YELLOW, ... } STR;

I now change the prototype of strGet() so that its parameter is of type STR rather than int:

char const * strGet(STR str_no);

Thus, to display the string “Red,” the code becomes:

WrStr(strGet(RED));

Furthermore, by defining a macro, Wrstr(X) as follows,

#define Wrstr(X) WrStr(strGet((X)))

we can write:

Wrstr(RED);

This is just as readable as the original WrStr("Red"), but without any of the aforementioned problems. Furthermore, this technique allows one to insert or delete strings at will. For instance, I could insert “Pink” before “Red” in the strings array, do the same in the enumeration list, and recompile; none of the code should break.

This was a major step forward, because I now had a system that met most of my goals:

  • No global data
  • Easy to add additional languages
  • Language selection on the fly
  • Code is meaningful
  • Allows strings to be inserted and deleted at will

Gotchas of enumerated types

However, a couple of “gotchas” arise when using enumerated types in this way. The first, and most important, is portability. ANSI C requires only that the first 31 characters of an identifier be significant. If you can guarantee that all of your strings are shorter than this, there is no problem. If you cannot make that guarantee, these are some of your options:

  • Switch to a compiler that allows unlimited identifier lengths. Many compilers do have this feature.
  • Ensure that all strings are unique within the first 31 characters. Note that if they aren’t, the compiler should issue a re-declaration warning.

The next issue to watch out for occurs when you have a large number of strings. ANSI allows the compiler writer to implement enumerated types using an implementation-defined integer type. Thus, technically speaking, a compiler could limit the number of items in an enumeration to 127 (the largest positive number that can fit into an 8-bit integer). Thus, if you have rigid portability constraints, this technique may be problematic. However, practically speaking, most compilers appear to implement enumerations either as an int, or as the smallest integral type that can accommodate the enumerated list.
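
If the size a particular compiler picks for the enumeration is a concern, the build itself can be made to flag it. The sketch below is one possible guard, not something from the original design: STATIC_CHECK, STR_STORAGE_MAX and strEnumStorageSize() are hypothetical names, and the macro assumes that the enumeration is no wider than unsigned long.

```c
#include <limits.h>
#include <stddef.h>

typedef enum { JET_BLACK, RED, PALE_GREY, YELLOW, LAST_STR } STR;

/*
 * C89-style compile-time check: when cond is false the array size is -1
 * and compilation fails.
 */
#define STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/*
 * Largest positive value that fits in the storage this compiler chose
 * for STR, treating that storage as signed.
 * Assumption: sizeof(STR) <= sizeof(unsigned long).
 */
#define STR_STORAGE_MAX ((1UL << (sizeof(STR) * CHAR_BIT - 1)) - 1UL)

/* Refuse to build if the string count cannot be represented in a STR. */
STATIC_CHECK(str_enum_can_index_every_string,
             (unsigned long)LAST_STR <= STR_STORAGE_MAX);

/* Reports how many bytes this compiler uses for the enumeration. */
size_t strEnumStorageSize(void)
{
    return sizeof(STR);
}
```

A conforming compiler should never trip this check, so its value lies mainly with compilers or compiler options that shrink enumerations to a single byte.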

The third problem I ran into concerns the limited number of legal characters that can make up an identifier (that is, a-z, A-Z, 0-9, and _). For instance, it is impossible to exactly reproduce the string “3 Score & 10!”. In situations like this, I used _ wherever I couldn’t make the exact substitution. Thus, the enumerated value for “3 Score & 10!” would be _3_SCORE___10_, or possibly _3_SCORE_AND_10_. This isn’t perfect, but it’s still better than a meaningless identifier such as STRING_49.

The final issue was the absolute necessity of keeping the string file and the enumerated list synchronized. This proved to be quite difficult. To aid the process, I modified the string table slightly to include the enumerated type name in the comment field. Next, I ensured that the last entry in the enumerated list was always LAST_STR. This allowed the string array to be changed from being an incomplete declaration to a complete declaration. This means that the compiler will complain if the string array contains more elements than the enumerated list. (An array with fewer elements still compiles, because C quietly zero-fills the missing entries, so that direction needs a separate check.) This proved to be valuable in keeping the two files synchronized.
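
One caveat when relying on the sized declaration alone: standard C accepts fewer initializers than the declared array length and zero-fills the remainder, so only a surplus of strings is certain to be diagnosed. A stricter variant, sketched here as an alternative and not as the approach used in this article, leaves the array size open and compares the counts at compile time. A single-language table is used to keep the example short, and the names N_STRINGS and strings_match_enum are hypothetical.

```c
#include <stddef.h>

typedef enum { JET_BLACK, RED, PALE_GREY, YELLOW, LAST_STR } STR;

/* Size deliberately omitted so the compiler counts the initializers. */
static char const * const strings[] =
{
    "Jet Black",    /* JET_BLACK */
    "Red",          /* RED */
    "Pale Grey",    /* PALE_GREY */
    "Yellow",       /* YELLOW */
};

#define N_STRINGS (sizeof strings / sizeof strings[0])

/*
 * Compilation fails with a negative array size if the table and the
 * enumeration disagree in either direction.
 */
typedef char strings_match_enum[(N_STRINGS == (size_t)LAST_STR) ? 1 : -1];

char const * strGet(STR str_no)
{
    return strings[str_no];
}
```

Deleting "Yellow" from the table without removing YELLOW from the enumeration now stops the build, which the sized declaration on its own would not do.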

The winning design

The final enumerated list and string table declarations are as shown in Listing 3.

typedef enum
{
    JET_BLACK, RED, PALE_GREY, YELLOW, ..., LAST_STR
} STR;

static const STRING strings[LAST_STR] =
{
    { /* JET_BLACK */
        15,
        {
            "Jet Black ",		/* English */
            "Rabenschwarz ",	/* German */
            ...
        }
    },
    { /* RED */
        15,
        {
            "Red ",			/* English */
            "Rot ",			/* German */
            ...
        }
    },
    ...
};

Listing 3. Final string storage structure

I did all of this manually, but you could certainly develop a script to automate the process of creating the enumerated list in the header file (from the contents of the string file).
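
An alternative to an external script is to let the preprocessor generate both the enumeration and the table from a single list, a technique often called an "X-macro". This is a different method from the manual one described here, shown only as a sketch. The macro names STRING_TABLE, AS_ENUM and AS_ENTRY, the function strGetIn(), the two-language layout, and the sample German entries are all assumptions made for illustration.

```c
#include <stddef.h>

#define N_LANG 2    /* assumption: 0 = English, 1 = German */

typedef struct
{
    int const len;
    char const * const text[N_LANG];
} STRING;

/* The single master list: identifier, field width, English, German. */
#define STRING_TABLE(X)                            \
    X(JET_BLACK, 15, "Jet Black", "Rabenschwarz")  \
    X(RED,       15, "Red",       "Rot")           \
    X(PALE_GREY, 15, "Pale Grey", "Hellgrau")      \
    X(YELLOW,    15, "Yellow",    "Gelb")

/* First expansion: keep only the identifier to build the enumeration. */
#define AS_ENUM(id, len, en, de) id,
typedef enum { STRING_TABLE(AS_ENUM) LAST_STR } STR;

/* Second expansion: keep the data to build the table, in the same order. */
#define AS_ENTRY(id, len, en, de) { len, { en, de } },
static const STRING strings[LAST_STR] = { STRING_TABLE(AS_ENTRY) };

/* Hypothetical accessor that takes the language explicitly. */
char const * strGetIn(STR str_no, int lang)
{
    return strings[str_no].text[lang];
}
```

Because both expansions walk the same list, the enumeration and the table cannot drift apart. The price is that the master list is harder for a translator to read than a plain array of strings.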

Having gone through this exercise, I realized that I could make a bit more use of enumerated lists to make my code more readable and more maintainable. When the string data structure was introduced, a manifest constant N_LANG was used to specify the number of languages supported. A better approach is to replace N_LANG with the final member of an enumeration of the supported languages:

typedef enum
    { ENGLISH, FRENCH, GERMAN, SPANISH, LAST_LANGUAGE }
    LANGUAGE;

Now, the STRING data structure is defined as:

typedef struct
{
    /*
     * Maximum length
     */
    int const len; 

    /*
     * Array of pointers to language-specific string
     */
    char const * const text[LAST_LANGUAGE]; 

} STRING;

This change may look minor, but it makes adding another language more intuitive.

So far, I haven’t mentioned the utility of storing the maximum string length in the STRING data structure. The use of this field arises when ROM is not plentiful, such that it is necessary to store strings without padding out to the maximum allowed field width. In cases like this, one has to be careful to clear the entire field before writing the string. This may be accomplished by using the string len parameter. If you can afford to pad all strings out to the allowed field width, it is permissible to drop the len parameter from the data structure.
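
One way to use the len field with unpadded strings is to build the padded field image in RAM, so that a single write both clears the field and displays the text. The helper below is a hypothetical sketch, not part of the original code: the name strFormatField() is an assumption, and the caller is assumed to supply a buffer of at least len + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies str into field, pads with spaces out to
 * len characters, and NUL-terminates. Text longer than len is truncated.
 * The caller must provide at least len + 1 bytes in field.
 */
void strFormatField(char *field, char const *str, int len)
{
    size_t width = (len > 0) ? (size_t)len : 0;
    size_t n = strlen(str);

    if (n > width)
    {
        n = width;                      /* truncate over-long text */
    }
    memcpy(field, str, n);              /* the text itself */
    memset(field + n, ' ', width - n);  /* blank the rest of the field */
    field[width] = '\0';
}
```

The formatted buffer can then be handed to the existing display routine, so any leftover characters from a longer string previously shown in the same field are overwritten with spaces.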

Well, that’s my fourth attempt at producing an international product. After three disasters, I’m reasonably confident that this latest attempt will at least make it into the “not bad” category. If you have any refinements that you would care to share, please e-mail me. In the meantime, I’m going to ponder how to elegantly and robustly support languages such as Chinese, Arabic and Russian in embedded systems. If I manage to find a reasonable solution, I’ll let you know.


This article was published in the February 2001 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:

Jones, Nigel. “Support Multiple Languages” Embedded Systems Programming, February 2001.