embedded software boot camp

Lowering power consumption tip #4 – transmitting serial data

Thursday, May 20th, 2010 by Nigel Jones

This is the fourth in a series of tips on lowering power consumption in embedded systems. For this post I thought I’d delve into the common task of transmitting serial data. I compare polling and interrupting and show you how a hybrid approach can sometimes be optimal.

Almost every embedded system I have ever worked on has contained serial links. At its most abstract level, a serial link takes in parallel data and converts it to a serial stream. This serialization inherently takes longer than the write to the register that holds the data, so when sending multiple bytes back to back there is an inevitable wait between writes. The process thus looks like this:

Store data to be transmitted
Wait for data to be sent out
Store data to be transmitted
Wait for data to be sent out
...

Store data to be transmitted
Wait for data to be sent out

From a power consumption perspective, the question is – how best to wait for the data to be sent out? Well, you have four basic approaches – open loop, polling, interrupting or a hybrid combination.  In assessing them from a power consumption perspective, what I look at is how many non-useful clock cycles I have to execute in order to transmit a byte of data.

Open Loop

I use the term open loop to describe a technique whereby you make use of the properties of a synchronous link to know (or, more accurately, presume) that it is safe to send the next byte. This technique is only of use when the transmit frequency is very high in comparison to the CPU speed. For example, consider an SPI link between a CPU and a peripheral. In many cases, this link may be clocked at up to half the CPU clock frequency, in which case it takes a mere 16 CPU clocks (8 bits at 2 CPU clocks per bit) to shift out an 8-bit datum. As a result one can simply delay 16 clock cycles between writing successive bytes. The code looks something like this:

SBUF = datum[0];
delay(16 - LOAD_TIME);
SBUF = datum[1];
delay(16 - LOAD_TIME);
...

LOAD_TIME is a constant that takes into account the number of cycles required to get the next datum from memory and write it to SBUF. Thus the number of non-useful clock cycles per byte is (16 - LOAD_TIME).
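To make the arithmetic concrete, here is a minimal sketch of the cycle budget. The function names and the LOAD_TIME value of 4 used in the usage note are my own assumptions for illustration; the real figure depends on your processor and compiler.

```c
#include <assert.h>

/* Hypothetical helper: CPU cycles needed to shift one datum out of a
 * synchronous link that takes cpu_clocks_per_bit CPU clocks per bit. */
unsigned shift_cycles(unsigned cpu_clocks_per_bit, unsigned bits_per_datum)
{
    return cpu_clocks_per_bit * bits_per_datum;
}

/* Non-useful cycles per byte for the open loop method: the shift time
 * minus the cycles spent usefully fetching and storing the next datum. */
unsigned open_loop_idle_cycles(unsigned shift_time, unsigned load_time)
{
    assert(load_time <= shift_time);  /* otherwise no delay is needed at all */
    return shift_time - load_time;
}
```

For the SPI example, shift_cycles(2, 8) gives 16, and with an assumed LOAD_TIME of 4 the open loop method wastes 12 cycles per byte.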

Now most of you are probably thinking that I’m nuts for advocating this approach – and I’d tend to agree with you! It’s a technique I’ve only used a few times – and then only when I had to get the data out with the least possible latency and with the least amount of power consumed. I much prefer the next technique which can be almost as efficient – but a lot safer.

Polling

Polling differs from the open loop approach in that one polls a status register to determine when it is safe to write the next byte. This can be quite power efficient as long as, just as for the previous example, the transmit speed is very high in comparison to the CPU speed. Thus the SPI link given in the open loop example is also a good candidate for this approach. The code looks something like this:

SBUF = datum[0];
wait_for_sbuf_empty();
SBUF = datum[1];
wait_for_sbuf_empty();
...

The key to making this approach as efficient as possible is to code the wait function so that you read the status register on the first clock after you expect SBUF will become available.  In other words you still use a pre-calculated delay, but you throw in a check of the status register just to make sure before you load the next byte. By pre-fetching the next byte to be loaded and doing some other tweaking it’s often possible to get this approach almost as efficient as the open loop method. Notwithstanding these optimizations, the number of non-useful polling clock cycles will be greater than the number of CPU clocks required to transmit the data.
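The idea of delaying first and polling second can be sketched on a host machine with a mock transmitter. Everything prefixed mock_ is hypothetical and stands in for real hardware, as is the simplification that one status read costs one CPU cycle. The 16-cycle shift time is taken from the SPI example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHIFT_CYCLES 16u  /* 8 bits at 2 CPU clocks per bit, as in the SPI example */

/* Mock transmitter (assumption): tracks remaining shift time and poll count. */
typedef struct {
    uint32_t busy_cycles;   /* cycles left until SBUF is empty */
    uint32_t status_reads;  /* how many times the status register was read */
} mock_spi_t;

/* Advance simulated time by the given number of CPU cycles. */
static void mock_advance(mock_spi_t *spi, uint32_t cycles)
{
    spi->busy_cycles = (cycles >= spi->busy_cycles) ? 0u : spi->busy_cycles - cycles;
}

/* Stand-in for "SBUF = datum": starts a new shift. */
static void mock_write_sbuf(mock_spi_t *spi, uint8_t datum)
{
    (void)datum;
    spi->busy_cycles = SHIFT_CYCLES;
}

/* Stand-in for reading the status register; modelled as costing one cycle. */
static bool mock_tx_empty(mock_spi_t *spi)
{
    spi->status_reads++;
    mock_advance(spi, 1u);
    return spi->busy_cycles == 0u;
}

/* Pre-calculated delay first, then poll just to make sure. */
static void wait_for_sbuf_empty(mock_spi_t *spi, uint32_t expected_delay)
{
    mock_advance(spi, expected_delay);
    while (!mock_tx_empty(spi)) {
        /* keep polling until the transmitter reports empty */
    }
}
```

In this model a well-chosen delay of 15 cycles needs a single status read, whereas polling from the start needs 16 of them for the same byte.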

Interrupting

When the transmit frequency starts to slow down with respect to the CPU frequency, the number of non-useful clock cycles quickly starts to rise if one uses a polling method. The classic example of this is of course asynchronous serial links running at standard baud rates. In these cases, the transmit time is a large fraction of a millisecond and a polling approach consumes a huge number of CPU cycles (and hence power). The solution here is of course to turn to an interrupt driven approach. In this case the overhead of the ISR constitutes the ‘non-useful’ clock cycles. As I showed in this article the overhead of even a simple looking ISR can be quite significant. Notwithstanding this, for asynchronous serial links, an interrupt based approach is nearly always the most efficient.
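To get a feel for the numbers, the following sketch computes how many CPU cycles elapse while one frame is shifted out. The function name is hypothetical, and the 8 MHz CPU clock, 19200 baud rate and 10-bit frame (start bit, 8 data bits, stop bit) in the usage note are assumed values chosen purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse while one frame of bits_per_frame bits is sent
 * at bit_rate bits per second. The result is truncated to whole cycles. */
uint32_t cpu_cycles_per_frame(uint32_t cpu_hz, uint32_t bit_rate,
                              uint32_t bits_per_frame)
{
    assert(bit_rate > 0u);
    /* 64-bit intermediate so cpu_hz * bits_per_frame cannot overflow */
    return (uint32_t)(((uint64_t)cpu_hz * bits_per_frame) / bit_rate);
}
```

With those assumed values, cpu_cycles_per_frame(8000000, 19200, 10) gives 4166 cycles per byte, against 16 for an SPI link clocked at half the CPU frequency. A polling loop would burn nearly all of those 4166 cycles, which is why the ISR overhead is usually the smaller price.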

Hybrid

The final methodology is what I term the hybrid approach. It’s typically the most power efficient and is well suited to medium to fast serial links. The code for it looks like this:

SBUF = datum[0];
__sleep();
SBUF = datum[1];
__sleep();
...

__interrupt void sbuf_tx_isr(void)
{
 /* Empty */
}

In this approach, I enable the transmit interrupt, but have no code in the interrupt handler. After each write to SBUF I execute a sleep instruction, effectively stopping opcode processing. Once SBUF has emptied, it generates an interrupt. The processor vectors to the empty ISR, returns immediately and then processes the next instruction, which stores the next byte in SBUF. In this case the overhead is the number of clock cycles to enter and exit sleep mode, plus the number of cycles to vector to an ISR and return. Depending upon your processor architecture this can be anything from almost nothing to quite a lot. However, it is always less than a full-blown interrupt handler approach and is, in my experience, often less than the polling or open loop methods.

Notwithstanding the above, this method has two weaknesses:

  1. It should be obvious that the only interrupt that can be enabled is the SBUF transmit interrupt. (Actually it’s more accurate to say that the only interrupt that can cause the processor to exit sleep mode is the SBUF transmit interrupt. The MSP430, for example, allows one to do this).
  2. While I don’t consider this a kludge, it’s certainly not crystal clear what is going on. Thus clear documentation is a must.
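One way to soften the first weakness is to check the transmit status after every wake-up and go back to sleep if something else woke the processor. The following is a host-side sketch only. The mock_cpu_t type and the mock_ functions are hypothetical stand-ins for the sleep instruction and the SBUF register, and the variant costs one extra status read per byte compared with the bare version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock processor state (assumption), standing in for real hardware. */
typedef struct {
    bool     tx_empty;        /* status flag: SBUF is ready for the next byte */
    uint32_t spurious_wakes;  /* pending wake-ups from unrelated interrupts */
    uint32_t sleeps;          /* number of times sleep mode was entered */
    uint32_t bytes_written;   /* number of writes to SBUF */
} mock_cpu_t;

/* Stand-in for __sleep(): returns when "an interrupt" wakes the CPU. */
static void mock_sleep(mock_cpu_t *cpu)
{
    cpu->sleeps++;
    if (cpu->spurious_wakes > 0u) {
        cpu->spurious_wakes--;   /* woken by some other interrupt */
    } else {
        cpu->tx_empty = true;    /* woken by the transmit interrupt */
    }
}

/* Stand-in for "SBUF = datum". */
static void mock_write_sbuf(mock_cpu_t *cpu, uint8_t datum)
{
    (void)datum;
    cpu->tx_empty = false;
    cpu->bytes_written++;
}

/* Hybrid transmit with a status check after each wake-up. */
void send_hybrid_checked(mock_cpu_t *cpu, const uint8_t *data, uint32_t len)
{
    for (uint32_t i = 0u; i < len; i++) {
        mock_write_sbuf(cpu, data[i]);
        do {
            mock_sleep(cpu);
        } while (!cpu->tx_empty);  /* sleep again if something else woke us */
    }
}
```

This sketch ignores the window between the status check and re-entering sleep. On real hardware that window has to be closed, and whether and how that can be done depends on the processor.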

Summary

  1. If you feel the need for the utmost efficiency then go open loop. It’s a bit like drag-racing in that it’s fast, furious and undoubtedly gets you from A to B ASAP. Just don’t be surprised if you blow up in the process.
  2. If open-loop isn’t for you then polling may make sense provided you can crank up the transmit speed high enough. This makes for the simplest code – and that’s always a plus in my book.
  3. If you have an asynchronous link, then an interrupt based approach is the right answer 99% of the time.
  4. If you have a medium to high speed link, then the hybrid approach has much to commend it. Once you’ve seen it done a few times it becomes less weird looking.


4 Responses to “Lowering power consumption tip #4 – transmitting serial data”

  1. Juergen says:

    You can also add a check of the status register to the last approach, that way it would be less dangerous.

  2. Miro Samek says:

    The problem with all discussed approaches, except interrupting, is that they are all taking 100% of the CPU for the job of transmitting data. This is the traditional sequential approach.

    This approach is necessary if you have no hardware assistance, so the CPU indeed needs to wait for each byte. But often you do have some sort of a FIFO (sometimes even a DMA) to offload transmission to a peripheral.

    If you do have such hardware, you could use another “hybrid” approach which is to poll the peripheral from a periodic interrupt. Please note that I don’t mean here interrupting for every single byte, which is highly inefficient and often inferior to simple polling.

    For example, a UART with a 16-byte FIFO transmitting at the high rate of 115200 baud needs the CPU to reload the FIFO only every 1.38 ms. So if you interrupt every 1.2-1.3 ms (~800 Hz), you can keep the FIFO constantly full. An interrupt rate of 800 Hz is high, but manageable (as opposed to interrupting at 12 kHz for every single byte, which is highly inefficient).

    My point is that one should avoid, if at all possible, tying up the CPU completely for longer periods of time. Such a design is not only inefficient in terms of power consumption, but more importantly makes the system unresponsive. Too often software developers fall for this, instead of using existing hardware (such as FIFOs) or insisting that the hardware team provide event-driven support (e.g., through an FPGA).

  3. Anonymous says:

    Uh oh, looks like WordPress decided to mangle this article as well. Everything after the first code sample has lost its line breaks.
