embedded software boot camp

Tools to help lower power consumption

June 29th, 2010 by Nigel Jones

Regular readers will know that low power designs are an interest of mine. Indeed, one of the very first blog posts I made lamented how difficult it is to ascertain how much energy it takes to perform various tasks typical to an embedded system. Thus it was a pleasant surprise to receive an IAR newsletter today announcing a tool (‘Power debugging’) that is explicitly designed to help one lower a system’s power consumption. The tool isn’t available yet, but if the propaganda is to be believed it should be a very interesting adjunct to the debugging arsenal. The sign-up procedure to beta test the tool doesn’t seem to work properly, but on the assumption that I made it onto the beta tester list I will post a review once I get my hands on it.

BTW I have to admit I found the name of the article / tool (‘Power debugging’) a bit confusing in the sense that I interpreted power in the vernacular sense (e.g. ‘power walking’, ‘power breakfast’) rather than the engineering sense. I guess I’m just a victim of so much marketing hyperbole that I can’t recognize plain talk any more. Oh well!

Evaluating embedded code

June 20th, 2010 by Nigel Jones

One of the interesting aspects of being an embedded systems consultant is that I get to look at a lot of code written by others. This can come about in a number of ways, but most commonly occurs when someone wants changes made to an existing code base and the original author(s) of the code are no longer available. When faced with a situation such as this, it is essential that I quickly get a sense for how maintainable the code is – and thus how difficult changes will be. As a result I have developed a few techniques to help me assess code maintainability which I thought I’d share with you.

SourceMonitor

After installing the code on my system, the first thing I do is run SourceMonitor over the code. SourceMonitor is a free utility that computes various metrics. The metrics and their values for a typical code base of mine are shown below.

Number of Files: 476

Lines of code: 139,013

Statements: 61,144

% branches: 6.3%

% comments: 41.7%

Functions: 2,509

Average statements / function: 11.8

Max Complexity: 158

Max depth: 9+

Average depth: 0.54

Average complexity: 2.38

Probably the only thing that needs explanation is ‘complexity’. The author of SourceMonitor is not computing the McCabe complexity index, but rather is computing complexity based upon Steve McConnell’s methodology. The details of the implementation aren’t particularly important to me, as I’m more interested in comparative values.

While SourceMonitor helps give me the big picture, it is nowhere near enough – and this is where it gets interesting.

Optimization Level

The next thing I look at is the optimization levels being used. These can be very revealing. For example, if a high level of optimization is being used for the debug build then it might be indicative that a non-optimized build either will not fit into the available memory, or possibly that the code doesn’t run fast enough unless optimization is turned on. Either is indicative of a system that is probably going to be tough to maintain. Conversely, if the release build doesn’t use full optimization then I take this to mean that the code probably doesn’t work when optimization is turned on. I have written about this in the past and consider this to be a major indicator of potential code quality problems.

C-V Qualifiers

Having looked at the optimization levels, I then perform a grep on the code base looking for the number of instances of ‘volatile‘ and ‘const‘. If the number of instances of volatile is zero (and it often is) and the optimization level is turned way down, then it’s almost certain that the author of the code didn’t understand volatile and that the code is riddled with potential problems. Whenever this happens, I get a sinking feeling because if the author didn’t understand volatile, then there is no chance that he had any appreciation for race conditions, priority inversion, non-atomic operations etc. In short, the author was a PC programmer.
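To make the point concrete, here is a minimal sketch of the classic case where volatile is essential: a flag shared between an interrupt handler and mainline code. The ISR and flag names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical sketch: a 'done' flag shared between an ISR and mainline
   code.  Without 'volatile' the compiler is free to cache the flag in a
   register and collapse the wait loop into an infinite loop. */
static volatile bool tx_complete = false;

/* Imagined transmit-complete ISR for some UART peripheral */
void uart_tx_complete_isr(void)
{
    tx_complete = true;         /* written from interrupt context */
}

void wait_for_tx(void)
{
    while (!tx_complete)        /* 'volatile' forces a fresh read each pass */
    {
        ;                       /* a real system might sleep here */
    }
    tx_complete = false;        /* re-arm for the next transfer */
}
```

A code base whose grep count for volatile is zero almost certainly contains loops like this one minus the qualifier, and they only appear to work until the optimizer is turned up.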

The ‘const‘ count is less revelatory. If the author makes use of const then this is normally an indicator that they know their way around the compiler and understand the value of defensive programming. In short I take the use of const to be very encouraging. However, I can say that I have known some excellent embedded systems programmers who rarely used const, and thus its absence doesn’t fill me with the same despair as the absence of volatile.

Incidentally, in my code base described above, there are 53 instances of ‘volatile’ (note that I have excluded compiler-vendor-supplied header files, which define all the various hardware registers as volatile). There are also 771 instances of ‘const’.

Static qualifiers

Regular readers of this blog will know I am a big fan of the ‘static‘ qualifier. Static not only makes for safer and more maintainable code, it also makes for faster code. In fact, IMHO the case for static is so overwhelming that I find its absence or infrequent use a strong indicator that the author of the code was an amateur. In my example code base, static appears 1484 times.
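For readers who haven’t seen the argument before, here is a small sketch (with invented names) of what static buys you at file scope:

```c
/* Hypothetical sketch: 'static' gives the helper function and its state
   internal linkage, so nothing outside this file can call it, clash with
   its name, or modify the counter.  That also frees the compiler to
   inline aggressively, since it can see every caller. */

static unsigned int call_count = 0u;     /* module-private state */

static unsigned int next_count(void)     /* module-private helper */
{
    return ++call_count;
}

unsigned int module_api(void)            /* the only external entry point */
{
    return next_count();
}
```

When a code base never uses static, every function and every file-scope variable is fair game for every other module, and the compiler must assume as much.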

Case statements

Regular readers of this blog also know that I am not a big fan of the case statement. While it has its place, too often I see it used as a substitute for thought. Indeed I have observed a strong inverse correlation between programmer skill and frequency of use of the case statement. As a result, I will usually run a grep to see what the case statement frequency is. In my example code, a case statement occurs 683 times, or once every 90 statements.

Compilation

All of the above ‘tests’ can be performed without compiling the code. In some cases I own the target compiler (or can download an evaluation copy), in which case I will of course attempt to compile the code. When I do this I’m looking for several things:

  1. An absence of compiler warnings / errors. Alan Bowens has written concisely and eloquently on this topic. The bottom line – compilation warnings in the release build are a major issue for me. Note that I’m more forgiving of compiler warnings in the debug build, since by its nature a debug build often ignores things such as inline directives, which can generate warnings on some compilers.
  2. The compilation speed. Massive files containing very large functions compile very slowly. They are also a bear to maintain.
  3. The final image size. This is relevant both in absolute terms (8K versus 128K versus 2M) and also in comparison to the available memory. Small images using a small percentage of the available memory are much easier to maintain than large images that nearly fill the available memory.

Lint

The final test that I perform only rarely is to Lint the code base. I do this rarely because quite frankly it takes a long time to configure PC-Lint. Thus only if I have already created a PC-Lint configuration file for the target compiler do I perform this step. Previously un-linted code will always generate thousands of warnings. However, what I’m looking for are the really serious warnings – uninitialized variables, indexing beyond the end of an array, possible null pointer dereferences etc. If any of these are present then I know the code base is in bad shape.

I can typically run the above tests on a code base in an hour or so. At the end of it I usually have a great idea of the overall code quality and how difficult it will be to modify. I would be very interested to hear from readers that are willing to perform the same tests on their code base and to publish the results. (Incidentally, I’m not trying to claim that my metrics are necessarily good – they are intended merely as a reference / discussion point).

Lowering power consumption tip #4 – transmitting serial data

May 20th, 2010 by Nigel Jones

This is the fourth in a series of tips on lowering power consumption in embedded systems. For this post I thought I’d delve into the common task of transmitting serial data. I compare polling and interrupting and show you how a hybrid approach can sometimes be optimal.

Almost every embedded system I have ever worked on has contained serial links. At its most abstract level, a serial link takes in parallel data and converts it to a serial stream. This serialization inherently takes longer than the write to the register that holds the data and thus to send multiple bytes back to back there is an inevitable delay. The process thus looks like this:

Store data to be transmitted
Wait for data to be sent out
Store data to be transmitted
Wait for data to be sent out
...

Store data to be transmitted
Wait for data to be sent out

From a power consumption perspective, the question is – how best to wait for the data to be sent out? Well, you have four basic approaches – open loop, polling, interrupting or a hybrid combination.  In assessing them from a power consumption perspective, what I look at is how many non-useful clock cycles I have to execute in order to transmit a byte of data.

Open Loop

I use the term open loop to describe a technique whereby you make use of the properties of a synchronous link to know (actually more accurately presume) that it is safe to send the next byte. This technique is only of use when the transmit frequency is very high in comparison to the CPU speed. For example, consider an SPI link between a CPU and a peripheral. In many cases, this link may be clocked at up to half the CPU clock frequency. In which case it takes a mere 16 CPU clocks to shift out an 8 bit datum. As a result one can simply delay 16 clock cycles between writing successive bytes. The code looks something like this:

SBUF = datum[0];
delay(16 - LOAD_TIME);
SBUF = datum[1];
delay(16 - LOAD_TIME);
...

LOAD_TIME is a constant that takes into account the number of cycles required to get the next datum from memory and write it to SBUF. Thus the number of non-useful clock cycles per byte is (16 - LOAD_TIME).

Now most of you are probably thinking that I’m nuts for advocating this approach – and I’d tend to agree with you! It’s a technique I’ve only used a few times – and then only when I had to get the data out with the least possible latency and with the least amount of power consumed. I much prefer the next technique which can be almost as efficient – but a lot safer.

Polling

Polling differs from the open loop approach in that one polls a status register to determine when it is safe to write the next byte. This can be quite power efficient as long as, as in the previous example, the transmit speed is very high in comparison to the CPU speed. Thus the SPI link given in the open loop example is also a good candidate for this approach. The code looks something like this:

SBUF = datum[0];
wait_for_sbuf_empty();
SBUF = datum[1];
wait_for_sbuf_empty();
...

The key to making this approach as efficient as possible is to code the wait function so that you read the status register on the first clock after you expect SBUF will become available.  In other words you still use a pre-calculated delay, but you throw in a check of the status register just to make sure before you load the next byte. By pre-fetching the next byte to be loaded and doing some other tweaking it’s often possible to get this approach almost as efficient as the open loop method. Notwithstanding these optimizations, the number of non-useful polling clock cycles will be greater than the number of CPU clocks required to transmit the data.
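Here is one way the wait function might look. This is a sketch only: ‘SPI_STATUS’ and ‘TX_EMPTY’ are stand-ins for whatever your part’s registers and bits are actually called, and the delay count would need tuning per target.

```c
/* Hypothetical sketch of wait_for_sbuf_empty(): delay for roughly the
   expected shift-out time first, then confirm with the status bit before
   returning.  'SPI_STATUS' and 'TX_EMPTY' mock the real hardware names. */
#define TX_EMPTY 0x01u

volatile unsigned char SPI_STATUS = TX_EMPTY;   /* mock of a hardware SFR */

static void delay_cycles(unsigned int n)
{
    while (n-- > 0u)
    {
        ;                       /* crude busy-wait; tune per target clock */
    }
}

void wait_for_sbuf_empty(void)
{
    delay_cycles(16u);          /* expected shift time: no point polling yet */

    while ((SPI_STATUS & TX_EMPTY) == 0u)
    {
        ;                       /* should fall through almost immediately */
    }
}
```

The point of the up-front delay is that every poll of the status register before the shift could possibly have completed is a wasted, power-consuming read.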

Interrupting

When the transmit frequency starts to slow down with respect to the CPU frequency, the number of non-useful clock cycles quickly starts to rise if one uses a polling method. The classic example of this is of course asynchronous serial links running at standard baud rates. In these cases, the transmit time is a large fraction of a millisecond and a polling approach consumes a huge number of CPU cycles (and hence power). The solution here is of course to turn to an interrupt driven approach. In this case the overhead of the ISR is ‘non-useful’ clock cycles. As I showed in this article the overhead of even a simple looking ISR can be quite significant. Notwithstanding this, for asynchronous serial links, an interrupt based approach is nearly always the most efficient.
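For completeness, a bare-bones sketch of the interrupt-driven pattern follows. The register name is a mock, the buffering scheme is the simplest one possible, and a real port would also enable the TX-empty interrupt in uart_send() and disable it when the buffer runs dry.

```c
#include <stddef.h>

/* Hypothetical sketch of interrupt-driven transmit.  'UART_TXBUF' mocks
   the hardware data register. */
volatile unsigned char UART_TXBUF;

static const unsigned char *tx_ptr;
static volatile size_t      tx_remaining;

void uart_send(const unsigned char *data, size_t len)
{
    if (len == 0u)
    {
        return;
    }
    tx_ptr       = data;
    tx_remaining = len;
    UART_TXBUF   = *tx_ptr++;   /* prime the first byte; the ISR does the rest */
    tx_remaining--;
}

/* Imagined TX-empty ISR: feed the next byte, or go quiet when done */
void uart_tx_isr(void)
{
    if (tx_remaining > 0u)
    {
        UART_TXBUF = *tx_ptr++;
        tx_remaining--;
    }
}
```

Between interrupts the CPU is free to do useful work or sleep, which is exactly what makes this approach win at low baud rates.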

Hybrid

The final methodology is what I term the hybrid approach. It’s typically the most power efficient and is well suited to medium to fast serial links. The code for it looks like this:

SBUF = datum[0];
__sleep();
SBUF = datum[1];
__sleep();
...

__interrupt void sbuf_tx_isr(void)
{
 /* Empty */
}

In this approach, I enable the transmit interrupt, but have no code in the interrupt handler. After each write to SBUF I execute a sleep instruction, effectively stopping op code processing. Once SBUF has emptied, it generates an interrupt. The processor vectors to the empty ISR, returns immediately and then processes the next instruction which stores the next byte in SBUF. In this case the overhead is the number of clock cycles to enter and exit sleep mode, plus the number of cycles to vector to an ISR and return. Depending upon your processor architecture this can be anything from almost nothing to quite a lot. However it is always less than a full blown interrupt handler approach and is, in my experience, often less than the polling or open loop methods.

Notwithstanding the above, this method has several weaknesses:

  1. It should be obvious that the only interrupt that can be enabled is the SBUF transmit interrupt. (Actually it’s more accurate to say that the only interrupt that can cause the processor to exit sleep mode is the SBUF transmit interrupt. The MSP430, for example, allows one to do this).
  2. While I don’t consider this a kludge, it’s certainly not crystal clear what is going on. Thus clear documentation is a must.

Summary

  1. If you feel the need for the utmost efficiency then go open loop. It’s a bit like drag-racing in that it’s fast, furious and undoubtedly gets you from A to B ASAP. Just don’t be surprised if you blow up in the process.
  2. If open-loop isn’t for you then polling may make sense provided you can crank up the transmit speed high enough. This makes for the simplest code – and that’s always a plus in my book.
  3. If you have an asynchronous link, then an interrupt based approach is the right answer 99% of the time.
  4. If you have a medium to high speed link, then the hybrid approach has much to commend it. Once you’ve seen it done a few times it becomes less weird looking.


Considerate coding

May 3rd, 2010 by Nigel Jones

One of my major recreational pursuits is bike riding. I live in a rural area with some great terrain, and more to the point a very low traffic density. Naturally on a 5 or 6 hour ride one does encounter some traffic and I’m always struck by the different degrees of consideration afforded to cyclists by motorists. Some are extremely solicitous and will wait so that they can pass you slowly and with a wide separation; others are complete jerks and will pass you as close and as fast as possible, often sounding their horn as they go by. Then there are the bulk of the drivers who will attempt to give you as much room as possible commensurate with slowing themselves down as little as possible. I was pondering this view of human nature yesterday while out on a ride, when it occurred to me that I see a similar range of consideration when it comes to embedded software. To see what I mean, read on …

I’ve mentioned several times in this blog that the main purpose of source code is not to generate a binary image but rather to educate those that come after you (including possibly a future version of yourself). You may or may not subscribe to this belief. However once you realize that source code often has a life of decades, and that the same source code may end up in dozens of products, then perhaps you may start to change your mind. So with this said, I think I can make a number of observations.

  1. It may be obvious to you, the author of the code, what the intended compilation platform is – after all it’s the one you are using. Alas it is not obvious to someone else who has been handed the source code and told to use it. (I ran into this problem six months ago with a very large ARM project – but with no indication of which ARM compiler it was intended for).
  2. It may also be obvious to you what hardware platform the code is intended for – again it’s the one you are working on.
  3. It may also be obvious to you that the way to build the various targets is to change to directory X and invoke command Y with parameter Z – after all you do it ten times a day.
  4. It may also be obvious to you that the 27 warnings produced during the final build are benign – as after all you have checked them out.

However none of the above is clear to someone 5 years from now!

Clearly the above is just a partial list of what I call implicit information about a project. That is information that is essential to being able to use the code base, but which is often omitted from the documentation by the author. It’s my contention that the degree to which you explicitly provide this implicit information governs whether you are a jerk, a typical coder, or a considerate coder. Most of us (myself included) are typical coders (and I know this because I’ve seen a lot of code). If you want to make the move up to being a considerate coder, then here’s a few things I suggest you do.

  1. Place all the implicit information in main.c. Why is this you ask? Well if I was to dump three hundred source files on you, which one would you look at first? (An acceptable alternative is to state in main.c that useful information may be found in file X. Be aware, however, that non-obvious source files sometimes get stripped out of source code archives).
  2. Include in main as a minimum information about the compiler (including its version), the intended hardware target, and how to build the code.
  3. Think for a minute or two about all the other information you are implicitly using in writing the source code and building it – and take the time to include it in main.c. Typically this includes additional tools, scripts etc.
  4. For an excellent discourse on why leaving warnings in your code is downright inconsiderate, see this posting from Alan Bowens.
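To show what I mean, here is an illustrative header-comment template for main.c. Every specific in it (project, compiler version, board, build command, extra tool) is invented; the point is the categories of information, not the values:

```c
/*
 * Project:  Hypothetical widget controller (illustrative template only)
 * Compiler: IAR EWARM 5.50.2 (example -- state yours, including version)
 * Target:   Example Corp. XYZ123 board, rev C hardware
 * Build:    cd build/ then run make TARGET=release (example invocation)
 * Tools:    also requires srec_cat for image post-processing (example)
 * Warnings: release build is expected to compile with zero warnings
 */
```

Five minutes spent writing a block like this saves the next engineer days of archaeology.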

If you do the above, then you are well on the way to becoming a ‘considerate coder’. Will doing this get you a pay increase, or at least a pat on the back from the boss – probably not. However just like the person who slows down and passes cyclists with a wide berth, you can go home at night knowing you aren’t a jerk. That has to be worth something.

Efficient C Tip #12 – Be wary of switch statements

April 10th, 2010 by Nigel Jones

This is the twelfth in a series of tips on writing efficient C for embedded systems. Like the previous topic, I suspect that this will be a bit controversial. As the title suggests, if you are interested in writing efficient C, you need to be wary of switch statements. Before I explain why, a little background will be useful. I did all of my early embedded systems programming in assembly language. This wasn’t out of some sense of machismo, it was simply a reflection of the fact that there were no high level languages available (with the possible exception of PL/M). Naturally as well as programming embedded systems I also did computer programming, initially in Pascal and BASIC, and later in C. One of the major differences I found in using the HLL was the wonderful switch / case statement. I found it to be a beautiful tool – with a few lines of source code I could do all sorts of powerful things that were simply very difficult or tedious to do in assembly language. Fast forward a number of years and C compilers began to become available for small embedded systems and so I naturally started using them, together with of course the attendant switch statement. All was well in paradise until the day I used a switch statement in an interrupt service routine and found to my horror that the ISR was taking about ten times longer to execute than I thought was reasonable.

This precipitated an investigation into how exactly switch statements are implemented by the compiler. When I did this, I discovered a number of things that should give one pause.

Heuristic Algorithms

The first thing I discovered is that compilers typically have a number of ways of implementing a switch statement. They seem to be loosely divided into the following trichotomy:

  1. An if-else-if-else-if chain. In this implementation, the switch statement is treated as syntactic sugar for an if-else-if chain.
  2. Some form of jump or control tables, or as they are sometimes called a computed goto. This is a favorite technique of assembly language programmers and the compiler writers can use it to great effect.
  3. A hybrid of 1 & 2.

Where it gets interesting is how the compiler decides which approach to use. If the case values are contiguous (e.g. zero through ten), then it’s likely the compiler will use some form of jump table. Conversely if the case values are completely disjointed (e.g. zero, six, twenty, four hundred and a thousand) then an if-else implementation is likely. However, what does the compiler do when, for example, you have a bifurcated set of ranges such as zero to ten and ninety to one hundred? Well, the answer is that each compiler seems to have some form of heuristic algorithm for determining the ‘best’ way of implementing a given set of cases. Although some compilers allow you to force a particular implementation, for the most part you are at the mercy of the compiler.

Comparative Execution Speeds

If you think about it, it should become apparent that a jump table approach is likely to give a highly consistent time of execution through the decision tree, whereas the if-else-if chain has a highly variable time of execution depending upon the particular value of the switched variable. Notwithstanding this, the jump table approach has a certain amount of execution overhead associated with it. This means that although its mean execution time (which is normally the same as its worst and best execution time) may be dramatically better than the mean execution time of the if-else-if chain, the if-else-if chain’s best execution time may be considerably better. So what, you say! Well, in some cases a particular value is far more likely to occur than the other values, thus it would be very nice if this value were tested first. However, as you will now see, this isn’t guaranteed…

Order of Execution

For many years I wrote switch statements under the assumption that the case values would be evaluated from top to bottom. That is, if the compiler chose to implement the switch statement as an if-else-if chain, then it would first test the first case, then the second case and so on down to the default case at the bottom of my source code. Well it turns out that my assumption was completely wrong. The compiler is under no such obligation, and indeed will often evaluate the values bottom to top. Furthermore, the compiler will often evaluate the default value first. For example consider a defaulted switch statement with contiguous case values in the range zero to ten. If the index variable is an unsigned int, then there are at least 65525 possible values handled by the default case, and so it makes sense to eliminate them first. Now if you know that the index variable can only possibly take on the values zero to ten, then you can of course eliminate the default statement – and then get excoriated by the coding standards / MISRA folks.

Maintenance

This is the area where I really get worried. Consider the case where you have a switch statement in an ISR. The code is working with no problems until one day it is necessary to make a change to the switch statement – by for example adding an additional case value. This simple change can cause the compiler to completely change the implementation of the switch statement. As a result, you may find that:

  1. The worst case execution time has jumped dramatically.
  2. The mean execution time has jumped dramatically.
  3. The stack space required by the ISR has jumped dramatically.

Any of these three possibilities can cause your program to fail catastrophically. Now of course one could argue ‘that’s why you test all changes’. However, in my opinion it’s far better to be proactive and to avoid putting yourself in this situation in the first place.

I’d also be remiss in not noting the dreaded missing break statement maintenance problem. However as a religious user of Lint, I’m not normally too concerned about this.

Switch statement alternatives

If performance and stability are your goals, then I strongly recommend that you implement your code the way you want it executed. This means either explicitly using an if-else-if chain or using function pointers. If function pointers scare you, then you might want to read this article I wrote on the subject.
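To illustrate the function-pointer alternative, here is a sketch (with invented event names) of a contiguous-case switch recast as an explicit jump table:

```c
/* Hypothetical sketch: a switch on a contiguous event code replaced by an
   explicit jump table of function pointers.  Dispatch is one bounds check
   plus one indexed call, so the cost is identical for every event and
   cannot silently change when a new event is added. */
typedef void (*handler_t)(void);

static unsigned int last_event;         /* just to make the demo observable */

static void on_start(void) { last_event = 0u; }
static void on_stop(void)  { last_event = 1u; }
static void on_reset(void) { last_event = 2u; }

static const handler_t handlers[] = { on_start, on_stop, on_reset };
#define NUM_EVENTS (sizeof handlers / sizeof handlers[0])

void dispatch(unsigned int event)
{
    if (event < NUM_EVENTS)             /* explicit 'default' handling */
    {
        handlers[event]();
    }
}
```

Unlike a switch, the dispatch mechanism here is visible in the source and stays fixed no matter what the compiler’s heuristics decide.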

Recommendations

Based on my experience, I have a number of things that I do when it comes to switch statements. If you find my analysis compelling, you may want to adopt them:

  1. Switch statements should be the last language construct you reach for – and not the first.
  2. Learn how to use function pointers. Once you do you’ll find a lot of the reasons for using switch statements go away.
  3. Try to keep all the case values contiguous.
  4. If you can’t keep the case values contiguous, go to the other extreme and make them disparate – that way you are less likely to have the compiler change the algorithm on you.
  5. If your compiler supports it, consider using pragmas to lock in a particular implementation.
  6. Be very wary of using switch statements in interrupt service routines or any other performance critical code.
  7. Use Lint to guard against missing break statements.

Comments welcome!
