Archive for May, 2009

Efficient C Tips #9 – Use lookup tables

Thursday, May 28th, 2009 Nigel Jones

This is the ninth in a series of tips on how to make your C code more efficient.

(Note: if you are looking for basic information on lookup tables, you should read this.)

Typically the fastest way to compute something on a microcontroller is not to compute it at all, but simply to read the result from a lookup table. For example, this is regularly done as part of CRC calculations. Despite this, I’ve noticed over the years what I’ll call the ‘lookup tables are boring’ syndrome. What do I mean by this? Well, when coding a solution to a problem, it seems that most of us would rather write something that crunches numbers than generate a table from which we just look up the result. I’m sure many of you are thinking that I’m dead wrong and that you use lookup tables all the time. No doubt many of you do. The question, however, is whether you make full use of the technique.
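
Lest the CRC remark seem abstract, here’s a sketch of what a table-driven CRC-8 looks like. This is my illustration rather than code from a specific project; it assumes a non-reflected polynomial, a zero initial value and no final XOR – substitute whatever your protocol specifies:

#include <stdint.h>

#define CRC8_POLY (0x07U) /* Example polynomial - use the one your protocol specifies */

static uint8_t crc8_table[256];

/* Build the table once at startup: entry i is the CRC of the single byte i */
void crc8_init(void)
{
 uint16_t i;
 uint8_t bit;

 for (i = 0U; i < 256U; i++)
 {
  uint8_t crc = (uint8_t)i;
  for (bit = 0U; bit < 8U; bit++)
  {
   crc = (crc & 0x80U) ? (uint8_t)((crc << 1) ^ CRC8_POLY) : (uint8_t)(crc << 1);
  }
  crc8_table[i] = crc;
 }
}

/* One table lookup per byte replaces eight shift-and-XOR steps */
uint8_t crc8(uint8_t const *buf, uint16_t len)
{
 uint8_t crc = 0U;

 while (len--)
 {
  crc = crc8_table[crc ^ *buf++];
 }
 return crc;
}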

What started me thinking about this was the person who ended up on this blog looking for an efficient algorithm for determining the day of the year. I have no idea whether they were coding for an embedded system, nor whether they were looking for a fast solution, a minimal memory solution, or something in between. However, it did make me realize that it would be a simple, albeit slightly contrived, way of demonstrating my point about lookup tables.

First off, I imposed some constraints:

  • The Gregorian calendar is to be used.
  • Days, months and years are numbered from 1, not 0, such that January 1 is day 1 and not day 0.

Here’s my first solution, that makes use of a small lookup table:

#define JAN_DAYS (31)
#define FEB_DAYS (28)
#define LY_FEB_DAYS (29)
...
#define NOV_DAYS (30)
#define MONTHS_IN_A_YEAR (12+1)

uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
 static const uint16_t days[2][MONTHS_IN_A_YEAR] =
 {
  { /* Non leap year table */
   0,    /* Padding because first month is not zero */
   0,    /* If month is January, then no days before it */
   JAN_DAYS, JAN_DAYS + FEB_DAYS, ..., JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
  },
  { /* Leap year lookup table */
   0,    /* Padding because first month is not zero */
   0,    /* If month is January, then no days before it */
   JAN_DAYS, JAN_DAYS + LY_FEB_DAYS, ..., JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
  }
 };

 uint16_t day_of_year;

 if (((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0))
 {
   /* Leap year */
   day_of_year = days[1][month] + day;
 }
 else
 {
  /* Non leap year */
  day_of_year = days[0][month] + day;
 }
 return day_of_year;
}
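
For reference, here is the same function with the sums written out in full so it can be compiled directly; the table values follow mechanically from the month lengths above:

#include <stdint.h>

uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
 /* Cumulative days before each month; entry 0 of each row is padding
    because months are numbered from 1, not 0 */
 static const uint16_t days[2][13] =
 {
  { 0, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 }, /* Non leap year */
  { 0, 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 }  /* Leap year */
 };

 if (((year % 4 == 0) && (year % 100 != 0)) || (year % 400 == 0))
 {
  return days[1][month] + day; /* Leap year */
 }
 else
 {
  return days[0][month] + day; /* Non leap year */
 }
}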

For most applications I think this is close to an optimal solution: it handles a very wide range of dates, uses a small amount of storage for the lookup tables, and requires minimal computational effort. (On an ARM7 it requires 128 bytes of code space plus 64 bytes for the lookup table, and executes in about 40 cycles.) But what if the code had to run as fast as possible? I’d guess that most folks would work on optimizing the details of the implementation and leave it at that. I’m not sure that many people would consider a gigantic lookup table, so that the code looks like this:

#define LAST_YEAR (2400 + 1) /* Last year to worry about */
uint16_t day_of_year(uint8_t day, uint8_t month, uint16_t year)
{
 static const uint16_t days[LAST_YEAR][MONTHS_IN_A_YEAR] =
 {
  {
   /* This entire row is padding because the first year is not zero */
   0,    /* Padding because first month is not zero */
   0,    /* If month is January, then no days before it */
   JAN_DAYS, JAN_DAYS + FEB_DAYS, ..., JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
  },

  {
    /* Year 1 - non leap year */
    0,    /* Padding because first month is not zero */
    0,    /* If month is January, then no days before it */
    JAN_DAYS, JAN_DAYS + FEB_DAYS, ..., JAN_DAYS + FEB_DAYS + ... + NOV_DAYS
  },

 ... 

  {
   /* Last year - a leap year */
   0,    /* Padding because first month is not zero */
   0,    /* If month is January, then no days before it */
   JAN_DAYS, JAN_DAYS + LY_FEB_DAYS, ..., JAN_DAYS + LY_FEB_DAYS + ... + NOV_DAYS
  },
};

return days[year][month] + day;
}

The lookup table would of course require at least 2401 * 13 * 2 = 62426 bytes. Clearly this would be unreasonable on an 8-bit processor; on a 32-bit processor with 8 MB of flash, not so unreasonable. I first learned this lesson many years ago in an application that required an 8051 processor to perform a complicated refresh of multiplexed LEDs at about 1 kHz (a significant load for the 8051). The initial implementation consisted of pure computation. Over the next year or so, the two of us working on it realized that we could speed it up by using lookup tables. We started off with a small lookup table, and by the time we were done the table was 48K (out of the 64K available to the 8051), while the execution time was a fraction of what it had been. So the next time you are faced with making something run faster, consider using a lookup table – even a huge one. Sometimes it’s simply the best way to go.

Next Tip

Previous Tip


Checking the fuse bits in an Atmel AVR at run time

Friday, May 15th, 2009 Nigel Jones

In general I try to post on topics that have broad appeal in the embedded world. Today I’m going to partially break with that tradition to show how to check the fuse bits in an Atmel AVR class processor. However, before I do so, I’d like to discuss my motivations for wanting to do this.

The AVR processor family, together with the PIC and other processor families, contains fuse / configuration bits. These bits are settable only at programming time and are used to configure the behavior of the processor at run time. Typical parameters configured this way are the oscillator type, brown-out detect voltage levels and memory partitioning. Now, as I lamented in this post, there is no great way of communicating to the production staff how you want these fuse bits programmed. As a result I consider there to be a very high probability that a mistake will be made in production – and that all my efforts at crafting perfect code will thus be for naught. While it is much better to prevent mistakes, if you can’t do so then the next best thing is to detect them. Consequently, on one of the products I am working on, one of the startup tests checks that the fuse bits are indeed what they are supposed to be. I recognize that if the fuse settings are dreadfully wrong it is unlikely that my code will run at all; I’m actually more concerned with the case where the fuse bits are set mostly correctly – so that the code works most of the time.

So how do I do this on an AVR? Well, if you are using an IAR compiler, the work is mostly done for you. Here it is:

#include <intrinsics.h>
#include <stdint.h>

/* Macros to read the various fuse bytes */
#define _SPM_GET_LOW_FUSEBITS()      __AddrToZByteToSPMCR_LPM((void __flash*)0x0000U, 0x09U)
#define _SPM_GET_HIGH_FUSEBITS()     __AddrToZByteToSPMCR_LPM((void __flash*)0x0003U, 0x09U)
#define _SPM_GET_EXTENDED_FUSEBITS() __AddrToZByteToSPMCR_LPM((void __flash*)0x0002U, 0x09U)

/* Structure to store the fuse bytes */
typedef struct
{
 uint8_t fuse_low;      /* The low fuse setting */
 uint8_t fuse_high;     /* The high fuse setting */
 uint8_t fuse_extended; /* The extended fuse setting */
 uint8_t lockbits;      /* The lockbits */
} FUSE_SETTINGS;

/* Storage for the fuse settings will be in EEPROM.
   FUSE_VALUES is a project-specific EEPROM address. */
static __eeprom __no_init FUSE_SETTINGS Fuse_Settings @ FUSE_VALUES;

void fuses_Read(void)
{
 FUSE_SETTINGS value;

 value.fuse_low = _SPM_GET_LOW_FUSEBITS();
 value.fuse_high = _SPM_GET_HIGH_FUSEBITS();
 value.fuse_extended = _SPM_GET_EXTENDED_FUSEBITS();
 value.lockbits = _SPM_GET_LOCKBITS();
 __no_operation(); /* Keep the fuse reads separate from the EEPROM write below - see note after the listing */

 Fuse_Settings = value;
}

The macro __AddrToZByteToSPMCR_LPM() is defined in intrinsics.h. Essentially it takes care of all the finicky register usage required to read the fuse bits. You’ll also notice that I have used a macro _SPM_GET_LOCKBITS() to read the lockbits; this macro is also found in intrinsics.h. The really observant reader may wonder why there isn’t already a macro in intrinsics.h for reading the fuse bits. Well, there is – but it only reads the low fuse byte, which is all the early AVR processors had. I’ve pointed this out to IAR and they have promised to address it in the next release (thanks Steve!).

Before I leave this topic, I’ll also point out that I don’t read the fuse settings directly into EEPROM. Instead I read them into RAM and then copy the entire structure to EEPROM. I do this because writing to EEPROM messes with the same registers used for reading the fuse bits – and thus bad things happen. This also explains the __no_operation() statement before the data are copied to EEPROM.
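
With the settings captured in EEPROM, the startup test itself reduces to a simple comparison. Here is a minimal sketch, in the same file as Fuse_Settings; the expected values are hypothetical placeholders – substitute the ones from your project’s programming specification:

#include <stdbool.h>

/* Hypothetical expected settings - taken from the programming spec */
#define EXPECTED_FUSE_LOW      (0xE2U)
#define EXPECTED_FUSE_HIGH     (0xD9U)
#define EXPECTED_FUSE_EXTENDED (0xFFU)

bool fuses_AreCorrect(void)
{
 return (Fuse_Settings.fuse_low == EXPECTED_FUSE_LOW) &&
        (Fuse_Settings.fuse_high == EXPECTED_FUSE_HIGH) &&
        (Fuse_Settings.fuse_extended == EXPECTED_FUSE_EXTENDED);
}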

Incidentally, I don’t know of a way to read the configuration bits of a PIC at run time. Chalk this up as one more reason why an AVR is superior to a PIC!


Signed versus unsigned integers

Saturday, May 9th, 2009 Nigel Jones

If you are looking for some basic information on signed versus unsigned integers, you may also find this post useful. That being said, on to the original post…

Jack Ganssle’s latest newsletter arrived the other day. Within it is an extensive set of comments from John Carter, in which he talks about and quotes from a book by Derek Jones (no relation of mine). The topic is unsigned versus signed integers. I have to say I found it fascinating in the same way that watching a train wreck is fascinating. Here’s the entire extract – I apologize for its length – but you really have to read it all to understand my horror.

“Suppose you have a “Real World (TM)” always and forever positive value. Should you represent it as unsigned?

“Well, that’s actually a bit of a step that we tend to gloss over…

“As Jones points out in section 6.2.5 the real differences as far as C is concerned between unsigned and signed are…

” * unsigned has a larger range.

” * unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

” * mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand.

“For example I have a bit of code that generates code … and uses __LINE__ to tweak things so compiler error messages refer to the file and line of the source code, not the generated code.

“Thus I must do integer arithmetic with __LINE__ include subtraction of offsets and multiplication.

“* I do not care if my intermediate values go negative.

“* It’s hard to debug (and frightening) if they suddenly go huge.

“* the constraint is the final values must be positive.

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works. I.e. Line numbers are constrained to be positive, but that has nothing to do representation. Use the most convenient representation.

“C’s “unsigned” representation is useless as a “constrain this value to be positive” tool. E.g. A device that can only go faster or slower, never backwards:

unsigned int speed; // Must be positive.
unsigned int brake(void)
{
--speed;
}

“Was using “unsigned” above any help to creating robust error free code? NO! “speed” may now _always_ be positive… but not necessarily meaningful!

“The main decider in using “unsigned” is storage. Am I going to double my storage requirements by using int16_t’s or pack them all in an array of uint8_t’s?

“My recommendation is this…

” * For scalars use a large enough signed value. eg. int_fast32_t
” * Treat “unsigned” purely as a storage optimization.
” * Use typedef’s (and splint (or C++)) for type safety and accessor functions to ensure constraints like strictly positive. E.g.

typedef int_fast32_t velocity; // Can be negative
typedef int_fast32_t speed; // Must be positive.
typedef uint8_t dopplerSpeedImage_t[MAX_X][MAX_Y]; // Storage optimization

I read this, and quite frankly my jaw dropped. Now, the statements made by Carter / Jones concerning the differences between signed and unsigned are correct – but to call them the real differences is completely wrong. To make my point, I’ll first address his specific points – and then I’ll show you where the real differences lie:

Unsigned has a larger range

Yes it does. However, if this is the reason you are using an unsigned type, you’ve probably got other problems.

Unsigned does modulo arithmetic on overflow (which is hardly ever what you intend)

Yes it does, and au contraire – this is frequently exactly what I want (see for example this). Far more important, however, is the question of what a signed integer does on overflow. The answer is that the behavior is undefined: if you overflow a signed integer, the generated code is at liberty to do anything – including deleting your program or starting World War 3. I found this out the hard way many years ago. I had some PC code written for Microsoft’s Version 7 compiler. The code was inadvertently relying upon signed integer overflow to work a certain way. I then moved the code to Watcom’s compiler (Version 10, I think) and the code failed. I was really ticked at Watcom – until I realized what I had done, and that Watcom was perfectly within its rights to do what it did.

Note that this was not a case of porting code to a different target. This was the same target – just a different compiler.

Now let’s address his comment about modulo arithmetic. Consider the following code fragment:

uint16_t a, b, c, res;

a = 0xFFFF; //Max value for a uint16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow
res -= c;

Does res end up with the expected value of 0xFFFE? Yes it does – courtesy of the modulo arithmetic. Furthermore it will do so on every conforming compiler.

Now let’s repeat the exercise using signed data types:

int16_t a, b, c, res;

a = 32767; //Max value for an int16_t
b = 1;
c = 2;

res = a;
res += b; //Overflow - WW3 starts
res -= c;

What happens now? Who knows? On your system you may or may not get the answer you expect.

Mixing signed and unsigned operands in an expression involves arithmetic conversions you probably don’t quite understand

Well, whether I understand them or not is really between me and Lint. The key thing to know, however, is that if you use signed integers by default, it is really hard to avoid combining signed and unsigned operands. How so? Consider the following partial list of standard ‘functions’ that return an unsigned integral type:

  • sizeof()
  • offsetof()
  • strcspn()
  • strlen()
  • strspn()

In addition, memcpy(), memset(), strncpy() and others use unsigned integral types in their parameter lists. Furthermore, in embedded systems most compiler vendors typedef I/O registers as unsigned integral types, so any expression involving a register also involves unsigned quantities. If you use any of these in your code, then you run a very real risk of signed / unsigned arithmetic conversions. Thus IMHO the usual arithmetic conversions issue is actually an argument for avoiding signed types – not the other way around!
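
To see the sort of surprise these conversions can spring, consider this small sketch (my own example, not from the newsletter); on typical platforms, where size_t has at least the rank of int, the negative operand is converted to a huge unsigned value:

#include <stdio.h>
#include <string.h>

int main(void)
{
 char const *msg = "hello";
 int offset = -1;

 /* strlen() returns size_t, which is unsigned. The usual arithmetic
    conversions turn offset into a huge unsigned value, so the test fails */
 if (offset < strlen(msg))
 {
  printf("This is what you expected\n");
 }
 else
 {
  printf("Surprise: -1 is not less than 5 here\n");
 }
 return 0;
}

So what are the real reasons to use unsigned data types? These reasons are high on my list: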

  • Modulus operator
  • Shifting
  • Masking

Modulus Operator

One of the relatively unknown but nasty corners of the C language concerns the modulus operator. In a nutshell, using the modulus operator on signed integers when one or both operands is negative produces an implementation-defined result. Here’s a great example in which they purport to show how to use the modulus operator to determine whether a number is odd or even. The code is reproduced below:

#include <stdio.h>

int main(void)
{
 int i;

 printf("Enter a number: ");
 scanf("%d", &i);

 if ((i % 2) == 0) printf("Even");
 if ((i % 2) == 1) printf("Odd");

 return 0;
}

When I run it on one of my compilers and enter -1, nothing gets printed, because on my system -1 % 2 is -1. The bottom line: using the modulus operator with signed integral types is a disaster waiting to happen.
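
If you must accept signed input, one defensive rewrite (my sketch, not from the linked example) is to test the remainder against zero, since for odd inputs the remainder is nonzero whatever its sign:

#include <stdio.h>

int main(void)
{
 int i;

 printf("Enter a number: ");
 scanf("%d", &i);

 /* For odd i, (i % 2) may be 1 or -1 depending on the sign of i and
    the implementation - but it is never 0, so test against 0 instead */
 if ((i % 2) == 0)
 {
  printf("Even");
 }
 else
 {
  printf("Odd");
 }
 return 0;
}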

Shifting

Performing a right shift on a negative signed integer gives an implementation-defined result: you have no idea whether the sign bit is propagated (an arithmetic shift) or zeroes are shifted in (a logical shift). The implications of this are quite profound. If foo is an unsigned integral type, then foo >> 1 is equivalent to foo / 2. However, if foo is a signed type holding a negative value, then a right shift is most certainly not the same as a divide by 2 – and will typically generate different code. It’s for this reason that Lint, MISRA and most good coding standards reject any attempt to right shift a signed integral type. BTW, left shifts on signed types are scarcely any safer – shifting into the sign bit is undefined – so I really don’t recommend them either.
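
Here is a small sketch illustrating the difference (my example; it assumes a two’s complement machine with an arithmetic right shift, which is common but not guaranteed):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
 int16_t foo = -7;
 uint16_t bar = 7U;

 /* With an arithmetic right shift, -7 >> 1 yields -4 (rounding toward
    negative infinity), whereas -7 / 2 yields -3 (C99 division truncates
    toward zero) - so the shift is not a divide by 2 */
 printf("signed:   %d %d\n", foo >> 1, foo / 2);

 /* For unsigned types the two are always identical: both print 3 */
 printf("unsigned: %u %u\n", (unsigned)(bar >> 1), (unsigned)(bar / 2U));

 return 0;
}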

Masking

A similar class of problems occurs if you attempt to perform masking operations on a signed data type: bitwise operations on negative values expose the underlying integer representation, so the results may not be what you expect.

Finally…

Before I leave this post, I just have to comment on this quote from Carter:

“Either I must be _very_ careful to code and test for underflows _before_ each operation to ensure intermediate results do not underflow. Or I can say tough, convert to 32bit signed int’s and it all just works”.

Does anyone else find this scary? He seems to be advocating that rather than think about the problem at hand, he’d rather switch to a large signed data type and trust that everything works out OK. He obviously thinks he’s on safe ground. However, consider the case where he has a 50,000 line file (46342 lines, to be exact). Is this unreasonably large? For a human-generated file, yes. For a machine-generated file (e.g. an embedded image file) it is not unreasonable at all. Furthermore, let’s assume that his computations for some reason involve squaring the number of lines in the file, i.e. we get something like this:

int32_t lines, result;

lines = 46342;
result = lines * lines + some_other_expression; /* 46342 * 46342 = 2147580964, which exceeds INT32_MAX */

Well, 46342 * 46342 overflows a signed 32-bit type – and the result is undefined. The bottom line: using a larger signed data type to avoid thinking about the problem is not recommended. At least if you use an unsigned type you are guaranteed a consistent answer.
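
As a footnote, the identical computation is perfectly well behaved when performed with an unsigned 32-bit type, since 2147580964 fits comfortably below UINT32_MAX. A small sketch (my example):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
 uint32_t lines = 46342UL;
 uint32_t result;

 /* 46342 * 46342 = 2147580964, which exceeds INT32_MAX (2147483647)
    but fits in a uint32_t - this is fully defined behavior */
 result = lines * lines;
 printf("%lu\n", (unsigned long)result);

 return 0;
}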


Doxygen

Saturday, May 2nd, 2009 Nigel Jones

Today’s post was inspired by a new version notice from Dimitri van Heesch concerning his great documentation generator tool doxygen. If you aren’t aware of doxygen, then I strongly recommend reading about it and then using it.

So what exactly is Doxygen? It has a lot of capabilities, but in a nutshell it can parse your code (C, C++, Java and a host of other languages not usually used in the embedded space) and generate from it a very nice hyper-linked documentation set. It does this in part by looking for what I’ll call control directives embedded in comments. What I particularly like about Doxygen is that it lets you trade off the number of control directives you add against the readability of your comments. At one extreme you can do nothing special to your code and still end up with a reasonable documentation set. At the other extreme, you can embed so many control directives into your comments that the only sane way to read them is via Doxygen; the documentation, however, will be truly impressive! In my case I find control directives very distracting, so I opt for a minimal set that doesn’t offend my sensibilities but still gives me very useful results.
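
For a flavor of what a minimal level of markup can look like, here is a sketch (my own example, not from a particular project). Doxygen treats a /** comment as documentation, and directives such as @param and @return describe the interface:

/**
 * Compute the CRC-16 of a buffer.
 *
 * @param[in] buf Pointer to the data to be checksummed.
 * @param[in] len Number of bytes in the buffer.
 * @return        The computed CRC.
 */
uint16_t crc16(uint8_t const *buf, uint16_t len);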

So why do I do this? Well, while the documentation set is very nice in its own right, I actually find it very useful for improving my code. As remarkable a claim as that is, it’s easily substantiated. Here are a few examples:

Call Trees

One of the very nice add-ons to Doxygen is graphviz. Using graphviz, Doxygen will generate call trees for all of your functions. I often find this very illuminating – both at a macro level and at a micro level. At the macro level, if I see a call tree that looks like your average two-year-old’s artwork, it’s a clear indication of muddled thinking – and impending doom. At the micro level it allows you to spot certain errors. For example, consider this code fragment, which is intended to update a parameter in an EEPROM data structure together with its backup copy:

void params_NosChargesSet(uint16_t nos_charges)
{
 Factory_Params1.n_charges = nos_charges;
 update_factory1_crc();
 Factory_Params2.n_charges = nos_charges;
 update_factory1_crc();
}

I found the bug in this code not by testing it, but by simply browsing the Doxygen documentation and noticing that the call tree for this function was incorrect. What I liked about this is that this kind of bug is very difficult to detect through testing, and will not be noticed by static analysis. It was, however, clear as day in the call tree.

Missing documentation

Sometimes, when I’m anxious to solve ‘the real problem’, I find that I’m not as diligent as I should be about describing the use of manifest constants, variables and so on. As a result I’ll sometimes end up with code that looks like this:

#define SHORT_TERM_BUF_SIZE (8U) /**< meaningful comment */
#define LONG_TERM_BUF_SIZE (32U)

You’ll notice that LONG_TERM_BUF_SIZE has no comment associated with it. However, its use is “obvious” because of the comment associated with SHORT_TERM_BUF_SIZE immediately preceding it. Well, when you generate the Doxygen documentation and click on the hyperlink associated with LONG_TERM_BUF_SIZE – guess what – no description. While some may see this as a weakness in Doxygen, I actually think it’s a major strength. Here’s why:

  • My coding standard requires me to provide a comment for all manifest constants. Thus it is reminding me of the error of my ways.
  • Someone new coming to the code will typically be overwhelmed by what they are faced with. Having an ‘implicit comment’ is just one more hurdle for them to overcome. Thus Doxygen is accurately reflecting what someone will see when they read your code.

Is Doxygen perfect? No, it’s not. It often hangs when I run it – although, to be fair, that’s usually because I haven’t played by the rules. Despite this, I find it a useful tool in my arsenal, and I recommend you take a look at it.
