Archive for June, 2009

Thoughts on BCCs, LRCs, CRCs and being experienced

Saturday, June 20th, 2009 Nigel Jones

Those of us who have been working in this field for a long time are referred to as ‘experienced’. Experienced is taken to mean that we have been doing this long enough to have encountered many of the problems common to embedded systems, and thus know how to solve them. Although this is true for many things, I think there is a downside – namely that because we’ve successfully solved a particular problem a number of times, we fall into the trap of thinking that our solution is optimal. To guard against this it is essential to be proactive in seeking out new solutions to old problems. To illustrate my point, I’ll take you on an abbreviated trip down the memory lane of my career when it comes to that most prosaic of problems – transmitting serial data between microcontrollers.

Back when I was a lad I was by definition naive and so I just transmitted the data without any thought to how to detect errors beyond the use of a parity bit on each byte. Well it didn’t take me long to work out that a simple parity bit wasn’t exactly a robust way of detecting errors, and so I started appending a simple additive checksum to the message.

Well that worked for a while until the day it dawned on me that an additive checksum without an initial seed value was vulnerable to a stuck channel (e.g. all zeros). From that day on I started seeding my checksum computations with initial values. I tended to favour 0x2B (with apologies to Hamlet).

Somewhere along the road I switched from performing an additive checksum to using an XOR operation. I can’t remember why I did this – but it just seemed ‘better’.
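To make the progression concrete, here is a minimal sketch of the two schemes, using the 0x2B seed mentioned above. The function names are mine, and real message formats would of course wrap these in framing logic:

```c
#include <stdint.h>
#include <stddef.h>

/* Additive checksum with a non-zero seed.  The seed ensures that an
   all-zeros message (e.g. a stuck channel) does not check out as valid. */
uint8_t additive_checksum(const uint8_t *msg, size_t len)
{
    uint8_t sum = 0x2Bu;        /* seed - any non-zero value works */

    while (len--)
    {
        sum += *msg++;
    }
    return sum;
}

/* XOR-based LRC, seeded the same way. */
uint8_t xor_lrc(const uint8_t *msg, size_t len)
{
    uint8_t lrc = 0x2Bu;        /* seed */

    while (len--)
    {
        lrc ^= *msg++;
    }
    return lrc;
}
```

Note that an all-zeros message yields the seed itself rather than zero, which is the whole point of seeding.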

This approach served me well for many years until I started investigating cyclic redundancy checks (CRCs). I’d known about CRCs for a long time of course. However, all the ones I knew about used 16 or 32 bit values and had certain wondrous but rather unspecified properties for detecting certain classes of errors. To put it bluntly, they seemed like complete overkill for sending a short message between two microprocessors – and so I didn’t entertain them. However, this all changed the day I came across an 8 bit CRC. This changed my perspective dramatically. An 8 bit CRC designed for protecting small messages – excellent! Thus henceforth I eschewed the use of an LRC and instead opted for an 8 bit CRC to protect my messages.
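For reference, an 8 bit CRC takes only a few lines of C. Here is a minimal bitwise sketch using the ATM-8 polynomial x^8 + x^2 + x + 1 (0x07) that I mention below; the zero initial value, MSB-first processing and function name are my choices, and a table-driven version would trade ROM for speed:

```c
#include <stdint.h>
#include <stddef.h>

#define CRC8_POLY 0x07u    /* ATM-8 polynomial x^8 + x^2 + x + 1 */

/* Bitwise CRC-8: MSB first, zero initial value, no final XOR. */
uint8_t crc8(const uint8_t *msg, size_t len)
{
    uint8_t crc = 0x00u;

    while (len--)
    {
        crc ^= *msg++;
        for (uint8_t bit = 0u; bit < 8u; bit++)
        {
            if (crc & 0x80u)
            {
                crc = (uint8_t)((crc << 1) ^ CRC8_POLY);
            }
            else
            {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```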

Well this continued for a number of years. I learned more about CRCs and got older, until one day I decided to ask myself the question – is the 8 bit CRC I am using optimal? Regular readers of this blog will probably have noticed that ‘optimal solutions’ is a recurring theme. Anyway, with this thought in mind, I set off on a hunt to determine whether in fact the 8 bit CRC I was using to protect small messages was indeed optimal. That’s when I came across this paper by Koopman and Chakravarty. It’s entitled ‘Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks’. It’s a highly readable and informative paper. They essentially investigate what constitutes ‘optimal’ for a CRC polynomial and then exhaustively explore optimal polynomials for different data lengths and different polynomial lengths. Most interestingly, they slay some sacred cows along the way, including the popular CRC-8 polynomial (x^8 + x^7 + x^6 + x^4 + x + 1).

Having read the paper, I discovered that the CRC I was using (the so called ATM-8 polynomial, x^8 + x^2 + x + 1) wasn’t bad for my application – but it wasn’t optimal. Upon reflection this was hardly surprising since I had essentially selected it on the basis that it was designed for a similar application to mine – and thus must be decent. However as Koopman shows – this can be a very foolhardy assumption. I just got lucky.

More importantly from my perspective is that using Koopman’s paper I now have a logical methodology for determining the optimal CRC for any application. Thus after close to 30 years of doing this I think I’m finally homing in on the truly optimal solution to this problem.

Of course, the larger lesson to be learned here is that having done something a certain way for many years means nothing unless you know that it is the optimal way of doing it. That’s when you are truly ‘experienced’.

Do I have the technical skills to be a consultant?

Wednesday, June 17th, 2009 Nigel Jones

My previous post on being a consultant addressed the issue of how to market yourself. Today I’ll look at something a little more prosaic – how can you tell if you have the necessary technical skills to be a consultant? This post was motivated by an email I received from Victor Johns who basically asked the aforementioned question.

Before I answer this question, I should note that while technical skills are essential to being a successful consultant, they are by no means sufficient. I’ll leave it to another day to discuss the sales and business skills required to run a consulting business.

Anyway – on to the answer. Well my first and rather sardonic observation is that you don’t need to be technically competent at all. Just about every engineer I have ever met has unfortunately experienced the case of the clueless consultant – that is, someone who does more harm than good. While these individuals do of course exist, they are by no means ‘successful’, as they have to spend an inordinate amount of time winning new business because no one ever hires them a second time.

If we ignore the aforementioned clueless consultant, then I think my answer depends a bit on what sort of consultant you want to be. Some consultants are specialists and others are generalists. If you are a specialist, then essentially you are marketing yourself as the ‘go to guy’ in a narrow field. A good example might be Bluetooth. If you are promoting yourself as a Bluetooth expert then you had better know pretty much all there is to know about Bluetooth. However, what about the majority of consultants who are more generalists? In their case absolute knowledge is not as important as the ability to learn fast and to apply skills learned in one field to the field they are currently in. The reason I say this is that no sensible client will expect you to know ‘everything’ needed to do a particular job. Rather they expect that you have the fundamental skills upon which you can rapidly build in order to solve the problem. It’s for this reason that my ideal project is one with 30% ‘new stuff’. That is, I know exactly how to do 70% of the project, whereas the remaining 30% will require me to learn new tools / skills.

This of course brings up the issue of how one stays up to date. While there are many ways of doing this, I find textbooks to offer the best bang for the buck. Simply put, a $100 textbook that saves me an hour on a project is a good investment. One that saves me a day is an outstanding investment. It’s for this reason that I have a stellar technical library.

As a parting comment I’ll note that we have all run into the occasional engineer who ‘knows’ they know it all – while actually being pedestrian. In my experience it’s the engineers who have a lot of confidence in their ability – but still realize that they can’t hope to ‘know it all’ – that ultimately will succeed in this business. I’m talking about you, Victor!

Three byte integers

Wednesday, June 10th, 2009 Nigel Jones

One of the enduring myths about the C language is that it is good for use on embedded systems. I’ve always been puzzled by this. While it is true that many other languages are dreadful for use on embedded systems, this merely means that C is less dreadful rather than ‘good’. While I have a host of issues with C, the one that constantly galls me is the lack of 3 byte integers. Using C99 notation these would be the uint24_t and int24_t data types. Now a quick web search indicates that there may be the odd compiler out there that supports 3 byte integers – but the vast majority do not.

So why exactly do I want a 3 byte integer? Well, there are two main reasons:

Intermediate results

When I look through my code, I find a huge number of instances where I am performing an arithmetic operation on a 16 bit value, where intermediate values overflow 16 bits, yet the final value fits in 16 bits. For example:

uint16_t a, b;

a = (b * 51) / 64;

In this case, the code will fail if (b * 51) overflows 16 bits. As a result, I am forced to write:

a = ((uint32_t)b * 51) / 64;

However, examination of this code shows that (b * 51) can never overflow 24 bits for any 16 bit b. Thus I’d much rather write:

a = ((uint24_t)b * 51) / 64;

Now obviously on a 32 bit processor there would be zero benefit to doing this (indeed there may be a penalty). However on an 8 bit (and probably a 16 bit) processor, there would be a dramatic benefit to such a construct.

Real world values

I regularly find myself needing a static variable that requires more than 16 bits of range. However when I look at these variables they almost never require the staggering range of a 32 bit variable. Instead 24 bits would do very nicely. Needless to say I am forced to allocate 32 bits even though I know that the most significant byte will never take on anything other than zero. This is particularly galling when these variables are stored in EEPROM – with its associated cost and long write times.
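Lacking compiler support, the usual workaround for storage is to pack the value into three bytes by hand. A sketch follows; the helper names and little-endian byte order are my choices:

```c
#include <stdint.h>

/* Store the low 24 bits of a value into 3 bytes, little-endian. */
void pack24(uint8_t dst[3], uint32_t value)
{
    dst[0] = (uint8_t)value;
    dst[1] = (uint8_t)(value >> 8);
    dst[2] = (uint8_t)(value >> 16);
}

/* Recover the 24 bit value from the 3 bytes. */
uint32_t unpack24(const uint8_t src[3])
{
    return (uint32_t)src[0] |
           ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16);
}
```

Of course this only addresses storage (e.g. the EEPROM case above); arithmetic on the unpacked value still takes place in 32 bits.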

Taking these two together across all the 8/16 bit embedded systems out there, the cost in wasted instruction cycles, memory, stack size and energy must be truly staggering. We could probably save a power plant or two world wide with all the energy being wasted!

So why don’t most compiler vendors support a 24 bit integer? I don’t know for sure, but I suspect it is some combination of:

  • No one has been asking for it.
  • They are more concerned with being C89 / C99 compliant than they are with being useful.
  • No one has ever implemented a compiler benchmark where support for a 3 byte integer would be useful.

If you happen to agree with me that a 3 byte integer would be very useful, then next time you see your friendly compiler vendor – complain (or at least point them to this blog). Who knows, change may yet come!


Division of integers by constants

Friday, June 5th, 2009 Nigel Jones

An issue that comes up frequently in embedded systems is division of an integer by a constant. Of course most of the time we try to arrange things such that the divisor is a power of two, so that the division may be performed by shift operations. However, all too often we have to divide an integer by some non power of two value. Divisors that seem to crop up a lot are 10 & 100 (for obvious reasons), 3 (for no good reason), 60 (when dealing with time) and of course various combinations of pi and root 2. In cases like these you can of course just code it ‘normally’ and let the compiler do the work for you. However, when you feel the need for speed, there are other techniques that are spectacularly good.

I learned about this subject in dribs and drabs over the years without ever coming across a good summary – until I located this paper by Douglas Jones (no relation). It does a nice job of explaining most of what you need to know in order to perform division of an integer by a constant. I particularly like the fact that he has algorithms for CPUs that contain barrel shifters – and those that do not. I strongly recommend that you read the paper. One note of caution however – Jones, like many academics, is used to working on CPUs with 32 bit word lengths. As such, his code assumes that integers are 32 bits. If you use his code as is, then it will fail on 16 bit word length machines. It’s for reasons such as this that I really recommend that everyone use the C99 data types.

For those of you too lazy to read the paper, its basic premise is that division by a constant is equivalent to multiplication by the reciprocal of that constant. There is of course nothing earth shattering about this observation. However, Jones then goes ahead and explains binary points, rounding, etc. in order to achieve the desired result. Since I had to reduce his paper to practice, I thought I’d go ahead and share the ‘recipe’ with you. Before doing so I should note that I work mostly with 8 & 16 bit CPUs that do not contain barrel shifters. As a result I am most interested in the techniques that use multiplication. If you are working with a 32 bit processor with a barrel shifter and an instruction cache then you should seriously look at his other implementations.

Division of a uint16_t by a constant K

In the steps that follow, there is no requirement that K be an integer. It must however be greater than 1.
There are two recipes. The first works for many divisors (but not all) and is the faster of the two. The second recipe gives better results for all inputs – but produces less efficient code. While I am sure that there is some analytical way of making the determination ahead of time, I’ve found it easier to use the first recipe and exhaustively test it. If it works – great. If not, then switch to the second recipe.

In the following descriptions, Q is the quotient (i.e. the result) of dividing an unsigned integer A by the constant K.

Recipe #1
  1. Convert 1 / K into binary. There is a nice web based calculator here that will do the job.
  2. Take all the bits to the right of the binary point, and left shift them until the first bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 17 bits and add 1 and then truncate to 16 bits. This effectively rounds the result.
  4. Express the remaining 16 bits to the right of the binary point as a 4 digit hexadecimal number M of the form hhhh.
  5. Q = (((uint32_t)A * (uint32_t)M) >> 16) >> S
  6. Perform an exhaustive check for all A. If necessary adjust M or try recipe #2.

Incidentally, you may be wondering why I don’t use the form espoused by Jones, namely: Q = (((uint32_t)A * (uint32_t)M) >> (16 + S))
The answer is that this requires a right shift of a 32 bit integer by 16 + S places. By splitting the shift into two as shown and by making use of the C integer promotion rules, the expression becomes:

  1. Right shift a 32 bit integer 16 places and convert to a 16 bit integer. This effectively means just use the top half of the 32 bit integer.
  2. Right shift the 16 bit integer S places.

This is dramatically more efficient on an 8 or 16 bit processor. On a 32 bit processor it probably is not.

Recipe #2

  1. Convert 1 / K into binary.
  2. Take all the bits to the right of the binary point, and left shift them until the first bit to the right of the binary point is 1. Record the required number of shifts S.
  3. Take the most significant 18 bits and add 1 and then truncate to 17 bits. This effectively rounds the result.
  4. Express the 17 bit result as 1hhhh. Denote the hhhh portion as M.
  5. Q = (((((uint32_t)A * (uint32_t)M) >> 16) + A) >> 1) >> S;
  6. Perform an exhaustive check for all A & Q. If necessary adjust M.

Again I split the shifts up as shown for efficiency on an 8 / 16 bit machine.

Example 1 – Divide by 30

In this case I wish to divide a uint16_t by 30.

  1. Convert to binary. 1 / 30 = 0.000010001000100010001000100010001000100010001000100010001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 4 shifts and we get 0.10001000100010001000100010001000100010001000100010001. S is thus 4.
  3. Take the most significant 17 bits: 1000 1000 1000 1000 1
  4. Add 1: giving 1000 1000 1000 1000 1 + 1 = 1000 1000 1000 1001 0
  5. Truncate to 16 bits: 1000 1000 1000 1001
  6. Express in hexadecimal: M = 0x8889
  7. Q = (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4

An exhaustive check confirms that this expression does indeed do the job for all 16 bit values of A. It is also about 10 times faster than the compiler division routine on an AVR processor.
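Wrapped up as a function, the recipe #1 result looks like this (the function name is mine):

```c
#include <stdint.h>

/* Divide a uint16_t by 30 via recipe #1: M = 0x8889, S = 4.
   The 32 bit product is used only transiently; after the >> 16
   everything happens in 16 bits. */
uint16_t div30(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * (uint32_t)0x8889u) >> 16) >> 4);
}
```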

Example 2 – Divide by 100

In this case I wish to divide a uint16_t by 100. This is one of those cases where we need 17 bit resolution.

  1. Convert to binary. 1 / 100 = 0.00000010100011110101110000101000111101011100001010001111011
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 6 shifts and we get 0.10100011110101110000101000111101011100001010001111011. S is thus 6.
  3. Take the most significant 18 bits: 1 0100 0111 1010 1110 0
  4. Add 1: 1 0100 0111 1010 1110 0 + 1 = 1 0100 0111 1010 1110 1
  5. Truncate to 17 bits: 1 0100 0111 1010 1110
  6. Express in hexadecimal: 1 47AE. Thus M = 0x47AE
  7. Q = (((((uint32_t)A * (uint32_t)0x47AE) >> 16) + A) >> 1) >> 6;

An exhaustive check shows that the division is not exact for all A. I thus incremented M to 0x47AF and got exact results for all A. This code was about twice as fast as the compiler division routine on an AVR processor.
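In function form, with the corrected M of 0x47AF (the function name is mine; note that the intermediate sum must stay in 32 bit arithmetic, since the sum can exceed 16 bits):

```c
#include <stdint.h>

/* Divide a uint16_t by 100 via recipe #2: M = 0x47AF, S = 6. */
uint16_t div100(uint16_t a)
{
    uint32_t t = ((uint32_t)a * (uint32_t)0x47AFu) >> 16;

    /* (t + a) can exceed 16 bits, so shift before narrowing. */
    return (uint16_t)(((t + a) >> 1) >> 6);
}
```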

Example 3 – Divide by π

This is an example where the resultant expression results in an approximate result. The approximation is very good though, with a quotient that is off by at most 1 for all A.

  1. Convert to binary: 1 / π = 0.010100010111110011000001101101110010011100100010001001
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 1 shift and we get 0.10100010111110011000001101101110010011100100010001001. S is thus 1.
  3. Take the most significant 18 bits: 1 0100 0101 1111 0011 0
  4. Add 1: 1 0100 0101 1111 0011 0 + 1 = 1 0100 0101 1111 0011 1
  5. Truncate to 17 bits: 1 0100 0101 1111 0011
  6. Express in hexadecimal: 1 45F3. Thus M = 0x45F3
  7. Q = (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1;

An exhaustive check that compared the result of this expression to (float)A * 0.31830988618379067153776752674503f showed that the match was exact for all but 263 values in the range 0 – 0xFFFF. Where there was a mismatch it is off by at most 1. It’s also 23 times faster than converting to floating point. Not a bad trade off.
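As a function (function name mine), with the caveat above that the quotient may be off by one:

```c
#include <stdint.h>

/* Approximate division of a uint16_t by pi via recipe #2:
   M = 0x45F3, S = 1.  The result is exact or low by 1. */
uint16_t div_pi(uint16_t a)
{
    uint32_t t = ((uint32_t)a * (uint32_t)0x45F3u) >> 16;

    return (uint16_t)(((t + a) >> 1) >> 1);
}
```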

Example 4 – Divide by 10 on an 8 bit value

This technique is obviously usable on 8 bit values. One just has to adjust the number of bits. Here’s an example:

  1. Convert to binary. 1 / 10 = 0.0001100110011001100110011001100110011001100110011001101
  2. Left shift until there is a 1 to the right of the binary point. In this case it requires 3 shifts and we get 0.1100110011001100110011001100110011001100110011001101. S is thus 3.
  3. Take the most significant 9 bits: 1100 1100 1
  4. Add 1: giving 110011001 + 1 = 110011010
  5. Truncate to 8 bits: 1100 1101
  6. Express in hexadecimal: M = 0xCD
  7. Q = (((uint16_t)A * (uint16_t)0xCD) >> 8) >> 3

An exhaustive check confirms that this expression does indeed do the job for all 8 bit values of A. It is also about 8 times faster than the compiler division routine on an AVR processor.
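In function form (function name mine), the 8 bit divide by 10 needs only 16 bit intermediate arithmetic:

```c
#include <stdint.h>

/* Divide a uint8_t by 10: M = 0xCD, S = 3.  The product fits
   comfortably in 16 bits (255 * 0xCD = 52275). */
uint8_t div10(uint8_t a)
{
    return (uint8_t)((((uint16_t)a * (uint16_t)0xCDu) >> 8) >> 3);
}
```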


Using the values generated by Jones, together with some of the values I have computed, here’s a summary of some common divisors for unsigned 16 bit integers.

Divide by 3: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 1
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
Divide by 6: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 2
Divide by 7: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 2
Divide by 9: (((uint32_t)A * (uint32_t)0xE38F) >> 16) >> 3
Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
Divide by 11: (((uint32_t)A * (uint32_t)0xBA2F) >> 16) >> 3
Divide by 12: (((uint32_t)A * (uint32_t)0xAAAB) >> 16) >> 3
Divide by 13: (((uint32_t)A * (uint32_t)0x9D8A) >> 16) >> 3
Divide by 14: (((((uint32_t)A * (uint32_t)0x2493) >> 16) + A) >> 1) >> 3
Divide by 15: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 3
Divide by 30: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 4
Divide by 60: (((uint32_t)A * (uint32_t)0x8889) >> 16) >> 5
Divide by 100: (((((uint32_t)A * (uint32_t)0x47AF) >> 16U) + A) >> 1) >> 6
Divide by PI: (((((uint32_t)A * (uint32_t)0x45F3) >> 16) + A) >> 1) >> 1
Divide by √2: (((uint32_t)A * (uint32_t)0xB505) >> 16) >> 0

Hopefully you have spotted the relationship between divisors that are related by factors of two. For example, compare the expressions for divide by 15, 30 & 60.

If someone has too much time on their hands and would care to write a program to compute the values for all integer divisors, then I’d be happy to post the results for everyone to use.
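As a starting point, here is a sketch of such a program for 16 bit dividends. It derives S and M directly from the recipes above (M is 2^(16+S)/K rounded for recipe #1, and 2^(17+S)/K rounded with the leading 1 dropped for recipe #2), searches a small neighbourhood of the computed M, and verifies each candidate exhaustively. The function names are mine, and K is assumed to be an integer that is not a power of two (for those, just shift):

```c
#include <stdint.h>

/* Exhaustively verify recipe #1: Q = ((A * M) >> 16) >> S. */
static int check1(uint32_t M, unsigned S, uint32_t K)
{
    for (uint32_t a = 0u; a <= 0xFFFFu; a++)
    {
        if ((((a * M) >> 16) >> S) != (a / K))
        {
            return 0;
        }
    }
    return 1;
}

/* Exhaustively verify recipe #2: Q = ((((A * M) >> 16) + A) >> 1) >> S. */
static int check2(uint32_t M, unsigned S, uint32_t K)
{
    for (uint32_t a = 0u; a <= 0xFFFFu; a++)
    {
        if ((((((a * M) >> 16) + a) >> 1) >> S) != (a / K))
        {
            return 0;
        }
    }
    return 1;
}

/* Find M and S for dividing a uint16_t by the integer constant K
   (K > 2, not a power of two).  Returns 1 on success; *recipe is 1 or 2. */
int find_divider(uint32_t K, uint32_t *M, unsigned *S, int *recipe)
{
    unsigned s = 0u;

    while ((1uL << (s + 1u)) <= K)      /* S = floor(log2(K)) */
    {
        s++;
    }

    /* Recipe #1: M is 2^(16+S) / K, rounded to nearest. */
    uint32_t base = (uint32_t)(((1uLL << (16u + s)) + (K / 2u)) / K);
    for (int d = -1; d <= 1; d++)
    {
        if (check1(base + (uint32_t)d, s, K))
        {
            *M = base + (uint32_t)d; *S = s; *recipe = 1;
            return 1;
        }
    }

    /* Recipe #2: M is 2^(17+S) / K rounded, with the leading 1 dropped. */
    base = (uint32_t)(((1uLL << (17u + s)) + (K / 2u)) / K) - 0x10000u;
    for (int d = -1; d <= 1; d++)
    {
        if (check2(base + (uint32_t)d, s, K))
        {
            *M = base + (uint32_t)d; *S = s; *recipe = 2;
            return 1;
        }
    }

    return 0;
}
```

Running it for the divisors above reproduces the table entries, including the adjusted 0x47AF for divide by 100.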


Alan Bowens has risen to the challenge and has generated some nifty programs for generating coefficients for arbitrary 8 and 16 bit values. He’s also generated header files for all 8 and 16 bit integer divisors that you can just include and use. You’ll find it all at his blog. Nice work Alan.