Archive for February, 2011

An embedded systems hardware test – a collaborative effort

Friday, February 25th, 2011 Nigel Jones

Regular readers will probably be aware that back in 2000 I wrote an article for Embedded Systems Programming magazine entitled A ‘C’ Test: The 0x10 Best Questions for Would-be Embedded Programmers. In the intervening years I have often thought that it would be entertaining / useful to come up with a similar test, except this time I would be testing someone’s hardware knowledge. As a result, over the years I have collected a number of fun questions, which I intend to use in the forthcoming article. However, it occurred to me that I have a lot of very smart readers and that collectively we could put together a far better test than I could on my own. Thus I’m looking for your hardware questions! Before you flood me with your suggestions, here are the ground rules:

  1. Embedded systems design, not hardware design
    The test is intended to test the hardware knowledge of persons writing embedded code. It is NOT a test for persons that will be designing hardware. Thus questions about the minutiae of hardware filter design are not what I’m looking for.
  2. Traps
    The best questions will be examples from your past where someone got into trouble because they didn’t understand something about the hardware that you thought they should have.
  3. Why
    As well as posing the question (and giving the answer!), please explain why you think it’s important that someone should know what you are asking.
  4. Oscilloscope and logic analyzer
    I expect that the questions will cover circuits, processor architectures and tools. While I’m interested in all three, I’m particularly interested in elegant questions that will allow the questioner to determine if the candidate knows how to use an oscilloscope or logic analyzer.
  5. Original
    Please don’t send me any copyrighted or plagiarized material. Links are of course fine. (I mention this not only because it is legally and morally wrong, but also because I’m tired of people ripping off my work and claiming it as their own.)
  6. Attribution
    If I choose to use your suggestion, then tell me how you’d like it attributed. Anything from full name plus email address through to anonymous is fine.
  7. Early bird…
    If I get multiple similar suggestions, then the first one received gets the credit.
  8. Fame
    By sending me something you are agreeing to let me publish it. Other than attribution (and the accompanying fame 🙂 ), no other compensation will be given.

Anyway, if you’d like to participate then contact me.

Thanks! I expect that I will publish the article in a few weeks.

Consulting as a leading economic indicator – update #2

Friday, February 25th, 2011 Nigel Jones

I have written before about consulting being a leading economic indicator. My hypothesis is that when companies need engineering help, but are unsure whether to take on employees, then they turn to consultants. Conversely when companies need to cut costs, the first to go are consultants and contractors. In short, consultants are the first to go in bad times and the first to be retained in good times. I posted an update in October 2010 where I reported that the consultants I know were seeing an increase in interest level – but not yet any real increase in actual work. So where are we 5 months later? Well my informal survey of other consultants confirms that the interest seen back in October has translated into a lot of work today. All the consultants I know are very busy; indeed their biggest problem seems to be managing demand. On this basis I’m quite confident that the embedded systems industry will see robust hiring here in the USA in the coming months. If you are looking to change jobs, it’s a good time to start dusting off the resume.

Efficient C Tip #13 – use the modulus (%) operator with caution

Tuesday, February 8th, 2011 Nigel Jones

This is the thirteenth in a series of tips on writing efficient C for embedded systems.  As the title suggests, if you are interested in writing efficient C, you need to be cautious about using the modulus operator.  Why is this? Well, a little thought shows that C = A % B is equivalent to C = A - B * (A / B). In other words the modulus operator is functionally equivalent to three operations: a division, a multiplication and a subtraction. As a result it’s hardly surprising that code that uses the modulus operator can take a long time to execute. Now in some cases you absolutely have to use the modulus operator. However, in many cases it’s possible to restructure the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to how this blog posting came about.
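As a quick sanity check of that identity, here is a minimal sketch. The function name mod_by_parts is purely illustrative (it is not from any library), and B is assumed to be non-zero.

```c
#include <stdint.h>

/* Hypothetical helper: computes a % b the long way, as a division,
 * a multiplication and a subtraction. b must be non-zero. */
uint32_t mod_by_parts(uint32_t a, uint32_t b)
{
    uint32_t quotient = a / b;      /* the division hidden inside %   */

    return a - (b * quotient);      /* ... then multiply and subtract */
}
```

For unsigned operands this returns the same value as a % b, which is exactly why a compiler without a hardware remainder instruction has to do roughly this much work for every %.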

Converting seconds to days, hours, minutes and seconds

In embedded systems design there is an increasing need for some form of real time clock. The designer typically implements the time as a 32 bit variable containing the number of seconds since a particular date. Once this is done, it’s not usually long before one has to convert the ‘time’ into days, hours, minutes and seconds. Well I found myself in just such a situation recently. As a result, I thought a quick internet search was in order to find the ‘best’ way of converting ‘time’ to days, hours, minutes and seconds. The code I found wasn’t great and as usual was highly PC centric. I thus sat down to write my own code.

Attempt #1 – Using the modulus operator

My first attempt used the ‘obvious’ algorithm and employed the modulus operator. The relevant code fragment appears below.

void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 seconds = time % 60UL;
 time /= 60UL;
 minutes = time % 60UL;
 time /= 60UL;
 hours = time % 24UL;
 time /= 24UL;
 days = time;  
}

This approach has a nice looking symmetry to it.  However, it contains three divisions and three modulus operations. I was thus rather concerned about its performance, and so I measured its speed on three different architectures – AVR (8 bit), MSP430 (16 bit), and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The cycle counts quoted are for 10 invocations of the test code and include the test harness overhead:

AVR:  29,825 cycles

MSP430: 27,019 cycles

ARM Cortex: 390 cycles

No that isn’t a misprint. The ARM was nearly two orders of magnitude more cycle efficient than the MSP430 and AVR. Thus my claim that the modulus operator can be very inefficient is true for some architectures – but not all.  Thus if you are using the modulus operator on an ARM processor then it’s probably not worth worrying about. However if you are working on smaller processors then clearly something needs to be done  – and so I investigated some alternatives.

Attempt #2 – Replace the modulus operator

As mentioned in the introduction,  C = A % B is equivalent to C = A - B * (A / B). If we compare this to the code in attempt 1, then it should be apparent that the intermediate value (A / B) computed as part of the modulus operation is in fact needed in the next line of code. This suggests a simple optimization to the algorithm.

void compute_time(uint32_t time)
{
 uint32_t    days, hours, minutes, seconds;

 days = time / (24UL * 3600UL);    
 time -= days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = time / 3600UL;
 time -= (hours * 3600UL);
 /* time now contains the number of seconds in the last hour */
 minutes = time / 60U;
 seconds = time - minutes * 60U;
}

In this case I have replaced three mods with three subtractions and three multiplications. Thus although I have replaced a single operator (%) with two operations (- *) I still expect an increase in speed because the modulus operator is actually three operators in one (- * /).  Thus effectively I have eliminated three divisions and so I expected a significant improvement in speed. The results however were a little surprising:

AVR:  18,720 cycles

MSP430: 14,805 cycles

ARM Cortex: 384 cycles

Thus while this technique yielded an improvement approaching a factor of two for the AVR and MSP430 processors, it had essentially no impact on the ARM code.  Presumably this is because the ARM has native support for the modulus operation. Notwithstanding the ARM results, it’s clear that at least in this example, it’s possible to significantly speed up an algorithm by eliminating the modulus operator.

I could of course just stop at this point. However, examination of attempt 2 shows that further optimizations are possible by observing that if time is a 32 bit variable, then days can be at most a 16 bit variable. Furthermore, hours, minutes and seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to use smaller data types.
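The 16 bit claim for days is easy to verify with a little arithmetic. The sketch below does the check at compile time; the macro and typedef names are my own and are only for illustration.

```c
#include <stdint.h>

#define SECONDS_PER_DAY (24UL * 3600UL)

/* Largest day count that a 32 bit seconds counter can produce:
 * 4294967295 / 86400 = 49710 (integer division). */
#define MAX_DAYS (UINT32_MAX / SECONDS_PER_DAY)

/* C99-compatible compile-time check: the array size becomes -1, and the
 * build fails, if MAX_DAYS ever stops fitting in 16 bits. */
typedef char max_days_fits_in_16_bits[(MAX_DAYS <= UINT16_MAX) ? 1 : -1];
```

Since 49710 is comfortably below 65535, a uint16_t is sufficient for days.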

Attempt #3 – Data type size reduction

My naive implementation of the code looked like this:

void compute_time(uint32_t time)
{
 uint16_t    days;
 uint8_t     hours, minutes, seconds;
 uint16_t    stime;

 days = (uint16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}

All I have done is change the data types and add casts where appropriate. The results were interesting:

AVR:  14,400 cycles

MSP430: 11,457 cycles

ARM Cortex: 434 cycles

Thus while this resulted in a significant improvement for the AVR & MSP430, it resulted in a significant worsening for the ARM. Clearly the ARM doesn’t like working with non 32 bit variables. Thus this suggested an improvement that would make the code a lot more portable – and that is to use the C99 fast types. Doing this gives the following code:

Attempt #4 – Using the C99 fast data types

void compute_time(uint32_t time)
{
 uint_fast16_t    days;
 uint_fast8_t    hours, minutes, seconds;
 uint_fast16_t    stime;

 days = (uint_fast16_t)(time / (24UL * 3600UL));    
 time -= (uint32_t)days * 24UL * 3600UL;
 /* time now contains the number of seconds in the last day */
 hours = (uint_fast8_t)(time / 3600UL);
 stime = time - ((uint32_t)hours * 3600UL);
 /*stime now contains the number of seconds in the last hour */
 minutes = stime / 60U;
 seconds = stime - minutes * 60U;
}

All I have done is change the data types to the C99 fast types. The results were encouraging:

AVR:  14,400 cycles

MSP430: 11,595 cycles

ARM Cortex: 384 cycles

Although the MSP430 time increased very slightly, the AVR and ARM stayed at their fastest speeds. Thus attempt #4 is both fast and portable.
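The fragments above compute the fields but never hand them back to the caller. One possible way of packaging attempt #4 is sketched below. The struct and function names are hypothetical, and this is not the code I benchmarked, so the cycle counts above do not apply to it.

```c
#include <stdint.h>

/* Hypothetical container for the broken-down time. */
typedef struct
{
    uint_fast16_t days;
    uint_fast8_t  hours;
    uint_fast8_t  minutes;
    uint_fast8_t  seconds;
} split_time_t;

/* Splits a count of seconds into days, hours, minutes and seconds
 * using divide, multiply and subtract rather than the % operator. */
void split_time(uint32_t time, split_time_t *result)
{
    uint_fast16_t stime;

    result->days = (uint_fast16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)result->days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    result->hours = (uint_fast8_t)(time / 3600UL);
    stime = (uint_fast16_t)(time - ((uint32_t)result->hours * 3600UL));
    /* stime now contains the number of seconds in the last hour */
    result->minutes = (uint_fast8_t)(stime / 60U);
    result->seconds = (uint_fast8_t)(stime - (result->minutes * 60U));
}
```

For example, passing in 90061 seconds should give 1 day, 1 hour, 1 minute and 1 second, since 90061 = 86400 + 3600 + 60 + 1.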

Conclusion

Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for further optimizations. As a result with the AVR & MSP430 I was able to more than halve the execution time.

Converting Integers for Display

A similar problem (with a similar solution) occurs when one wants to show integers on a display. For example, if you are using a custom LCD panel with, say, a 3 digit numeric field, then the problem arises as to how to determine the value of each digit. The obvious way, using the modulus operator, is as follows:

void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value > 999)
 {
  value = 999;
 }

 lsd = value % 10;
 value /= 10;
 nsd = value % 10;
 value /= 10;
 msd = value;

 /* Now display the digits */
}

However, using the technique espoused above, we can rewrite this much more efficiently as:

void display_value(uint16_t value)
{
 uint8_t    msd, nsd, lsd;

 if (value > 999U)
 {
  value = 999U;
 }

 msd = value / 100U;
 value -= msd * 100U;

 nsd = value / 10U;
 value -= nsd * 10U;

 lsd = value;

 /* Now display the digits */
}

If you benchmark this you should find it considerably faster than the modulus based approach.
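Before benchmarking, it is worth confirming that the two approaches agree. The sketch below is one way of doing so; the function names and the use of a three element array for the digits are my own choices for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: digits[0] = msd, digits[1] = nsd, digits[2] = lsd. */
void split_digits(uint16_t value, uint8_t digits[3])
{
    if (value > 999U)
    {
        value = 999U;
    }

    digits[0] = (uint8_t)(value / 100U);
    value -= digits[0] * 100U;

    digits[1] = (uint8_t)(value / 10U);
    value -= digits[1] * 10U;

    digits[2] = (uint8_t)value;
}

/* Returns true if the subtraction based split agrees with the
 * modulus based one for every displayable value (0 to 999). */
bool split_digits_matches_modulus(void)
{
    uint16_t value;
    uint8_t  digits[3];

    for (value = 0U; value <= 999U; ++value)
    {
        split_digits(value, digits);
        if ((digits[2] != (value % 10U)) ||
            (digits[1] != ((value / 10U) % 10U)) ||
            (digits[0] != (value / 100U)))
        {
            return false;
        }
    }
    return true;
}
```

The exhaustive check is cheap here because there are only 1000 possible inputs.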


Formatted output when using C99 data types

Tuesday, February 1st, 2011 Nigel Jones

Regular readers of this blog will know that I am a proponent of using the C99 data types. They will also know that I’m no fan of formatted output. Notwithstanding this, I do use formatted output (particularly vsprintf) on larger systems. Well if you use the C99 data types and you use formatted output, you will quickly run into a problem – namely what modifier do you give printf()  to print say a uint16_t variable? Now if you are working on an 8 or 16 bit architecture, then you’d probably be OK guessing that %u would work quite nicely. However if you are working on a 32 bit architecture, what would you use for say a uint_fast8_t variable? Well it so happens that the C99 folks were aware of this problem and came up with just about the ugliest solution imaginable.

inttypes.h

In order to solve this problem, you first of all need to #include the header file inttypes.h. This header file in turn includes stdint.h so that you have access to the C99 data types. If you examine this file, you will find that it consists of a large number of definitions. An example definition might look like this:

#define PRId16 __INT16_SIZE_PREFIX__ "d"

If you are like me, when I first saw this I was a little puzzled. How exactly was this supposed to help? Well I’ll give you an example of its usage, and then explain how it works.

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %" PRId16, value);
}

So what’s going on here? Well let’s assume for now that __INT16_SIZE_PREFIX__ is in turn defined to be "h".  Our code is converted by the preprocessor into the following:

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %" "h" "d", value);
}

At compile time, the adjacent string literals "Value = %" "h" "d" are concatenated into the single string "Value = %hd", so that we end up with:

#include <inttypes.h>
#include <stdio.h>

void print_int16(int16_t value)
{
 printf("Value = %hd", value);
}

This is legal syntax for printf. More importantly, the correct format string for this implementation is now being passed to printf () for an int16_t data type.

Thus the definitions in inttypes.h allow one to write portable formatted IO while still using the C99 data types.

Naming Convention

Examination of inttypes.h shows that a consistent naming convention has been used. For output, the constant names are constructed thus:

<PRI><printf specifier><C99 modifier><number of bits> where

<PRI> is the literal characters PRI.

<printf specifier> is the list of integer specifiers we all know so well {d, i, o, u, x, X}

<C99 modifier> is one of {<empty>, LEAST, FAST, MAX, PTR}

<number of bits> is one of {8, 16, 32, 64, <empty>}. <empty> only applies to the MAX and PTR C99 modifiers.

Examples:

To print a uint_fast8_t in lower case hexadecimal you would use PRIxFAST8.

To print a uint_least64_t in octal you would use PRIoLEAST64.
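In code, those two examples might look like the sketch below. The wrapper function names are hypothetical; the PRI macros and snprintf() are standard C99. Note that the o, u, x and X macros are defined for the unsigned types, so the octal example takes a uint_least64_t.

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats value as lower case hexadecimal with a 0x prefix.
 * Returns whatever snprintf() returns. */
int format_fast8_hex(char *buf, size_t len, uint_fast8_t value)
{
    return snprintf(buf, len, "0x%" PRIxFAST8, value);
}

/* Formats value in octal. Returns whatever snprintf() returns. */
int format_least64_octal(char *buf, size_t len, uint_least64_t value)
{
    return snprintf(buf, len, "%" PRIoLEAST64, value);
}
```

With a suitably sized buffer, a value of 0xAB passed to the first function should produce the string 0xab, and a value of 8 passed to the second should produce 10.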

Formatted Input

For formatted input, simply replace PRI with SCN.
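As an illustration, here is a minimal sketch that reads a decimal uint16_t using SCNu16. The function name parse_uint16 is my own. Be aware that the scanf() family does not define what happens if the number in the text is too large for the target type, so treat this as a sketch for trusted input only.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: parses a decimal number from text into *value.
 * Returns true if exactly one field was converted. */
bool parse_uint16(char const *text, uint16_t *value)
{
    return sscanf(text, "%" SCNu16, value) == 1;
}
```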

Observations

While I applaud the C99 committee for providing this functionality, it can result in some dreadful looking format statements. For example here’s a string from a project I’m working on:

wr_vstr(1, 0, MAX_STR_LEN, "%-+*" PRId32 "%-+4" PRId32 "\xdf", tap_str_len, tap, angle);

Clearly a lot of this has to do with the inherently complex formatted IO syntax. The addition of the C99 formatters just makes it even worse.

Personally I’d have liked the C99 committee to have bitten the bullet and introduced a formatted IO function that had the following characteristics:

  1. Explicit support for the C99 data types.
  2. No support for octal. Does anyone ever use the octal formatter?
  3. Support for printing binary – this I do need to do from time to time.
  4. A standard defined series of reduced functionality formatted IO subsets. This way I’ll know that if I restrict myself to a particular set of format types I can use the smallest version of the formatted IO function.
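In the absence of a standard binary formatter, one has to roll one's own. A minimal sketch for 8 bit values follows; the function name and the fixed 9 character buffer are assumptions made for illustration, not part of any standard library.

```c
#include <stdint.h>

/* Hypothetical helper: writes the 8 bits of value, most significant
 * bit first, into buf as '0' / '1' characters plus a terminating NUL.
 * buf must have room for 9 characters. */
void format_binary8(uint8_t value, char buf[9])
{
    uint_fast8_t i;

    for (i = 0U; i < 8U; ++i)
    {
        buf[i] = ((value & (0x80U >> i)) != 0U) ? '1' : '0';
    }
    buf[8] = '\0';
}
```

The resulting string can then be passed to a formatted output function using a plain %s.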

PC Lint

Regular readers will also know that I’m a major proponent of using PC-Lint from Gimpel. I was surprised to discover that while Lint is smart enough to handle string concatenation with printf() etc, it doesn’t do it with user written functions that are designed to accept format strings. For example, the function wr_vstr() referenced above looks like this:

#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

static void wr_vstr(uint_fast8_t row, uint_fast8_t col, uint_fast8_t width, char const * format, ...)
{
 va_list  args;
 char  buf[MAX_STR_LEN];

 va_start(args, format);
 (void)vsnprintf(buf, MAX_STR_LEN, format, args);     /* buf contains the formatted string */

 wr_str(row, col, buf, width);    /* Call the generic string writer */

 va_end(args);                    /* Clean up. Do NOT omit */
}

I described this technique here. Anyway, if you use the inttypes.h constants like I did above, then you will find that PC-Lint complains loudly.

Final Thoughts

The inttypes.h header is very useful for writing portable formatted IO with the C99 data types. It’s ugly – but it beats the alternative. I recommend you add it to your bag of tricks.