Consulting as a leading economic indicator – update #2
February 25th, 2011 by Nigel Jones
I have written before about consulting being a leading economic indicator. My hypothesis is that when companies need engineering help, but are unsure whether to take on employees, then they turn to consultants. Conversely when companies need to cut costs, the first to go are consultants and contractors. In short, consultants are the first to go in bad times and the first to be retained in good times. I posted an update in October 2010 where I reported that the consultants I know were seeing an increase in interest level – but not yet any real increase in actual work. So where are we 5 months later? Well my informal survey of other consultants confirms that the interest seen back in October has translated into a lot of work today. All the consultants I know are very busy; indeed their biggest problem seems to be managing demand. On this basis I’m quite confident that the embedded systems industry will see robust hiring here in the USA in the coming months. If you are looking to change jobs, it’s a good time to start dusting off the resume.
Efficient C Tip #13 – use the modulus (%) operator with caution
February 8th, 2011 by Nigel Jones
This is the thirteenth in a series of tips on writing efficient C for embedded systems. As the title suggests, if you are interested in writing efficient C, you need to be cautious about using the modulus operator. Why is this? Well a little thought shows that C = A % B is equivalent to C = A - B * (A / B). In other words the modulus operator is functionally equivalent to three operations. As a result it’s hardly surprising that code that uses the modulus operator can take a long time to execute. Now in some cases you absolutely have to use the modulus operator. However in many cases it’s possible to restructure the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to how this blog posting came about.
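As a quick sanity check of the identity in the opening paragraph, here is a minimal sketch that computes a remainder with a divide, a multiply and a subtract, and compares it against % for a handful of values. The names mod_by_hand() and mod_identity_holds() are made up for this illustration, and the sample values are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Remainder without the % operator: one divide, one multiply, one subtract.
 * As with %, the divisor b must be non-zero. */
static uint32_t mod_by_hand(uint32_t a, uint32_t b)
{
    uint32_t quotient = a / b;

    return a - (b * quotient);
}

/* Illustrative self-check: compare mod_by_hand() with % over sample values. */
static bool mod_identity_holds(void)
{
    static const uint32_t samples[] =
        { 0UL, 1UL, 59UL, 60UL, 61UL, 86399UL, 86400UL, 0xFFFFFFFFUL };
    static const uint32_t divisors[] = { 1UL, 7UL, 24UL, 60UL, 3600UL };
    size_t i;
    size_t j;

    for (i = 0U; i < (sizeof(samples) / sizeof(samples[0])); i++)
    {
        for (j = 0U; j < (sizeof(divisors) / sizeof(divisors[0])); j++)
        {
            if (mod_by_hand(samples[i], divisors[j]) != (samples[i] % divisors[j]))
            {
                return false;
            }
        }
    }
    return true;
}
```

The point to notice is that the quotient is a by-product of the calculation, which matters when the quotient is needed anyway.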
Converting seconds to days, hours, minutes and seconds
In embedded systems design there is an increasing need for some form of real time clock. When one is included, the designer typically implements the time as a 32 bit variable containing the number of seconds since a particular date. It’s then not usually long before one has to convert the ‘time’ into days, hours, minutes and seconds. Well I found myself in just such a situation recently. As a result, I thought a quick internet search was in order to find the ‘best’ way of converting ‘time’ to days, hours, minutes and seconds. The code I found wasn’t great and as usual was highly PC centric. I thus sat down to write my own code.
Attempt #1 – Using the modulus operator
My first attempt used the ‘obvious’ algorithm and employed the modulus operator. The relevant code fragment appears below.
void compute_time(uint32_t time)
{
uint32_t days, hours, minutes, seconds;
seconds = time % 60UL;
time /= 60UL;
minutes = time % 60UL;
time /= 60UL;
hours = time % 24UL;
time /= 24UL;
days = time;
}
This approach has a nice looking symmetry to it. However, it contains three divisions and three modulus operations. I was thus rather concerned about its performance, so I measured its speed on three different architectures – AVR (8 bit), MSP430 (16 bit), and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The cycle counts quoted are for 10 invocations of the test code and include the test harness overhead:
AVR: 29,825 cycles
MSP430: 27,019 cycles
ARM Cortex: 390 cycles
No, that isn’t a misprint. The ARM was nearly two orders of magnitude more cycle efficient than the MSP430 and AVR. Thus my claim that the modulus operator can be very inefficient is true for some architectures – but not all. If you are using the modulus operator on an ARM processor then it’s probably not worth worrying about. However if you are working on smaller processors then clearly something needs to be done – and so I investigated some alternatives.
Attempt #2 – Replace the modulus operator
As mentioned in the introduction, C = A % B is equivalent to C = A - B * (A / B). If we compare this to the code in attempt 1, then it should be apparent that the intermediate value (A / B) computed as part of the modulus operation is in fact needed in the next line of code. This suggests a simple optimization to the algorithm.
void compute_time(uint32_t time)
{
uint32_t days, hours, minutes, seconds;
days = time / (24UL * 3600UL);
time -= days * 24UL * 3600UL;
/* time now contains the number of seconds in the last day */
hours = time / 3600UL;
time -= (hours * 3600UL);
/* time now contains the number of seconds in the last hour */
minutes = time / 60U;
seconds = time - minutes * 60U;
}
In this case I have replaced three mods with three subtractions and three multiplications. Thus although I have replaced a single operator (%) with two operations (- *) I still expect an increase in speed because the modulus operator is actually three operators in one (- * /). Thus effectively I have eliminated three divisions and so I expected a significant improvement in speed. The results however were a little surprising:
AVR: 18,720 cycles
MSP430: 14,805 cycles
ARM Cortex: 384 cycles
Thus while this technique yielded an improvement approaching a factor of two for the AVR and MSP430 processors, it had essentially no impact on the ARM code. Presumably this is because the ARM has native support for the modulus operation. Notwithstanding the ARM results, it’s clear that at least in this example, it’s possible to significantly speed up an algorithm by eliminating the modulus operator.
I could of course just stop at this point. However examination of attempt 2 shows that further optimizations are possible by observing that if seconds is a 32 bit variable, then days can be at most a 16 bit variable. Furthermore, hours, minutes and seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to use smaller data types.
Attempt #3 – Data type size reduction
My naive implementation of the code looked like this:
void compute_time(uint32_t time)
{
uint16_t days;
uint8_t hours, minutes, seconds;
uint16_t stime;
days = (uint16_t)(time / (24UL * 3600UL));
time -= (uint32_t)days * 24UL * 3600UL;
/* time now contains the number of seconds in the last day */
hours = (uint8_t)(time / 3600UL);
stime = time - ((uint32_t)hours * 3600UL);
/*stime now contains the number of seconds in the last hour */
minutes = stime / 60U;
seconds = stime - minutes * 60U;
}
All I have done is change the data types and add casts where appropriate. The results were interesting:
AVR: 14,400 cycles
MSP430: 11,457 cycles
ARM Cortex: 434 cycles
Thus while this resulted in a significant improvement for the AVR & MSP430, it resulted in a significant worsening for the ARM. Clearly the ARM doesn’t like working with non 32 bit variables. Thus this suggested an improvement that would make the code a lot more portable – and that is to use the C99 fast types. Doing this gives the following code:
Attempt #4 – Using the C99 fast data types
void compute_time(uint32_t time)
{
uint_fast16_t days;
uint_fast8_t hours, minutes, seconds;
uint_fast16_t stime;
days = (uint_fast16_t)(time / (24UL * 3600UL));
time -= (uint32_t)days * 24UL * 3600UL;
/* time now contains the number of seconds in the last day */
hours = (uint_fast8_t)(time / 3600UL);
stime = time - ((uint32_t)hours * 3600UL);
/*stime now contains the number of seconds in the last hour */
minutes = stime / 60U;
seconds = stime - minutes * 60U;
}
All I have done is change the data types to the C99 fast types. The results were encouraging:
AVR: 14,400 cycles
MSP430: 11,595 cycles
ARM Cortex: 384 cycles
Although the MSP430 time increased very slightly, the AVR and ARM stayed at their fastest speeds. Thus attempt #4 is both fast and portable.
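The fragments above compute their results into local variables that are never read, so to use the technique in a real program the fields have to be handed back somehow. Here is one possible sketch of attempt #4 that returns them in a structure. The type split_time_t and the function split_time() are names invented for this illustration, and I have not benchmarked this variant.

```c
#include <stdint.h>

/* Illustrative container for the result; not part of the original code. */
typedef struct
{
    uint_fast16_t days;
    uint_fast8_t  hours;
    uint_fast8_t  minutes;
    uint_fast8_t  seconds;
} split_time_t;

/* Split a 32 bit seconds count into days, hours, minutes and seconds
 * without using the modulus operator. */
static split_time_t split_time(uint32_t time)
{
    split_time_t  t;
    uint_fast16_t stime;

    t.days = (uint_fast16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)t.days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    t.hours = (uint_fast8_t)(time / 3600UL);
    stime = (uint_fast16_t)(time - ((uint32_t)t.hours * 3600UL));
    /* stime now contains the number of seconds in the last hour */
    t.minutes = (uint_fast8_t)(stime / 60U);
    t.seconds = (uint_fast8_t)(stime - ((uint_fast16_t)t.minutes * 60U));

    return t;
}
```

For example, split_time(90061UL) should yield 1 day, 1 hour, 1 minute and 1 second, since 86400 + 3600 + 60 + 1 = 90061.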
Conclusion
Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for further optimizations. As a result with the AVR & MSP430 I was able to more than halve the execution time.
Converting Integers for Display
A similar problem (with a similar solution) occurs when one wants to show integers on a display. For example if you are using a custom LCD panel with say a 3 digit numeric field, then the problem arises as to how to determine the value of each digit. The obvious way, using the modulus operator, is as follows:
void display_value(uint16_t value)
{
uint8_t msd, nsd, lsd;
if (value > 999)
{
value = 999;
}
lsd = value % 10;
value /= 10;
nsd = value % 10;
value /= 10;
msd = value;
/* Now display the digits */
}
However, using the technique espoused above, we can rewrite this much more efficiently as:
void display_value(uint16_t value)
{
uint8_t msd, nsd, lsd;
if (value > 999U)
{
value = 999U;
}
msd = value / 100U;
value -= msd * 100U;
nsd = value / 10U;
value -= nsd * 10U;
lsd = value;
/* Now display the digits */
}
If you benchmark this you should find it considerably faster than the modulus based approach.
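Before benchmarking, it is worth confirming that the two digit splitters really do produce the same digits. The sketch below restates both versions so that they return their results, then compares them over the whole 3 digit range and a little beyond it to exercise the clamp. The type digits3_t and the function names are hypothetical, introduced only for this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative result type; not part of the original code. */
typedef struct
{
    uint8_t msd;
    uint8_t nsd;
    uint8_t lsd;
} digits3_t;

/* Modulus based version, least significant digit first. */
static digits3_t digits_with_mod(uint16_t value)
{
    digits3_t d;

    if (value > 999U)
    {
        value = 999U;
    }
    d.lsd = (uint8_t)(value % 10U);
    value /= 10U;
    d.nsd = (uint8_t)(value % 10U);
    value /= 10U;
    d.msd = (uint8_t)value;
    return d;
}

/* Divide, multiply and subtract version, most significant digit first. */
static digits3_t digits_without_mod(uint16_t value)
{
    digits3_t d;

    if (value > 999U)
    {
        value = 999U;
    }
    d.msd = (uint8_t)(value / 100U);
    value = (uint16_t)(value - (d.msd * 100U));
    d.nsd = (uint8_t)(value / 10U);
    value = (uint16_t)(value - (d.nsd * 10U));
    d.lsd = (uint8_t)value;
    return d;
}

/* Returns true if both versions agree for every input from 0 to 1100. */
static bool digit_splitters_agree(void)
{
    uint16_t value;

    for (value = 0U; value <= 1100U; value++)
    {
        digits3_t a = digits_with_mod(value);
        digits3_t b = digits_without_mod(value);

        if ((a.msd != b.msd) || (a.nsd != b.nsd) || (a.lsd != b.lsd))
        {
            return false;
        }
    }
    return true;
}
```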
Formatted output when using C99 data types
February 1st, 2011 by Nigel Jones
Regular readers of this blog will know that I am a proponent of using the C99 data types. They will also know that I’m no fan of formatted output. Notwithstanding this, I do use formatted output (particularly vsprintf) on larger systems. Well if you use the C99 data types and you use formatted output, you will quickly run into a problem – namely what modifier do you give printf() to print say a uint16_t variable? Now if you are working on an 8 or 16 bit architecture, then you’d probably be OK guessing that %u would work quite nicely. However if you are working on a 32 bit architecture, what would you use for say a uint_fast8_t variable? Well it so happens that the C99 folks were aware of this problem and came up with just about the ugliest solution imaginable.
inttypes.h
In order to solve this problem, you first of all need to #include a file inttypes.h. This header file in turn includes stdint.h so that you have access to the C99 data types. If you examine this file, you will find that it consists of a large number of definitions. An example definition might look like this:
#define PRId16 __INT16_SIZE_PREFIX__ "d"
If you are like me, when I first saw this I was a little puzzled. How exactly was this supposed to help? Well I’ll give you an example of its usage, and then explain how it works.
#include <inttypes.h>
#include <stdio.h>
void print_int16(int16_t value)
{
printf("Value = %" PRId16, value);
}
So what’s going on here? Well let’s assume for now that __INT16_SIZE_PREFIX__ is in turn defined to be “h”. Our code is converted by the preprocessor into the following:
#include <inttypes.h>
#include <stdio.h>
void print_int16(int16_t value)
{
printf("Value = %" "h" "d", value);
}
At compile time, the successive strings “Value = %” “h” “d” are concatenated into the single string “Value = %hd”, so that we end up with:
#include <inttypes.h>
#include <stdio.h>
void print_int16(int16_t value)
{
printf("Value = %hd", value);
}
This is legal syntax for printf. More importantly, the correct format string for this implementation is now being passed to printf () for an int16_t data type.
Thus the definitions in inttypes.h allow one to write portable formatted IO while still using the C99 data types.
Naming Convention
Examination of inttypes.h shows that a consistent naming convention has been used. For output, the constant names are constructed thus:
<PRI><printf specifier><C99 modifier><number of bits> where
<PRI> is the literal characters PRI.
<printf specifier> is the list of integer specifiers we all know so well {d, i, o, u, x, X}
<C99 modifier> is one of {<empty>, LEAST, FAST, MAX, PTR}
<number of bits> is one of {8, 16, 32, 64, <empty>}. <empty> only applies to the MAX and PTR C99 modifiers.
Examples:
To print a uint_fast8_t in lower case hexadecimal you would use PRIxFAST8.
To print an int_least64_t in octal you would use PRIoLEAST64.
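The two examples above can be put together in a short sketch. The function format_examples() and its output text are invented for this illustration; only the PRIxFAST8 and PRIoLEAST64 macros and snprintf() come from the standard headers.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a uint_fast8_t in lower case hexadecimal and an int_least64_t in
 * octal. Returns whatever snprintf() returns, i.e. the number of characters
 * that the complete string needs (excluding the terminator). */
static int format_examples(char *buf, size_t len,
                           uint_fast8_t flags, int_least64_t big)
{
    return snprintf(buf, len,
                    "flags=0x%" PRIxFAST8 " big=%" PRIoLEAST64,
                    flags, big);
}
```

Called with flags = 0xAB and big = 8, this should produce the string "flags=0xab big=10", whatever the underlying widths of the fast and least types happen to be on the target.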
Formatted Input
For formatted input, simply replace PRI with SCN.
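As a minimal sketch of the input side, here is a hypothetical helper, parse_u16(), that reads a uint16_t with SCNu16. Getting the width right matters even more for scanf() than for printf(), because scanf() writes through a pointer and so there are no default argument promotions to hide a mismatch.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse an unsigned decimal number into a uint16_t.
 * Returns true if exactly one value was converted. */
static bool parse_u16(const char *text, uint16_t *out)
{
    return sscanf(text, "%" SCNu16, out) == 1;
}
```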
Observations
While I applaud the C99 committee for providing this functionality, it can result in some dreadful looking format statements. For example here’s a string from a project I’m working on:
wr_vstr(1, 0, MAX_STR_LEN, "%-+*" PRId32 "%-+4" PRId32 "\xdf", tap_str_len, tap, angle);
Clearly a lot of this has to do with the inherently complex formatted IO syntax. The addition of the C99 formatters just makes it even worse.
Personally I’d have liked the C99 committee to have bitten the bullet and introduced a formatted IO function that had the following characteristics:
- Explicit support for the C99 data types.
- No support for octal. Does anyone ever use the octal formatter?
- Support for printing binary – this I do need to do from time to time.
- A standard defined series of reduced functionality formatted IO subsets. This way I’ll know that if I restrict myself to a particular set of format types I can use the smallest version of the formatted IO function.
PC Lint
Regular readers will also know that I’m a major proponent of using PC-Lint from Gimpel. I was surprised to discover that while Lint is smart enough to handle string concatenation with printf() etc, it doesn’t do it with user written functions that are designed to accept format strings. For example, the function wr_vstr() referenced above looks like this:
static void wr_vstr(uint_fast8_t row, uint_fast8_t col, uint_fast8_t width, char const * format, ...)
{
va_list args;
char buf[MAX_STR_LEN];
va_start(args, format);
(void)vsnprintf(buf, MAX_STR_LEN, format, args); /* buf contains the formatted string */
wr_str(row, col, buf, width); /* Call the generic string writer */
va_end(args); /* Clean up. Do NOT omit */
}
I described this technique here. Anyway, if you use the inttypes.h constants like I did above, then you will find that PC-Lint complains loudly.
Final Thoughts
Inttypes.h is very useful for writing portable formatted IO with the C99 data types. It’s ugly – but it beats the alternative. I recommend you add it to your bag of tricks.
Configuring hardware – part 3
January 26th, 2011 by Nigel Jones
This is the final part in a series on configuring the hardware peripherals in a microcontroller. In the first part I talked about how to set / clear bits in a configuration register, and in the second part I talked about putting together the basic framework for the driver. When I finished part 2, we had got as far as configuring all the bits in the open function. It’s at this point that things get interesting. In my experience the majority of driver problems fall into three areas:
- Failing to place the peripheral into the correct mode.
- Getting the clocking wrong.
- Mishandling interrupts.
I think most people tend to focus on the first item. Personally I have learned that it’s usually better to tackle the above problems in the reverse order.
Mishandling interrupts
Almost all peripheral drivers need interrupt handlers, and these are often the source of many problems. If you have followed my advice, then at this stage you should have a skeleton interrupt handler for every possible interrupt vector that the peripheral uses. You should also have an open and close function. A smart thing to do at this stage is to download your code to your debug environment. I then place a break-point on every interrupt handler and then I call the open function. If the open function merely configures the peripheral, yet does not enable it, then presumably no interrupts should occur. If they do, then you need to find out why and fix the problem.
At this point I now add just enough code to each interrupt handler such that it will clear the source of the interrupt and generate the requisite interrupt acknowledge. Sometimes this is done for you in hardware. In other cases you have to write a surprising amount of code to get the job done. I strongly recommend that you take your time over this stage as getting an interrupt acknowledge wrong can cause you endless problems.
The next stage is to write the enable function, download the code and open and enable the peripheral. This time you need to check that you do get the expected interrupts (e.g. a timer overflow interrupt) and that you acknowledge them correctly. Just as importantly you also need to check that you don’t get an unexpected interrupt (e.g. a timer match interrupt). On the assumption that all is well, then you can be reasonably confident that there are no egregious errors in your setup of interrupts. At this point you will probably have to further flesh out the interrupt handlers in order to give the driver some limited functionality. Although I’m sure you’ll be tempted to get on with the problem at hand, I recommend that you don’t do this, but rather write code to help tackle the next problem – namely that of clocking verification.
Clocking
Most peripherals use a clock source internal to the microprocessor. Now modern processors have multiple clock domains, PLL based frequency multipliers, and of course multi-level prescalers. As a result it can be a real nightmare trying to get the correct frequency to a peripheral. Even worse it is remarkably easy to get the approximately correct frequency to a peripheral. This issue can be a real problem with asynchronous communications links where a 1% error in frequency may be OK with one host and fail with another. As a result I now make it a rule to always try and verify that I am indeed clocking a peripheral with the correct frequency. To do this, there is no substitute for breaking out the oscilloscope or logic analyzer and measuring something. For timers one can normally output the signal on a port pin (even if this is just for verification purposes). For communications links one can simply set up the port to constantly transmit a fixed pattern. For devices such as A2D converters I usually have to resort to toggling a port pin at the start and end of conversion. Regardless of the peripheral, it’s nearly always worth taking the time to write some code to help you verify that the peripheral is indeed being clocked at the correct frequency.
When you are doing this, there are a couple of things to watch out for:
- If your processor has an EMI reduction mode, then consider turning it off while performing clocking measurements. The reason for this is that ‘EMI reduction’ is actually achieved by dithering (quasi randomly varying) the clock frequency. Clearly a randomly varying clock isn’t conducive to accurate frequency measurements.
- Make sure that your system is indeed being clocked by the correct source. I mention this because some debuggers can provide the clock to the target.
Finally, if you find that you have an occasional problem with a peripheral, then checking that the clocking is precise is always a good place to start.
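To show how easily an approximately correct frequency arises, here is a small arithmetic sketch for an asynchronous link. It assumes a generic UART whose baud rate is the peripheral clock divided by 16 times an integer divisor; that is an assumption made for illustration, not a description of any particular part, and the function names are invented. It complements the oscilloscope measurement rather than replacing it.

```c
#include <stdint.h>

/* Assumed model: baud = clock_hz / (16 * divisor), divisor an integer >= 1.
 * baud must be non-zero and small enough that 16 * baud fits in 32 bits. */
static uint32_t uart_divisor(uint32_t clock_hz, uint32_t baud)
{
    uint32_t denom = 16UL * baud;
    uint32_t divisor = (clock_hz + (denom / 2UL)) / denom; /* round to nearest */

    return (divisor == 0UL) ? 1UL : divisor;
}

/* Signed baud rate error in hundredths of a percent (850 means +8.50%).
 * The actual rate is truncated to a whole number of bits per second. */
static int32_t uart_baud_error_pct_x100(uint32_t clock_hz, uint32_t baud)
{
    uint32_t divisor = uart_divisor(clock_hz, baud);
    int64_t  actual = (int64_t)(clock_hz / (16UL * divisor));

    return (int32_t)(((actual - (int64_t)baud) * 10000) / (int64_t)baud);
}
```

Under this model an 8 MHz clock asked for 115200 baud gets a divisor of 4 and an actual rate of 125000 baud, an error of about 8.5%, whereas a 7.3728 MHz clock gives exactly 115200 baud.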
Mode
At this stage you have done the following:
- Considered every bit in every register in your open function.
- Verified that you have interrupts set up correctly.
- Written the enable function and at least part of the interrupt handler(s).
- Verified that you have the correct frequency clocks going to the peripheral.
You should now complete writing the driver. This is where you write the bulk of the application specific code. Clearly this part is highly application specific. Notwithstanding this, I can offer one piece of advice. Probably the single biggest mistake that I have made over the years is to assume that because the driver ‘works’ that it must be correct. I will give you a simple example to demonstrate what I mean.
It’s well known that the popular SPI port found on many devices can operate in one of four modes (often imaginatively called Mode 0, Mode 1, Mode 2 & Mode 3). These modes differ based on the phase relationship of the clock and data lines and whether the data are valid on the rising or falling edge of the clock. Thus it’s necessary to study the data sheet of the SPI peripheral to find out its required mode. Let’s assume that after studying the data sheet you conclude that Mode 2 operation is called for – and you implement the code and it works. If you then walk away from the code then I humbly suggest you are asking for it. The reason is that it’s possible that a peripheral will ‘work’ in Mode 2, even though it should be operated in Mode 3. The peripheral ‘works’ in Mode 2 even though you are right on the edge of violating the various required setup and hold times. A different temperature or a different chip lot and your code will fall over. It’s for this reason that I strongly recommend that you break out the logic analyzer and carefully compare the signals to what is specified in the data sheet. There is nothing quite like comparing waveforms to what is in the data sheet to give you a warm fuzzy feeling that the driver really is doing its job correctly.
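For reference when reading the waveforms, the four modes are commonly numbered so that bit 1 of the mode is the clock polarity (CPOL) and bit 0 is the clock phase (CPHA). The sketch below decodes a mode number on that assumption. Be aware that some vendors label or number these settings differently, so treat this as the usual convention rather than a guarantee, and check the data sheet. The type and function names are invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative description of an SPI mode under the common convention. */
typedef struct
{
    bool cpol; /* true: clock idles high */
    bool cpha; /* true: data sampled on the second (trailing) clock edge */
} spi_mode_t;

/* Assumes mode = (CPOL << 1) | CPHA, for mode numbers 0 to 3. */
static spi_mode_t spi_mode_decode(uint_fast8_t mode)
{
    spi_mode_t m;

    m.cpol = (mode & 0x02U) != 0U;
    m.cpha = (mode & 0x01U) != 0U;
    return m;
}

/* Data are sampled on the rising edge in Mode 0 and Mode 3, and on the
 * falling edge in Mode 1 and Mode 2. */
static bool spi_samples_on_rising_edge(spi_mode_t m)
{
    return m.cpol == m.cpha;
}
```

Under this numbering, Mode 2 and Mode 3 both idle the clock high but sample on opposite edges, which is why a device can appear to work in the wrong one of the pair while sitting on the edge of its setup and hold limits.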
Final Thoughts
Driver writing is hard. Engineers that can take on this task and write clean, fast and correct drivers in a timely manner are immensely valuable to organizations. Thus even if you cringe at the thought of having to write a device driver, you might want to put the effort into learning how to do it – your career will thank you!
The best search terms of 2010
December 24th, 2010 by Nigel Jones
It’s that time of the year again when I look back over some of the more amusing search terms that drove people to this blog. I hope you enjoy them as much as I did!
C shoot yourself in the foot codes
The cynic in me immediately thought that the search would be better if he reversed the logic and looked for C constructs whereby you can’t shoot yourself in the foot.
Can electrical engineers do embedded systems?
A profound question. I have already given my thoughts on this matter.
IT consulting no technical skills
No wonder so many consultants have a bad reputation!
How to goof off at work without being caught
No wonder some employers have a jaded view of their employees!
Well written code doesn’t need debugging nigel jones
Yikes! Just for the record I will state unequivocally that all code needs debugging.
Can sprintf be sued in isr
Presumably this was a typo. However the use of sued just seems so much more appropriate when it comes to sprintf and interrupt service routines.
Want c code projects to put fake experiance (sic)
I’m not sure whether to admire the guile of this person or to shudder at the thought of someone with fake experience working on a medical device.
Normal folks and the nerds in embedded system
I actually thought this was fair enough. I know my kids think I’m weird.
I’m not completely useless! I can be used as a bad example!
That’s what I call a positive outlook!
Bad stack overflow experience
Is there any such thing as a good stack overflow experience?
Tools to write to eeprom of msp430
Good luck with that – the MSP430 doesn’t have any EEPROM.
Are people on stackoverflow all dicks?
My favorite term of the year. I consoled myself by assuming that the searcher was actually thinking of the people at the other stack-overflow. Having said that, I don’t think the people there are dicks either.
Anyway, thanks for reading. I will return to my usual fare with my next post.