Archive for the ‘General C issues’ Category

Shifting Styles

Thursday, November 27th, 2014 Nigel Jones

To say it’s been some time since I last posted is an understatement! I won’t bore you with the details other than to note that sometimes there just aren’t enough hours in a day.

Anyway, today’s post is about a stylistic issue I’ve noticed in just about all code I’ve ever looked at. Unless you are a closeted BASIC programmer, you probably don’t ever write something like this:

foo = foo + 6;

While there’s nothing particularly wrong with this, other than looking rather odd from a mathematical perspective, just about every C programmer would use the += operator, i.e.

foo += 6;

Indeed this is true for all the arithmetic and bitwise operators, i.e.

foo *= 6;
foo /= 6;
foo -= 6;
foo ^= 6;
foo |= 6;
foo &= 6;

However, when it comes to the shift operators, something odd seems to happen. Almost no one writes:

foo >>= 6;

or even rarer:

foo <<= 6;

Instead folks resort to the syntax of BASIC and use:

foo = foo >> 6;
foo = foo << 6;

Why exactly is this? This thought was triggered by me looking at some of my own code from about ten years ago. Sure enough right in the middle of what was an otherwise well written piece of code (in the sense that ten years later it was easily followed and was a breeze to adapt to my latest project) I found a:

foo = foo << 6;

I have no real explanation other than we are all creatures of habit and sometimes get into inconsistent programming styles. While I wouldn’t fault someone for doing this, I do think that if you quite happily use += but not >>= then you should ponder your rationale for being inconsistent. Perhaps it will trigger a bigger introspection?
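
To make the comparison concrete, here is a minimal sketch of the two styles side by side. The function names are hypothetical and exist only for illustration; both functions compute the same result, so the choice between them is purely one of consistency with how the other compound operators are used.

```c
#include <stdint.h>

/* 'BASIC style': the variable name appears on both sides of the assignment. */
uint16_t shift_right_explicit(uint16_t foo)
{
    foo = foo >> 6;
    return foo;
}

/* Compound style, consistent with +=, -=, |= and friends. */
uint16_t shift_right_compound(uint16_t foo)
{
    foo >>= 6;
    return foo;
}
```

The compound form also means a long lvalue (an array element or structure member, say) only has to be written, and read, once.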

 

Replacing nested switches with multi-dimensional arrays of pointers to functions

Monday, March 17th, 2014 Nigel Jones

It’s been way too long since I’ve written a blog post. To those kind souls that have written to inquire if I’m still alive and kicking – thank you.  The bottom line is that there simply aren’t enough hours in the day. Anyway in an effort to get back in the groove so to speak, I thought I’d answer an email from Francois Alibert who wrote to ask how to replace a nested switch statement with a multi-dimensional array of pointers to functions. Here’s a program that illustrates his conundrum:

#include <stdint.h>
#include <stdio.h>

typedef enum
{
    State1, State2, State3, Last_State
} MainState_t;

typedef enum
{
    SubState1, SubState2, SubState3, SubState4, SubState5, SubState6, Last_SubState
} SubState_t;

void demo(MainState_t State,  SubState_t SubState);

/*     Functions called from nested switch statement. 
    First digit is main state, second digit is substate */

void fn11(void);
void fn16(void);

void fn24(void);

void fn32(void);
void fn33(void);
void fn35(void);

void main(void)
{
    MainState_t main_state;
    SubState_t sub_state;
    
    for (main_state = State1; main_state < Last_State; main_state++)
    {
        for(sub_state = SubState1; sub_state < Last_SubState; sub_state++)
        {
            demo(main_state, sub_state);
        }
    }
}

void demo(MainState_t State,  SubState_t SubState)
{
    switch (State)
    {
        case State1:
            switch (SubState)
            {
                case SubState1:
                fn11();
                break;
                
                case SubState6:
                fn16();
                break;
            
                default:
                break;
            }
        break;

        case State2:
            switch (SubState)
            {
                case SubState4:
                fn24();
                break;
    
                default:
                break;
            }
        break;
    
        case State3:
        {
            switch (SubState)
            {
                case SubState2:
                fn32();
                break;
                
                case SubState3:
                fn33();
                break;                
                
                case SubState5:
                fn35();
                break;
    
                default:
                break;
            }
        }
        break;
        
        default:
        break;
    }
}

void fn11(void)
{
    puts("State 1, substate 1");
}

void fn16(void)
{
    puts("State 1, substate 6");
}

void fn24(void)
{
    puts("State 2, substate 4");
}

void fn32(void)
{
    puts("State 3, substate 2");
}

void fn33(void)
{
    puts("State 3, substate 3");
}

void fn35(void)
{
    puts("State 3, substate 5");
}

The key points are that we have nested switch statements and that the substates are sparse. That is, the number of substates handled for main state 1 differs from the number handled for main state 2, and so on. If you’ve ever been in the situation of having to write a nested state machine like this, you’ll rapidly find that the code becomes very unwieldy. In particular, functions many hundreds of lines long, with break statements all over the place, are the norm. The result can be a maintenance nightmare. Of course if you end up going to three levels, then the problem compounds. Anyway, before looking at a pointer to function implementation, here’s the output from the above code:

State 1, substate 1
State 1, substate 6
State 2, substate 4
State 3, substate 2
State 3, substate 3
State 3, substate 5

In addition, using IAR’s AVR compiler, the code size with full size optimization is 574 bytes and the execution time is  2159 cycles, with the bulk of the execution time taken up by the puts() call.

Let’s now turn this into a pointer to function implementation. The function demo becomes this:

/* Dummy function that does nothing. Fills the unused table entries. */
static void fnDummy(void)
{
}

void demo(MainState_t State,  SubState_t SubState)
{
    /* Table of handlers indexed by [main state][substate] */
    static void (* const pf[Last_State][Last_SubState])(void) = 
    {
        {fn11, fnDummy, fnDummy, fnDummy, fnDummy, fn16},
        {fnDummy, fnDummy, fnDummy, fn24, fnDummy, fnDummy},
        {fnDummy, fn32, fn33, fnDummy, fn35, fnDummy}
    };
    
    if ((State < Last_State) && (SubState < Last_SubState))
    {
        (*pf[State][SubState])();
    }
}

Note that the empty portions of the array are populated with pointers to fnDummy(), which as its name suggests is a dummy function that does nothing. You can of course put a NULL pointer in the array, and then extract the pointer, check to see if it’s non-NULL and call the function. However, in my experience it’s always faster to just call a dummy function.

So how does this stack up to the nested switch statements? Well, as written, the code size has increased to 628 bytes and cycles to 2846. This is a significant increase in overhead. However, the source code is a lot more compact, and in my opinion dramatically more maintainable. Furthermore, if you can guarantee by design that the parameters passed to demo() are within the array bounds (as is the case with this example), then you can arguably dispense with the bounds checking code. In which case the code size becomes 618 bytes and the execution time 2684 cycles. It’s your call as to whether the tradeoff is worth it.
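
For comparison, the NULL pointer alternative mentioned above might look like the following sketch. It assumes a C99 compiler, so that designated initializers can be used and every entry not listed is implicitly a null pointer. The name demo_null and its return value are hypothetical additions for illustration, and this variant has not been measured, so no code size or cycle counts are claimed for it.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { State1, State2, State3, Last_State } MainState_t;
typedef enum { SubState1, SubState2, SubState3, SubState4, SubState5,
               SubState6, Last_SubState } SubState_t;

static void fn11(void) { puts("State 1, substate 1"); }
static void fn16(void) { puts("State 1, substate 6"); }
static void fn24(void) { puts("State 2, substate 4"); }
static void fn32(void) { puts("State 3, substate 2"); }
static void fn33(void) { puts("State 3, substate 3"); }
static void fn35(void) { puts("State 3, substate 5"); }

/* Returns 1 if a handler was called, 0 otherwise (illustrative addition). */
int demo_null(MainState_t State, SubState_t SubState)
{
    /* Only the populated entries are listed; the rest are implicitly NULL. */
    static void (* const pf[Last_State][Last_SubState])(void) =
    {
        [State1] = { [SubState1] = fn11, [SubState6] = fn16 },
        [State2] = { [SubState4] = fn24 },
        [State3] = { [SubState2] = fn32, [SubState3] = fn33, [SubState5] = fn35 }
    };
    int called = 0;

    if ((State < Last_State) && (SubState < Last_SubState))
    {
        void (*fn)(void) = pf[State][SubState];

        if (fn != NULL)
        {
            fn();
            called = 1;
        }
    }
    return called;
}
```

The designated initializers make the sparseness of the table visible at a glance, at the cost of an extra test on every call.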

 

What’s in your main() header?

Saturday, February 2nd, 2013 Nigel Jones

One of the consequences of being in the consulting business is that I get to look at a lot of code written by other people. Usually it is necessary for me to get up to speed on the code as quickly as possible, and so to this end, one of the first things I do is look for main.c, or if it doesn’t exist the file that contains main(). Here’s what I usually find:

/********************************************************************************
 main.c
 Possibly a one line description.
 Legal notice. Sometimes many lines long.
 *********************************************************************************/

That’s it. Now maybe it’s just me, but I find this a bit inadequate. Before I describe what I put in the header for main.c, I should first note my motivation. Anyone that has written code for many years realizes that the whole point of writing code is to allow it to be maintained. The person maintaining the code may be a future version of yourself, but often is some poor sod who gets thrown a bunch of code and told to get on with it. As a result, it is imperative that this future maintainer be told as much as possible about what it is they are maintaining. Now I realize that a lot of what I describe below could be described elsewhere. However, it’s my experience that Word documents and other non source code related documents tend to get lost over time (or perhaps more accurately not packaged with the source code when it is given to someone else), and so by putting this information in main.c, you pretty much guarantee that the maintainer will receive the information. With this as a background, here’s what I think should be in the header for main.c.

A product description

I typically write somewhere between 10 and 100 lines of text describing the product, what it does, how it does it, what makes it unique and the things about it that make it difficult. Note I’m not describing the code. I always find this a challenge because it forces me to really get to the core of the product. I can sometimes take many hours on this stage, as I try to refine and precis my description to include as much useful information as possible. Who has this sort of time you ask? Well if you think about it, if you can’t write a concise, yet detailed description of the product, surely you aren’t ready to start writing code? Thus if you go through this exercise and find yourself stymied, then you simply shouldn’t be sitting in front of a text editor.

Text Editor settings

Talking of text editors, the next thing you should have is an entry that describes how you have your text editor configured. I’m not interested in getting into a discussion about what your text editor settings should be – I’d just like to know what they are so that I can configure my text editor to match. This is a critical step as there is no bigger time waster than trying to understand code that looks like a disaster because you used tabs with an indentation of 2 and my editor is using an indentation of 4.

Development Environment

The text editor is of course part of the larger development environment. While it’s obvious to you what build environment you are using, it isn’t to someone else. Thus if you are using an IDE from Keil then say so. Conversely, if you are from the ‘IDEs are evil’ camp and instead rely upon makefiles, make that clear as well. Note that the presence of a makefile in the source code directory does not IMHO constitute adequate documentation that this is how you intend the code to be built.

Compiler make and version

Almost every project I look at fails to make it clear what compiler make and version the code was written for. This always blows me away, because I’ve never yet seen an embedded system that doesn’t rely on compiler specific facets for it to successfully compile. Thus you should spell out exactly what compiler you used – even if you don’t think it really matters much.

Libraries

If you are using libraries, particularly ones from a third party, then you should really be spelling this out and of course specifying what version of the library you used. If there are special licensing restrictions on the use of the library, then this isn’t a bad place to mention it either.

Other tools

If you use other tools, particularly code generation tools, then it would be really nice to let the reader know that your code relies upon tool X, version Y. If you are using make rather than an IDE, it would also be nice to let us all know what version of make you used.

CPU configuration

Many CPUs are configurable via fuse bits of some type (PICs and AVRs are prime examples). These configuration bits usually have a dramatic impact on how the CPU behaves, and so it is critical that you document what fuse bit settings you are assuming. It’s possible to waste many hours debugging a system that in fact has no code problems per se, but rather is simply misconfigured at the fuse bit level.

How to build

Finally, it would be really nice if you told everyone how to actually make the executable. I’m constantly amazed at the number of projects I see where either the method of building is unclear, or worse, the ‘obvious method’ (e.g. typing make) results in a build failure because prior to e.g. invoking make, it is necessary to run some batch file etc.

While I think there’s a lot more project specific information that can go in the header, I think the above is a pretty decent start. I’d be interested in hearing about other information that you put in your main.c header.
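
Pulling the headings above together, a skeleton header for main.c might look something like this. It is only a sketch: the layout and section titles are one possible arrangement, and everything in angle brackets is a placeholder to be replaced with project specifics.

```c
/********************************************************************************
 main.c

 PRODUCT DESCRIPTION
   <10 to 100 lines: what the product is, what it does, how it does it,
    what makes it unique and what makes it difficult.>

 TEXT EDITOR SETTINGS
   <Tabs or spaces, indentation width.>

 DEVELOPMENT ENVIRONMENT
   <IDE name and version, or a statement that the build is makefile based.>

 COMPILER
   <Compiler make and exact version.>

 LIBRARIES
   <Name and version of each library, plus any licensing restrictions.>

 OTHER TOOLS
   <Code generation tools, make version and so on, each with its version.>

 CPU CONFIGURATION
   <Fuse or configuration bit settings that the code assumes.>

 HOW TO BUILD
   <Exact steps, including anything that must be run before the build.>

 Legal notice.
 *********************************************************************************/
```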

 

 

Real world variables

Wednesday, January 16th, 2013 Nigel Jones

Part of what makes embedded systems fun for me is that they normally interact with the physical world. The physical world contains real parameters which we measure using transducers, signal conditioning circuits and so on, such that ultimately we end up with a variable in our embedded code that purports to represent this real world parameter. For example, we might have this:

uint16_t pressure;     /* Pressure */

Don’t laugh. I have seen this a million times. What’s wrong with this you ask? Well, when dealing with real world variables, it is crucial that you as the author of the code make crystal clear at least four things about the real world variable:

  1. What the parameter actually is.
  2. The units of the variable.
  3. The resolution of the variable, otherwise known as the value of an LSB.
  4. The dynamic range of the variable.

Identifying the parameter

Many embedded systems measure a multitude of real world parameters, many of which may be of the same ‘type’ (for example, ‘pressure’). Thus while it is blindingly obvious to you when you write the code which pressure you are thinking of, it most certainly isn’t to someone coming into the code cold. Even if there is only one measured pressure in your system, the chances are there are various flavors of it. For example, a common architecture when measuring real world variables is to:

  1. Have the raw value. This is typically the value resulting from converting the latest ADC reading. This raw value is then:
  2. Median filtered so as to eliminate egregious outliers. The median value is then:
  3. Low pass filtered so as to remove Gaussian noise.

There are thus at least three representations of the pressure in the system I have described. My preference is that the variable name identify which instance you are dealing with. However, if for various reasons this makes for unwieldy names then at the very least make sure the comment that accompanies the variable declaration spells it out. For example:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure */
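
As a minimal sketch of such a chain, with one clearly named variable per representation, consider the code below. Only pressure_co2_median comes from the text above; the other names, the three sample median window and the divide by four low pass filter are assumptions made for illustration.

```c
#include <stdint.h>

static uint16_t pressure_co2_raw[3];    /* Last three raw CO2 pressure ADC readings */
static uint16_t pressure_co2_median;    /* Median filtered CO2 pressure */
static uint16_t pressure_co2_filtered;  /* Low pass filtered CO2 pressure */

/* Median of three samples: rejects a single egregious outlier. */
uint16_t median3(uint16_t a, uint16_t b, uint16_t c)
{
    if (a > b) { uint16_t t = a; a = b; b = t; }   /* Now a <= b */
    if (b > c) { b = c; }                          /* b = min(b, c) */
    return (a > b) ? a : b;                        /* max(a, min(b, c)) */
}

/* One step of a first order low pass filter: moves a quarter of the way
   from the previous output towards the new input. */
uint16_t lowpass_step(uint16_t previous, uint16_t input)
{
    int32_t delta = (int32_t)input - (int32_t)previous;

    return (uint16_t)((int32_t)previous + (delta / 4));
}

/* Feed one new ADC reading through the chain: raw -> median -> low pass. */
uint16_t pressure_co2_update(uint16_t adc_reading)
{
    pressure_co2_raw[0] = pressure_co2_raw[1];
    pressure_co2_raw[1] = pressure_co2_raw[2];
    pressure_co2_raw[2] = adc_reading;

    pressure_co2_median = median3(pressure_co2_raw[0],
                                  pressure_co2_raw[1],
                                  pressure_co2_raw[2]);
    pressure_co2_filtered = lowpass_step(pressure_co2_filtered,
                                         pressure_co2_median);
    return pressure_co2_filtered;
}
```

Note that the history starts at zero, so the first few outputs after power up are biased low until the window fills.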

Units

Failure to specify the units drives me up the wall. For example there are literally dozens of units of pressure, there are at least four units of temperature and a huge number of units for speed. While it is of course obvious to you the author that you are measuring pressure in pounds per square inch, you are likely to find that the rest of the world, which works in SI units, probably isn’t even aware that such an arcane unit exists. The bottom line: specify your units. My example now becomes:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. Units: bar */

Related to this is the requirement that the units are consistent with the variable name. For example, most of us would criticize a declaration that looks like this:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. Units: Celsius */

This is clearly wrong. However, what about this declaration:

uint16_t symbol_rate;    /* Symbol rate reported by demodulator. Units: bps */

What’s confusing about this declaration is that a symbol rate should have units of symbols per second. For the special case, where there is one bit per symbol, the symbol rate does happen to equal the bit rate, which is usually measured in bits per second or bps. Thus upon seeing a declaration such as this, I’m left in a quandary, as the following are all possibilities:

  1. I’m dealing with the special case where the symbol rate and bit rate are the same.
  2. The author was sloppy in his commenting and meant to write ‘sps’ rather than ‘bps’ and so the variable genuinely does reflect a symbol rate.
  3. The author was sloppy in his variable naming, such that the variable actually represents a bit rate with units of bps.

That’s a lot of confusion to sow for failure to ensure that the variable name and its units are consistent.

Resolution

Closely allied with units is the resolution or scaling. In a nutshell, what does 1 LSB of the variable represent? Again I find that this all-too-important parameter is deemed obvious by the author of the code. While it is common that 1 LSB = 1 unit, systems that need to maximize dynamic range will often use a different scaling. For example: 1 LSB = 0.0125 bar. The bottom line: if you don’t specify what 1 LSB represents you are really doing future readers of your code a major disservice. It’s common to incorporate the resolution and units together, such that our example now becomes:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. 1 LSB = 0.0125 bar */

Dynamic range

The last thing that should be reported is the expected dynamic range of the variable. This covers a multitude of issues:

  1. What is the legal range of values? For example, consider the popular LM35 series of temperature sensors. These devices are inherently designed to report temperatures above zero Celsius. As such negative temperatures are not expected when using this sensor. Other sensors will of course have other physical limits which it’s important to report.
  2. What is the sensor offset? For example, absolute pressure sensors will typically have a non zero output when exposed to atmospheric pressure.

Thus it is essential that you specify the expected range. If you do so, the resolution can be implicit. However I still like to make it explicit. For example:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. 
                                    Range 0x0000 - 0x3FFF = 0.000 - 204.7875 bar. 
                                    1 LSB = 0.0125 bar. */
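
One way to keep such a comment honest is to state the same scaling in code. The sketch below converts counts to millibar using only integer arithmetic, taking 1 LSB = 0.0125 bar = 12.5 mbar from the comment above. The macro and function names are hypothetical, and clamping out of range counts to 0x3FFF is an assumption about how illegal values should be handled.

```c
#include <stdint.h>

/* 1 LSB = 0.0125 bar = 12.5 mbar, expressed as the ratio 125 / 10 */
#define PRESSURE_CO2_MBAR_NUM    (125U)
#define PRESSURE_CO2_MBAR_DEN    (10U)
#define PRESSURE_CO2_MAX_COUNTS  (0x3FFFU)   /* Top of the legal range */

/* Convert a CO2 pressure in counts to millibar.
   Counts above the legal range are clamped (assumed policy).
   The result is truncated to a whole number of millibar. */
uint32_t pressure_co2_counts_to_mbar(uint16_t counts)
{
    if (counts > PRESSURE_CO2_MAX_COUNTS)
    {
        counts = PRESSURE_CO2_MAX_COUNTS;
    }
    /* Widen before multiplying so the product cannot overflow 16 bits */
    return ((uint32_t)counts * PRESSURE_CO2_MBAR_NUM) / PRESSURE_CO2_MBAR_DEN;
}
```

With this scaling, 80 counts corresponds to 1000 mbar, that is 1 bar.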

I urge you to take a look at your code and see if you are doing this for your real world variables. If you aren’t then consider making some changes. Future generations will thank you.

All variables are equal, but some are more equal than others

Tuesday, July 31st, 2012 Nigel Jones

With all due apologies to George Orwell for the title, I thought I’d offer a little tidbit on the practice of the following construct:

uint8_t a,b,c,d;
a = b = c = d = 0;

This code declares four variables (a,b,c,d) and sets them all equal to 0. The question is, is this a good, bad or indifferent practice? Well, I think it is an excellent practice in one very limited case, but otherwise should be avoided. Consider these two examples:

#define MAX_SPD (42U)

void fna(void)
{
    uint8_t spd1 = MAX_SPD;
    uint8_t spd2 = MAX_SPD;
    ...
}

void fnb(void)
{
    uint8_t spd1, spd2;
    spd1 = spd2 = MAX_SPD;
    ...
}

What difference, if any, is there between these two functions? Well clearly they both declare two variables and assign them the value MAX_SPD. However, I would suggest that there is a very subtle difference. In fna() there are two variables that happen to have the same initialized value, whereas in fnb() there are two variables that are initialized to the same value, which happens to be MAX_SPD. So what you ask? Well consider someone maintaining this code. For fna() all they know is that the two variables happen to be initialized to the same value, and thus a change such as this is perhaps a reasonable thing to do:

void fna(void)
{
    uint8_t spd1 = MAX_SPD;
    uint8_t spd2 = MAX_SPD - 1;
    ...
}

Conversely, for fnb() there is a barrier to and a subtle hint against making spd1 different from spd2. Thus if the algorithm requires that spd1 and spd2 always be initialized to the same value then the construct in fnb() is better. Conversely, if it is essentially happenstance that spd1 and spd2 share the same initial value then fna() is better.

To put it another way, if you find yourself using this construct to save on typing or lines of code then the chances are you are doing the wrong thing. Conversely if you find yourself doing this to impart a subtle hint then the chances are you are doing the right thing.

A comment on comments. One can (and should) argue that in the case of fnb() one should have a comment to the effect that spd1 and spd2 must be initialized to the same value. While I agree wholeheartedly, I always try and use code constructs that minimize reliance on someone actually reading the comments.

As a final thought, I have seen coding standards that ban this practice. If your coding standard does ban it, perhaps it’s time to revisit it?