embedded software boot camp

Effective C Tip #8 – Structure Comparison

Friday, December 4th, 2009 by Nigel Jones

This is the eighth in a series of tips on writing effective C. Today’s topic concerns how best to compare two structures (naturally of the same type) for equality. The conventional wisdom is that the only safe way to do this is by explicit member by member comparison, and that one should avoid doing a straight bit (actually byte) comparison using memcmp() because of the problem of padding bytes.

To see why this argument is advanced, one must understand that a compiler is free to place pad bytes between members of a structure so as produce more favorable alignment of the data in memory. Furthermore, the compiler is not obligated to initialize these pad bytes to any particular value. This code fragment illustrates the problem:

typedef struct
{
 uint8_t x;
 uint8_t pad1;  /* Compiler added padding */
 uint8_t y;
 uint8_t pad2;  /* Compiler added padding */
} COORD;
void foo(void)
{
 COORD p1 p2;
  p1.x = p2.x = 3;
  p1.y = p2.y = 4;
  /* Note pad bytes are not initialized */
  if (memcmp(&p1, &p2, sizeof(p1)) != 0)
  {
    /* We may get here */
  }
 ...
}

Thus, it’s clear that to avoid these kinds of problems, we must do a member by member comparison. However, before you rush off and start writing these member by member comparison functions, you need to be aware of a gigantic weakness with this approach. To see what I mean, consider the comparison function for my COORD structure. A reasonable implementation might look like this:

bool are_equal(COORD *p1, COORD *p2)
{
 return ((p1->x == p2->x)  &&  (p1->y == p2->y));
}
void foo(void)
{
 COORD p1 p2;
 p1.x = p2.x = 3;
 p1.y = p2.y = 4;
 if (!are_equal(&p1, &p2))
 {
   /* We should never get here */
 }
 ...
}

Now consider what happens if I add a third member z to the COORD structure. My structure definition and function foo() become:

typedef struct
{
 uint8_t x;
 uint8_t pad1;  /* Compiler added padding */
 uint8_t y;
 uint8_t pad2;  /* Compiler added padding */
 uint8_t z;
 uint8_t pad3;  /* Compiler added padding */
} COORD;
void foo(void)
{
 COORD p1 p2;
 p1.x = p2.x = 3;
 p1.y = p2.y = 4;
 p1.z = 6;
 p2.z = 5;
 if (!are_equal(&p1, &p2)
 {
  /* We will not get here */
 }
 ...
}

The problem is that I now have to remember to also update the comparison function. Now clearly in a simple case like this, it isn’t a big deal. However, in the real world where you might have a 500 line file, with the comparison function buried miles away from the structure declaration, it is way too easy to forget to update the comparison function. The compiler is of no help. Furthermore it’s my experience that all too often these sorts of problems can exist for a long time before they are caught. Thus the bottom line, is that member by member comparison has its own set of problems.

So what do I suggest? Well, I think the following is a reasonable approach:

  1. If there is no way that your structure can change (presumably because of outside constraints such as hardware), then use a member by member comparison.
  2. If you are working on a system where structure members are aligned on byte boundaries (which is true to the best of my knowledge for all 8 bit processors, and also most 16 bit processors), then use memcmp(). However, you need to think about doing this very carefully if there is the possibility of the code being ported to a platform where alignment is not on an 8 bit boundary.
  3. If you are working on a system that aligns on a non 8 bit boundary, then you must either use member by member comparison, or take steps to ensure that all the bytes of a structure are initialized using memset() before you start assigning values to the actual members. If you do this, then you can probably use memcmp() with a reasonable amount of confidence.
  4. If speed is a priority, then clearly memcmp() is the way to go. Just make sure you aren’t going to fall into a pothole as you blaze down the road.

Before I leave this topic, I should mention a few esoteric things for you to consider.

If you use the memcmp() approach you are checking for bit equality rather than value equality. Now most of the time they are the same. Sometimes however, they are not. To illustrate this, consider a structure that contains a parameter that is a boolean. If in one structure the parameter has a value of 1, and in the other structure it has a value of 2, then clearly they differ at the bit level, but are essentially the same at a value level. What should you do in this case? Well clearly it’s implementation dependent. It does however illustrate the perils of structure comparison.

Finally I should mention issues associated with structures that contain pointers. CS guys like to distinguish between deep and shallow structure comparison. I rarely write code where a deep comparison is required, and so for me it’s mostly a non-issue.

Next Tip

Previous Tip
Home

7 Responses to “Effective C Tip #8 – Structure Comparison”

  1. Anonymous says:

    I came here after the answer to a different question, but your structure comparison question is interesting so I'll comment on that first.First up, I haven't used C for a long time and have no experience with embedded stuff (more on that later) and have never worried about compiler padding. That out of the way, I'd probably get the pooter to define the structure and comparison functions, my personal preference being with something like M4 at first.So, instead of defining struct + function, perhaps you'd use something like this (my idiosyncratic M4; it doesn't have to look like this. Also, entirely off top of my head so not tested. Also comments here follow m4_dnl, don't ask why)m4_defstruct([:COORD:], m4_dnl structure name[: m4_dnl start pairs/triples of components[:uint8_t, x:],[:uint8_t, y:]:] m4_dnl end pairs/triples of components:]) m4_dnl end macrowith suitable definitions i will get this to unfold into a structure + a suitable function. Add another pair/triple and it all gets updated.I mentioned triples above because it's possible that structure components won't always be == comparable and would need a suitable comparison function, in which case you'd write something like:[:fancystruct, fncy, fancystructs_equals_func:]Note that a macro can also expand into extra stuff like a structure copier and/or cloner function if required, and a whole load of other stuff.Have used this technique to generate compiler unit tests + plenty of other stuf.If M4 gets too painful, write a proper DSL. Actually I really don't understand while this approach isn't used more often. People do so much by hand which is the job of the machine. Note that this is all off the top of my head, can try to do it properly if you want to contact me (will provide email if asked).Right, about my 2nd question, which I can't find any answer to: real-time handlers. I've always wondered how one writes such a handler to have 1) a guarantee of response within a given time, and 2) guessing there must be a limit to the number of these handlers that an OS can support.On 1) I'm guessing that handlers must be non-virtual-memory-pageable (well duh) but beyond that you assume that everywhere possible you get the worst case eg. perhaps one instruction in the pipeline + rest bubbles, branch misprediction always, L0/1/2 cache misses always so ram read penalty every time, etc. Then you count this cost down all paths and pick the worst. This is the guaranteed worst case.Is this how it's done? Does the OS have to be given this worst case info, because to address point 2) the os cannot support and infinite number of these (duh again) so it must budget for some and reject the rest if they try to install themselves. I'm also guessing as point 3) that RT handlers are almost always very simple – true?

  2. Anonymous says:

    dammit, when I meant struct + function or similar, I meant the comparison function you required. Unclear, sorry.

  3. Nigel Jones says:

    Let's get the answer to your question out of the way first. You've used the term 'handler'. It's not clear whether you mean a 'task' or an 'interrupt service routine'. For tasks, one can use Rate Monotonic Analysis (see http://en.wikipedia.org/wiki/Rate-monotonic_scheduling) to ensure that a task always meets its deadline.For interrupt service routines (which is what I think you are asking about), then the job gets a lot harder. The response time to an ISR is usually dominated by two things:1. How long interrupts get disabled by the operating system.2. The execution time of higher priority interrupts In the first case, an operating system needs to disable interrupts in order to do certain things such as task switching. All good real time operating systems will specify the worst case interrupt disable time. However if you are running on a non real time OS (such as Windows) then I suspect you are going to find it very hard to obtain the interrupt disable time.In the second case, most CPUs impose an interrupt priority structure. It is common (but not universal) for a higher priority interrupt to be be able to interrupt a lower priority interrupt. Thus the worst case response time for an interrupt of a given priority based upon this criterion is the execution time of all the interrupt service routines that are of a higher priority. Note that in some extreme cases, it's actually possible for a very high priority routine to trigger multiple times while a lower priority routine is waiting.Once you get past these two roadblocks, then you are into the realm of things such as how long it takes a CPU to recognize an interrupt, the number of registers it has to stack (I wrote an article on this – you'll find it at http://www.embedded.com/columns/technicalinsights/196800005?_requestid=1093456 ), and then the actual execution time of the instructions, since almost by definition, the ISR will not be in the instruction cache, and so the cache will have to flushed and loaded with the ISR code.As you can see, writing an interrupt service routine such as to guarantee a certain response can be very challenging indeed. It's part of what makes embedded systems either fun, or a nightmare, depending upon your perspective.As to whether there is a limit to the number of handlers. Well clearly there is. It's surprisingly easy to make a CPU 'interrupt bound'. That is it spends its entire time getting in and out of ISRs and as a result never gets any other work done. Inexperienced embedded folks often find themselves in this unhappy situation.Finally, to your point about real time handlers being simple. As a general rule – yes. The name of the game is usually to get in, do the least work possible, and push the remainder of the work off onto non real time code. The division of work between the interrupt handler and foreground tasks is one of the bigger challenges in embedded systems.OK, with that out of the way, let's turn to your comments on structure comparison. I don't disagree with you. C just doesn't have these sorts of capabilities. Now you can get close to what you want to do with C++, but it still isn't automated. Now for me, using a macro processing language such as M4 in order to generate / maintain embedded code brings in more problems than it's worth. However, it's an issue of personal preference without a great argument to back it up. Anyone else have an opinion on this?

  4. Anonymous says:

    (2nd try, didn't go through)Thanks for your reply. Yes I was using the term handler to mean ISR (and ta for the ref to the RMA algorithm; not my usual line of work but the more I know the better).Your reply is shows some intimidating issues I'd not appreciated and I feel my perspective of this line of work tends much towards the nightmare end. I'll stick to databases & data processing, thanks. But you have answered my core questions, thanks.Going back to the M4 question, I'll elaborate slightly. There's no need to use M4; use whatever's appropriate. If you need a domain specific language, go for it. I happen to use M4 as that because it suits me at the mo but M4 isn't the answer, it's /an/ answer. The underlying technique is the importance, not the particular tool (maybe you can use the c preprocessor? Maybe too inflexible).Regarding the 'more problems than it's worth', my experience (I've a fair bit with M4 but NOT in C) is that it introduces problems but they are emphatically worth it. It's frustrating when a macro decides to play up suddenly – there's never a shortage of aggro with M4, sadly – but when it works it *really* works. Things fly. I've used it to generate test cases for past projects, and a present one – clear one line macros become comprehensive 10-line tests – and I can reuse some of the larger test macros to generate much larger test case lumps – great! I actually use it as an IDL to generate javascript for a library, the idea being that later I can modify the macros and spit out another lang. Not got that far so we'll see, but even ignoring that I can spit out fully asserted code (with dirty detail nicely concealed), or no assertions, or in future partial assertions, or tracing code (normally you'd rely on the debugger but MS' JS debugger stinks), and hide one of MS' rather nasty JS bugs, and more.Summary: my experience says that once you're familiar with this technique and use it creatively and intelligently and appropriately and not to excess, it's painful but significantly less painful than not using it. I am sure this experience will extend from javascript + various test case generations to C.

  5. Anonymous says:

    Why didn't you mention the obvious and oft-used approach of using memset? E.g. memset(&p1, 0, sizeof(COORD));

  6. Nigel Jones says:

    I did. See point 3. Granted I didn't exactly sky write it!

Leave a Reply to Nigel Jones

You must be logged in to post a comment.