embedded software boot camp

Evaluating embedded code

Sunday, June 20th, 2010 by Nigel Jones

One of the interesting aspects of being an embedded systems consultant is that I get to look at a lot of code written by others. This can come about in a number of ways, but most commonly occurs when someone wants changes made to an existing code base and the original author(s) of the code are no longer available. When faced with a situation such as this, it is essential that I quickly get a sense for how maintainable the code is – and thus how difficult changes will be. As a result I have developed a few techniques to help me assess code maintainability which I thought I’d share with you.

SourceMonitor

After installing the code on my system, the first thing I do is run SourceMonitor over the code. SourceMonitor is a free utility that computes various metrics. The metrics and their values for a typical code base of mine are shown below.

Number of Files: 476

Lines of code: 139,013

Statements: 61,144

% branches: 6.3%

% comments: 41.7%

Functions: 2,509

Average statements / function: 11.8

Max Complexity: 158

Max depth: 9+

Average depth: 0.54

Average complexity: 2.38

Probably the only thing that needs explanation is ‘complexity’. The author of SourceMonitor is not computing the McCabe complexity index, but rather is computing complexity based upon Steve McConnel’s methodology. The details of the implementation aren’t particularly important to me, as I’m more interested in comparative values.

While SourceMonitor helps give me the big picture, it is nowhere near enough – and this is where it gets interesting.

Optimization Level

The next thing I look at are the optimization levels being used. These can be very revealing. For example if a high level of optimization is being used for the debug build then it might be indicative that a non-optimized build either will not fit into the available memory, or possibly that the code doesn’t run fast enough unless optimization is turned on. Either is indicative of a system that is probably going to be tough to maintain. Conversely if the release build doesn’t use full optimization then I take this to mean that the code probably doesn’t work when optimization is turned on. I have written about this in the past and consider this to be a major indicator of potential code quality problems.

C-V Qualifiers

Having looked at the optimization levels, I then perform a grep on the code base looking for the number of instances of ‘volatile‘ and ‘const‘. If the number of instances of volatile is zero (and it often is) and the optimization level is turned way down, then it’s almost certain that the author of the code didn’t understand volatile and that the code is riddled with potential problems. Whenever this happens, I get a sinking feeling because if the author didn’t understand volatile, then there is no chance that he had any appreciation for race conditions, priority inversion, non-atomic operations etc. In short, the author was a PC programmer.

The ‘const‘ count is less revelatory. If the author makes use of const then this is normally an indicator that they know their way around the compiler and understand the value of defensive programming. In short I take the use of const to be very encouraging. However, I can say that I have known some excellent embedded systems programmers who rarely used const, and thus its absence doesn’t fill me with the same despair as the absence of volatile.

Incidentally in my code base described above, there are 53 incidences of the use of ‘volatile’ (note that I have excluded compiler vendor supplied header files which define all the various hardware registers as volatile). There are also 771 incidences of the the use of const.

Static qualifiers

Regular readers of this blog will know I am a big fan of the ‘static‘ qualifier. Static not only makes for safer and more maintainable code, it also makes for faster code. In fact, IMHO the case for static is so overwhelming that I find its absence or infrequent use a strong indicator that the author of the code was an amateur. In my example code base, static appears 1484 times.

Case statements

Regular readers of this blog also know that I am not a big fan of the case statement. While it has its place, too often I see it used as a substitute for thought. Indeed I have observed a strong inverse correlation between programmer skill and frequency of use of the case statement. As a result, I will usually run a grep to see what the case statement frequency is. In my example code, a case statement occurs 683 times, or once every 90 statements.

Compilation

All of the above ‘tests’ can be performed without compiling the code. In some cases I own the target compiler (or can download an evaluation copy), in which case I will of course attempt to compile the code. When I do this I’m looking for several things:

  1. An absence of compiler warnings / errors. Alan Bowens has written concisely and eloquently on this topic. The bottom line – compilation warnings in the release build are a major issue for me. Note that I’m more forgiving of compiler warnings in the debug build, since by its nature debug often ignores things such as inline commands, which can generate warnings on some compilers.
  2. The compilation speed. Massive files containing very large functions compile very slowly. They are also a bear to maintain.
  3. The final image size. This is relevant both in absolute terms (8K versus 128K versus 2M) and also in comparison to the available memory. Small images using a small percentage of the available memory are much easier to maintain than large images that nearly fill the available memory.

Lint

The final test that I perform only rarely is to Lint the code base. I do this rarely because quite frankly it takes a long time to configure PC-Lint. Thus only if I have already created a PC-Lint configuration file for the target compiler do I perform this step. Previously un-linted code will always generate thousands of warnings. However, what I’m looking for are the really serious warnings – uninitialized variables, indexing beyond the end of an array, possible null pointer dereferences etc. If any of these are present then I know the code base is in bad shape.

I can typically run the above tests on a code base in an hour or so. At the end of it I usually have a great idea of the overall code quality and how difficult it will be to modify. I would be very interested to hear from readers that are willing to perform the same tests on their code base and to publish the results. (Incidentally, I’m not trying to claim that my metrics are necessarily good – they are intended merely as a reference / discussion point).

Tags: ,

31 Responses to “Evaluating embedded code”

  1. Anders says:

    Since I’m not really a programmer anymore I don’t have any code metrics to share. But I have another observation on code (or rather ‘programmer’ ) quality that is not quite so easy to measure, but still can be worthwhile to look for:
    Loop nesting – I’ve stumbled upon too many triple-nested loops to consider it a coincidence that almost always a triple-nested loop is a bad solution to the problem it intends to solve. And if there is at least one such ‘bad’ nested loop in the code base it’s an indication that the programmer is not really paying attention to what he is doing.

    Of course there are situations where a triple loop is the only way, due to well-known algorithmic or speed constraints etc, but it seems to me at least that almost always is it possible to simplify such loops; either by solving the problem in a more efficient manner or by hiding one or two nesting levels behind function calls. (Surprisingly often in such cases the innermost loop performs functionality already available in the standard C library, like memcpy or similar…)

    • Nigel Jones says:

      I agree Anders. The triple nested loop is always a bear to maintain and can usually be avoided using some of the techniques you mentioned. Even if the construct is unavoidable (and hence is not a reflection on the programmer), it is still indicative of code that is difficult to maintain.

  2. Vaibhav says:

    What do these tell you?

    Parameter Value
    ========= =====
    Project Name test
    Checkpoint Name Baseline
    Created On 21 Jun 2010, 13:33:32
    Files 2
    Lines 11,749
    Statements 6,830
    Percent Branch Statements 32.3
    Percent Lines with Comments 35.6
    Functions 107
    Average Statements per Function 99.6
    Line Number of Most Complex Function {undefined}
    Name of Most Complex Function get_resp()
    Complexity of Most Complex Function 467
    Line Number of Deepest Block {undefined}
    Maximum Block Depth 9+
    Average Block Depth 3.94
    Average Complexity 18.67

    • Nigel Jones says:

      On the assumption that the 2 files encompass the entire project then you should be breaking the code up into more files. The general rule is that one module does one thing. For example code to use a serial EEPROM would be split into two or three modules – one to handle the low level communications, one to handle ‘file’ operations and a top level one that provides the API to the rest of the code.

      Your average function length is too long. At 99.6 statements per function, your functions are nearly ten times as long as mine. Your code will compile faster, will have fewer bugs and will be easier to maintain if you get in the habit of using much smaller functions. This is reinforced by the fact that your average function complexity at 18.67 is 7.8 times greater than mine.

      The final thing that struck me was that a third of your statements are branch statements. In my case it’s a mere 6%. For an explanation of why my percentage is so low, you might want to check this out.

      I hope you found this useful.

  3. Gauthier says:

    Hi!
    I’m pretty sure you are aware of the “miscompiled volatile” issue: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
    I suspect that this could be the cause of some projects not using optimisation, what do you think?
    Then of course, even if this problem forces me to shut off optimisation, I’d still mark the variables as volatile…

    • Nigel Jones says:

      Indeed I am. John Regehr and I correspond from time to time and have even co-written a blog post which partly addresses this issue. Notwithstanding this, as you point out, it’s still no excuse for not using volatile. Personally I wish everyone in the industry would complain very loudly every time their compiler fails to do its job, such that this becomes an issue of the past.

      • Gauthier says:

        I remember you posted about compilers, and that you were not a big fan of gcc (sorry if this is wrong). Do your preferred compilers deal with volatile correctly? If not, what are you doing about it? From this post it seems you use optimisation, but how do you make sure volatile variables are checked? I find the solution proposed by John Regehr quite cumbersome (but got no better alternative of course).

        • Nigel Jones says:

          I have no problem with gcc per se, other than the fact it is unsupported – and unsupported compilers are too costly in terms of my time. Anyway I use IAR compilers for most of my work and yes they seem to handle volatile correctly. For example see this post. Would IAR compilers pass John Regehr’s torture test – I don’t know. However I expect they would do very well based upon my experience. To check that the compiler does what it is supposed to, I simply look at the code that is generated. Incidentally this is one of the reasons that I think it’s essential that one have at least a basic understanding of the assembly language for a target processor.

          • John Regehr says:

            My educated but otherwise uninformed guess is that the IAR products do well at volatile, because their customers care about it. I’d also guess that we would find some silent wrong-code bugs in these compilers, because no compiler we’ve tested is free of them.

          • Nigel Jones says:

            Perhaps you should offer to test them John? I know folks from IAR read this blog and so I presume they know about your work. It would seem to me that it would be a minor, albeit still significant, marketing coup for the first compiler that can claim to pass your battery of tests.

          • John Regehr says:

            Hi Nigel- I’d like to do this but don’t know people at IAR. Perhaps if you have suitable contacts you could let me know offline who to talk to? I’d had fairly bad luck cold-calling (cold-emailing?) compiler people. They tend to interpret “I can find bugs” as “I can create useless work for your busy compiler team” :). My view is that all good compiler people want their compilers to be more correct, but in practice people are busy and perhaps these bugs are not causing lost tool sales, if you see what I mean.

          • Gauthier says:

            Do you need their help or consent to analyse the volatile behaviour of their compiler? I thought you were analysing the assembly output in different cases.
            Although I would understand if you did not want to do it without them hiring you for it.

          • Nigel Jones says:

            I can’t see why we would need their consent. However their help is indeed useful. I’m also happy to report that IAR was enthusiastic about working with John Regehr. I suspect that John will be reporting on the results at some point.

        • Ashleigh says:

          Part of the reason volatile works reasonably well on IAR is the bugs I reported in their compiler a few years back 🙂

          I recently did a benchmarking of GCC for MSP430 vs IAR – compiling the same code (about 600 lines of source) which included an interrupt handler. For both compilers I set maximum optimisation (for smallest code side). I was kind of stunned by the results. The code side generated for GCC was about 50% larger than IAR. This was all confirmed by inspecting the generated code for each and counting bytes.

          While at it I checked a number of other compilers and was astonished to find that some of them can’t even generate a proper compiler listing file. How on earth can a serious embedded engineer work with a compiler where you can’t see what it does? (Looking at the mnemonics does not count, I want to see the byte stream and the size / offset from the start for every function.)

          • Nigel Jones says:

            I’m not sure if you are regular reader of this blog. However ‘volatile’ is a topic that I frequently touch upon. I thought this post was entertaining. I also from time to time discuss compiler selection and have expressed the opinion that IAR compiler’s are really very good and that GCC is not for me.

            BTW, I’m really enjoying your comments!

  4. John Regehr says:

    Also I like this article. My students will read it this Fall.

  5. JB says:

    Sounds fun, I’ll play! This was the project I was working on this morning (not something cherry picked):

    Metrics Summary For Checkpoint ‘Baseline’
    ——————————————————————————————–

    Parameter Value
    ========= =====
    Project Directory
    Project Name
    Checkpoint Name Baseline
    Created On 22 Jun 2010, 20:48:40
    Files 101
    Lines 11,566
    Statements 3,888
    Percent Branch Statements 18.6
    Percent Lines with Comments 41.1
    Classes Defined 2
    Methods Implemented per Class 5.00
    Average Statements per Method 9.8
    Line Number of Most Complex Function {undefined}
    Name of Most Complex Function
    Maximum Complexity 34
    Line Number of Deepest Block {undefined}
    Maximum Block Depth 9+
    Average Block Depth 1.34
    Average Complexity 4.13
    Functions 190

    ——————————————————————————————–
    Block Depth Statements

    0 1,336
    1 1,216
    2 592
    3 399
    4 174
    5 129
    6 35
    7 5
    8 1
    9+ 1
    ——————————————————————————————–

    I suspect a few source files are blowing up my branch percentages, due to some pseudo-polymorphic C switch tables. Otherwise, it compiles with 0 warnings and no lint errors. (My lint config is pretty strict.) Thoughts?

    • Nigel Jones says:

      Very interesting. Your metrics look very good. However I’m most intrigued by the fact that this is a C++ project. I wonder to what extent its C++ that naturally lends itself to you having very small files (you have an average of 114 lines per file) and also very low complexity, or whether it’s your coding style? SourceMonitor appears to add a complexity ‘point’ per case statement, so a function with a large switch construct is flagged as being very complex. It’s reasonably well known that well written OOP allows one to virtually dispense with switch statements (which is supported by your last statement) and so this may be the reason your complexity values are so low. Anyway the bottom line for me is that:
      1. I would have zero qualms about having to maintain your code as it would clearly be a cake walk. It’s not often I say that!
      2. This is potentially a good case for why we should be using an OOP language instead of C.

      If you have an equivalent project written in C that you’d care to share with us JB, then I would be fascinated to see the results. Thanks for commenting!

      • JB says:

        I wouldn’t consider it a well written or easy to maintain codebase at all. As a matter of fact, I find it rather embarassing. It’s pretty much all C with some small experimental bits in C++ (two true classes, some use of default args, but no serious object orientation). Even though it’s running under a realtime kernel, there’s still quite a bit of global variable action, no variable/function naming conventions are in use, very few variables are declared as static (but many could be), and most of those globals are not declared volatile (oh, the horror).

        On the positive side, PC-Lint has led me to reduce a lot of the casting and mark args as const when possible. I’m still not happy with the amount of casting to deal with signed/unsigned, but that’s tied to an old library and I’ve not had time to dig in and fix it.

        I do think it’s a fun tool, even if it can spit out misleading results. Maybe it would be better used to monitor a project over time for improvements, rather than as a one time check. I’d also like to run it over our entire codebase, but I didn’t see any easy way to script it from a command line. Pointing and clicking to build over 100+ projects just didn’t sound appealing.

      • JB says:

        Project Name rio dsp vile
        Files 37 23 28
        Lines 2,803 2,843 6,690
        Statements 701 892 4,079
        Percent Branch Statements 12.8 21.0 21.8
        Percent Lines with Comments 50.1 36.4 11.4
        Functions 65 49 199
        Average Statements per Function 11.6 23.0 21.0
        Complexity of Most Complex Function 11 37 139
        Maximum Block Depth 4 7 9+
        Average Block Depth 0.95 1.90 1.69
        Average Complexity 2.52 4.39 5.87

        Block Depth Statements
        0 274 231 1,374
        1 258 267 1,039
        2 112 134 638
        3 47 69 321
        4 10 55 243
        5 0 86 251
        6 0 45 134
        7 0 5 59
        8 0 0 14
        9+ 0 0 6

        These three are plain ol’ C, but are much smaller than the first project (so it’s not really a fair fight). The first two are quite small and simple. #1 is a really tiny microcontroller project. #2 does some funky signal processing, and hasn’t had near the polish–I consider it still somewhat rough and experimental (and the stats bear this out).

        #3 isn’t mine, but I had to work with it. It’s much, much, worse than the numbers would indicate. Trying to understand and reverse engineer it was so painful at times I thought I would break down and cry. It’s the kind of code that makes you angry enough to leave work early. (I think the blocks of commented out code and machine generated tables are skewing the comment stats in this one.)

        I’ve got a decently complicated project coming up. Maybe I’ll track it with this app and see how the numbers change over time. (Worst case, I can go back later and run it on each CVS commit to chart the trends.)

        • Nigel Jones says:

          Thanks for the information JB. Could you also run a grep on const, volatile, static and case? I ask because it would be very interesting to see what metric (if any) most closely correlates with your experience as to what was actually the worst to maintain.

          • JB says:

            Sure, why not:

            *.c:
            const: 15 25 50
            volatile: 0 0 2
            static: 17 28 40
            case: 0 33 49

            *.h
            const: 10 15 0
            volatile: 0 0 0
            static: 0 3 0

            Project “vile” was seriously space constrained and had (too!) many initialized tables in ROM.

            Is there a metric for meaningless and duplicated identifers? (write(float* Who), Dog(), do_it(char Who), etc.)

  6. GroovyD says:

    Here is a folder of code I work on myself…

    Checkpoint Name Baseline
    Created On 25 Jun 2010
    Files 71
    Lines 10,835
    Statements 3,926
    % Branches 18.1
    % Comments 31.7
    Class Defs 0 (? i use struct for all classes as i am not interested in hiding any members during development)
    Methods/Class 14.42
    Avg Stmts/Method 10.6
    Max Complexity 65
    Max Depth 8
    Avg Depth 1.43
    Avg Complexity 3.69
    Functions 89

    How does this look?

    • Nigel Jones says:

      Pretty good. Your commenting percentage is a bit low (but still reasonable) and may well be justified because your average complexity is quite low. I like the fact that your average statements per method is low.

  7. Lundin says:

    I think that this method works for detecting utterly horrible, broken code. For anything less coarse, I don’t think it will work.

    I have one example of two programs. They are both running on the same hardware, they are nearly identical in size and functionality (safety-critical field bus systems). But the code quality is very different.

    One of the programs contains excellent code. Very easy to read, throughoutly tested with static analyzers and with full MISRA-C compliance. The other program contains sloppy, complex and fairly unreadable/unmaintainable code, with lots of MISRA-C violations.

    Results below. I stripped away all search hits of volatile that directly referred to hardware register maps, as that seems highly irrelevant to the quality of the program. Also, both programs are dealing with dip-SWITCHes, so the number of switch hits is mostly from code comments regarding dip-switches… Lesson learnt from that is that one should probably only run “grep” on raw pre-processor output.

    Files: 36
    Lines: 14,269
    Statements: 5,227
    % Branches: 10.9
    % Comments: 40.6
    Functions: 171
    Avg stms/func: 19.3
    Max complexity: 69
    Max depth: 9+
    Avg depth: 0.86
    Avg complexity: 4.25
    volatile: 41
    const: 197
    static: 117
    switch: 20

    Files: 41
    Lines: 15,427
    Statements: 5,866
    % Branches 12.8
    % Comments 38.2
    Functions: 189
    Avg stms/func: 22.3
    Max complexity: 133
    Max depth: 9+
    Avg depth: 1.09
    Avg complexity: 4.87
    volatile: 43
    const: 201
    static: 171
    switch: 23

    As you can see, the results are almost identical. Just by the metrics, it is impossible to tell which one that is the good program (it is the 1st one).

  8. Took me time to read all the comments, but I really enjoyed the post. It proved to be really useful to me and I’m sure to all of the commenters right here! It’s usually good when you can not only be informed, but additionally engaged! I am certain you had joy writing this article.

  9. Patrick says:

    I only just came across this interesting post. I work on a reasonably sized embedded controller project and thought I might give this tool a run for your interest. When doing this type of checking I also like to do a quick grep for goto statements.

    Here’s the stats:

    Metrics Summary For Checkpoint ‘Baseline’
    ——————————————————————————————–

    Parameter Value
    ========= =====
    Project Directory Z:\home\patrick\src\e7\prex\
    Project Name e7
    Checkpoint Name Baseline
    Created On 11 Apr 2011, 12:42:03
    Files 2071
    Lines 320,507
    Statements 151,780
    Percent Branch Statements 15.6
    Percent Lines with Comments 25.7
    Classes Defined 82
    Methods Implemented per Class 5.93
    Average Statements per Method 7.1
    Maximum Complexity 109
    Maximum Block Depth 9+
    Average Block Depth 1.37
    Average Complexity 4.44
    Functions 5,961

    ——————————————————————————————–
    Block Depth Statements

    0 32,735
    1 67,619
    2 25,782
    3 17,484
    4 5,092
    5 2,111
    6 749
    7 160
    8 39
    9+ 9
    ——————————————————————————————–

    const: 5635
    volatile: 209
    static: 4995
    case: 4307
    goto: 1891

    • Nigel Jones says:

      1891 incidences of goto – what are the authors doing to justify that number?

      • Patrick says:

        The vast majority of them are in code that looks like:


        sched_lock()
        if (error_1)
        goto out;

        if (error_2)

      • Patrick says:

        The vast majority of them are in code that looks like:


        sched_lock()
        if (error_1) {
        err = EINVAL;
        goto out;
        }

        if (error_2) {
        err = ENOMEM;
        goto out;
        }

        // real work happens here

        out:

        sched_unlock();
        return err;

  10. Michael Tempest says:

    In general, I like your list for evaluating code quality.

    Metrics don’t tell the whole story, but they tell more than most engineers will admit. I have reviewed many pieces of code with terrible metrics. In almost all cases, the engineers responsible knew about the metrics and tried to justify them, rather than recognise that there was a problem. The most common justification was “There is no other way to do it”, and we almost always found a better (faster, simpler, more readable, more maintainable, more robust) way. And the metrics were better too.

    My experience of volatile and compiler optimizations is different, possibly because of my background. I have tended to write software for low-volume production runs (less than 100 systems to be built), for safety-related airborne systems. Total procurement cost is relevant for these systems but minimal hardware cost is not, so it isn’t unusual to have way more computing resources than strictly necessary. That can make compiler optimization irrelevant. (Even so, several of these systems did grow to use all available resources. Some things are universal :). )

    For a start, I follow the “fly what you test, test what you fly” principle, so I do not have separate debug and release builds i.e. there is only one set of compiler options.

    Compiler optimizations are frowned upon by (my) certification authorities, so I generally disable optimizations. If I don’t actually need them, then that is one less thing I may have to argue about/motivate later.

    In this context, changing compiler options invalidates previous test results. Those test results are valuable, so if I reuse code from another project, I will also try to reuse the compiler options. Similarly, I generally won’t change compiler options after formal testing has started.

    In order to maximize reuse in this context, it helps if there is a standard set of compiler options. This worked very well in practice, but it meant that a set of compiler options for a given processor was frozen (flaws and all) for more than 10 years.

    I’ve also found that compiler bugs manifest more often when the optimization level is increased (e.g. I saw variables being stored in registers and then having those registers used for intermediate values, thus corrupting the variables). It may be that I encountered this more than is typical for the embedded software industry because I tended to be using older compilers, in large part because moving to a new compiler (like changing compiler options, but more so) entailed reverifying all reused code. It is not uncommon for me to use a particular compiler version long after the supplier ceased support for that version.

    As a result of all of these factors, most the systems I’ve worked on tended to have most optimizations disabled. Some were good and others not (yes – it is possible to develop horrible code that works and is certifiably safe), yet you could not tell the difference by looking at the level of compiler optimizations.

    Also, most of those systems use volatile sparsely. That may be a consequence of long-term development with compiler optimizations disabled, but there are other contributing factors (like architectural features that eliminated the need for most regular uses of volatile). Whatever the cause, you could not tell the good code from the bad by looking for volatile.

    I would add a few process-related checks to add to your list. Neither is conclusive, but both add to the picture.
    * Was the code routinely subject to static analysis during development? If yes, then the code is probably better for it.
    * Were unit tests ever developed for this code? If yes, then the code is generally better for it, even if the unit tests were not subsequently maintained (unfortunately, I’ve seen that happen, mostly because of poorly-written unit tests).
    * Was the code subject to meaningful peer review? The meaningful part is important, and my idea of a meaningful review isn’t totally consistent with certification requirements. Rubberstamp reviews that only generate a paper-trail are meaningless, as are reviews done after the code is tested (“But it works, so xxx can’t be a problem…”). In contrast, informal reviews from which no records are kept can lead to vastly superior code (and ironically they are worthless from a certification perspective).
    * Was source control used? “Yes” does not mean good code, but “No” is a warning. I didn’t expect this to have any bearing on code quality, but I have seen that it does. Perhaps it is because source control is sometimes voluntary (even when it should not be!), and someone who cares to put their code into source control also takes more care when coding?

    I can see that it isn’t always possible to ask these questions, but they can help paint the picture.

    Michael

Leave a Reply to GroovyD