embedded software boot camp

Apple’s #gotofail SSL Security Bug was Easily Preventable

Monday, March 3rd, 2014 by Michael Barr

If programmers at Apple had simply followed a couple of the rules in the Embedded C Coding Standard, they could have prevented the very serious `Gotofail` SSL bug from entering the iOS and OS X operating systems. Here’s a look at the programming mistakes involved and the easy-to-follow coding standard rules that could have easily prevent the bug.

In case you haven’t been following the computer security news, Apple last week posted security updates for users of devices running iOS 6, iOS 7, and OS X 10.9 (Mavericks). This was prompted by a critical bug in Apple’s implementation of the SSL/TLS protocol, which has apparently been lurking for over a year.

In a nutshell, the bug is that a bunch of important C source code lines containing digital signature certificate checks were never being run because an extraneous goto fail; statement in a portion of the code was always forcing a jump. This is a bug that put millions of people around the world at risk for man-in-the-middle attacks on their apparently-secure encrypted connections. Moreover, Apple should be embarrassed that this particular bug also represents a clear failure of software process at Apple.

There is debate about whether this may have been a clever insider-enabled security attack against all of Apple’s users, e.g., by a certain government agency. However, whether it was an innocent mistake or an attack designed to look like an innocent mistake, Apple could have and should have prevented this error by writing the relevant portion of code in a simple manner that would have always been more reliable as well as more secure. And thus, in my opinion, Apple was clearly negligent.

Here are the lines of code at issue (from Apple’s open source code server), with the extraneous goto in bold:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, ...)
{
    OSStatus  err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The code above violates at least two rules from Barr Group‘s Embedded C Coding Standard book. Importantly, had Apple followed at least the first of these rules, in particular, this dangerous bug should almost certainly have been prevented from ever getting into even a single device.

Rule 1.3.a

Braces shall always surround the blocks of code (a.k.a., compound statements), following if, else, switch, while, do, and for statements; single statements and empty statements following these keywords shall also always be surrounded by braces.

Had Apple not violated this always-braces rule in the SSL/TLS code above, there would have been either just one set of curly braces after each if test or a very odd looking hard-to-miss chunk of code with two sets of curly braces after the if with two gotos. Either way, this bug was preventable by following this rule and performing code review.

Rule 1.7.c

The goto keyword shall not be used.

Had Apple not violated this never-goto rule in the SSL/TLS code above, there would not have been a double goto fail; line to create the unreachable code situation. Certainly if that forced each of the goto lines to be replaced with more than one line of code, it would have forced programmers to use curly braces.

On a final note, Apple should be asking its engineers and engineering managers about the failures of process (at several layers) that must have occurred for this bug to have gone into end user’s devices. Specifically:

  • Where was the peer code review that should have spotted this, or how did the reviewers fail to spot this?
  • Why wasn’t a coding standard rule adopted to make such bugs easier to spot during peer code reviews?
  • Why wasn’t a static analysis tool, such as Klocwork, used, or how did it fail to detect the unreachable code that followed? Or was it users of such a tool, at Apple, who failed to act?
  • Where was the regression test case for a bad SSL certificate signature, or how did that test fail?

Dangerous bugs, like this one from Apple, often result from a combination of accumulated errors in the face of flawed software development processes. Too few programmers recognize that many bugs can be kept entirely out of a system simply by adopting (and rigorously enforcing) a coding standard that is designed to keep bugs out.

Tags: , , , , , , , , ,

60 Responses to “Apple’s #gotofail SSL Security Bug was Easily Preventable”

  1. Lundin says:

    Well that’s just silly. How could they even have tested that function? Wouldn’t it always fail?

    I wouldn’t necessary label the use of goto in this case bad though. They always use it to jump downwards, to an error handler, and not for spaghetti-making purposes. But of course, it might have been better to return from the function with an error code, one per individual error, so you would get better diagnostics. And then you can easily do a “quick & dirty” code coverage test yourself: while testing the program, try to invoke every error possible and see what happens.

    But the problem isn’t goto, it isn’t even the missing braces… it is the complete lack of a software development procedure. Whoever is in charge of that programming team needs to get fired.

    Its funny though, how my dev team consisting of just a few people in a small company, have a development process which is apparently light years ahead of giant companies like Apple and Toyota…

    • Michael Barr says:

      Your point about the gotos all going to a single label at the bottom is fair, but then why is the label called fail. That code is always executed and thus it’s more of a cleanup or final section. Thus a different label is in order, for clarity.

      • Lundin says:

        It does quite a lot reassemble the classic “On Error GoTo” from old Basic/Visual Basic. Those languages used the very same goto mechanics as the “de facto” way to do error handling. I suppose it is possible that the programmer who wrote this C code comes from such a programming background.

        Indeed a label name like “finally” might be better, as some kind of attempt to emulate languages with exception handling (try-catch-finally). Personally, I prefer just to pass on an error code to the caller and centralize all error handling elsewhere, outside the specific module.

        • Malcolm says:

          a macro like “#define ERR_CHECK if (err) goto fail;” would also have prevented this error

          • Matt says:

            Macros themselves can be very dangerous… in your example, you have an “if” followed by an immediate executable statement with no braces. This will cause unexpected/unintentional results if the original if-block looked like this:

            if (a == b)
            ERR_CHECK;
            else
            printf(“Success!”);

            With the macro described as above, this would actually cause the “else” block (of the original code) to be tied to the “if (err)” logic statement in the macro, and not the “if (a == b)” comparison as intended.

            This is just a very simple example of where macros can steer a developer toward unintended consequences (especially if they were not the macro creator). Granted, this example would result in a compilation error (two else statements in a row), but you can see how a macro can cause unintended headaches down the road for both compile and runtime errors.

          • ErikCumps says:

            A macro is a good idea and it can be implemented safely.

            My standard way of defining the minimal version of such a macro would be:

            do { if (err) goto fail; } while (0)

    • Ashleigh says:

      I’ll add: Most decent compilers would complain that there is unreachable code. It might only be a warning, but warning should be investigated.

      So it looks like its either a lousy compiler, or warnings were being ignored.

      EITHER way, there are multiple levels of process failure here.

  2. We just put a very similar testcase through our local PC-Lint configuration.
    If the warning 527 is enabled, the “Unreachable code” is reported properly.

    However, this warning level is not very high. The severity of this error should be significantly higher. Unreachable code should alert anyone directly. Some compilers report this, others do not by default. Is there a way for PC-Lint to handle this as a true exception and abort?

    Otherwise I agree with Lundin that a proper unittest would help aswell to avoid surfacing of the problem.

  3. Darrell Wright says:

    Where was the dead code check? Every compiler I’ve used recently will report this.

    • Frank says:

      Only if you turn on -Wunreachable-code. Neither gcc nor clang include it in -Wall, which is counterintuitive at best.

  4. Walt Sellers says:

    I believe a more fundamental mistake was the overall assumptions of the code. It should have assumed failure until success was proven but it seems to assume success until failure is proven. If the code assumed failure, then the bug would have given false failures instead of false successes. The bug would then be noticed swiftly when legitimate network connections failed and user security never compromised.

    The compiler setting for this in Xcode is -Wunreachable-code (strangely not part of the -Wall setting.) This detected the dead code when I tested it in Xcode 5 which was using the Clang compiler. I have since read the GCC compiler had issues with this setting which may be why it was not included in -Wall.

  5. Warning that @walt mentioned should have caught it, but most shops I visit don’t have a no warnings practice in place.

    Had the code been Test-Driven the mistake would never have become a local or world famous bug. This looks like a likely copy/paste mistake. I make copy/paste mistakes all the time, but my tests catch them.

  6. Salvatore Ragucci says:

    static OSStatus
    SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, …)
    {
    OSStatus err;

    for (;;) {
    if (err = SSLHashSHA1.update(&hashCtx, &serverRandom)) {
    break;
    }
    if (err = SSLHashSHA1.update(&hashCtx, &signedParams)) {
    break;
    }
    if (err = SSLHashSHA1.final(&hashCtx, &hashOut)) {
    break;
    }

    return err;
    }
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
    }

    • Eric Shufro says:

      SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, …)
      {
      OSStatus err;

      if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
      {
      /* Log, perform corrective action, translate error code, etc. */
      }
      else if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
      {
      /* Log, perform corrective action, translate error code, etc. */
      }
      else if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
      {
      /* Log, perform corrective action, translate error code, etc. */
      }
      else
      {
      /* All previous checks passed. */
      }

      SSLFreeBuffer(&signedHashes);
      SSLFreeBuffer(&hashCtx);
      return err;
      }

    • munch says:

      That is very elegant! I hadn’t thought of using for(;;) in that way before, but it works great. Thanks for teaching me something new! And, it can be nested for multiple scopes of acquired resources too (with care, of course, but then, it’s probably better to refactor the code into multiple functions).

    • munch says:

      However, that doesn’t fundamentally alter the nature of the bug. if(cond) break; break; would have yielded the same issue. Bracing and proper code review (along with static analysis tools like lint) would have prevented the bug.

      Still, I love the for(;;) approach. 🙂

    • Paul Hills says:

      Or more elegantly…

      . if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0)
      . {
      . if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0)
      . {
      . if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) == 0)
      . {
      . /* Do useful stuff */
      . }
      . }
      . }
      .
      . if (err != 0)
      . {
      . SSLFreeBuffer(&signedHashes);
      . SSLFreeBuffer(&hashCtx);
      . return err;
      . }

      • Cev says:

        Yup, this is the best way to do code like this. First, it allows you to review the code following the path of no error without distractions; it also allows you to write rather symmetrical code where the place to clean up naturally falls before the closing brace of the no-error path; then it helps prevent going into the wrong execution path because basically there is none — either you follow the no-error path or you follow the error path, but cannot do both. Finally, it leaves the error handling out of the way where no one will be distracted with it. Code like this is easier to understand that any other style you might use, and it is the least likely to attract bugs. Code reviews go very quickly and very well when using this style.

        Anytime I see my engineers write code like if() doThis(); if() doThat(); they get to rewrite a little code.
        When they write if() {doThis();} elseif() {doThat();} elseif() they get a piece of my mind and get to rewrite the code. That style drives me crazy because it is the most difficult to follow and inspect easily.

        I don’t know that anyone should be so hard on Apple though. The problem looks to me like an automatic merge problem that was never even brought up to the attention of the programmer by the tool. Braces around the goto would have certainly saved the day, but given the huge body of software created at Apple, does anyone really expect them to catch all the bugs? We have all screwed up production software before. This one was in production for at least a year according to Mr. Barr. Give them credit for catching it before the bad guys did.

        But here is my bold assertion: real men do use goto’s. 😉 I am not afraid of them and there are times when they are the best and right thing to do. But they are just for the pros to use. Don’t do goto’s at home kids if you are too green & naive to know how to use them, or too lazy and use them for expediency.

        Cheers,

        Cev

        • Michael Barr says:

          Apple either has development processes and procedures in place to ensure software quality, through certain cracks of which this very bad code slipped, or Apple does not have adequate software quality processes and procedures. Based on the generally higher-than-average experience I have had with their products, I would like to believe the former. However, if Apple had a world-class software quality process this specific bug should not have slipped through.

          This bug slipped through way too many layers of standard best practices, each of which should have caught it on its own: Compiler warning checks? Static analysis checks? Peer code review? Code coverage analysis? Unit testing? Integration testing? Regression testing?

          That this bug slipped through so many layers, especially because it was in critical security code that should have been more heavily inspected and tested than average code, appears to indicate that Apple doesn’t use some or all of these process layers. That may explain some of the other security and product reliability bugs users have encountered in recent years.

          For example, who writes SSL certificate checking code and doesn’t create a test for a bad certificate. That is what we are talking about here in a nutshell. Not rocket science.

    • Philipp Finke says:

      I’ve never seen anything like this before. There are some points to be considered:

      1. You are creating additional and unnecessary code by using the the loop.
      2. You are creating another chance for a bug by staying inside the infinite loop, but this should be recognized very quick.
      3. This implmementation is wrong at all. You never reach the clean up section when there’s no error. So this one is not an equivalent to the original code.

      Best regards

      Philipp Finke

    • Bernhard Weller says:

      I wouldn’t use an assignment inside the if(), it’s not 100% clear, if that’s really what the programmer wanted to do – maybe he just forgot the second =?
      You’ll get a PC-Lint message about that if I’m not mistaken.
      So maybe the next programmer adds all those missing = and breaks the code.

      Also I don’t find it very descriptive to just use if(err) (at least call it error, better maybe errorCode) err is not a boolean, it’s like asking someone “If apple”, the reply would probably be: “What?”. Sadly it’s not what the C/C++ compiler asks.

      Although it’s more work (but not much):
      err = SSLHashSHA1.update(&hashCtx, &serverRandom);
      if (err != 0) {
      break;
      }

      • Julian Day says:

        Instead of:
        for (;;) {

        }

        It would be better to use:
        do {

        } while(0);

        for similar effect, but this obviously doesn’t loop. A halfway decent compiler will notice that and not generate extra code.

        But agree, this doesn’t actually solve the problem, which would be break; break; in this case. I suspect that would possibly be harder to spot just by reading, especially if the code doesn’t have braces. Even proper indenting would have made the original code look odd.

        I would certainly expect a better level of testing for security critical code, such as this. What a good reason to use a FIPS 140-2 validated cryptographic module: http://csrc.nist.gov/groups/STM/cmvp/documents/140-1/140val-all.htm.

    • Kamil Wroblewski says:

      Using the infinite loop isn’t the best idea to make code more reliable.

      static OSStatus
      SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, …)
      {
      OSStatus err = 0;
      …

      err == 0 && err = SSLHashSHA1.update(&hashCtx, &serverRandom);
      err == 0 && err = SSLHashSHA1.update(&hashCtx, &signedParams);
      err == 0 && err = SSLHashSHA1.final(&hashCtx, &hashOut);
      …

      SSLFreeBuffer(&signedHashes);
      SSLFreeBuffer(&hashCtx);
      return err;
      }

      Elegant or not elegant, just error prone.

  7. Frank says:

    I keep reading peoples’ disgust at Apple’s complete lack of process (i.e. code review). However, when I look at this code, it looks like a merge error. The source control system made an error or a human made an error resolving a conflict, etc. A code review wouldn’t catch this if the code was reviewed before it was merged into the baseline.

    • Angelo says:

      I agree with your point Frank. I think any large scale software development effort is likely to experience merge errors. It is expensive to review the same code multiple times, especially when minor changes are made during testing and incremental development. This further highlights the need for a comprehensive development process that includes multiple opportunities for problem detection, at all stages of development. Diligent use of code coverage analysis tools and the like would have caught this one.

    • Daniel Klein says:

      On the other hand a continous integration system with automatic static code analysis and/or module tests would catch this. This is *exactly* why we have these tools.

    • Bernhard Weller says:

      How can you mark a code as fully reviewed if the merge happens after the review? Surely the new merged code has to be reviewed again?

    • Jeff Ratcliff says:

      It’s totally out of vogue these days but there are still some benefits to the old school VCS approach. Exclusive locks on files. Merging is something done offline as an agreement between developers. It works pretty well with small groups of co-located developers. Sometimes unilateral automatic merging doesn’t eliminate a hashing-out discussion with another developer, it just delays it.

  8. Tausif K says:

    Now I know my housemaid is qualified enough to work at Apple.

  9. Daniel Klein says:

    I do not like the solution provided by Salvatore. Using do {…} while(false) with breaks or for(;;){… return x;} with breaks is just using gotos-in-disguise. I would rather people structure code properly and not use language hacks to give them gotos thet won’t be cought by a static code analysis tool.

    I’m happy that this article is also filed under ‘ethics’ category. It’s simply unethical to deliver code without using the most basic verification tools that are available. Even free tools (and compilers, as Walt pointed out) catch unreachable code. This could be easily missed in a hurried review, but still… this is just too basic of a bug to be unnoticed for a year.

  10. Paul Hills says:

    I suspect the double goto line probably came from a merge operation on their Version Control System – I’ve had similar looking things since our way of working is to create a new branch in the VCS, implement the new feature, then auto-merge the branch back to the trunk. The auto-merge (especially in Subversion) can result in this sort of thing sometimes.

    However!
    1. My IAR compiler reports:
    “Warning[Pe111]: statement is unreachable”
    2. My PC-Lint reports:
    “Info 801: Use of goto is deprecated”
    “Warning 539: Did not expect positive indentation from line 609”
    “Warning 527: Unreachable code at token ‘if'”
    3. My code coverage tool identifies that lines after the second goto are never run.

    Even if this did get past a code review (and reviews are not 100% guaranteed to find anything of course), and their tests didn’t find it (which is possible I suppose), then the fact that their software process allowed code to be released with compiler warnings and static analysis errors is rather worrisome.

    I suspect, however, this may have resulted from a last minute “We need to get this feature in straight away, we’re shipping tomorrow, we’ll bypass the process just this once!” type scenario. Anyone here who can admit to never having done anything like that before in their career can look as smug as they like. The rest of us will just smile.

  11. Salvatore Ragucci says:

    for (;;) { if (cond) break; break;} is indeed just a goto in disguise, the point being that goto is perfectly acceptable in ‘do-or-die gauntlets’, as I like to call them. But ever since EWD’s advice has been distorted into outlawing the goto by elevating form over substance, I find that the for (;;) idiom is a useful in that reviewers will pause long enough to verify the correctness of the code fragment after instantly comprehending what is going on.

    What I would love would be to do away with the if and the braces and have something like perl’s
    for (;;)

    (err = SSLHashSHA1.update(hashCtx, signedParams)) && (last);
    (err = SSLHashSHA1.final(&hashCtx, &hashOut)) && (last);

    last;
    }

    Setjmp/longjmp anyone?

    • Michael Barr says:

      I do not think the loop is an improvement in any way.

    • Salvatore Ragucci says:

      Above you state that improvements would include adding braces, Rule 1.2.a, and eliminating gotos, Rule 1.2.c. The for (;;) do-or-die gauntlet does makes both improvements and adheres to the orthodoxy of the coding standard (as near as I can tell).

    • Daniel Klein says:

      >> for (;;) idiom is a useful in that reviewers will pause long enough
      >> to verify the correctness of the code fragment after instantly comprehending what is going on.

      While I find this argument a very amusing psychological hack, it does not convince me. People will eventually get used to seeing this construction in the code and you will lose the element of surprise, ending up with the same problem. s/goto/break/g
      I would still preffer a cascade of if (noError) {…} blocks over the loop solution – it’s clean and avoids both deep-nesting and any form of goto.

  12. Karl Tubalkain says:

    I find this error to be primarily a formatting error. If the second goto was at the same level of indentation as the if statement that precedes it, then the error would have jumped out at anyone reviewing the code. The way it is indented makes it easy to miss, especially for the casual reviewer or the author.

  13. Any verification routine should assume failure until success is found. Rather than using err as a return value from individual tests, it should have been set with an error code. In that case, a branch to “fail” would cause the routine to incorrectly fail rather than incorrectly succeed. Better code is:

    OSStatus err = FAIL, ret;

    if (ret = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
    goto fail; /* Bogus code */

    err = OK;

    fail:

    if (err == FAIL)
    return ret;
    else
    return OK;

    I find using contrived code “for (;;)” or “do { … } while (0);” to avoid a goto to be more confusing than helpful. Especially if the intent of the “for loop” or “do/while loop” is that no looping is to be performed. All you have done is translate “goto” into “break”. Even this doesn’t work if there is any nesting of loops, since C doesn’t have non-local break statements.

    The idea of using setjmp/longjmp for a simple error exit from a function is overkill.

    I’ve seen a number of programs where the compiler generated warning messages but the programmers (and the project) ignored these warnings, because they were not errors. I’ve spent many hours debugging failures which would have been avoided if the original programmer resolved the warning message. I think of it like ignoring “bridge out ahead” signs on the road; if you keep going, something bad is likely to happen. Some projects set compiler flags to treat all warnings as errors.

    All of the comments about test and validation seem on point. If there’s an obvious failure mode, there should be a test which exercises it. If this got through QA, it shouldn’t have. Then again, I’ve worked with QA test suites which would pass a program even if it crashed, since it didn’t generate an error message.

  14. What’s wrong with this? No goto’s, no loops, easy to read.

    SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, …)
    {
    OSStatus err;

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    {
    ;
    }
    else if((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    {
    ;
    }
    else if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    {
    ;
    }
    else{
    ;
    }

    if(err != 0){
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
    }

    • Kamil Wroblewski says:

      This solution is great until you want to handle specific errors in this routine. Otherwise you’ll have to leave conditional blocks empty (what is still better than gotos, but it looks suspiciously). And there is also an issue when you want to do something more between subroutine/method calls. But it’s a pretty neat solution with a one catch: with the usage of “elses” at first glance it seems that only first subroutine/method is called and others are skipped. But still it’s pretty clean in this specific case.

      • Kamil, thank you very much for your comment. But I don’t fully understand your first sentence (“This solution is great until you want to handle specific errors in this routine.”) Would you be kind enough to expand on this? What do you mean by specific errors? It seems to me the conditional code block available when each test finds an error can be used to handle each specific error. Unless I am misunderstanding what you mean by “specific error”? Are you referring to a case in which you would want to handle each error generated by ALL the tests (rather than just fail out when you find at least one error)?

        In this particular case this solution seems perfect to me. In the more general case, it has the limitation that it only finds the first error and 2) it assumes success, which might or might not be a weakness. To me, it seems that its great strength is that there are no gotos or breaks, which I dislike.

        It is certainly true that you cannot do anything between the function calls.

  15. Phil Ouellette says:

    All the posters are ignoring the main takeaway of this for me.

    All blocks of code that run conditionally must be surrounded by brackets. This includes one line blocks of code. I even do this for simple one line if statements. I really hate seeing the lazy kind of coding shown in the faulty code. There is zero runtime penalty to doing this correctly and doing so totally eliminates entire classes of errors. Playing tricks with loops that don’t is no solution.

    This is just like counting on precedence in evaluating statements in logical expressions. My rule is multiplication/division before addition/subtraction and everything else is controlled by the use of braces (). I sometimes use indenting on a complicated logical expression in order to convince myself I am doing what I really want to do.

    White space is free people.

  16. Phil Ouellette says:

    And no production software should be released until it compiles without any warnings. If you really have to do something the compiler doesn’t like then verify you are in fact doing the right thing and figure out a legal way to suppress the warning.

    Often eliminating warnings is as simple as defining a variable type consistently so there aren’t unnecessary variable type conversions being done.

    Or if it is the case of not using a parameter passed to a function that is prototyped elsewhere and you can’t modify it. Screw it, do something meaningless with the parameter and add a comment to the code that you are doing this just to suppress a compiler warning.

    • Bernhard Weller says:

      I treat compiler warnings like PC-Lint messages, there have to be none, if the code is finished.

      Of course, there are cases where the messages don’t really apply. Or you can’t really do it in another way. Or the way of doing it so that the message goes away is so complicated, that it’s likely to introduce more bugs than problems are solved.

      So for that cases you can suppress individually and with fine granularity the messages (for a single line of code). Do that with a nice comment why exactly you think that it’s okay to break the rule – in a review these suppressions are checked and then it should be fine. Just don’t use it light-heartedly, because then you end up with just the same code sprinkled with suppression comments.

      For example I got messages about undefined access order for volatile variables in one statement. Actually it was a pointer working on the flash with constant values – until something new is written there in our calibration process. To make the compiler not optimize away the computation with the actual values in the flash, the pointer was declared volatile. But the values would never change while the computation was done, so in that case I could suppress the warning – of course reenabling it after those lines.

  17. wtf2014lol says:

    Oh my, some one used a “GOTO”, it must be the end of the world, LOL. Either remove it from C/C++ or STFU.

  18. Tyler Durden says:

    Something fishy here. The extra goto will make the function fail every time. Thus SSLVerifySignedServerKeyExchange will never succeed. That’s not a security breach. That’s a non-functional system. Because it will NEVER VERIFY the security test! I smell some FUD!

    • Kamil Wroblewski says:

      Until any of “update” methods return an error it’ll never fail. “err” variable is set to “0” after calling “update” methods, so this subroutine also returns “0” which means no error.

      • Tyler Durden says:

        Good point. I assumed that there is other “success” code at the end of the gauntlet that would get skipped thus causing it to fail. Looking at it again I am in agreement with you, thanks for pointing that out.

  19. Tyler Durden says:

    Goto is proper in a do-or-die gauntlet, more readable than using breaks, which are often worse than goto’s in long multi-nested loops in figuring out where you are “breaking” to. At least the goto has an explicit label it goes to, unlike break. Some of the worst code I have ever seen is large nested switch statements with a zillion breaks!

    Also, please, don’t ever use for(;;), when a simple while(1) is so much more readable!

    And never, ever use
    while(count– > 0)

    instead use
    while(count > 0)
    {
    count–;

    Senior programmers have debated me how their clever implicit brevity (top version) is superior to explicit (bottom version). Not so! Implicit brevity is the enemy!

    • Bernhard Weller says:

      Which reminded me of a good laugh at stackoverflow:
      http://stackoverflow.com/questions/1642028/what-is-the-name-of-this-operator

      Don’t confuse people with implicit operations. The next question which will pop up is when the — is actually performed, before or after the check (certainly something you should know). So it’s better if you do it in an explicit way, so even the most junior programmer can understand what you are trying to do.

      Remember that using people without any background on your product are often the best to find bugs? Maybe do the same in a code review (the only real measure is WTFs/min).

  20. Tyler Durden says:

    One final comment. In many cases there is no right or wrong answer, only opinions. And you know what they say about opinions… (if you don’t know, try googling “opinions are like”)

    Anyway, I completely agree with the premise that they should have made better use of code checking tools! And having company standards for coding is important – not that one method is right and another wrong, but so that everyone is doing things the same way, which is important.

  21. Daniel in Irvine, CA says:

    Lint or code review should have caught this bug easily, so I think a sly hacker slipped in this bug as a backdoor. This is why I always warn people that any OS or application that copies code from open source projects has high security risk.

  22. Ricky Bennett says:

    “if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)”

    Another rule broken that I didn’t see in the comments is setting a variable inside an if. This is so prone to bugs and misunderstanding. Readability goes into freefall here. “Did you mean = or ==?” or “Oh, looks like Bob missed an = here. I’ll just change it to == for him. He’ll thank me later.”

    err = SSLHashSHA1.update (&hashCtx, &serverRandom);
    if (err != 0)

    And how about sprucing up the error codes?

    typedef enum Error_e
    {
    SUCCESS,
    FAILURE, // Additional errors here
    } Error_t;

    Now we have:

    Error_t err;
    err = SSLHashSHA1.update (&hashCtx, &serverRandom);
    if (err != SUCCESS)
    {

    Readability is key in my office. I typedef everything I can get my hands on. Two weeks ago I wrote a stepper motor controller that contained an A4988 driver and an ARM Cortex-M3 PWM timer driver, both written from scratch. 2500 lines of code (including comments), and only 2 compiler errors. I fixed the errors and the controller ran perfectly first time!

    If you write the code for readability, you will have taken a big step towards writing bug free code. Don’t worry about trying to combine lines to make tighter code, that’s the optimizer’s job. Make it READABLE!

    And yes, I have “Embedded C Coding Standard”. Best book I’ve ever bought!

    • Michael Barr says:

      Good point about the assignment. The Embedded C Coding Standard prohibits that as well. I chose not to mention it as it was not where or how the bug occurred in this instance. But you are absolutely right that a lot of preventable bugs do happen that way.

    • What’s the point of the typedef? I certainly agree with you on setting a variable in an if.

      • Using typedefs prevents errors by limiting what you can assign to a variable. For example, “Error_t err = 37;” will cause at least a compiler warning to be output because err can only accept SUCCESS or FAILURE, not the value of 37. All my structs and enums are typedef’d to make sure that the variable will only accept a value of the same type. It also makes things a lot more readable.

    • Bernhard Weller says:

      The only thing I don’t like about enums is, that the actual size in bits is compiler specific, which can cause trouble using the same enum on different machines (when you need to transfer the value via UART or something).

      As a workaround for that I use static const variables, which sadly reduces the type safety, but at least combines readability with defined size.

      I’m looking forward to typed enums, but C++11 seems still a bit off for embedded stuff (at least from IAR, which is the suite we are using).

  23. stefano says:

    I agree with the comments about the lack of best practices in the SW development process which lead to leave this bug undetected in production code.
    However, having a look at the code itself I think the error originates from the fact that the function tries to do too many things.
    The function prototype is:
    static OSStatus SSLVerifySignedServerKeyExchange(…)
    Then the function should do just what it says: “verify”.
    Specifically, it should not do any “free buffer”, which should be delegated to the caller (why would it return the status otherwise?)

    A possible solution:

    – extract a function which just frees buffers:
    static void SSLFreeBuffers(…..)
    {
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    }

    – modify existing function as follows:
    static OSStatus SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams, …)
    {
    ….

    OSStatus err = SSLHashSHA1.update(&hashCtx, &serverRandom);
    if(err != OSStatusNOERR) { return err; }

    err = SSLHashSHA1.update(&hashCtx, &signedParams);
    if(err != OSStatusNOERR) { return err; }

    err = SSLHashSHA1.final(&hashCtx, &hashOut);
    if(err != OSStatusNOERR) { return err; }

    return OSStatusNOERR; // verified: no error found
    }

    – finally, where the above verify function is called, its return value should be used, eaxmple:
    if(SSLVerifySignedServerKeyExchange(…)) != OSStatusNOERR)
    {
    ….
    SSLFreeBuffers(…);
    ….
    }

    The disadvantage of this code is that it may create some code duplication in case the verify function is called in more places.
    The biggest advantage of this code is *readability*.

Leave a Reply