embedded software boot camp

Minimizing memory use in embedded systems tip#2 – Be completely consistent in your coding style

Friday, September 4th, 2009 by Nigel Jones

This is the second in a series of postings on how to minimize the memory consumption of an embedded system.

As the title suggests, you’ll often get a nice reduction in code size if you are completely consistent in your HLL coding style. To show how this works, its necessary to take a trip into assembly language.

When you write in assembly language you soon find that you perform the same series of instructions over and over again. For example, to add two numbers together, you might have pseudo assembly language code that looks something like this:

LD X, operand1 ; X points to operand 1
LD Y, operand2 ; Y points to operand 2
LD R0,X        ; Get operand 1
LD R1,Y        ; Get operand 2
ADD            ;
ST R0          ; Store the result in R0

After you have done this a few times, it becomes clear that the only thing that changes from use to use is the address of the operands. As a result, assembly language programmers would typically define a macro. The exact syntax varies from assembler to assembler, but it might look something like this:

MACRO ADD_BYTES(P1, P2)
LD X, P1  ; X points to parameter 1
LD Y, P2  ; Y points to parameter 2
LD R0,X   ; Get operand 1
LD R1,Y   ; Get operand 2
ADD       ;
ST R0     ; Store the result in R0
ENDM

Thereafter, whenever it is necessary to add two bytes together, one would simply enter the macro together with the name of the operands of interest. However, after you have invoked the macro a few dozen times, it probably dawns on you that you are chewing up memory un-necessarily and that you can save a lot by changing the macro to this:

MACRO ADD_BYTES(P1, P2)
LD X, P1  ; X points to parameter 1
LD Y, P2  ; Y points to parameter 2
CALL LDR0R1XY
ENDM

It is of course necessary to now define a subroutine ‘LDR0R1XY’ that looks like this:

LDR0R1XY:
LD R0,X  ; Get operand 1
LD R1,Y  ; Get operand 2
ADD      ;
ST R0    ; Store the result in R0
RET

Clearly this approach starts to save a few bytes per invocation, such that once one has used ADD_BYTES several times one achieves a net saving in memory usage. If one uses ADD_BYTES dozens of times then the savings can be substantial.

So how does this help if you are programming in a HLL? Well, decent compilers will do exactly the same optimization when told to perform full size optimization. However, in this case, the optimizer looks at all the code sequences generated by the compiler and identifies those code sequences that can be placed in a subroutine. A really good compiler will do this recursively in the sense that it will replace a code sequence with a subroutine call, and that subroutine call will in turn call another subroutine and so on. The results can be a dramatic reduction in code size – albeit at a potentially big increase in demand on the call stack.

Now clearly in order to take maximal advantage of this compiler optimization, it’s essential that the compiler see the same code sequences over and over again. You can maximize the likelihood of this occurring by being completely consistent in your coding style. Some examples:

  • When making function calls, keep the parameter orders consistent. For example if you call a lot of functions with two parameters such as a uint8_t and a uint16_t, then ensure that all your functions declare the parameters in the same order.
  • If most of your variables are 16 bit, with just a handful being 8 bit, then you may find you get a code size reduction if you convert all your variables to 16 bits.
  • Don’t flip randomly between case statements and if-else-if chains.

Note that notwithstanding the fact that being completely consistent can save you a lot of code space, I also think that code that is extremely consistent in its style has other merits as well, not the least of which is readability.As a final note, does anyone know the formal name for this type of optimization?

Next Tip

Previous Tip

Home

Tags:

8 Responses to “Minimizing memory use in embedded systems tip#2 – Be completely consistent in your coding style”

  1. darkwzrd says:

    "As a final note, does anyone know the formal name for this type of optimization?"I'm guessing "Common Subexpression Elimination", or CSE?

  2. Nigel Jones says:

    That was my first thought. However upon investigation, CSE seems to apply at the source code level – i.e. identifying common sub expressions in the source code. This technique applies at the object code level. As such it appears to be a combination of peep hole optimization + CSE.

  3. Anders says:

    Well, There are several names but the most common are probably cross-jump and cross-call. They differ in what scope they have for the optimization.) One of the first papers on this method as applied to file scope called it a combination of cross-jump and procedural abstraction.

  4. Anders says:

    I forgot to say that this kind of optimization is generally most efficient for 8-bit CPU's and accumulator based architectures. This is because there will often be a lot of common sequences to work with, even if the programmer is inconsistent in his coding style.(The paper I mentioned is named "Analyzing and compressing assembly code" by Fraser et al. from 1984)

  5. Nigel Jones says:

    Thanks for the information Anders – if I'd thought about it I'd have asked you first! Regarding the effectiveness on different architectures. If you are programming an 8051 then pretty much regardless of what you write in a HLL the object code comes out the same – for the simple reason that there really is only one way of doing many things in the 8051. Conversely CPUs that are register rich with orthogonal instruction can do a task in dozens of different ways. As a result I'd expect that being consistent in your coding style would actually be more beneficial with these types of CPUs. Having said that, I've never run a test to support my hypothesis

  6. Will says:

    Threaded is the term you are looking for.

  7. Garry says:

    Would you please give a reference to a HLL language *compiler* that does the “subroutines of subroutines” code compression?

    I have used more than 20 different programming languages (FORTRAN, BASIC, Pascal, C, LISP, Ada, Postscript, Objective-C, C++, Scheme, Java, Erlang, Ruby, Python, JavaScript, several assemblers, several “4GLs”, etc.), and numerous implementations of them, I’ve read and read about about many more programming systems.

    The ONLY language system I have ever seen, or read about, that use the mechanism the article describes are “threaded” or “threaded and knotted” systems, which I believe are rare and unusual. Forth and its progeny is one branch, and postscript another. All that I know of are interpreters, and not compilers.

    • Nigel Jones says:

      Sure. One compiler I use all the time that does it is IAR’s AVR compiler. IAR calls it ‘Cross call’ optimization. You can specify the number of passes the compiler will use. For example, one project of mine compiles down to 42156 bytes with full size optimization except for cross call optimization. If I turn on unlimited cross call optimization, then the size reduces to 37006 – an impressive reduction of 12.2% . The compile time does of course go up slightly.

Leave a Reply to Nigel Jones

You must be logged in to post a comment.