embedded software boot camp

Insects of the computer world

Monday, March 9th, 2009 by

Jack Ganssle’s recent “Breakpoints” blog post on Embedded.com makes an excellent point: the same forces (Moore’s law) that drive down the prices of high-end processors open even more market opportunities at the low end of the price spectrum. I also agree that the most decisive factor in the price of a single-chip microcontroller (MCU) is the efficiency of its memory use, in other words, the code density. This becomes obvious when one looks at the silicon die of any MCU, which is completely dominated by the ROM and RAM blocks, with the CPU almost insignificant somewhere in the corner.

But I would disagree with Jack’s statement that “tiny (8-bit) processors make more efficient use of memory”. From my experience with several single-chip MCUs I draw a different conclusion: the CPU size (8, 16, or 32 bits) hardly matters for code density. The deciding factor is the age of the design: newer instruction set architectures (ISAs) generally far outperform older ones.

To support this point, I present below a table that shows the code size of a tiny state machine framework written in C (called QP-nano), compiled for a dozen or so very different single-chip MCUs. The code consists of a small hierarchical state machine processor (called QEP-nano) and a tiny framework (called QF-nano). QEP-nano consists mostly of conditional logic to execute hierarchical state machines. QF-nano contains an event queue, a timer module, and a simple event loop. I believe that this code is quite representative of typical projects that run on these small MCUs.
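To give a feel for the kind of code being measured, here is a minimal sketch in C of an event queue, a trivial state machine, and a run-to-completion event loop. It is only an illustration under stated assumptions: every name in it (EventQueue, Blinky, queue_post, run_until_idle, the signals, and the queue length) is hypothetical and is not the QP-nano API.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; this is NOT the QP-nano API. */
enum { QUEUE_LEN = 4 };                 /* assumed ring-buffer capacity */
enum { SIG_TICK = 1, SIG_BUTTON = 2 };  /* assumed event signals */

typedef struct {
    uint8_t buf[QUEUE_LEN];  /* queued event signals */
    uint8_t head;            /* next write index */
    uint8_t tail;            /* next read index */
    uint8_t used;            /* number of events waiting */
} EventQueue;

typedef struct {
    uint8_t led_on;          /* current state: 0 = off, 1 = on */
    uint8_t toggles;         /* number of state transitions taken */
} Blinky;

/* Post an event signal; returns 1 on success, 0 if the queue is full. */
int queue_post(EventQueue *q, uint8_t sig) {
    assert(q != 0);
    if (q->used == QUEUE_LEN) {
        return 0;
    }
    q->buf[q->head] = sig;
    q->head = (uint8_t)((q->head + 1u) % QUEUE_LEN);
    ++q->used;
    return 1;
}

/* State machine: a tick toggles the LED state; other signals are ignored. */
void blinky_dispatch(Blinky *me, uint8_t sig) {
    if (sig == SIG_TICK) {
        me->led_on = (uint8_t)!me->led_on;
        ++me->toggles;
    }
}

/* Event loop: process each queued event to completion, in FIFO order.
 * Returns the number of events dispatched. */
unsigned run_until_idle(EventQueue *q, Blinky *me) {
    unsigned n = 0;
    while (q->used > 0u) {
        uint8_t sig = q->buf[q->tail];
        q->tail = (uint8_t)((q->tail + 1u) % QUEUE_LEN);
        --q->used;
        blinky_dispatch(me, sig);
        ++n;
    }
    return n;
}
```

Code of this shape is mostly conditional logic, small-integer arithmetic, and pointer-based memory access, so its size depends heavily on how well the ISA and the compiler handle pointers and branches.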

CPU type               C compiler                     QEP-nano (bytes)   QF-nano (bytes)
PIC18                  MPLAB-C18 (student edition)    3,214              2,072
8051 (SiLabs)          IAR EW8051                     952                603
PSoC (M8C)             ImageCraft M8C                 2,765              2,425
68HC08                 CodeWarrior HC(S)08            957                660
AVR (ATmega)           IAR EWAVR                      541                650
AVR (ATmega)           WinAVR (GNU)                   998                810
MSP430                 IAR EW430                      552                460
M16C                   HEW4/NC30                      984                969
TMS320C28x (Piccolo)   C2000                          738 (369 words)    662 (331 words)
ARM7 (ARM/THUMB)       IAR EWARM                      588 (THUMB)        1,112 (ARM)
ARM Cortex-M3          IAR EWARM                      524                504



Interestingly, the winner is the MSP430, which is a 16-bit architecture. It seems that a 16-bit ISA somehow hits the “sweet spot” for code density, perhaps because addresses are also 16 bits wide and are handled in a single instruction. In contrast, 8-bitters need multiple instructions to handle 16-bit addresses.
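To illustrate the point about addresses, here is a minimal sketch in C of what an 8-bit CPU has to do to add an offset to a 16-bit address: the low byte, the carry, and the high byte are handled in separate steps, whereas a 16-bit CPU can do the same in one addition. The type and function names are hypothetical, and the C code only models the steps; the actual instruction count depends on the ISA and the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit address held in two 8-bit registers (hypothetical model). */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} Addr8;

/* Add an 8-bit offset to a 16-bit address using only 8-bit operations. */
Addr8 addr_add_8bit(Addr8 a, uint8_t offset) {
    Addr8 r;
    uint8_t carry;
    r.lo = (uint8_t)(a.lo + offset);   /* step 1: add the low bytes */
    carry = (uint8_t)(r.lo < a.lo);    /* step 2: detect the carry out */
    r.hi = (uint8_t)(a.hi + carry);    /* step 3: propagate it to the high byte */
    return r;
}

/* The same operation when the registers are 16 bits wide: one addition. */
uint16_t addr_add_16bit(uint16_t a, uint8_t offset) {
    return (uint16_t)(a + offset);
}

/* Check that both paths agree across a low-byte wraparound. */
void addr_add_selfcheck(void) {
    uint16_t base;
    for (base = 0x12F0u; base < 0x1310u; ++base) {
        Addr8 a;
        Addr8 r;
        a.lo = (uint8_t)(base & 0xFFu);
        a.hi = (uint8_t)(base >> 8);
        r = addr_add_8bit(a, 0x20u);
        assert((uint16_t)(((uint16_t)r.hi << 8) | r.lo)
               == addr_add_16bit(base, 0x20u));
    }
}
```

Every pointer increment, array index, or structure member access goes through some version of these steps, which is one plausible reason why pointer-heavy C code grows on an 8-bit ISA.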

I would also point out the excellent code density (and C-friendliness) of the new ARM Cortex-M3, which is a modern 32-bit ISA and still outperforms all the 8-bitters in the table, including the good ol’ 8051.

On the other hand, the venerable PIC architecture is by far the worst (or the most C-unfriendly). That’s interesting, because it is the 8-bit market leader. I honestly don’t understand how Microchip makes money when their chips require the most silicon for a given functionality. Clearly, forces other than technical merit must be at work here.
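The rankings discussed above can be checked by adding up the two columns of the table. The following is a minimal sketch in C that does this. The struct and function names are made up for illustration, the TMS320C28x row uses the byte figures rather than words, and the ARM7 row is taken as printed in the table.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *cpu;  /* CPU type and toolchain, as labeled in the table */
    unsigned    qep;  /* QEP-nano code size in bytes */
    unsigned    qf;   /* QF-nano code size in bytes */
} Row;

/* Figures copied from the table in this post. */
static const Row rows[] = {
    { "PIC18 / MPLAB-C18",         3214, 2072 },
    { "8051 / IAR EW8051",          952,  603 },
    { "PSoC (M8C) / ImageCraft",   2765, 2425 },
    { "68HC08 / CodeWarrior",       957,  660 },
    { "AVR / IAR EWAVR",            541,  650 },
    { "AVR / WinAVR (GNU)",         998,  810 },
    { "MSP430 / IAR EW430",         552,  460 },
    { "M16C / HEW4/NC30",           984,  969 },
    { "TMS320C28x / C2000",         738,  662 },
    { "ARM7 / IAR EWARM",           588, 1112 },
    { "ARM Cortex-M3 / IAR EWARM",  524,  504 }
};
enum { N_ROWS = sizeof rows / sizeof rows[0] };

/* Combined code size of both components for one row. */
unsigned total_bytes(const Row *r) {
    return r->qep + r->qf;
}

/* Index of the row with the smallest combined size. */
size_t smallest(const Row *t, size_t n) {
    size_t i, best = 0;
    assert(n > 0);
    for (i = 1; i < n; ++i) {
        if (total_bytes(&t[i]) < total_bytes(&t[best])) {
            best = i;
        }
    }
    return best;
}

/* Index of the row with the largest combined size. */
size_t largest(const Row *t, size_t n) {
    size_t i, worst = 0;
    assert(n > 0);
    for (i = 1; i < n; ++i) {
        if (total_bytes(&t[i]) > total_bytes(&t[worst])) {
            worst = i;
        }
    }
    return worst;
}

/* Size of row a as a percentage of row b (integer, truncated). */
unsigned long ratio_percent(const Row *a, const Row *b) {
    return (unsigned long)total_bytes(a) * 100ul / total_bytes(b);
}
```

With the figures from the table, the smallest total is the MSP430 at 1,012 bytes, the Cortex-M3 follows at 1,028 bytes, and the largest is the PIC18 at 5,286 bytes, roughly 5.2 times the MSP430 figure.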

In conclusion, I understand that my data is highly subjective and that different code sets (and different compilers) could produce different results. However, I believe that the general trend holds, and that it is an important lesson for engineers selecting MCUs.

10 Responses to “Insects of the computer world”

  1. Bryce Schober says:

    Wow, that’s pretty eye-opening! And:
    s/EQP-nano/QEP-nano/
    s/framework writing in C/framework written in C/

  2. GregK says:

    PIC18 is not a modern architecture; I think (I am no expert) it does not fully support a stack in RAM. But you did not give the optimization options, which are very important, especially on PIC18. I wonder how it looks on PIC24; I believe it will be much better than PIC18, especially with the -Os option. If I knew how you did this test, I could try it on PIC24.

  3. Miro Samek says:

    Yes, Greg brings up an important point. I have been using the Student Edition of the MPLAB-C18 compiler, which does *not* allow most optimizations. But still, even if the code size were to improve 100% (which I doubt), PICmicro would still be the second worst CPU from the whole pack as far as code density is concerned. It is just mind-boggling how bad the old 8-bit PIC is… PIC24 is a newer 16-bit ISA and, according to my claim, should fare much better than the old 8-bit PIC. In fact, one of the posts to the discussion forum at Embedded.com provides some benchmark data for PIC24. Please check the comments to the “Small is Beautiful” blog at http://www.embedded.com/design/215801305.

  4. GregK says:

    I have just read your comment on embedded.com. Quote: “(…) In this context, the ROM size versus cost for an 8-bit PIC looks like a great bargain, but remember that 1KB of ROM in the PIC is really worth only as much as 200 bytes of ROM in MSP430.” I wonder how this changes if we consider long-term usage of ROM. What about a 1-bit data corruption in flash memory after, let’s say, 5 years? One bit is enough to break the whole program. With the same flash technology, we have a 4/5 higher probability that our chip will be useless after X years. (I do not know much about silicon; is this consideration relevant or not?)

  5. Miro Samek says:

    Poor code density is bad any way you look at it. If you worry about flash ROM data retention, the probability of flash failure is proportional to the die area taken by the flash (assuming equivalent process technology, which must be pretty much the same for all silicon vendors if they want to stay competitive). So the probability of bits falling off the flash is roughly 5 times worse in a PICmicro-based MCU than in an MSP430-based MCU implementing the same functionality in software.

  6. GregK says:

    It is a very interesting subject, and there is something in this table that alarms me. Every test with IAR’s tools is quite good. Just look at the ATmega results with the IAR and GNU tools. It is a huge difference!

    Some time ago I worked with IAR and MSP430. After some strange things, when the code size suddenly changed by 30% after a small change, I realized (and proved) that IAR’s linker is absolutely brilliant at removing unreferenced functions from the final code (including nested references, from functions called by other functions, etc.), as well as unreferenced variables from RAM (of any size, including big buffers), if they are not referenced or volatile. In addition, IAR’s compiler is a really good commercial compiler, specially designed for jobs like squeezing code size. When I started working with GCC-based tools I could not find such a feature in the linker; only the compiler can remove unreferenced static functions and variables at module scope. Supposedly every IAR tool line has that feature.

    The TMS320 C2000 result also looks really not bad. I know from experience (with some DSP architectures, actually) that TI’s compilers have a nice feature: at the highest optimization level the compiler treats all modules in the project as one file! This gives the compiler the same opportunity to remove all unreferenced stuff from the code (RAM and ROM). It is a slightly different approach from IAR’s tools, where this is done at the linking stage.

    My doubt is whether this test is really relevant if we compile only the framework, without a project that really uses it. I do not know what this test looks like, so these are just my doubts. Everything apart from the IAR and TI tools looks really poor.

    It is really strange that IAR can save 810-650 = 160B of RAM in the same project. That is a huge amount of RAM in this case!! What we normally observe when changing the optimization level and/or switching compilers is simply a change in ROM size with a really, really small change in RAM size, if any (not considering the stack, of course). How IAR saved 160B of RAM in the ATmega project is really an enigma to me :).

  7. Miro Samek says:

    Greg, I think you are confusing the QF-nano code-size column with RAM consumption. My table does *not* contain any RAM footprint data because, as you correctly observe, the compiler cannot do much about the RAM consumed by the application.

    I have also experienced the phenomenon you described, where a small change in the source code results in a disproportionate change in the generated code size when the highest levels of optimization are used with the IAR compiler. Actually, I have examined the generated machine code and found out that IAR has a very aggressive common-expression elimination policy. In the case of the QF-nano framework, there is a repetition of a portion of the scheduler code. The repetition could be eliminated by using explicit “goto” statements, which I didn’t want to do, because it violates MISRA rules. But the IAR compiler noticed the commonality in the source and generated code as if the gotos were there. This is pretty cool. This also explains why a small change in the code sometimes destroys or creates the opportunity to eliminate a common expression (the common part becomes smaller or bigger, depending on what you do).

    In summary, really tight code is the result of cooperation between the compiler and the linker. Neither one alone can do a really optimal job. Some of this cooperation is missing in the GNU toolchain. Commercial compilers are apparently getting better in this respect.

    –Miro

  8. GregK says:

    Indeed, I confused the columns; there is no RAM record in the second column! I am sorry. It is good that my holiday has just started; it seems just in time :).

  9. paul says:

    I think something really should be said about RAM, in particular the cost of using 32-bit pointers on small ARM systems where 16 bits would be enough for the actual amount of memory present. I guess the compilers and instruction set handle 16-bit integers OK on the ARM, though I’m not familiar enough with the architecture to be sure of this. I’d also like to see some actual power consumption measurements, since the ARM marketing department is working overtime to make claims that seem a bit suspicious to me. Yes, the Cortex M0+ might be more power-efficient than an 8-bitter per unit of computation, but the absolute consumption actually matters, or else the ARM itself is still behind a 100+ watt Ivy Bridge or Nvidia GPU workstation chip. I have a Casio wristwatch with all kinds of functions (it computes the phase of the moon and stuff like that) which runs for 10+ years on a coin cell. I don’t know what kind of CPU it uses, but I’d like to know if it’s possible to do something comparable with an ARM.

  10. […] engineers meant us to say, general purpose registers or shortly GPR). Pic is renowned for its code density, or, to put it better is known for the lack of thereof. The Harvard architecture makes things even […]
