embedded software boot camp

Turning automatic code generation upside down

Tuesday, February 14th, 2012 by Miro Samek

Much ink has been spilled on the Next Big Thing in software development. One of these things has always been “automatic code generation” from high-level models (e.g., from state machines).

But even though many tools on the market today support code generation, their acceptance has grown rather slowly. Many factors contribute to this, but one of the main reasons is that the generated code simply has too many shortcomings and too often requires manual “massaging”. Editing the generated code by hand, however, breaks its connection with the original model. The tool industry’s answer has been “round-trip engineering”, which is the idea of feeding the changes made in the code back to the model.

Unfortunately, “round-trip engineering” simply does not work well enough in practice. This should not be so surprising, considering that no other code generation in software history has ever worked that way. You don’t edit by hand the binary machine code generated by an assembler. You don’t edit by hand the assembly code generated by a high-level language compiler. This would be ridiculous. So why do modeling tools assume that the generated code will be edited manually?

Well, the modeling tools have to assume this, because the generated code is hard to use “as-is” without modifications.

First, the generated code might be simply incomplete, such as skeleton code with “TODO” comments generated from class diagrams. I’m not a fan of this, because I think that in the long run such code generation is outright counterproductive.

Second, most code-generating tools impose a specific physical design (by physical design I mean the partitioning of the code into directories and files, such as header files and implementation files). For example, for generation of C/C++ code (which dominates real-time embedded programming), the beaten path is to generate <class>.h and <class>.cpp files for every class. But what if I want to put a class declaration at file scope inside an implementation file, so that no other module can see it? Actually, I often want to do this to achieve even better encapsulation. A typical tool would not allow me to do this.
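To make the file-scope idea concrete, here is a minimal sketch in C. The `Blinky` class and all of its function names are hypothetical and used only for illustration. The first part stands in for what a header file would expose, and the second part for what would stay hidden inside the implementation file; both are shown in one listing so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* --- what a header (say, blinky.h) would expose: an opaque type and an API --- */
typedef struct Blinky Blinky;          /* incomplete type: attributes are not visible */
Blinky *Blinky_instance(void);         /* access to the single hidden instance */
void Blinky_toggle(Blinky *me);
int Blinky_isOn(Blinky const *me);

/* --- what would stay in the implementation file (say, blinky.c) --- */
struct Blinky {                        /* full class declaration at file scope */
    int led_on;                        /* current LED state: 0 = off, 1 = on */
    unsigned toggles;                  /* number of toggles so far */
};

static Blinky l_blinky;                /* the only instance, local to this file */

Blinky *Blinky_instance(void) {
    return &l_blinky;
}

void Blinky_toggle(Blinky *me) {
    assert(me != NULL);
    me->led_on = !me->led_on;
    ++me->toggles;
}

int Blinky_isOn(Blinky const *me) {
    assert(me != NULL);
    return me->led_on;
}
```

Client code can call `Blinky_toggle(Blinky_instance())`, but it cannot touch `led_on` or `toggles` directly, because the structure layout never leaves the implementation file. A tool that insists on one header with the full class declaration per class rules this layout out.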

And finally, all too often the automatically generated code is hard to integrate with other code that was not created by the tool. For example, a class definition might rely on many included header files. But while most tools recognize this and allow inserting some custom code at the beginning of the file, they don’t allow inserting code at an arbitrary place in the file.

But, how about a tool that actually allows you to do your own physical design? How about turning the whole code generation process upside down?

A tool like this would allow you to create and name directories and files instead of the tool imposing it on you. Obviously, this is still manual coding. But, the twist here is that in this code you can “ask” the tool to synthesize parts of the code based on the model. (The “requests” are special tags that you mix in your code.) For example, you can “ask” the tool to generate a class declaration in one place, a class method definition in another, and a state machine definition in yet another place in your code.
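The mechanism can be sketched as a tiny template expander in C. Everything here is an assumption made for illustration: the `$gen <request>` line syntax, the hard-coded `l_model` table standing in for a real model, and the function names. It is not the tag syntax or the implementation of any particular tool. Hand-written lines are copied through unchanged, and each request line is replaced with the code synthesized for the named model element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "model": each element pairs a request with the code that a
 * real tool would synthesize from diagrams. Hard-coded here for brevity. */
typedef struct {
    char const *key;    /* request text that follows "$gen " */
    char const *code;   /* code to emit in place of the request line */
} ModelElement;

static ModelElement const l_model[] = {
    { "declare Blinky", "typedef struct { int led_on; } Blinky;\n" },
    { "define Blinky_toggle",
      "void Blinky_toggle(Blinky *me) { me->led_on = !me->led_on; }\n" },
};

/* Return the code for a request of length len, or NULL if the model has none. */
static char const *model_lookup(char const *key, size_t len) {
    size_t i;
    for (i = 0; i < sizeof(l_model) / sizeof(l_model[0]); ++i) {
        if (strlen(l_model[i].key) == len
            && strncmp(l_model[i].key, key, len) == 0) {
            return l_model[i].code;
        }
    }
    return NULL;
}

/* Append n bytes to out, keeping it NUL-terminated. Returns 0 on overflow. */
static int append(char *out, size_t cap, size_t *used, char const *s, size_t n) {
    if (*used + n + 1 > cap) {
        return 0;
    }
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 1;
}

/* Expand the hand-written template tmpl into out (capacity cap bytes).
 * Returns 0 on success, -1 on an unknown request or insufficient space. */
int expand_template(char const *tmpl, char *out, size_t cap) {
    static char const tag[] = "$gen ";
    size_t const tag_len = sizeof(tag) - 1;
    size_t used = 0;

    assert(tmpl != NULL && out != NULL);
    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';
    while (*tmpl != '\0') {
        char const *eol = strchr(tmpl, '\n');
        size_t len = (eol != NULL) ? (size_t)(eol - tmpl) : strlen(tmpl);
        size_t line_len = (eol != NULL) ? len + 1 : len;  /* with '\n' if present */

        if (len >= tag_len && strncmp(tmpl, tag, tag_len) == 0) {
            /* a request: substitute the synthesized code for the whole line */
            char const *code = model_lookup(tmpl + tag_len, len - tag_len);
            if (code == NULL || !append(out, cap, &used, code, strlen(code))) {
                return -1;
            }
        } else if (!append(out, cap, &used, tmpl, line_len)) {
            return -1;   /* hand-written line did not fit */
        }
        tmpl += line_len;
    }
    return 0;
}
```

Given a template that starts with a hand-written `#include` line, followed by `$gen declare Blinky`, some more hand-written code, and then `$gen define Blinky_toggle`, the expander keeps the hand-written lines exactly where they are and fills in only the two requested pieces. The file name, the order of the pieces, and everything in between remain under the developer’s control.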

This “inversion” of code generation responsibilities solves most of the problems with the integration between the generated code and other code. You can simply “ask” the tool to generate as much or as little code as you see fit. The tool helps, where it can add value, but otherwise you can keep it out of your way.

The idea of “inverting” the code generation is so simple that I would be surprised if it were not already implemented in some tools. One example I have is the free QM tool from my company (http://www.state-machine.com/qm). If you know of any other tool that works that way, I would be very interested to hear about it.

4 Responses to “Turning automatic code generation upside down”

  1. Lundin says:

    Following these arguments, why do automatic code generators give us C/C++ and not raw machine code? If I’m not supposed to change anything in the generated code, then it really doesn’t make any sense to have it handed in a high level language. There are two reasons why anyone would want it like that, either 1) they expect to make manual changes to the code somehow, and/or 2) they don’t trust the tool and want to verify that it does what it is supposed to do.

    • Miro Samek says:

      I think that when the automatic code generation technology matures and gains more widespread acceptance, the tools could indeed generate machine code directly.

      But, these are early days yet and model-to-code generation follows pretty much exactly the same trajectory as all other code generation technologies in the past. For example, most early C compilers generated assembly code (in fact, many embedded compilers still do). The early C++ compilers were all based on “cfront”, which compiled C++ to C. And so on.

      Such a gradual, stepwise approach has many obvious advantages. It allows people to get used to the “new” by first seeing how it turns into the old and familiar. It allows leveraging the existing tools. A young technology cannot cover all the bases at once. For example, by generating portable C/C++ the code generator can address many more processor types than by generating specific machine code.

      The intermediate step of C/C++ also allows the developers to use the existing debuggers. This is not quite ideal, because debugging at the C/C++ level is below the model level. But in practice the inconvenience depends very strongly on the type of the generated code. For example, if such code uses compressed hexadecimal state-tables to represent state machines (e.g., IAR visualSTATE), you obviously have no chance of bridging the semantic levels.

      But if the code is designed upfront to be human-readable, you can quite easily see the model structure from the code. This is exactly the approach taken in the QM tool, which is based on the QP framework. QP was originally designed for manual coding without “big tools” (see my first book “Practical Statecharts in C/C++”, published in 2002). But it turns out that QP also makes an excellent target for automatic code generation. On top of this, QM adds special comments to the generated code, which cross-reference the code snippets to the model. While debugging the application, you can simply copy the closest such comment to the clipboard and paste it into QM. QM will then immediately locate the corresponding model element, open the diagram, and highlight the class method, state, transition, guard, or whatever it is. With this simple method you almost debug at the model level.

  2. Jean says:

    Automatic code generation is already used in the C/C++ preprocessor (#define, #include, …) and has been used in Lisp for decades. Most scripting languages enable meta or macro programming. QM is definitely a tool on the right track. One day, C/C++ source will be just another output option beside bytecode, assembly, executables and others.

    In the future I see the complete program as an abstract syntax that is displayed as a state machine, C source, graphs, or whatever representation is best suited for the current task.

    As long as the involved tools preserve the hierarchy of abstraction of the program, one can create state machine templates, HTML handlers/generators, DSLs, documentation, etc.

  3. Anders says:

    Hi Miro,
    Discovered this post a bit late, but anyway…

    First: As I’m the product manager of the IAR visualSTATE product, I would like to add a bit of information to the paragraph about our code generation. The user has the option to choose between table-based code generation (which is obviously a bit ‘difficult’ to decipher…) and what we call “readable code”, which is a straight translation of the state charts into switch and if-statements. What is a bit surprising is that a large majority of our users stick to the table-based code…
    When asked about this, they often answer that they don’t care about the generated code because they never have a need to change it. Further, the code size needed for the ‘driver’ code that interprets the tables is well below 1k if compiled with a modern compiler. Typical numbers are more in the 400-600 byte range and can be as low as <300 bytes under certain circumstances.

    Second: The question about generating C/C++ or going directly to assembly language has a quite obvious answer, at least for me: Who would ever want to support a modeling tool with code generation capabilities if the generated code had to be optimized for speed or size for several target CPUs (or even just one)?
    I actually came across a customer a few years ago that had their own tooling for translating UML to assembly language for an 8-bit controller; but I see that as a rare exception and the company was not too happy about the situation due to the lock-in effects on hardware choices, the maintenance costs etc.
