Archive for the ‘Uncategorized’ Category

A Heap of Problems

Sunday, January 24th, 2010

Some design problems never seem to go away. You think that anybody who has been in the embedded software development business for a while must have learned to be wary of malloc() and free() (or their C++ counterparts new and delete). Then you find that many developers actually don’t know why embedded real-time systems are so particularly intolerant of heap problems.

For example, an Embedded.com reader recently attacked my comment on the article “Back to the Basics – Practical Embedded Coding Tips: Part 1 Reentrancy, atomic variables and recursion”, in which I advised against using the heap. Here is this reader’s argument:

I have no idea why did you bring up the pledge not to use the heap, on modern 32-bit MCUs (ARMs etc) there is no reason – and no justification – to avoid using the heap. The only reason not to use the heap is to avoid memory fragmentation, but good heap implementation and careful memory allocation planning will overcome that.

As I could not disagree more with the statements above, I decided that it’s perhaps time to re-post my “heap of problems” list, which goes as follows:

  • Dynamically allocating and freeing memory can fragment the heap over time to the point that the program crashes because of an inability to allocate more RAM. The total remaining heap storage might be more than adequate, but no single piece satisfies a specific malloc() request.
  • Heap-based memory management is wasteful. All heap management algorithms must maintain some form of header information for each block allocated. At the very least, this information includes the size of the block. For example, if the header causes a four-byte overhead, then a four-byte allocation requires at least eight bytes, so only 50 percent of the allocated memory is usable to the application. Because of these overheads and the aforementioned fragmentation, determining the minimum size of the heap is difficult. Even if you were to know the worst-case mix of objects simultaneously allocated on the heap (which you typically don’t), the required heap storage is much more than a simple sum of the object sizes. As a result, the only practical way to make the heap more reliable is to massively oversize it.
  • Both malloc() and free() can be (and often are) nondeterministic, meaning that they potentially can take a long (hard to quantify) time to execute, which conflicts squarely with real-time constraints. Although many RTOSs have heap management algorithms with bounded, or even deterministic performance, they don’t necessarily handle multiple small allocations efficiently.
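The fragmentation problem from the first bullet can be illustrated with a toy model. The sketch below is not a real allocator: the 16-unit heap, the first-fit policy, and the names toy_alloc(), toy_free(), and fragmentation_demo() are all assumptions made up for this illustration. It fills the heap with 2-unit blocks, frees every other one, and then shows that a 4-unit request fails even though 8 units are free in total.

```c
#include <string.h>

#define HEAP_UNITS 16              /* toy heap: 16 equal-sized units (assumed) */

static char used[HEAP_UNITS];      /* 1 = unit allocated, 0 = unit free */

/* First-fit search for n contiguous free units.
   Returns the start index, or -1 if no single free run is long enough. */
static int toy_alloc(int n) {
    int run = 0;
    for (int i = 0; i < HEAP_UNITS; ++i) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            int start = i - n + 1;
            memset(&used[start], 1, (size_t)n);
            return start;
        }
    }
    return -1;
}

/* Marks n units starting at 'start' as free again. */
static void toy_free(int start, int n) {
    memset(&used[start], 0, (size_t)n);
}

/* Counts all free units, contiguous or not. */
static int total_free(void) {
    int count = 0;
    for (int i = 0; i < HEAP_UNITS; ++i) {
        count += !used[i];
    }
    return count;
}

/* Returns 1 if the fragmented heap refuses a 4-unit request
   although 8 units are free in total. */
int fragmentation_demo(void) {
    int blocks[8];
    memset(used, 0, sizeof used);
    for (int i = 0; i < 8; ++i) {
        blocks[i] = toy_alloc(2);      /* fill the heap with 2-unit blocks */
    }
    for (int i = 0; i < 8; i += 2) {
        toy_free(blocks[i], 2);        /* free every other block */
    }
    return total_free() == 8 && toy_alloc(4) == -1;
}
```

Calling fragmentation_demo() from a test harness returns 1: half of the toy heap is free, but the largest contiguous hole is only 2 units wide. A real heap adds per-block headers on top of this, which makes the usable fraction even smaller.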

Unfortunately, the list of heap problems doesn’t stop there. A new class of problems appears when you use the heap in a multithreaded environment. The heap becomes a shared resource and consequently causes all the headaches associated with resource sharing, so the list goes on:

  • Both malloc() and free() can be (and often are) non-reentrant; that is, they cannot be safely called simultaneously from multiple threads of execution.
  • The reentrancy problem can be remedied by protecting malloc(), free(), realloc(), and so on internally with a mutex, which lets only one thread at a time access the shared heap. However, this scheme could cause excessive blocking of threads (especially if memory management is nondeterministic) and can significantly reduce parallelism. Mutexes can also be subject to priority inversion. Naturally, the heap management functions protected by a mutex are not available to interrupt service routines (ISRs) because ISRs cannot block.
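The mutex-based remedy described above might look roughly like the following sketch. It assumes a POSIX threads environment and an underlying malloc()/free() that is not reentrant on its own; the wrapper names locked_malloc() and locked_free() and the single global lock are hypothetical and serve only to show where the blocking happens.

```c
#include <pthread.h>
#include <stdlib.h>

/* One global lock serializes every access to the shared heap (assumed design). */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocates 'size' bytes while holding the heap lock. */
void *locked_malloc(size_t size) {
    void *p;
    pthread_mutex_lock(&heap_lock);    /* may block the calling thread */
    p = malloc(size);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

/* Releases a block while holding the heap lock. */
void locked_free(void *p) {
    pthread_mutex_lock(&heap_lock);    /* may block the calling thread */
    free(p);
    pthread_mutex_unlock(&heap_lock);
}
```

Every caller waits for as long as the current holder spends inside malloc() or free(), so any nondeterminism in the allocator is passed on to all contending threads. The two pthread_mutex_lock() calls are also the reason such wrappers must not be called from an ISR, which has no way to block.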

Finally, all the problems listed previously come on top of the usual pitfalls associated with dynamic memory allocation. For completeness, I’ll mention them here as well.

  • If you destroy all pointers to an object and fail to free it or you simply leave objects lying about well past their useful lifetimes, you create a memory leak. If you leak enough memory, your storage allocation eventually fails.
  • Conversely, if you free a heap object but the rest of the program still believes that pointers to the object remain valid, you have created dangling pointers. If you dereference such a dangling pointer to access the recycled object (which by that time might be already allocated to somebody else), your application can crash.
  • Most of the heap-related problems are notoriously difficult to test. For example, a brief bout of testing often fails to uncover a storage leak that kills a program after a few hours, or weeks, of operation. Similarly, exceeding a real-time deadline because of nondeterminism can show up only when the heap reaches a certain fragmentation pattern. These types of problems are extremely difficult to reproduce.

Cute Creator

Tuesday, April 28th, 2009

For a long time I’ve been looking for a good cross-platform development environment that would allow fast exploration and navigation of C/C++ source code, not just editing of individual files. For a while I thought that Eclipse would fit the bill, but as I wrote previously, the CDT (C/C++ Development Tooling) was really disappointing for me.

In this post I’d like to tell you about my recent big hope for a truly productive IDE, which is Qt Creator from qtsoftware.com. Qt Creator is based on the popular cross-platform Qt framework and runs natively on Windows, Linux, BSD, Mac OS X, and some embedded platforms. No Java (as in the case of Eclipse) means speed and a snappy interface. Qt Software (previously Trolltech, acquired in 2008 by Nokia) offers free downloads of Qt Creator for all major platforms.

Qt Creator is primarily targeted as the IDE for Qt-related development. However, the recently released version 1.1 (April 23, 2009) supports external projects, so adding your embedded or any other projects unrelated to Qt is easy.

For example, I’ve created an embedded project for a “game” shown in the screen shot below (click on the image to see it full-size):

QtCreator


The editing surface maximizes the screen real-estate for file viewing and supports sophisticated splitting, so that my favorite side-by-side code editing is easy.

As shown in the left pane, you can add to your project as many files in different directories as you like. Given this information, Qt Creator builds an internal database of all symbols in your code so that you can explore and navigate your source code quickly. For example, you can jump from a symbol’s usage to its definition by pressing F2 (press Alt-back-arrow to jump back to the previous context).

Everything in the editor is designed to enhance quick navigation. For example, every editor pane has a drop-down list of functions and other elements in the file. The editor also supports selective viewing with collapsible/expandable code sections, so you can fit more information on the screen. To quickly view the collapsed section you can simply hover your mouse cursor over it.

I immensely like the support for project-wide searching (as well as search-and-replace), which is available at the bottom of the screen. This feature alone is worth installing the tool.

Even though it is so new, Qt Creator is already a very interesting, free, cross-platform IDE with features comparable to Visual Studio 2008 and other best-in-class tools. Qt Software seems very committed to enhancing Qt Creator, and I hope that Qt Creator will soon catch up with Eclipse as third-party plug-ins are developed. One feature that I will be looking forward to is side-by-side code differencing. But already, it is a powerful tool that you should try.

Make the most of side-by-side code differencing

Wednesday, June 11th, 2008

I’m constantly amazed at how many developers shoot themselves in the foot by defeating the benefits of side-by-side source code differencing, which is perhaps the most routinely used technique in daily code development and maintenance with any VCS (Version Control System). In this post, I’d like to share a few tips for making the most of side-by-side differencing, which in my view should be adopted into every coding standard.

First of all, to benefit from side-by-side diff you need to limit the width of your lines so that you don’t need to scroll horizontally to see all the code. Countless bugs slip into a VCS, because they are hidden off screen during the final merge and people are simply tired of constantly scrolling back and forth. (All GUI usability studies agree that horizontal scrolling of text is always a bad idea.)

Granted, modern high-resolution wide screens offer a lot of horizontal pixels, but ultimately you’ll always run out of screen real estate if you allow lines to go on for miles. The column width must obviously allow comfortable viewing of two code listings side-by-side, but you should also budget some horizontal space for the directory-tree view, vertical sliders, line numbers, and line margins, as shown in the screen shot below. I’ve been using a column width limit of no more than 78 characters. Your limit could perhaps be higher, but you must set such a limit and then enforce it without exceptions.

side-by-side diff
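Enforcing a column limit is easy to automate. The following is only a minimal sketch of such a check: the function name first_long_line() is made up for this illustration, the limit of 78 is the one I use, and the code counts bytes rather than display columns, so tabs and multi-byte characters are not handled.

```c
#include <stddef.h>

#define MAX_COLUMNS 78   /* the limit used in this post; adjust to your standard */

/* Returns the 1-based number of the first line wider than MAX_COLUMNS,
   or 0 if every line fits. CR characters are ignored so that DOS and
   UNIX line endings give the same result. */
size_t first_long_line(char const *text) {
    size_t line = 1;
    size_t width = 0;
    for (; *text != '\0'; ++text) {
        if (*text == '\n') {
            ++line;          /* start counting the next line */
            width = 0;
        } else if (*text != '\r' && ++width > MAX_COLUMNS) {
            return line;     /* this line exceeds the limit */
        }
    }
    return 0;
}
```

A check like this could run over each source file before check-in and reject the file when the result is nonzero.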

I can see two main reasons why people write very long lines. The first is long strings in the code. But both C and C++ allow you to split long string constants in the following ways:

char const s1[] = "This long string is acc\
eptable to all C compilers.";
char const s2[] = "This long string is permissible "
                  "in ANSI C.";

In other words, you can either use a backslash ‘\’ to break off a string and continue it at the very beginning of the next line, or you can terminate each piece normally with a double quote ‘"’, and an ANSI C compiler will concatenate such adjacent strings into a single zero-terminated string.
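To check that the two techniques really produce the same string, here is a small sketch. The string contents and the helper name same_strings() are made up for this illustration.

```c
#include <string.h>

/* Backslash-newline splice: the continuation must start in column 0,
   because any leading spaces would become part of the string. */
static char const s1[] = "This long string is spl\
it across two lines.";

/* Adjacent string literals are concatenated by the compiler,
   so the second piece can be indented freely. */
static char const s2[] = "This long string is "
                         "split across two lines.";

/* Returns 1 when both techniques yield the same string. */
int same_strings(void) {
    return strcmp(s1, s2) == 0;
}
```

The second form is usually easier to live with, because it does not force the continuation line to start at the left margin and does not depend on what follows the backslash.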

The second reason for long lines is preprocessor macros. Here again, you can use the backslash ‘\’ to break up a longer macro into lines. For example:

#define err(flag, msg) if (flag) \
    printf(msg)

is the same as

#define err(flag, msg) if (flag) printf(msg)

The use of a backslash for breaking up longer lines brings up the issue of the end-of-line convention and the use of white space in your source code in general.

Let me start with the end-of-line convention. The issue here is that the backslash continuation won’t work unless the ‘\’ character is immediately followed by the end-of-line. Unfortunately, at least two incompatible end-of-line conventions are in widespread use. The DOS/Windows convention terminates lines with the pair of characters CR-LF (0x0D, 0x0A in hex). In contrast, the UNIX™ convention uses only one LF character (0x0A). As it turns out, tools on Unix-like machines (e.g., Linux) can be confused by the DOS convention and fail to recognize the backslash continuation, which then looks like ‘\’-CR-LF (0x5C, 0x0D, 0x0A) instead of ‘\’-LF (0x5C, 0x0A).
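The offending byte sequence is easy to look for. The function below is a minimal sketch (the name count_crlf_continuations() is made up for this illustration) that scans a buffer holding the raw contents of a source file and counts every backslash followed by CR-LF.

```c
#include <stddef.h>

/* Counts backslash continuations that are followed by CR-LF (0x0D, 0x0A)
   instead of a bare LF. 'buf' holds 'len' raw bytes of a source file. */
size_t count_crlf_continuations(char const *buf, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i + 2 < len; ++i) {
        if (buf[i] == '\\' && buf[i + 1] == '\r' && buf[i + 2] == '\n') {
            ++count;
        }
    }
    return count;
}
```

A nonzero result means the file contains at least one continuation that a tool expecting UNIX line endings may fail to recognize.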

My recommendation is to consistently use only the UNIX end-of-line convention, even on Windows machines. In my experience, Windows-based compilers have no problems with the UNIX convention, including the ancient tools from the DOS era. As I mentioned, the converse is not true.

And finally, let me talk about the use of white space (spaces, tabs, end-of-line) in general. Obviously, to benefit from source code differencing you’d like to see only the relevant differences and differences in white space only are typically not relevant. Many code-differencing tools offer an option to ignore white space, but I would not recommend relying on it. Are files with different sizes really identical? And also, as I said before, extra spaces or tabs after the backslash, but before the end-of-line, are not allowed.

As far as tabs are concerned, I’d strongly recommend not using them at all. Tabs are rendered differently by different editors and printers and bring only insignificant memory savings. Preferably, you should disable tabs at the editor level. At the very least, you should replace all tabs with spaces (“untabify”) before saving the file. As for spaces, I recommend removing any trailing spaces that precede the end-of-line character (LF).

Obviously, you can and should automate the source code cleanup. I use the QCLEAN utility (available here under the GPL license) to remove tabs and trailing blanks from the code and to enforce the UNIX end-of-line convention. The simple console QCLEAN Windows executable recursively scans all source files (.C, .CPP, .H, .ASM, .S, Makefile, etc.) down from the directory in which it is invoked. The following two listings show a code snippet before and after cleanup with the QCLEAN utility (spaces are shown as dots, tabs as \t, DOS end-of-lines as \r\n, UNIX end-of-lines as \n).

before cleanup:
.\t...\r\n
class.Foo.:.public.Bar.{...\n
public:.\r\n
\tFoo(int8_t.x,.int16_t.y,.int32_t z).//..ctor..\n
....:.Bar(x,.y),.m_z(z)....\n
....{}.............\n
.\t..\n
....virtual.~Foo();\t... //.xtor........\r\n
....virtual int32_t doSomething(int8_t.x);.//.method..\r\n

after cleanup with QCLEAN:
\n
class.Foo.:.public.Bar.{\n
public:\n
....Foo(int8_t.x,.int16_t.y,.int32_t z).//..ctor\n
....:.Bar(x,.y),.m_z(z)\n
....{}\n
\n
....virtual.~Foo();... //.xtor\n
....virtual int32_t doSomething(int8_t.x);.//.method\n
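The kind of per-line cleanup shown in the listings above can be sketched in a few lines of C. This is not the QCLEAN source code, only an illustration of the three rules (no tabs, no trailing blanks, LF-only line endings). The function name clean_line(), its buffer interface, and the tab stop of 4 columns are assumptions.

```c
#include <stddef.h>

#define TAB_WIDTH 4   /* assumed tab stop; not taken from QCLEAN */

/* Cleans one NUL-terminated input line: expands tabs to spaces, strips
   trailing blanks, and ends the line with a single LF. Input is read up
   to the first CR, LF, or NUL. Returns the number of characters written
   (not counting the terminating NUL), or 0 if 'out' is too small. */
size_t clean_line(char const *in, char *out, size_t out_size) {
    size_t n = 0;   /* characters written so far */
    for (; *in != '\0' && *in != '\r' && *in != '\n'; ++in) {
        if (*in == '\t') {
            do {    /* pad with spaces up to the next tab stop */
                if (n + 2 > out_size) {
                    return 0;
                }
                out[n++] = ' ';
            } while (n % TAB_WIDTH != 0);
        } else {
            if (n + 2 > out_size) {
                return 0;
            }
            out[n++] = *in;
        }
    }
    while (n > 0 && out[n - 1] == ' ') {
        --n;        /* strip trailing blanks */
    }
    if (n + 2 > out_size) {
        return 0;   /* no room left for LF and NUL */
    }
    out[n++] = '\n';
    out[n] = '\0';
    return n;
}
```

Fed a line consisting of a tab, the text Foo();, two trailing spaces, and CR-LF, the function writes four spaces, Foo();, and a single LF. A line that contains only blanks and tabs collapses to a lone LF, as in the listings above.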

Is Eclipse The Emperor’s New Clothes?

Wednesday, September 26th, 2007

“Many years ago there was an Emperor so exceedingly fond of new clothes…

…one day came two swindlers. They let it be known they were weavers, and they said they could weave the most magnificent fabrics imaginable. Not only were their colors and patterns uncommonly fine, but clothes made of this cloth had a wonderful way of becoming invisible to anyone who was unfit for his office, or who was unusually stupid.

…so off went the Emperor in his new clothes that were nothing at all. Everyone in the streets and the windows said, “Oh, how fine are the Emperor’s new clothes! Don’t they fit him to perfection? And see his long train!” Nobody would confess that he couldn’t see anything, for that would prove him either unfit for his position, or a fool. No costume the Emperor had worn before was ever such a complete success.”

–Hans Christian Andersen, “The Emperor’s New Clothes”

To me this little story has a lot to do with Eclipse (www.eclipse.org), which apparently is taking our industry by storm. Obviously, I must be the poor fool, unfit to see the remarkable benefits of Eclipse, but as an embedded developer I really, honestly don’t.

Admittedly, I’m a very naïve user of Eclipse, with experience limited just to two tools: the Altera Nios II Integrated Development Environment (IDE) and the Texas Instruments Code Composer Essentials for MSP430. Both these tools are based on Eclipse, and because of this both are just terrible.

I’m really not impressed with the CDT (C/C++ Development Tooling). The CDT workspaces, project files, and makefiles are notoriously difficult to move from one development workstation to another because they contain absolute paths. Even for the simplest project, the CDT somehow manages to produce hundreds of files in a directory tree three levels deep. You tell me how I am supposed to save this in any VCS (Version Control System).

The make process takes ages.

But probably the worst part is the GDB interface to the remote target. Not only is the connection flaky and dreadfully slow (no comparison at all to commercial offerings), but the target connectivity also spawns some GDB server processes that tend to be “pigs” (i.e., they take 100% of your host CPU, even when not talking to the target). This isn’t the highest level of professionalism…

Sure, the CDT allows you to forgo the automatic makefile generation and use external Makefiles instead (which I would actually recommend). In principle, I could also go ahead and fix any problems in Eclipse, the CDT plugin, or the GDB server, because they are all available as open source. But then I must ask whether Eclipse is really such a great productivity booster. Don’t I have bigger fish to fry than fighting the tool?

So, as it stands, the Eclipse Emperor is naked for me.

What do you think? What are your experiences with Eclipse in the embedded system space?

Embedded Software Crisis or Embedded Software Glut?

Saturday, June 23rd, 2007

I’ve been listening to the recent webcast “Solving the Embedded Software Crisis” (see also Rich Nass’ column “The need for more programmers” in the May issue of the ESD magazine). Of course, the main thrust of this particular webcast (as well as the ESD column) was the use of code generating tools (such as LabView from National Instruments, the sponsor of this webcast) to alleviate the allegedly looming crisis.

But tools or no tools, the real problem in my view is not so much with creating new code, as it is in getting rid of the old code.

In every company I worked for, we had to maintain just one broad code base for all products of that particular division of the company. We only kept adding to this code base, as new features, product variants, and entirely new products were released. But we never removed anything. Needless to say, the code was a kitchen sink of everything that the company ever did, including prototypes and dead ends. Most of the stuff was long obsolete, but it lived on in our code forever.

Adding code is easy. Removing dead code (without breaking the actually used parts of the code) is hard. But without the mechanisms for dropping the old baggage, we face a real Software Crisis.

Yet most managers don’t get it. I remember one day my boss came to my desk wanting to know how much code I had just cranked out. I proudly showed him that I had actually managed to remove an ugly function. He was clearly disappointed in my negative productivity.

From all my experience, I’m convinced that getting rid of code is more important than creating new code. As I said, it’s not easy; it requires careful planning and actual design for obsolescence. In future installments of this blog, I plan to provide a few concrete design strategies that make removing obsolete code easy (or at least easier). Stay tuned.