The readers of this blog have certainly noticed that EmbeddedGurus is no longer active.
But the State-Space blog is not dead! The blog has been migrated to state-machine.com, where it will continue. Please check it out!
In this installment of my “Embedded Toolbox” series, I would like to share with you the free source code cleanup utility called QClean for cleaning whitespace in your source files, header files, makefiles, linker scripts, etc.
You probably wonder why you might need such a utility? In fact, the common thinking is that compilers (C, C++, etc.) ignore whitespace anyway, so why bother? But, as a professional software developer you should not ignore whitespace, because it can cause all sorts of problems, some of them illustrated in the figure below:
Note: The problems caused by whitespace in the source code are particularly insidious, because you don’t see the culprit. By using an automated whitespace cleanup utility you can save yourself hours of frustration and significantly improve your code quality.
QClean is a simple and blazingly fast command-line utility that automatically cleans whitespace in your source code. QClean is deployed as a natively compiled executable located in the QTools Collection (in the sub-directory `qtools/bin`). QClean is also available as portable source code and can be adapted and re-compiled on all desktop platforms (Windows and POSIX: Linux, macOS).
Typically, you invoke QClean from a command-line prompt without any parameters. In that case, QClean cleans up whitespace in the current directory and, recursively, in all its sub-directories.
Note: If you have added the `qtools/bin/` directory to your `PATH` environment variable (see Installing QTools), you can run `qclean` directly from your terminal window.
As you can see in the screenshot above, QClean processes the files and prints the names of the files it has cleaned up. You also get information about what has been cleaned. For example, “Trail-WS” means that trailing whitespace has been cleaned up. Other possibilities are: “CR” (cleaned up DOS/Windows (CR) end-of-lines), “LF” (cleaned up Unix (LF) end-of-lines), and “Tabs” (replaced Tabs with spaces).
QClean takes the following command-line parameters:
PARAMETER | DEFAULT | COMMENT |
---|---|---|
`[root-dir]` | `.` | root directory to clean (relative or absolute) |
OPTIONS | | |
`-h` | | help (show help message and exit) |
`-q` | | query only (no cleanup when `-q` present) |
`-r` | | check also read-only files |
`-l[limit]` | 80 | line length limit (not checked when `-l` absent) |
QClean fixes the following whitespace problems:
- trailing whitespace at the end of lines;
- inconsistent end-of-line conventions (Unix (LF) versus DOS (CR,LF));
- Tabs (replaced with spaces);
- optionally, long lines exceeding a specified limit (`-l` option, default 80 characters per line).

QClean can optionally check the code for lines that exceed a specified limit (80 characters by default). Keeping lines short reduces the need either to wrap long lines (which destroys indentation) or to scroll the text horizontally. (GUI usability guidelines generally agree that horizontal scrolling of text is a bad idea.) In practice, source code is very often copied, pasted, and then modified rather than created from scratch. For this style of editing, it is very advantageous to see the original and the modified copy side by side. Also, differencing the code is a routine operation in any VCS (Version Control System) whenever you check in or merge code. Limiting the line length lets you use the horizontal screen real estate much more efficiently for side-by-side text windows, instead of the much less convenient and error-prone top-to-bottom differencing.
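To make the first and last of these checks concrete, here is a minimal sketch in C of how trailing whitespace could be stripped from a single line and how a line could be tested against a length limit. This is not QClean's actual source code. The function names `strip_trailing_ws` and `exceeds_limit` are hypothetical, and the sketch assumes the line is a NUL-terminated string without its end-of-line characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from QClean): removes trailing spaces
 * and tabs from a NUL-terminated line, in place.
 * Returns the new length of the line. */
size_t strip_trailing_ws(char *line) {
    assert(line != NULL);
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t')) {
        --len;
    }
    line[len] = '\0';
    return len;
}

/* Hypothetical helper: returns true when the line is longer than
 * `limit` characters (for example, 80). */
bool exceeds_limit(const char *line, size_t limit) {
    assert(line != NULL);
    return strlen(line) > limit;
}
```

A cleanup pass would call `strip_trailing_ws()` on every line before writing it back, and would only report (not modify) the lines for which `exceeds_limit()` returns true.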
QClean applies the following rules for cleaning the whitespace depending on the file types:
FILE TYPE | END-OF-LINE | TRAILING WS | TABS | LONG-LINES |
---|---|---|---|---|
`.c` | Unix (LF) | remove | remove | check |
`.h` | Unix (LF) | remove | remove | check |
`.cpp` | Unix (LF) | remove | remove | check |
`.hpp` | Unix (LF) | remove | remove | check |
`.s` | Unix (LF) | remove | remove | check |
`.asm` | Unix (LF) | remove | remove | check |
`.lnt` | Unix (LF) | remove | remove | check |
`.txt` | DOS (CR,LF) | remove | remove | don’t check |
`.md` | DOS (CR,LF) | remove | remove | don’t check |
`.bat` | DOS (CR,LF) | remove | remove | don’t check |
`.ld` | Unix (LF) | remove | remove | check |
`.tcl` | Unix (LF) | remove | remove | check |
`.py` | Unix (LF) | remove | remove | check |
`.java` | Unix (LF) | remove | remove | check |
`Makefile` | Unix (LF) | remove | leave | check |
`.mak` | Unix (LF) | remove | leave | check |
`.html` | Unix (LF) | remove | remove | don’t check |
`.htm` | Unix (LF) | remove | remove | don’t check |
`.php` | Unix (LF) | remove | remove | don’t check |
`.dox` | Unix (LF) | remove | remove | don’t check |
`.m` | Unix (LF) | remove | remove | check |
The cleanup rules specified in the table above can be easily customized by editing the array `l_fileTypes` in the `qclean/source/main.c` file. You can also change the Tab size by modifying the `TAB_SIZE` constant (currently set to 4), as well as the default line limit by modifying the `LINE_LIMIT` constant (currently set to 80), at the top of the `qclean/source/main.c` file. Of course, after any such modification, you need to re-build the QClean executable and copy it into the `qtools/bin` directory.
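To illustrate what the Tab size controls, here is a hedged sketch in C of replacing Tabs with spaces. It assumes that each Tab advances to the next multiple of the tab size (the usual "untabify" behavior). Whether QClean pads to tab stops or inserts a fixed number of spaces is not stated here, so treat that as an assumption. The function `expand_tabs` and its parameters are hypothetical and are not part of the QClean source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from QClean): copies the NUL-terminated
 * line `src` into `dst`, replacing every Tab with spaces up to the next
 * tab stop. Assumes the line starts at column 0 and holds no newline.
 * Returns the length of the result, or (size_t)-1 if `dst` is too small. */
size_t expand_tabs(const char *src, char *dst, size_t dst_size,
                   size_t tab_size)
{
    assert(src != NULL && dst != NULL && tab_size > 0);
    size_t col = 0; /* current output column */
    for (; *src != '\0'; ++src) {
        /* a Tab fills up to the next tab stop; other chars take 1 column */
        size_t n = (*src == '\t') ? tab_size - (col % tab_size) : 1;
        if (col + n >= dst_size) {
            return (size_t)-1; /* no room for the chars plus the NUL */
        }
        if (*src == '\t') {
            memset(&dst[col], ' ', n);
        } else {
            dst[col] = *src;
        }
        col += n;
    }
    if (col >= dst_size) {
        return (size_t)-1;
    }
    dst[col] = '\0';
    return col;
}
```

With a tab size of 4, a leading Tab becomes four spaces, while a Tab after two characters becomes only two spaces, so the following text still lands on column 4.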
Note: For best code portability, QClean enforces the consistent use of the specified End-Of-Line convention (typically Unix (LF)), regardless of the native EOL of the platform. The DOS/Windows EOL convention (CR,LF) is typically not applied because it causes compilation problems on Unix-like systems (specifically, the C preprocessor doesn’t correctly parse multi-line macros). On the other hand, most DOS/Windows compilers seem to tolerate the Unix EOL convention without problems.
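As an illustration of the end-of-line cleanup, the following sketch in C converts DOS (CR,LF) line endings to Unix (LF) in a memory buffer. It is not QClean's implementation, and the function name `crlf_to_lf` is hypothetical. The sketch assumes that only a CR immediately followed by an LF should be dropped, so a lone CR is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from QClean): converts every CR,LF pair
 * in buf[0..len-1] to a single LF, in place. A CR that is not followed
 * by LF is kept. Returns the new length of the data in the buffer.
 * The buffer is treated as raw bytes and is not NUL-terminated here. */
size_t crlf_to_lf(char *buf, size_t len) {
    assert(buf != NULL);
    size_t w = 0; /* write index, never ahead of the read index */
    for (size_t r = 0; r < len; ++r) {
        if (buf[r] == '\r' && (r + 1) < len && buf[r + 1] == '\n') {
            continue; /* drop the CR; the LF is copied on the next pass */
        }
        buf[w++] = buf[r];
    }
    return w;
}
```

After such a conversion, the backslash that continues a multi-line macro is followed directly by LF, which is what a preprocessor on a Unix-like system expects.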
QClean is very simple to use (no parameters are needed in most cases) and is fast (it can easily clean up hundreds of files per second). All this is designed so that you can use QClean frequently. In fact, the use of QClean after editing your code should become part of your basic hygiene, like washing hands after going to the bathroom.
In my previous post “A Heap of Problems” I have compiled a list of problems the free store (heap) can cause in real-time embedded (RTE) systems. This was quite a litany, although I didn’t even touch the more subtle problems yet (for example, the C++ exception handling mechanism can cause memory leaks when a thrown exception bypasses memory de-allocation).
But even though the free store is definitely not a free lunch, getting by without the heap is certainly easier said than done. In C, you will have to rethink implementations that use lists, trees, and other dynamic data structures. You’ll also have to severely limit your choice of third-party libraries and legacy code you want to reuse (especially if you borrow code designed for the desktop). In C++, the implications are even more serious because the object-oriented nature of C++ applications results in much more intensive dynamic-memory use than in applications using procedural techniques. For example, most standard C++ libraries (e.g., STL, Boost, etc.) require the heap. Without it, C++ simply does not feel like the same language.
Here are a few common sense guidelines for dealing with the heap:
1. For smaller systems, such as microcontrollers with only on-chip RAM, you probably don’t want to open the heap can of worms at all. The problems and waste that go with the heap simply aren’t worth the trouble.
For systems with sufficient RAM, such as processors with megabytes of external DRAM, trading some of this cheap RAM for convenience in programming might be a reasonable deal. In the following discussion I assume that the system is big enough to run under a preemptive RTOS.
2. The simplest option is to limit the use of the heap to just one task. In this case, the heap is not shared concurrently and does not need any mutual-exclusion protection mechanism. To limit the non-determinism of the heap, I would recommend assigning a low priority to the task that uses the heap. The priority should be lower than that of any real-time task.
3. At the expense of adding mutual-exclusion protection (e.g., a mutex) to *all* heap operations, you can allow more than one task to use the heap. However, I would still strongly recommend against using the heap in any tasks with real-time deadlines. All tasks that use the heap should run at a lower priority than any of the real-time tasks.
4. In any case, the heap should never be used inside interrupt service routines (ISRs).
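Guideline 3 can be sketched in C as a pair of wrappers that serialize every heap operation with one mutex. This is only an illustration under stated assumptions: a POSIX mutex stands in for whatever mutex your RTOS provides, the names `guarded_malloc` and `guarded_free` are hypothetical, and the wrappers only help if *all* heap users go through them. Many desktop C libraries already lock the heap internally, so the pattern matters mainly for C libraries whose allocator is not thread-safe by default.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One mutex guards all heap operations (assumption: a POSIX mutex stands
 * in for the RTOS mutex of the actual target). */
static pthread_mutex_t heap_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: allocate from the heap under the mutex.
 * Call only from low-priority, non-real-time tasks, and never from ISRs. */
void *guarded_malloc(size_t size) {
    int rc = pthread_mutex_lock(&heap_mutex);
    assert(rc == 0);
    void *p = malloc(size);
    rc = pthread_mutex_unlock(&heap_mutex);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
    return p;
}

/* Hypothetical wrapper: return a block to the heap under the same mutex.
 * As with free(), passing NULL is allowed and does nothing. */
void guarded_free(void *p) {
    int rc = pthread_mutex_lock(&heap_mutex);
    assert(rc == 0);
    free(p);
    rc = pthread_mutex_unlock(&heap_mutex);
    assert(rc == 0);
    (void)rc;
}
```

Note that the mutex only makes the heap safe to share. It does not make allocation time deterministic, which is why the tasks calling these wrappers should still run below the real-time tasks.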
In summary, using the heap in real-time embedded (RTE) systems always requires extra thought and discipline. You should always make sure that the heap is correctly integrated with your runtime environment.