embedded software boot camp

Robust Embedded Software Architecture in 5 Easy Steps

Thursday, September 17th, 2009 by Michael Barr

Over the past few years, I’ve spent a large amount of my time consulting with and training software development teams that are in the midst of rearchitecture. These teams have already developed the firmware inside successful long-lived products or product families. But to keep moving forward, reduce bugs, and speed new feature development, they need to take the best of their old code and plug it into a better firmware architecture.

In the process, I have collected substantial anecdotal evidence that few programmers, technical managers, or teams truly understand what good firmware architecture is, how to achieve it, or even how to recognize it when they see it. That includes the most experienced individual developers on a team. Yet, despite the fact that these teams work in a range of very different industries (including safety-critical medical devices), the rearchitecture process is remarkably similar from my point of view. And there are numerous ways that our clients’ products and engineering teams would have benefited from getting their firmware architecture right from the beginning.

Though learning to create solid firmware architecture and simultaneously rearchitecting legacy software may take a team months of hard work, five key steps are easily identified. So whether you are designing firmware architecture from scratch for a new product or launching a rearchitecture effort of your own, here is a step-by-step process to help your team get started on the right foot.

Step 1: Identify the Requirements

Before we can begin to (re)architect an embedded system or its firmware, we must have clear requirements. Properly written requirements define the WHAT of a product. WHAT does the product do for the user, specifically? For example, if the product is a ventilator, the list of WHAT it does may include a statement such as:

If power is lost during operation, the ventilator shall resume operation according to its last programmed settings within 250 ms of power up.

Note that a properly written requirement is silent about HOW this particular part of the overall WHAT is to be achieved. The implementation could be purely electronics or a combination of electronics and firmware; the firmware, if present, might contain an RTOS or it might not. From the point of view of the requirement writer, then, there may as well be a gnome living inside the product that fulfills the requirement. (So long as the gnome is trustworthy and immortal, of course!)

Each requirement statement must also be two other things: unambiguous and testable. An unambiguous statement requires no further explanation. It is as clear and as concise as possible. If the requirement includes a mathematical model of expected system behavior, it is helpful to include the equations.

Testability is key. If a requirement is written properly, a set of tests can be easily constructed to verify that requirement is met. Decoupling the tests from the particulars of the implementation, in this manner, is of critical importance. Many organizations perform extensive testing of the wrong stuff. Any coupling between the test and the implementation is problematic.

A proper set of requirements is a written list of statements each of which contains the key phrase “… the [product] shall …” and is silent about how it is implemented, unambiguous, and testable. This may seem like a subject unrelated to architecture, but too often it is poor requirements that constrain architecture. Thus good architecture depends in part on good requirements.

Step 2: Distinguish Architecture from Design

Over the years, I have found that many engineers (as well as their managers) struggle to separate the various elements or layers of firmware engineering. For example, Netrino is barraged with requests for “design reviews” that turn out to be “code reviews” because the customer is confused about the meaning of “design”. This even happens in organizations that follow a defined software development lifecycle. We need to clear this up.

The architecture of a system is the outermost layer of HOW. Architecture describes persistent features; the architecture is hard to change and must be got right through careful thinking about intended and permissible uses of the product. By analogy, an architect describes a new office building only very broadly. A scale model and drawings show the outer dimensions, foundation, and number of floors. The number of rooms on each floor and their specific uses are not part of the architecture.

Architecture is best documented via a collection of block diagrams, with directional arrows connecting subsystems. The system architecture diagram identifies data flows and shows partitioning at the hardware vs. firmware level. Drilling down, the firmware architecture diagram identifies subsystem-level blocks such as device drivers, RTOS, middleware, and major application components. These architectural diagrams should not have to change even as roadmap features are added to the product—at least for the next few years. Architectural diagrams should also pass the “six pack test,” which says that even after drinking a six pack every member of the team should still be able to understand the architecture; it is devoid of confusing details and has as few named components as possible.

The design of a system is the middle layer of HOW. The architecture does not include function or variable names. A firmware design document identifies these fine-grained details, such as the names and responsibilities of tasks within the specific subsystems or device drivers, the brand of RTOS (if one is used), and the details of the interfaces between subsystems. The design documents class, task, function/method, parameter and variable names that must be agreed upon by all implementers. This is similar to how a design firm hired by the renter of a floor on the office building describes the interior and exterior of the new building in finer detail than the architect. Designers locate and name rooms and give them specific purposes (e.g., cube farm, corner office, or conference room).

An implementation is the lowest layer of HOW. There need be no document, other than the source code or schematics, to describe the implementation details. If the interfaces are defined sufficiently at the design level above, individual engineers are able to begin implementation of the various component parts in parallel. This is similar to the way that a carpenter, plumber, and electrician work in parallel in nearby space, applying their own judgment about the finer details of component placement, after the design has been approved by the lessee.

Of course, there is architecture and there is good architecture. Good architecture makes the most difficult parts of the project easy. These difficult parts vary in importance somewhat from industry to industry, but always center on three big challenges that must be traded off against each other: meeting real-time deadlines, testing, and diversity management. Addressing those three issues comprises the final three steps.

Step 3: Manage CPU Time

Some of your product’s requirements will mention explicit amounts of time. For example, consider the earlier ventilator requirement about doing something “within 250 ms of power up.” That is a timeliness requirement. “Within 250 ms of power up” is just one deadline for the ventilator implementation team to meet. (And something to be tested under a variety of scenarios.) The architecture should make it easy to meet this deadline, as well as to be certain it will always be met.

Most products feature a mix of non-real-time, soft-real-time, and hard-real-time requirements. Soft deadlines are usually the most challenging to define in an unambiguous manner, test, and implement. For example, in set-top box design it may be acceptable to drop a frame of video once in a while, but never more than two in a row, and never any audio, which arrives in the same digital input stream. The simplest way to handle soft deadlines is to treat them as hard deadlines that must always be met.

With deadlines identified, the first step in architecture is to push as many of the timeliness requirements as possible out of the software and onto the hardware. Figure 1 shows the preferred placement of real-time functionality. As indicated, an FPGA or a dedicated CPU is the ideal place to put real-time functionality (irrespective of the length of the deadline). Only when that is not possible, should an interrupt service routine (ISR) be used instead. And only when an ISR won’t work should a high-priority task be used.

Figure 1. Where to Put Real-Time Functionality

Keeping the real-time functionality separate from the bulk of the software is valuable for two important reasons. First, it simplifies the design and implementation of the non-real-time software. With timeliness requirements architected out of the bulk of the software, code written by novice implementers can be used without affecting user safety.

A second advantage of keeping the real-time functionality together is that it simplifies the analysis involved in proving that all deadlines are always met. If all of the real-time software is segregated into ISRs and high-priority tasks, the amount of work required to perform rate monotonic analysis (RMA) is significantly reduced. Additionally, once the RMA is completed, it need not be revised every time the non-real-time code is tweaked or added to.

Step 4: Design for Test

Every embedded system needs to be tested. Generally, it is also valuable or mandatory that testing be performed at several levels. The most common levels of testing are:

System Tests verify that the product as a whole meets or exceeds the stated requirements. System tests are generally best developed outside of the engineering department, though they may fit into a test harness developed by engineers.

Integration Tests verify that a subset of the subsystems identified in the architecture diagrams interact as expected and produce reasonable outcomes. Integration tests are generally best developed by a testing group or person within software engineering.

Unit Tests verify that individual software components identified at the intermediate design level perform as their implementers expect. That is, they test at the level of the public API the component presents to other components. Unit tests are generally best developed by the same people that write the code under test.

Of the three, system tests are most easily developed, as those test the product at its exposed hardware interfaces to the world (e.g., does the dialysis machine perform as required). Of course, a test harness may need to be developed for engineering and/or factory acceptance tests. But this is generally still easier than integration and unit tests, which demand additional visibility inside the device as it operates.

To make the development, use, and maintenance of integration and unit tests easy, it is valuable to architect the firmware in a manner compatible with a software test framework. The single best way to do this is to architect the interactions between all software components at the levels you want to test so they are based on publish-subscribe event passing (a.k.a., message passing).

Interaction based on a publish-subscribe model allows a lightweight test framework like the one shown in Figure 2 to be inserted alongside the software component(s) under test. Any interface between the test framework and the outside world, such as a serial port, provides an easy way to inject or log events. A test engine on the other side of that communications interface can then be designed to accept test “scripts” as input, log subscribed event occurrences, and off-line check logged events against valid result sequences. Adding timestamps to the event logger and scripting language features like delay(time) and waitfor(event) significantly increases testing capability.

Figure 2. A Test Framework Based on a Publish-Subscribe Event Bus

It is unfortunate that the publish-subscribe component interaction model is at odds with proven methods of analyzing software schedulability (e.g., RMA). The sheer number of possible message arrival orders, queue depths, and other details make the analysis portion of guaranteeing timeliness difficult and fragile against minor implementation changes. This is, in fact, why it is important to separate the code that must meet deadlines from the rest of the software. In this architecture, though, the real-time functionality remains difficult to test other than at the system level.

Step 5: Plan for Change

The third key consideration during the firmware architecture phase of the project is the management of feature diversity and product customizations. Many companies use a single source code base to build firmware for a family of related products. Consider microwave ovens: one high-end model may feature a dedicated “popcorn” button, while another model lacks it. The architecture of any new product’s firmware will also soon be tested and stretched by foreseeable feature additions along the product road map.

To plan for change, you must first understand the types of changes that occur in your specific product. Then architect the firmware so that those sorts of changes are the easiest to make. If the software is architected well, feature diversity can be managed through a single software build with compile-time and/or run-time behavioral switches in the firmware. Similarly, new features can be added easily to a good architecture without breaking the existing product’s functionality.

An architectural approach that handles product family diversity particularly well is one in which groups of related software components are collected into “packages”. Each such package is effectively an internal widget from which larger products can be built. The source code and unit tests for each particular package should be maintained by a team of “package developers” focused primarily on their stability and ease of use.

Teams of “product developers” combine stable releases of packages that contain the features they need, customize each as appropriate (e.g., via compile- or run-time mechanisms, or both) to their particular product, and add product-specific “glue.” Typically, all of the products in a related product family are built upon a common “Foundation” package (think API). For example, a Model X microwave might be built from Foundation + Package A + Package B; whereas Model Y might consist of Foundation + A’ + B + C, where package A’ is a compile-time variant of package A and package C contains optional high-level cooking features, such as “Popcorn.”

Using this approach in a large organization, a new product built from a selection of stable bug-free packages can be brought to market quickly—and all products share an easy upgrade path as their core packages are improved. The main challenge in planning for change of this sort is in striking the right balance between packages that are too small and packages that are too large. Like many of the details of firmware architecture, achieving that balance for a number of years is more of an art than a science.

Next Steps

I hope the five-step “architecture road map” presented here is useful to you. I plan to drill down into more of the details in articles and blog posts over the coming months. Meanwhile your constructive feedback is welcome via the comment form or e-mail.


6 Responses to “Robust Embedded Software Architecture in 5 Easy Steps”

  1. Sundar Srinivasan says:

    That's simple and neat, Michael. But I think this would work only if performance is the single design objective. We crack under pressure when we have to guarantee performance under the constraints of power, area, and of course time to market. That's when we have to start looking at application-specific hardware. That part is not addressed.

  2. Nigel Jones says:

    Very nice article Mike. I could probably write an essay on all my thoughts, but I'll restrain myself to the first step that you outlined – namely the 'what'. In my experience this is amazingly difficult to obtain. Particular problems I often see are:

    1. An incomplete list of 'whats'.

    2. A 'what' that is incomplete. For example "The oven must be up to temperature within 5 minutes". Seems clear enough – until you realize that getting it up to temperature is one thing, but ensuring it doesn't overshoot by too much is another!

    3. Over-specification at times. (I see this a lot on government-type requirements.)

    Anyway, given that it's the first step (and in my opinion the toughest to get right), it's hardly surprising that a lot of products have architectural problems.

  3. K1200LT Rider says:

    One basic problem with requirements that I've seen a handful of times goes something like this: "The event shall occur at 30 Hz" – with no tolerance specified. As written, a measured frequency of 29.9999999 Hz is a failure. In this case, repeated requests for a tolerance went unanswered, so we picked what we thought was suitable, and the requirement has never been edited. Amazing.

  4. Rob Williams says:

    A recommendation that has several plus points in its favour is to clearly divide the requirements into "Positives" and "Negatives". The positives state what the system should be able to do, while the negatives identify the behaviours which are to be avoided at all costs. There are at least three advantages to establishing the distinction at an early stage in the development:

    1. The development team can be separated into two parts, one to deal with the core functional behaviour while the second deals with error management aspects. This eliminates, or at least mitigates, the worst-case errors which arise from misunderstanding the original requirements spec. The two teams carry out separate analysis, design, and implementation, with less chance of propagating a fundamental mistake throughout the final system.

    2. It reduces the likelihood of a conflict with clients around some "implied" behaviour. Having documentation that states what the system will NOT do is a powerful argument in any dispute.

    3. The separation of functional core code from the associated error management routines, as far as possible, improves testability and eventual reusability.

  5. Thomas Honold says:

    I have read your post and want to give some feedback. First, I have enjoyed reading it. The best part for me was Step 2: "Distinguish Architecture from Design," showing the different levels of HOW.

    For the test aspect, I think you haven't yet shown the complete picture. Since testing is about 60% of the whole project time, I think it should be a little bit more detailed. Here is what I suggest. In Step 4: "Design for Test," from my point of view it is missing that the tests should all be requirements-driven. This means that the requirements which were defined in Step 1 are the basis of all tests. This formal approach has the big advantage that you do not test what the code does (like white-box unit tests in the past); you test what the code is supposed to do.

    It should also be stated that it is not required to test each requirement by a single test. To reduce the overall test time, you can run the OK-case tests first using the release build of the product. Then you check which requirements were already covered. Then the error cases should be tested with an instrumented debug build (by injecting errors), to see what happens in error cases. It is a debug build because the final code should not contain any test features, since they would be unused code in normal operation.

    What we do in our SW development process for avionics equipment (DO-178B) is:

    1. Run test procedures with all requirements-based OK-cases (release build of SW).

    2. Run test procedures with all requirements-based error-cases (instrumented debug build).

    3. Do dedicated unit tests for modules which were not coverable by the first two steps.

    4. If unit tests do not cover 100% of the code, manual code inspection fills this gap.

    We use PERL scripts on a personal computer commanding the embedded system to execute the dedicated tests. The test results are then sent back to the PC for a final report. All tests have to run automatically without any user interaction.

    This test approach is quite a lot of work the first time, but for all further SW releases it pays off. I once read that most SW bugs are introduced by SW changes. I think this is absolutely true.

    I'm looking forward to your next articles on this matter.

    Kind regards,
    Thomas

  6. Miro Samek says:

    The post is interesting and makes some valid points, but I think it entirely misses the promise made in the title, because this post does not cover embedded systems **architecture** at all. Instead, the post outlines some steps of the embedded software development **process**. Consequently, I would suggest to change the title and classification of this post accordingly.

    The problem with the misleading title is that this post comes up in searches for “embedded software architecture”, which is misleading if left as is.

    A post about embedded software architecture should describe, at a minimum, at least some aspects of the recommended software structure. For example, it could mention some general types of architectures, some architectural design patterns, or anything related to software structure.
