
The (not so) Exciting World of Documentation

Friday, February 25th, 2011 Gary Stringham

In a survey I conducted of several firmware engineers, poor hardware documentation was the number one complaint. This is because firmware engineers rely so heavily on hardware documentation to do their jobs correctly. Some of the engineers said that wrong documentation is worse than no documentation because of the time wasted producing incorrect code.

This is a problem because one group (hardware engineers, in this case) has to produce documentation for a different group (firmware engineers). The two groups are generally in different physical locations, come from different engineering backgrounds, use different terminology, and have different perspectives on the end product. To overcome these differences, hardware and firmware engineers need to collaborate.

Writing documentation is difficult. When I write, I understand perfectly well what I meant to say. Details and nuances that are second nature to me often go unwritten, and I don't realize that my reader might not know them. I'm reminded of this whenever someone else reviews my writing and points out missing and incomplete sections.

While both hardware and firmware engineers should be actively involved in the collaboration, the hardware engineers do the majority of the documentation and design work. When the documentation is ready, they should give it to the firmware engineers for review. Firmware engineers should review it in a timely fashion and give the hardware engineers comments on incomplete, unclear, and incorrect sections, along with any other issues they discover. Firmware review of hardware documentation should be on the checklist for milestone completion.

Best Practice: Give hardware documentation to firmware engineers to review and respond with comments and issues.

Before firmware engineers can understand the specifics of how the block works, they need an overall picture of the block's purpose and operation. If they focus too much on details such as specific registers and bits, it is hard to see what the overall operation should be. Not having a high-level description is like looking at a box full of nuts, bolts, springs, levers, and other parts without being able to recognize that it is supposed to be a toaster. Stepping back first, reading a high-level description of the toaster as a whole, and understanding how it toasts bread makes it much easier to see how to assemble the components into a working toaster.

The same concept applies to a block on the chip. Firmware engineers need to see and understand the big picture of how that block should operate, within itself and in conjunction with other parts of the system, before they can understand the detailed registers and bits.

Best Practice: Produce and have reviewed a high-level description of the block that describes its theory of operation, its function in the system, and its parts.

In addition to the high-level documentation, detailed documentation is needed, and it should contain both a reference section and a tutorial section. The reference section lists all registers in the block, typically in address order, and gives details for each register and for the bits and/or bit fields in that register. The tutorial section describes when, how, and in what order to use those registers and bits to carry out a task.

The UNIX man pages (and those of Linux and other variants) are an example of reference documentation: they cover commands and functions in alphabetical order, with each page describing its options in turn. On the other hand, the well-known book by Kernighan and Ritchie, “The C Programming Language,” is written in tutorial style, explaining how to do various tasks and introducing the C constructs needed to accomplish them.
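
For a firmware block, the two styles might look something like this in a C header. This is only a sketch: the block, its register names, offsets, and bit positions are all hypothetical. The reference part lists every register in address order with its bits; the tutorial part states the order in which to use them for one task.

```c
#include <stdint.h>

/* ---- Reference section: every register, in address order ------------ */
/* 0x00 CTRL    bit 0     START (W)  1 = start the job described by CFG   */
/*              bit 1     ABORT (W)  1 = abort the current job            */
/* 0x04 CFG     bits 15:0 LEN   (RW) number of bytes to process           */
/* 0x08 INT_EN  bit 0     DONE  (RW) 1 = enable the job-done interrupt    */
typedef struct {
    volatile uint32_t ctrl;    /* 0x00 */
    volatile uint32_t cfg;     /* 0x04 */
    volatile uint32_t int_en;  /* 0x08 */
} blk_regs_t;

#define CTRL_START   (1u << 0)
#define CTRL_ABORT   (1u << 1)
#define CFG_LEN_MASK 0xFFFFu
#define INT_EN_DONE  (1u << 0)

/* ---- Tutorial section: how, and in what order, to run one job ------- */
/* 1. Program the job length into CFG.                                   */
/* 2. Enable the DONE interrupt if firmware will wait on it.             */
/* 3. Set CTRL.START last, so the block never starts with a stale CFG.   */
void blk_run_job(blk_regs_t *regs, uint16_t len)
{
    regs->cfg = len & CFG_LEN_MASK;  /* step 1 */
    regs->int_en = INT_EN_DONE;      /* step 2 */
    regs->ctrl = CTRL_START;         /* step 3 */
}
```

The reference comments answer "what does this bit do?", while the ordered steps answer "what do I do to get a job running?", which is the question a firmware engineer usually starts with.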

Best Practice: Provide both a reference section and a tutorial section in the detailed documentation of a block.

Until the next writing…

Balancing How Firmware Waits on Hardware

Friday, May 7th, 2010 Gary Stringham

A question engineers often wrestle with is how long hardware will take to complete a requested task before firmware can take the next step. Engineers implement different designs (in both hardware and firmware) depending on that length of time, and these designs have varying impacts on hardware and firmware complexity and on overall system performance. Understanding their ramifications during the design phase helps balance the load between hardware and firmware.

Based on the hardware and firmware implementation required, we can group these designs into three categories:

  • No Delay – Hardware completes the task almost immediately. Firmware can assume the task is immediately completed and can safely take the next step.
  • Short Delay – Hardware completes the task after a short delay. Firmware must wait momentarily for the task to complete before taking the next step.
  • Long Delay – Hardware completes the task after a long delay. The wait is long enough that firmware should do other processing until the task completes and it can take the next step.

Let’s take aborts in hardware as an example, since implementations exist in each of the three categories: no, short, and long delays. For some aborts there is no delay; it is a simple matter of returning to the home or idle state, clearing counters and buffers, and completing other activities that can be done quickly. Such an operation is so quick that hardware does not need extra logic for a status or interrupt bit. In these cases, firmware can initiate the abort and simply move on to the next step, which may be to set up the hardware for the next job. The key is for hardware to complete the abort before firmware accesses the block again.
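
A minimal sketch of the no-delay case, assuming a hypothetical block whose abort finishes before firmware's next register write can reach it. The register names and bit positions are illustrative only, and on real hardware `regs` would point at the block's memory-mapped base address rather than at an ordinary struct.

```c
#include <stdint.h>

/* Hypothetical register layout for the block. */
typedef struct {
    volatile uint32_t ctrl;     /* offset 0x00 */
    volatile uint32_t job_cfg;  /* offset 0x04 */
} blk_regs_t;

#define CTRL_START (1u << 0)  /* hypothetical bit assignments */
#define CTRL_ABORT (1u << 1)

/* No-delay abort: there is no status or interrupt bit to check, because
 * the block is assumed to be idle again before the next access arrives.
 * Firmware issues the abort and goes straight on to the next job. */
void blk_abort_and_restart(blk_regs_t *regs, uint32_t next_cfg)
{
    regs->ctrl = CTRL_ABORT;   /* request the abort                     */
    regs->job_cfg = next_cfg;  /* next access: abort assumed complete   */
    regs->ctrl = CTRL_START;   /* start the next job                    */
}
```

The whole design rests on the timing assumption in the comment, which is why that assumption belongs in the block's documentation.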

Best Practice: When the task in hardware is fast enough to complete before the next firmware access, hardware does not need to implement a status or interrupt bit for task completion.

Some abort implementations take several clock cycles to complete, which means that firmware must wait for completion before accessing the block again. If it is a short delay, hardware should provide a status bit that firmware can poll, looping a few times until the task is done before moving on to the next step. If there is a long delay, hardware should provide an interrupt bit that firmware enables; firmware then does other processing while waiting for the interrupt to occur. Setting up, waiting for, and responding to an interrupt costs CPU cycles for task swaps, context switches, and semaphore handling. Thus, for firmware, polling a status bit is preferable to managing an interrupt when the task will finish after only a short delay.
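
The short-delay case might be sketched as a bounded polling loop. Capping the number of reads keeps a hung block from stalling firmware forever; what to do on a timeout (fall back to the interrupt, report an error) is left to the caller. The register names and bit positions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout. */
typedef struct {
    volatile uint32_t ctrl;    /* bit 1 = ABORT */
    volatile uint32_t status;  /* bit 0 = DONE  */
} blk_regs_t;

#define CTRL_ABORT  (1u << 1)
#define STATUS_DONE (1u << 0)

/* Start an abort, then poll the DONE status bit at most max_polls times.
 * Returns true if the abort completed within that many reads, false if
 * it did not (the caller decides how to recover). */
bool blk_abort_poll(blk_regs_t *regs, unsigned max_polls)
{
    regs->ctrl = CTRL_ABORT;
    for (unsigned i = 0; i < max_polls; i++) {
        if (regs->status & STATUS_DONE)
            return true;
    }
    return false;
}
```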

Where the line between short and long delays falls must be determined case by case; it depends on the hardware platform, the operating system, and the performance requirements. The dividing line could even move dynamically depending on the current operating conditions of the product. To give engineers the flexibility to move that dividing line, the hardware for short and long delays should be the same, implemented with both a status bit and a maskable interrupt. This flexibility lets engineers calculate or measure how many loops the polling takes and then decide whether polling is acceptable or interrupts are needed.
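
One way to take the measurements mentioned above is to have the polling loop report how many reads it needed and to accumulate those counts during bring-up or profiling. This is a sketch with hypothetical names; how the statistics are read out (debugger, log, trace buffer) depends on the platform.

```c
#include <stdint.h>

/* Hypothetical register layout. */
typedef struct {
    volatile uint32_t ctrl;
    volatile uint32_t status;  /* bit 0 = DONE */
} blk_regs_t;

#define STATUS_DONE (1u << 0)

/* Returns how many status reads it took to see DONE (1 means it was
 * already set on the first read), or 0 if DONE was not seen within
 * max_polls reads. */
unsigned blk_poll_count(const blk_regs_t *regs, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (regs->status & STATUS_DONE)
            return i + 1;
    }
    return 0;
}

/* Running statistics on how long the waits actually are. */
typedef struct {
    unsigned samples;     /* waits measured                  */
    unsigned timeouts;    /* waits that exceeded max_polls   */
    unsigned worst;       /* largest successful poll count   */
    unsigned long total;  /* sum of successful poll counts   */
} poll_stats_t;

void poll_stats_record(poll_stats_t *s, unsigned polls)
{
    s->samples++;
    if (polls == 0) {
        s->timeouts++;
        return;
    }
    if (polls > s->worst)
        s->worst = polls;
    s->total += polls;
}
```

A worst case that stays small relative to the cost of an interrupt suggests polling is fine; frequent timeouts suggest the task belongs on the interrupt side of the line.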

Best Practice: Implement both a status bit and a maskable interrupt bit to indicate completion of hardware tasks that take time to complete, whether a short or a long time.

For some blocks, the time the abort takes can vary from a short delay, if the block is idle, to a long delay, if the block is busy and needs to terminate gracefully. Since firmware cannot know the block's current state in advance, it must always assume the worst case. If firmware wants to take advantage of the shorter aborts when they do occur, it can poll for several loops in case the task completes quickly. If the task has not completed by then, firmware enables the interrupt and switches to another task.
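
The poll-first, interrupt-second approach might look like the following sketch. It assumes a hypothetical block whose DONE interrupt status is latched in hardware, so a completion that lands between the last poll and the interrupt enable is not lost. The code that then blocks on the operating system's semaphore is platform-specific and omitted.

```c
#include <stdint.h>

/* Hypothetical register layout. */
typedef struct {
    volatile uint32_t ctrl;        /* bit 1 = ABORT                 */
    volatile uint32_t status;      /* bit 0 = DONE                  */
    volatile uint32_t int_enable;  /* bit 0 = DONE interrupt enable */
} blk_regs_t;

#define CTRL_ABORT  (1u << 1)
#define STATUS_DONE (1u << 0)
#define INT_DONE    (1u << 0)

typedef enum {
    ABORT_DONE_POLLED,  /* finished quickly; firmware continues now   */
    ABORT_WAIT_IRQ      /* still busy; interrupt enabled, go do other */
                        /* work until the interrupt arrives           */
} abort_result_t;

abort_result_t blk_abort_hybrid(blk_regs_t *regs, unsigned quick_polls)
{
    regs->ctrl = CTRL_ABORT;

    /* Cheap path: covers the block-was-idle case. */
    for (unsigned i = 0; i < quick_polls; i++) {
        if (regs->status & STATUS_DONE)
            return ABORT_DONE_POLLED;
    }

    /* Still busy: stop spinning and let the interrupt report completion.
     * Relies on the latched interrupt status assumed above. */
    regs->int_enable |= INT_DONE;
    return ABORT_WAIT_IRQ;
}
```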

To help engineers decide how to implement the firmware, document the block's minimum and maximum abort times and the conditions under which each occurs. For example: “If the block is already idle, the abort will complete in 20 ns; otherwise it will take 2-3 µs to complete.”
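
Documented min and max times translate directly into poll counts. The sketch below assumes each poll iteration takes a fixed, known time (measured or estimated for the platform) and picks a waiting strategy by comparing the resulting counts against a polling budget. All type and function names are hypothetical.

```c
#include <stdint.h>

/* Documented min/max completion times for one hardware task. */
typedef struct {
    uint32_t min_ns;  /* e.g. 20 ns when the block is already idle      */
    uint32_t max_ns;  /* e.g. 3000 ns when it must terminate gracefully */
} task_timing_t;

typedef enum {
    WAIT_POLL,           /* worst case fits in the polling budget      */
    WAIT_POLL_THEN_IRQ,  /* only the best case fits: poll, then IRQ    */
    WAIT_IRQ             /* even the best case is too long to poll for */
} wait_strategy_t;

/* Status reads needed to cover duration_ns, rounded up.
 * poll_period_ns is the time per poll iteration and must be nonzero. */
uint32_t polls_to_cover(uint32_t duration_ns, uint32_t poll_period_ns)
{
    return (uint32_t)(((uint64_t)duration_ns + poll_period_ns - 1u) / poll_period_ns);
}

/* poll_limit is the most polls the system can afford to spend spinning,
 * i.e. the case-by-case dividing line between short and long delays. */
wait_strategy_t choose_wait(task_timing_t t, uint32_t poll_period_ns,
                            uint32_t poll_limit)
{
    if (polls_to_cover(t.max_ns, poll_period_ns) <= poll_limit)
        return WAIT_POLL;
    if (polls_to_cover(t.min_ns, poll_period_ns) <= poll_limit)
        return WAIT_POLL_THEN_IRQ;
    return WAIT_IRQ;
}
```

With an assumed 50 ns per poll iteration (an illustrative figure, not one from any real part), the example timings above work out to 1 poll for the idle case and 60 polls for the 3 µs worst case. A budget of 100 polls would therefore allow plain polling, while a budget of 10 would call for polling briefly and then falling back to the interrupt.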

Best Practice: Document the min and max times that a hardware task will take, including the conditions and states that affect those times.

I used aborts for these examples, but the concepts apply to any firmware-initiated hardware task that could take time to complete. Implementing both status and interrupt bits for short- and long-delay hardware tasks allows firmware to balance system load and performance by using polling loops or interrupts as appropriate.

Until the next interrupt (which will not occur for at least 1,000,000,000,000us)…