embedded software boot camp

Abiding by Industry Standards

June 3rd, 2011 by Gary Stringham

A printer ASIC was designed to be only a PCI Express endpoint, so some of its configuration registers were hard-coded accordingly, and it was used in one printer model. Later, for a new printer model, the engineers wanted to use the ASIC as a root complex to bridge to other PCI Express devices on a different bus. However, because the configuration registers were hard-coded to identify the ASIC as an endpoint, the standard discovery process would not search for another bus connected to it. Making the ASIC configurable as a root complex would have required a respin.

Because the ASIC had not been implemented per the standard, other components had to deviate from the standard to interface with it. In this case, the engineers had to hack the firmware discovery process to say, “If the device being queried is an endpoint and its vendor and device IDs match this particular chip, then treat it as a root complex and search for another bus and its devices.”
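To make the cost of that hack concrete, here is a minimal sketch of the kind of special case it adds to bus enumeration. It assumes a simplified configuration header; the struct, the function names, and the vendor/device ID values are hypothetical. The Header Type encoding (bits 6:0 equal to 0 for an endpoint, 1 for a PCI-to-PCI bridge) follows the PCI configuration header layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IDs for the hard-coded ASIC; not real values. */
#define QUIRK_VENDOR_ID 0x1234u
#define QUIRK_DEVICE_ID 0xABCDu

/* Bits 6:0 of the Header Type register: 0 = endpoint, 1 = bridge.
 * Bit 7 (multi-function) is masked off. */
#define HDR_TYPE_MASK     0x7Fu
#define HDR_TYPE_ENDPOINT 0x00u
#define HDR_TYPE_BRIDGE   0x01u

/* Simplified view of the fields discovery looks at. */
struct cfg_header {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  header_type;
};

/* Standard rule: only a bridge header leads to another bus. */
static bool is_bridge_standard(const struct cfg_header *h)
{
    return (h->header_type & HDR_TYPE_MASK) == HDR_TYPE_BRIDGE;
}

/* Standard rule plus the non-standard workaround: one specific
 * endpoint is treated as if it had another bus behind it. */
bool should_scan_behind(const struct cfg_header *h)
{
    if (is_bridge_standard(h))
        return true;
    return (h->header_type & HDR_TYPE_MASK) == HDR_TYPE_ENDPOINT &&
           h->vendor_id == QUIRK_VENDOR_ID &&
           h->device_id == QUIRK_DEVICE_ID;
}
```

Every compliant tool that enumerates the bus (debuggers, analyzers, other operating systems) would need the same special case to see the devices behind this chip.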

Designing a block in strict compliance with a standard has several benefits:

  • The compliant block will work, without changes, with compliant off-the-shelf components (e.g. blocks, chips, device drivers, test suites, debuggers, analyzers, development platforms).
  • Well-defined specifications already exist for verifying and testing compliant blocks.
  • Expertise in standards exists and is available.

Best Practice: Design the chip or block exactly to the specifications of the standard, down to the characteristics of individual registers.

Some standards define standard subsets. For example, the RS-232 serial interface has transmit and receive lines plus hardware handshaking lines, and the handshaking lines are optional. Hardware and drivers that fully implement the standard should be designed to work with other components that do not implement the optional part.
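On the driver side, supporting a standard subset can be as simple as treating the optional lines as optional. The sketch below assumes a hypothetical `struct uart_port` whose fields stand in for register reads; it illustrates the idea and is not any particular driver's API.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a UART as seen by the driver. */
struct uart_port {
    bool hw_flow_control;  /* optional RTS/CTS handshaking in use? */
    bool cts_asserted;     /* current level of the CTS input       */
    bool tx_fifo_full;     /* transmitter cannot accept a byte     */
};

/* TX and RX are mandatory; CTS is honored only when the optional
 * handshaking subset is in use. The same driver therefore works with
 * a three-wire peer and with a fully handshaking one. */
bool uart_can_send(const struct uart_port *p)
{
    if (p->tx_fifo_full)
        return false;
    if (p->hw_flow_control)
        return p->cts_asserted;
    return true;  /* no handshaking: the FIFO is the only limit */
}
```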

A non-standard subset of a standard will create problems. A transmit-only RS-232 interface can cause compatibility problems with associated compliant components, such as RS-232 drivers, other RS-232 interfaces connected to it, and test suites trying to verify its functionality. These interfacing components would all have to be customized, increasing development time and cost as well as the risk of introducing bugs.

Best Practice: When implementing a subset of a standard, implement a standard subset, not a custom-designed standard.

Remember, there is no such thing as a customized standard. Either it is customized or it is a standard – not both.

Until the next standard issue…


The (not so) Exciting World of Documentation

February 25th, 2011 by Gary Stringham

In a survey I conducted of several firmware engineers, lack of good hardware documentation was the number one complaint. That is because firmware engineers rely so heavily on hardware documentation to do their jobs correctly. Some of the engineers said that wrong documentation is worse than no documentation because of the time wasted writing incorrect code.

It is a problem because one group (hardware engineers in this case) has to produce documentation for a different group (firmware engineers). They generally are in different physical locations, come from different engineering backgrounds, have different terminology, and have different perspectives on the end products. To overcome these differences, hardware and firmware engineers need to collaborate with each other.

Writing documentation is difficult. When I write, I understand perfectly well what I meant to say. I know and understand unwritten details and nuances that are second nature to me without realizing that my reader might not. I’m reminded of that when I have someone else review my writing and they bring up missing and incomplete sections.

While both hardware and firmware engineers should be actively involved in the collaboration, hardware engineers do most of the documentation and design work. When the documentation is ready, they should give it to firmware engineers for review. Firmware engineers should review it promptly and give the hardware engineers comments on incomplete, unclear, and incorrect sections, along with any issues they discover. Firmware review of hardware documentation should be part of the checklist for milestone completion.

Best Practice: Give hardware documentation to firmware engineers to review and respond with comments and issues.

Before firmware engineers can understand the specifics of how a block works, they need an overall picture of its purpose and operation. If they focus too much on details, specific registers, and bits, it is hard to see what the overall operation should be. Not having a high-level description is like looking at a box full of nuts, bolts, springs, levers, and other parts without recognizing that it is supposed to be a toaster. By first stepping back, looking at a high-level description of the toaster as a whole, and understanding how it toasts bread, it becomes easier to understand how to assemble the components into a functional toaster.

The same concept applies to a block on the chip. Firmware engineers need to see and understand the big picture of how that block should operate, within itself and in conjunction with other parts of the system, before they can understand the detailed registers and bits.

Best Practice: Produce and have reviewed a high-level description of the block that describes its theory of operation, its function in the system, and its parts.

In addition to the high-level description, detailed documentation is needed, and it should contain both a reference section and a tutorial section. A reference section lists all registers in the block, typically in address order, and gives details for each register and the bits or bit fields in it. The tutorial section describes when, how, and in what order to use those registers and bits to carry out a task.

An example of reference documentation is the UNIX man pages (and those of Linux and other variants), which cover all commands and functions in alphabetical order, each page describing its options in order. In contrast, the well-known book by Kernighan and Ritchie, “The C Programming Language,” is written in tutorial style, explaining how to do various tasks and introducing the C constructs needed to accomplish them.

Best Practice: Provide both a reference section and a tutorial section in the detailed documentation of a block.
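As a small illustration of the two styles, here is a sketch for a hypothetical DMA block. The register names, offsets, and bit assignments are invented for the example; the macros play the role of the reference section, and the commented function plays the role of the tutorial section. Registers are passed in as a pointer so the sketch can run against an ordinary array instead of real hardware.

```c
#include <stdint.h>

/* Reference view: every register of a hypothetical DMA block, in
 * address order, with its bit fields. Offsets are word indices. */
#define DMA_CTRL   0u  /* bit 0 ENABLE, bit 1 START                  */
#define DMA_STATUS 1u  /* bit 0 BUSY, bit 1 DONE (write 1 to clear)  */
#define DMA_SRC    2u  /* source address                             */
#define DMA_DST    3u  /* destination address                        */
#define DMA_LEN    4u  /* transfer length in bytes                   */
#define DMA_NREGS  5u

#define CTRL_ENABLE (1u << 0)
#define CTRL_START  (1u << 1)
#define STATUS_DONE (1u << 1)

/* Tutorial view: the order in which those registers are used for one
 * task, starting a memory-to-memory copy. */
void dma_start_copy(volatile uint32_t *regs, uint32_t src, uint32_t dst,
                    uint32_t len)
{
    regs[DMA_STATUS] = STATUS_DONE;            /* 1. clear a stale DONE flag */
    regs[DMA_SRC] = src;                       /* 2. program the addresses   */
    regs[DMA_DST] = dst;
    regs[DMA_LEN] = len;                       /* 3. program the length      */
    regs[DMA_CTRL] = CTRL_ENABLE | CTRL_START; /* 4. enable and start, last  */
}
```

Writing the control register last is a common convention so the block never starts with half-programmed parameters; the actual required order for a given block is exactly what its tutorial section should spell out.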

Until the next writing…

Designing a Chip for Unplanned Products

December 29th, 2010 by Gary Stringham

One of the rules of the Extreme Programming design philosophy for software is Never Add Functionality Early. This means that when coding for one product, do not add features or functionality needed for a future product. While this rule does have some merit for software development, it should be applied more judiciously to hardware development.

One reason programmers can get away with that type of thinking is that they can change software rapidly by simply recompiling and re-executing. Hardware engineers do not have that luxury. It takes months and millions of dollars to produce a new version of a chip. Future costs can be reduced if the chip is designed with the future in mind, anticipating that it might be used in products that are currently unplanned.

There is also an aspect of planning for the future where you prepare by building a framework but not putting in the features. That makes it easier to implement the new features in the future but still requires putting out a new version of the chip. But that will be the subject for another issue. For this issue, I am talking about what can be done to make this version of the chip more likely to be used in the future.

Not all known future functionality should be put into the design; time, effort, space, and cost will not permit it. But some of it carries little risk in terms of complexity, implementation, and verification. If a three-bit number might need four bits in the future, making it a four-bit number now adds little risk to the program. Adding a couple of extra GPIO lines is also low-risk (assuming pins are available). Adding a block that is new, big, and complex could add a lot of risk to the program, and that risk needs to be weighed against the potential gain if the block were needed.
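The extra-bit case costs almost nothing in firmware either. A sketch, assuming a hypothetical register with a scale field at bits 7:4: the current product only writes values 0 through 7, but because the field is four bits wide, a future product can write 8 through 15 with no hardware change.

```c
#include <stdint.h>

/* Hypothetical SCALE field at bits 7:4. Three bits would cover today's
 * product (0..7); the fourth bit is headroom for a future one. */
#define SCALE_SHIFT 4u
#define SCALE_WIDTH 4u
#define SCALE_MASK  (((1u << SCALE_WIDTH) - 1u) << SCALE_SHIFT)

/* Read-modify-write helper: replace the field, keep the other bits. */
uint32_t set_scale(uint32_t reg, uint32_t scale)
{
    return (reg & ~SCALE_MASK) | ((scale << SCALE_SHIFT) & SCALE_MASK);
}

uint32_t get_scale(uint32_t reg)
{
    return (reg & SCALE_MASK) >> SCALE_SHIFT;
}
```

With a three-bit field, writing 12 would silently store 4; with the four-bit field it round-trips intact.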

Extreme Programming says that “only 10% of that extra stuff will ever get used.” But if you put 10 “extra” features in the chip and only one gets used, you could save your company months and millions of dollars.

Best Practice: Include low-risk features in the design of the chip that might be used in future products.

Aside from features that might be needed in the future, higher speed is characteristic of future products. Chips are designed with a performance and speed budget, so planning for the future also means looking at the speed of the chip. I have seen cases where existing chips were of limited use because they were not fast enough, and we had to spend a lot of time and money to produce a new version just to get the speeds we needed.

Best Practice: Increase the performance margins in the design to allow the chip to be used in faster products in the future.

Always looking toward the future…

Built-in Debugging Support

October 30th, 2010 by Gary Stringham

After a presentation I gave at a conference, one of the attendees came up and told me about his ASIC design team that consisted of young engineers. They had completed their design and told him that they were done. He then asked the question, “Six months from now when you get the silicon back and it does not work, what are you going to need to diagnose and solve the problems?” They went back to their desks and worked some more.

He taught them a very important principle: Think about what could go wrong and what would be needed to figure that out. Too often, hardware designs assume that nothing will go wrong. It is akin to a software function that does not check the validity of parameters being passed in, or a hardware module that does not synchronize an incoming signal to its clock.

Hardware engineers are good at troubleshooting chips by mounting them on test fixtures and attaching probes. However, once the chip is mounted inside a prototype device, the test fixtures often cannot be attached. Unfortunately, some problems will not reveal themselves until the chip is inside the actual device running the actual firmware. For situations like this, it is important for the hardware to be designed to assist troubleshooting.

A technique that I have successfully employed is to build firmware-accessible debugging resources into the chip. It is like having a built-in logic analyzer. Here are three types of debugging resources I’ve found particularly useful.

Some devices use a specific number of signal pulses to control certain features. For example, a laser printer generates horizontal sync pulses, and the number of pulses determines the size of the paper being printed. When the printer is operating properly, you probably do not need to know how many pulses occurred. But if something is wrong, you might want to know how many pulses were generated. Maybe only enough pulses for a Letter-size sheet were generated when you were trying to print on a Legal-size sheet. In this case, having a pulse counter on the signal that firmware can read and reset would help solve the problem.

Best Practice: Provide firmware-readable and resettable event counters to track the occurrences of key events in the hardware.
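A sketch of how firmware might use such a counter, assuming one horizontal sync pulse per scan line at 600 dpi, so a Letter page (11 in) needs 6600 pulses and a Legal page (14 in) needs 8400. The counter struct and function names are hypothetical, and the counter is simulated with an ordinary variable rather than a memory-mapped register.

```c
#include <stdint.h>

/* Hypothetical pulse counter. On real hardware this would be a
 * memory-mapped register; here it is a plain volatile variable. */
struct hsync_counter {
    volatile uint32_t count;
};

/* Stand-in for the hardware: called once per horizontal sync pulse. */
void hw_hsync_pulse(struct hsync_counter *c) { c->count++; }

/* Firmware clears the counter before the page starts... */
void counter_reset(struct hsync_counter *c) { c->count = 0; }

/* ...and afterward reports how many pulses were missing (0 if none). */
uint32_t missing_pulses(const struct hsync_counter *c, uint32_t expected)
{
    uint32_t seen = c->count;
    return seen >= expected ? 0u : expected - seen;
}

/* Assumed: one pulse per scan line at 600 dpi. */
#define LINES_PER_INCH 600u
#define LETTER_LINES   (11u * LINES_PER_INCH) /* 6600 */
#define LEGAL_LINES    (14u * LINES_PER_INCH) /* 8400 */
```

If a Legal job stops after 6600 pulses, `missing_pulses(&c, LEGAL_LINES)` returns 1800, pointing straight at a Letter/Legal mismatch.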

Well-designed UARTs allow firmware to read the current levels of the handshaking signals. That helps troubleshoot the RS-232 communications. The same technique can be used for any other I/O signals where knowing the current levels of those signals could help troubleshoot problems.

Best Practice: Provide read access to view the current state of key input and output signal pins.
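For example, firmware could compare the sampled pin levels against what it expects at a given point in the protocol. The bit assignments below are hypothetical, and in the usage the register is simulated with a variable.

```c
#include <stdint.h>

/* Hypothetical read-only register mirroring RS-232 handshaking pins. */
#define PIN_RTS (1u << 0)
#define PIN_CTS (1u << 1)
#define PIN_DTR (1u << 2)
#define PIN_DSR (1u << 3)

/* Read the live pin levels (a memory-mapped register on real hardware). */
uint32_t read_pin_levels(const volatile uint32_t *pin_reg)
{
    return *pin_reg;
}

/* Pins that should be high at this point but read back low. */
uint32_t pins_stuck_low(uint32_t levels, uint32_t expected_high)
{
    return expected_high & ~levels;
}
```

A nonzero result names the exact line to probe, instead of a vague "communication failed."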

As a general rule, firmware does not need to know the current state of a state machine in hardware. However, when there are problems, knowing the state can prove useful. A co-worker was bringing up a new chip that would not work. He read the current state of the state machine and discovered that it was stuck in a state waiting for an external signal. He looked at the prototype board and discovered that a resistor was missing on that signal line. He solved the problem in a matter of minutes, whereas it probably would have taken hours if that state register had not been there.

Best Practice: Provide a register that shows the current state of each state machine.
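A sketch of how firmware might use such a register during bring-up, assuming a hypothetical two-bit STATE field: decode the value into a name for a debug log, and flag a machine that sits in the same non-idle state across several samples, as in the missing-resistor case above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state encoding exposed by a read-only STATE register. */
enum sm_state { ST_IDLE = 0, ST_WAIT_READY = 1, ST_XFER = 2, ST_DONE = 3 };
#define STATE_MASK 0x3u

static const char *const state_names[] = {
    "IDLE", "WAIT_READY", "XFER", "DONE"
};

/* Human-readable name for a debug log; the mask keeps the index in range. */
const char *state_name(uint32_t state_reg)
{
    return state_names[state_reg & STATE_MASK];
}

/* True if every sample shows the same non-idle state. */
bool looks_stuck(const uint32_t *samples, int n)
{
    if (n < 2)
        return false;
    uint32_t first = samples[0] & STATE_MASK;
    if (first == ST_IDLE)
        return false;
    for (int i = 1; i < n; i++)
        if ((samples[i] & STATE_MASK) != first)
            return false;
    return true;
}
```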

Until the next bug…

Accommodating Product Changes

September 7th, 2010 by Gary Stringham

Late in the development of a new printer, a third-party print engine that interfaced with a block on the ASIC changed its interface behavior. The print engine would quit sending pulses before the block was done with its job, causing the block to hang waiting for more pulses. This behavior existed in other printer models; their associated blocks had support to detect early pulse termination. Because this new printer’s block did not have that support, I had to create a firmware workaround for it.

I tried three different algorithms over a three-week period and finally settled on one, even though under rare circumstances, it had a severe performance penalty. Unfortunately, one particular customer bought several of these printers for a specific application and experienced this rare circumstance that cut print speed in half for every page. Obviously the customer was dissatisfied and threatened to return the printers and cancel the large order, which included other printer models. I spent another two weeks to come up with a fourth algorithm that avoided the penalty. We upgraded the customer’s printers and they were happy.

Had this block included early-termination support (at the low cost of a few hundred gates), we could have accommodated this late product change with a one-hour firmware change. Instead, besides nearly costing us a major contract, it cost five weeks of engineering effort, thousands of dollars in warranty costs, and some damage to our reputation.
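How the other printers' blocks detected early termination is not described here; one plausible approach is a timeout on the gap between pulses that sets a status bit. The sketch below models that hypothetical hardware behavior as a function and shows how little firmware is needed to act on it. All names, bits, and the timeout scheme are assumptions.

```c
#include <stdint.h>

/* Hypothetical status bits of the print-engine interface block. */
#define ST_DONE      (1u << 0)  /* all expected pulses arrived        */
#define ST_EARLY_END (1u << 1)  /* pulses stopped before count reached */

/* Model of the assumed hardware support: instead of waiting forever,
 * the block flags early termination once the gap since the last pulse
 * reaches a timeout. */
uint32_t block_status(uint32_t pulses_seen, uint32_t pulses_expected,
                      uint32_t idle_ticks, uint32_t timeout_ticks)
{
    if (pulses_seen >= pulses_expected)
        return ST_DONE;
    if (idle_ticks >= timeout_ticks)
        return ST_EARLY_END;
    return 0u;  /* still running */
}

typedef enum { JOB_RUNNING, JOB_COMPLETE, JOB_TRUNCATED } job_result;

/* Firmware side: an early end is a normal, finished job, not a hang. */
job_result poll_job(uint32_t status)
{
    if (status & ST_DONE)
        return JOB_COMPLETE;
    if (status & ST_EARLY_END)
        return JOB_TRUNCATED;
    return JOB_RUNNING;
}
```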

In contrast, we had an ASIC, designed for portrait-format printers, that was being investigated for use in landscape-format printers. The ASIC was based on previous ASICs that did support landscape-format printers, and the engineers did not remove the landscape-format support even though this ASIC was not targeted for landscape-format printers. ASICs for landscape-format printers require more onboard memory in several places in the pipeline to accommodate longer scan lines. Reducing that memory would have reduced the gate count, but the engineers chose not to do so when they made this ASIC.

After verifying that landscape-format support was still in the ASIC, we used it in landscape-format printers. This saved us the millions of dollars and several months that developing and producing a new ASIC would have required.

Ideally, every ASIC would support any product, whether old or new, high-end or low-end. Since that is not realistic, choices have to be made. Old functionality can be removed once you are sure it will never be needed again, even for mid-life kicker products. Functionality with large gate-count requirements that is only needed for specific product families could be removed. But where possible, leave as much functionality in an ASIC as you can, even if not all of it is needed for the targeted products.

Best Practice: Implement and retain all known low-overhead functionality in a block, even if the current requirements do not call for it.

Until the next shipment…