Atmel has a very interesting application note on IEC60730 Class B compliance. If you aren’t aware of IEC60730, there is a nice introduction here. In a nutshell IEC60730 Class B compliance is a safety standard related to household appliances. Part of IEC60730 requires that one actively monitor that a microcontroller (if one is used) is functioning correctly. This seems to be a reasonable thing to do. However, as the Atmel application note shows, meeting this requirement requires one to constantly do things such as test memory, confirm that timers are operating at the correct frequencies and so on. Again conceptually this doesn’t seem unreasonable. However, my concern with this is that the very act of confirming that the hardware is functioning could result in a system failure at a critical point, thus creating the very problem the standard is designed to prevent.
For example, it’s hard to argue with the contention that the stack is the most used portion of memory in most microcontrollers. I think most engineers would agree that if the memory used for the stack malfunctioned then disastrous things would most likely occur. On this basis, a regular check of the Stack memory would seem to be in order. Maybe it’s just me, but the thought of running a memory test on the stack area of a processor while simultaneously trying to respond to interrupts etc seems like a very tall order. Indeed, I can easily envisage a piece of code that is designed to test the stack area malfunctioning and causing a system crash and potentially causing the very thing it’s designed to avoid.
I think what it comes down to is this. The reliability of hardware seems to me to be several orders of magnitude better than the reliability of software. Thus using software to validate hardware seems problematic. I’ll be very interested to see what happens the first time someone gets hurt as a result of a malfunction in software written to conform to IEC60730. If you don’t think this is likely, take a look at the size of the object code produced by Atmel’s suggested tests. Then consider that many household appliances use microcontrollers that contain just a few kbytes of object code – and that the IEC60730 code will thus make up a very large fraction of the delivered code. On a simplistic statistical basis, we can assume that if 30% of the code in a product is related to IEC60730 compliance, then 30% of the bugs will be in that code. Given what the code has to do, my money is that the IEC60730 compliance code will have a much higher bug rate than the general application. Thus the probability of a failure occurring in the IEC60730 code is high – and someone will get hurt when the code fails.
As a parting thought, how exactly does one set about testing code that is designed to detect hardware failures internal to an integrated circuit. Although I’m sure I could come up some test protocols for some hardware, I suspect that the Heisenberg uncertainty principle will ensure that the very act of testing the test will result in a flawed test.