At a rather fundamental level this blog is about how I do embedded systems. Implicit in a lot of the articles is the concept that I believe what I’m doing is ‘right’, or at least ‘better’. Well today I thought I’d write about something I got wrong (at least on the first pass).
This is the scenario. I’m currently working on an NXP LPC17xx ARM Cortex design. Like all modern processors, the LPC17xx has a number of sophisticated timers with all sorts of operating modes. Well it so happens that I am using four (out of a possible six) interrupt sources for one particular timer. The hardware architecture of the processor routes all of these interrupts to one vector and thus one interrupt handler. Here’s what I wrote:
void TMR3_IRQHandler(void) { if (T3IR_bit.MR0INT) { /* Do stuff */ } if (T3IR_bit.CR0INT) { /* Do stuff */ } if (T3IR_bit.MR1INT) { /* Do stuff */ } if (T3IR_bit.CR1INT) { /* Do stuff */ } T3IR = 0x3F; /* Acknowledge all interrupts */ ... }
Thus in the ISR I tested each of my interrupt sources, took the appropriate actions in the sections marked ‘Do stuff’, acknowledged the interrupts, did a bit of clean up and I was done. The ‘Do Stuff’ sections were quite complicated and so this was where I spent my time. Anyway having finished coding the ISR, I took a short break and came back to re-examine the code. As I was re-reading the code, I realized that I had made a classic mistake. In case you haven’t spotted it, the problem is in the line where I acknowledge all interrupts. Consider the following sequence of events:
- Interrupt source CR1INT is asserted and the CPU vectors to this ISR.
- I test the various interrupt flags and discover that CR1INT is set and do the requisite work.
- While I’m doing the requisite work, interrupt source MR1INT becomes active.
- I clear all interrupt sources (including MR1INT) and terminate the ISR
- As a result I have missed an interrupt.
The way this should have been coded is to acknowledge each interrupt bit individually. I.e. like this:
void TMR3_IRQHandler(void) { if (T3IR_bit.MR0INT) { /* Do stuff */ T3IR_bit.MR0INT = 1; /* Clear the interrupt */ } if (T3IR_bit.CR0INT) { /* Do stuff */ T3IR_bit.CR0INT = 1; /* Clear the interrupt */ } if (T3IR_bit.MR1INT) { /* Do stuff */ T3IR_bit.MR1INT = 1; /* Clear the interrupt */ } if (T3IR_bit.CR1INT) { /* Do stuff */ T3IR_bit.CR1INT = 1; /* Clear the interrupt */ } ... }
So how did this mistake come about? I think there were two culprits:
Mistake 1
The first mistake I made was in using another timer ISR as a template. The code I copied had just a single interrupt source, and thus acknowledging all of the sources was reasonable.
Mistake 2
I was too concerned with the ‘real work’ of the ISR. I should have written the ISR outline first and only then worried about the real work.
Notwithstanding the above, I did do one thing correctly – and that was to finish the code, walk away, and then come back to re-examine it. At no time did I reach for the debugger to test my code – which was just as well because quite frankly the chances of this bug being caught by testing are vanishingly small. Indeed just about the only way a bug like this would get caught is via code inspection – which is why I’m such a firm believer in code inspection as a debugging tool.
Anyway if you found this informative, you may find this account of another mistake I made equally enlightening.