embedded software boot camp

Idling along, (or what to do in the idle task)

Sunday, April 14th, 2013 by Nigel Jones

If you are using an RTOS in your latest design, then you no doubt have an idle task. (Most of the time the idle task is explicit and is the user task with the lowest priority; sometimes it’s built into the RTOS.) In my experience the idle task is an interesting beast. On the one hand, it is what gets executed when there’s nothing else to do, and thus inherently contains nothing directly related to the product. On the other hand, it’s a wonderful resource that you can exploit to do all sorts of interesting things to improve your product, without having to worry too much about the impact on the product’s normal operation.

With that being said, here are some suggestions for what your idle task can do, starting with one thing it shouldn’t do.

Do Nothing

If your idle task consists of a do-nothing loop, then you are almost certainly missing an opportunity. Hopefully the suggestions below will get your creative juices flowing.

Watchdog

In any RTOS-based design, the idle task should play a role in the overall watchdog supervisor. Without going into a treatise on watchdog design, suffice it to say that if the idle task isn’t being run frequently, then you’ve got a problem. Thus if the idle task doesn’t feature in your watchdog supervisor, you are doing something wrong. Note that featuring the idle task in your watchdog doesn’t necessarily mean you are doing it right either!
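One common way to give the idle task that role is a check-in scheme: each monitored task sets its own bit once per pass through its loop, and the idle task services the hardware watchdog only when every bit (its own included) is set, then clears them. The following is a minimal sketch; the task IDs and `wdt_kick()` are hypothetical stand-ins, and on a real target the flag updates would need to be atomic or done inside a critical section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task identifiers; the idle task checks in as well. */
enum { TASK_COMMS, TASK_CONTROL, TASK_UI, TASK_IDLE, NUM_TASKS };

#define ALL_TASKS_MASK ((uint32_t)((1u << NUM_TASKS) - 1u))

static volatile uint32_t checkin_flags;
static unsigned wdt_kick_count;   /* counts kicks in place of real hardware */

/* Hypothetical stand-in for the MCU-specific "service watchdog" write. */
static void wdt_kick(void) { wdt_kick_count++; }

/* Called by each task once per pass through its main loop.
   On a real target this read-modify-write must be protected (critical
   section or atomic OR), because tasks at different priorities share it. */
void watchdog_checkin(unsigned task_id)
{
    checkin_flags |= (1u << task_id);
}

/* Called from the idle task. Kicks the watchdog only if every task,
   including idle itself, has checked in since the last kick.
   Returns true if the watchdog was serviced. */
bool watchdog_idle_service(void)
{
    watchdog_checkin(TASK_IDLE);
    if ((checkin_flags & ALL_TASKS_MASK) == ALL_TASKS_MASK) {
        checkin_flags = 0;   /* also needs protection on a real target */
        wdt_kick();
        return true;
    }
    return false;
}
```

If any task stalls, or a higher-priority task hogs the CPU so that the idle task never runs, the mask is never completed and the watchdog eventually resets the system.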

Power Save

For power-constrained systems, the idle task is usually the place to put the microprocessor to sleep and / or indulge in other power-saving strategies. I often find that my idle tasks consist of some of the features described here, plus power save. In other words, the idle task takes care of some housekeeping and then takes a nap.
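Which nap the idle task takes usually depends on how long it can expect to stay asleep, since deeper sleep modes save more power but cost more time to enter and leave. The following is a minimal sketch of that decision, assuming a hypothetical scheduler query that reports the ticks until the next timed wake-up and made-up break-even thresholds; the actual sleep instruction and mode configuration are target specific and are left out. FreeRTOS’s tickless idle mode (`configUSE_TICKLESS_IDLE`) is one real-world implementation of this idea.

```c
#include <stdint.h>

typedef enum {
    SLEEP_NONE,   /* too little time: return and run the idle loop again */
    SLEEP_LIGHT,  /* core clock gated, peripherals still running */
    SLEEP_DEEP    /* most clocks stopped, longer wake-up latency */
} sleep_mode_t;

/* Hypothetical break-even points, in scheduler ticks. A mode is only worth
   entering if the expected idle time exceeds its entry + exit overhead. */
#define LIGHT_SLEEP_MIN_TICKS  2u
#define DEEP_SLEEP_MIN_TICKS  20u

/* ticks_to_next_event: ticks until the next task is due (from the scheduler).
   work_pending: nonzero if idle housekeeping (CRC, logging...) is unfinished,
   so the idle task does its housekeeping first and naps afterwards. */
sleep_mode_t idle_choose_sleep(uint32_t ticks_to_next_event, int work_pending)
{
    if (work_pending || ticks_to_next_event < LIGHT_SLEEP_MIN_TICKS) {
        return SLEEP_NONE;
    }
    if (ticks_to_next_event < DEEP_SLEEP_MIN_TICKS) {
        return SLEEP_LIGHT;
    }
    return SLEEP_DEEP;
}
```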

Load calculation

By using hardware timers in conjunction with the task switch hook function, it’s normally possible to construct a system that gives a decent indication of both the overall CPU load and the CPU utilization of each task. The idle task isn’t a bad place to do all the calculations. I find this a very useful diagnostic aid while I’m developing a system. Once you are done using it as a development aid, with a bit more work it can be made part of your overall watchdog strategy, in that it can provide useful information about how tasks are (mis)behaving.
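The following is a minimal sketch of the bookkeeping, assuming a free-running hardware timer whose value the task switch hook passes in, along with the ID of the outgoing task (the real hook name and signature vary by RTOS, and FreeRTOS, for example, has a built-in version enabled by `configGENERATE_RUN_TIME_STATS`). The hook only accumulates elapsed timer counts; the divisions, which are comparatively slow on small parts, are left to the idle task. Task count and idle task ID are hypothetical.

```c
#include <stdint.h>

#define NUM_TASKS 4u                 /* hypothetical task count */
#define IDLE_TASK (NUM_TASKS - 1u)   /* hypothetical: idle is the last ID */

static uint32_t run_time[NUM_TASKS]; /* timer counts per task this window */
static uint32_t last_switch;         /* timer value at the previous switch */

/* Called from the RTOS task switch hook. 'now' is a free-running timer
   value; unsigned subtraction handles timer wrap-around correctly as long
   as no task runs longer than one full timer period. */
void load_on_task_switch(unsigned outgoing_task, uint32_t now)
{
    run_time[outgoing_task] += now - last_switch;
    last_switch = now;
}

/* Called periodically from the idle task. Writes each task's share of the
   window in tenths of a percent and returns the total CPU load (100% minus
   idle), also in tenths of a percent, then starts a new window.
   On a real target, snapshot run_time[] with the scheduler locked first,
   since the hook can fire part-way through this function. */
uint32_t load_compute(uint32_t permille_out[NUM_TASKS])
{
    uint64_t total = 0;
    for (unsigned i = 0; i < NUM_TASKS; i++) total += run_time[i];
    if (total == 0) return 0;

    for (unsigned i = 0; i < NUM_TASKS; i++) {
        permille_out[i] = (uint32_t)((run_time[i] * 1000ull) / total);
        run_time[i] = 0;
    }
    return 1000u - permille_out[IDLE_TASK];
}
```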

Flash Check

Just about every embedded system I’ve looked at in the last decade or two performs a CRC check on program memory at startup. However, if you are designing a system that is safety critical and / or expected to run for a long time between power cycles, then you should seriously consider also running a Flash CRC check in the idle task. Because no memory is written, and one is instead reading memory that is supposed to be constant, there are no real race conditions to worry about, and thus there’s no need to enter critical sections to perform the reads. Of course, if you are using an MMU or MPU, then things might get a little more challenging. Naturally such a challenge is nothing for a reader of your ability! [As an aside, one of my electrical engineering professors used to say to me, "Nigel, this is nothing for a man of your ability!" One is of course simultaneously flattered, irritated and motivated. I've never forgotten it.]
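The following sketch shows one way to make the check incremental, so that each idle pass costs a bounded amount of time. Each call feeds a small chunk of program memory into a running CRC and, after the last chunk, compares the result with a reference value. Where the region boundaries and the reference CRC come from (typically the linker and a post-build step) is an assumption left to the reader. CRC-32 with the reflected polynomial 0xEDB88320, as used by zlib and Ethernet, is just one reasonable choice, and a table-driven version would be faster than this bitwise one.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { CRC_IN_PROGRESS, CRC_PASS, CRC_FAIL } crc_status_t;

typedef struct {
    const uint8_t *start;   /* first byte of the checked region */
    size_t length;          /* number of bytes in the region */
    uint32_t expected;      /* reference CRC, e.g. stored by the build */
    size_t offset;          /* progress through the region */
    uint32_t crc;           /* running (not yet finalised) CRC value */
} flash_check_t;

/* Bitwise CRC-32 update (reflected, polynomial 0xEDB88320). */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

void flash_check_init(flash_check_t *fc, const uint8_t *start,
                      size_t length, uint32_t expected)
{
    fc->start = start;
    fc->length = length;
    fc->expected = expected;
    fc->offset = 0;
    fc->crc = 0xFFFFFFFFu;
}

/* Call once per idle task pass. Processes at most 'chunk' bytes. No lock
   is taken: the region is only read and is supposed to be constant. */
crc_status_t flash_check_step(flash_check_t *fc, size_t chunk)
{
    size_t remaining = fc->length - fc->offset;
    size_t n = remaining < chunk ? remaining : chunk;

    fc->crc = crc32_update(fc->crc, fc->start + fc->offset, n);
    fc->offset += n;
    if (fc->offset < fc->length) {
        return CRC_IN_PROGRESS;
    }

    uint32_t result = fc->crc ^ 0xFFFFFFFFu;
    fc->offset = 0;               /* restart the sweep on the next call */
    fc->crc = 0xFFFFFFFFu;
    return (result == fc->expected) ? CRC_PASS : CRC_FAIL;
}
```

The chunk size sets the trade-off: larger chunks finish the sweep sooner, while smaller ones keep each idle pass short.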

RAM Check

This is of course the evil twin of the Flash check. This time, however, you need to perform both reads and writes on locations that are being used by higher-priority tasks and interrupts. You can of course only do this safely by executing a suitable lock / unlock procedure around each RAM location you test. Doing this in the idle task could seriously change your system’s response time, so you need to think very carefully about how to structure such a test. A good starting point is to do just one locked read / write per idle task invocation. Of course, if that results in a 10-year RAM test on your system, then you’ll need to rethink the strategy.
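The following is a sketch of the "one locked read / write per invocation" idea. Each call saves one word, writes and verifies a pattern and its complement, restores the original value, and moves on, all inside a critical section. `ENTER_CRITICAL()` / `EXIT_CRITICAL()` are hypothetical placeholders for whatever your RTOS or compiler provides (stubbed out here so the sketch compiles on a host). The region under test must exclude the checker’s own data and the idle task’s stack, and a proper march test would catch more fault types than this simple pattern check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholders: replace with your RTOS/compiler's interrupt lock. */
#define ENTER_CRITICAL() do { } while (0)
#define EXIT_CRITICAL()  do { } while (0)

typedef struct {
    volatile uint32_t *start;  /* first word of the region under test */
    size_t words;              /* number of words in the region */
    size_t index;              /* next word to test */
} ram_check_t;

/* Tests exactly one word per call, so interrupts are locked for only a few
   memory accesses. Returns false if a fault is detected. On parts with a
   data cache, the region must be uncached or the read-back may never
   reach the RAM itself. */
bool ram_check_step(ram_check_t *rc)
{
    static const uint32_t pattern = 0xAAAAAAAAu;
    volatile uint32_t *p = rc->start + rc->index;
    bool ok;

    ENTER_CRITICAL();
    uint32_t saved = *p;
    *p = pattern;
    ok = (*p == pattern);
    *p = (uint32_t)~pattern;
    ok = ok && (*p == (uint32_t)~pattern);
    *p = saved;                /* restore before anyone else can see it */
    EXIT_CRITICAL();

    rc->index = (rc->index + 1u < rc->words) ? rc->index + 1u : 0u;
    return ok;
}
```

The arithmetic behind the 10-year warning: a full sweep takes (RAM size / 4) calls at one 32-bit word per call. For a hypothetical 64 KB of RAM and 1,000 idle passes per second, that is 16,384 calls, or about 16 seconds; if your numbers come out in years, test several words per call instead.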

If you have other good ideas for idle task work then please leave them in the comments.

9 Responses to “Idling along, (or what to do in the idle task)”

  1. Anonymous says:

    Nigel,

    Insightful post, as always. The idle task is often overlooked, ignored or misunderstood, all of which lead to it being (sadly) under-utilized.

    There are a few other things I often try to do in the idle task when possible. For one, monitoring other system resources, such as heap usage (for those daring souls using dynamic memory), queue high water marks, stack “warning track” marks(*), filesystem free space, etc…

    With many of these things, it’s often “too late” once you’ve gone over the cliff (blown stack, no free filesystem space, etc.) so it’s nice to detect & log when things are starting to go a little wonky. I know you’re a big proponent of logging; often times seeing the warning signs and correlating them with other logged activities and symptoms gives insight into the root cause of a problem.

    Speaking of logging, I often use the idle task to throw logging data out of the product (UART, Ethernet) when I don’t have better hardware support (deep FIFOs, buffer descriptors, DMA, etc.) There’s always a tradeoff between latency, memory usage for buffering, etc. but it can work pretty well. I believe Miro Samek’s QSPY tracing/logging component within the QP framework operates this way as well. (I’d put a link to Miro’s blog here on EG but I don’t know if HTML is supported.)

    Last point, then I’ll get back to the rest of Sunday’s work… you noted the RAM check and lock/unlock. I’ve seen this implemented incorrectly in so many different ways, but that’s another story. What I wanted to mention is that systems generally have a known “worst case” timing for critical sections (maybe it’s in the kernel, in a driver, etc.), and as long as the critical section for the RAM checking is no longer than the current worst case, in theory you’re no worse off. Of course, this assumes that a) the designer actually knows the worst case, b) the system has been designed to handle the worst case, and c) the worst case is not likely to decrease (in which case our idle loop RAM check would become the new worst case).

    Fun stuff. Now get out on the bike and “enjoy” a century, will ya?

    • Nigel Jones says:

      Some excellent suggestions. The logging one is particularly good, as it’s one of those things that just gives you so much visibility into what’s going on. Your comment on lock / unlock times is prescient, as my next blog post is going to tackle this and the topic of systematically reading RAM in the idle task. Finally, I was on my bike today. It’s always fun transitioning from one’s skiing legs to one’s riding legs! Hopefully it’ll stay cool and we’ll have a great cycling season, with a century or two lined up.

  2. Tony Leigh says:

    Nigel,

    Another handy thing you can do, if you’ve got a spare LED on your board somewhere, is to toggle it each time the idle task is run. You’ll see longer periods of LED on or off when a task is hogging the processor. You might need to look at it on a scope if the idle task runs too frequently for the blinking to be visible. I’ve used this several times to find problems by noting that the LED pattern changes when the system goes into a particular mode. Even after the system goes into production, it’s useful to leave it there, just to give users confidence that it’s running.

  3. John Day says:

    Great points, especially the Flash self check! Coincidentally, we implemented that earlier this week.

    We often set up a flag register where every regularly occurring ISR (timer, ADC or regular communications) will set its individual flag whenever it successfully executes. The Flash self check routine can also set a flag in this register when it passes self check.

    The idle task reads the flag register, calls the Clear Watchdog function when all regular ISR execution flags are set, and then clears the ISR execution flag register. This way, the watchdog covers not only the idle task loop, but every regular task execution. This has been very effective in recovering from lockups in I2C and other shared-bus communication protocols. Many of our systems (8/16/32-bit PIC processors) don’t run an RTOS, but they employ the same idle loop strategy in the main loop.

  4. David Haile says:

    I agree with the last suggestion. Use the idle loop to update status LEDs, which may include a heartbeat ticker. It makes it immediately obvious if you’ve just tried something that is hogging CPU time, because the LED blink rate slows down. Another way to do it is to update a status LED with a slow brightness ramp up and down continually to indicate a successful communication link. If the idle process isn’t running at a predictable rate, the brightness ramp up/down is uneven, and it reminds me that whatever code change I just made is bogging down one of the tasks.
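The brightness-ramp heartbeat described above can be sketched as a counter that the idle task advances once per pass and turns into a triangle-wave PWM duty cycle. The ramp is smooth only while the idle task runs at a steady rate. Driving the actual PWM (a hypothetical `pwm_set_duty()` board-support call) is left out, and the 0 to 255 range and one-step-per-pass rate are arbitrary assumptions.

```c
#include <stdint.h>

#define RAMP_MAX 255u   /* full brightness, assuming an 8-bit PWM */

/* Advances the heartbeat one step per idle pass and returns the new PWM
   duty: 1, 2, ..., 255, 254, ..., 1, 0, 1, ... (a triangle wave). If the
   idle task is starved, steps arrive unevenly and the "breathing" visibly
   stutters. */
uint8_t heartbeat_step(void)
{
    static uint16_t phase;   /* 0 .. 2*RAMP_MAX - 1 */
    phase = (uint16_t)((phase + 1u) % (2u * RAMP_MAX));
    return (uint8_t)(phase <= RAMP_MAX ? phase : 2u * RAMP_MAX - phase);
}
```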

  5. bandit says:

    I like to have two of the lowest priority tasks:

    2nd lowest is the Command Line Interpreter (cli)

    Lowest – Idle task

    1. fill your stacks with a magic number when you fork them off. The idle task just runs thru them to work out what percentage of each stack has been used. Alert at 90% stack usage

    2. general system health – power monitoring

    3. watchdog stroking

    4. write-thru EEPROM cache update

    5. output queued-up message buffers to the outside world (as previously mentioned)

    6. keep track of general system stats: memory pool usage, fifo queue usage, number of times particular events occur, etc

    7. maintain a “slime trail” in various tasks, i.e. particular waypoints in the code. The idle task can print those out. Useful for postmortems, test coverage, etc.

    8. a special printf() with which a time-critical task writes a buffer to a queue, and the idle task prints it out, so you don’t change the timing of critical tasks. Like the waypoint slime trail, but with more info.
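Items 5 and 8 share a pattern: time-critical code drops fixed-size records into a queue without doing any formatting or I/O, and the idle task drains the queue when it gets the chance. The following is a minimal sketch of a single-producer, single-consumer ring buffer using C11 atomics. It assumes exactly one writer and has the idle task as the only reader; with several writers, a lock is needed around the enqueue. `log_record_t` is hypothetical, and the idle side copies records out so the caller can format and send them (UART, Ethernet) however it likes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_QUEUE_LEN 16u   /* must be a power of two */

typedef struct {            /* hypothetical fixed-size log record */
    uint16_t event_id;
    uint32_t value;
} log_record_t;

static log_record_t log_queue[LOG_QUEUE_LEN];
static atomic_uint log_head;   /* written only by the producer */
static atomic_uint log_tail;   /* written only by the idle task */
static unsigned log_dropped;   /* records lost because the queue was full */

/* Producer side: cheap and bounded, with no formatting and no I/O. */
bool log_post(uint16_t event_id, uint32_t value)
{
    unsigned head = atomic_load_explicit(&log_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&log_tail, memory_order_acquire);
    if (head - tail == LOG_QUEUE_LEN) {   /* full: drop rather than block */
        log_dropped++;
        return false;
    }
    log_queue[head % LOG_QUEUE_LEN] = (log_record_t){ event_id, value };
    /* Release: the record is visible before the new head is. */
    atomic_store_explicit(&log_head, head + 1u, memory_order_release);
    return true;
}

/* Idle task side: copies up to 'max' pending records into 'out' so the
   caller can format and transmit them at leisure. Returns the count. */
unsigned log_drain(log_record_t *out, unsigned max)
{
    unsigned tail = atomic_load_explicit(&log_tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&log_head, memory_order_acquire);
    unsigned n = 0;
    while (tail != head && n < max) {
        out[n++] = log_queue[tail % LOG_QUEUE_LEN];
        tail++;
    }
    atomic_store_explicit(&log_tail, tail, memory_order_release);
    return n;
}
```

Dropping and counting records when the queue is full keeps the time-critical side from ever blocking, and the drop count is itself a useful statistic of the kind item 6 mentions.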

  6. Jerry Kaidor says:

    Back in the early ’80s, I worked on a system that was extremely robust. Lives and property were at stake. The whole thing, hardware and software, was architected by one gifted and conscientious engineer (no, not me). We did not use an RTOS. Instead, we had a single main loop that called all the high-level stuff, a periodic interrupt every few milliseconds that did the more real-time stuff, and individual interrupts for the hardware that needed them.

    At the top of the main loop, we checked:
    * The stack position. Because there was no RTOS, this was ALWAYS THE SAME NUMBER. If it was different, log & RESET!
    * The stack contents. We filled the top N locations of stack with a magic number. If that magic number got changed, then the stack might be blown. Log & RESET!
    * The “busyness” of the system. The loop was timed to 50 ms. When we got to the bottom of the stuff that did anything, we would wait out the remainder of the 50 ms and hit a pin that could be scoped.

    …And a bunch of other stuff that I forget. We were constantly doing integrity checks of the hardware & software. It got to the point that the system would fix itself so fast that we were having trouble finding bugs. So we put in a jumper to disable all the safeties so we could troubleshoot.
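The magic-number stack fill described above, and in item 1 of bandit’s list, can be turned into a high-water-mark check that an idle task runs over every task’s stack. The following is a sketch that assumes a descending stack (one that grows toward lower addresses, as on ARM and many other targets), so `stack[0]` is the deepest word, and a hypothetical fill value. The stack array would be the one you hand to your RTOS when creating the task. It is a heuristic: a task that writes the fill value itself, or leaves part of a large local array untouched, can make the estimate low.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu   /* hypothetical "magic" fill value */

/* Fill the whole stack before the task is started. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) stack[i] = STACK_FILL;
}

/* Counts the words at the deep end that still hold the fill value (never
   touched) and returns the peak usage in percent (0..100). */
unsigned stack_usage_percent(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return (unsigned)(((words - untouched) * 100u) / words);
}
```

The idle task would call `stack_usage_percent()` for each task and raise an alert at, say, the 90% that bandit suggests. Jerry’s stricter variant simply treats any change in the deepest N guard words as fatal. FreeRTOS users get a built-in equivalent in `uxTaskGetStackHighWaterMark()`.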

    The system was specced to run at 12 MHz, but we ran it at 9 MHz because that gave us extra-robust timing. The hardware engineer had a CB radio with a rubber duckie antenna. He’d stick that rubber duckie into the chassis, transmit at 5 watts, and poke around the traces with a scope looking for 27 MHz garbage. The backplane had a full ground plane on one side, the signal traces on the other, and grounded guard traces between all the signal traces. He also had a “noise generator” consisting of a relay that reset itself, plus IIRC a cap.

    Since we had no RTOS, all the code in that box was written by us, and could be verified. There was no magic.

    Everything I have worked on since has been crap in comparison.
