embedded software boot camp

RTOS without blocking?

Monday, April 19th, 2010 by Miro Samek

In my previous post, “I hate RTOSes”, I identified blocking as the main cause of the brittleness and inflexibility of RTOS-based programs. Here I’d like to discuss techniques for minimizing blocking and eradicating it completely from the application-level code. In other words, I’d like to show you how to use an RTOS to build responsive event-driven software.

For the reasons I’ve outlined before, experienced RTOS users have learned to be wary of peppering the code with blocking calls to the RTOS. So, even though every RTOS boasts a plethora of communication and synchronization mechanisms (all of them based on blocking), advanced real-time developers intentionally limit their designs to just one generic blocking call per task, as shown in the following pseudocode:

void task_routine(void *arg) {
    while (1) {
        // block on any event designated for this task (generic)
        // process the event *without* further blocking (task specific)
    }
}

Most RTOSes provide mechanisms to wait for multiple events in a single blocking call, for example: event flags, message mailboxes, message queues, the select() call, condition variables, and many others. From all these possibilities, I’d like to single out the message queue, because it is the most generic and flexible mechanism. A message posted to a message queue not only unblocks any task that waits on the queue (synchronization), but the message can also carry any information associated with the event (interprocess communication). For example, a message from an analog-to-digital converter (ADC) can signal that the conversion has completed and also deliver the actual conversion result.

The generic pseudocode of a task based on a message queue looks as follows:

void task_routine(void *arg) {
    while (1) { // main event loop of the task
        void *event = msg_queue_get(); // wait for event
        // process the event *without* further blocking (task specific)
    }
}

The most important premise of this event-loop design is that the task-specific code that processes the events obtained from the queue is not allowed to block. The event-processing code must execute quickly and return to the event loop, so that the loop can check for other events.

This design also automatically guarantees that each event is processed in run-to-completion (RTC) fashion. By design, the event loop must necessarily complete processing of the current event before looping back to obtain and process the next event. Also note that the need for queuing events is an immediate consequence of the RTC processing style. Queuing prevents losing events that arrive while the event-loop is executing an RTC step.

The event-loop pseudocode shown above is still task-specific, but it is quite easy to make it completely generic. As shown below, you can combine a message queue and an event-handler pointer-to-function in the TCB structure. A pointer to the TCB struct can then be passed to the task in the argument of the task routine (arg), which is easily arranged when the task is created.

typedef struct {
    MessageQueue queue;        // event queue associated with the task
    void (*handler)(void *event); // event handler pointer-to-function
} TCB;   // task control block

void task_routine(void *arg) {
    while (1) { // main event loop of the task
        void *event = msg_queue_get(&((TCB *)arg)->queue); // wait for event
        (*((TCB *)arg)->handler)(event); // handle the event without blocking
    }
}

The last snippet of code is generic, meaning that this simple event loop can be used for all tasks in your application. So at this point, you can consider the task_routine() function part of a generic event-driven infrastructure for executing your applications, which consist of event-handler functions.

What this way of thinking gives you is quite significant: in fact, you have just created your first event-driven framework.

The distinction between a framework and a toolkit is simple. A toolkit, such as an RTOS, is essentially a collection of functions that you can call. When you use a toolkit, you write the main body of the application (such as all the task routines) and you call the various functions from the RTOS. When you use a framework, you reuse the main body (such as the task_routine() function) and you provide the code that the framework calls. In other words, a framework uses inverted control compared to a traditional RTOS.

Inversion of control is a very common phenomenon in all event-driven architectures, because it recognizes the plain fact that the events control the application, not the other way around.

In my next post in the “I hate RTOSes” series, I’ll talk about challenges of programming without blocking. I’ll explain what you need to sacrifice when you write non-blocking code and why this often leads to “spaghetti” code. Stay tuned!

4 Responses to “RTOS without blocking?”

  1. Yoquan says:

    Hi Miro,

    I’ve been working for nearly 2 years at a small company doing outsourced embedded (mobile) software. I did my Master’s degree in quantum optics, and I’m still in love with QM. That’s why I was so excited when I found out about you and your book. I also have a deep belief in the meaning of the “state” concept, even in this old-fashioned industry.

    I’m planning to learn your ideas as best I can and try to apply them to building a simple hypervisor. I’m not sure whether it is possible. But it’s a long road ahead in my career, so I have a lot of time to try it 🙂

  2. Joe says:

    I’m looking forward to the next post in the series, hope you haven’t forgotten about it.

    The generic task_routine you present seems very elegant, but I’m curious to know how you handle the case when the event handler needs to query another task, i.e. if it cannot continue processing until it gets a reply to a message.
    It could return to the main event loop to wait for the reply, but then that would require saving state somewhere and adding another event type for the event handler to handle, which could easily get very messy.

    Do you have any tips for this kind of situation?

    • Miro Samek says:

      Thanks for the nudge. It’s about time for the next installment, in which I promised to talk about the challenges of programming without blocking. I plan to explain what you need to sacrifice when you write non-blocking code and why this often leads to “spaghetti” code.

      Thanks also for the question about synchronous request-reply communication. In traditional sequential programming this is the most basic communication style, corresponding to a function call (or perhaps a remote procedure call, RPC). The caller simply blocks until the function returns. Seems clean, but it really sweeps tons of issues under the rug. The question, of course, is what happens between the call and the return.

      The event-driven programming paradigm without blocking exposes the in-between. (So yes, the originator of the request indeed returns to the main loop, as *always*.) Without blocking, you explicitly have to do something between an asynchronous request and the asynchronous reply. (I assume that the recipient of the asynchronous request knows that the reply is expected.) So, in real life, the producer of the asynchronous request transitions to a special “waiting-for-reply” state. The behavior in this state could be to ignore all events except the expected reply, but often you will find out that some events in-between should be actually handled. And this is exactly the beauty of event-driven programming. Because you don’t block, you actually *can* handle some events in-between.

      In contrast, when you program sequentially, you lose the whole blocked task for the duration of the RPC call, and you end up inventing another task that is alive and can handle critical events that arrive in-between. This is how the real mess begins, because the second task likely needs to access the same data that the first task is operating on, so the whole issue of mutual exclusion comes into play. Mutual exclusion causes more blocking, and so the vicious cycle begins.

      • Peppe says:

        Hi Miro,

        I am a proud owner of your wonderful book, always planning to convince my superiors to try implementing our next project using QP.
        In the meantime, falling back to “experienced RTOS users”, where do they manage the state in such an architecture? In task_routine, right in the event handler or where else?

        Best regards,

        Peppe
