<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for State Space</title>
	<atom:link href="http://embeddedgurus.com/state-space/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://embeddedgurus.com/state-space</link>
	<description>A Blog by Miro Samek</description>
	<lastBuildDate>Wed, 30 Nov 2011 07:55:32 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>Comment on On the Origin of Software by Means of Artificial Selection by Anders Eriksson</title>
		<link>http://embeddedgurus.com/state-space/2011/08/on-the-origin-of-software-by-means-of-artificial-selection/comment-page-1/#comment-15097</link>
		<dc:creator>Anders Eriksson</dc:creator>
		<pubDate>Wed, 30 Nov 2011 07:55:32 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=102#comment-15097</guid>
		<description>Hi Miro!

Have you also tested the Unity framework
in VC++ and some other embedded compiler?

I have problems with the macros in unity_fixture.h

The problem is the macro construction that generates a function declaration followed by a call to it (see the code snippet below). Strangely, it compiles in GCC.

TEST_GROUP_RUNNER(LedDriver)
{
// RUN_TEST_CASE(LedDriver, LedsOffAfterCreate); 
void TEST_LedDriver_LedsOffAfterCreate_run();
 TEST_LedDriver_LedsOffAfterCreate_run();

// RUN_TEST_CASE(LedDriver, TurnOnLedOne); 
void TEST_LedDriver_TurnOnLedOne_run();  // this is row 30 that the compiler reports an error for
TEST_LedDriver_TurnOnLedOne_run();
}

----- compile error in VC++ 2008 Express -----
code/unity/LedDriver/LedDriverTestRunner.c(30) : error C2143: syntax error : missing &#039;;&#039; before &#039;type&#039;
----------------------------------------------</description>
		<content:encoded><![CDATA[<p>Hi Miro!</p>
<p>Have you also tested the Unity framework<br />
in VC++ and some other embedded compiler?</p>
<p>I have problems with the macros in unity_fixture.h</p>
<p>The problem is the macro construction that generates a function declaration followed by a call to it (see the code snippet below). Strangely, it compiles in GCC.</p>
<p>TEST_GROUP_RUNNER(LedDriver)<br />
{<br />
// RUN_TEST_CASE(LedDriver, LedsOffAfterCreate);<br />
void TEST_LedDriver_LedsOffAfterCreate_run();<br />
 TEST_LedDriver_LedsOffAfterCreate_run();</p>
<p>// RUN_TEST_CASE(LedDriver, TurnOnLedOne);<br />
void TEST_LedDriver_TurnOnLedOne_run();  // this is row 30 that the compiler reports an error for<br />
TEST_LedDriver_TurnOnLedOne_run();<br />
}</p>
<p>----- compile error in VC++ 2008 Express -----<br />
code/unity/LedDriver/LedDriverTestRunner.c(30) : error C2143: syntax error : missing ';' before 'type'<br />
----------------------------------------------</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Protothreads versus State Machines by Uri</title>
		<link>http://embeddedgurus.com/state-space/2011/06/protothreads-versus-state-machines/comment-page-1/#comment-13098</link>
		<dc:creator>Uri</dc:creator>
		<pubDate>Tue, 18 Oct 2011 18:09:37 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=92#comment-13098</guid>
		<description>Hi Miro,

I want an automatic tool that could convert nested code of this kind:

(C / C++)

if (q1 &amp;&amp; q2 &#124;&#124; ....)
{
	do_1();
	if (q3 ...)
	{
		do_3();
		if (q4 ...)
		{
			do_4();
			...
		}
		else
		{
			do_e4();
			...
		}
	}
	else
	{
		do_e3();
		...
	}
}	

into linear code of this type:

if (q1 &amp;&amp; q2 &#124;&#124; ....)
	do_1();

if ((q1 &amp;&amp; q2 &#124;&#124; ....) &amp;&amp; q3)
	do_3();	

if ((q1 &amp;&amp; q2 &#124;&#124; ....) &amp;&amp; q3 &amp;&amp; q4)
		do_4();
	
if ((q1 &amp;&amp; q2 &#124;&#124; ....) &amp;&amp; q3 &amp;&amp; !q4)
		do_e4();
	
if ((q1 &amp;&amp; q2 &#124;&#124; ....) &amp;&amp; !q3)
	do_e3();	
	
Do you know of such a tool?

Thank you

Uri.</description>
		<content:encoded><![CDATA[<p>Hi Miro,</p>
<p>I want an automatic tool that could convert nested code of this kind:</p>
<p>(C / C++)</p>
<p>if (q1 &amp;&amp; q2 || &#8230;.)<br />
{<br />
	do_1();<br />
	if (q3 &#8230;)<br />
	{<br />
		do_3();<br />
		if (q4 &#8230;)<br />
		{<br />
			do_4();<br />
			&#8230;<br />
		}<br />
		else<br />
		{<br />
			do_e4();<br />
			&#8230;<br />
		}<br />
	}<br />
	else<br />
	{<br />
		do_e3();<br />
		&#8230;<br />
	}<br />
}	</p>
<p>into linear code of this type:</p>
<p>if (q1 &amp;&amp; q2 || &#8230;.)<br />
	do_1();</p>
<p>if ((q1 &amp;&amp; q2 || &#8230;.) &amp;&amp; q3)<br />
	do_3();	</p>
<p>if ((q1 &amp;&amp; q2 || &#8230;.) &amp;&amp; q3 &amp;&amp; q4)<br />
		do_4();</p>
<p>if ((q1 &amp;&amp; q2 || &#8230;.) &amp;&amp; q3 &amp;&amp; !q4)<br />
		do_e4();</p>
<p>if ((q1 &amp;&amp; q2 || &#8230;.) &amp;&amp; !q3)<br />
	do_e3();	</p>
<p>Do you know of such a tool?</p>
<p>Thank you</p>
<p>Uri.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by 42Bastian</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12609</link>
		<dc:creator>42Bastian</dc:creator>
		<pubDate>Tue, 04 Oct 2011 10:53:52 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12609</guid>
		<description>Miro,

just a side-note: locking interrupts on ColdFire is very costly. The lock/unlock pair needs up to 15 cycles (including preserving the old state).

Cheers,
42Bastian</description>
		<content:encoded><![CDATA[<p>Miro,</p>
<p>just a side-note: locking interrupts on ColdFire is very costly. The lock/unlock pair needs up to 15 cycles (including preserving the old state).</p>
<p>Cheers,<br />
42Bastian</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Paul Kimelman</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12596</link>
		<dc:creator>Paul Kimelman</dc:creator>
		<pubDate>Tue, 04 Oct 2011 00:21:16 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12596</guid>
		<description>I am very well aware of a run to completion kernel model.
You keep missing the point that yes PendSV is sitting on the current_task&#039;s frame (registers), but if you return from PendSV, it is ***gone***.
All of this is about creating a new task ***stacked*** over an existing task (due to priority) . Like I said, we can look at 3 scenarios and maybe it will be clearer:
1. Start a task and when it finishes, start another. This could be chained by the end of the task (so it never goes to an exception) or can be done by PendSV or SVCall. In either of those cases, it just returns into the new task.
2. Task is running, but interrupt comes in and readies higher pri task. This is the case we are talking about. You need to create a new frame so you can return via it (so original frame is left in place to preserve running task). I am not sure why you are not getting this. This is what I showed before with representation of stack state. If you just returned, you would go back to the task that was running. If you modified its PC on frame, then you would lose running task&#039;s registers!
3. Higher pri task is running stacked over another (due to pri) and it finishes. The code needs to determine if another high pri task should run or if it should return to original context.

My point is that you can use a variety of approaches to handle all 3 cases cleanly.   The two obvious choices are:
- You have a &quot;launcher&quot; running at task level. 
- You do via handlers as we have been discussing.

A launcher looks like:
void Launcher(void) {
  while (top_ready.function)  // a task is ready to run, so run it
    top_ready.function(); // run to completion. When done see if more to run
  return; // pop back to SVC instruction (LR points to SVC when PendSV created frame)
}

This handles back to back, nested back to back, and so on. When nested stack finishes and no other high pri task, it returns to allow &quot;frame&quot; to be popped by task underneath. You *could* do this by manually popping yourself from return link code instead of SVC. 
If nothing to run, then you can use SLEEP_ON_EXIT. If you do not want to do that, you can have launcher never leave if lowest:
void Launcher(void) {
  while (top_ready.function  // a task is ready to run, so run it
             &#124;&#124; nothing_stacked) {
    if (top_ready.function)
      top_ready.function(); // run to completion.
    else // no nested tasks, so run idle task
      idle_task(); // sleeps or whatever you do
  }
  return; // pop back to SVC instruction (LR points to SVC when PendSV created frame)
}</description>
		<content:encoded><![CDATA[<p>I am very well aware of a run to completion kernel model.<br />
You keep missing the point that yes PendSV is sitting on the current_task&#8217;s frame (registers), but if you return from PendSV, it is ***gone***.<br />
All of this is about creating a new task ***stacked*** over an existing task (due to priority) . Like I said, we can look at 3 scenarios and maybe it will be clearer:<br />
1. Start a task and when it finishes, start another. This could be chained by the end of the task (so it never goes to an exception) or can be done by PendSV or SVCall. In either of those cases, it just returns into the new task.<br />
2. Task is running, but interrupt comes in and readies higher pri task. This is the case we are talking about. You need to create a new frame so you can return via it (so original frame is left in place to preserve running task). I am not sure why you are not getting this. This is what I showed before with representation of stack state. If you just returned, you would go back to the task that was running. If you modified its PC on frame, then you would lose running task&#8217;s registers!<br />
3. Higher pri task is running stacked over another (due to pri) and it finishes. The code needs to determine if another high pri task should run or if it should return to original context.</p>
<p>My point is that you can use a variety of approaches to handle all 3 cases cleanly.   The two obvious choices are:<br />
- You have a &#8220;launcher&#8221; running at task level.<br />
- You do via handlers as we have been discussing.</p>
<p>A launcher looks like:<br />
void Launcher(void) {<br />
  while (top_ready.function)  // a task is ready to run, so run it<br />
    top_ready.function(); // run to completion. When done see if more to run<br />
  return; // pop back to SVC instruction (LR points to SVC when PendSV created frame)<br />
}</p>
<p>This handles back to back, nested back to back, and so on. When nested stack finishes and no other high pri task, it returns to allow &#8220;frame&#8221; to be popped by task underneath. You *could* do this by manually popping yourself from return link code instead of SVC.<br />
If nothing to run, then you can use SLEEP_ON_EXIT. If you do not want to do that, you can have launcher never leave if lowest:<br />
void Launcher(void) {<br />
  while (top_ready.function  // a task is ready to run, so run it<br />
             || nothing_stacked) {<br />
    if (top_ready.function)<br />
      top_ready.function(); // run to completion.<br />
    else // no nested tasks, so run idle task<br />
      idle_task(); // sleeps or whatever you do<br />
  }<br />
  return; // pop back to SVC instruction (LR points to SVC when PendSV created frame)<br />
}</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Miro Samek</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12592</link>
		<dc:creator>Miro Samek</dc:creator>
		<pubDate>Mon, 03 Oct 2011 21:49:43 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12592</guid>
		<description>Thank you for the reply. But, now I&#039;m not sure whether we are on the same page regarding the structure of a run-to-completion kernel. I can&#039;t describe this kernel any better than I already did in the aforementioned article in Embedded Systems Design &quot;Build a Super Simple Tasker&quot; (http://www.eetimes.com/design/embedded/4025691/Build-a-Super-Simple-Tasker). But if you have no time to take a look, here is the slightly simplified code of the scheduler:

&lt;code&gt;
void QK_schedule_(void) {       /* entered with interrupts locked!!! */
....uint8_t pin = QK_currPrio_;      /* the initial QK-nano priority */
....uint8_t p;                      /* highest-priority ready to run */

....while ((p = log2Lkup[QF_readySet_]) &gt; pin) { /* above threshold? */
........QActive *a;
........QEvent const *e;          /* event to dispatch (declaration added) */
........QK_currPrio_ = p;             /* update the current priority */
........QF_INT_UNLOCK();      /* it&#039;s safe to leave critical section */
            
........a = (QActive *)QF_active[p].act; /* map prio. to active obj. */
........e = QEQueue_get(&amp;a-&gt;queue);       /* obtain the event */
........QHsm_dispatch((QHsm *)a, e);    /* dispatch to state machine */

........QF_INT_LOCK();      /* lock interrupts for next loop or exit */
....}
....QK_currPrio_ = pin;              /* restore the initial priority */
}                     /* scheduler entered with interrupts locked!!! */
&lt;/code&gt;
My main point is that launching a task is just a simple function call and *no* additional registers need to be saved above and beyond what the C compiler already does. Quite specifically, I really don&#039;t need to save the registers clobbered by the APCS (r0-r3,r12,lr) to launch a task. The only time I care for these registers is when an exception preempts a task, but Cortex-M does this automatically for me. So, by the time I&#039;m inside PendSV, I already sit on top of the exception stack frame that preserves the APCS-clobbered registers for the preempted task. Any additional saving of these APCS-clobbered registers would be saving them *twice*, which is harmless, but incurs cost both in CPU time and stack space. I hope you agree that it would be nice to avoid this unnecessary overhead. 

So, do other processors handle this better than Cortex-M? I think so. Looking only at CPUs that can prioritize interrupts, Coldfire or M16C require just one assembly instruction to drop the IPL (interrupt priority level) to the task. Here is an example ISR for M16C:

&lt;code&gt;
#pragma INTERRUPT ta0_isr (vect = 21)     /* system clock tick ISR */ 
void ta0_isr(void) { 
....++QK_intNest_;              /* inform QK about entering in ISR */
...._asm(&quot;FSET I&quot;);                       /* unlock the interrupts */

....QF_tick();                                   /* ISR processing */ 
..../* perform other ISR work . . . */
 
...._asm(&quot;LDC #0,FLG&quot;);        /* lock interrupts and set IPL to 0 */ 
....--QK_intNest_; 
....if (QK_intNest_ == (uint8_t)0) {     /* last nested interrupt? */ 
........QK_schedule_();                  /* handle the preemptions */ 
....}
}
&lt;/code&gt;
I&#039;d like to achieve similar performance on Cortex-M. Is it possible?</description>
		<content:encoded><![CDATA[<p>Thank you for the reply. But, now I&#8217;m not sure whether we are on the same page regarding the structure of a run-to-completion kernel. I can&#8217;t describe this kernel any better than I already did in the aforementioned article in Embedded Systems Design &#8220;Build a Super Simple Tasker&#8221; (<a href="http://www.eetimes.com/design/embedded/4025691/Build-a-Super-Simple-Tasker" rel="nofollow">http://www.eetimes.com/design/embedded/4025691/Build-a-Super-Simple-Tasker</a>). But if you have no time to take a look, here is the slightly simplified code of the scheduler:</p>
<p><code><br />
void QK_schedule_(void) {       /* entered with interrupts locked!!! */<br />
....uint8_t pin = QK_currPrio_;      /* the initial QK-nano priority */<br />
....uint8_t p;                      /* highest-priority ready to run */</p>
<p>....while ((p = log2Lkup[QF_readySet_]) &gt; pin) { /* above threshold? */<br />
........QActive *a;<br />
........QEvent const *e;          /* event to dispatch (declaration added) */<br />
........QK_currPrio_ = p;             /* update the current priority */<br />
........QF_INT_UNLOCK();      /* it's safe to leave critical section */</p>
<p>........a = (QActive *)QF_active[p].act; /* map prio. to active obj. */<br />
........e = QEQueue_get(&amp;a-&gt;queue);       /* obtain the event */<br />
........QHsm_dispatch((QHsm *)a, e);    /* dispatch to state machine */</p>
<p>........QF_INT_LOCK();      /* lock interrupts for next loop or exit */<br />
....}<br />
....QK_currPrio_ = pin;              /* restore the initial priority */<br />
}                     /* scheduler entered with interrupts locked!!! */<br />
</code><br />
My main point is that launching a task is just a simple function call and *no* additional registers need to be saved above and beyond what the C compiler already does. Quite specifically, I really don&#8217;t need to save the registers clobbered by the APCS (r0-r3,r12,lr) to launch a task. The only time I care for these registers is when an exception preempts a task, but Cortex-M does this automatically for me. So, by the time I&#8217;m inside PendSV, I already sit on top of the exception stack frame that preserves the APCS-clobbered registers for the preempted task. Any additional saving of these APCS-clobbered registers would be saving them *twice*, which is harmless, but incurs cost both in CPU time and stack space. I hope you agree that it would be nice to avoid this unnecessary overhead. </p>
<p>So, do other processors handle this better than Cortex-M? I think so. Looking only at CPUs that can prioritize interrupts, Coldfire or M16C require just one assembly instruction to drop the IPL (interrupt priority level) to the task. Here is an example ISR for M16C:</p>
<p><code><br />
#pragma INTERRUPT ta0_isr (vect = 21)     /* system clock tick ISR */<br />
void ta0_isr(void) {<br />
....++QK_intNest_;              /* inform QK about entering in ISR */<br />
...._asm("FSET I");                       /* unlock the interrupts */</p>
<p>....QF_tick();                                   /* ISR processing */<br />
..../* perform other ISR work . . . */</p>
<p>...._asm("LDC #0,FLG");        /* lock interrupts and set IPL to 0 */<br />
....--QK_intNest_;<br />
....if (QK_intNest_ == (uint8_t)0) {     /* last nested interrupt? */<br />
........QK_schedule_();                  /* handle the preemptions */<br />
....}<br />
}<br />
</code><br />
I&#8217;d like to achieve similar performance on Cortex-M. Is it possible?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Paul Kimelman</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12562</link>
		<dc:creator>Paul Kimelman</dc:creator>
		<pubDate>Mon, 03 Oct 2011 00:56:44 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12562</guid>
		<description>I will start with your last question. I said that the bug is that your code cannot tolerate the PendSV running twice. This is because you really do half of the PendSV work in the task and your PendSV code does not know or check if the latter part has finished. This can be fixed in one of two ways: either you mark the state so you know that the PendSV has run (and created a new stack frame) or you do what you do and clear PendSV once interrupts are disabled. Your code crashed because it made two fake frames (from PendSV running twice) and when the 1st finished, the return to &quot;original&quot; task went into &quot;scheduler&quot; instead which blew up.

Now, I want to be clear that I do not agree with your comment about other processors. If you need to create a new task stacked over an old task, you have to have saved enough registers of the original task to ensure it can return safely. This means that you manually did what the Cortex-M core is doing for you - either the ISRs do it in their prologue or you do it in code or they go into some sort of shadow regs. If you want nested interrupts, all of those end up having to deal with that too. But no matter which method is used, if you are creating a new task, enough regs of the current task have to be preserved - you cannot escape that. For many processors you pay for pop and then repush to do this. For Cortex-M, you get the savings on average and certainly for this case (old task&#039;s regs are on the frame for you, which is why you create a new one).
Creating a new frame for the new task is not really a waste - you always had to set the new PC and a return link of some sort. The fact that you do not pass any args means it is just stack math of 32 bytes but not pushing. The popping does happen and could be considered a waste, but this still averages out as a win over all; yes you can cheat this (ACT bit), but do you really need to? It is like people who refuse to use an RTOS because they can save a few cycles here and there. The point is that the new frame is really about preserving the old frame for the current task - invert it and think of it as saving the context of the old task and it seems OK.
So, you can save the SVC trick if your tasks are all supervisor and you really want to. Just pop the context back from the save frame vs. using the machinery of return link. Again, you can cheat this (by setting ACT bit and returning to do the pop), but does it really buy you enough?
If you really need more of the methods of other processors, there is another way. Your PendSV does not change the frame at all. Instead it puts the original frame PC (current task) in a variable (e.g. current_task_PC) and replaces with a &quot;new_task_create&quot; function which is like your schedule function. It immediately pushes R0-R3, R12, LR, current_task_PC, and xPSR - basically just what the hardware does on exception. Then, it calls the new task. When it returns, the new_task_create function 1st checks if another higher pri task to run above the stacked one and if not, just pops those regs and so returns into the old task. That is how you did it on other processors. Not sure you save much in time since the extra instructions are used. But, if this would make you happier you can do it.
My point about using the scheme I described is that it is cheap and easy and supports user tasks if you want them. The overhead is there (an extra push set (SVC) and partly an extra pop (from PendSV return to new task) for a new task), but it is relatively small, and there is no extra cost if another higher pri task is waiting. Also, there is no cost if a task runs to completion and a new task is started (since the SVC frame is the new task&#039;s frame). That is, the only overhead is when a task pre-empts another.
OK?</description>
		<content:encoded><![CDATA[<p>I will start with your last question. I said that the bug is that your code cannot tolerate the PendSV running twice. This is because you really do half of the PendSV work in the task and your PendSV code does not know or check if the latter part has finished. This can be fixed in one of two ways: either you mark the state so you know that the PendSV has run (and created a new stack frame) or you do what you do and clear PendSV once interrupts are disabled. Your code crashed because it made two fake frames (from PendSV running twice) and when the 1st finished, the return to &#8220;original&#8221; task went into &#8220;scheduler&#8221; instead which blew up.</p>
<p>Now, I want to be clear that I do not agree with your comment about other processors. If you need to create a new task stacked over an old task, you have to have saved enough registers of the original task to ensure it can return safely. This means that you manually did what the Cortex-M core is doing for you &#8211; either the ISRs do it in their prologue or you do it in code or they go into some sort of shadow regs. If you want nested interrupts, all of those end up having to deal with that too. But no matter which method is used, if you are creating a new task, enough regs of the current task have to be preserved &#8211; you cannot escape that. For many processors you pay for pop and then repush to do this. For Cortex-M, you get the savings on average and certainly for this case (old task&#8217;s regs are on the frame for you, which is why you create a new one).<br />
Creating a new frame for the new task is not really a waste &#8211; you always had to set the new PC and a return link of some sort. The fact that you do not pass any args means it is just stack math of 32 bytes but not pushing. The popping does happen and could be considered a waste, but this still averages out as a win over all; yes you can cheat this (ACT bit), but do you really need to? It is like people who refuse to use an RTOS because they can save a few cycles here and there. The point is that the new frame is really about preserving the old frame for the current task &#8211; invert it and think of it as saving the context of the old task and it seems OK.<br />
So, you can save the SVC trick if your tasks are all supervisor and you really want to. Just pop the context back from the save frame vs. using the machinery of return link. Again, you can cheat this (by setting ACT bit and returning to do the pop), but does it really buy you enough?<br />
If you really need more of the methods of other processors, there is another way. Your PendSV does not change the frame at all. Instead it puts the original frame PC (current task) in a variable (e.g. current_task_PC) and replaces with a &#8220;new_task_create&#8221; function which is like your schedule function. It immediately pushes R0-R3, R12, LR, current_task_PC, and xPSR &#8211; basically just what the hardware does on exception. Then, it calls the new task. When it returns, the new_task_create function 1st checks if another higher pri task to run above the stacked one and if not, just pops those regs and so returns into the old task. That is how you did it on other processors. Not sure you save much in time since the extra instructions are used. But, if this would make you happier you can do it.<br />
My point about using the scheme I described is that it is cheap and easy and supports user tasks if you want them. The overhead is there (an extra push set (SVC) and partly an extra pop (from PendSV return to new task) for a new task), but it is relatively small, and there is no extra cost if another higher pri task is waiting. Also, there is no cost if a task runs to completion and a new task is started (since the SVC frame is the new task&#8217;s frame). That is, the only overhead is when a task pre-empts another.<br />
OK?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Miro Samek</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12552</link>
		<dc:creator>Miro Samek</dc:creator>
		<pubDate>Sun, 02 Oct 2011 17:50:24 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12552</guid>
		<description>@Paul Kimelman: Thank you for taking a look at my code. It&#039;s not every day that one has an opportunity to pick the brains of the engineer who actually designed the Cortex-M core.

I guess, I now better understand your suggestions for re-designing my current implementation of the RTC kernel for Cortex-M. I also recognize some of your suggestions in the code contributed by @42Bastian.

It seems to me that to implement your recommendations I would need to re-partition the kernel completely for the Cortex-M core, so any portability to other processors would be essentially lost. Luckily, an RTC kernel is by nature so small, that the scheduler could be written entirely in assembly, if need be. In the process of creating a CM-specific RTC kernel I can also make use of bit-banding and other goodies available only in CM.

But before I move on, I&#039;d like to make sure that this is really the best we can do for an RTC kernel on Cortex-M. From the aesthetic point of view I find the RTC kernel design (both mine and any of the proposed ones) rather unpleasing. I mean, the tail-chaining to PendSV is brilliant. But once inside the PendSV, the stack is already perfectly set up to either launch a higher-level task or return to the preempted task. This is so, because an RTC kernel works with the machine&#039;s *natural* stack protocol. So, all one needs to do is to tell the NVIC to drop to the task level (this is what I mean by a generically understood EOI command) and after this, to exception-return to the preempted task. On most processors this takes only a few machine instructions and *no* stack manipulation.

Unfortunately, on Cortex-M this &quot;telling it to the NVIC&quot; takes pushing and popping two exception stack frames (which also wastes 32 bytes of stack) as well as several assembly instructions. I find it aesthetically unpleasing, because all these stack manipulations accomplish exactly nothing. They must accomplish nothing, because the stack *is* already set up correctly before all the pushing and popping of exception stack frames.

I&#039;d greatly appreciate any comments. My ultimate goal is to come up with the simplest possible RTC kernel implementation on Cortex-M. I just don&#039;t want to sweat the little details (like bit-banding) while losing many more clock cycles and stack space on pushing and popping exception stack frames.

Finally, before moving on to re-implementing the RTC kernel for Cortex-M, I&#039;d really like to understand the failure mode illustrated in the trace discussed in my original blog. Please correct me if I misrepresent your diagnosis, but you essentially suggested that the Hard Fault at the end of the trace is due to normal preemption of the PendSV by a higher-level interrupt, which has set the PENDSVSET bit. I tested this scenario several times (with different settings of the kernel&#039;s ready-set and the current priority ceiling) in a debugger. My tests were done as follows. I set a breakpoint on the first instruction of PendSV. As soon as the breakpoint was hit, I removed it and placed it on the very next instruction. I also triggered a specifically instrumented interrupt (by manually writing to the PEND bit in the debugger). I then hit &quot;go&quot; in the debugger and watched the preemptions. The point is that while I could reproduce every instruction in the original trace, I could *not* reproduce the Hard Fault. The code handled the preemption (including setting the PENDSVSET bit) in the PendSV handler itself correctly every time.

Maybe, as you say, the provided hardware trace is insufficient to provide an accurate diagnosis. However, it seems to me that the one thing I couldn&#039;t test in a single-step debugger is the dynamic condition of the late-arrival scenario. So, I&#039;m left to believe that something is different with late-arrival. I speculate further that my use of SVCall is also implicated. Other kernels simply don&#039;t do this. Using a single PendSV exception with global variables for directing the flow of control would most likely mask the problem. Again, I would appreciate any comments or suggestions as to what else can be done to get to the bottom of this.</description>
		<content:encoded><![CDATA[<p>@Paul Kimelman: Thank you for taking a look at my code. It&#8217;s not every day that one has an opportunity to pick the brains of the engineer who actually designed the Cortex-M core.</p>
<p>I guess I now better understand your suggestions for re-designing my current implementation of the RTC kernel for Cortex-M. I also recognize some of your suggestions in the code contributed by @42Bastian.</p>
<p>It seems to me that to implement your recommendations I would need to re-partition the kernel completely for the Cortex-M core, so any portability to other processors would be essentially lost. Luckily, an RTC kernel is by nature so small that the scheduler could be written entirely in assembly, if need be. In the process of creating a CM-specific RTC kernel I can also make use of bit-banding and other goodies available only in CM.</p>
<p>But before I move on, I&#8217;d like to make sure that this is really the best we can do for an RTC kernel on Cortex-M. From the aesthetic point of view I find the RTC kernel design (both mine and any of the proposed ones) rather unpleasing. I mean, the tail-chaining to PendSV is brilliant. But once inside the PendSV, the stack is already perfectly set up to either launch a higher-level task or return to the preempted task. This is so because an RTC kernel works with the machine&#8217;s *natural* stack protocol. So, all one needs to do is to tell the NVIC to drop to the task level (this is what I mean by a generically understood EOI command) and after this, to exception-return to the preempted task. On most processors this takes only a few machine instructions and *no* stack manipulation.</p>
<p>Unfortunately, on Cortex-M this &#8220;telling it to the NVIC&#8221; takes pushing and popping two exception stack frames (which also wastes 32 bytes of stack) as well as several assembly instructions. I find it aesthetically unpleasing, because all these stack manipulations accomplish exactly nothing. They must accomplish nothing, because the stack *is* already set up correctly before all the pushing and popping of exception stack frames.</p>
<p>I&#8217;d greatly appreciate any comments. My ultimate goal is to come up with the simplest possible RTC kernel implementation on Cortex-M. I just don&#8217;t want to sweat the little details (like bit-banding) while losing many more clock cycles and stack space on pushing and popping exception stack frames.</p>
<p>Finally, before moving on to re-implementing the RTC kernel for Cortex-M, I&#8217;d really like to understand the failure mode illustrated in the trace discussed in my original blog post. Please correct me if I misrepresent your diagnosis, but you essentially suggested that the Hard Fault at the end of the trace is due to normal preemption of the PendSV by a higher-level interrupt, which has set the PENDSVSET bit. I tested this scenario several times (with different settings of the kernel&#8217;s ready-set and the current priority ceiling) in a debugger. My tests were done as follows. I set a breakpoint on the first instruction of PendSV. As soon as the breakpoint was hit, I removed it and placed it on the very next instruction. I also triggered a specially instrumented interrupt (by manually writing to the PEND bit in the debugger). I then hit &#8220;go&#8221; in the debugger and watched the preemptions. The point is that while I could reproduce every instruction in the original trace, I could *not* reproduce the Hard Fault. The code handled the preemption (including setting the PENDSVSET bit) in the PendSV handler itself correctly every time.</p>
<p>Maybe, as you say, the provided hardware trace is insufficient to provide an accurate diagnosis. However, it seems to me that the one thing I couldn&#8217;t test in a single-step debugger is the dynamic condition of the late-arrival scenario. So, I&#8217;m left to believe that something is different with late-arrival. I speculate further that my use of SVCall is also implicated. Other kernels simply don&#8217;t do this. Using a single PendSV exception with global variables for directing the flow of control would most likely mask the problem. Again, I would appreciate any comments or suggestions as to what else can be done to get to the bottom of this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Rapid Prototyping with QP and Arduino by ดูดวง</title>
		<link>http://embeddedgurus.com/state-space/2011/02/rapid-prototyping-with-qp-and-arduino/comment-page-1/#comment-12447</link>
		<dc:creator>ดูดวง</dc:creator>
		<pubDate>Fri, 30 Sep 2011 02:57:40 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=62#comment-12447</guid>
		<description>Wonderful blog! I love reading it. I&#039;m a long-time reader and always enjoy it!</description>
		<content:encoded><![CDATA[<p>Wonderful blog! I love reading it. I&#8217;m a long-time reader and always enjoy it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Paul Kimelman</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12441</link>
		<dc:creator>Paul Kimelman</dc:creator>
		<pubDate>Thu, 29 Sep 2011 23:55:09 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12441</guid>
		<description>Yes, I carry around eval boards too, but I try not to admit it in polite company ;-)

By the way, I agree that bit-banding can be used to get around some critical sections. I wrote a small kernel back in 2005 on CM3 (on FPGA - no Si) that used bit-banding for sleep and ready &quot;lists&quot; and used CLZ to find the highest-priority task in one instruction. I had ISRs write to a bit-banded &quot;wake list&quot; and used PendSV after (so no race possible). It did mean each task had a separate priority (no round robin), and if you wanted more than 32 tasks, it used a simple two-level version of this (bit-banding for the directory and then the page), but it was really designed for 32 or fewer.
I used LDREX/STREX to handle non-blocking and non-locking queues to send data between ISRs and Tasks so no critical data issues there either. Purpose was to show that you could build a powerful kernel which had no critical sections at all.</description>
		<content:encoded><![CDATA[<p>Yes, I carry around eval boards too, but I try not to admit it in polite company <img src='http://embeddedgurus.com/state-space/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>By the way, I agree that bit-banding can be used to get around some critical sections. I wrote a small kernel back in 2005 on CM3 (on FPGA &#8211; no Si) that used bit-banding for sleep and ready &#8220;lists&#8221; and used CLZ to find the highest-priority task in one instruction. I had ISRs write to a bit-banded &#8220;wake list&#8221; and used PendSV after (so no race possible). It did mean each task had a separate priority (no round robin), and if you wanted more than 32 tasks, it used a simple two-level version of this (bit-banding for the directory and then the page), but it was really designed for 32 or fewer.<br />
I used LDREX/STREX to handle non-blocking and non-locking queues to send data between ISRs and Tasks so no critical data issues there either. Purpose was to show that you could build a powerful kernel which had no critical sections at all.</p>
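<p>For readers unfamiliar with the CLZ trick mentioned above, here is a minimal host-side sketch, not the 2005 kernel itself. All names (ready_set_t, ready_set_insert, ready_set_remove, ready_set_highest) are hypothetical, and it assumes a 32-bit unsigned int, that a higher bit number means a higher priority, and a GCC/Clang compiler for the __builtin_clz intrinsic (which typically maps to the CLZ instruction on Cortex-M3). The plain read-modify-write used here stands in for the single bit-band store an ISR would use on real hardware.</p>

```c
/* Hypothetical ready set: one bit per task, bit number == task priority.
 * Assumes unsigned int is 32 bits wide, as on Cortex-M3 and most hosts. */
typedef unsigned int ready_set_t;

/* Mark the task with priority prio (0..31) as ready.
 * On CM3 this could be one bit-band store; here it is a plain RMW. */
static void ready_set_insert(ready_set_t *set, unsigned int prio) {
    *set |= (1u << prio);
}

/* Mark the task with priority prio (0..31) as no longer ready. */
static void ready_set_remove(ready_set_t *set, unsigned int prio) {
    *set &= ~(1u << prio);
}

/* Return the highest ready priority, or -1 if no task is ready.
 * __builtin_clz is undefined for 0, hence the explicit check. */
static int ready_set_highest(ready_set_t set) {
    if (set == 0u) {
        return -1;
    }
    return 31 - __builtin_clz(set);
}
```

<p>With tasks 3 and 17 inserted, ready_set_highest picks 17; after removing 17 it falls back to 3. The lookup cost does not depend on how many tasks are ready, which is the point of using CLZ.</p>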
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s the state of your Cortex? by Paul Kimelman</title>
		<link>http://embeddedgurus.com/state-space/2011/09/whats-the-state-of-your-cortex/comment-page-1/#comment-12439</link>
		<dc:creator>Paul Kimelman</dc:creator>
		<pubDate>Thu, 29 Sep 2011 22:33:35 +0000</pubDate>
		<guid isPermaLink="false">http://embeddedgurus.com/state-space/?p=118#comment-12439</guid>
		<description>I should also note that I prefer critical sections to be as small as possible to avoid latency on interrupts. Further, I was suggesting that PendSV could return directly into the new task vs. your schedule/QK_schedule scheme. I understand your point about C - I did everything I could to make Cortex-M (ARMv7-M) as C-friendly as possible. But I am not sure that the few extra instructions to extract the function start and clean up the ready list are that big a deal, and it would keep the critical section as small as possible.
I also continue to suggest use of BASEPRI (and BASEPRI_MAX) vs. CPS.</description>
		<content:encoded><![CDATA[<p>I should also note that I prefer critical sections to be as small as possible to avoid latency on interrupts. Further, I was suggesting that PendSV could return directly into the new task vs. your schedule/QK_schedule scheme. I understand your point about C &#8211; I did everything I could to make Cortex-M (ARMv7-M) as C-friendly as possible. But I am not sure that the few extra instructions to extract the function start and clean up the ready list are that big a deal, and it would keep the critical section as small as possible.<br />
I also continue to suggest use of BASEPRI (and BASEPRI_MAX) vs. CPS.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

