<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Barr Code</title>
	<atom:link href="http://embeddedgurus.com/barr-code/feed/" rel="self" type="application/rss+xml" />
	<link>http://embeddedgurus.com/barr-code</link>
	<description>A Blog by Michael Barr</description>
	<lastBuildDate>Thu, 11 Mar 2010 20:08:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Firmware-Specific Bug #4: Stack Overflow</title>
		<link>http://embeddedgurus.com/barr-code/2010/03/firmware-specific-bug-4-stack-overflow/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/03/firmware-specific-bug-4-stack-overflow/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 19:52:51 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=334</guid>
		<description><![CDATA[Every programmer knows that a stack overflow is a Very Bad Thing™.  The effect of each stack overflow varies, though.  The nature of the damage and the timing of the misbehavior depend entirely on which data or instructions are clobbered and how they are used.  Importantly, the length of time between a [...]]]></description>
			<content:encoded><![CDATA[<p>Every programmer knows that a stack overflow is a Very Bad Thing™.  The effect of each stack overflow varies, though.  The nature of the damage and the timing of the misbehavior depend entirely on which data or instructions are clobbered and how they are used.  Importantly, the length of time between a stack overflow and its negative effects on the system depends on how long it is before the clobbered bits are used.</p>
<p>Unfortunately, stack overflow afflicts embedded systems far more often than it does desktop computers.  This is for several reasons, including: </p>
<ol>
<li>embedded systems usually have to get by on a smaller amount of RAM;</li>
<li>there is typically no virtual memory to fall back on (because there is no disk);</li>
<li>firmware designs based on RTOS tasks use multiple stacks (one per task), each of which must be sized to accommodate that task’s own worst-case stack depth;</li>
<li>and interrupt handlers may try to use those same stacks.</li>
</ol>
<p>Further complicating this issue, no amount of testing can ensure that a particular stack is sufficiently large.  You can test your system under all sorts of loading conditions, but you can only test it for so long.  A stack overflow that occurs only “once in a blue moon” may not be witnessed by tests that run for only “half a blue moon.”  Under certain algorithmic restrictions (such as no recursion), it is possible to demonstrate that a stack overflow will never occur by performing a top-down analysis of the code’s control flow.  But that analysis must be redone every time the code changes.</p>
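<p>As a rough illustration of that top-down analysis, the sketch below assumes an acyclic call graph (no recursion) in which each function’s own stack-frame size is already known, for example from a compiler report such as GCC’s <code>-fstack-usage</code> output. The worst-case depth of any function is its own frame plus the deepest of its callees. The function names and byte counts are hypothetical, and a real analysis must also add the worst case of any interrupt handlers that can run on the same stack.</p>

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical call-graph node: a function, its own stack frame size in
 * bytes, and the functions it may call. The graph must contain no cycles. */
typedef struct func_node {
    const char *name;
    size_t frame_bytes;
    size_t n_callees;
    const struct func_node *callees[MAX_CALLEES];
} func_node_t;

/* Worst-case stack depth starting at 'fn': its own frame plus the deepest
 * of its callees. Terminates only because the graph is acyclic. */
size_t worst_case_depth(const func_node_t *fn)
{
    size_t deepest_callee = 0;

    for (size_t i = 0; i < fn->n_callees; i++) {
        size_t d = worst_case_depth(fn->callees[i]);
        if (d > deepest_callee) {
            deepest_callee = d;
        }
    }
    return fn->frame_bytes + deepest_callee;
}

/* Hypothetical task: task_main calls parse_cmd (which calls crc32) and log_event. */
static const func_node_t crc32_fn     = { "crc32",     32, 0, { NULL } };
static const func_node_t log_event_fn = { "log_event", 80, 0, { NULL } };
static const func_node_t parse_cmd_fn = { "parse_cmd", 64, 1, { &crc32_fn } };
static const func_node_t task_main_fn = { "task_main", 48, 2,
                                          { &parse_cmd_fn, &log_event_fn } };
```

<p>Here <code>worst_case_depth(&amp;task_main_fn)</code> yields 48 + 64 + 32 = 144 bytes via the parse_cmd path. Calls made through function pointers do not appear in such a graph automatically and would have to be added by hand.</p>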
<p><em>Best Practice</em>: On startup, paint an unlikely memory pattern throughout the stack(s).  (I like to use hex <code>23 3D 3D 23</code>, which looks like a fence ‘<code>#==#</code>’ in an ASCII memory dump.)  At runtime, have a supervisor task periodically check that none of the paint above some pre-established high water mark has been changed.  If something is found to be amiss with a stack, log the specific error (e.g., which stack and how high the flood) in non-volatile memory and do something safe for users of the product (e.g., a controlled shutdown or reset) before a true overflow can occur.  This is a nice additional safety feature to add to the watchdog task.</p>
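<p>A minimal sketch of the painting and checking steps follows. It assumes a descending stack (one that grows toward lower addresses, as on many but not all CPUs), so the bytes nearest <code>stack_base</code> are the last to be used and serve as the guard band beyond the high water mark. The function names and the guard size are hypothetical; logging to non-volatile memory and the safe-state action are left to the application.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Fence-like paint pattern: bytes 23 3D 3D 23 read "#==#" in an ASCII dump. */
static const uint8_t k_paint[4] = { 0x23, 0x3D, 0x3D, 0x23 };

/* Fill an entire, not-yet-used stack region with the paint pattern.
 * Call at startup, before the task that owns this stack begins running. */
void stack_paint(uint8_t *stack_base, size_t stack_bytes)
{
    for (size_t i = 0; i < stack_bytes; i++) {
        stack_base[i] = k_paint[i % 4];
    }
}

/* Count the bytes at the bottom of a descending stack whose paint is still
 * intact, i.e., bytes the stack has never grown into. */
size_t stack_unused_bytes(const uint8_t *stack_base, size_t stack_bytes)
{
    size_t n = 0;

    while (n < stack_bytes && stack_base[n] == k_paint[n % 4]) {
        n++;
    }
    return n;
}

/* Return nonzero if the stack has flooded into the guard band, i.e., fewer
 * than guard_bytes of intact paint remain at the bottom of the region. */
int stack_guard_disturbed(const uint8_t *stack_base, size_t stack_bytes,
                          size_t guard_bytes)
{
    return stack_unused_bytes(stack_base, stack_bytes) < guard_bytes;
}
```

<p>A supervisor or watchdog task might call <code>stack_guard_disturbed()</code> for each task stack on every pass, and use <code>stack_unused_bytes()</code> to record how high the flood reached.</p>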
<p><a href="/barr-code/2010/02/firmware-specific-bug-3-missing-volatile-keyword/">Firmware-Specific Bug #3</a></p>
<p>Firmware-Specific Bug #5 (coming soon)</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/03/firmware-specific-bug-4-stack-overflow/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Embedded Gurus &#8211; Site Redesign</title>
		<link>http://embeddedgurus.com/barr-code/2010/03/embedded-gurus-site-redesign/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/03/embedded-gurus-site-redesign/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 21:20:27 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/2010/03/embedded-gurus-site-redesign/</guid>
		<description><![CDATA[I am pleased to announce that the EmbeddedGurus website has been redesigned.  Among the new features of the site are:
1.  A dynamically updating home page, featuring the most recent posts from all of our bloggers.  If you prefer, you may view these posts by category.
2.  A common look and feel to [...]]]></description>
			<content:encoded><![CDATA[<p>I am pleased to announce that the <a href="http://www.embeddedgurus.com">EmbeddedGurus</a> website has been redesigned.  Among the new features of the site are:</p>
<p>1.  A dynamically updating home page, featuring the most recent posts from all of our bloggers.  If you prefer, you may view these posts <a href="/categories">by category</a>.</p>
<p>2.  A common look and feel to all of the individual blogs.</p>
<p>3.  The ability to search individual blogs, as well as to easily browse from one post to the next and via tags and categories.</p>
<p>4.  A sixth guru named <a href="/gurus/gary-stringham">Gary Stringham</a> with a blog called <a href="/embedded-bridge/">Embedded Bridge</a>.</p>
<p>A number of other minor improvements have also been made.</p>
<p>We hope you like the new look and continue to find our blogs about embedded systems design both readable and informative. </p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/03/embedded-gurus-site-redesign/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Challenge of Debugging Cache Coherency Problems</title>
		<link>http://embeddedgurus.com/barr-code/2010/02/the-challenge-of-debugging-cache-coherency-problems/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/02/the-challenge-of-debugging-cache-coherency-problems/#comments</comments>
		<pubDate>Fri, 19 Feb 2010 16:18:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[firmware]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/02/19/the-challenge-of-debugging-cache-coherency-problems/</guid>
		<description><![CDATA[The following is an example of a cache-related embedded software bug that is a real challenge to solve for several reasons, not the least of which is the fact that the actual problem was masked in the debugger&#8217;s view of memory.
One nasty bug that came up recently for us was the realization that we were [...]]]></description>
			<content:encoded><![CDATA[<p>The following is an example of a cache-related embedded software bug that is a real challenge to solve for several reasons, not the least of which is the fact that the actual problem was masked in the debugger&#8217;s view of memory.</p>
<blockquote><p>One nasty bug that came up recently for us was the realization that we were not flushing the instruction cache after leaving the bootloader which had a very confusing effect when running our application. In our design our code pretty much runs out of flash. Our bootloader is in the lowest part of flash and our 2 images sit in their own higher memory ranges of flash. So we never realized we should do this.</p>
<p>Well, we had to copy a small piece of code into RAM for the purpose of allowing firmware upgrades to be written to flash. This piece of code would be executing when the actual erases and writes took place (i.e. we couldn&#8217;t execute from AND write to flash at the same time). This code would get copied out of flash both when the bootloader started execution AND when the image would start execution because they shared the startup code that we inherited from a board development kit (BDK).</p>
<p>Another thing we didn&#8217;t realize was that the RAM code optimized differently for the bootloader image and the application images. The end result is that the instruction cache would in certain cases have a hit and return the wrong instructions for us. For instance, when we tried to perform an upgrade while running from our image, it would erase a completely different area of flash than we intended. To make things somewhat more confusing, it did NOT help to step through the code using the debugger. The debugger was not showing us that the instruction cache was providing different lines of code than the lines of source it was showing.</p>
<p>This was ultimately one of the more frustrating bugs we have chased recently. Imagine the confusion when sometimes a firmware upgrade would work fine and other times it would completely brick your board (they could be salvaged with a JTAG programmer at least).</p>
</blockquote>
<p>Thanks to Richard von Lehe of <a href="http://www.starkey.com">Starkey Labs</a> for sharing this.</p>
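<p>The general lesson is that after executable code is copied into RAM, the instruction cache must not be allowed to serve stale lines for that address range. Below is a minimal sketch, assuming a GCC or Clang toolchain; it uses the compiler builtin <code>__builtin___clear_cache()</code>, which on targets with coherent caches may compile to nothing. On a given microcontroller the vendor’s own cache-maintenance routines (for example, the CMSIS cache functions on Cortex-M7 parts) may be the more direct choice. The function name <code>copy_code_to_ram()</code> is hypothetical.</p>

```c
#include <stddef.h>
#include <string.h>

/* Copy a routine's image from flash to its RAM execution address, then
 * ask the compiler to synchronize the caches for that range so the CPU
 * cannot fetch stale instructions left over from an earlier copy. */
void copy_code_to_ram(void *ram_dest, const void *flash_src, size_t n_bytes)
{
    memcpy(ram_dest, flash_src, n_bytes);
    __builtin___clear_cache((char *)ram_dest, (char *)ram_dest + n_bytes);
}
```

<p>In the scenario above, both the bootloader and the application would perform this step each time their startup code copies the flash-programming routine into RAM.</p>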
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/02/the-challenge-of-debugging-cache-coherency-problems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #3: Missing Volatile Keyword</title>
		<link>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-3-missing-volatile-keyword/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-3-missing-volatile-keyword/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 09:21:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/02/18/firmware-specific-bug-3-missing-volatile-keyword/</guid>
		<description><![CDATA[Failure to tag certain types of variables with C’s &#8216;volatile&#8217; keyword can cause a number of symptoms in a system that works properly only when the compiler’s optimizer is set to a low level or disabled.  The volatile qualifier is used in variable declarations, where its purpose is to prevent optimization of the reads [...]]]></description>
			<content:encoded><![CDATA[<p>Failure to tag certain types of variables with C’s &#8216;volatile&#8217; keyword can cause a number of symptoms in a system that works properly only when the compiler’s optimizer is set to a low level or disabled.  The volatile qualifier is used in variable declarations, where its purpose is to prevent optimization of the reads and writes of that variable.</p>
<p>For example, if you write code that says:</p>
<p><code><br />&nbsp;&nbsp; &nbsp;g_alarm = ALARM_ON; &nbsp; &nbsp;// Patient dying--get nurse!<br />&nbsp;&nbsp; &nbsp;// Other code; with no reads of g_alarm state.<br />&nbsp;&nbsp; &nbsp;g_alarm = ALARM_OFF; &nbsp; // Patient stable.<br /></code></p>
<p>the optimizer will generally try to make your program both faster and smaller by eliminating the first line above&#8211;to the detriment of the patient.  However, if g_alarm is declared volatile, this optimization will not take place.</p>
<p><i>Best Practice</i>: The &#8216;volatile&#8217; keyword should be used to declare any: (a) global variable shared by an ISR and any other code; (b) global variable accessed by two or more RTOS tasks (even when race conditions in those accesses have been prevented); (c) pointer to a memory-mapped peripheral register (or register set);  or (d) delay loop counter.</p>
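<p>Cases (a), (c), and (d) might look like the following sketch. The ISR name and the peripheral address are made up for illustration; a real register address comes from the part’s datasheet, and case (b) is declared the same way as case (a).</p>

```c
#include <stdint.h>

/* (a) Flag written by an ISR and read by main-line code. Without volatile,
 * the optimizer could read g_rx_ready once and then spin forever. */
volatile uint8_t g_rx_ready = 0;

/* Hypothetical UART receive ISR. */
void uart_rx_isr(void)
{
    g_rx_ready = 1;
}

void wait_for_rx(void)
{
    while (!g_rx_ready) {
        /* Each test of the condition re-reads g_rx_ready from memory. */
    }
    g_rx_ready = 0;
}

/* (c) Memory-mapped status register; 0x40001000 is a placeholder address.
 * Every use of UART_STATUS performs an actual read or write of the register. */
#define UART_STATUS (*(volatile uint32_t *)0x40001000u)

/* (d) Software delay loop; volatile keeps the optimizer from deleting it. */
void delay_loop(uint32_t count)
{
    for (volatile uint32_t i = 0; i < count; i++) {
        /* spin */
    }
}
```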
<p>Note that in addition to ensuring that all reads and writes of a given variable actually take place, the use of volatile also constrains the compiler in a way similar to adding “sequence points”: accesses to multiple volatiles must be performed in the order in which they are written in the code.</p>
<p><a href="/barr-code/2010/02/firmware-specific-bug-2-non-reentrant.html">Firmware-Specific Bug #2</a></p>
<p><a href="/barr-code/2010/03/firmware-specific-bug-4-stack-overflow/">Firmware-Specific Bug #4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-3-missing-volatile-keyword/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #2: Non-Reentrant Function</title>
		<link>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-2-non-reentrant-function/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-2-non-reentrant-function/#comments</comments>
		<pubDate>Mon, 15 Feb 2010 11:01:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[safety]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/02/15/firmware-specific-bug-2-non-reentrant-function/</guid>
		<description><![CDATA[Technically, the problem of non-reentrant functions is a special case of the problem of a race condition. &#160;For that reason, the run-time errors caused by a non-reentrant function are similar: they also don’t occur in a reproducible way, making them just as hard to debug.&#160; Unfortunately, a non-reentrant function is also more difficult to spot [...]]]></description>
			<content:encoded><![CDATA[<p>Technically, the problem of non-reentrant functions is a special case of the problem of a <a href="http://www.embeddedgurus.net/barr-code/2010/02/firmware-specific-bug-1-race-condition.html">race condition</a>. &nbsp;For that reason, the run-time errors caused by a non-reentrant function are similar: they also don’t occur in a reproducible way, making them just as hard to debug.&nbsp; Unfortunately, a non-reentrant function is also more difficult to spot in a code review than other types of race conditions.</p>
<p>The figure below shows a typical scenario.&nbsp;&nbsp;Here the software entities subject to preemption are RTOS tasks.&nbsp;&nbsp;But rather than manipulating a shared object directly, they do so by way of function call indirection.&nbsp;&nbsp;For example, suppose that Task A calls a sockets-layer protocol function, which calls a TCP-layer protocol function, which calls an IP-layer protocol function, which calls an Ethernet driver.&nbsp;&nbsp;In order for the system to behave reliably, all of these functions must be reentrant.</p>
<div class="separator" style="clear: both;text-align: center"><a href="http://embeddedgurus.net/barr-code/uploaded_images/tcpip-779901.png"><img border="0" src="http://embeddedgurus.net/barr-code/uploaded_images/tcpip-779898.png" /></a>&nbsp;</div>
<p>But the functions of the driver module manipulate the same global object in the form of the registers of the Ethernet controller chip.&nbsp; If preemption is permitted during these register manipulations, Task B may preempt Task A after the Packet A data has been queued but before transmission has begun.&nbsp; Then Task B calls the sockets-layer function, which calls the TCP-layer function, which calls the IP-layer function, which calls the Ethernet driver, which queues and transmits Packet B.&nbsp; When control of the CPU returns to Task A, it finally requests its transmission.&nbsp; Depending on the design of the Ethernet controller chip, this may either retransmit Packet B or generate an error. &nbsp;Either way, Packet A&#8217;s data is lost and does not go out onto the network.</p>
<p>In order for the functions of this Ethernet driver to be callable from multiple RTOS tasks (near-)simultaneously, those functions must be made reentrant.&nbsp; If each function uses only stack variables, there is nothing to do; each RTOS task has its own private stack. &nbsp;But drivers and some other functions will be non-reentrant unless carefully designed.</p>
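<p>The difference can be seen in two versions of a simple, hypothetical checksum helper. The first keeps its running total in a static local variable, so a task that preempts another in mid-call can corrupt the first task’s result; the second keeps the total on the calling task’s own stack and is reentrant.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* NOT reentrant: 'sum' is a static local, so all callers share one copy. */
uint16_t checksum_shared(const uint8_t *data, size_t len)
{
    static uint32_t sum;

    sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return (uint16_t)sum;
}

/* Reentrant: 'sum' lives on the calling task's own stack. */
uint16_t checksum_reentrant(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return (uint16_t)sum;
}
```

<p>Both versions return the same value when called from one task at a time, which is exactly why the first one can survive testing.</p>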
<p>The key to making functions reentrant is to suspend preemption around all accesses of peripheral registers, global variables (including static local variables), persistent heap objects, and shared memory areas.&nbsp; This can be done either by disabling one or more interrupts or by acquiring and releasing a <a href="http://www.netrino.com/Embedded-Systems/Glossary-M#mutex">mutex</a>; the specifics of the type of shared data usually dictate the best solution.</p>
<p><i>Best Practice</i>: Create and hide a mutex within each library or driver module that is not intrinsically reentrant.&nbsp; Make acquisition of this mutex a pre-condition for the manipulation of any persistent data or shared registers used within the module as a whole.&nbsp; For example, the same mutex may be used to prevent race conditions involving both the Ethernet controller registers and a global (or static local) packet counter.&nbsp; All functions in the module that access this data must follow the protocol to acquire the mutex before manipulating these objects.</p>
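<p>Below is a minimal sketch of that pattern for a hypothetical Ethernet driver. POSIX <code>pthread_mutex_t</code> stands in for whatever mutex type your RTOS provides, and the static variables stand in for the controller’s registers and the module’s packet counter. Because the mutex and the shared state are <code>static</code>, code outside the module cannot bypass the locking protocol.</p>

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hidden module state: only functions in this file can touch it. */
static pthread_mutex_t s_eth_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint32_t s_tx_packet_count = 0;        /* persistent shared data   */
static const uint8_t *s_queued_frame = NULL;  /* stands in for registers  */
static size_t s_queued_len = 0;

/* Queue and transmit one frame. Holding the mutex makes "queue, then
 * transmit" indivisible with respect to other tasks calling this module. */
void eth_send(const uint8_t *frame, size_t len)
{
    pthread_mutex_lock(&s_eth_mutex);
    s_queued_frame = frame;   /* "write" the frame into the controller */
    s_queued_len = len;
    /* ... start transmission of the queued frame here ... */
    s_tx_packet_count++;
    pthread_mutex_unlock(&s_eth_mutex);
}

/* Even a simple read of shared module data follows the same protocol. */
uint32_t eth_tx_count(void)
{
    pthread_mutex_lock(&s_eth_mutex);
    uint32_t n = s_tx_packet_count;
    pthread_mutex_unlock(&s_eth_mutex);
    return n;
}
```

<p>Holding the lock across both the queue and the transmit steps is what prevents the Packet A/Packet B interleaving in the scenario above.</p>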
<p>Beware that non-reentrant functions may come into your code base as part of third party middleware, legacy code, or device drivers. &nbsp;Disturbingly, non-reentrant functions may even be part of the standard C or C++ library provided with your compiler. &nbsp;For example, if you are using the <a href="http://gcc.gnu.org/">GNU compiler</a> to build RTOS-based applications, take note that you should be using the reentrant “<a href="http://sourceware.org/newlib/">newlib</a>” standard C library rather than the default.</p>
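<p>A familiar example from the standard library itself is <code>strtok()</code>, which remembers its position in hidden static state. Where the toolchain provides it, the POSIX function <code>strtok_r()</code> (also offered by newlib) keeps that state in a caller-owned pointer instead, so each task can tokenize its own strings safely. A minimal sketch, with a hypothetical helper name:</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Count the comma-separated fields in 'line' (which is modified in place).
 * The tokenizer's position lives in 'save', a local variable on the calling
 * task's stack, rather than in static storage shared by every task. */
size_t count_fields(char *line)
{
    char *save = NULL;
    size_t n = 0;

    for (char *tok = strtok_r(line, ",", &save);
         tok != NULL;
         tok = strtok_r(NULL, ",", &save)) {
        n++;
    }
    return n;
}
```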
<p><a href="http://www.embeddedgurus.net/barr-code/2010/02/firmware-specific-bug-1-race-condition.html">Firmware-Specific Bug&nbsp;#1</a></p>
<p><a href="http://www.embeddedgurus.net/barr-code/2010/02/firmware-specific-bug-3-missing.html">Firmware-Specific Bug #3</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-2-non-reentrant-function/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #1: Race Condition</title>
		<link>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-1-race-condition/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-1-race-condition/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 17:18:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[safety]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/02/11/firmware-specific-bug-1-race-condition/</guid>
		<description><![CDATA[A race condition is any situation in which the combined outcome of two or more threads of execution (which can be either RTOS tasks or main() plus an ISR) varies depending on the precise order in which the instructions of each are interleaved.
For example, suppose you have two threads of execution in which one regularly [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.netrino.com/Embedded-Systems/Glossary-R#race_condition">race condition</a> is any situation in which the combined outcome of two or more threads of execution (which can be either <a href="http://www.netrino.com/Embedded-Systems/Glossary-R#real_time_operating_system">RTOS</a> tasks or main() plus an <a href="http://www.netrino.com/Embedded-Systems/Glossary-I#interrupt_service_routine">ISR</a>) varies depending on the precise order in which the instructions of each are interleaved.</p>
<p>For example, suppose you have two threads of execution in which one regularly increments a global variable (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace">g_counter += 1;</span>) and the other occasionally resets it (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace">g_counter = 0;</span>).  There is a race condition here if the increment cannot always be executed atomically (i.e., in a single instruction cycle).  A collision between the two updates of the counter variable may&nbsp;never or only very rarely occur.  But when it does, the counter will not actually be reset in memory; its value is henceforth corrupt.  The effect of this may have serious consequences for the system, though perhaps not until a long time after the actual collision.</p>
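<p>To make the collision concrete, the single-threaded sketch below replays one unlucky interleaving step by step. It assumes the increment compiles to a separate load, add, and store, as it does on many CPUs; the exact instruction sequence depends on the compiler and target.</p>

```c
#include <stdint.h>

uint32_t g_counter = 0;

/* Replay, in order, the steps two threads might take when the
 * resetting thread preempts the incrementing thread mid-update. */
uint32_t replay_lost_reset(void)
{
    g_counter = 41;

    uint32_t reg = g_counter;  /* Thread 1: load g_counter into a register (41) */
    g_counter = 0;             /* Thread 2 preempts: reset the counter to 0      */
    reg = reg + 1;             /* Thread 1 resumes: increment stale copy (42)    */
    g_counter = reg;           /* Thread 1: store 42; the reset is lost          */

    return g_counter;
}
```

<p>The function returns 42 rather than 0 or 1: the reset did happen, but it was silently overwritten by the stale value.</p>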
<p><i>Best Practice</i>: Race conditions can be prevented by surrounding the “critical sections” of code that must be executed atomically with an appropriate preemption-limiting pair of behaviors.  To prevent a race condition involving an ISR, at least one interrupt signal must be disabled for the duration of the other code’s critical section.  In the case of a race between RTOS tasks, the best practice is the creation of a mutex specific to that shared object, which each task must acquire before entering the critical section.  Note that it is not a good idea to rely on the capabilities of a specific CPU to ensure atomicity, as that only prevents the race condition until a change of compiler or CPU.</p>
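<p>For the RTOS-task case, a minimal sketch of the counter protected by its own mutex might look like this. POSIX <code>pthread_mutex_t</code> stands in for the RTOS mutex API, which is an assumption; to protect against an ISR instead, the same critical sections would be bracketed by disabling and re-enabling the relevant interrupt.</p>

```c
#include <pthread.h>
#include <stdint.h>

/* The shared counter and the mutex dedicated to it. */
static uint32_t g_counter = 0;
static pthread_mutex_t g_counter_mutex = PTHREAD_MUTEX_INITIALIZER;

void counter_increment(void)
{
    pthread_mutex_lock(&g_counter_mutex);
    g_counter += 1;                       /* critical section */
    pthread_mutex_unlock(&g_counter_mutex);
}

void counter_reset(void)
{
    pthread_mutex_lock(&g_counter_mutex);
    g_counter = 0;                        /* critical section */
    pthread_mutex_unlock(&g_counter_mutex);
}

uint32_t counter_read(void)
{
    pthread_mutex_lock(&g_counter_mutex);
    uint32_t value = g_counter;
    pthread_mutex_unlock(&g_counter_mutex);
    return value;
}
```

<p>Because no task can touch <code>g_counter</code> without holding the mutex, the load, add, and store of an increment can no longer be split by another task’s reset.</p>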
<p>Shared data and the random timing of preemption are the culprits behind race conditions.  But the error may not occur every time, which makes tracing such bugs from symptoms to root causes incredibly difficult.  It is, therefore, important to be ever-vigilant about protecting all shared objects.</p>
<p><i>Best Practice</i>: Name all potentially shared objects—including global variables, heap objects, or peripheral registers and pointers to the same—in a way that the risk is immediately obvious to every future reader of the code.  <a href="http://www.netrino.com/">Netrino</a>’s <a href="http://www.netrino.com/Coding-Standard">Embedded C Coding Standard</a> advocates the use of a &#8216;<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace">g_</span>&#8216; prefix for this purpose.</p>
<p>Locating all potentially shared objects is the first step in a code audit for race conditions.</p>
<p><a href="http://www.embeddedgurus.net/barr-code/2010/02/firmware-specific-bug-2-non-reentrant.html">Firmware-Specific Bug #2: Non-Reentrant Function</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-1-race-condition/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Embedded Software is the Future of Product Quality and Safety</title>
		<link>http://embeddedgurus.com/barr-code/2010/02/embedded-software-is-the-future-of-product-quality-and-safety/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/02/embedded-software-is-the-future-of-product-quality-and-safety/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 18:18:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[embedded]]></category>
		<category><![CDATA[engineering]]></category>
		<category><![CDATA[ethics]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/02/08/embedded-software-is-the-future-of-product-quality-and-safety/</guid>
		<description><![CDATA[Last year a friend had a St. Jude pacemaker attached to his heart.  When he reported an unexpected low battery reading (displayed on an associated digital watch) to his doctor a month later, he learned this was the result of a firmware bug known to the manufacturer.  The battery was fine and would [...]]]></description>
			<content:encoded><![CDATA[<p>Last year a friend had a <a href="http://www.sjmprofessional.com/en/Products/US/Pacing-Systems/Accent-Pacemaker.aspx">St. Jude pacemaker</a> attached to his heart.  When he reported an unexpected low battery reading (displayed on an associated digital watch) to his doctor a month later, he learned this was the result of a firmware bug known to the manufacturer.  The battery was fine and would last on the order of a decade more.  His new-model pacemaker&#8217;s firmware didn&#8217;t include a bug fix that was provided the year before to wearers of the old model.</p>
<p>Another friend owns a <a href="http://www.landroverusa.com/us/en/Vehicles/new-LR2/overview.htm">Land Rover LR2</a> SUV with back-up sensors.  When the car is in reverse and nearing an obstacle or another car, the driver is alerted via a beeping sound.  Except that the back-up sensors don&#8217;t always work.  Some &#8220;reboots&#8221; of the SUV don&#8217;t seem to have this feature enabled.  He suspects there is a &#8220;<a href="http://www.netrino.com/Embedded-Systems/Glossary-R#race_condition">race condition</a>&#8221; during the software startup sequence.</p>
<p>Yet another friend has driven a <a href="http://www.toyota.com/prius-hybrid/">Toyota Prius</a> hybrid over 100,000 miles.  He reports that the brakes very occasionally have an odd/different feel.  But his older model Prius is not expected to be subject to <a href="http://www.huffingtonpost.com/2010/02/08/toyota-recall-prius-hybri_n_453188.html">the 2010 model year recall</a>.</p>
<p>These are just a few of the personal anecdotes behind the headlines.  Embedded software is everywhere now, with <a href="http://www.embeddedgurus.net/barr-code/2008/03/vdc-counts-4-billion-embedded-products.html">over 4 billion new devices manufactured each year</a>.  Increasingly, the quality and safety of products are a side effect of the quality and safety of the software embedded inside.</p>
<p>One important question is, can we trust future software updates any more than we can trust the existing firmware?  How do we know that the Toyota Prius hybrids with upgraded braking firmware will be safer than those with the factory firmware?</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/02/embedded-software-is-the-future-of-product-quality-and-safety/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Is Toyota&#039;s Accelerator Problem Caused by Embedded Software Bugs?</title>
		<link>http://embeddedgurus.com/barr-code/2010/01/is-toyotas-accelerator-problem-caused-by-embedded-software-bugs/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/01/is-toyotas-accelerator-problem-caused-by-embedded-software-bugs/#comments</comments>
		<pubDate>Thu, 28 Jan 2010 16:36:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[embedded]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/01/28/is-toyotas-accelerator-problem-caused-by-embedded-software-bugs/</guid>
		<description><![CDATA[Last month I received an interesting e-mail in response to a column I wrote for Embedded Systems Design called The Lawyers are Coming!  My column was partly about the poor state of embedded software quality across all industries, and my correspondent was writing to say my observations were accurate from his perch within the [...]]]></description>
			<content:encoded><![CDATA[<p>Last month I received an interesting e-mail in response to a column I wrote for <a href="http://www.embedded.com">Embedded Systems Design</a> called <a href="http://www.embedded.com/columns/barrcode/221901488">The Lawyers are Coming!</a>  My column was partly about the poor state of embedded software quality across all industries, and my correspondent was writing to say my observations were accurate from his perch within the automotive industry.  Included in his e-mail was this interesting tidbit: </p>
<blockquote><p>I read something about the big Toyota recall being related to floor mats interfering with the accelerator, but I was told that the problem appears to be software (firmware) for the control-by-wire pedal.&nbsp; Me thinks somebody probably forgot to check ranges, overflows, or stability properly when implementing the &#8220;algorithm&#8221;.</p></blockquote>
<p>As background for those of you who have been working in <a href="http://en.wikipedia.org/wiki/Sensitive_Compartmented_Information_Facility">SCIFs</a> or other labs, the &#8220;big Toyota recall&#8221; was first announced in September 2009.  It was said to concern <a href="http://money.cnn.com/2009/09/29/news/companies/toyota_lexus_floor_mats/">removable floor mats causing the accelerator pedal to be pressed down</a>. Some 3.8 million Toyota and Lexus vehicles were involved and owners were told to remove floor mats immediately.  </p>
<p>This week several related major news events have transpired, including:</p>
<ul>
<li>Toyota recalled <a href="http://www.cbsnews.com/stories/2010/01/27/business/main6148928.shtml">millions of additional vehicles in the U.S.</a>,
<li><a href="http://detnews.com/article/20100127/AUTO01/1270400/U.S.--Toyota-was-legally-required-to-stop-selling-models">Under pressure from the U.S. NHTSA, Toyota halted production and sales</a> of eight models,
<li><a href="http://online.wsj.com/article/BT-CO-20100127-721133.html">Avis, Hertz, and Enterprise pulled affected Toyota models from their rental fleets</a>,
<li>Toyota&#8217;s recall spread to <a href="http://online.wsj.com/article/BT-CO-20100125-704637.html">Europe</a> and <a href="http://money.cnn.com/news/newsfeeds/articles/djf500/201001280844DOWJONESDJONLINE000465_FORTUNE5.htm">China</a>, and
<li><a href="http://online.wsj.com/article/SB10001424052748704194504575030891636493402.html">Ford stopped production of a full-size commercial vehicle</a> after discovering that the gas pedal came from the supplier involved in the Toyota recall.</ul>
<p>But none of the articles I&#8217;ve read have talked about software being a cause.  And it&#8217;s not clear if the affected models are <a href="http://en.wikipedia.org/wiki/Drive_by_wire">drive-by-wire</a>.  However, at least one article I read yesterday suggested that one fix being worked on is a software interlock to ensure that if both the brake and the gas pedal are depressed, the brake will override the accelerator.  On the one hand, that seems to mean that software is already in the middle; on the other, I would be extremely surprised to learn that such an interlock wasn&#8217;t already present in a drive-by-wire system.</p>
<p>So what&#8217;s the story?  Are embedded software bugs to blame for this massive recall?  Do you know?  Have you found any helpful articles pointing at software problems?  Please share what you know in the comments below, or e-mail me privately.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/01/is-toyotas-accelerator-problem-caused-by-embedded-software-bugs/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Firmware Update &#8211; A Free Newsletter for Firmware Engineers</title>
		<link>http://embeddedgurus.com/barr-code/2010/01/firmware-update-a-free-newsletter-for-firmware-engineers/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/01/firmware-update-a-free-newsletter-for-firmware-engineers/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 20:46:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[education]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[realtime]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/01/26/firmware-update-a-free-newsletter-for-firmware-engineers/</guid>
		<description><![CDATA[I&#8217;ve been writing&#160;about the practice of embedded software development&#8211;in the form of books, articles, columns, conference papers, and blog posts&#8211;for nearly 15 years. &#160;(How time flies&#8230;) &#160;I also served as editor-in-chief of Embedded Systems Design magazine for about 3-1/2 years in the middle. &#160;But it wasn&#8217;t until August of last year that it occurred to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been writing&nbsp;about the practice of embedded software development&#8211;in the form of <a href="http://www.netrino.com/Embedded-Systems/Books">books</a>, <a href="http://www.netrino.com/Embedded-Systems/How-To">articles</a>, <a href="http://www.embedded.com/columns/archive/?content_type=barrcode">columns</a>, <a href="http://www.esconline.com/">conference papers</a>, and <a href="http://www.embeddedgurus.net/barr-code">blog posts</a>&#8211;for nearly 15 years. &nbsp;(How time flies&#8230;) &nbsp;I also served as editor-in-chief of <a href="http://www.embedded.com/">Embedded Systems Design</a> magazine for about 3-1/2 years in the middle. &nbsp;But it wasn&#8217;t until August of last year that it occurred to me to write an e-mail newsletter.</p>
<p>My newsletter is called&nbsp;<a href="http://www.firmwareupdate.net/">Firmware Update</a>, and it is published about every 3 weeks. &nbsp;The stated mission of Firmware Update&nbsp;is to spread the word about the firmware development best practices I have learned in my career as an engineer, consultant, and trainer. &nbsp;In addition to&nbsp;connecting my past, present, and future writings into a coherent storyline, I am using the newsletter to link to articles and papers by others who influence my thinking.</p>
<p>In less than six months, the newsletter is up to more than 11,000 subscribers. &nbsp;We&#8217;ve placed a helpful archive of all past issues at <a href="http://www.firmwareupdate.net/">FirmwareUpdate.net</a>. &nbsp;And I&#8217;m working hard to make the format as easy and fun to read as it is informative. &nbsp;If you develop embedded software, I&#8217;m certain you will find it valuable. &nbsp;&nbsp;If you&#8217;re not already a subscriber, you can join the mailing list at&nbsp;<a href="http://visitor.constantcontact.com/email.jsp?m=1101728959593">http://visitor.constantcontact.com/email.jsp?m=1101728959593</a>. </p>
<p>Note that each issue of Firmware Update is Copyright 2009-2010 by Netrino, LLC. &nbsp;However, you may reprint the newsletter for non-commercial purposes. Indeed, I encourage you to forward it to colleagues who may benefit from the information.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/01/firmware-update-a-free-newsletter-for-firmware-engineers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rate Monotonic Analysis and Round Robin Scheduling</title>
		<link>http://embeddedgurus.com/barr-code/2010/01/rate-monotonic-analysis-and-round-robin-scheduling/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/01/rate-monotonic-analysis-and-round-robin-scheduling/#comments</comments>
		<pubDate>Sat, 23 Jan 2010 00:29:00 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[RTOS Multithreading]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>

		<guid isPermaLink="false">http://www.gfcdev.org/test-stack/2010/01/22/rate-monotonic-analysis-and-round-robin-scheduling/</guid>
		<description><![CDATA[Rate Monotonic Analysis (RMA) is a way of proving a priori via mathematics (rather than post-implementation via testing) that a set of tasks and interrupt service routines (ISRs) will always meet their deadlines&#8211;even under worst-case timing. &#160;In this blog, I address the issue of what to do if two or more tasks or ISRs have [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.netrino.com/Embedded-Systems/How-To/RMA-Rate-Monotonic-Algorithm">Rate Monotonic Analysis (RMA)</a> is a way of proving <i>a priori</i> via mathematics (rather than post-implementation via testing) that a set of tasks and interrupt service routines (ISRs) will always meet their <a href="http://www.netrino.com/Embedded-Systems/Glossary-D#deadline">deadlines</a>&#8211;even under worst-case timing. &nbsp;In this blog, I address the issue of what to do if two or more tasks or ISRs have equal priority and whether round robin scheduling is necessary in an RTOS to deal with that special case.</p>
<p>First a little background. &nbsp;In order for the schedulability analysis portion of the RMA mathematics to provide meaningful results, the following assumptions must hold:</p>
<ul>
<li>A <a href="http://www.netrino.com/Embedded-Systems/How-To/RTOS-Preemption-Multitasking">preemptive</a> <a href="http://www.netrino.com/Embedded-Systems/How-To/RTOS-Selection">real-time operating system (RTOS)</a> is used for scheduling,</li>
<li>Each <a href="http://www.netrino.com/Embedded-Systems/Glossary-T#task">task</a> or <a href="http://www.netrino.com/Embedded-Systems/Glossary-I#interrupt_service_routine">ISR</a> must be assigned a fixed priority (relative to the others) that is not changed while the system runs, and</li>
<li>Unbounded <a href="http://www.netrino.com/Embedded-Systems/How-To/RTOS-Priority-Inversion">priority inversions</a> must be prevented.</li>
</ul>
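<p>One common schedulability check under these assumptions is the Liu and Layland utilization bound: a set of <i>n</i> independent periodic tasks, each with a deadline equal to its period, is guaranteed to be schedulable under rate-monotonic priorities if the total utilization (the sum of each task&#8217;s worst-case execution time divided by its period) does not exceed n(2<sup>1/n</sup> - 1). The C below is a minimal sketch of that test; the <code>rma_task_t</code> type and the function names are illustrative, not from any particular RTOS. Note that the test is sufficient but not necessary, so a task set that fails it may still be schedulable.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* One periodic task or ISR: worst-case execution time C and period T,
 * in the same time units (e.g., microseconds). */
typedef struct {
    double wcet;    /* C_i */
    double period;  /* T_i (worst-case interarrival time) */
} rma_task_t;

/* Compute 2^(1/n) by bisection on x^n = 2, so no math library is needed. */
static double nth_root_of_two(size_t n)
{
    double lo = 1.0, hi = 2.0;

    for (int iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (size_t k = 0; k < n; k++) {
            p *= mid;
        }
        if (p < 2.0) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}

/* Liu & Layland bound: n * (2^(1/n) - 1). */
double rma_utilization_bound(size_t n)
{
    return (double)n * (nth_root_of_two(n) - 1.0);
}

/* Total utilization U = sum of C_i / T_i. */
double rma_total_utilization(const rma_task_t *tasks, size_t n)
{
    double u = 0.0;

    for (size_t i = 0; i < n; i++) {
        u += tasks[i].wcet / tasks[i].period;
    }
    return u;
}

/* Sufficient (not necessary) schedulability test. */
bool rma_passes_utilization_test(const rma_task_t *tasks, size_t n)
{
    return rma_total_utilization(tasks, n) <= rma_utilization_bound(n);
}
```

<p>For example, tasks with (C, T) of (1, 4), (1, 5), and (2, 10) have a total utilization of 0.65, below the three-task bound of roughly 0.78, so they pass.</p>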
<p>Under RMA, the relative priorities are assigned according to a simple rule: &#8220;<b>The more often a task or ISR runs (in the worst case), the higher its priority.</b>&#8221; Put another way, the task or ISR with the longest period between iterations (<i>interarrival time</i>, if you prefer) is least important. This is because an infrequent but high-priority task could cause a more frequent task to miss an entire iteration.</p>
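<p>The assignment rule itself is mechanical: sort by period and hand out priorities in order. Here is a minimal C sketch, assuming a hypothetical <code>rm_task_t</code> record and an RTOS in which a larger number means a higher priority (some RTOSes use the opposite convention, so adjust accordingly).</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task record for priority assignment. */
typedef struct {
    const char *name;
    unsigned    period_us;  /* worst-case interarrival time */
    unsigned    priority;   /* output; larger = higher (assumed) */
} rm_task_t;

/* Order tasks from longest period to shortest. */
static int by_period_desc(const void *a, const void *b)
{
    const rm_task_t *ta = a;
    const rm_task_t *tb = b;

    if (ta->period_us > tb->period_us) return -1;
    if (ta->period_us < tb->period_us) return 1;
    return 0;
}

/* Rate-monotonic assignment: shorter period -> higher priority.
 * Sorts the array in place, then numbers priorities upward from
 * `lowest`, so the longest-period task gets `lowest`. */
void rm_assign_priorities(rm_task_t *tasks, size_t n, unsigned lowest)
{
    qsort(tasks, n, sizeof tasks[0], by_period_desc);

    for (size_t i = 0; i < n; i++) {
        tasks[i].priority = lowest + (unsigned)i;
    }
}
```

<p>Tasks with equal periods receive adjacent but distinct priorities, in whatever order <code>qsort()</code> leaves them, since it is not a stable sort.</p>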
<p>So what happens if you are using RMA to assign priorities and you wind up with two (or more) tasks or ISRs assigned equal priority? (Translation: they have the same worst-case interarrival time.) Must they be assigned equal priority in the real system? What if the RTOS (in the case of tasks) or hardware (in the case of interrupts) doesn&#8217;t support round-robin scheduling&#8211;or even equal priorities with run-to-completion?</p>
<p>Interestingly, it turns out not to matter a bit whether you:</p>
<ol>
<li>Merge the two tasks into one (i.e., execute the code for Task A, then the code for Task B).</li>
<li>Give them equal priority, either with round robin or run-to-completion behavior.</li>
<li>Give them adjacent unequal priorities (in either relative order).</li>
</ol>
<p>If you run through the timing diagrams for each of the above scenarios, you&#8217;ll see that all three are equivalent, except that equal priority with round robin may suffer a performance penalty from unnecessary additional context switches.</p>
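<p>To make the timing-diagram exercise concrete, here is a minimal tick-based C simulation of preemptive fixed-priority scheduling. It assumes integer time ticks, zero context-switch overhead, and all tasks released together at time zero. The task set is hypothetical: a higher-rate task H (period 4, execution time 1) plus tasks A (period 8, execution time 1) and B (period 8, execution time 3). Round robin at equal priority is not modeled.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* One periodic task in the simulation (all times in ticks). */
typedef struct {
    unsigned period;    /* ticks between releases */
    unsigned wcet;      /* execution ticks needed per release */
    unsigned remaining; /* ticks left in the current job */
    unsigned response;  /* release-to-finish time of the last completed job */
    bool     missed;    /* true if any job overran its period */
} sim_task_t;

/* Simulate preemptive fixed-priority scheduling for `ticks` ticks.
 * tasks[0] has the highest priority and tasks[n-1] the lowest.
 * All tasks are released together at t = 0 (the critical instant).
 * `ticks` should be a multiple of every period, e.g. the hyperperiod. */
void simulate(sim_task_t *tasks, size_t n, unsigned ticks)
{
    for (unsigned t = 0; t < ticks; t++) {
        /* Release a new job for each task whose period begins now. */
        for (size_t i = 0; i < n; i++) {
            if (t % tasks[i].period == 0) {
                if (tasks[i].remaining > 0) {
                    tasks[i].missed = true;  /* previous job unfinished */
                }
                tasks[i].remaining = tasks[i].wcet;
            }
        }
        /* Give this tick to the highest-priority task with work left. */
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].remaining > 0) {
                tasks[i].remaining--;
                if (tasks[i].remaining == 0) {
                    unsigned release = t - (t % tasks[i].period);
                    tasks[i].response = (t + 1) - release;
                }
                break;  /* only one task runs per tick */
            }
        }
    }
    /* Any job still unfinished at the end would miss its next release. */
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].remaining > 0) {
            tasks[i].missed = true;
        }
    }
}
```

<p>Simulating one 8-tick hyperperiod, the order H&gt;A&gt;B gives A a response time of 2 ticks and B a response time of 6; the order H&gt;B&gt;A gives B 4 and A 6; and a merged A+B task (period 8, execution time 4) at the lowest priority finishes at 6. In every arrangement the later of the two completes at tick 6 and no deadline is missed.</p>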
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/01/rate-monotonic-analysis-and-round-robin-scheduling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
