<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Barr Code &#187; safety</title>
	<atom:link href="http://embeddedgurus.com/barr-code/tag/safety/feed/" rel="self" type="application/rss+xml" />
	<link>http://embeddedgurus.com/barr-code</link>
	<description>A Blog by Michael Barr</description>
	<lastBuildDate>Wed, 25 Jan 2012 09:45:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Combining C&#8217;s volatile and const Keywords</title>
		<link>http://embeddedgurus.com/barr-code/2012/01/combining-cs-volatile-and-const-keywords/</link>
		<comments>http://embeddedgurus.com/barr-code/2012/01/combining-cs-volatile-and-const-keywords/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 11:29:17 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[Efficient C/C++]]></category>
		<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=695</guid>
		<description><![CDATA[Does it ever make sense to declare a variable in C or C++ as both volatile (i.e., &#8220;ever-changing&#8221;) and const (&#8220;read-only&#8221;)? If so, why? And how should you combine volatile and const properly? One of the most consistently popular articles on the Netrino website is about C&#8217;s volatile keyword. The volatile keyword, like const, is [...]]]></description>
			<content:encoded><![CDATA[<p><em>Does it ever make sense to declare a variable in C or C++ as both volatile (i.e., &#8220;ever-changing&#8221;) and const (&#8220;read-only&#8221;)?  If so, why?  And how should you combine volatile and const properly?</em></p>
<p>One of the most consistently popular articles on the <a href="http://www.netrino.com" title="Netrino" target="_blank">Netrino</a> website is about C&#8217;s volatile keyword. The volatile keyword, like const, is a type qualifier.  These keywords can be used by themselves or together in variable declarations.</p>
<p>I&#8217;ve written about volatile and const individually before.  If you haven&#8217;t previously used the volatile keyword, I recommend you read <a href="http://www.netrino.com/Embedded-Systems/How-To/C-Volatile-Keyword" title="How to Use C's volatile Keyword" target="_blank">How to Use C&#8217;s volatile Keyword</a> before going on.  As that article makes plain:</p>
<blockquote><p>C&#8217;s volatile keyword is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time&#8211;without any action being taken by the code the compiler finds nearby.</p></blockquote>
<p><b>How to Use C&#8217;s volatile Keyword</b></p>
<p>By declaring a variable volatile you are effectively asking the compiler to be as inefficient as possible when it comes to reading or writing that variable.  Specifically, the compiler should generate object code to perform each and every read from a volatile variable and each and every write to a volatile variable&#8211;even if you write it twice in a row or read it and ignore the result.  No read or write can be skipped.  Effectively no optimizations are allowed with respect to volatile variables.  </p>
<p>Accesses to volatile variables are also side effects that the compiler must preserve in program order.  The order of accesses of volatile variables A and B in the object code must be the same as the order of those accesses in the source code, as determined by the sequence points between them.  The compiler is not allowed to reorder volatile variable accesses for any reason.</p>
<p>Here are a couple of examples of declarations of volatile variables:</p>
<p><code>int volatile g_flag_shared_with_isr;</code><br />
<br />
<code>uint8_t volatile * p_led_reg = (uint8_t *) 0x00080000;</code></p>
<p>The first example declares a global flag that can be shared between an ISR and some other part of the code (e.g., a background processing loop in main() or an RTOS task) without fear that the compiler will optimize away (i.e., &#8220;delete&#8221;) the code you write to check for asynchronous changes to the flag&#8217;s value.  It is important to use volatile to declare all variables that are shared by asynchronous software entities, a situation that arises in any kind of multithreaded programming.  (Remember, though, that access to global variables shared by tasks or with an ISR must always also be controlled via a <a href="http://www.netrino.com/Embedded-Systems/How-To/RTOS-Mutex-Semaphore" title="Mutexes and Semaphores Demystified" target="_blank">mutex</a> or interrupt disable, respectively.)</p>
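<p>A minimal sketch of the shared-flag pattern follows.  It is written to run on a desktop host, so the ISR is simulated by an ordinary function; the names <code>fake_timer_isr</code> and <code>wait_for_flag</code> are hypothetical and not part of any real API.</p>

```c
#include <stdbool.h>

/* Shared between the (simulated) ISR and the background loop. */
static int volatile g_flag_shared_with_isr = 0;

/* Hypothetical ISR body; on a real target the hardware would invoke it. */
void fake_timer_isr(void)
{
    g_flag_shared_with_isr = 1;
}

/* Polls the flag up to max_polls times.  Because the flag is volatile,
 * every loop iteration performs a real read of the variable. */
bool wait_for_flag(unsigned int max_polls)
{
    while (max_polls-- > 0u)
    {
        if (g_flag_shared_with_isr != 0)
        {
            /* On a real target, guard this clear with an interrupt
             * disable, per the reminder about shared globals. */
            g_flag_shared_with_isr = 0;
            return true;
        }
    }
    return false;
}
```

<p>Without volatile, an optimizing compiler would be free to read the flag once and never look at it again inside the loop.</p>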
<p>The second example declares a pointer to a hardware register at a known physical memory address (80000h)&#8211;in this case to manipulate the state of one or more LEDs.  Because the register that the pointer refers to is declared volatile, the compiler must always perform each individual write.  Even if you write C code to turn an LED on followed immediately by code to turn the same LED off, you can trust that the hardware really will receive both instructions.  Because of the sequence point restrictions, you are also guaranteed that the LED will be off after both lines of the C code have been executed.  The volatile keyword should always be used when creating pointers to memory-mapped I/O such as this.</p>
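<p>The on-then-off case can be sketched as follows.  To keep the sketch runnable on a host, the register at 80000h is replaced by an ordinary variable, <code>g_fake_led_reg</code>, and the <code>LED1_ON</code> and <code>LED1_OFF</code> values are assumptions rather than values from a real part.</p>

```c
#include <stdint.h>

#define LED1_ON   0x01u   /* hypothetical register values */
#define LED1_OFF  0x00u

/* Host-side stand-in for the LED register (an assumption so that the
 * sketch runs anywhere); on a target this would be a fixed address. */
static uint8_t volatile g_fake_led_reg;

uint8_t volatile * p_led_reg = &g_fake_led_reg;

/* Turns the LED on and immediately off again, then reads back the state. */
uint8_t pulse_led(void)
{
    *p_led_reg = LED1_ON;    /* this write must be emitted ...          */
    *p_led_reg = LED1_OFF;   /* ... and so must this one, in this order */
    return *p_led_reg;       /* the read is performed, not assumed      */
}
```

<p>If the pointed-to type were not volatile, the compiler could legally drop the first write because its value is never read.</p>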
<p>[See <a href="http://embeddedgurus.com/barr-code/2009/03/coding-standard-rule-4-use-volatile-whenever-possible/" title="Coding Standard Rule #4: Use volatile Whenever Possible" target="_blank">Coding Standard Rule #4: Use volatile Whenever Possible</a> for more on the use of volatile by itself.]</p>
<p><strong>How to Use C&#8217;s const Keyword</strong></p>
<p>The const keyword can be used to modify parameters as well as in variable declarations.  Here we are only interested in the use of const as a type qualifier, as in:</p>
<p><code>uint16_t const max_temp_in_c = 1000;</code></p>
<p>This declaration creates a 16-bit unsigned integer value of 1,000 with a scoped name of <code>max_temp_in_c</code>.  In C, this variable will exist in memory at run-time, but will typically be located, by the linker, in a non-volatile memory area such as ROM or flash.  Any reference to the const variable will read from that location.  (In C++, a const integer may no longer exist as an addressable location in run-time memory.)</p>
<p>Any attempt the code makes to write to a const variable directly (i.e., by its name) will result in a compile-time error.  To the extent that the const variable is located in ROM or flash, an indirect write (i.e., via a pointer to its address) will also be thwarted&#8211;though at run-time, obviously.</p>
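<p>As a small illustration, the constant can be used like any other variable in a comparison, while a direct write is rejected.  The helper <code>is_over_temp</code> is a hypothetical name introduced only for this sketch.</p>

```c
#include <stdbool.h>
#include <stdint.h>

uint16_t const max_temp_in_c = 1000;

/* Returns true when the measured temperature exceeds the limit. */
bool is_over_temp(uint16_t temp_in_c)
{
    /* max_temp_in_c = 1200;   <-- would not compile: write to a const */
    return temp_in_c > max_temp_in_c;
}
```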
<p>Another use of const is to mark a hardware register as read-only.  For example:</p>
<p><code>uint8_t const * p_latch_reg = (uint8_t const *) 0x10000000;</code></p>
<p>Declaring the pointer this way, any attempt to write to that physical memory address via the pointer (e.g., <code>*p_latch_reg = 0xFF;</code>) should result in a compile-time error.</p>
<p>[See <a href="http://embeddedgurus.com/barr-code/2009/03/coding-standard-rule-2-use-const-wherever-possible/" title="Coding Standard Rule #2: Use const Whenever Possible" target="_blank">Coding Standard Rule #2: Use const Whenever Possible</a> for more on the use of const by itself.]</p>
<p><strong>How to Use const and volatile Together</strong></p>
<p>Though the essence of the volatile (&#8220;ever-changing&#8221;) and const (&#8220;read-only&#8221;) decorators may seem at first glance opposed, there are some times when it makes sense to use them both to declare one variable.  The scenarios I&#8217;ve run across have involved pointers to memory-mapped hardware registers and shared memory areas.</p>
<p><em>(#1) Constant Addresses of Hardware Registers</em></p>
<p>The following declaration uses both const and volatile in the frequently useful scenario of declaring a constant pointer to a volatile hardware register.</p>
<p><code>uint8_t volatile * const p_led_reg = (uint8_t *) 0x00080000;</code></p>
<p>The proper way to read a complex declaration like this is from the name of the variable back to the left, as in:</p>
<blockquote><p>p_led_reg IS A constant pointer TO A volatile 8-bit unsigned integer.</p></blockquote>
<p>Reading it that way, we can see that the keyword const modifies only the pointer (i.e., the fixed address 80000h), which does not change at run-time, whereas the keyword volatile modifies only the integer being pointed to.  This is actually quite useful and is a much safer version of the declaration of p_led_reg that appears earlier in this article.  In particular, adding const means that the simple typo of a missed pointer dereference (&#8216;*&#8217;) will be caught at compile time.  That is, the mistaken code <code>p_led_reg = LED1_ON;</code> won&#8217;t overwrite the address with the non-80000h value of LED1_ON.  The compiler error leads us to correct this to <code>*p_led_reg = LED1_ON;</code>, which is almost certainly what we meant to write in the first place.</p>
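<p>Here is a host-runnable sketch of that protection.  The register is again simulated by a variable (<code>g_fake_led_reg</code>, an assumption), and the value of <code>LED1_ON</code> is hypothetical.</p>

```c
#include <stdint.h>

#define LED1_ON  0x01u   /* hypothetical register value */

/* Host-side stand-in for the register at 80000h. */
static uint8_t volatile g_fake_led_reg;

/* Constant pointer TO A volatile 8-bit unsigned integer. */
uint8_t volatile * const p_led_reg = &g_fake_led_reg;

uint8_t led1_on(void)
{
    /* p_led_reg = LED1_ON;   <-- the typo: rejected, the pointer is const */
    *p_led_reg = LED1_ON;     /* the intended write through the pointer    */
    return *p_led_reg;
}
```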
<p><em>(#2) Read-Only Shared-Memory Buffer</em></p>
<p>Another use for a combination of const and volatile is where you have two processors communicating via a shared memory area and you are coding the side of this communication that will only be reading from a shared memory buffer.  In this case you could declare variables such as:</p>
<p><code>int const volatile comm_flag;</code><br />
<br />
<code>uint8_t const volatile comm_buffer[BUFFER_SIZE];</code></p>
<p>Of course, you&#8217;d usually want to instruct the linker to place these global variables at the correct addresses in the shared memory area or to declare the above as pointers to specific physical memory addresses.  In the case of pointers, the use of const and volatile may become even more complex, as in the next category.</p>
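<p>The pointer form of the reader side might look like the sketch below.  The shared region and the other processor are simulated with ordinary variables and a function (<code>g_shared_flag</code>, <code>g_shared_bytes</code>, and <code>fake_producer_send</code> are all hypothetical), and the sketch ignores cache coherency and memory barriers, which a real dual-processor design would have to address.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 8u

/* Host stand-ins for the shared memory area; only the simulated
 * "other processor" writes through these names. */
static int     g_shared_flag;
static uint8_t g_shared_bytes[BUFFER_SIZE];

/* The reader's view: constant pointers to read-only, volatile data. */
int     const volatile * const p_comm_flag   = &g_shared_flag;
uint8_t const volatile * const p_comm_buffer = g_shared_bytes;

/* Simulated producer: fills the buffer, then raises the flag. */
void fake_producer_send(uint8_t first_value)
{
    for (size_t i = 0; i < BUFFER_SIZE; i++)
    {
        g_shared_bytes[i] = (uint8_t) (first_value + i);
    }
    g_shared_flag = 1;
}

/* Reader: returns the sum of the buffer, or -1 if no data is flagged. */
int32_t consumer_sum(void)
{
    if (*p_comm_flag == 0)
    {
        return -1;
    }

    int32_t sum = 0;
    for (size_t i = 0; i < BUFFER_SIZE; i++)
    {
        sum += p_comm_buffer[i];   /* each element is really re-read */
    }
    return sum;
}
```

<p>Any attempt by the reader to write through <code>p_comm_flag</code> or <code>p_comm_buffer</code> fails to compile, which is the point of the const qualifier here.</p>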
<p><em>(#3) Read-Only Hardware Register</em></p>
<p>Sometimes you will run across a read-only hardware register.  In addition to enforcing compile-time checking so that the software doesn&#8217;t try to overwrite the memory location, you also need to be sure that each and every requested read actually occurs.  By declaring that your variable IS A (constant) pointer TO A constant and volatile memory location, you request all of the appropriate protections, as in:</p>
<p><code>uint8_t const volatile * const p_latch_reg = (uint8_t *) 0x10000000;</code></p>
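<p>A sketch of why both qualifiers matter: the function below reads the latch twice and compares the results, which only makes sense if both reads really happen.  The register is simulated by <code>g_fake_latch_reg</code> and the function name <code>latch_is_stable</code> is hypothetical.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-in for the read-only latch at 10000000h. */
static uint8_t volatile g_fake_latch_reg = 0x5Au;

uint8_t const volatile * const p_latch_reg = &g_fake_latch_reg;

/* Reads the latch twice, stores the second value in *p_value, and
 * reports whether the two consecutive reads agreed. */
bool latch_is_stable(uint8_t * p_value)
{
    uint8_t const first  = *p_latch_reg;   /* volatile: read #1 happens */
    uint8_t const second = *p_latch_reg;   /* volatile: read #2 happens */

    /* *p_latch_reg = 0u;   <-- rejected: the target is const */

    *p_value = second;
    return first == second;
}
```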
<p>As you can see, the declarations of variables that involve both the volatile and const decorators can quickly become complicated to read.  But the technique of combining C&#8217;s volatile and const keywords can be useful and even important.  This is definitely something you should master on the way to becoming an expert embedded software engineer.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2012/01/combining-cs-volatile-and-const-keywords/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Firmware Forensics: Best Practices in Embedded Software Source Code Discovery</title>
		<link>http://embeddedgurus.com/barr-code/2011/09/firmware-forensics-best-practices-in-embedded-software-source-code-discovery/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/09/firmware-forensics-best-practices-in-embedded-software-source-code-discovery/#comments</comments>
		<pubDate>Tue, 27 Sep 2011 15:32:25 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[patents]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=692</guid>
		<description><![CDATA[Software has become ubiquitous, embedded as it is into the fabric of our lives in literally billions of new (non-computer) products per year, from microwave ovens to electronic throttle controls. When products controlled by software are the subject of litigation, whether for infringement of intellectual property rights or product liability, it is imperative to analyze [...]]]></description>
			<content:encoded><![CDATA[<p>Software has become ubiquitous, embedded as it is into the fabric of our lives in literally billions of new (non-computer) products per year, from microwave ovens to electronic throttle controls.  When products controlled by software are the subject of litigation, whether for infringement of intellectual property rights or product liability, it is imperative to analyze the embedded software (a.k.a., firmware) properly and thoroughly.  This article enumerates five best practices for embedded software source code discovery and the rationale for each.</p>
<p>In February 2011, the U.S. government’s <a href="http://www.nhtsa.gov">National Highway Traffic Safety Administration</a> and a team from <a href="http://www.nasa.gov/offices/nesc/">NASA’s Engineering and Safety Center</a> published reports of their joint investigation into the causes of unintended acceleration in Toyota vehicles. While NHTSA led the overall effort and examined recall records, accident reports, and complaint statistics, the more technically focused team from NASA performed reviews of the electronics and embedded software at the heart of Toyota’s “electronic throttle control subsystem” (ETCS). Redacted public versions of the official reports from each agency, together with a number of related documents, can be found at http://www.nhtsa.gov/UA.</p>
<p>These reports are very interesting in what they have to say about the quality of Toyota’s firmware and NASA’s review of the same.  However, of greater significance is what they are not able to say about unintended acceleration.  It appears that NASA did not follow a number of best practices for reviewing embedded software source code that might have identified useful evidence.  In brief, NASA failed to find a firmware cause of unintended acceleration—but their review also fails to rule out firmware causes entirely.  </p>
<p>This article describes a set of five recommended practices for firmware source code review that are based on my experiences as both an embedded software developer and as an expert witness.  Each of the recommendations will consider what more could have been done to determine whether Toyota’s ETCS firmware played a role in any of the unintended acceleration incidents.  The five recommended practices are: (1) ask for the bug list; (2) insist on an executable; (3) reproduce the development environment; (4) try for the version control repository; and (5) remember the hardware.  The relative value and importance of the individual practices will vary by type of litigation, so the recommendations are presented in the order that is most readable.</p>
<p><strong>Ask for the Bug List</strong></p>
<p>Any serious litigation involving embedded software will require an expert review of the source code.  The source code should be requested early in the process of discovery.  Owners of source code tend to strenuously resist such requests but procedures limiting access to the source code to only certain named and pre-approved experts and only under physical security (often a non-networked computer with no removable storage in a locked room) tend to be agreed upon or ordered by a judge.</p>
<p>Software development organizations commonly keep additional records that may prove more important or useful than a mere copy of the source code.  Any reasonably thorough software team will maintain a bug list (a.k.a., defect database) describing most or all of the problems observed in the software along with the current status of each (e.g., “fixed in v2.2” or “still under investigation”).  The list of bugs fixed and known—or the company’s lack of such a list—is germane to issues of software quality.  Thus the bug list should be routinely requested and supplied in discovery.  (It is also recommended that a request be made for copies of software design documents, coding standards, build logs and associated tool outputs, testing logs, and other artifacts of the embedded software design and development process.)</p>
<p>Very nearly every piece of software ever written has defects, both known and unknown.  Thus the bug list provides helpful guidance to a reviewer of the source code.  Often, for example, bugs cluster in specific source files in need of major rework.  To ignore the company’s own records of known bugs, as the NASA reviewers apparently did, is to examine a constitution without considering the historical reasons for the adoption of each section and amendment.  Indeed, a simple search of the text in Toyota’s bug list for the terms “stuck” and “fuel valve” might yet provide some useful information about unintended acceleration.</p>
<p><strong>Insist on an Executable</strong></p>
<p>In software parlance, the “executable” program is the binary version of the program that’s actually executed in the product.  The machine-readable executable is constructed from a set of human-readable source code files using software build tools such as compilers and linkers.  It is important to recognize that one set of source code files may be capable of producing multiple executables, based on tool configuration and options.</p>
<p>Though not human-readable, an executable program may provide valuable information to an expert reviewer.  For example, one common technique is to extract the human-readable “strings” within the executable.  The strings in an executable program include information such as on-screen messages to the user (e.g., “Press the ‘?’ button for help.”).  In a copyright infringement case in which I once consulted, several strings in the defendant’s executable helpfully contained a phrase similar to “Copyright Plaintiff”!  You may not be so lucky, but isn’t it worth a try?</p>
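<p>String extraction is simple enough to sketch in a few lines of C.  The function below is an illustration only, with a hypothetical name (<code>dump_strings</code>); it assumes the binary image has already been loaded into memory, that only ASCII text is of interest, and that <code>min_len</code> is at least 1.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Prints each run of at least min_len printable ASCII characters found in
 * image[0..len) together with its offset, and returns the number of runs
 * printed.  Assumes min_len >= 1. */
size_t dump_strings(unsigned char const * image, size_t len, size_t min_len)
{
    size_t count     = 0;
    size_t run_start = 0;

    /* Iterate one step past the end so that a trailing run is flushed. */
    for (size_t i = 0; i <= len; i++)
    {
        int const printable = (i < len) && isprint(image[i]);

        if (!printable)
        {
            size_t const run_len = i - run_start;

            if (run_len >= min_len)
            {
                printf("%06zx: %.*s\n", run_start, (int) run_len,
                       (char const *) &image[run_start]);
                count++;
            }
            run_start = i + 1;
        }
    }
    return count;
}
```

<p>A minimum length of four or more characters usually filters out most of the accidental matches in machine code.</p>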
<p>It may also be possible to reverse engineer or disassemble an executable file into a more human-readable form.  Disassembly could be important in cases of alleged patent infringement, for example, where what looks like an infringement of a method claim in the source code might be unused code or not actually part of the executable in the product as used by customers.</p>
<p>Sometimes it is easy to extract the executable directly from the product for expert examination—in which case the expert should engage in this step.  For instance, software running on Microsoft Windows consists of an executable file with the extension .EXE, which is easily extracted.  However, the executable programs in most embedded systems are difficult, at best, to extract.   (Note that if it is possible for the expert to extract an executable from one or more exemplars of the product, an automated comparison should always be made between the installed and produced binary files.  You never know what you may find and any difference could have important implications for the facts underlying the case.)  Extraction of Toyota’s ETCS firmware might not be physically possible.  Thus the legal team should insist on production of the executable(s) actually used by the relevant customers.</p>
<p><strong>Reproduce the Development Environment</strong></p>
<p>The dichotomy between source code and executable code and the inability of even most software experts to make much sense of binary code can create problems in the factual landscape of litigation.  For example, suppose that the source code produced by Toyota was inadvertently incomplete in that it was missing two or three source code files.  Even an expert reviewer looking at the source code might not know about the absent files.  For example, if the bug the expert is looking for is related to fuel valve control and the code related to that subject doesn’t reference the missing files, the reviewer may not notice their absence.  No expert can spot a bug in a missing file.</p>
<p>Fortunately, there is a reliable way for an expert to confirm that she has been provided with all of the source code.  The objective is simply stated: reproduce the software build tools setup and compile the produced source code. To do this it is necessary to have a copy of the development team’s detailed build settings, such as make files, preprocessor defines, and linker control files.  If the build process completes and produces an executable, it is certain the other party has provided a complete copy of the source code.  (Additional technical details include the need to start with a “clean” set of files that contains no object files or libraries.  It may also be necessary to obtain third-party header files or libraries.)  </p>
<p>Furthermore, if the executable as built matches the executable as produced (actually, ideally, the executable as extracted from the product) bit by binary bit, it is certain that the other party has provided a true and correct version of the source code.  Unfortunately, trying to prove this part may take longer than just completing a build; the build could fail to produce the desired proof for a variety of reasons.  The details here get complicated: to get exactly the same output executable, it is necessary to use all of the following: precisely the same version of the compiler, linker, and each other build tool as the original developers; precisely the same configuration of each of those tools; and precisely the same set of build instructions.  Even a slight variation in just one of these details will generally produce an executable that doesn’t match the other binary image at all—just as the wrong version of the source code would.</p>
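<p>The bit-by-bit comparison itself is the easy part.  A minimal sketch, assuming both images have already been read into memory, is shown here; the function name <code>first_mismatch</code> and the <code>IMAGES_MATCH</code> sentinel are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel returned when the two images are identical. */
#define IMAGES_MATCH SIZE_MAX

/* Compares the executable as built against the executable as produced.
 * Returns the offset of the first differing byte.  If one image is a
 * strict prefix of the other, returns the length of the shorter image.
 * Returns IMAGES_MATCH when the images are byte-for-byte identical. */
size_t first_mismatch(unsigned char const * built,    size_t built_len,
                      unsigned char const * produced, size_t produced_len)
{
    size_t const common = (built_len < produced_len) ? built_len
                                                     : produced_len;

    for (size_t i = 0; i < common; i++)
    {
        if (built[i] != produced[i])
        {
            return i;
        }
    }
    return (built_len == produced_len) ? IMAGES_MATCH : common;
}
```

<p>Knowing the offset of the first difference can help decide whether a mismatch is something benign, such as an embedded build timestamp, or a sign of different source code.</p>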
<p><strong>Try for the Version Control Repository</strong></p>
<p>Embedded software source code is never created in an instant.  All software is developed one layer at a time over a period of months or years in the same way that a bridge and the attached roadways exist in numerous interim configurations during their construction.  The version control repository for a software program is like a series of time-lapse photos tracking the day-by-day changes in the construction of the bridge.  But there is one considerable difference: it is possible to go back to one of those source code snapshots and rebuild the executable of that particular version.  This becomes critically important when multiple software versions will be deployed over a number of years.  In the automotive industry, for example, it must be possible to give one customer a bug fix for his v2.1 firmware while also working on the new v3.0 firmware to be released the following model year.</p>
<p>Consider, for the sake of discussion, that the executable version of Toyota’s ETCS v2.1 firmware that was installed in the factory in one million cars around the world had an undiscovered bug that could result in unintended acceleration under certain rare operating conditions.  Now further suppose that this bug was (perhaps unintentionally) eliminated in the v2.2 source code, from which a subsequent executable was created and installed at the factory into millions more cars with the same model names—and also as an upgrade into some of the original one million cars as they visited dealers for scheduled maintenance.  In this scenario, an examination of the v2.2 source code proves nothing about the safety of the hundreds of thousands of cars still with v2.1 under the hood.</p>
<p>Gaining access to the entire version control repository containing all of the past versions of a company’s firmware source code through discovery may be out of the question.  For example, a judge in a source code copyright and trade secrets case I consulted in would only allow the plaintiff to choose one calendar date and to then receive a snapshot of the defendant’s source code from that specific date.  If the plaintiff was lucky it would find evidence of their proprietary code in that specific snapshot.  But the observed absence of their proprietary code from that one specific snapshot doesn’t prove the alleged theft didn’t happen earlier or later in time.</p>
<p>There are some problems with examination of an entire version control repository.  It may be difficult to make sense of the repository’s structure.  Or, if the structure can be understood, it might take many times as long to perform a thorough review of the major and minor versions of the various source code files as it would to just review one snapshot in time.  At first glance, many of those files would appear the same or similar in every version—but subtle differences could be important to making a case.  To really be productive with that volume of code, it may be necessary to obtain a chronological schedule provided by a bug list and/or other production documents describing the source code at various points in time.</p>
<p><strong>Remember the Hardware</strong></p>
<p>Embedded software is always written with the hardware platform in mind and should be reviewed in the same manner.  For example, it is only possible to properly reverse engineer or disassemble an executable program once the specific microprocessor (e.g., Pentium, PowerPC, or ARM) is known.  But knowing the processor is just the beginning, because the hardware and software are intertwined in complex ways in such embedded systems.</p>
<p>Some features of the hardware are enabled or active only when the hardware is in a particular configuration.  For instance, consider an embedded system with a network interface, such as an Ethernet jack that is only powered when a cable is mechanically inserted.  Some or all of the software required to send and receive messages over this network may not be executed until a cable is inserted.  A proper analysis of the software needs to keep hardware-software interactions like this in perspective.  Ideally, testing of the firmware should be done on the hardware as configured in exemplars of the units at issue, so it is useful to ask for hardware during discovery if you are not able to acquire exemplars in other ways.  It is not clear from the redacted reports if NHTSA’s testing of certain Toyota Camrys was done using the same firmware version on exactly the same hardware as the owners who experienced unintended acceleration.  Hardware interactions can be one of the most important considerations of all when analyzing embedded software.</p>
<p>Sometimes a bug is not visible in the software itself.  Such a bug may result from a combination of hardware and software behaviors or multi-processor interactions.  For example, one motor control system I’m familiar with had a dangerous <a href="/barr-code/2010/02/firmware-specific-bug-1-race-condition/">race condition</a>.   The bug, though, was the result of an unforeseen mismatch between the hardware reaction time and the software reaction time around a sequence of commands to the motor.</p>
<p><strong>Additional Analysis Required</strong></p>
<p>As you can see, the review of embedded software can be complicated.  This is partly because the hardware of each embedded system is unique.  In addition, the system as a whole generally involves complex interactions between hardware, software, and user.  An expert in embedded software should typically have a degree in electrical engineering, computer engineering, or computer science plus years of relevant experience designing embedded systems and programming in the relevant language(s).</p>
<p>The five best practices presented here are meant to establish the critical importance of making certain specific requests early in the legal discovery process.  They are by no means the only types of analysis that should be performed on the source code.  For example, in any case involving the quality or reliability of embedded software, the source code should be tested via static analysis tools.  This and other types of technical analysis should be well understood by any expert witness or litigation consultant with the proper background.</p>
<p>In the case of Toyota’s unintended acceleration issues, I hope that expert review in the class action litigation against Toyota will include these and other additional types of analysis to identify all of the potential causes and determine if embedded software played any role. Though government funds for analysis by NASA are understandably limited, it is suggested that transportation safety organizations, such as NHTSA, should establish rules that ensure that future investigations are more thorough and that safety-related technical findings in litigation cannot be hidden behind the veil of secrecy of a settlement agreement.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/09/firmware-forensics-best-practices-in-embedded-software-source-code-discovery/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t Follow These 5 Dangerous Coding Standard Rules</title>
		<link>http://embeddedgurus.com/barr-code/2011/08/dont-follow-these-5-dangerous-coding-standard-rules/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/08/dont-follow-these-5-dangerous-coding-standard-rules/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 19:13:57 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[Efficient C/C++]]></category>
		<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=670</guid>
		<description><![CDATA[Over the summer I happened across a brief blog post by another firmware developer in which he presented ten C coding rules for better embedded C code. I had an immediate strong negative reaction to half of his rules and later came to dislike a few more, so I&#8217;m going to describe what I don&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>Over the summer I happened across a brief blog post by another firmware developer in which he presented ten C coding rules for better embedded C code.  I had an immediate strong negative reaction to half of his rules and later came to dislike a few more, so I&#8217;m going to describe what I don&#8217;t like about each.  I&#8217;ll refer to this author as <em>BadAdvice</em>.  I hope that if you have followed rules like the five below my comments will persuade you to move away from those toward <a href="http://netrino.com/coding-standard">a set of embedded C coding rules that keep bugs out</a>.  If you disagree, please start a constructive discussion in the comments.</p>
<p><strong>Bad Rule #1: Do not divide; use right shift.</strong></p>
<p>As worded, the above rule is way too broad.  It&#8217;s not possible to always avoid C&#8217;s division operator.  First of all, right shifting only works as a substitute for division when it is integer division and the denominator is a power of two (e.g., right shift by one bit to divide by 2, two bits to divide by 4, etc.).  But I&#8217;ll give BadAdvice the benefit of the doubt and assume that he meant to say you should &#8220;use right shift as a substitute for division whenever possible&#8221;.</p>
<p>For his example, BadAdvice shows code to compute an average over 16 integer data samples, which are accumulated into a variable <code>sum</code>, during the first 16 iterations of a loop.  On the 17th iteration, the average is computed by right shifting <code>sum</code> by 4 bits (i.e., dividing by 16).  Perhaps the worst thing about this example code is how tightly it is tied to a pair of <code>#define</code>s for the magic numbers 16 and 4.  A simple but likely refactoring to average over 15 instead of 16 samples would break the entire example&#8211;you&#8217;d have to change from the right shift to a proper divide.  It&#8217;s also easy to imagine someone changing <code>AVG_COUNT</code> from 16 to 15 without noticing the shift; the result would be a bug in which the sum of 15 samples is still right shifted by 4 bits.</p>
<p><em>Better Rule: Shift bits when you mean to shift bits and divide when you mean to divide.</em></p>
<p>There are many sources of bugs in software programs.  The original programmer creates some bugs.  Other bugs result from misunderstandings by those who later maintain, extend, port, and/or reuse the code.  Thus coding rules should emphasize readability and portability most highly.  The choice to deviate from a good coding rule in favor of efficiency should be taken only within a subset of the code.  Unless there is a very specific function or construct that needs to be hand optimized, efficiency concerns should be left to the compiler.</p>
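<p>A minimal sketch of the better rule, assuming a buffer of unsigned 16-bit samples. The names <code>AVG_COUNT</code> (reused from the discussion above) and <code>average_samples</code> are illustrative and are not taken from BadAdvice&#8217;s post. Because the code divides by the named count, changing the count to 15 remains correct, and a decent compiler is free to emit a shift whenever the count happens to be a power of two.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define AVG_COUNT 16u  /* hypothetical sample count; need not be a power of two */

/* Average AVG_COUNT samples using plain integer division. */
uint16_t average_samples(const uint16_t samples[AVG_COUNT])
{
    uint32_t sum = 0u;  /* 32 bits leaves ample headroom for the sum */

    for (size_t i = 0u; i < AVG_COUNT; i++)
    {
        sum += samples[i];
    }

    /* Say what is meant: divide by the number of samples. */
    return (uint16_t)(sum / AVG_COUNT);
}
```

<p>Note that there is only one constant to maintain here, so the count and the divisor cannot drift apart.</p>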
<p><strong>Bad Rule #2: Use variable types in relation to the maximum value that variable may take.</strong></p>
<p>BadAdvice gives the example of a variable named <code>seconds</code>, which holds integer values from 0 to 59.  And he shows choosing <code>char</code> for the type over <code>int</code>.  His stated goal is to reduce memory use.</p>
<p>In principle, I agree with the underlying practices of not always declaring variables <code>int</code> and choosing the type (and signedness) based on the maximum range of values.  However, I think it essential that any practice like this be matched with a corresponding practice of always declaring specifically sized variables using <a href="http://www.netrino.com/Embedded-Systems/How-To/C-Fixed-Width-Integers-C99">C99&#8217;s portable fixed-width integer types</a>.</p>
<p>It is impossible to understand the reasoning of the original programmer from <code>unsigned char seconds;</code>.  Did he choose <code>char</code> because it is big enough or for some other reason?  (Remember too that a plain <code>char</code> may be naturally signed or unsigned, depending on the compiler.  Perhaps the original programmer even knows his compiler&#8217;s <code>char</code>s are default <code>unsigned</code> and omits that keyword.)  The intent behind variables declared <code>short</code> and <code>long</code> is at least as difficult to decipher.  A <code>short</code> integer may be 16-bits or 32-bits (or something else), depending on the compiler; a width the original programmer may have (or may not have) relied upon.</p>
<p><em>Better Rule: Whenever the width of an integer matters, use C99&#8217;s portable fixed-width integer types.</em></p>
<p>A variable declared <code>uint16_t</code> leaves no doubt about the original intent as it is very clearly meant to be a container for an unsigned integer value no wider than 16-bits.  This type selection adds new and useful information to the source code and makes programs both more readable and more portable.  Now that C99 has standardized the names of fixed-width integer types, declarations involving <code>short</code> and <code>long</code> should no longer be used.  Even <code>char</code> should only be used for actual character (i.e., ASCII) data.  (Of course, there may still be <code>int</code> variables around, where size does not matter, such as in loop counters.)</p>
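<p>As a small illustration, here is how the <code>seconds</code> example might look with a fixed-width type. The helper name <code>next_second</code> and the constant <code>SECONDS_PER_MINUTE</code> are assumptions made for this sketch only.</p>

```c
#include <stdint.h>

#define SECONDS_PER_MINUTE 60u

/* Return the seconds value that follows `seconds`, wrapping 59 -> 0.
 * uint8_t states the intent: an unsigned value that fits in 8 bits. */
uint8_t next_second(uint8_t seconds)
{
    return (uint8_t)((seconds + 1u) % SECONDS_PER_MINUTE);
}
```

<p>A reader of this declaration does not have to guess whether the type is signed or how wide it is on a given compiler.</p>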
<p><strong>Bad Rule #3: Avoid &gt;= and use &lt;.</strong></p>
<p>As worded above, I can&#8217;t say I understand this rule or its goal sufficiently, but to illustrate it BadAdvice gives the specific example of an if-else if wherein he recommends <code>if (speed &lt; 100) ... else if (speed &gt; 99)</code> instead of <code>if (speed &lt; 100) ... else if (speed &gt;= 100)</code>.  Say what?  First of all, why not just use <code>else</code> for that specific scenario, since <code>speed</code> must be either below 100 or at least 100?</p>
<p>Even if we assume we need to test for less than 100 first and then for greater than or equal to 100 second, why would anyone in their right mind prefer to use greater than 99?  That would be confusing to any reader of the code.  To me it reads like a bug and I need to keep going back over it to find the logical problem with the apparently mismatched range checks.  Additionally, I believe that BadAdvice&#8217;s terse rationale that &#8220;Benefits: Lesser Code&#8221; is simply untrue. Any half decent compiler should be able to optimize either comparison as needed for the underlying processor.</p>
<p><em>Better Rule: Use whatever comparison operator is easiest to read in a given situation.</em></p>
<p>One of the very best things any embedded programmer can do is to make their code as readable as possible to as broad an audience as possible.  That way another programmer who needs to modify your code, a peer doing code review to help you find bugs, or even you years later, will find the code hard to misinterpret.</p>
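<p>A sketch of the speed check written for the reader rather than for an imagined code-size saving. The names <code>SPEED_LIMIT</code>, <code>speed_status_t</code>, and <code>check_speed</code> are hypothetical; only the threshold of 100 comes from the example above.</p>

```c
#include <stdint.h>

#define SPEED_LIMIT 100u  /* threshold from the example; units unspecified */

typedef enum { SPEED_OK, SPEED_TOO_FAST } speed_status_t;

/* Classify a speed reading against the limit. */
speed_status_t check_speed(uint16_t speed)
{
    speed_status_t status;

    if (speed < SPEED_LIMIT)
    {
        status = SPEED_OK;
    }
    else  /* speed >= SPEED_LIMIT: the only remaining case */
    {
        status = SPEED_TOO_FAST;
    }

    return status;
}
```

<p>Both branches refer to the same named limit, so there is no mismatched pair of numbers for a reviewer to puzzle over.</p>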
<p><strong>Bad Rule #4: Avoid variable initialization while defining.</strong></p>
<p>BadAdvice says that following the above rule will make initialization faster.  He gives the example of <code>unsigned char MyVariable = 100;</code> (not preferred) vs:</p>
<p><code><br />
#define INITIAL_VALUE 100<br />
unsigned char MyVariable;<br />
// Before entering forever loop in main<br />
MyVariable = INITIAL_VALUE<br />
</code></p>
<p>Though it&#8217;s unclear from the above, let&#8217;s assume that <code>MyVariable</code> is a local stack variable.  (It could also be global, the way his pseudo code is written.)  I don&#8217;t think there should be a (portably) noticeable efficiency gain from switching to the latter.  And I do think that following this rule creates an opening to forget to do the initialization or to unintentionally place the initialization code within a conditional clause.</p>
<p><em>Better Rule: Initialize every variable as soon as you know the initial value.</em></p>
<p>I&#8217;d much rather see every variable initialized on creation with perhaps the creation of the variable postponed as long as possible.  If you&#8217;re using a C99 or C++ compiler, you can declare a variable anywhere within the body of a function.</p>
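<p>The following sketch shows the idea with C99 declarations placed at first use. The function <code>sum_bytes</code> is a made-up example, not code from either blog.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Add up `length` bytes starting at p_data. */
uint32_t sum_bytes(const uint8_t * p_data, size_t length)
{
    uint32_t sum = 0u;  /* initialized in the same statement that creates it */

    for (size_t i = 0u; i < length; i++)  /* C99: counter declared at first use */
    {
        const uint8_t byte = p_data[i];   /* created only once its value is known */

        sum += byte;
    }

    return sum;
}
```

<p>No variable in this function exists in an uninitialized state, so there is no window in which a forgotten or conditionally skipped assignment could matter.</p>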
<p><strong>Bad Rule #5: Use #defines for constant numbers.</strong></p>
<p>The example given for this rule is of defining three constant values, including <code>#define ON 1</code> and <code>#define OFF 0</code>.  The rationale is &#8220;Increased convenience of changing values in a single place for the whole file. Provides structure to the code.&#8221;  And I agree that using named constants instead of magic numbers elsewhere in the code is a valuable practice.  However, I think there is an even better way to go about this.</p>
<p><em>Better Rule: Declare constants using <code>const</code> or <code>enum</code>.</em></p>
<p>C&#8217;s <code>const</code> keyword can be used to declare a variable of any type as unable to be changed at run-time.  This is a preferable way of declaring constants because each constant then has a type, which lets the compiler make comparisons properly and type-check the constant when it is passed as a parameter to a function call.  Enumeration sets may be used instead for integer constants that come in groups, such as <code>enum { OFF = 0, ON };</code>.</p>
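<p>A brief sketch combining both forms. The limit <code>MAX_SPEED_RPM</code>, the type name <code>switch_state_t</code>, and the function <code>motor_state_for</code> are hypothetical names chosen for illustration.</p>

```c
#include <stdint.h>

/* Integer constants that come as a group: use an enum. */
typedef enum { OFF = 0, ON } switch_state_t;

/* A typed, read-only constant instead of an untyped #define. */
static const uint16_t MAX_SPEED_RPM = 3000u;  /* hypothetical limit */

/* Decide whether the motor should run for a requested speed. */
switch_state_t motor_state_for(uint16_t requested_rpm)
{
    switch_state_t state = OFF;

    if ((requested_rpm > 0u) && (requested_rpm <= MAX_SPEED_RPM))
    {
        state = ON;
    }

    return state;
}
```

<p>One caveat worth knowing: in C (unlike C++) a <code>const</code> object is not an integer constant expression, so array sizes and <code>case</code> labels still require an <code>enum</code> constant or a <code>#define</code>.</p>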
<p><strong>Final Thoughts</strong></p>
<p>There are two scary things about these and a few of the other rules on BadAdvice&#8217;s blog.  First, they are out there on the Internet to be found with a search for embedded C coding rules.  Second, BadAdvice&#8217;s bio says he works on medical device design.  I&#8217;m not sure which is worse.  But I do hope the above reasoning and proposed better rules get you thinking about how to develop more reliable embedded software with fewer bugs.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/08/dont-follow-these-5-dangerous-coding-standard-rules/feed/</wfw:commentRss>
		<slash:comments>57</slash:comments>
		</item>
		<item>
		<title>Is &#8220;(uint16_t) -1&#8221; Portable C Code?</title>
		<link>http://embeddedgurus.com/barr-code/2011/06/is-uint16_t-1-portable-c-code/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/06/is-uint16_t-1-portable-c-code/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 14:59:46 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=641</guid>
		<description><![CDATA[Twice in the last month, Netrino&#8217;s engineers have run across third-party middleware that included a statement of the form: uint16_t variable = (uint16_t) -1; which we take as the author&#8217;s clever way of coding: 0xFFFF We aren&#8217;t naturally inclined to like the obfuscation anyway, but also wondered if &#8220;(uint16_t) -1&#8243; is even portable C code? [...]]]></description>
			<content:encoded><![CDATA[<p>Twice in the last month, Netrino&#8217;s engineers have run across third-party middleware that included a statement of the form:</p>
<p><code>uint16_t variable = (uint16_t) -1;</code></p>
<p>which we take as the author&#8217;s clever way of coding:</p>
<p><code>0xFFFF</code></p>
<p>We aren&#8217;t naturally inclined to like the obfuscation anyway, but also wondered if &#8220;(uint16_t) -1&#8221; is even portable C code? And, supposing it is portable, is there some advantage we don&#8217;t know about that suggests using that form over the hex literal? In the process of researching these issues, I learned a helpful fact or two worth sharing.</p>
<p><strong>Q: Is the result of &#8220;(uint16_t) -1&#8221; guaranteed (by the ISO C standard) to be 0xFFFF?</strong></p>
<p>A: Yes, on any implementation that provides uint16_t. The C standard defines the conversion of an out-of-range value to an unsigned type arithmetically (the value is reduced modulo one more than the maximum value of the new type), so the result is 0xFFFF regardless of how the CPU represents signed integers internally. (For signed integers, most/all processors will use the common 2&#8217;s complement representation underneath&#8211;even though that&#8217;s not required in any way by the language standard.)</p>
<p><strong>Q: Is there any advantage to writing 0xFFFF that way?</strong></p>
<p>A: According to the C99 Standard, all conforming implementations support uint_least16_t, but some may not support <a href="http://www.netrino.com/Embedded-Systems/How-To/C-Fixed-Width-Integers-C99">uint16_t</a>.  If the platform doesn&#8217;t support uint16_t, then &#8220;(uint16_t) -1&#8221; won&#8217;t compile, but 0xFFFF will compile as a value of some larger unsigned integer type (i.e., a bug waiting to happen).</p>
<p>Of course, platforms that don&#8217;t have a fixed-width 16-bit unsigned capability are rare, though it may be that some DSPs fall into that category. The same issue applies to <a href="http://www.netrino.com/Embedded-Systems/How-To/C-Fixed-Width-Integers-C99">uint32_t</a> and 0xFFFFFFFF, of course.  However, I suspect platforms that don&#8217;t have a fixed-width 32-bit unsigned capability are even rarer.</p>
<p><strong>Q: What is the best way to represent the maximum unsigned integer value of a given size?</strong></p>
<p>A: The very best way to represent the maximum values for unsigned (and signed) fixed-width types is to use the constants named in C99&#8217;s stdint.h header file. These are of the form UINTn_MAX (and INTn_MAX) where n is the number of bits (e.g., UINT16_MAX). That is guaranteed to either work or not compile, with no middle ground for bugs.</p>
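<p>For instance, a sentinel value might be written as follows. This is only a sketch; <code>INVALID_HANDLE</code> and <code>handle_is_valid</code> are hypothetical names, not part of the middleware mentioned above.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: the largest uint16_t value marks "no handle".
 * UINT16_MAX is only defined when uint16_t itself exists, so this
 * either means exactly what it says or fails to compile. */
#define INVALID_HANDLE UINT16_MAX

/* Report whether a handle holds anything other than the sentinel. */
bool handle_is_valid(uint16_t handle)
{
    return (handle != INVALID_HANDLE);
}
```

<p>The named constant also tells the reader why the value was chosen, which neither the cast nor the hex literal does.</p>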
<p><em>Hat Tip</em>: Many thanks to C and C++ standards guru <a href="http://www.dansaks.com">Dan Saks</a> for help with these answers.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/06/is-uint16_t-1-portable-c-code/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>How to Enforce Coding Standards (Automatically)</title>
		<link>http://embeddedgurus.com/barr-code/2011/05/how-to-enforce-coding-standards-automatically/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/05/how-to-enforce-coding-standards-automatically/#comments</comments>
		<pubDate>Wed, 25 May 2011 15:01:34 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[ethics]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[standards]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=623</guid>
		<description><![CDATA[Coding standards can be an important tool in the fight to keep bugs out of embedded software. Unfortunately, too many well-intentioned (especially, corporate) coding standards are ineffective and gather more dust than followers. The hard truth is that enforcement of coding standards too often depends on programmers already under deadline pressure to be disciplined while [...]]]></description>
			<content:encoded><![CDATA[<p>Coding standards can be an important tool in the fight to <a href="http://www.eetimes.com/discussion/other/4008255/Bug-killing-standards-for-firmware-coding">keep bugs out of embedded software</a>. Unfortunately, too many well-intentioned (especially, corporate) coding standards are ineffective and gather more dust than followers. The hard truth is that enforcement of coding standards too often depends on programmers already under deadline pressure to be disciplined while they code and/or to make time to perform peer code reviews. And when peer reviews are done in this scenario, they too easily devolve into compliance discussions that miss the forest for the trees.</p>
<p>To ensure your selected coding standard is followed and thus effective, your team should find as many automated ways to enforce as many of its rules as possible. And you should make such automated rule checking part of the everyday software build process. Ideally, you would also restrict version control check-ins to just code that has passed all the automated checks. No code that breaks any of these automatable rules should be allowed in peer code reviews. That way, code reviewers can focus their limited hours on (1) what the code is supposed to do, (2) whether it does so correctly, and (3) whether the code and comments together are easily understood by all involved.</p>
<p>One of the ways <a href="http://netrino.com">Netrino</a> has found to increase compliance with its <a href="http://netrino.com/Coding-Standard">Embedded C Coding Standard</a> is by configuring static analysis tools we already use to automatically enforce individual rules.</p>
<p>Perhaps the best option is to use a static analysis tool that includes built-in support for your chosen coding standard and/or the ability to be customized to proprietary rules. <a href="http://ldra.com">LDRA</a>&#8216;s static analysis engine, a screenshot from which is shown in the image below, comes preconfigured to check about 80% of the &#8220;<a href="http://www.ldra.com/netrino.asp">NETRINO</a>&#8221; coding standard rules, as well as a number of other widely used coding standards.</p>
<p align="center"><a href="http://www.ldra.com/images/integrations/Figure1_Netrino_300.JPG"><img width="700" src="http://www.ldra.com/images/integrations/Figure1_Netrino_300.JPG" alt="The LDRA Tool Suite Enforces Netrino's Embedded C Coding Standard" /></a></p>
<p>But there are other, less expensive, options available as well. Two tools that we have used are <a href="http://msquaredtechnologies.com/m2rsm/index.htm">RSM</a> and <a href="http://gimpel.com/">PC-Lint</a> from M Squared Technologies and Gimpel Software, respectively. Neither can easily and independently enforce all of our <a href="http://netrino.com/Coding-Standard">Embedded C Coding Standard</a> rules. But used together and properly configured, these two tools offer a low-cost automated coding standards enforcement mechanism that covers a large percentage of the rules.</p>
<p><strong>Configuring RSM</strong></p>
<p>The following <a href="http://msquaredtechnologies.com/m2rsm/index.htm">RSM</a> outputs can be examined and configuration options set (in version 7.75) to assist in the automated enforcement of the identified rules from the <a href="http://netrino.com/Coding-Standard">Embedded C Coding Standard</a>. Pricing for the RSM tool, which runs on Windows, Linux, and other versions of Unix (including Mac OS X), is online at <a href="http://msquaredtechnologies.com/m2rsm/order/">http://msquaredtechnologies.com/m2rsm/order/</a>.</p>
<table>
<tr>
<td><strong>Rule #</strong></td>
<td><strong>Brief Description</strong></td>
<td><strong>Tool Configuration Notes</strong></td>
</tr>
<tr>
<td>1.2.a</td>
<td>Line length limited to 80 characters.</td>
<td>Quality Notice 1</td>
</tr>
<tr>
<td>1.3.a</td>
<td>Braces surround all blocks of code.</td>
<td>Quality Notice 22</td>
</tr>
<tr>
<td>1.7.c</td>
<td>Keyword goto not used.</td>
<td>Quality Notice 9</td>
</tr>
<tr>
<td>1.7.d</td>
<td>Keyword continue not used.</td>
<td>Quality Notice 43</td>
</tr>
<tr>
<td>1.7.e</td>
<td>Keyword break not used outside switch.</td>
<td>Quality Notice 44</td>
</tr>
<tr>
<td>2.2</td>
<td>Location and content of comments.</td>
<td>Quality Notices 17, 20, 51; Options -Es -Ec -EC</td>
</tr>
<tr>
<td>3.1</td>
<td>Use of white space.</td>
<td>Quality Notices 16, 19</td>
</tr>
<tr>
<td>3.5.a</td>
<td>No tab characters.</td>
<td>Quality Notice 30; Option -Dt</td>
</tr>
<tr>
<td>3.6.a</td>
<td>UNIX-style (single-character) linefeeds.</td>
<td>Option -Du</td>
</tr>
<tr>
<td>4.1.a-d</td>
<td>Module naming conventions.</td>
<td>Option -Rn</td>
</tr>
<tr>
<td>4.2.a</td>
<td>Precisely one header file per source file.</td>
<td>Option -Rn</td>
</tr>
<tr>
<td>6.1.a-e</td>
<td>Function naming conventions.</td>
<td>Quality Notice 2; Option -l</td>
</tr>
<tr>
<td>6.2</td>
<td>Cyclomatic complexity of functions.</td>
<td>Quality Notices 10, 18, 27; Option -c</td>
</tr>
<tr>
<td>6.3.b.i-iii</td>
<td>Preprocessor macro safety mechanisms.</td>
<td>Option -m</td>
</tr>
<tr>
<td>8.2.d</td>
<td>If-else if statements always have else.</td>
<td>Quality Notice 22</td>
</tr>
<tr>
<td>8.3.b</td>
<td>Switch statements always have default.</td>
<td>Quality Notices 13, 14, 56</td>
</tr>
<tr>
<td>8.5.a</td>
<td>No unconditional jumps.</td>
<td>Quality Notices 9, 43, 44</td>
</tr>
</table>
<p></p>
<p><strong>Configuring PC-Lint</strong></p>
<p>In a <a href="/barr-code/2011/06/how-to-enforce-coding-standards-using-pc-lint/">future blog post</a>, I will similarly identify <a href="http://gimpel.com">PC-Lint</a> configurations that can be used (in version 9.0) to assist in the automated enforcement of specific rules from the <a href="http://netrino.com/Coding-Standard">Embedded C Coding Standard</a>. Pricing for the PC-Lint tool, which runs on Windows, Linux, and other versions of Unix (including Mac OS X), is online at <a href="http://gimpel.com/html/order.htm">http://gimpel.com/html/order.htm</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/05/how-to-enforce-coding-standards-automatically/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What NHTSA/NASA Didn&#8217;t Consider re: Toyota&#8217;s Firmware</title>
		<link>http://embeddedgurus.com/barr-code/2011/03/what-nhtsanasa-didnt-consider-re-toyotas-firmware/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/03/what-nhtsanasa-didnt-consider-re-toyotas-firmware/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 23:10:54 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[engineering]]></category>
		<category><![CDATA[ethics]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=548</guid>
		<description><![CDATA[In a blog post yesterday (Unintended Acceleration and Other Embedded Software Bugs), I wrote extensively on the report from NASA&#8217;s technical team regarding their analysis of the embedded software in Toyota&#8217;s ETCS-i system. My overall point was that it is hard to judge the quality of their analysis (and thereby the overall conclusion that the [...]]]></description>
			<content:encoded><![CDATA[<p>In a blog post yesterday (<a href="/barr-code/2011/03/unintended-acceleration-and-other-embedded-software-bugs/">Unintended Acceleration and Other Embedded Software Bugs</a>), I wrote extensively on the report from NASA&#8217;s technical team regarding their analysis of the embedded software in Toyota&#8217;s ETCS-i system. My overall point was that it is hard to judge the quality of their analysis (and thereby the overall conclusion that the software isn&#8217;t to blame for unintended accelerations) given the large number of redactions.</p>
<p>I need to put the report down and do some other work at this point, but I have a few other thoughts and observations worth writing down.</p>
<p><strong>Insufficient Explanations</strong></p>
<p>First, some of the explanations offered by Toyota, and apparently accepted by NASA, strike me as insufficient.  For example, at pages 129-132 of <a href="http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA_FR_Appendix_A_Software.pdf">Appendix A</a> to the NASA Report there is a discussion of <a href="http://en.wikipedia.org/wiki/Recursion">recursion</a> in the Toyota firmware. &#8220;The question then is how to verify that the indirect recursion in the ETCS-i does in fact terminate (i.e., has no infinite recursion) and does not cause a stack overflow.&#8221; </p>
<blockquote><p>
&#8220;For the case of stack overflow, [redacted phrase], and therefore a stack overflow condition cannot be detected precisely. It is likely, however, that overflow would cause some form of memory corruption, which would in turn cause some <strong>bad behavior</strong> that would then cause a watchdog timer reset. Toyota relies on this assumption to claim that stack overflow does not occur because no reset occurred during testing.&#8221; (emphasis added)
</p></blockquote>
<p>I have written about what really happens during stack overflow before (<a href="http://embeddedgurus.com/barr-code/2010/03/firmware-specific-bug-4-stack-overflow/">Firmware-Specific Bug #4: Stack Overflow</a>) and this explains why a reset may not result and also why it is so hard to trace a stack overflow back to that root cause. (From page 20, in NASA&#8217;s words: &#8220;The system stack is limited to just 4096 bytes, it is therefore important to secure that no execution can exceed the stack limit. This type of check is normally simple to perform in the absence of recursive procedures, which is standard in safety critical embedded software.&#8221;)</p>
<p>Similarly, &#8220;Toyota designed the software with a high margin of safety with respect to deadlines and timeliness. &#8230; [but] documented no formal verification that all tasks actually meet this deadline requirement.&#8221; and &#8220;All verification of timely behavior is accomplished with CPU load measurements and other measurement-based techniques.&#8221; It&#8217;s not clear to me if the NASA team is saying it buys those Toyota explanations or merely wanted to write them down. However, I do not see a sufficient explanation in this wording from page 132:</p>
<blockquote><p>
&#8220;The [worst case execution time] analysis and recursion analysis involve two distinctly different problems, but they have one thing in common: Both of their failure modes would result in a CPU reset. &#8230; These potential malfunctions, and many others such as concurrency deadlocks and CPU starvation, would <strong>eventually</strong> manifest as a spontaneous system reset.&#8221; (emphasis added)
</p></blockquote>
<p>Might not a <a href="http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-7-deadlock/">deadlock</a>, starvation, <a href="http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-8-priority-inversion/">priority inversion</a>, or infinite recursion be capable of producing a bit of &#8220;bad behavior&#8221; (perhaps even unintended acceleration) before that &#8220;eventual&#8221; reset? Or might not a stack overflow just corrupt one or a few important variables a little bit, resulting in bad behavior rather than or before a reset? These kinds of possibilities, even at very low probabilities, are important to consider in light of NASA&#8217;s calculation that the U.S.-owned Camry 2002-2007 fleet alone is running this software a cumulative one billion hours per year.</p>
<p><strong>Paths Not Taken</strong></p>
<p>My second observation is based upon reflection on the steps NASA might have taken in its review of Toyota&#8217;s ETCS-i firmware, but apparently did not. Specifically, there is no mention anywhere (unless it was entirely redacted) of: </p>
<ul>
<li><a href="http://www.netrino.com/Embedded-Systems/How-To/RMA-Rate-Monotonic-Algorithm">rate monotonic analysis</a>, which is a technique that Toyota could have used to validate the critical set of tasks with deadlines and higher priority ISRs (and that NASA could have applied in its review),</li>
<li><a href="http://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>, which NASA might have used as an additional winnowing tool to focus its limited time on particularly complex and hard to test routines,</li>
<li><a href="http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm089543.htm">hazard analysis and mitigation</a>, as those terms are defined by FDA guidelines regarding software contained in medical devices, nor</li>
<li>any discussion or review of Toyota&#8217;s specific software testing regimen and bug tracking system.</li>
</ul>
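<p>To give a sense of what a schedulability check of this kind involves, here is a minimal sketch of fixed-priority response-time analysis, one common way to validate a task set under rate monotonic priority assignment. Everything in it is an assumption made for illustration: the type <code>task_t</code>, the function <code>task_set_is_schedulable</code>, and any task timings fed to it are hypothetical and have nothing to do with Toyota&#8217;s actual firmware. The sketch also assumes deadlines equal to periods, nonzero periods, and no blocking, ISR load, or context-switch overhead.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t period_us;  /* T: period, also used as the deadline (must be > 0) */
    uint32_t wcet_us;    /* C: worst-case execution time */
} task_t;

/* Return true if every task meets its deadline in the worst case.
 * tasks[] must be sorted by ascending period, i.e. highest
 * rate-monotonic priority first. Values are assumed small enough
 * that the 32-bit sums below cannot overflow. */
bool task_set_is_schedulable(const task_t tasks[], size_t count)
{
    for (size_t i = 0u; i < count; i++)
    {
        uint32_t response  = tasks[i].wcet_us;
        bool     converged = false;

        while (!converged)
        {
            uint32_t next = tasks[i].wcet_us;

            for (size_t j = 0u; j < i; j++)
            {
                /* ceil(response / T_j): releases of higher-priority task j */
                uint32_t releases =
                    (response + tasks[j].period_us - 1u) / tasks[j].period_us;

                next += releases * tasks[j].wcet_us;
            }

            if (next > tasks[i].period_us)
            {
                return false;  /* worst-case response exceeds the deadline */
            }

            converged = (next == response);
            response  = next;
        }
    }

    return true;
}
```

<p>Unlike CPU load measurements, a calculation of this sort reasons about the worst case rather than the cases that happened to occur during testing.</p>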
<p>Importantly, there is also a complete absence of discussion of how Toyota&#8217;s ETCS-i firmware versions evolved over time. Which makes and models (and model years) had which versions of that firmware? (Presumably there were also hardware changes worthy of note.) Were updates or patches ever made to cars once they were sold, say while at the dealer during official recalls or other types of service?</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/03/what-nhtsanasa-didnt-consider-re-toyotas-firmware/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Unintended Acceleration and Other Embedded Software Bugs</title>
		<link>http://embeddedgurus.com/barr-code/2011/03/unintended-acceleration-and-other-embedded-software-bugs/</link>
		<comments>http://embeddedgurus.com/barr-code/2011/03/unintended-acceleration-and-other-embedded-software-bugs/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 19:09:54 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Coding Standards]]></category>
		<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[ethics]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[outsourcing]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[standards]]></category>
		<category><![CDATA[trends]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=517</guid>
		<description><![CDATA[Last month, NHTSA and the NASA Engineering and Safety Center (NESC) published reports of their joint investigation into the causes of unintended acceleration in Toyota vehicles. NASA&#8217;s multi-disciplinary NESC technical team was asked, by Congress, to assist NHTSA by performing a review of Toyota&#8217;s electronic throttle control and the associated embedded software. In carefully worded [...]]]></description>
			<content:encoded><![CDATA[<p>Last month, <a href="http://www.nhtsa.dot.gov">NHTSA</a> and the <a href="http://www.nasa.gov/offices/nesc/home/index.html">NASA Engineering and Safety Center (NESC)</a> published reports of their joint investigation into the causes of unintended acceleration in Toyota vehicles. NASA&#8217;s multi-disciplinary NESC technical team was asked, by Congress, to assist NHTSA by performing a review of Toyota&#8217;s electronic throttle control and the associated embedded software. In a carefully worded concluding statement, NASA stated that it &#8220;found no electronic flaws in Toyota vehicles capable of producing the large throttle openings required to create dangerous high-speed unintended acceleration incidents.&#8221; (The official reports and a number of supporting files are available for download at <a href="http://www.nhtsa.gov/UA">http://www.nhtsa.gov/UA</a>.)</p>
<p>The first thing you will notice if you join me in trying to judge the technical issues for yourself is the redactions: pages and pages of them. In parts, and entirely for unexplained reasons, this report on automotive electronics reads like the public version of a CIA Training Manual. I&#8217;ve observed that approximately 193 of the 1,061 pages released so far feature some level of redaction (via black boxes, which obscure anything from a single number, word, or phrase to a full table, page, or section). The redactions are at their worst in NASA&#8217;s <a href="http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA_FR_Appendix_A_Software.pdf">Appendix A</a>, which describes NASA&#8217;s review of Toyota&#8217;s embedded software in detail. More than half of all the pages with redactions (including the vast majority of fully redacted tables, pages, and sections) are in that Appendix.</p>
<p>Despite the redactions, we can still learn some interesting facts about Toyota&#8217;s embedded software and NASA&#8217;s technical review of the same.  The bulk of the below outlines what I&#8217;ve been able to make sense of in about two days of reading.  Throughout, my focus is on embedded software inside the electronic throttle control, so I&#8217;m leaving out considerations of other potential causes, including EMI (which NASA also investigated).  First a little background on the investigation.</p>
<p><strong>Background</strong></p>
<p>Although the inquiry was undertaken to examine unintended acceleration reports across all Toyota, Scion, and Lexus models, NASA focused its technical inquiry almost entirely on Toyota Camry models equipped with the <em>Electronic Throttle Control System, Intelligent</em> (ETCS-i). The Camry has long been among the top cars bought in the U.S., so this choice probably made finding relevant complaint data and affected vehicles easier for NHTSA. (BTW, NASA says the voluntary complaint database shows both that unintended accelerations were reported before the introduction of electronic throttle control and that press coverage and Congressional hearings can increase the volume of complaints.)</p>
<p>According to a press release by the company made upon publication of the NHTSA and NASA reports, Toyota&#8217;s ETCS-i has been installed in &#8220;more than 40 million cars and trucks sold around the world, including more than 16 million in the United States.&#8221; Undoubtedly, ETCS-i has also &#8220;made possible significant safety advances such as vehicle stability control and traction control.&#8221; But as with any other embedded system there have been refinements made through the years to both the electronics and the embedded software. </p>
<p>Though Toyota apparently made available, under agreed terms and via its attorneys, schematics, design documents, and source code &#8220;for multiple Camry years and versions&#8221; (Appendix A, p. 9) as well as many of the Japanese engineers involved in its design and evolution, NASA only closely examined one version. In NASA&#8217;s words, &#8220;The area of emphasis will be the 2005 Toyota Camry because this vehicle has a consistently high rate of reported &#8216;UA events&#8217; over all Toyota models and all years, when normalized to the number of each model and year, according to NHTSA data.&#8221; (p. 7) Except as otherwise stated, everything else in this column concerns the electronics and firmware found in that year, make, and model.</p>
<p><strong>Event Data Recorders</strong></p>
<p>Event Data Recorder (EDR) is the generic term for the automotive equivalent of an aircraft black box <a href="http://en.wikipedia.org/wiki/Flight_data_recorder">flight data recorder</a>. EDRs were first installed in cars in the early 1990s and have increased in use as well as sophistication in the years since. Generally speaking, the event data recorder is an embedded system residing within the airbag control module located in the front center of the engine compartment. The event data recorder is connected to other parts of the car&#8217;s electronics via the CAN bus and is always monitoring vehicle speed, the position of the brake and accelerator pedals, and other key parameters. </p>
<p>In the event of an impossibly high (for the vehicle operating normally) acceleration or deceleration sensor reading, Toyota&#8217;s latest event data recorders save the prior five 1 Hz samples of these parameters in a non-volatile memory area. Once saved, an event record can be read over the car&#8217;s On-Board Diagnostics (OBD) port (or, in the event of a more severe accident, directly from the airbag control module) via a special cable and PC software. If the airbag actually deploys, the event record will be permanently locked. The last 2 or 3 (depending on version) lesser &#8220;bump&#8221; records are also stored, but may be overwritten in a FIFO manner.</p>
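<p>To make the &#8220;prior five samples&#8221; idea concrete, here is a minimal sketch of a pre-trigger ring buffer in C. It is not Toyota&#8217;s implementation, which the reports do not describe. The type names, field names, and functions (<code>edr_sample_t</code>, <code>edr_record</code>, <code>edr_snapshot</code>) are hypothetical and chosen only for illustration.</p>

```c
#include <stdbool.h>
#include <stdint.h>

#define EDR_PRE_TRIGGER_SAMPLES 5u   /* five 1 Hz samples, per the text */

/* Hypothetical sample layout; the real record format is not public. */
typedef struct {
    uint8_t speed_kph;
    uint8_t accel_pedal_pct;
    bool    brake_on;
} edr_sample_t;

typedef struct {
    edr_sample_t ring[EDR_PRE_TRIGGER_SAMPLES];
    unsigned     next;    /* slot that the next sample will overwrite */
    unsigned     count;   /* number of valid samples, at most 5 */
} edr_buffer_t;

/* Called once per second: store the newest sample, overwriting the oldest. */
void edr_record(edr_buffer_t *b, edr_sample_t s)
{
    b->ring[b->next] = s;
    b->next = (b->next + 1u) % EDR_PRE_TRIGGER_SAMPLES;
    if (b->count < EDR_PRE_TRIGGER_SAMPLES) {
        b->count++;
    }
}

/* Called on a trigger event: copy the stored samples, oldest first, into
 * out[] (which stands in for the non-volatile event record). Returns the
 * number of samples copied. */
unsigned edr_snapshot(const edr_buffer_t *b,
                      edr_sample_t out[EDR_PRE_TRIGGER_SAMPLES])
{
    unsigned start = (b->next + EDR_PRE_TRIGGER_SAMPLES - b->count)
                     % EDR_PRE_TRIGGER_SAMPLES;
    for (unsigned i = 0; i < b->count; i++) {
        out[i] = b->ring[(start + i) % EDR_PRE_TRIGGER_SAMPLES];
    }
    return b->count;
}
```

<p>The point of the structure is that sampling never stops: the buffer always holds the most recent five seconds, and a trigger simply freezes a copy of it.</p>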
<p>This investigation of Toyota&#8217;s unintended acceleration marked the first time that anyone from NHTSA had ever read data from a Toyota event data recorder. (Toyota representatives apparently testified in Congress that there had previously just been one copy of the necessary PC software in the U.S.) As part of this study, NHTSA validated and used tools provided by Toyota to extract historical data from 52 vehicles involved in incidents of unintended acceleration, with acknowledged bias toward geographically reachable recent events. After reviewing driver and other witness statements and examining said black box data, NHTSA concluded that 39 of these 52 events were explainable as &#8220;pedal misapplications.&#8221; That&#8217;s a very nice way of saying that whenever the driver reported &#8220;stepping on the brake&#8221; he or she had pressed the accelerator pedal by mistake. Figure 5 of a <a href="http://www.nhtsa.gov/staticfiles/nvs/pdf/NHTSA-Toyota_EDR_field_inspection.pdf">supplemental report describing these facts</a> portrays an increasing likelihood of such incidents with driver age vs. the bell curve of Camry ownership by age.</p>
<p>Note that no record is apparently ever made, in the event data recorder or elsewhere, of any events or state changes within the ETCS-i firmware. So-called &#8220;Diagnostic Trouble Codes&#8221; concerning sensor and other hardware failures are recorded in non-volatile memory and the presence of one or more such codes enables the &#8220;Check Engine&#8221; light on the dashboard. But no logging is done of significant software faults, including but not limited to watchdog-initiated resets. </p>
<p><strong>Engine Control Module</strong></p>
<p>ETCS-i is a collection of components and features that were changed in the basic engine design when Toyota switched from mechanical to electronic throttle control. (Electronic throttle control is also known as &#8220;throttle-by-wire&#8221;.) Toyota has used two different types of pedal sensors in the ETCS-i system, always in a redundant fashion. The earlier design, pre-2007, using potentiometers was susceptible to current leakage via growth of <a href="http://nepp.nasa.gov/whisker/">tin whiskers</a>. Though this type of failure was not known to cause sudden high-speed behaviors, it did seem to be associated with a higher number of warranty claims. The newer pedal sensor design uses <a href="http://en.wikipedia.org/wiki/Hall_effect_sensor">Hall effect sensors</a>.</p>
<p>Importantly, the brakes are not a part of the ETCS-i system. In the 2005 Camry, Toyota&#8217;s brake pedal was mechanically controlled. (It may still be.) It appears this is one of the reasons the NASA team felt comfortable with their conclusion that driver reports of wide open throttle behavior that could not be stopped with the brakes were not caused by software failures (alone). &#8220;The NESC team did not find an electrical path from the ETCS-i that could disable braking.&#8221; (<a href="http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-UA_report.pdf">NASA Report</a>, p. 15) It is clear, though, that power assisted brakes lose the enabling vacuum pressure when the throttle is wide open and the driver subsequently pumps the brakes; thus any system failure that opened the throttle could indirectly make bringing the vehicle to a stop considerably harder.</p>
<p>The Engine Control Module at the heart of the ETCS-i consists of a Main-CPU and a Sub-CPU located within a pair of ASICs. The Sub-CPU contains a set of A/D converters that translates raw sensor inputs, such as voltages VPA and VPA1 from the accelerator pedal, into digital position values and sends them to the Main-CPU via a serial interface. In addition, the Sub-CPU monitors the outputs of the Main-CPU and is able to reset (in the manner of a watchdog timer) the Main-CPU.</p>
<p>The Main-CPU is reported to be a <a href="http://america2.renesas.com/docs/files/U14559EJ2V0UM00.pdf">V850E1</a> microcontroller, which is &#8220;a 32-bit RISC CPU core for ASIC&#8221; designed by Renesas (nee NEC). The V850E1 processor has a 64MB program address space, which is part of an overall 4GB linear address space. The Main-CPU also keeps tabs on the Sub-CPU and can reset it if anything is found wrong.</p>
<p>NASA reports that the embedded software in the Main-CPU is written (mostly) in ANSI C and compiled using a <a href="http://www.ghs.com">Green Hills</a> C compiler (Appendix A, p. 14). Furthermore, an <a href="http://en.wikipedia.org/wiki/OSEK">OSEK</a>-compliant real-time operating system with fixed-priority preemptive scheduling is used to manage a redacted (but apparently larger than ten, based on the size of the redaction) number of real-time tasks. The actual firmware development (design, coding and unit testing) was outsourced to <a href="http://www.globaldenso.com/en/">Denso</a> (p. 19). Toyota apparently performed integration testing and ran several commercial and in-house static analysis tools, including <a href="http://www.programmingresearch.com/qac_main.html">QAC</a> (p. 20). The code was written in English, with Japanese comments and design documents, and follows a proprietary Toyota naming convention/coding standard that predates but half overlaps with the 1998 version of <a href="http://www.misra-c.com/Activities/MISRAC/tabid/160/Default.aspx">MISRA-C</a>.</p>
<p><strong>Are There Bugs in Toyota&#8217;s Firmware?</strong></p>
<p>In the NASA Report&#8217;s executive summary it is made clear that &#8220;because proof that the ETCS-i caused the reported UAs was not found does not mean it could not occur.&#8221; (NASA Report, p. 17) The report also states that NASA&#8217;s analysis was time-limited and top-down, remarking &#8220;The Toyota Electronic Throttle Control (ETC) was far more complex than expected involving hundreds of thousands of lines of software code&#8221; and that this <a href="http://www.nhtsa.gov/staticfiles/nvs/pdf/NHTSA-Toyota_peer_review.pdf">affected the quality of a planned peer review</a>.</p>
<p>It&#8217;s stated that &#8220;Reported [Unintended Accelerations (UAs)] are rare events. Typically, the reporting of UAs is about 1/100,000 vehicles/year.&#8221; But there are millions of cars on the road, and so NHTSA has collected some &#8220;831 UA reports for Camry&#8221; alone. &#8220;Over one-half of the reported events described large (greater than 25 degrees) high-throttle opening UAs of unknown cause&#8221; (NASA Report, p. 14), the causes of which are never fully explained in these reports.</p>
<p>The NASA team apparently identified some lesser firmware bugs themselves, saying &#8220;[our] logic model verifications identified a number of potential issues. All of these issues involved unrealistic timing delays in the multiprocessing, asynchronous software control flow.&#8221; (Appendix A, p. 11) NASA also spent time simulating possible <a href="http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-1-race-condition/">race conditions</a> due to worrisome &#8220;recursively nested interrupt masking&#8221; (pp. 44-46); note, though, that simulation success is not a sufficient proof of lack of races. As well, the NASA team seems to recommend &#8220;reducing the amount of global data&#8221; (p. 38) and eliminating &#8220;dead code&#8221; (p. 40).</p>
<p>Additionally, the redacted text in other parts of Appendix A seems to be obscuring that:</p>
<ul>
<li>&#8220;<a href="http://gcc.gnu.org/">The standard gcc compiler</a> version 4&#8221; generated a redacted number of warnings (probably larger than 100) about the code, in 11 different warning categories. (p. 25)</li>
<li>&#8220;<a href="http://www.coverity.com/">Coverity</a> version 4.2&#8221; generated a redacted number of warnings (probably larger than 154) about the code, in 10 different warning categories. (p. 27)</li>
<li>&#8220;<a href="http://www.grammatech.com/products/codesonar/">Codesonar</a> version 3.6p1&#8221; generated a redacted number of warnings (probably larger than 136) about the code, in 10 different warning categories.</li>
<li>&#8220;<a href="http://spinroot.com/uno/">Uno</a> version 2.12&#8221; generated a redacted number of warnings (probably larger than 72) about the code, in 9 different warning categories.</li>
<li>The code contained at least 347 deviations from a subset of 14 of the <a href="http://www.misra-c.com/Activities/MISRAC/tabid/160/Default.aspx">MISRA-C rules</a>.</li>
<li>The code contained at least 243 violations of a subset of 9 of the 10 &#8220;<a href="http://spinroot.com/p10">Power of 10&#8211;Rules for Developing Safety Critical Code</a>,&#8221; which was published in IEEE Computer in 2006 by NASA team member Gerard Holzmann.</li>
</ul>
<p>It looks to me like Figure 6.2.3-1 of the NASA Report (p. 30) shows that UA complaints filed with NHTSA increased in the year of introduction of electronic throttle control for the vast majority of Toyota, Scion, and Lexus models&#8211;and that complaint counts have remained higher but generally declined over time since those transition years. Such a complaint data pattern is perhaps consistent with firmware bugs. (Note to NHTSA: It would be helpful to see this same chart normalized by number of vehicles sold by model year and with the rows sorted by the year of ETC introduction. It would also be nice to see a chart of ETCS-i firmware versions and updates, which vehicles they apply to, and the dates on which each was put into new production vehicles or distributed through dealers.)</p>
<p><strong>Final Thoughts</strong></p>
<p>I am not privy to all of the facts considered by the NHTSA or NASA review teams and thus cannot say if I agree or disagree with their overall conclusion that embedded software bugs are not to blame for reports of unintended acceleration in Toyota vehicles. How about you? If you&#8217;ve spotted something I missed in the reports from NHTSA or NASA, please send me an e-mail or leave a comment below. Let&#8217;s keep the conversation going.</p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2011/03/unintended-acceleration-and-other-embedded-software-bugs/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #10: Jitter</title>
		<link>http://embeddedgurus.com/barr-code/2010/12/firmware-specific-bug-10-jitter/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/12/firmware-specific-bug-10-jitter/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 11:56:26 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[RTOS Multithreading]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=421</guid>
		<description><![CDATA[Some real-time systems demand not only that a set of deadlines be always met but also that additional timing constraints be observed in the process. Such as managing jitter. An example of jitter is shown in Figure 1. Here a variable amount of work (blue boxes) must be completed before every 10 ms deadline. As [...]]]></description>
			<content:encoded><![CDATA[<p>Some real-time systems demand not only that a set of deadlines always be met but also that additional timing constraints be observed in the process. Managing jitter is one such constraint.</p>
<p>An example of jitter is shown in Figure 1. Here a variable amount of work (blue boxes) must be completed before every 10 ms deadline. As illustrated in the figure, the deadlines are all met. However, there is considerable timing variation from one run of this job to the next. This jitter is unacceptable in some systems, which should either start or end their 10 ms runs more precisely.</p>
<p><a href='http://eetimes.com/ContentEETimes/Images/Design/Embedded/2010/1110/1110esdBarr03.gif'>Jitter Figure 1</a></p>
<p>If the work to be performed involves sampling a physical input signal, such as reading an analog-to-digital converter, it will often be the case that a precise sampling period will lead to higher accuracy in derived values. For example, variations in the inter-sample time of an optical encoder&#8217;s pulse count will lower the precision of the computed velocity of an attached rotating shaft.</p>
<p><em>Best Practice</em>: The most important single factor in the amount of jitter is the relative priority of the task or ISR that implements the recurrent behavior. The higher the priority, the lower the jitter. The periodic reads of those encoder pulse counts should thus typically be in a timer tick ISR rather than in an RTOS task.</p>
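<p>The encoder example can be put into numbers with a short C sketch. It assumes, purely for illustration, that the firmware divides each pulse-count delta by a fixed nominal 10 ms period; the function names and the 2,000 counts-per-second shaft speed are hypothetical.</p>

```c
#include <stdint.h>

#define NOMINAL_PERIOD_US 10000u   /* firmware assumes 10 ms between samples */

/* Velocity estimate in counts per second, computed the way firmware
 * typically does it: count delta divided by the assumed (nominal) period. */
uint32_t estimate_velocity_cps(uint32_t count_delta)
{
    return (uint32_t)(((uint64_t)count_delta * 1000000u) / NOMINAL_PERIOD_US);
}

/* Pulses a shaft turning at true_cps produces over the interval that
 * actually elapsed between two samples. */
uint32_t counts_in_interval(uint32_t true_cps, uint32_t actual_interval_us)
{
    return (uint32_t)(((uint64_t)true_cps * actual_interval_us) / 1000000u);
}
```

<p>With a shaft turning at a steady 2,000 counts per second, a sample taken exactly 10 ms after the previous one yields 20 counts and an estimate of 2,000. If jitter stretches the interval to 15 ms, the same shaft yields 30 counts and the firmware reports 3,000; if the interval shrinks to 5 ms, it reports 1,000. The shaft speed never changed, only the sampling time did.</p>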
<p>Figure 2 shows how the interval of three different 10 ms recurring samples might be impacted by their relative priorities. At the highest priority is a timer tick ISR, which executes precisely on the 10 ms interval. (Unless there are higher priority interrupts, of course.) Below that is a high-priority task (TH), which may still be able to meet a recurring 10-ms start time precisely. At the bottom, though, is a low priority task (TL) that has its timing greatly affected by what goes on at higher priority levels. As shown, the interval for the low priority task is 10 ms +/- approximately 5 ms.</p>
<p><a href='http://eetimes.com/ContentEETimes/Images/Design/Embedded/2010/1110/1110esdBarr04.gif'>Jitter Figure 2</a></p>
<p><a href="/barr-code/2010/11/firmware-specific-bug-9-incorrect-priority-assignment/">Firmware-Specific Bug #9</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/12/firmware-specific-bug-10-jitter/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #9: Incorrect Priority Assignment</title>
		<link>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-9-incorrect-priority-assignment/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-9-incorrect-priority-assignment/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 12:50:03 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[RTOS Multithreading]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=419</guid>
		<description><![CDATA[Get your priorities straight! Or suffer the consequence of missed deadlines. Of course, I&#8217;m talking here about the relative priorities of your real-time tasks and interrupt service routines. In my travels around the embedded design community, I&#8217;ve learned that most real-time systems are designed with ad hoc priorities. Unfortunately, mis-prioritized systems often &#8220;appear&#8221; to work [...]]]></description>
			<content:encoded><![CDATA[<p><em>Get your priorities straight! Or suffer the consequence of missed deadlines. Of course, I&#8217;m talking here about the relative priorities of your real-time tasks and interrupt service routines. In my travels around the embedded design community, I&#8217;ve learned that most real-time systems are designed with ad hoc priorities. </em></p>
<p>Unfortunately, mis-prioritized systems often &#8220;appear&#8221; to work fine without discernibly missing critical deadlines in testing. The worst-case workload may have never yet happened in the field or there is sufficient CPU to accidentally succeed despite the lack of proper planning. This has led to a generation of embedded software developers being unaware of the proper technique. There is simply too little feedback from non-reproducible deadline misses in the field to the original design team, unless a death and a lawsuit force an investigation.</p>
<p><em>Best Practice</em>: There is a science to the process of assigning relative priorities. That science is associated with the &#8220;rate monotonic algorithm,&#8221; which provides a formulaic way to assign task priorities based on facts. It is also associated with &#8220;rate monotonic analysis,&#8221; which helps you prove that your correctly-prioritized tasks and ISRs will find sufficient available CPU bandwidth between them during extremely busy workloads called &#8220;transient overload.&#8221; It&#8217;s too bad most engineers don&#8217;t know how to use these tools.</p>
<p>There&#8217;s insufficient space in this column for me to explain why and how RMA works. But I&#8217;ve written on these topics before and recommend you start with &#8220;<a href="http://www.netrino.com/Embedded-Systems/How-To/RMA-Rate-Monotonic-Algorithm">Introduction to Rate-Monotonic Scheduling</a>&#8221; and then read my column &#8220;<a href="http://embeddedgurus.com/barr-code/2010/08/3-things-every-programmer-should-know-about-rma/">3 Things Every Programmer Should Know About RMA</a>.&#8221;</p>
<p>Please know that if you don&#8217;t use RMA to prioritize your tasks and ISRs (as a set), there&#8217;s only one entity with any guarantees: the one highest-priority task or ISR can take the CPU for itself at any busy time—barring priority inversions!—and thus has up to 100% of the CPU bandwidth available to it. Also note that there is no rule of thumb about what percentage of the CPU bandwidth you may safely use between a set of two or more runnables unless you do follow the RMA scheme.</p>
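<p>As a rough sketch of what using RMA looks like in code, the C fragment below assigns priorities by period (shortest period gets the highest priority) and then applies the classic Liu and Layland utilization bound, U &lt;= n(2^(1/n) - 1), rewritten as (1 + U/n)^n &lt;= 2 so that no math library is needed. The struct and function names are hypothetical, and the bound is a sufficient test only: a task set that fails it may still be schedulable, but that needs a more detailed analysis.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned    period_ms;   /* T: how often the task must run */
    unsigned    wcet_ms;     /* C: worst-case execution time per run */
    unsigned    priority;    /* assigned below; larger means higher */
} rm_task_t;

/* qsort comparator: ascending period. */
static int by_period(const void *a, const void *b)
{
    const rm_task_t *ta = a;
    const rm_task_t *tb = b;
    return (ta->period_ms > tb->period_ms) - (ta->period_ms < tb->period_ms);
}

/* Rate monotonic assignment: the shorter the period, the higher the priority. */
void rm_assign_priorities(rm_task_t *tasks, size_t n)
{
    qsort(tasks, n, sizeof tasks[0], by_period);
    for (size_t i = 0; i < n; i++) {
        tasks[i].priority = (unsigned)(n - i);
    }
}

/* Sufficient schedulability test: total utilization U must not exceed
 * n * (2^(1/n) - 1), which is equivalent to (1 + U/n)^n <= 2. */
bool rm_passes_utilization_bound(const rm_task_t *tasks, size_t n)
{
    if (n == 0) {
        return true;
    }
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        u += (double)tasks[i].wcet_ms / (double)tasks[i].period_ms;
    }
    double base = 1.0 + u / (double)n;
    double power = 1.0;
    for (size_t i = 0; i < n; i++) {
        power *= base;
    }
    return power <= 2.0;
}
```

<p>For three tasks the bound works out to roughly 78% total utilization. A set with worst-case times of 1, 1, and 2 ms at periods of 4, 5, and 10 ms uses 65% and passes; raise each worst-case time to 2 ms and utilization is 110%, which no priority assignment can rescue.</p>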
<p><a href="/barr-code/2010/11/firmware-specific-bug-8-priority-inversion/">Firmware-Specific Bug #8</a></p>
<p><a href="/barr-code/2010/12/firmware-specific-bug-10-jitter/">Firmware-Specific Bug #10</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-9-incorrect-priority-assignment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Firmware-Specific Bug #8: Priority Inversion</title>
		<link>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-8-priority-inversion/</link>
		<comments>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-8-priority-inversion/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 15:42:15 +0000</pubDate>
		<dc:creator>Michael Barr</dc:creator>
				<category><![CDATA[Firmware Bugs]]></category>
		<category><![CDATA[RTOS Multithreading]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[firmware]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[rtos]]></category>
		<category><![CDATA[safety]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://embeddedgurus.com/barr-code/?p=414</guid>
		<description><![CDATA[A wide range of nasty things can go wrong when two or more tasks coordinate their work through, or otherwise share, a singleton resource such as a global data area, heap object, or peripheral&#8217;s register set. In the first part of this column, I described two of the most common problems in task-sharing scenarios: race [...]]]></description>
			<content:encoded><![CDATA[<p>A wide range of nasty things can go wrong when two or more tasks coordinate their work through, or otherwise share, a singleton resource such as a global data area, heap object, or peripheral&#8217;s register set. In the first part of this column, I described two of the most common problems in task-sharing scenarios: race conditions and non-reentrant functions. But resource sharing combined with the priority-based preemption found in commercial real-time operating systems can also cause priority inversion, which is equally difficult to reproduce and debug.</p>
<p>The problem of priority inversion stems from the use of an operating system with fixed relative task priorities. In such a system, the programmer must assign each task its priority. The scheduler inside the RTOS provides a guarantee that the highest-priority task that&#8217;s ready to run gets the CPU at all times. To meet this goal, the scheduler may preempt a lower-priority task in mid-execution. But when tasks share resources, events outside the scheduler&#8217;s control can sometimes prevent the highest-priority ready task from running when it should. When this happens, a critical deadline could be missed, causing the system to fail.</p>
<p>At least three tasks are required for a priority inversion to actually occur: the pair of highest and lowest relative priority must share a resource, say, one protected by a mutex, and the third must have a priority between the other two. The scenario is always as shown in the figure below. First, the low-priority task acquires the shared resource (time t1). After the high-priority task preempts low, it next tries but fails to acquire their shared resource (time t2); control of the CPU returns to low as high blocks. Finally, the medium-priority task, which has no interest at all in the resource shared by low and high, preempts low (time t3). At this point the priorities are inverted: medium is allowed to use the CPU for as long as it wants, while high waits for low. There could even be multiple medium-priority tasks.</p>
<p><a href="http://embeddedgurus.com/barr-code/files/2010/11/FiveMoreBugs_fig2.gif"><img src="http://embeddedgurus.com/barr-code/files/2010/11/FiveMoreBugs_fig2-300x243.gif" alt="Priority Inversion" title="Priority Inversion" width="300" height="243" class="aligncenter size-medium wp-image-415" /></a></p>
<p>The risk with priority inversion is that it can prevent the high-priority task in the set from meeting a real-time deadline. The need to meet deadlines often goes hand-in-hand with the choice of a preemptive RTOS. Depending on the end product, this missed deadline outcome might even be deadly for its user!</p>
<p>One of the major challenges with priority inversion is that it&#8217;s generally not a reproducible problem. First, the three steps need to happen—and in that order. And then the high priority task needs to actually miss a deadline. One or both of these may be rare or hard to reproduce events. Unfortunately, no amount of testing can assure they won&#8217;t ever happen in the field.[5]</p>
<p><em>Best Practice</em>: The good news is that an easy three-step fix will eliminate all priority inversions from your system.</p>
<ol>
<li>Choose an RTOS that includes a priority-inversion work-around in its mutex API. These work-arounds come by various names, such as priority inheritance protocol and priority ceiling emulation. Ask your sales rep for details.</li>
<li>Only use the mutex API (never the semaphore API, which lacks this work-around) to protect shared resources within real-time software.</li>
<li>Take the additional execution time cost of the work-around into account when performing the analysis to prove that all deadlines will always be met. Note that the method for doing this varies by the specific work-around.</li>
</ol>
<p>Note that it&#8217;s safe to ignore the possibility of priority inversions if you don&#8217;t have any tasks with consequences for missing deadlines.</p>
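<p>The three-task scenario, and the effect of a priority inheritance work-around, can be reproduced with a toy tick-by-tick simulation in C. This is a sketch under made-up assumptions, not any particular RTOS: the release times and execution times are invented, low and high each hold the shared mutex for their entire run, and all names are hypothetical.</p>

```c
#include <stdbool.h>

/* The task index doubles as its base priority (larger means higher). */
enum { TASK_LOW, TASK_MED, TASK_HIGH, NUM_TASKS };

typedef struct {
    int  release;     /* tick at which the task becomes ready */
    int  remaining;   /* ticks of CPU time still needed */
    bool uses_lock;   /* true if the whole job runs holding the mutex */
} sim_task_t;

/* Runs the scenario and returns the tick at which the high-priority
 * task completes, or -1 if it never does within the simulated window. */
int high_finish_tick(bool priority_inheritance)
{
    sim_task_t tasks[NUM_TASKS] = {
        [TASK_LOW]  = { .release = 0, .remaining = 4, .uses_lock = true  },
        [TASK_MED]  = { .release = 2, .remaining = 5, .uses_lock = false },
        [TASK_HIGH] = { .release = 1, .remaining = 2, .uses_lock = true  },
    };
    int owner = -1;   /* index of the task holding the mutex, -1 if free */

    for (int tick = 0; tick < 100; tick++) {
        int best = -1;
        int best_prio = -1;

        for (int i = 0; i < NUM_TASKS; i++) {
            bool ready = tasks[i].release <= tick && tasks[i].remaining > 0;
            if (!ready) {
                continue;
            }
            if (tasks[i].uses_lock && owner != -1 && owner != i) {
                continue;   /* blocked waiting for the mutex */
            }
            int prio = i;
            if (priority_inheritance && owner == i) {
                /* The owner inherits the priority of the highest-priority
                 * task currently waiting on the mutex. */
                for (int j = i + 1; j < NUM_TASKS; j++) {
                    bool waiting = tasks[j].uses_lock &&
                                   tasks[j].release <= tick &&
                                   tasks[j].remaining > 0;
                    if (waiting) {
                        prio = j;
                    }
                }
            }
            if (prio > best_prio) {
                best = i;
                best_prio = prio;
            }
        }
        if (best < 0) {
            continue;   /* nothing ready: idle tick */
        }
        if (tasks[best].uses_lock) {
            owner = best;   /* acquire (or keep) the mutex */
        }
        if (--tasks[best].remaining == 0) {
            if (owner == best) {
                owner = -1;   /* release the mutex on completion */
            }
            if (best == TASK_HIGH) {
                return tick + 1;
            }
        }
    }
    return -1;
}
```

<p>With these numbers, the high-priority task finishes at tick 11 without the work-around, because all five ticks of the medium task land in front of it. With inheritance enabled it finishes at tick 6, delayed only by the remainder of the low task&#8217;s critical section. Lengthen the medium task and the first number grows with it, while the second does not.</p>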
<p><em>Footnotes</em></p>
<p>[5] Barr, Michael and Dave Stewart. &#8220;Introduction to Rate Monotonic Scheduling,&#8221; Beginner&#8217;s Corner, Embedded Systems Programming, February 2002. Available online at www.embedded.com/showArticle.jhtml?articleID=9900522.</p>
<p><a href="/barr-code/2010/11/firmware-specific-bug-7-deadlock/">Firmware-Specific Bug #7</a></p>
<p><a href="/barr-code/2010/11/firmware-specific-bug-9-incorrect-priority-assignment/">Firmware-Specific Bug #9</a></p>
]]></content:encoded>
			<wfw:commentRss>http://embeddedgurus.com/barr-code/2010/11/firmware-specific-bug-8-priority-inversion/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

