embedded software boot camp

Configuring hardware – part 2.

December 15th, 2010 by Nigel Jones

This is the second in a series on configuring the hardware peripherals in a microcontroller. In the first part I talked about how to set / clear bits in a configuration register. Now while setting bits is an essential part of the problem, it is by no means the most difficult task. Instead the real problem is this. You need to configure the peripheral but on examining the data sheet you discover that the peripheral has twenty registers, can operate in a huge number of modes and has multiple interrupt sources. To compound the difficulty, you may not fully understand the task the peripheral performs – and the data sheet appears to have been written by someone who has clearly never written a device driver in their life. If this sounds a lot like what you have experienced, then read on!

When I first started working in embedded systems, I used to dread having to write a device driver. I knew I was in for days, if not weeks of anguish trying to make the stupid thing work. Today I can usually get a peripheral to do what I want with almost no heartache – and in a fraction of the time it used to take me. I do this by following a standard approach that helps minimize various problems that seem to crop up all the time in device drivers. These problems are as follows:

  1. Setting the wrong bits in a register
  2. Failing to configure a register at all.
  3. Setting the correct configuration bits – but in the wrong temporal order.
  4. Interrupts incorrectly handled.

To help minimize these types of problems, this is what I do.

Step 0 – Document *what* the driver is supposed to do

This is a crucial step. If you can’t write in plain English (or French etc) what the driver is supposed to do then you stand no chance of making it work correctly.  This is a remarkably difficult thing to do. If you find that you can’t succinctly and unambiguously describe the driver’s functionality then attempting to write code is futile. I typically put this explanation in the module header block where future readers of the code can see it. An explanation may look something like this.

This is a serial port driver. It is intended to be used on an RS232 line at 38400 baud, 8 data bits, no parity, one stop bit. The driver supports CTS / RTS handshaking. It does not support Xon / Xoff handshaking.

Characters to be transmitted are buffered and sent out under interrupt. If the transmit buffer fills up then further characters submitted for transmission are dropped.

Characters are received under interrupt and placed in a buffer. When the receive buffer is almost full, the CTS line is asserted. Once the receive buffer has dropped below the low threshold, CTS is negated. If the host ignores the CTS line and continues to transmit then characters received after the receive buffer is full are discarded.

As it stands, this description is incomplete; for example it doesn’t say what happens if a receiver overrun is detected. However you should get the idea.
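A plain-language description like this maps fairly directly onto code. The following is a minimal host-side sketch of the receive-side rules only, not the actual driver: the buffer size, the two thresholds and every name in it (`rx_put`, `rx_get`, `rx_flow_stopped`) are assumptions made for illustration, and the `flow_stop` flag stands in for whatever drives the handshake line on real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes and names below are illustrative assumptions. */
#define RX_SIZE       16u   /* Receive buffer capacity */
#define RX_HIGH_MARK  12u   /* Ask the host to pause at this fill level */
#define RX_LOW_MARK    4u   /* Let the host resume below this fill level */

static uint8_t rx_buf[RX_SIZE];
static size_t  rx_head;     /* Next slot to write */
static size_t  rx_tail;     /* Next slot to read */
static size_t  rx_count;    /* Characters currently buffered */
static bool    flow_stop;   /* true => handshake line asks the host to pause */

/* Called from the receive interrupt. Returns false if the character was discarded. */
bool rx_put(uint8_t c)
{
    if (rx_count == RX_SIZE)
    {
        return false;                   /* Buffer full: discard, as documented */
    }
    rx_buf[rx_head] = c;
    rx_head = (rx_head + 1u) % RX_SIZE;
    rx_count++;
    if (rx_count >= RX_HIGH_MARK)
    {
        flow_stop = true;               /* Almost full: request a pause */
    }
    return true;
}

/* Called by the application. Returns false if no character is available. */
bool rx_get(uint8_t *c)
{
    if (rx_count == 0u)
    {
        return false;
    }
    *c = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1u) % RX_SIZE;
    rx_count--;
    if (rx_count < RX_LOW_MARK)
    {
        flow_stop = false;              /* Below the low threshold: resume */
    }
    return true;
}

/* Lets the caller (or a test) observe the handshake state. */
bool rx_flow_stopped(void)
{
    return flow_stop;
}
```

Writing the description first makes each branch above traceable to a sentence in the header block. On a real target the shared count would also need protecting against the interrupt, which this sketch ignores.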

Incidentally I can’t stress the importance of this step enough. This was the single biggest breakthrough I made in improving my driver writing. This is also the step that I see missing from almost all driver code.

Step 1 – Create standard function outlines

Nearly all drivers need the following functions:

  1. Open function. This function does the bulk of the peripheral configuration, but typically does not activate (enable) the peripheral.
  2. Close function. This is the opposite of the open function in that it returns a peripheral to its initial (usually reset) condition. Even if your application would never expect to close a peripheral it is often useful to write this function as it can deepen your understanding of the peripheral’s functionality.
  3. Start function. This function typically activates the peripheral. For peripherals such as timers, the start function is aptly and accurately named. For more complex peripherals, the start function may be more of an enable function. For example a CAN controller’s start function may start the CAN controller listening for packets.
  4. Stop function. This is the opposite of the start function. Its job is to stop the peripheral from running, while leaving it configured.
  5. Update function(s). These function(s) are highly application specific. For example an ADC peripheral may not need an update function. A PWM channel’s update function would be used to update the PWM depth. A UART’s update function would be the transmit function. In some cases you may need multiple update functions.
  6. Interrupt handler(s). Most peripherals need at least one interrupt handler. Even if you aren’t planning on using an interrupt source, I strongly recommend you put together a function outline for it. The reason will become clear!

At this stage, your driver looks something like this:

/*
 Detailed description of what the driver does goes here
*/

void driver_Open(void)
{
}

void driver_Close(void)
{
}

void driver_Start(void)
{
}

void driver_Stop(void)
{
}

void driver_Update(void)
{
}

__interrupt void driver_Interrupt1(void)
{
}

__interrupt void driver_Interrupt2(void)
{
}
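To make the division of labour between these functions concrete, here is a small sketch built around a made-up timer. The register names (`TMR_CTRL`, `TMR_RELOAD`), their bit positions and their reset values are hypothetical, and plain variables stand in for the memory-mapped registers so that the code can be compiled and exercised on a PC.

```c
#include <stdint.h>

/* Hypothetical timer registers, simulated as variables for a host build.
   On a target these would be memory-mapped registers from the vendor header. */
static volatile uint32_t TMR_CTRL;     /* bit 0 = run, bit 1 = interrupt enable */
static volatile uint32_t TMR_RELOAD;   /* Reload (period) value */

void timer_Open(void)
{
    TMR_RELOAD = 1000u;                /* Configure the period ... */
    TMR_CTRL = (0u << 0) |             /* ... but do not run yet */
               (1u << 1);              /* Interrupt on expiry */
}

void timer_Start(void)
{
    TMR_CTRL |= (1u << 0);             /* Set the run bit only */
}

void timer_Stop(void)
{
    TMR_CTRL &= ~(1u << 0);            /* Clear the run bit; configuration is retained */
}

void timer_Update(uint32_t reload)
{
    TMR_RELOAD = reload;               /* Application specific: change the period */
}

void timer_Close(void)
{
    TMR_CTRL = 0u;                     /* Back to the (assumed) reset values, */
    TMR_RELOAD = 0u;                   /* in the reverse order of timer_Open() */
}

/* Accessors so that a test harness can inspect the simulated registers. */
uint32_t timer_Ctrl(void)   { return TMR_CTRL; }
uint32_t timer_Reload(void) { return TMR_RELOAD; }
```

The point to notice is that stopping the timer leaves the interrupt enable and the period untouched, whereas closing it does not.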

Step 2 – Set up power, clocks, port pins

In most modern processors, a peripheral does not exist in isolation. Many times peripherals need to be powered up, clocks need to be routed to the peripheral and port pins need to be configured. This step is separate from the configuration of the peripheral. Furthermore documentation on these requirements is often located in non-obvious places – and thus this step is often overlooked. This is an area where I must give a thumbs-up to NXP. At the start of each of their peripherals is a short clear write up documenting the ancillary registers that need to be configured for the peripheral to be used. An example is shown below:

Basic Configuration Steps for the SSP

Personally, I usually place the configuration of these registers in a central location which is thus outside the driver. However there is also a case for placing the configuration of these registers in the driver open function. I will address why I do it this way in a separate blog post.
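As a rough illustration of what such a central location might contain, here is a sketch that powers a peripheral, routes a clock to it and hands over a pin. Everything in it is an assumption: the register names (`SYS_POWER`, `SYS_CLKSEL`, `PIN_FUNC`), the bit positions and the field values are invented, and the registers are simulated as variables. The real names and positions have to come from the data sheet.

```c
#include <stdint.h>

/* Hypothetical system-control registers, simulated as variables.
   Real names and bit positions must come from the data sheet. */
static volatile uint32_t SYS_POWER;    /* One power-enable bit per peripheral */
static volatile uint32_t SYS_CLKSEL;   /* Two-bit clock divider field per peripheral */
static volatile uint32_t PIN_FUNC;     /* Two-bit function select field per pin */

#define SSP_POWER_BIT   21u            /* Assumed position of the SSP power bit */
#define SSP_CLK_SHIFT   10u            /* Assumed position of the SSP clock field */
#define SCK_PIN_SHIFT   30u            /* Assumed position of the SCK pin field */

/* Replace a two-bit field at 'shift' without disturbing its neighbours. */
static void set_field2(volatile uint32_t *reg, uint32_t shift, uint32_t value)
{
    *reg = (*reg & ~((uint32_t)3u << shift)) | ((value & 3u) << shift);
}

/* Central board set-up, intended to run once before any driver_Open(). */
void board_Init(void)
{
    SYS_POWER |= ((uint32_t)1u << SSP_POWER_BIT);  /* 1. Power the peripheral */
    set_field2(&SYS_CLKSEL, SSP_CLK_SHIFT, 1u);    /* 2. Route a clock to it */
    set_field2(&PIN_FUNC, SCK_PIN_SHIFT, 2u);      /* 3. Hand the pin to the peripheral */
}

/* Accessors so that a test harness can inspect the simulated registers. */
uint32_t board_Power(void)   { return SYS_POWER; }
uint32_t board_ClkSel(void)  { return SYS_CLKSEL; }
uint32_t board_PinFunc(void) { return PIN_FUNC; }
```

Because registers of this kind are typically shared between many peripherals, the sketch uses read-modify-write rather than the whole-register assignments that appear inside the driver itself.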

Step 3 – Add all the peripheral registers to the open function

This step is crucial. In my experience a large number of driver problems come about because a register hasn’t been configured. The surest way to kill this potential problem is to open up the data sheet at the register list for the peripheral and simply add all the registers to the open function. For example, here is the register list for the SSP controller on an NXP ARM processor:

Ten registers are listed.  Even though one register is listed as read only, I still add it to the driver_Open function as I may need to read it in order to clear status flags. Thus my open function now becomes this:

void driver_Open(void)
{
 SSP0CR0 = 0;
 SSP0CR1 = 0;
 SSP0DR = 0;
 SSP0SR;            /* Status register - read and discard */
 SSP0CPSR = 0;
 SSP0IMSC = 0;
 SSP0RIS = 0;
 SSP0MIS = 0;
 SSP0ICR = 0;
 SSP0DMACR = 0;
}

At this stage all I have done is ensure that my code is at least aware of the requisite registers.

Step 4 – Arrange the registers in the correct order

For many peripherals, it is important that registers be configured in a specific order. In some cases a register must be partially configured, then other registers must be configured, and then the initial register must be completely configured. There is no way around this, other than to read the data sheet to determine if this ordering exists. I should note that the order that registers appear in the data sheet is rarely the order in which they should be configured. In my example, I will assume that the registers are correctly ordered.
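The "partially configure, configure others, then complete" pattern can be sketched as follows. The peripheral, its registers and the ordering rule (the clock output bit may only be set once the prescaler holds a valid value) are all hypothetical. The write log is also my own addition: it exists purely so that a host-side test can check the order, and would not appear in target code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated registers for a hypothetical peripheral. The ordering rule
   assumed here is for illustration only. */
enum reg_id { REG_CTRL, REG_PRESCALE, REG_IRQ_MASK, REG_COUNT };

static uint32_t    regs[REG_COUNT];
static enum reg_id write_log[8];       /* Order in which registers were written */
static size_t      write_count;

/* Every register write goes through here so that the order is recorded. */
static void reg_write(enum reg_id id, uint32_t value)
{
    regs[id] = value;
    if (write_count < (sizeof write_log / sizeof write_log[0]))
    {
        write_log[write_count++] = id;
    }
}

void periph_Open(void)
{
    /* 1. Partially configure the control register */
    reg_write(REG_CTRL, (0u << 0) |    /* Clock output off for now */
                        (1u << 1));    /* Frame format select */
    /* 2. Configure the registers that must be valid first */
    reg_write(REG_PRESCALE, 2u);
    reg_write(REG_IRQ_MASK, 0u);
    /* 3. Complete the control register */
    reg_write(REG_CTRL, (1u << 0) |    /* Clock output on */
                        (1u << 1));    /* Frame format select (unchanged) */
}

/* Accessors so that a test harness can inspect the log and registers. */
size_t      periph_WriteCount(void)    { return write_count; }
enum reg_id periph_WriteAt(size_t i)   { return write_log[i]; }
uint32_t    periph_Reg(enum reg_id id) { return regs[id]; }
```

Note that the control register legitimately appears twice in the open function, and that both writes still spell out every bit.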

Step 5 – Write the close function

While manufacturers often put a lot of effort into telling you how to configure a peripheral, it’s rare to see information on how to shut a peripheral down. In the absence of this information, I have found that a good starting point is to simply take the register list from the open function and reverse it. Thus the first pass close function looks like this:

void driver_Close(void)
{
 SSP0DMACR = 0;
 SSP0ICR = 0;
 SSP0MIS = 0;
 SSP0RIS = 0;
 SSP0IMSC = 0;
 SSP0CPSR = 0;
 SSP0DR = 0;
 SSP0CR1 = 0;    
 SSP0CR0 = 0;
}

Step 6 – Configure the bits in the open function

This is the step where you have to set and clear the bits in the registers. If you use the technique that I espoused in part 1 of this series, then your open function will now explicitly consider every bit in every register.  An example of a partially completed open function is shown below:

void driver_Open(void)
{
 SSP0CR0 = ((4U - 1U) << 0) |  /* DSS = 4 bit transfer (min value allowed) */
            (0U << 4) |        /* SPI format */
            (1U << 6) |        /* CPOL = 1 => Clock idles high */
            (1U << 7) |        /* CPHA = 1 => Output data valid on rising edge */
            (5U << 8);         /* SCR = 5 to give a division by 6 */

 SSP0CR1 =  (0U << 0) |        /* LBM = 0 ==> no loopback mode */
            (1U << 1) |        /* SSE = 1 ==> SSP0 is enabled */
            (0U << 2) |        /* MS = 0 ==> Master mode */
            (0U << 3);         /* SOD = 0 (don't care as we are in master mode) */

 SSP0DR = 0;
 SSP0SR;            /* Status register - read and discard */
 SSP0CPSR = 0;
 SSP0IMSC = 0;
 SSP0RIS = 0;
 SSP0MIS = 0;
 SSP0ICR = 0;
 SSP0DMACR = 0;
}

Clearly this is the toughest part of the exercise. However at least if you have followed these steps, then you are guaranteed not to have made an error of omission.
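The comment "SCR = 5 to give a division by 6" in the example above hides a small piece of arithmetic that is worth checking on a PC before it goes anywhere near hardware. The sketch below assumes the usual arrangement for this style of SSP, in which the bit rate is PCLK / (CPSDVSR × (SCR + 1)) and the prescaler must be an even value from 2 to 254. The function names and the 24 MHz clock used in the usage note are my own assumptions, so confirm the formula and limits against the data sheet for your part.

```c
#include <stdint.h>

/* Bit rate produced by prescaler 'cpsdvsr' and serial clock rate field 'scr',
   where the field encodes (divisor - 1). Returns 0 for an invalid prescaler.
   The even-value range 2..254 is an assumption to check against the data sheet. */
uint32_t ssp_BitRate(uint32_t pclk_hz, uint32_t cpsdvsr, uint32_t scr)
{
    if ((cpsdvsr < 2u) || (cpsdvsr > 254u) || ((cpsdvsr & 1u) != 0u))
    {
        return 0u;
    }
    return pclk_hz / (cpsdvsr * (scr + 1u));
}

/* Pack the CR0 fields used in the example above into one 16-bit value. */
uint16_t ssp_Cr0(uint32_t data_bits, uint32_t cpol, uint32_t cpha, uint32_t scr)
{
    return (uint16_t)((((data_bits - 1u) & 0xFu) << 0) |   /* DSS = bits - 1 */
                      (0u << 4)                        |   /* SPI format */
                      ((cpol & 1u) << 6)               |   /* CPOL */
                      ((cpha & 1u) << 7)               |   /* CPHA */
                      ((scr & 0xFFu) << 8));               /* SCR = divisor - 1 */
}
```

With an assumed 24 MHz peripheral clock, a prescaler of 2 and SCR = 5, `ssp_BitRate` returns 2000000, and `ssp_Cr0(4u, 1u, 1u, 5u)` returns 0x05C3, which is the value the explicit shifts in the open function produce.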

This blog posting has got long enough. In the next part of this series, I will address common misconfiguration issues, interrupts etc.

Configuring hardware – part 1.

November 13th, 2010 by Nigel Jones

One of the more challenging tasks in embedded systems programming is configuring the hardware peripherals in a microcontroller. This task is challenging because:

  1. Some peripherals are stunningly complex. If you have ever configured the ATM controller on a PowerQUICC processor then you know what I mean!
  2. The documentation is often poor. See for example just about any LCD controller’s data sheet.
  3. The person configuring the hardware (i.e. me in my case) has an incomplete understanding of how the peripheral works.
  4. One often has to write the code before the hardware is available for testing.
  5. Manufacturer supplied example code is stunningly bad.

I think I could extend this list a little further – but you get the idea. Anyway, I have struggled with this problem for many years. Now while it is impossible to come up with a methodology that guarantees correct results, I have come up with a system that really seems to make this task easier. In the first part of this series I will address the most elemental task – and that is how to set the requisite bits in the register.

By way of example, consider this register definition.

This is a control register for an ADC found in the MSP430 series of microcontrollers. The task at hand is how to write the code to set the desired bits. Now in some ways this is trivial. However if you are serious about your work, then your concern isn’t just setting the correct bits – but doing so in such a manner that it is crystal clear to someone else (normally a future version of yourself) as to what you have done – and why. With this as a premise, let’s look at some of the ways you can tackle this problem.

Magic Number

Probably the most common methodology I see is the magic number approach. For example:

ADC12CTL0 = 0x362C;

This method is an abomination. It’s error prone, and very difficult to maintain. Having said that, there is one case in which this approach is useful – and that’s when one wants to shut down a peripheral. In which case I may use the construct:

ADC12CTL0 = 0;   /* Return register to its reset condition */

Other than that, I really can’t see any justification for this approach.

Bit Fields

Even worse than the magic number approach is to attempt to impose a bit field structure on to the register. While at first glance this may be appealing – don’t do it! Now while I think bit fields have their place, I certainly don’t recommend them for mapping on to hardware registers. The reason, in a nutshell, is that the C standard essentially allows the compiler vendor carte blanche in how they implement them. For a passionate exposition on this topic, see this comment on the aforementioned post. Anyway, this approach is so bad I refuse to give an example of it!

Defined fields – method 1

This method is quite good. The idea is that one defines the various fields. The definitions below are taken from an IAR supplied header file:

#define ADC12SC             (0x001)   /* ADC12 Start Conversion */
#define ENC                 (0x002)   /* ADC12 Enable Conversion */
#define ADC12TOVIE          (0x004)   /* ADC12 Timer Overflow interrupt enable */
#define ADC12OVIE           (0x008)   /* ADC12 Overflow interrupt enable */
#define ADC12ON             (0x010)   /* ADC12 On/enable */
#define REFON               (0x020)   /* ADC12 Reference on */
#define REF2_5V             (0x040)   /* ADC12 Ref 0:1.5V / 1:2.5V */
#define MSC                 (0x080)   /* ADC12 Multiple Sample Conversion */
#define SHT00               (0x0100)  /* ADC12 Sample Hold 0 Select 0 */
#define SHT01               (0x0200)  /* ADC12 Sample Hold 0 Select 1 */
#define SHT02               (0x0400)  /* ADC12 Sample Hold 0 Select 2 */
#define SHT03               (0x0800)  /* ADC12 Sample Hold 0 Select 3 */
#define SHT10               (0x1000)  /* ADC12 Sample Hold 1 Select 0 */
#define SHT11               (0x2000)  /* ADC12 Sample Hold 1 Select 1 */
#define SHT12               (0x4000)  /* ADC12 Sample Hold 2 Select 2 */
#define SHT13               (0x8000)  /* ADC12 Sample Hold 3 Select 3 */

With these definitions, one can now write code that looks something like this:

ADC12CTL0 = ADC12TOVIE + ADC12ON + REFON + MSC;

However, there is a fundamental problem with this approach. To see what I mean, examine the comment associated with the define REF2_5V. You will notice that in this case, setting the bit to zero selects a 1.5V reference. Thus in my example code, I have implicitly set the reference voltage to 1.5V. If one examines the code at a later date, then it’s unclear if I intended to select a 1.5V reference – or whether I just forgot to select any reference – and ended up with the 1.5V by default. One possible way around this is to add the following definition:

#define REF1_5V             (0x000)   /* ADC12 Ref = 1.5V */

One can then write:

ADC12CTL0 = ADC12TOVIE + ADC12ON + REF1_5V + REFON + MSC;

Clearly this is an improvement. However there is nothing stopping you writing:

ADC12CTL0 = ADC12TOVIE + ADC12ON + REF1_5V + REFON + MSC + REF2_5V;

Don’t laugh – I have seen this done. There is also another problem with the way the fields have been defined, and that concerns the fields which are more than 1 bit wide. For example the field SHT0x is used to define the number of clock cycles the sample and hold should be active. It’s a 4 bit field, and thus has 16 possible combinations. If I need 13 clocks of sample and hold, then I have to write code that looks like this:

ADC12CTL0 = ADC12TOVIE + ADC12ON + REF1_5V + REFON + MSC + SHT00 + SHT02 + SHT03;

It’s not exactly clear from the above that I desire 13 clock samples on the sample and hold. Now clearly one can overcome this problem by having additional defines – and that’s precisely what IAR does. For example:

#define SHT0_0               (0*0x100u)
#define SHT0_1               (1*0x100u)
#define SHT0_2               (2*0x100u)
...
#define SHT0_15             (15*0x100u)

Now you can write:

ADC12CTL0 = ADC12TOVIE + ADC12ON + REF1_5V + REFON + MSC + SHT0_13;

However, if you use this approach you will inevitably end up confusing SHT00 and SHT0_0 – with disastrous and very frustrating results.
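The pitfalls of this method are easy to demonstrate on a PC. The sketch below repeats a handful of the definitions above so that it stands alone. The function names are hypothetical, and the second function goes one step beyond the text by showing what happens when the same flag is added twice.

```c
#include <stdint.h>

/* Definitions repeated from the text above so that the sketch stands alone. */
#define ADC12ON   (0x010)
#define REFON     (0x020)
#define REF2_5V   (0x040)
#define REF1_5V   (0x000)
#define SHT00     (0x0100)
#define SHT02     (0x0400)
#define SHT03     (0x0800)
#define SHT0_0    (0*0x100u)
#define SHT0_13   (13*0x100u)

/* Contradictory reference selections compile without complaint:
   REF1_5V contributes nothing, so 2.5V silently wins. */
uint16_t cfg_BothRefs(void)
{
    return (uint16_t)(ADC12ON + REF1_5V + REFON + REF2_5V);
}

/* Naming the same bit twice with '+' carries into the neighbouring bit:
   REFON + REFON has the same value as REF2_5V, and REFON ends up clear. */
uint16_t cfg_RepeatedBit(void)
{
    return (uint16_t)(ADC12ON + REFON + REFON);
}

/* The two spellings of "13" agree ... */
uint16_t cfg_Sht13Bits(void)  { return (uint16_t)(SHT00 + SHT02 + SHT03); }
uint16_t cfg_Sht13Value(void) { return (uint16_t)SHT0_13; }

/* ... but the near-identical names SHT00 and SHT0_0 do not. */
uint16_t cfg_Sht00(void)  { return (uint16_t)SHT00; }    /* Bit 8 set */
uint16_t cfg_Sht0_0(void) { return (uint16_t)SHT0_0; }   /* Field value zero */
```

None of these mistakes produces a compiler diagnostic, and using `|` instead of `+` only removes the carry problem in `cfg_RepeatedBit`, not the others.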

Defined fields – method 2

In this method, one defines the bit position of the fields. Thus our definitions now look like this:

#define ADC12SC             (0)   /* ADC12 Start Conversion */
#define ENC                 (1)   /* ADC12 Enable Conversion */
#define ADC12TOVIE          (2)   /* ADC12 Timer Overflow interrupt enable */
#define ADC12OVIE           (3)   /* ADC12 Overflow interrupt enable */
#define ADC12ON             (4)   /* ADC12 On/enable */
#define REFON               (5)   /* ADC12 Reference on */
#define REF2_5V             (6)   /* ADC12 Ref */
#define MSC                 (7)   /* ADC12 Multiple Sample Conversion */
#define SHT0                (8)   /* ADC12 Sample Hold 0 */
#define SHT1                (12)  /* ADC12 Sample Hold 1 */

Our example configuration now looks like this:

ADC12CTL0 = (1 << ADC12TOVIE) + (1 << ADC12ON) + (1 << REFON) + (0 << REF2_5V) + (1 << MSC) + (13 << SHT0);

Note that zero is given to the REF2_5V argument and 13 to the SHT0 argument. This was my preferred approach for a long time. However it had certain practical weaknesses:

  1. It relies upon the manifest constants being correct / me using the correct manifest constant. You only need to spend a few hours tracking down a bug that ends up being an incorrect #define to know how frustrating this can be.
  2. It still doesn’t really address the issue of fields that aren’t set. That is, was it my intention to leave them at zero, or was it an oversight?
  3. There is often a mismatch between what the compiler vendor calls a field and what appears in the data sheet. For example, the data sheet shows that the SHT0 field is called SHT0x. However the compiler vendor may choose to simply call this SHT0, or SHT0X etc. Thus I end up fighting compilation errors because of trivial naming mismatches.
  4. When debugging, I often end up looking at a window that tells me that ADC12CTL0 bit 6 is set – and I’m stuck trying to determine what that means. (I recognize that some debuggers will symbolically label the bits – however it isn’t universal).

Eschewing definitions

We now come to my preferred methodology. What I wanted was a method that has the following properties:

  1. It requires me to explicitly set / clear every bit.
  2. It is not susceptible to errors in definition / use of #defines.
  3. It allows easy interaction with a debugger.

This is what I ended up with:

ADC12CTL0 =
 (0u << 0) |        /* Don't start conversion yet */
 (0u << 1) |        /* Don't enable conversion yet */
 (1u << 2) |        /* Enable conversion-time-overflow interrupt */
 (0u << 3) |        /* Disable ADC12MEMx overflow-interrupt */
 (1u << 4) |        /* Turn ADC on */
 (1u << 5) |        /* Turn reference on */
 (0u << 6) |        /* Reference = 1.5V */
 (1u << 7) |        /* Automatic sample and conversion */
 (13u <<  8) |      /* Sample and hold of 13 clocks for channels 0-7 */
 (0u << 12);        /* Sample and hold of don't care clocks for channels 8-15 */

There are multiple things to note here:

  1. I have done away with the various #defines. At the end of the day, the hardware requires that bit 5 be set to turn the reference on. The best way to ensure that bit 5 is set is to explicitly set it. Now this thinking tends to fly in the face of conventional wisdom. However, having adopted this approach I have found it to be less error prone – and a lot easier to debug / maintain.
  2. Every bit position is explicitly set or cleared. This forces me to consider every bit in turn and decide what its appropriate value should be.
  3. The layout is important. By looking down the columns, I can check that I haven’t missed any fields. Just as important, many debuggers present the bit fields of a register as a column just like this. Thus it’s trivial to map what you see in the debugger to what you have written.
  4. The value being shifted has a ‘u’ appended to it. This keeps the MISRA folks happy – and it’s a good habit to get into.
  5. The comments are an integral part of this approach.
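One side benefit of this layout is that the resulting register value is easy to work out and to pick apart again when it shows up as a raw number in a debugger. The sketch below wraps the same expression in a function so that it can be checked on a PC. The function names and the `field_Get` helper are my own additions for illustration.

```c
#include <stdint.h>

/* The configuration word from the text, built with explicit shifts. */
uint16_t adc_Ctl0Value(void)
{
    return (uint16_t)(
        (0u << 0) |        /* Don't start conversion yet */
        (0u << 1) |        /* Don't enable conversion yet */
        (1u << 2) |        /* Enable conversion-time-overflow interrupt */
        (0u << 3) |        /* Disable ADC12MEMx overflow-interrupt */
        (1u << 4) |        /* Turn ADC on */
        (1u << 5) |        /* Turn reference on */
        (0u << 6) |        /* Reference = 1.5V */
        (1u << 7) |        /* Automatic sample and conversion */
        (13u << 8) |       /* Sample and hold field for channels 0-7 */
        (0u << 12));       /* Sample and hold field for channels 8-15 */
}

/* Extract a field of 'width' bits (1..15) starting at bit 'pos', as one would
   when decoding a raw register value read back from the hardware. */
uint16_t field_Get(uint16_t reg, unsigned pos, unsigned width)
{
    return (uint16_t)((reg >> pos) & ((1u << width) - 1u));
}
```

The expression evaluates to 0x0DB4, and `field_Get(0x0DB4u, 8u, 4u)` recovers the 13 written to the sample and hold field.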

There are still a few problems with this approach. This is what I have discovered so far:

  1. It can be tedious with a 32 bit register.
  2. Lint will complain about shifting zero (as it considers it pointless). It will also complain about shifting anything zero places (as it also considers it pointless). In which case you have to suppress these complaints. The following macros do the trick:
#define LINT_SUPPRESS(n)  /*lint --e{n} */
LINT_SUPPRESS(835)        /**< Inform Lint not to worry about zero being given as an argument to << */
LINT_SUPPRESS(845)        /**< Inform Lint not to worry about the right side of the | operator being zero */

In the next part of this article I will describe how one can extend this technique to make configuring peripherals a lot less painful.

Subscribing to comments

November 11th, 2010 by Nigel Jones

I heard from Jeff Gros the other day asking if it’s possible to subscribe to all the comments posted on this blog. Given the quality of the comments that are posted here, I thought it was an excellent request. Anyway, the answer is yes.  Just follow this link.

Median Filter Performance Results

November 9th, 2010 by Nigel Jones

In my earlier post on median filtering I made the claim that for filter sizes of 3, 5 or 7, using a simple insertion sort is ‘better’ than using Phil Ekstrom’s technique. It occurred to me that this claim was based upon my testing with 8 bit processors quite a few years ago, and that the results might not be valid for 32 bit processors with their superior pointer manipulation. Accordingly I ran some benchmarks comparing an insertion sort based approach with Ekstrom’s method.

The procedure was as follows:

  1. I generated an array of random integers on the interval 900 – 1000. The idea is that these would represent data from a typical 10 bit ADC found on many microcontrollers.
  2. I then put together a baseline project which performed all the basic housekeeping functions, but without performing any filtering. The idea was to try and get a feel for the non-algorithm specific overhead.
  3. I then put together a project which median filtered using an insertion sort, for sizes 3, 5, 7, 9, 11 and 13. Note that I elected to take a copy of the data prior to sorting. See this comment thread for a discussion of whether this is necessary or not.
  4. I put together another project which median filtered using Ekstrom’s method.
  5. I compiled the above for an ARM Cortex M3 target using an IAR compiler with full speed optimization.
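For readers who have not seen the earlier post, the insertion sort approach can be sketched as follows. This is not the code that was benchmarked: the function name, the fixed maximum size and the error handling are assumptions of mine, and Ekstrom's method is not reproduced here. It does follow the procedure above in taking a copy of the data before sorting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEDIAN_MAX_SIZE 13u   /* Largest filter size used in the benchmark */

/* Median of 'n' samples (n odd, 1 <= n <= MEDIAN_MAX_SIZE) using an
   insertion sort on a private copy, so the caller's data is untouched.
   Returns 0 if n is even or out of range. */
uint16_t median_InsertionSort(const uint16_t *samples, size_t n)
{
    uint16_t sorted[MEDIAN_MAX_SIZE];

    if ((n == 0u) || (n > MEDIAN_MAX_SIZE) || ((n & 1u) == 0u))
    {
        return 0u;
    }
    memcpy(sorted, samples, n * sizeof sorted[0]);

    for (size_t i = 1u; i < n; i++)
    {
        uint16_t key = sorted[i];
        size_t j = i;
        while ((j > 0u) && (sorted[j - 1u] > key))
        {
            sorted[j] = sorted[j - 1u];   /* Shift larger elements up */
            j--;
        }
        sorted[j] = key;                  /* Drop the key into its slot */
    }
    return sorted[n / 2u];                /* Middle element is the median */
}
```

The copy and the full sort are repeated for every new sample, which is one reason the cost of this approach grows quickly with filter size.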

The results were a clear win for Ekstrom. His code size was 132 bytes versus 224. His code was 5%, 32%, 61%, 89%, 113% and 146% faster than the insertion sort for filter sizes of 3, 5, 7, 9, 11 and 13 respectively. To be fair to the insertion sort technique, I have made no effort to optimize it. Notwithstanding this, I think I can say that for 32 bit targets, you may as well just use Ekstrom’s approach for all filter sizes.

I’ll endeavor to update this post with results for a 16 bit target (MSP430) in the next few days.

Well I finally got around to running the tests on an MSP430 target. In this case Ekstrom’s method produced a larger code size (186 bytes versus 160). Much to my surprise, Ekstrom’s method was dramatically superior to the insertion sort approach, with speeds of 69% faster for a filter size of 3, going up to a whopping 250% faster with a filter size of 13.  The bottom line: I think my original claim is bunk. Use Ekstrom’s method by default!

DigiView Logic Analyzer

October 6th, 2010 by Nigel Jones

Today is one of those rare days on which I recommend a product. I only do this when I find a product that has genuinely made my life easier, and which by extension I think will also make your life easier. The product in question is a  DigiView logic analyzer. Now the fact that logic analyzers are useful tools should not be news to you. Indeed if you have been in this business long enough you will no doubt remember the bad old days of debugging code by decoding execution traces on a logic analyzer. That being said, I almost stopped using logic analyzers because they were big, expensive, difficult to set up and highly oriented towards bus-based systems. Given that I had my own consulting company with limited cash, limited space and a propensity to work on non-bus based systems (i.e. single chip microcontrollers), it’s hardly surprising that a logic analyzer wasn’t part of my toolbox.

This state of affairs persisted for a number of years until I obtained via a convoluted route a DigiView DV1-100. This is a USB powered, hand-sized box, with 18 channels at 100 MHz. Its successor (the DV3100) sells for $499. The device sat on my shelf for a while until I decided to give it a spin one day. Since then I have found it to be an indispensable tool. Interestingly I find I use it the most when implementing the myriad of synchronous protocols that seem to exist on peripheral ICs today. While it is of course very useful for getting the interfaces working, I also find it extremely useful in fine tuning the interfaces. Via the use of the logic analyzer I can really examine set-up and hold times, clock frequencies, transmission latencies and so on. Doing so has allowed me to dramatically improve the performance of these interfaces in many cases. Indeed, I have had such success in this area that I now routinely hook the analyzer up, even when the interface works first time. If nothing else it gives me a nice warm fuzzy feeling that the interface is working the way it was designed – and not by luck.

Another area where I find it very useful is when I need to reverse engineer a product. I do this a lot as part of my expert witness work – and it is really quite remarkable how much you can learn from looking at a logic analyzer trace.

Anyway, the bottom line is this. $499 gets you an 18 channel 100 MHz personal logic analyzer that can handle most of the circuitry most of us see on a daily basis. If you value your time at all, then the DigiView will pay for itself the first time you use it. Go hassle your boss to get one.