embedded software boot camp

Real world variables

Wednesday, January 16th, 2013 by Nigel Jones

Part of what makes embedded systems fun for me is that they normally interact with the physical world. The physical world contains real parameters which we measure using transducers, signal conditioning circuits and so on, such that ultimately we end up with a variable in our embedded code that purports to represent this real world parameter. For example, we might have this:

uint16_t pressure;     /* Pressure */

Don’t laugh. I have seen this a million times. What’s wrong with this you ask? Well, when dealing with real world variables, it is crucial that you as the author of the code make crystal clear at least four things about the real world variable:

  1. What the parameter actually is.
  2. The units of the variable.
  3. The resolution of the variable, otherwise known as the value of a LSB.
  4. The dynamic range of the variable.

Identifying the parameter

Many embedded systems measure a multitude of real world parameters, many of which may be of the same ‘type’ (for example, ‘pressure’). Thus while it is blindingly obvious to you when you write the code which pressure you are thinking of, it most certainly isn’t to someone coming into the code cold. Even if there is only one measured pressure in your system, the chances are there are various flavors of it. For example, a common architecture when measuring real world variables is to:

  1. Have the raw value. This is typically the value resulting from converting the latest ADC reading. This raw value is then:
  2. Median filtered so as to eliminate egregious outliers. The median value is then:
  3. Low pass filtered so as to remove Gaussian noise.

There are thus at least three representations of the pressure in the system I have described. My preference is that the variable name identify which instance you are dealing with. However, if for various reasons this makes for unwieldy names then at the very least make sure the comment that accompanies the variable declaration spells it out. For example:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure */
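To make the three representations concrete, here is a minimal sketch of the raw, median and low pass stages, with each stage named for what it holds. Everything in it is an assumption made for illustration: the names (pressure_co2_state_t, median3_u16, lowpass_u16, pressure_co2_update), the three-sample median window, and the simple first-order IIR low pass filter with a divide-by-four gain. A real system would choose its window and filter constants to suit its sensor.

```c
#include <stdint.h>

/* Hypothetical sketch: every value below is in raw ADC counts. */
typedef struct {
    uint16_t history[3];            /* Last three raw samples, ADC counts */
    uint8_t  next;                  /* Index of the history slot to overwrite next */
    uint16_t pressure_co2_raw;      /* Latest raw CO2 pressure, ADC counts */
    uint16_t pressure_co2_median;   /* Median filtered CO2 pressure, ADC counts */
    uint16_t pressure_co2_lowpass;  /* Low pass filtered CO2 pressure, ADC counts */
} pressure_co2_state_t;

/* Median of three samples: rejects a single egregious outlier. */
static uint16_t median3_u16(uint16_t a, uint16_t b, uint16_t c)
{
    if (a > b) { uint16_t t = a; a = b; b = t; }  /* Now a <= b */
    if (b > c) { b = c; }                          /* b = min(max(a, b), c) */
    return (a > b) ? a : b;                        /* Larger of the two survivors */
}

/* First-order IIR low pass: output moves 1/2^shift of the way to the input. */
static uint16_t lowpass_u16(uint16_t previous, uint16_t input, uint8_t shift)
{
    int32_t delta = (int32_t)input - (int32_t)previous;
    return (uint16_t)((int32_t)previous + delta / ((int32_t)1 << shift));
}

/* Push one new ADC reading through all three stages. */
static void pressure_co2_update(pressure_co2_state_t *state, uint16_t raw)
{
    state->history[state->next] = raw;
    state->next = (uint8_t)((state->next + 1u) % 3u);

    state->pressure_co2_raw = raw;
    state->pressure_co2_median = median3_u16(state->history[0],
                                             state->history[1],
                                             state->history[2]);
    state->pressure_co2_lowpass = lowpass_u16(state->pressure_co2_lowpass,
                                              state->pressure_co2_median, 2u);
}
```

Because the stage is part of each field name, a reader who meets pressure_co2_lowpass in an expression does not have to guess which of the three values is in play.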

Units

Failure to specify the units drives me up the wall. For example, there are literally dozens of units of pressure, at least four units of temperature and a huge number of units for speed. While it is of course obvious to you, the author, that you are measuring pressure in pounds per square inch, you are likely to find that the rest of the world, which works in the SI system, probably isn’t even aware that such an arcane unit exists. The bottom line: specify your units. My example now becomes:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. Units: bar */

Related to this is the requirement that the units are consistent with the variable name. For example, most of us would criticize a declaration that looks like this:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. Units: Celsius */

This is clearly wrong. However, what about this declaration:

uint16_t symbol_rate;    /* Symbol rate reported by demodulator. Units: bps */

What’s confusing about this declaration is that a symbol rate should have units of symbols per second. In the special case where there is one bit per symbol, the symbol rate does happen to equal the bit rate, which is usually measured in bits per second (bps). Thus upon seeing a declaration such as this, I’m left in a quandary, as the following are all possibilities:

  1. I’m dealing with the special case where the symbol rate and bit rate are the same.
  2. The author was sloppy in his commenting and meant to write ‘sps’ rather than ‘bps’ and so the variable genuinely does reflect a symbol rate.
  3. The author was sloppy in his variable naming, such that the variable actually represents a bit rate with units of bps.

That’s a lot of confusion to sow for failure to ensure that the variable name and its units are consistent.
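One way to avoid the quandary is to keep the two rates in separately named variables and make the relationship between them explicit. The sketch below assumes hypothetical names (symbol_rate_sps, bits_per_symbol, bit_rate_bps) and a fixed whole number of bits per symbol; none of these come from a real demodulator API.

```c
#include <stdint.h>

/* Hypothetical helper: converts a symbol rate to a bit rate.
 * symbol_rate_sps:  symbol rate. Units: symbols per second (sps).
 * bits_per_symbol:  bits carried by each symbol (1 for the special case).
 * Returns:          bit rate. Units: bits per second (bps).
 * The caller must ensure the product fits in 32 bits. */
static uint32_t bit_rate_bps(uint32_t symbol_rate_sps, uint8_t bits_per_symbol)
{
    return symbol_rate_sps * (uint32_t)bits_per_symbol;
}
```

With the units carried in each name, a declaration claiming that a symbol rate is in bps stands out as an inconsistency rather than a puzzle.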

Resolution

Closely allied with units is the resolution or scaling. In a nutshell, what does 1 LSB of the variable represent? Again, I find that this all-too-important parameter is deemed obvious by the author of the code. While it is common that 1 LSB = 1 unit, systems that need to maximize dynamic range will often use a different scaling, for example 1 LSB = 0.0125 bar. The bottom line: if you don’t specify what 1 LSB represents, you are doing future readers of your code a major disservice. It’s common to incorporate the resolution and units together, such that our example now becomes:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure. 1 LSB = 0.0125 bar */
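Once the resolution is written down, converting to a conventional unit becomes a mechanical step. As a hedged sketch: 1 LSB = 0.0125 bar is the same as 12.5 mbar, or 25/2 mbar, so integer arithmetic suffices. The macro and function names are hypothetical, and millibar as the output unit is simply an assumption made for the example.

```c
#include <stdint.h>

/* 1 LSB = 0.0125 bar = 12.5 mbar, expressed as the fraction 25/2 mbar. */
#define PRESSURE_MBAR_PER_LSB_NUM  25u
#define PRESSURE_MBAR_PER_LSB_DEN  2u

/* Hypothetical helper: converts a pressure in LSBs (1 LSB = 0.0125 bar)
 * to whole millibar, truncating any half millibar. */
static uint32_t pressure_lsb_to_mbar(uint16_t pressure_lsb)
{
    return ((uint32_t)pressure_lsb * PRESSURE_MBAR_PER_LSB_NUM)
           / PRESSURE_MBAR_PER_LSB_DEN;
}
```

For instance, a reading of 80 LSB converts to 1000 mbar, which is 1 bar.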

Dynamic range

The last thing that should be reported is the expected dynamic range of the variable. This covers a multitude of issues:

  1. What is the legal range of values? For example, consider the popular LM35 series of temperature sensors. These devices are inherently designed to report temperatures above zero Celsius. As such negative temperatures are not expected when using this sensor. Other sensors will of course have other physical limits which it’s important to report.
  2. What is the sensor offset? For example, absolute pressure sensors will typically have a nonzero output when exposed to atmospheric pressure.

Thus it is essential that you specify the expected range. If you do so, the resolution can be implicit. However, I still like to make it explicit. For example:

uint16_t pressure_co2_median;    /* Median filtered CO2 pressure.
                                    Range 0x0000 - 0x3FFF = 0.000 - 204.7875 bar.
                                    1 LSB = 0.0125 bar. */
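A documented range also gives the code something to check against. The following is a minimal sketch, assuming the 0x0000 - 0x3FFF raw range from the comment above. The macro and function names are hypothetical, and whether an out-of-range reading should be clamped, rejected or flagged as a sensor fault is a design decision this sketch does not try to settle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest legal reading. 1 LSB = 0.0125 bar. The lowest legal reading
 * is 0x0000, so an unsigned value needs no lower bound check. */
#define PRESSURE_CO2_MAX_LSB  0x3FFFu

/* Hypothetical helper: true if the reading lies within the documented range. */
static bool pressure_co2_is_legal(uint16_t pressure_co2_lsb)
{
    return pressure_co2_lsb <= PRESSURE_CO2_MAX_LSB;
}

/* Hypothetical helper: limits a reading to the documented range. */
static uint16_t pressure_co2_clamp(uint16_t pressure_co2_lsb)
{
    return pressure_co2_is_legal(pressure_co2_lsb)
               ? pressure_co2_lsb
               : (uint16_t)PRESSURE_CO2_MAX_LSB;
}
```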

I urge you to take a look at your code and see if you are doing this for your real world variables. If you aren’t then consider making some changes. Future generations will thank you.

29 Responses to “Real world variables”

  1. SteveL says:

    Might I suggest including the units within the variable name, e.g. pressure_bar_co2_median, so that you are reminded of the units when using the variable in an expression?

    • Nigel Jones says:

      It is a nice technique. However in practice there are a lot of units that don’t map well on to it. For example if the variable is an acceleration with units of meters per second per second, then it’s tough to incorporate the units into the variable name.

      • Steve Huggins says:

        I have used units in identifiers for a long time, but I can see the shortcoming of this approach now. I have been lucky enough not to meet long or cumbersome units. However, I do like the idea of it because the units are in your face every place that variable is used.

        I think that the ‘resolution or scaling’ issue is actually just one of units. If 1 LSB = 0.0125 bar, then pressure_co2_median is not in bar, but in units of 0.0125 bar, and as such a name like pressure_bar_co2_median would be misleading. Think of duration_seconds and duration_milliseconds. Personally, I would consider replacing

        uint16_t duration; /* 1 LSB = 0.001 seconds */

        with

        uint16_t duration_milliseconds;

        but not with

        uint16_t duration_seconds; /* 1 LSB = 0.001 seconds */

      • George says:

        /* Median filtered CO2 pressure.
        Range 0x0000 – 0x3FFF = 1.000 – 204.7875 bar.
        1 LSB = 0.0125 bar. For further information please refer to
        http://en.wikipedia.org/wiki/Bar_%28unit%29
        (Damn it. I should never work for Nigel. He is killing me
        with these F@c#ng comments) */

  2. Ben says:

    I just wanted to mention Andrew Kennedy’s great work on using type systems to track units of measure. I’m not aware of any implementation targeted at embedded systems, but it would be cool if someone did that.

    http://research.microsoft.com/en-us/um/people/akenn/

    Ben

  3. Manfred Bartz says:

    I totally agree that the unit must be part of the identifier name. Usually the unit is also sufficient to unambiguously identify the physical property being measured. Here are my suggestions for identifiers:

    int Pos3V3supply_centiVolt; // +3.3V supply rail, measure voltage in 10mV units
    int PcrChamber_deciDegC; // PCR chamber temperature in 0.1 deg C
    int Inlet_kPa; // Inlet pressure in kPa
    int Settling_1p333mSecTicks; // Settling time in 1.333mSec units

    PS: the font used on this web page is inappropriate for the subjects discussed.

  4. Brad says:

    1LSB –> 1 LSB?

  5. Bert Williams says:

    The idea of using strongly typed variables for simulations is older than 1992 by at least 4 years.

    Using a strongly typed language, Ada, we implemented a complete SI imperial typing system. Instead of naming variables using the system outlined by Mike above, we named variables by typing them with the physical quantity expressed. For example (please excuse me if the syntax is not perfect):

    type length is real range 0 .. system.max_float;
    type time is real range system.min_float .. system.max_float;
    type speed is limited private;
    type miles_per_hour is limited private;

    function "/"(left : length; right : time) return speed;

    function to_miles_per_hour(fast:speed) return miles_per_hour;

    Using properly defined types for each variable in the system resulted in code that would not even compile if proper unit conversions were not used. All of the conversions were put in a table structuring the allowable conversion (both unit and type) rules, and an automatic code generation scheme generated the several dozen combinations (with private types specifying the allowable conversion operations, e.g. +,-,/,*). (I believe I wrote the code generator).

    While this approach was shown to result in provably reliable simulation software, the resulting libraries were so large that system build times often exceeded a day (on a VAX). A similar approach was tried with the C++ compilers of the day, but the templates could not be compiled. A modern C++ compiler might provide similar results. We noted very little, if any, performance penalty at run time because all the wrappers were optimized away.

    While it was painful to educate the developers at first (“this won’t compile”), after a while, it became second nature for the team and reduced integration issues considerably.

    A “large” embedded system using Linux and several megabytes of memory might benefit from a similar approach, and the Boost C++ libraries might be a place to start, but results would be doubtful for a microcontroller with 64K of memory.

    I do agree with Mike’s premise that naming the variables and specifying units, range, and precision is an essential tool for smoothing development and ensuring product quality.

  6. Harold says:

    Fixed point arithmetic is much nicer in Ada. Instead of a human-readable comment that can easily be missed, you tell the compiler to check things for you. It knows how many bits to use for the type. Constants look like real numbers, but the compiler compiles integer arithmetic. It will refuse to do arithmetic on variables of the wrong type (no apples + oranges). It can do optional runtime bounds checking. Something like this:

    type pressure_bar is delta 0.0125 range 1.000 .. 204.7875; -- pressure in bar.
    for pressure_bar'Small use 0.0125;

    pressure_co2_median : pressure_bar; -- Median filtered CO2 pressure

  7. Brad says:

    In school we were told to use:

    LSB = least significant byte
    lsb = least significant bit

    Is this a universal way of differentiating? Using this convention over the years has really helped me be clear about what I was talking about in my documentation.

    • Fred says:

      Yep, I agree. I noticed Michael using LSB when he meant lsb. But this bigger issue of making sure that a reader understands the details of each variable as Michael suggests is key.

    • Jon Titus says:

      Why not LSB and LSBy? I have never used lsb (lowercase characters) and don’t plan to. Is an integrated circuit an IC or an ic?

      • Brad says:

        This is a real-world need to avoid confusion. I’ve read documentation before that was frustratingly ambiguous as to whether they were talking about bits or bytes. Don’t use the same upper-case acronym within the same document to refer to both. If you refuse to use lower-case for an acronym in a special case like this, then spell out “least significant bit” instead.
        “IC” isn’t easily confused with anything (that I’m aware of), so there’s no reason to consider another form for it.

  8. David says:

    I heartily concur with using fully descriptive names. I don’t balk at long names, because I can always copy and paste them once constructed. I often use “MKS” as in gMKS for acceleration in m/sec^2, e_MKS for the electron charge in Coulombs. I have no problem with:

    uint16_t instrument_pressure_SiH4_NF3_median;
    double instrument_calibration_factor_millibar = 0.3459;
    /…/
    double pressure_SiH4_NF3_median_MKS = instrument_calibration_factor_millibar *
    pressure_SiH4_NF3_median / 1.0E8;

    uint16_t pixel_density_MKS_peta;

    where “_MKS_peta” signifies that the unit is 10^15 pixels/meter^2, or 1000 pixels per square micron.

    • Gerardo says:

      It sounds good to have those self-explaining variables.
      But try now to do some arithmetical operations on it: no screen will be wide enough 😉
      E.g. (no sense, only to show the point)
      uint16_t instrument_calibration_factor_millibar = pressure_SiH4_NF3_median_MKS / pixel_density_MKS_peta * pressure_CoNe_NG3_peak_MKS; /* and so on, but putting here also a comment would be an even longer line… */

      Obviously, the line may be split into many lines. But is it readable?

  9. Rhys Drummond says:

    I’ve done a lot of Objective-C in the last year and the value of highly descriptive function and variable names can’t be overstated. For years I tried to keep variable names to a minimum to make the code look elegant (temp, vol, pres, etc.), and some compilers supported only short names. While the full version of these words is an improvement, there is definitely time saving in incorporating extra information in the names.

    You can’t practically include every detail in the name but there is definitely a compromise. For compilers with poor type checking Hungarian notation has saved me from errors countless times also.

    So for me, this is helpful:
    word wRoomTempADCRaw;
    byte bRoomTempDegC;

  10. Rhys Drummond says:

    (sorry- accidental early post)
    So for me, this is helpful:
    word wRoomTempADCRaw; // Value as read from ADC, no filtering
    byte bRoomTempDegC; // Real world value = bValue – 40degC. e.g., use this as serialisable form
    float fRoomTempDegCFiltered; // Use this as input to logic

    etc. The code is always easy to read, and while it takes a little more space, makes the intention clear with only the occasional referring back to the mapping definition if necessary.

    And I don’t want to hear from the anti-Hungarian Notation club; this is a style preference that works for me for embedded C. You don’t have to use it of course. 🙂

  11. David Paktor says:

    Thanks, Ben for the pointer to Andrew Kennedy’s great work.
    I’d like to point out that, even without the sophistication of F# or Visual Studio, just in ordinary “C”, one can make use of typing to prevent crossover of units.
    Types, even in “C”, do not need to be limited to storage-classes. To use one of the examples in Mr. Jones’s paper: Declare a type name for “pressure measured in eightieths-of-a-bar” as a uint16 and not only will you have preserved the storage-class intended here, you also will guarantee that all variables declared to be of that type will be checked for compatibility at compile-time.
    Granted, this does not give convenient support for conversion; if you’re using “C”, you’ll have to create your own conversions. But it is a stronger way to protect against accidents than relying purely on variable names.
    Not that I’m dis-commending the use of variable-names that convey physical-type information: it’s a valuable practice, too, but not always sufficient.

  13. Michael says:

    Another useful convention when using fixed point quantities stored in integers is to append a Q-format[0] description to the variable name. For example, a temperature reading with an lsb size of 0.125 degC is essentially a fixed point number with the 3 least significant bits representing the fractional component of the temperature. Therefore:

    uint16_t temperature_celsiusQ3;

    is a useful way to summarise info in a place you need it.

    [0] http://en.wikipedia.org/wiki/Q_%28number_format%29

  14. Niklas Holsti says:

    For a new, practical way to manage and verify the physical dimensions of variables in Ada programs, see http://docs.adacore.com/gnat-unw-docs/html/gnat_ugn_28.html. While this is not yet a standard feature of Ada, it is available in the most accessible and gcc-based Ada compiler, gnat.

    In addition to the pure type-based method that Bert Williams described, which requires a large number of types and operations, there is another method that can be implemented in standard Ada (or in any language with operator overloading, such as C++), by extending each variable to a record (struct) with components that identify the dimensions, and combining and checking the dimensions at run-time in the arithmetic operations. However, that method obviously has run-time overhead both in time and space.

    The new method, referenced above, uses new features of Ada 2012, causes no run-time overhead, and does not require the programmer to declare new types and operators for each possible dimensionality.

  15. Brian Drummond says:

    @Bert : of course that Ada compile time penalty has long vanished into the noise, while the benefits of its type system are as strong as ever; or stronger if you add SPARK..

    And just to be clear : while the merits of fully representing physical types on a 64k embedded system may be doubtful, Ada is being used on systems like AVR including Arduinos, and (experimentally) MSP430 that are often much smaller than 64k.

    It’s not just for defence and aerospace any more.

  16. Gary Lynch says:

    I, too, prefer to embed units in my variable names, but I think of the quantities as rational numbers, and infer the denominator with a suffix:
    unsigned int freqX16;
    unsigned int vBusInX100, iRectX100[3];

    In this example, I presume frequency is in hertz, potential in volts, and current in amps, subject to the modifiers.

    I have found over the years that quantities you’re going to display work well with denominators that are powers of 10, while those you will have to divide by a constant downstream trend toward powers of 2.

    As somebody else said, keeping the units ‘in my face’ saves me a lot of time trying to find that definition and any comments that go with it.

  17. Stuart Rubin says:

    I think NASA lost a Mars rover because one software module dealt with thrust in Newtons, and another in foot-pounds!

    Whenever we deal with real physical units (which is a lot!), we try to name the variable with the unit and scale. So, for reading a current, we’ll have these:
    uint16_t pulseCurrentADCCount; /* Raw ADC count of pulse current */
    uint16_t pulseCurrentADCfiltered; /* Median filtered ADC count of pulse current */
    uint16_t pulseCurrent0MA1; /* Pulse current in 0.1 mA units */

    Note that I co-opted the Asian component notation for the units. 1K5 means 1.5K; you can’t miss the decimal place. Here 0MA1 means 0.1 mA.

  18. Daniel Szabo says:

    What are people’s thoughts on where to convert data from an ADC value to a real world value in the context of maximizing code portability?

    If I have a software module that controls the ADC of a processor, should it return real world values, or should it return an ADC value leaving the responsibility of conversion to the calling function? Perhaps an intermediate module should provide a function that gets the ADC value from the ADC module and returns a real world value?

  19. Anonymous says:

    It wasn’t a rover, it was Mars Climate Orbiter, and the problem was caused by the contractor not following the metric requirement in the contract. (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter)

    NASA has only lost rovers due to age, although we almost lost two due to programming problems related to flash file systems.

    On the ADC conversion, that’s a good question. I guess by real world numbers, you mean the actual quantity being measured, not the voltage at the ADC input. As you suggested, I would have the ADC driver return the raw ADC integer values, and have a middle layer convert the ADC values into real-world quantities (pressure, temperature, etc) that is then passed back to the requestor. This would isolate the code into several easy to digest parts, and if the ADC needed to be replaced, would minimize the locations where updates needed to occur. This is most useful for multiple ADC inputs that may or may not be symmetrical. This way there is only one place that you need to document that ADC1 is 8 bits, 0-1.5V and ADC2 is 12 bits, 0-5V, etc.

    That being said, if I was working on a system with only one ADC input, I would probably have the caller do the conversion from the raw ADC value to the real world quantity.

  20. Errand Wolfe says:

    In some cases it may be too cumbersome to include every attribute in a variable name, in which case having a smart editor is helpful: hover over the variable name and it displays the associated comment.

  21. Adam says:

    Great notes and conversation. I like to use the typedef to describe the variable better.

    uint16_t pressure_co2_median; /* Median filtered CO2 pressure. 1 LSB = 0.0125 bar */

    becomes

    typedef uint16_t pressure_0_0125_bar_t;
    pressure_0_0125_bar_t pressure_median;
    pressure_0_0125_bar_t pressure_last_sample;
