embedded software boot camp

EEPROM Wear Leveling

Wednesday, July 5th, 2017 by Nigel Jones

A problem that occurs from time to time is the need for an embedded system to keep track of the number of doodads it has processed since it was put in to service. Clearly you can’t keep this value in RAM as it will be lost when power is removed. Thus the obvious solution is to store the count in EEPROM. However, common EEPROM  specifications are 10^5 guaranteed write cycles, with 10^6 typical write cycles. If you expect to process say 20 million doodads during the life of your product you clearly can’t do something like this:

__eeprom uint32_t doodad_cntr;
...
doodad_cntr++;

The above assumes that the __eeprom memory space qualifier is sufficient for the compiler to generate the requisite code sequences to access EEPROM.

The first part of the solution is to define an array in EEPROM for holding the doodad counter. Thus:

__eeprom uint32_t doodad_cntr[DOODAD_ARRAY_SIZE];

Since the number of doodads processed increases monotonically, on power up one simply searches the array looking for the last entry, – i.e. the entry in the array with the biggest value. One records this offset into a static variable. Thus:

static uint8_t doodad_array_offset;
doodad_array_offset = find_last_used_entry();

The next time a doodad is processed, then the write occurs at the next location beyond doodad_array_offset in doodad_cntr[]. This simple technique immediately gives one a guaranteed increase of DOODAD_ARRAY_SIZE more writes than using a single variable. Whilst this is nice, there are a number of other things that one can do to improve the situation. These are all based on the observation that the erased state of EEPROM is a ‘1’ and not a zero. Thus a ‘blank’ EEPROM actually consists of 0xFF …. 0xFF, without a zero in sight. To take advantage of this, rather than writing the actual doodad count value to the array, instead write the 1’s complement of the value. This means that rather than writing, for example, the value 0x00000001 to EEPROM, you’ll instead write 0xFFFFFFFE. In this case the actual number of bits that have to change state is just 1 rather than 31, resulting in considerably less stress on the EEPROM, potentially increasing the life of the EEPROM. Note that this technique is equivalent to initializing the EEPROM to 0xFFFFFFFF and then decrementing.

Writing the 1’s complement also opens up another potential improvement. EEPROM is often byte or word addressable. Furthermore, to program the EEPROM is usually a 2-step process consisting of erasing the memory location (i.e. erase it to 0xFF) and then programming the erased location with the desired value. Using 1’s complement values, it should be apparent that for most of the time, many of the bytes in the EEPROM will be at 0xFF. If we combine this with the fact that most of the time only one out of the four bytes in a uint32_t will change value when a uint32_t value is incremented, then we can dramatically minimize the actual number of erases / writes performed on the EEPROM. The code to do this looks a bit like this:

typedef union {
    uint8_t cnt_bytes[4U];
    uint32_t counter;
} eeprom_union_t;
__eeprom eeprom_union_t doodad_cntr[DOODAD_ARRAY_SIZE];

void increment_doodad(void) {
    eeprom_union_t current_value.counter = doodad_cntr[doodad_array_offset].counter;
    if (++doodad_array_offset >= DOODAD_ARRAY_SIZE) {
        doodad_array_offset = 0U;
    }
    --current_value.counter;  //1's complement increment
    for (uint8_t i = 0U; i < 4U; ++i) {
        //Are the bytes different? If no, then do nothing
        if (doodad_array[doodad_array_offset].cnt_bytes[i] != current_value.cnt_bytes[i]) {
            //Need to erase byte
            erase(doodad_array[doodad_array_offset].cnt_bytes[i]);
            //See if we need to actually program now
            if (current_value.cnt_bytes[i] != 0xFFU) {
                doodad_array[doodad_array_offset].cnt_bytes[i] = current_value.cnt_bytes[i];
            }
        }
    }
}

Clearly this is quite a bit more work. However given that EEPROM erase / write times are in the 2 – 10ms arena, the time taken in comparison to execute the above code is insignificant. Given that it will also save millions of EEPROM writes over the life of the product, then the complexity is well worth it.
Finally if your product needs to keep track of an enormous number of doodads, then you’ll likely have no choice other than to keep track of when EEPROM cells go bad. This is typically done by assigning another area of EEPROM that has  DOODAD_ARRAY_SIZE bits – e.g._eeprom uint8_t bad_cell[DOODAD_ARRAY_SIZE / sizeof(uint8_t)]. These bits are erased to 1. Once you detect that a write has failed at a certain cell in doodad_cntr[], then you change the corresponding bit in bad_cell[] from ‘1’ to a ‘0’ and the cell is considered bad for all time.  Obviously you then have to interrogate the bad_cell[] array to determine whether the code should use a specific cell.

9 Responses to “EEPROM Wear Leveling”

  1. Keroronsk says:

    Array counter is good advice, but IMO 1’s complement is problematic. Many modern EEPROM IC’s have paged structure (similar to FLASH) inside, like 4 bytes (AT24C01C), and up ti 256 bytes (25LC1024), so even if you write a byte in the next cell, you will wear out 4 near cell anyway. So far as I know, the only best solution to wear-out leveling is to write to EEPROM as little as possible. For example, to have this doodad counter in the RAM, and to store it into EEPROM only on Vcc fail (brown-out detect, separate MCU core and EEPROM voltage with large cap or ION), FRAM instead of EEPROM, etc.

    • Deniz Can Cigsar says:

      I think he is refferring internal eeprom of MCU. They usually support byte access.. I a class that supports key-value system.. Map for instance. It manages the page writes, erases, cleanups.. Like following snipped:

      InternalDataStore dataStore(startingAdress, endingAddress);
      dataStore.read(counterId, &counterValue);
      counterValue++;
      dataStore.write(counterId, &counterValue);

      That will use the internal flash to store values.. If you code a class that uses Eeprom, it will use it.. InternalEepromDataStore, AT20C02DataStore will use I2C connection to store/retrieve the values from the chip with same interface etc..

    • Nigel Jones says:

      You are right that the 1’s complement approach only works on byte or word addressable EEPROM. Of course with the more modern EEPROMs you can simply make your array bigger! Regarding writing to EEPROM on power down. I’ve always found this to be tough to do in practice. What I usually find after doing a tolerance analysis on the voltage references, capacitors etc. is that I cannot guarantee to store everything to EEPROM that needs to be stored.

  2. Tuomas says:

    I don’t see how updating only the bytes that are different helps at all. Statistically, 255 out of 256 times, the LSB is different and needs to be erased. Thus, the LSB will be the most worn-out byte of each word. Once the LSB wears out and stops working, the whole word must be discarded anyway. Unless I missed something here.

    How about using an array of N bits, clearing one bit for each doodad. The array needs to be erased once per N doodads and a counter would be used to count how many times the array has been erased. Combined, the array and the counter would be used to initialized the doodad counter value at boot. This would give N*10^5 guaranteed doodads with the cost of N/8 bytes of EEPROM memory.

    • James E. says:

      > I don’t see how updating only the bytes that are different helps at all. … Unless I missed something here. <

      I think Nigel is assuming here that a byte-addressible EEPROM could re-write a byte without erasing the word.

      That is, that
      10001011 11111111 11111111 11111111
      could be updated to
      11001011 11111111 11111111 11111111
      without causing wear on the latter 3 bytes.

      I like your idea of setting the bits low sequentially; if Nigel's assumption is wrong, then that's certainly the way to go.

  3. David Bakin says:

    Do I understand you to mean that writing a ‘0’ (zero) to a bit in an EEPROM counts against wear, but clearing it to ‘1’ does not? I didn’t know that (if that’s what you meant).

    So in that case I think if you really want to reduce the number of bit flips (to avoid wear) – write a counter encoded as Gray code. The book Hacker’s Delight – as well as Knuth’s TAOCP and many other sources – explain how to easily increment in Gray code – and how to convert it to/from binary counting.

    By definition, of course, incrementing a Gray code counter changes only a single bit. (Actually, as an embedded engineer, you probably already know all this. But perhaps this application is new. Or perhaps manipulating a Gray code counter is too expensive for the MCUs you’re talking about. Or something else.)

  4. Ashleigh Quick says:

    One of my colleagues had all kind of similar ideas about management of wear in EEPROMS.

    He actually then tested it, instead of relying on a theoretical approach to things like bank structures and so on.

    What he found was:

    1. Sometimes wearing out a single byte would take out a series of bytes all around.

    2. Sometimes wearing out a single byte would take out the whole device.

    3. Manufacturers vary, so what works for wear level on one device will be different on another.

    4. Manufacturers change process without a change of part number (eg at the time of a die shrink, there might be more going on than meets the eye); which means that what worked yesterday fails tomorrow.

    All those things put together led us to take a very prudent approach: Try to minimise writes, and use a carefully constructed append-to-list style data structure (and software to walk the structure to find the next place to write).

    Doing this, with enough EEPROM or flash for that matter, you can get lifetimes of hundreds of years in normal use and all other clever techniques are not needed.

    Basically, don’t assume that wearing out a cell is going to leave other cells OK. The underlying structure might cause results you don’t expect, you can’t work around, you can’t predict, and that will change over time and outside your control.

  5. Frank Theile says:

    There’s also a HW-based approach for EEPROM wear leveling:
    only write to the EEPROM if the system is powered down.

    For sure, this isn’t cheap and not applicable in all cases. It requires a reliable power down detection and enough capacitance from the power supply to save the data to EEPROM.

Leave a Reply

You must be logged in to post a comment.