Real-Time Systems Inc.

Specializing in Embedded Controller Design

Building Endian-Safe Code with C++

Memory in most computers is byte-addressable, meaning that each memory byte has its own unique address. Since accessing a memory location moves the entire byte as one operation, it's not useful to talk about the order of the bits within the byte. We can discuss the significance of the bits, but not their order.

In contrast, the bytes of a multi-byte number can be accessed individually, so the order of the bytes becomes important. Two reasonable orders exist: big-endian and little-endian.

Big-endian machines store the most-significant byte at the lowest-numbered address, with the other bytes following in decreasing order of significance. Little-endian machines store multi-byte numbers in the opposite order: from least-significant to most-significant.

Within a single computer, the byte order doesn't much matter because the processor does all its memory loads and stores in the correct order, whichever order that may be. The problem arises when move data between machines with different endianness.

We could convert all multi-byte numbers to ASCII representations for transmission, but that would require extra bandwidth and would complicate generating and parsing the messages. To avoid these costs, most low-level protocols use fixed-size binary fields, but if these fields are larger than a single byte then their endianness becomes an issue.

In this article, we'll show how to use C++ to deal with such mixed-endian systems in a simple and reliable way.

A real-world example

Consider the Internet Protocol. Each Internet datagram begins with an IP header, which consists of a series of fixed-size fields:

    Field Name               Size in bits
    ---------------------   --------------
    Version/Header Length          8
    Type of Service                8
    Total Length                  16
    Identification                16
    Flags/Fragment Offset         16
    Time to live                   8
    Protocol                       8
    Header Checksum               16
    Source Address                32
    Destination Address           32

(The header can be extended with option fields, but we'll ignore that for now.)

The fields are transmitted in the order shown. For single-byte fields, the bit order isn't visible since the processor transfers bytes to and from the communications hardware as complete bytes, just as it does with memory bytes. The transmitted bit order can, in fact, differ for various physical media, but this is not visible to the software.

But what about the multi-byte fields? Here we need an agreement on the byte order, otherwise some systems will interpret the multi-byte fields in the wrong order. For the Internet protocols, we transmit all multi-byte fields most-significant byte first, or in big-endian order. For example, we transmit the source address field as:

    Partial Field           Size in bits
    -------------------    --------------
    Source Address, MSB         8
    Source Address, 3SB         8
    Source Address, 2SB         8
    Source Address, LSB         8

Representing fixed-format data in C/C++

When building a networked system, we'd like to use a single code base to describe the communications packet structures. A common code base avoids nasty copy-paste errors during development and makes maintenance easier. If the system includes multiple CPU types, however, we'll need to make the code base architecture-independent, which includes endian-independence.

In C or C++, fixed-format structures like those in the Internet protocols are best described with struct's. As a first try, we might define the IP datagram header as:

    struct IpHeader
    {
        char  ver;            // version and header length
        char  tos;            // type of service
        short len;            // total length of datagram
        short id;             // datagram identification
        short flags;          // flags and fragment offset
        char  ttl;            // time to live
        char  proto;          // protocol of payload
        short sum;            // header checksum
        int   src;            // source address
        int   dst;            // destination address
    };

This looks plausible, but there a few problems:

while char's are always 8 bits nowadays, short's and int's can vary in size on different architectures
we should use unsigned types, since in this case the semantics of the fields are unsigned
most importantly, we haven't specified the byte order of the multi-byte
fields.

We can easily solve the first two problems using fixed-size types from stdint.h:

    struct IpHeader
    {
        uint8_t  ver;         // version and header length
        uint8_t  tos;         // type of service
        uint16_t len;         // total length of datagram
        uint16_t id;          // datagram identification
        uint16_t flags;       // flags and fragment offset
        uint8_t  ttl;         // time to live
        uint8_t  proto;       // protocol of payload
        uint16_t sum;         // header checksum
        uint32_t src;         // source address
        uint32_t dst;         // destination address
    };

The stdint.h header comes from C99; it guarantees that the uintN_t types will be N bits wide on all machines. If used, the "u" prefix indicates an unsigned type.

The byte-order problem is harder; the rest of this article addresses it.

The traditional C approach to endianness

In our example of the IP header, big-endian machines will see the multi-byte fields in the order they expect, since that's the order they appear in the actual datagrams on the wire. Little-endian machines, however, need the multi-byte fields byte-swapped before they can be used.

Take the len field for example. A datagram length of 64 bytes shows up in the datagram as:

    Partial Field         Contents
    -----------------    ----------
    Total Length, MSB       0x00
    Total Length, LSB       0x40

When a big-endian machine reads this two-byte field, it sees 0x0040, the correct number. A little-endian machine loading this field into a register sees 0x4000, which is incorrect. The little-endian machine needs to byte-swap the number before use.

The POSIX standard provides the functions htonl(), htons(), ntohl(), and ntohs(), where the h means "host", the n means "network", and l and s represent "long" (32-bits) and "short" (16-bits) respectively. These routines perform the indicated transformation on all architectures, whether or not this requires a byte swap. For example, htonl() might be defined as:

    uint32_t htonl(uint32_t x)
    {
      #if BYTE_ORDER == BIG_ENDIAN
        return x;
      #else // LITTLE_ENDIAN
        return   x << 24 & 0xFF000000
               | x <<  8 & 0x00FF0000
               | x >>  8 & 0x0000FF00
               | x >> 24 & 0x000000FF;
      #endif
    }

(You can find BYTE_ORDER in <endian.h> on many systems. If not, it's easy enough to set up something similar yourself.)

As an example, let's check whether a datagram's source address is in a "class-A" address block (Class-A addresses have their highest bit clear):

    bool isSourceClassA(IpHeader* hdr)
    {
      return ntohl(hdr->src) & 0x8000000 == 0;
    }

In this (admittedly trivial) example, passing hdr->src to ntohl() before use ensures we're testing the proper address bit, regardless of the endianness of the processor. Note that to modify a header field, we need an extra step: call ntohl(), modify the value, then finally call htonl() before storing the result.

Problems with the traditional approach

The POSIX byte-swapping functions are fine for big-endian Internet datagrams, but what do we do for a schizophrenic protocol like ISO 11783? This protocol has big-endian headers but little-endian data fields. There's also the occasional 24-bit field to deal with.

We could start to define more byte-swappers, like these:

    uint64_t htole64(uint64_t);
    uint32_t htole32(uint32_t);
    uint16_t htole16(uint16_t);
        .
        .
        .

and so on for about nine more variants, not including the 24-bit cases, of course. Oh, and if we want fixed-endian int's, long's, float's or double's, we'll need to cast to and from the unsigned integral types provided by these new byte-swappers. We'd like to eliminate this sort of casting if we can, in the interest of type-safety, but that would take many more variants.

Of course, all of these are actually just aliases for either no-op's (where swapping isn't required) or for one of a small number of byte-swap routines: swap16(), swap24(), swap32(), and so forth.

But aside from this explosion of routines, there's also the maintenance problem. Every time we work on code which touches a fixed-endian field (either reading or writing it), we must manually remember to use the proper byte-swappers. If we miss just one, the program will be incorrect, but we may not notice until that particular part of the program runs on a machine with an endianness opposite that of the network.

Of course even with these drawbacks, there's been plenty of good network code written over the years. But using C++ intelligently we can do better.

What we'd like to see

Before we move to C++, let's identify some design goals:

Build a solution for every reasonable data size, optimizing the most common cases (16, 32, 64 bits) if possible.
Write a single piece of code to handle the whole thing -- avoid error-prone copy-paste coding.
Avoid the need to remember to use the swapper routines every time we touch a fixed-endian structure.
Maintain type safety -- avoid the need to explicitly cast to and from the network structure fields.

The last two requirements bear closer examination.

The real problem with the traditional approach is the implicit nature of the data type and endianness of the fields. In other words, the type and endianness of the fields must be held in the programmer's mind instead of being made explicit in the field declarations.

Wouldn't it be better if we could simply declare a packet field as, say, a big-endian 16-bit integer and have the compiler sort out the required byte swapping and type conversions automatically?

To achieve this goal, we need to do two things: automate the byte-swapping and generalize the solution over any desired data type.

Automatic byte swapping

As a concrete example, let's build a BigEndianInt16 data type. This 16-bit integral type will be stored in big-endian order on all machines, regardless of the machine's endianness. When we access it, however, we'll automatically do any necessary byte swapping both after reading it and before writing it. Of course we need only byte-swap on little-endian machines, but we'll try to hide that decision from the application code.

To demonstrate BigEndianInt16, let's consider a fictitious protocol whose header includes a 16-bit field to record the number of machines a packet has passed through. Each machine which handles a packet will increment this hop count.

    class Header
    {
      // ...
      BigEndianInt16 hopCount;          // number of hops
      void updateHopCount();            // add a hop to the count
      // ...
    };

When we receive a packet, it will come to us in a buffer pointed to by a void*. We'll then "map" the header structure onto the buffer so we can manipulate the header fields:

    void receive(void* buf)
    {
      Header* hdr = static_cast<Header*>(buf);
      hdr->updateHopCounter();
      // ...
    }

Using static_cast<>, we can tell the compiler that's it's safe to assume that buf points to a Header. (We presumably know this since we're familiar with the code that calls receive().) Once we have a Header, we can call updateHopCount() on it.

For updateHopCount(), we'd like to write something like the following and have it do the right thing regardless of the endianness of the machine running the code:

    void Header::updateHopCount() { hopCount++; }

BigEndianInt16 will be a class with two versions: one for big-endian machines and another for little-endian machines. Each version will have the same signature, meaning the same memory layout and public member functions. Having the same signature, they'll be interchangeable -- we'll choose the correct version for the target machine based on its endianness:

    #if BYTE_ORDER == BIG_ENDIAN
      class BigEndianInt16 { }; // no byte-swapping needed
    #else
      class BigEndianInt16 { }; // with byte-swapping on read and write
    #endif

The version for big-endian machines looks like this:

    #if BYTE_ORDER == BIG_ENDIAN
      class BigEndianInt16   // no byte-swapping needed
      {
        int16_t rep;        // in big-endian order
       
      public: 
        BigEndianInt16() { }
        BigEndianInt16(int16_t i) : rep(i) { }
        operator int16_t() const { return rep; }
      };
    #else
      class BigEndianInt16 { }; // with byte-swapping on read and write
    #endif

The rep field holds the actual contents of the BigEndianInt16 object, in big-endian byte order. The default constructor (used when mapping onto a packet from the network) doesn't change rep's contents; it just allows us to interpret the existing contents of rep as a BigEndianInt16.

The second constructor is special: it's a constructor with a single argument of a type other than the class type. It's called a conversion constructor because by creating a BigEndianInt16 from an int16_t, it, in at least some sense, converts an int16_t to a BigEndianInt16. Since on a big-endian machine the byte-order is already correct, this particular conversion constructor just stores its argument in the rep field.

The operator int16_t() conversion operator does the opposite of the conversion constructor; it converts a BigEndianInt16 to an int16_t. Note that we don't specify the return value of a conversion operator; it's assumed to be the type we're converting to. Again, no byte swap is needed.

While we're here, a mention of const is in order. The const on the conversion operator is a promise that the operator won't modify the state of the BigEndianInt16 object. In other words, a const member function is a read-only function. The compiler can check our code against this promise to detect coding errors and sometimes use it to generate better code.

Now the version for little-endian machines:

    #if BYTE_ORDER == BIG_ENDIAN
      class BigEndianInt16 { }; // as shown above
    #else
      class BigEndianInt16   // with byte-swapping on read and write
      {
        int16_t rep;        // in big-endian order
       
      public: 
        BigEndianInt16() { }
        BigEndianInt16(int16_t i) : rep(swapInt16(i)) { }
        operator int16_t() const { return swapInt16(rep); }
      };
    #endif

Since this code will run on a little-endian processor, the argument to the conversion constructor will be in little-endian byte order. We must therefore call swapInt16() to swap the bytes of the argument before storing them in memory in big-endian byte order. Likewise, the return value of operator int16_t() must be in little-endian order, so we swap the bytes of rep before returning them.

The function swapInt16() would look something like this:

    int16_t swapInt16(int16_t x) 
    { 
      return   x << 8 & 0xFF00 
             | x >> 8 & 0x00FF; 
    }

Now we're ready to try our example. Unfortunately, we can't use exactly the syntax we wanted, but we can get close. Recall that we wanted to write:

    void Header::updateHopCount() { hopCount++; }

We can't use the post-increment operator because the compiler doesn't know what "++" means when applied to an BigEndianInt16. We can, however, add one to the hopCount and store it back:

    void Header::updateHopCount() { hopCount = hopCount + 1; }

The compiler evaluates this code by taking the following steps:

The compiler first observes that the addition operation involves an int16_t (1) and something else (a BigEndianInt16). The compiler knows what addition means for an int16_t, so it looks for a user-defined conversion from BigEndianInt16 to int16_t. It applies our conversion operator BigEndianInt16::operator int16_t() to hopCount to obtain an int16_t.
The compiler now generates code to add the two integers.
The compiler then observes that the assignment operator requires assigning an int16_t (the sum) to a BigEndianInt16. It doesn't know what that means, but it does know how to convert an int16_t to a BigEndianInt16 by calling the conversion constructor.
Finally, the compiler performs the default assignment of class objects, which is a blind byte-by-byte copy of the contents. Since at this point we have two BigEndianInt16's, that's exactly what we want.

Notice that the conversion operator is called when reading the original contents of hopCount, and that the conversion constructor is called before writing the new value to hopCount. In each case the bytes are swapped if needed. These automatic conversions to and from user-defined class types are a standard feature of C++.

The byte swaps are now automatic. So long as we declare hopCount as a BigEndianInt16, it will be stored in memory in big-endian order, but it will be operated on in the proper host-endian order, regardless of the endianness of the host system.

Generalizing the field type

We can use the C++ template facility to build a set of data types with common characteristics. For example, rather than defining BigEndianInt16 or BigEndianDouble, we can generalize to a BigEndian template with a compile-time argument of int16_t or double:

So instead of many separate types such as

    BigEndianInt16  int16Field;
    BigEndianDouble doubleField;

we can define a BigEndian template and instantiate it for any desired type:

    BigEndian<int16_t> int16Field;
    BigEndian<double>  doubleField;

What's the advantage? Simply that by using a template we can write the code once for all big-endian data types, instead of doing a copy-paste-edit every time we need a new BigEndian type. If we need a new big-endian type we can just create it on the fly.

This time rather than write two BigEndian template classes (one each for big-endian and little-endian systems), let's write a single template class and push the optional byte-swapping as far down inside as possible.

The syntax for the declaration of the class template looks like this:

    template <typename T> 
    class BigEndian
    {
      // details of class, in terms of type "T"
    };

So what's inside those brackets? First, we need storage for the object itself:

    template <typename T> 
    class BigEndian
    {
      T rep;
      // more details, in terms of type "T"; see below
    };

Whatever type T is, rep will be of that type -- the compiler will determine the type from the template argument when we instantiate the template.

Next a routine to swap the bytes in a T object. Here we'll use a template function, so the compiler can automatically create the many variations we need. Here's the plan:

Since the big-to-little and little-to-big swaps actually use identical code, we can write a single routine to do either.
We'll add a boolean template argument to indicate which endianness we want, setting bigInMem true if we want big-endian byte order in memory.
Since T can be of any size, we'll use sizeof(T) to find out how many bytes to swap.

Written once, this single template function will expand into the exact code needed in any given situation:

    template <typename T> 
    T swap(const T& arg, bool bigInMem)
    {
      #if (BYTE_ORDER == BIG_ENDIAN) == bigInMem
        return arg;             // no byte-swapping needed
    
      #else                     // swap bytes
        T ret;
    
        char* dst = reinterpret_cast<char*>(&ret);
        const char* src = reinterpret_cast<const char*>(&arg + 1);
    
        for (size_t i = 0; i < sizeof(T); i++)
          *dst++ = *--src;
    
        return ret;
      #endif
    }

If the desired endianness matches the system endianness, we just return the argument without swapping. In the mismatched-endianness case, we'll return the argument with its bytes swapped.

Since a T can be any number of bytes in size, shifts and masks won't work -- instead we'll treat our T's as character arrays. We can't, however, simply assign the address of a T to a character pointer; we need a cast. We'll use reinterpret_cast<>, which completely turns off type-checking for the pointer assignment. We should indeed use reinterpret_cast<> only rarely, but this is one case where it's helpful.

Once we have the pointers and size, we simply copy the bytes in reverse order from the source to the destination then return the result. The iteration may appear slow, but a good optimizing compiler (such as gcc) will unroll the loop into straight-line code, at least for reasonable values of sizeof(T). Writing the code as a loop has the advantage of being correct for any T. (If necessary for some other compiler, we can unroll the loop explicitly for the common cases after first ensuring the general version works properly.)

Now we can complete the BigEndian template declaration and the complementary LittleEndian template:

    template <typename T> 
    class BigEndian
    {
      T rep;
      
      public: 
        BigEndian() { }
        BigEndian(const T& t) : rep(swap(t, true)) { }
        operator T() const { return swap(rep, true); }
    };
    
    template <typename T> 
    class LittleEndian
    {
      T rep;
      
      public: 
        LittleEndian() { }
        LittleEndian(const T& t) : rep(swap(t, false)) { }
        operator T() const { return swap(rep, false); }
    };

It really is that simple. Note the similarity to the BigEndianInt16 class we defined above. Instead of int16_t, we now have the placeholder T; the rest of the code is similar, except that both the big-endian and little-endian templates use swap(), passing in the desired endianness.

We could stop here, but there are still a few things to improve.

Extract a base class

First, we can reduce global namespace pollution by making swap() a private member of the template classes. We could do this by giving both BigEndian and LittleEndian their own copies of swap(), but we'd like to avoid such "copy-paste" coding if we can.

Instead, let's factor out a base class template, FixedEndian. As always with base classes, let's pull in everything possible from the derived classes to reduce redundancy.

    template <typename T, bool bigInMem>
    class FixedEndian
    {
      T rep;
    
      static T swap(const T& arg)
      {
        // Exactly as above.
      }
    
      public:
        FixedEndian();
        FixedEndian(const T& t) : rep(swap(t)) { }
        operator T() const { return swap(rep); }
    };

Now we can rewrite BigEndian and LittleEndian in terms of FixedEndian:

    template <typename T>
    class BigEndian : public FixedEndian<T, true>
    {
    public:
      BigEndian(const T& t) : FixedEndian<T, true>(t) { }
    };
    
    template <typename T>
    class LittleEndian : public FixedEndian<T, false>
    {
    public:
      LittleEndian(const T& t) : FixedEndian<T, false>(t) { }
    };

Notice that we can use a template class as a base class. (In this case the derived class is also a template class, but that's by no means required.) The constructor syntax is a bit involved, but if you remember that the base class's full name is FixedEndian<typename T, bool>, it should become clear.

Notice also that we've moved swap()'s bool bigInMem argument to the base class template argument list since otherwise the calls to swap() in the base class wouldn't know the desired endianness. Although not as common as type arguments, integral template arguments like bigInMem are completely legal, and are handy in situations like these. A bool template argument like bigInMem reduces to a compile-time switch in the much same way #ifdef does.

Eliminate dependency on `BYTE_ORDER`

Using a symbol like BYTE_ORDER makes our code dependent on the system headers, which can vary among different development environments. Though such dependencies are sometimes necessary, we can avoid BYTE_ORDER with a few lines of C++ code.

Consider:

    union HostEndianness
    {
      int i;
      char c[sizeof(int)];
    };

Recall that unlike a struct's fields, a union's fields all begin at the same memory address, so that what we store in one field we can read back from another field.

Let's store 1 into the i field of a HostEndianness union. On a big-endian machine, the four bytes of the c array field will be {0, 0, 0, 1}, but on a little-endian machine, the bytes will be {1, 0, 0, 0}. To determine the machine's endianness, we simply look at the first character in the array -- on big-endian machines it will hold 0; on little-endian machines it will hold 1.

In C++, we can encapsulate all of this logic into member functions of the union:

    union HostEndianness
    {
      int i;
      char c[sizeof(int)];

      HostEndianness() : i(1) { }

      bool isBig() const { return c[0] == 0; }
      bool isLittle() const { return c[0] != 0; }
    };

Now we can create a HostEndianness object (which sets i to 1) and invoke its isBig() method to determine the endianness of the processor. Since we only need to run a single method on the object, we don't even need to name the object -- we can just invoke the constructor and call the method on its output.

FixedEndian::swap() now looks like this:

    static T swap(const T& arg)
    {
      if (HostEndianness().isBig() == bigInMem)
        return arg;
      else
      {
        T ret;
     
        char* dst = reinterpret_cast<char*>(&ret);
        const char* src = reinterpret_cast<const char*>(&arg + 1);
       
        for (size_t i = 0; i < sizeof(T); i++)
          *dst++ = *--src;
    
        return ret;
      }
    }

While this may look as if it generates a lot of code, it needn't. A decent optimizing compiler will completely eliminate the HostEndianness object and the HostEndianness().isBig() == bigInMem expression, leaving only the boolean result to guide the code generation. Since it knows this boolean at compile-time, the compiler will generate either a direct reference to arg or the code to byte-swap arg before use. The result is strong encapsulation and great generality in the source code with no added cost at run-time.

Conclusion

This article demonstrates a number of C++ techniques and how to use them in a real-world system. We've demonstrated:

template classes to avoid copy-paste coding where only data type differs
using a template class as a base class, with template arguments supplied by the derived class
template functions, which expand differently based on the types of their arguments
conditional compilation guided by integral template arguments
using sizeof() to measure the size of objects in templates
using conversion constructors and conversion operators to build a class whose objects can participate in operations with atomic types
using static_cast<> and default constructors to allow "mapping" a structure onto a raw memory buffer
using reinterpret_cast<> to turn off type checking when necessary.

As used in this article, these techniques incur little or no run-time penalty, but they make the code shorter, clearer, more secure, and more robust under long-term maintenance.

The full source code is available here. If you've found this article useful, we'd like to hear from you. Please email the author.