Data Serialisation
Another essential need in embedded development is an ability to serialise data. Most embedded products read data from some kind of sensors and/or communicate with the control centre via some wired or wireless serial interface.
Before data is sent via a communication link, it must be serialised into a buffer, and when received, deserialised from bytes also in a different buffer on the other end. The data may be serialised using big or little endian, based on the communication protocol used. The embxx library provides a generic code with an ability to read and write integral values from/to any buffer. Here is the source code for the functions described below.
The functions below (defined in namespace embxx::io
) support read and write of an integral value using any type of iterator:
These functions receive reference to iterator of a buffer/container. When bytes are read/written from/to the buffer, the iterator is incremented. The iterator can be of any type as long as it supports dereferencing (operator*()
), pre-increment (operator++
) and assignment to dereferenced object. For example, serialising several values of various lengths into the array using big endian:
The contents of the buffer will be: {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c 0x0d, 0x0e, …}
Similar code of reading values from the buffer would be:
Another example is serialising data into a container that has push_back()
member functions, such as std::vector or circular buffer. The data will be added at the end of the existing one:
Depending on a communication protocol there may be a need to serialise only part of the value. For example some field of communication protocol is defined having only 3 bytes. In this case the value will probably be stored in a variable of std::uint32_t
type. There is similar set of functions, but with additional template parameter that specifies how many bytes to read/write:
So to read/write 3 bytes will look like the following:
Sometimes the endianness of data serialisation may depend on some traits class parameters. In order to be able to choose “Little” or “Big” variant functions at compile time instead of runtime the tag parameter dispatch idiom must be used.
There are similar read/write functions, but instead of being differentiated by name they have additional tag parameter to specify the endianness of serialisation:
The traits::endian::Big
and traits::endian::Little
are defined as empty tag classes:
For example:
So the code above is not aware what endianness is used to serialise the data. It is provided as internal type of Traits
class named Endianness
. The compiler will generate the call to appropriate writeData()
function, which in turn forward it to writeBig()
or writeLittle()
.
To serialise data using big endian the traits should be defined as following:
The interface described above is very easy and convenient to use and quite easy to implement using straightforward approach. However, any variation of template parameters create an instantiation of new binary code which may create significant code bloat if not used carefully. Consider the following:
Read/write of signed vs unsigned integer values. The serialisation/deserialisation code is identical for both cases, but won't be considered as such when instantiating the functions. To optimise this case, there is a need to implement read/write operations only for unsigned value, while the “signed” functions become wrappers around the former. Don't forget a sign extension operation when retrieving partial signed value.
The read/write operations are more or less the same for any length of the values, i.e of any types:
(unsigned) char
,(unsigned) short
,(unsigned) int
, etc... To optimise this case, there is a need for internal function that receives length of serialised value as a run time parameter, while the functions described above are mere wrappers around it.Usage of the iterators also require caution. For example reading values may be performed using regular
iterator
as well asconst_iterator
, i.e. iterator pointing to const values. These are two different iterator types that will duplicate the “read” functionality if both of them are used:
It is possible to optimise the case above for random access iterator by using temporary pointers to unsigned characters to read the required value. After retrieval is complete, just increment the value of the passed iterator with number of characters read.
All the consideration points stated above require quite complex implementation of the serialisation/deserialisation functionality with multiple levels of abstraction which is beyond the scope of this book. It would be a nice exercise to try and implement it yourself. Another option is to use the code as is from embxx library.
Last updated