An Article on Improving Data Efficiency

Arca

On the CanSat, data read from the sensors needs to be stored as efficiently as possible, as we only have 16 MB of onboard storage. Furthermore, the data needs to be sent over LoRa, which has a very limited transfer rate (in the hundreds of bytes per second), especially when optimised for range. It is therefore crucial to keep the data, both sent and stored, as compact as possible.

Data as a string

Here is a line of data:

[360502],[27.342033,38.079166],[26.861551,36.603718,100091.320312,105015.664062],[[0.003784,0.922485,-0.351501,-0.213740,0.290076,-0.290076,-17.250000,54.750000,-5.250000][0.004822,0.918884,-0.353760,-0.290076,0.221374,-0.366412,-17.400002,55.949997,-6.150000][0.004578,0.920715,-0.355896,-0.167939,0.251908,-0.305344,-17.099998,53.699997,-6.450000][0.007812,0.916138,-0.353027,-0.267176,0.213740,-0.312977,-17.849998,55.949997,-6.900000][0.006836,0.918274,-0.354492,-0.274809,0.267176,-0.335878,-18.299999,56.250000,-5.550000]],[1050,],[0,],[0.000000,0.000000,0.000000,18,20,47,0],[4.087133,90.594452],[6987776,14680064],[-96]

It is formatted as a slightly modified version of CSV, which we call “CSV with Arrays”, and it is the data that is stored on the CanSat and sent over LoRa to the ground station. In this state, however, it is 627 characters long, or 628 including the newline. Sending this much data every second would force us to greatly compromise the LoRa range. Storing it on the CanSat is more feasible: at one line per second, that is roughly 2.3 MB per hour excluding filesystem overhead, which does not sound too bad. But we can do a lot better. Over 4x better.
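As a quick check of the storage figure, assuming one 628-byte line per second and 1 MB = 10^6 bytes (filesystem overhead ignored):

$$628\ \text{B/s} \times 3600\ \text{s/h} = 2\,260\,800\ \text{B/h} \approx 2.26\ \text{MB/h}$$

$$\frac{16\ \text{MB}}{2.26\ \text{MB/h}} \approx 7.1\ \text{h}$$

Under these assumptions, the 16 MB of onboard storage would fill after roughly seven hours of continuous logging.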

Another downside is that this format is somewhat awkward to parse. It is not hard by any means, but off-the-shelf libraries cannot be used. Even plain CSV is not entirely trivial to parse and uses a relatively large amount of processor time, since every field has to be converted from text back into a number. Using a format such as JSON would ease parsing, but going that route would almost triple the already large data size. Not great. Formatting data in character form is inherently inefficient: it trades size for human readability.
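To illustrate the work involved, here is a minimal sketch of parsing a plain CSV line holding a timestamp and three coordinates. The function name `parseCsvLine` is hypothetical, and this is not the parser used on the CanSat; it simply shows that every field needs a text-to-number conversion plus delimiter checks.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical record: a timestamp plus three coordinates.
struct GPSData {
    uint32_t timestamp;
    float latitude;
    float longitude;
    float altitude;
};

// Parse "timestamp,lat,lon,alt" into a GPSData.
// Returns false if any field is missing or malformed.
bool parseCsvLine(const char *line, GPSData &out) {
    char *end = nullptr;

    // Field 1: unsigned integer timestamp, must be followed by a comma
    unsigned long ts = std::strtoul(line, &end, 10);
    if (end == line || *end != ',') return false;
    out.timestamp = static_cast<uint32_t>(ts);

    // Fields 2 to 4: floats, each converted from text one at a time
    float *fields[3] = {&out.latitude, &out.longitude, &out.altitude};
    for (int i = 0; i < 3; ++i) {
        const char *start = end + 1;  // skip the preceding comma
        *fields[i] = std::strtof(start, &end);
        if (end == start) return false;              // no number found
        if (i < 2 && *end != ',') return false;      // missing delimiter
    }
    return true;
}
```

Every call to `strtof` walks the characters of a field one by one, which is exactly the processing a binary format avoids.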

Data as a struct

Note: I am using C/C++, as that is the language we use on the CanSat. If you are using an embedded version of Python (e.g. CircuitPython or MicroPython), firstly, rethink your decision and secondly, there isn’t an exact equivalent of a C struct. You can use a dictionary, tuple or even a plain list instead, and both CircuitPython and MicroPython provide a struct module (struct.pack and struct.unpack) for converting values to and from compact bytes. The closest native equivalent of a struct would be a class with a field for each unit of data you need to store, for example:

# One instance holds one GPS reading
class GPSData:
    def __init__(self, timestamp, latitude, longitude, altitude):
        self.timestamp = timestamp
        self.latitude = latitude
        self.longitude = longitude
        self.altitude = altitude

For demonstration purposes, I will be using a simplified data record containing the following data:

  • Timestamp (32-bit unsigned integer)
  • Latitude (32-bit float)
  • Longitude (32-bit float)
  • Altitude (32-bit float)

(I am using GPS data as an example, but this works with any type of data.)
In character form, populated with sample data and formatted as plain CSV, it would look like this:

4294967295,27.342033,-90.594452,91.320312

This line, including the newline at the end, is 42 characters, meaning 42 bytes. Now assume we have the same data in RAM. It would take up (1*uint32_t) + (3*float) = (1*4) + (3*4) = 16 bytes, less than 40% of the CSV size. Quite a bit more efficient, isn’t it?

Doing this in C is extremely simple: create a struct with a field for each piece of data.

#include <stdint.h>  // uint32_t

struct GPSData {
	uint32_t timestamp;  // 4 bytes
	float latitude;      // 4 bytes
	float longitude;     // 4 bytes
	float altitude;      // 4 bytes
};
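The 16-byte and 42-byte figures can be checked at compile time. This is a sketch assuming 32-bit IEEE 754 floats and a compiler that adds no padding between four equally aligned 4-byte fields, which holds on common MCUs and desktop CPUs; the `static_assert`s will fail the build if either assumption is wrong. The name `kCsvSample` is illustrative only.

```cpp
#include <cstddef>
#include <cstdint>

struct GPSData {
    uint32_t timestamp;  // 4 bytes
    float latitude;      // 4 bytes
    float longitude;     // 4 bytes
    float altitude;      // 4 bytes
};

// Assumption: float is 32 bits on this platform.
static_assert(sizeof(float) == 4, "expected 32-bit float");
// Four 4-byte fields with the same alignment need no padding;
// this catches any compiler that adds some anyway.
static_assert(sizeof(GPSData) == 16, "unexpected padding in GPSData");

// The sample CSV line shown earlier, newline included.
constexpr char kCsvSample[] = "4294967295,27.342033,-90.594452,91.320312\n";
// sizeof counts the terminating '\0', so subtract it.
constexpr std::size_t kCsvBytes = sizeof(kCsvSample) - 1;
static_assert(kCsvBytes == 42, "CSV sample should be 42 bytes");
```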

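On its own, though, a struct sitting in RAM is not much use to us. To send or store it, we can take its address and cast it to a char pointer, which lets us treat the record as a plain sequence of bytes: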

struct GPSData gpsData = {0};  // zero-initialised record

// p_gpsData points at the sizeof(gpsData) raw bytes of the record
char *p_gpsData = (char *)(&gpsData);

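These bytes can now be sent wirelessly, written to storage, and so on. Parsing is greatly simplified too: on the receiving end, just cast (or copy) the bytes back into a struct, with no text conversion needed. The one requirement is that both ends agree on the struct’s layout: field order, padding, endianness and float format.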
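A minimal sketch of both ends is below. The function names `encodeRecord`, `appendRecord` and `decodeRecord` are hypothetical, standard C stdio stands in for whatever storage API the CanSat actually uses, and it assumes the sender and receiver share the same struct layout (for example, both little-endian with IEEE 754 floats). On the receiving side it copies with `memcpy` rather than casting the buffer pointer, which sidesteps alignment and strict-aliasing problems.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <type_traits>

struct GPSData {
    uint32_t timestamp;
    float latitude;
    float longitude;
    float altitude;
};

// Copying raw bytes is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<GPSData>::value,
              "GPSData must be trivially copyable");

// Sender: copy the record into a packet buffer (e.g. a radio payload).
// Returns the number of bytes written, or 0 if the buffer is too small.
std::size_t encodeRecord(const GPSData &rec, uint8_t *buf, std::size_t cap) {
    if (cap < sizeof rec) return 0;
    std::memcpy(buf, &rec, sizeof rec);
    return sizeof rec;
}

// Storage: append the record's raw bytes to an open binary file.
bool appendRecord(std::FILE *f, const GPSData &rec) {
    return std::fwrite(&rec, 1, sizeof rec, f) == sizeof rec;
}

// Receiver: copy the bytes back into a properly aligned struct.
// Returns false if fewer than sizeof(GPSData) bytes were received.
bool decodeRecord(const uint8_t *buf, std::size_t len, GPSData &out) {
    if (len < sizeof out) return false;
    std::memcpy(&out, buf, sizeof out);
    return true;
}
```

Each record costs exactly `sizeof(GPSData)` bytes on air and on disk, with no delimiters or text conversion at either end.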

TL;DR: Store and send data the way it is laid out in RAM; it is much more efficient.
