
When to dump json? | by Piotr Zakrzewski | plotwise | Sep, 2021 | Medium


Disclaimer: this is a fairly specific benchmark. It addresses only the serialization performance of a single big Python object. That was the problem we were solving at the time at Plotwise. It might not be relevant to any other case, but you can probably adapt our benchmark code to test a different scenario.

You may instinctively know that JSON is not the most efficient format. It is human-readable, which makes it fairly redundant as far as efficient encoding goes (think, for instance, of all the keys that need repeating, or the need for UTF-8 encoding). Let’s say you are dealing with a single, big structured data object (more specifically, a Python class instance). Is it worth using a binary encoding? Or is it better to dump it into JSON? What about pickling? How much speed can you gain, and at what cost in code complexity?
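To make the redundancy point concrete, here is a minimal sketch that serializes the same data with the standard library's `json` and `pickle` modules and prints the encoded sizes. The `Stop` class and its fields are made up for illustration and are not taken from the Plotwise code base. The instances are converted to plain dicts first so that both formats encode exactly the same structure.

```python
import json
import pickle
from dataclasses import dataclass, asdict


@dataclass
class Stop:
    """Hypothetical record type, used only for this illustration."""
    stop_id: int
    latitude: float
    longitude: float
    status: str


def to_json_bytes(stops):
    """Serialize a list of Stop instances to UTF-8 encoded JSON."""
    return json.dumps([asdict(s) for s in stops]).encode("utf-8")


def from_json_bytes(data):
    """Rebuild Stop instances from UTF-8 encoded JSON."""
    return [Stop(**d) for d in json.loads(data.decode("utf-8"))]


stops = [Stop(i, 52.0 + i * 0.001, 4.0 + i * 0.001, "planned") for i in range(1000)]

json_blob = to_json_bytes(stops)
# Pickle the same plain-dict structure so the comparison is like for like.
pickle_blob = pickle.dumps([asdict(s) for s in stops], protocol=pickle.HIGHEST_PROTOCOL)

# Every record repeats every key in the JSON text.
print("occurrences of the 'latitude' key in JSON:", json_blob.count(b'"latitude"'))
print(f"JSON:   {len(json_blob)} bytes")
print(f"pickle: {len(pickle_blob)} bytes")
```

The sizes depend on the shape of your data and on the Python version, so treat the printed numbers as a starting point for your own measurement, not as a result.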

The use case we have at Plotwise is storing a snapshot of a route planning. These plannings grow with the number of delivery events, vehicles and driver shifts. They also change dynamically as they are optimized over time. It is important for us to be able to serialize (dump) the planning quickly and persist it, and also to restore it from persistence. The planning is a nested multi-field JSON with mostly numeric fields and some low-cardinality string fields. A typical planning is anywhere between 200 kB and 1.5 MB. At this size the benefits of switching to a more compact format should start to be apparent. But how big would those benefits be exactly? Is it worth it?
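A benchmark for this kind of object could be sketched roughly as follows. This is not the benchmark code from the article: the planning shape, the field names and the helpers `make_planning`, `best_of` and `benchmark` are all assumptions chosen to mimic a nested structure with mostly numeric fields and a few low-cardinality strings. It only compares `json` and `pickle` from the standard library.

```python
import json
import pickle
import random
import time


def make_planning(n_events, n_vehicles=10, seed=0):
    """Build a synthetic nested planning (hypothetical shape, not Plotwise's schema)."""
    rng = random.Random(seed)
    statuses = ["planned", "en_route", "delivered"]  # low-cardinality strings
    return {
        "vehicles": [
            {"vehicle_id": v, "capacity": rng.randint(50, 200)}
            for v in range(n_vehicles)
        ],
        "events": [
            {
                "event_id": e,
                "vehicle_id": rng.randrange(n_vehicles),
                "latitude": rng.uniform(50.0, 54.0),
                "longitude": rng.uniform(3.0, 7.0),
                "eta_seconds": rng.randint(0, 86_400),
                "status": rng.choice(statuses),
            }
            for e in range(n_events)
        ],
    }


def best_of(func, repeats=5):
    """Return the fastest wall-clock time of `repeats` calls to func, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark(planning):
    """Measure encoded size and dump/load time for JSON and pickle."""
    json_blob = json.dumps(planning).encode("utf-8")
    pickle_blob = pickle.dumps(planning, protocol=pickle.HIGHEST_PROTOCOL)
    return {
        "json": {
            "bytes": len(json_blob),
            "dump_s": best_of(lambda: json.dumps(planning).encode("utf-8")),
            "load_s": best_of(lambda: json.loads(json_blob)),
        },
        "pickle": {
            "bytes": len(pickle_blob),
            "dump_s": best_of(
                lambda: pickle.dumps(planning, protocol=pickle.HIGHEST_PROTOCOL)
            ),
            "load_s": best_of(lambda: pickle.loads(pickle_blob)),
        },
    }


if __name__ == "__main__":
    for n_events in (1_000, 5_000):
        for name, stats in benchmark(make_planning(n_events)).items():
            print(n_events, name, stats)
```

Adjust `n_events` until the JSON size lands in the range you care about, and add further formats as extra entries in the dictionary returned by `benchmark`. Taking the minimum of several runs reduces noise from other processes, but the timings will still vary between machines.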
