Daniel Irvine on building software
The case for data consistency
17 May 2012
This issue came up at work today and I wanted to briefly discuss it. We’re adding a new feature that would ensure 100% validity of objects in a certain data store. I’ll describe the scenario: Suppose you have an object A in the data store and you want to ask the question “is A valid for use?” The system has a large number of objects of this type and performs a variety of operations on them, from simple “read” operations to more complicated operations like “install” and “calculate dependencies”. Before any of these operations are performed we must ensure the validity of the object, and so an “is valid” operation becomes very important.
Given that, isn’t it better to ensure that all objects going into the data store are valid and remain valid? This way, all operations that read the data can safely assume that the answer to the validity question is always yes, that the object is valid. So there’s no need to do any extra processing for these operations, and the “is valid” operation itself becomes a no-op.
The feature we are building should ensure that only valid objects can be added and that any operations that may cause an object to become invalid are either blocked or cause affected objects to be removed from the data store before the operation is performed.
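As a rough sketch of what such a feature might look like (all names here are hypothetical, not the actual system), imagine a store that rejects invalid objects at insertion time and evicts any object that a mutating operation has invalidated:

```python
class InvalidObjectError(Exception):
    pass


class ValidatingStore:
    """A data store that only ever holds valid objects.

    `is_valid` is a hypothetical validity predicate supplied by
    the caller; the real system's check would stand in for it.
    """

    def __init__(self, is_valid):
        self._is_valid = is_valid
        self._objects = {}

    def add(self, key, obj):
        # Block invalid objects at the boundary.
        if not self._is_valid(obj):
            raise InvalidObjectError(f"refusing to store invalid object {key!r}")
        self._objects[key] = obj

    def mutate(self, key, operation):
        # Apply an operation that may invalidate the object;
        # remove the object from the store if it becomes invalid.
        obj = operation(self._objects[key])
        if self._is_valid(obj):
            self._objects[key] = obj
        else:
            del self._objects[key]

    def get(self, key):
        # Reads can assume validity: no "is valid" check needed here.
        return self._objects[key]
```

With this in place, every reader of the store inherits the validity guarantee for free; only the two write paths (`add` and `mutate`) ever run the check.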
Generalizing this idea is useful: keeping data as consistent as possible will simplify your system. The more consistent your data is, the less conditional (if [this] else [that]) processing will be required for all operations acting on that data.
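To illustrate that point with a small hypothetical example, compare a read against a store that may hold invalid objects with a read against a store whose insertion path guarantees validity:

```python
def is_valid(obj):
    # Hypothetical validity predicate; stands in for the real check.
    return obj is not None


# Without the consistency guarantee: every operation branches on validity.
def read_unguarded(store, key):
    obj = store[key]
    if not is_valid(obj):
        raise ValueError(f"{key!r} is invalid")
    return obj


# With validity enforced at insertion time, the branch disappears.
def read_guarded(store, key):
    return store[key]  # validity is an invariant of the store
```

The conditional doesn’t vanish from the system entirely, of course; it moves to the single place where data enters the store, instead of being repeated in every operation that reads it.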
In this particular case, by ensuring that objects in the data store are consistently valid, we no longer have to worry about checking for validity.