As regular readers will have noticed, I have been diving into NoSQL databases recently, in particular Document Databases. But why do we need NoSQL databases in the first place? Surely the relational databases we have been using work fine; in fact they work very well in most situations.
The problem with them is that they don’t always scale very well, and when I say scale, I mean scale really big: across multiple nodes in a cluster. I alluded to this in the first post in this series, but I have decided to explore the reasons in more detail in this post.
It all comes down to the CAP (Consistency, Availability and Partition Tolerance) Theorem. This theorem states that a distributed system can guarantee at most two of the three properties at the same time, so you need to decide which ones matter most to you.
If Consistency is most important, then you would choose a relational database, which excels at Consistency. Consistency matters most in systems such as financial systems or nuclear power stations, where it is critical that every piece of data is accounted for.
However, in most cases, “eventual consistency” is good enough. For example, in a content management system, if a page editor updates a page, it is not critical that every user sees that change immediately, as long as the update becomes visible in the near future (seconds or minutes).
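To make eventual consistency concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how any particular database works: the class `EventuallyConsistentStore`, its `replicate` method and the page key `"home"` are all made up for illustration. A write lands on one replica straight away and reaches the others only when replication runs, so a reader on another replica briefly sees the old version.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: writes hit replica 0 first and reach the others later."""

    def __init__(self, replica_count=3):
        # Each replica is just a dict holding its own copy of the data.
        self.replicas = [{} for _ in range(replica_count)]
        # Updates waiting to be copied: (replica_index, key, value).
        self.pending = deque()

    def write(self, key, value):
        # Apply the write to the first replica immediately...
        self.replicas[0][key] = value
        # ...and queue it for every other replica.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # A read is served by whichever replica the client happens to reach.
        return self.replicas[replica_index].get(key)

    def replicate(self):
        # Deliver all queued updates in order, bringing replicas up to date.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore()
store.write("home", "v1")
store.replicate()                  # every replica now holds "v1"

store.write("home", "v2")          # the editor updates the page
stale = store.read("home", 2)      # replica 2 has not caught up yet
fresh = store.read("home", 0)      # replica 0 already has the update

store.replicate()                  # a moment later the update propagates
converged = store.read("home", 2)
```

Here `stale` is the old version while `fresh` is the new one, and after replication `converged` matches `fresh`. That short window of disagreement is exactly what the content management example tolerates.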
In most applications it is more important that some version of the data is available at all times, i.e. the A or Availability of the CAP theorem. In systems that focus on Availability, being able to answer every request matters more than returning the very latest version of the data.
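Finally, the P or Partition Tolerance is the ability of the system to keep working when the network between its servers is partitioned, that is, when some servers cannot reach the others. Any system that distributes its data across multiple servers has to cope with this, and that distribution is what allows for hugely scalable systems, like Google or Facebook.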
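The trade-off between the three can be sketched with a toy two-server cluster in Python. Everything here is hypothetical (the `Cluster` class, the `"CP"` and `"AP"` mode names, the `partitioned` flag); it only illustrates the choice a system faces once the link between its servers breaks: refuse the write and stay consistent, or accept it and let the copies drift apart.

```python
class Cluster:
    """Toy two-server cluster that either refuses or accepts writes
    while the link between its servers is down."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}               # data held by server A
        self.b = {}               # data held by server B
        self.partitioned = False  # True when A and B cannot talk

    def write(self, server, key, value):
        if not self.partitioned:
            # Healthy network: both servers receive the write.
            self.a[key] = value
            self.b[key] = value
            return True
        if self.mode == "CP":
            # Favour Consistency: reject the write rather than diverge.
            return False
        # Favour Availability: accept the write on the reachable server only.
        server[key] = value
        return True


# A consistency-first cluster gives up availability during the partition.
cp = Cluster("CP")
cp.write(cp.a, "balance", 100)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "balance", 50)

# An availability-first cluster keeps accepting writes, but the copies differ.
ap = Cluster("AP")
ap.write(ap.a, "balance", 100)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "balance", 50)
diverged = ap.a["balance"] != ap.b["balance"]
```

In the first cluster the write is rejected and both servers still agree; in the second the write succeeds but the two servers now hold different values until something reconciles them once the network heals.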
Document Databases (like most NoSQL databases) are typically strongest at the A (Availability) and P (Partition Tolerance) corners of the triad and so are weaker on Consistency; they settle for eventual consistency.
Thus Document Databases are fast and can scale, but they are a poor fit for something like a financial system or a nuclear power station, where strict consistency is critical.