Wednesday, June 17, 2015

To NoSQL or Not to NoSQL, That is the Question by David Ostrovsky

We are moving to a polyglot persistence world; however, relational databases, in my opinion, should not be overlooked as a solution. David Ostrovsky's talk aims to orient you in the world of NoSQL databases and help you make a good choice for your needs.

The seminar begins with a walkthrough of a user story about a company that wanted to move to a NoSQL database, where the solution ended up not being optimal for the company's needs. Mr. Ostrovsky rightly criticizes the term NoSQL, since it is a very rough separation. Below are the groupings that were presented in the seminar, with short descriptions.
  • Key-Value
    • Simple collection of keys and values, very fast and scalable. 
  • Document 
    • Similar to key-value but the keys are known to the database so they can be searched and operated on.
  • Row
    • Similar to an RDBMS, but the rows are partitioned, with an ID for each row.
  • Column
    • Rotates the storage layout so that the values of each column are stored sequentially on the physical disk.
  • Graph
    • Uses a graph structure to store the data, which allows graph theory algorithms to be used to search it.
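
To make the groupings a little more concrete, here is a minimal sketch in plain Python of how the same data might be laid out under some of these models. This is my own illustration, not something shown in the talk: the records, the `find_by_field` and `reachable` helpers, and the "follows" edges are all made up, and real databases add persistence, indexing and distribution on top.

```python
import json

# Hypothetical records, used only for illustration.
users = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": "Helsinki"},
    {"id": 3, "name": "Grace", "city": "London"},
]

# Key-value: the store only sees an opaque value behind each key.
kv_store = {f"user:{u['id']}": json.dumps(u) for u in users}

# Document: the store knows the fields, so it can search on them.
doc_store = {u["id"]: u for u in users}

def find_by_field(store, field, value):
    """Return every document whose field equals value."""
    return [doc for doc in store.values() if doc.get(field) == value]

# Column: all values of one column are kept together, in row order.
column_store = {
    field: [u[field] for u in users] for field in ("id", "name", "city")
}

# Graph: nodes plus edges; here a made-up "follows" relation between ids.
graph = {1: {2}, 2: {3}, 3: set()}

def reachable(graph, start):
    """Collect every node reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

With the key-value layout, finding everyone in London means decoding every value in the application, while the document layout can answer it with `find_by_field(doc_store, "city", "London")`, and the column layout can scan `column_store["city"]` without touching the other fields.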

Using SQL has many advantages, and many SQL flavors today support sharding and clustering. It has also been around for 40 years, and it is easy to find developers for SQL applications. However, if the data structure is very complex, or the dataset becomes so large that you have to give up functionality such as indexes and joins, Mr. Ostrovsky suggests that it is time to look at a NoSQL solution.

A use case of storing tweets in a row database is discussed; the row database is a good choice since it has very low write latency. This leads into a run-through of the CAP theorem. Since network partitions cannot be ruled out, it basically comes down to choosing whether the database should favor consistency or availability when a partition occurs.
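
The consistency versus availability choice can be sketched with a toy two-replica cluster. This is my own simplified illustration under stated assumptions, not code from the talk: `ToyCluster`, its `mode` flag and the `partitioned` switch are hypothetical, and real databases offer far more nuanced, often tunable, behavior.

```python
class Replica:
    """One copy of the data."""
    def __init__(self):
        self.data = {}

class ToyCluster:
    """Two replicas, A and B; all writes arrive at replica A."""
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True while A cannot reach B

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Favor consistency: refuse the write rather than diverge.
                raise RuntimeError("unavailable: cannot reach replica B")
            # Favor availability: accept the write on A only.
            self.a.data[key] = value
            return
        # No partition: both replicas get the write.
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        return self.b.data.get(key)
```

In "CP" mode a write during a partition fails, so a reader on B never sees a value that disagrees with A. In "AP" mode the write succeeds, but a reader on B keeps getting the old value until the partition heals and the replicas are reconciled, which this sketch does not model.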

The futility of generic benchmarks is pointed out. The best comparison is to model the actual use case in each candidate database, populate the databases with data that follows this model, and measure that.
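
As a rough sketch of what "measure your own use case" could look like, the harness below runs one workload against interchangeable stores and times it. Everything here is an assumption of mine for illustration: `DictStore` and `ListStore` are in-memory stand-ins where a real comparison would wrap actual database clients, and `tweet_workload` is a made-up workload, not the one from the talk.

```python
import time

class DictStore:
    """Stand-in store that indexes tweets by user."""
    def __init__(self):
        self.by_user = {}
    def write(self, user, text):
        self.by_user.setdefault(user, []).append(text)
    def read(self, user):
        return self.by_user.get(user, [])

class ListStore:
    """Stand-in store that appends to one log and scans it on read."""
    def __init__(self):
        self.log = []
    def write(self, user, text):
        self.log.append((user, text))
    def read(self, user):
        return [t for u, t in self.log if u == user]

def tweet_workload(store, users=50, tweets_per_user=20):
    """Write tweets round-robin across users, then read each user's tweets."""
    for i in range(users * tweets_per_user):
        store.write(f"user{i % users}", f"tweet {i}")
    total = 0
    for u in range(users):
        total += len(store.read(f"user{u}"))
    return total

def benchmark(store_factory, workload, repeats=3):
    """Return the best wall-clock time, in seconds, over several fresh runs."""
    best = float("inf")
    for _ in range(repeats):
        store = store_factory()
        start = time.perf_counter()
        workload(store)
        best = min(best, time.perf_counter() - start)
    return best

results = {
    cls.__name__: benchmark(cls, tweet_workload)
    for cls in (DictStore, ListStore)
}
```

The point of the design is that the workload stays fixed while the store varies, so the numbers in `results` reflect your own access pattern rather than someone else's.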

The seminar ends with an example of using Cassandra to store tweets, as mentioned earlier. For those not versed in the NoSQL world, I think this seminar would be an excellent primer; however, an even better option would be to read NoSQL Distilled by Pramod J. Sadalage and Martin Fowler.
