Is the database dead? It’s an interesting question, and one that analysts like Gartner’s Donald Feinberg are beginning to raise. The NoSQL movement is in full gear, with a variety of alternatives to the traditional RDBMS such as Google’s BigTable, Amazon’s Dynamo, and Hadoop’s HBase. In fact, most of the new wave of large internet-based businesses use some variation of these key/value stores. So why are they doing this? What is wrong with the traditional database that would drive these companies to alternatives? Were they just looking to avoid paying the DBMS vendors, or has the world simply moved on?
As it turns out, the nature of these new internet-based businesses (Google, Facebook, LinkedIn, Yahoo, Amazon, etc.), the kinds of information they need to store, and their patterns of access were what guided them to a new kind of solution. There are three primary characteristics of the way these companies manage their data that have baffled traditional databases:
- Their data tends to be distributed across large grids of systems, often geographically dispersed
- Their data is inconsistently structured (no consistent schema)
- Access to the data is more search-oriented (what would amount to full table scans without predictable indexes in an RDBMS)
These three characteristics are particularly nasty for a traditional RDBMS to handle. Most RDBMSs were designed to run on large SMP systems, as they rely heavily on fast channels to disk and memory. Some vendors, notably IBM with DB2 pureScale and Oracle with RAC, have provided the ability to scale out across servers (though only pureScale has been shown to have near-linear scalability over hundreds of nodes).
From a structure perspective, RDBMSs require a set schema. Some databases, like DB2, can also store XML data natively, but even that requires a set schema. When data isn’t rigidly structured, there is no efficient way for the database to manage it.
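To make the schema point concrete, here is a minimal sketch of the kind of inconsistently structured records a key/value store accepts without complaint. It uses a plain Python dict as a stand-in for the store, and the keys and field names are invented for illustration — in an RDBMS, the two "user" records below could not share a table without a schema change:

```python
# A toy key/value store: each value is a free-form document.
# Records for the same kind of entity need not share fields.
store = {
    "user:1001": {"name": "Alice", "email": "alice@example.com"},
    "user:1002": {"name": "Bob", "last_login": "2009-11-30",
                  "tags": ["beta-tester", "mobile"]},
}

def get_field(key, field, default=None):
    """Read whatever fields happen to be present; absent fields
    are simply missing, not a schema violation."""
    return store.get(key, {}).get(field, default)

email1 = get_field("user:1001", "email")   # "alice@example.com"
email2 = get_field("user:1002", "email")   # None — the field just isn't there
```

The flexibility cuts both ways: the store never rejects a record, but every reader has to cope with fields that may or may not exist.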
In addition, databases are built for fast transactional retrieval based on specific keys and indexes, or queries based on specific relational structures. Search-like scanning queries not based on indexes can be horribly inefficient in an RDBMS.
So I guess that means the database is dead, right? Well, not so fast… While the key/value stores are extremely scalable, great at fast search retrieval, and able to deal with inconsistent data structures (or even unstructured data), they aren’t particularly efficient at managing transactional application-oriented access. They are really designed for pulling back everything you know about something based on a keyword. When you know exactly what multiple related things you want (like the accounts for a specific customer), relational databases are much more efficient (and much more predictable).
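One way to see that trade-off side by side, sketched in Python with the built-in sqlite3 module standing in for the RDBMS (the table, column, and key names are invented for illustration):

```python
import sqlite3

# Relational side: when you know exactly which related rows you want
# (the accounts for one customer), an indexed lookup is cheap and predictable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER, customer_id INTEGER, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
               [(1, 42, 100.0), (2, 42, 250.0), (3, 7, 30.0)])
db.execute("CREATE INDEX idx_cust ON accounts(customer_id)")
rows = db.execute(
    "SELECT id, balance FROM accounts WHERE customer_id = ? ORDER BY id",
    (42,)).fetchall()

# Key/value side: everything about an entity hangs off one key, so
# "pull back all you know about customer 42" is a single get...
kv = {"customer:42": {"name": "Acme", "accounts": [100.0, 250.0]},
      "customer:7":  {"name": "Initech", "accounts": [30.0]}}
everything = kv["customer:42"]

# ...but an ad-hoc relational question ("all account balances over 200")
# forces a scan of every value, with no index to help.
big = [b for doc in kv.values() for b in doc.get("accounts", []) if b > 200]
```

Neither side is wrong; they are tuned for different questions. The keyword-shaped read is one hash lookup in the key/value store, while the cross-entity query is exactly the kind of full scan the RDBMS indexes away.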
In addition, things like transaction integrity, security, compression, and workload management are much more advanced in RDBMSs – and all of these are table stakes for most business applications. I liken the comparison to the initial debate about REST vs. SOAP. REST, with its lack of restrictions and overhead, appeared poised to overtake standards-happy SOAP. In the end, they both found their niche – SOAP in places where the depth and control were needed, and REST in places where that stuff doesn’t matter.
That said, there is nothing to prevent these new kinds of databases from developing advanced capabilities. And the advantages of schema-free search, massive scalability, and data-type independence make key/value stores a very attractive addition to a data management portfolio. So, while I think it will be a while before they mature enough to be able to deal with mainstream application processing, I do think that more and more companies will be adopting them as a complement to their RDBMSs – and doing so soon.