Is the database dead?

Is the database dead? It’s an interesting question, and one that analysts like Donald Feinberg from Gartner are beginning to raise. The NoSQL movement is in full gear, with a variety of alternatives to the traditional RDBMS, like Google’s BigTable, Amazon’s Dynamo, and Hadoop’s HBase. In fact, most of the new wave of large internet-based businesses use a variation of these key/value stores. So why are they doing this? What is wrong with the traditional database that would drive these companies to alternatives? Were they just looking to avoid paying the DBMS vendors, or has the world simply moved on?

As it turns out, the nature of these new internet-based businesses (Google, Facebook, LinkedIn, Yahoo, Amazon, etc.), the kinds of information they need to store, and their patterns of access were what guided them to find a new kind of solution. Three primary characteristics of the way these companies manage their data have baffled traditional databases:

  1. Their data tends to be distributed across large grids of systems, often geographically dispersed
  2. Their data is inconsistently structured (no consistent schema)
  3. Access to the data is more search-oriented (what would amount to full table scans without predictable indexes in an RDBMS)

These three characteristics are particularly nasty for your traditional RDBMS to handle. Most RDBMSs were designed to run on large SMP systems, as they rely heavily on fast channels to disk and memory. Some vendors, notably IBM with DB2 pureScale and Oracle with RAC, have provided the ability to scale out across servers (though only pureScale has been shown to have near-linear scalability over hundreds of nodes).

From a structure perspective, RDBMSs require a set schema. Some databases like DB2 have the ability to also store XML data natively, but even that needs to have a set schema. When data isn’t rigidly structured, there is no efficient way for the database to manage it.

In addition, databases are built for fast transactional retrieval based on specific keys and indexes, or queries based on specific relational structures. Search-like scanning queries that are not based on indexes can be horribly inefficient in an RDBMS.
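To make the difference between keyed retrieval and a search-like scan concrete, here is a minimal sketch using Python’s built-in `sqlite3` module as a stand-in for an RDBMS. The `customers` table, its columns, and the `plan` helper are hypothetical names invented for this illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# In-memory SQLite database standing in for a traditional RDBMS (an assumption
# made for illustration; the post does not name a specific engine here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, notes TEXT)"
)
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
conn.executemany(
    "INSERT INTO customers (email, notes) VALUES (?, ?)",
    [(f"user{i}@example.com", f"free-form note {i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Keyed lookup: the planner can use the index on email.
keyed_plan = plan(
    "SELECT * FROM customers WHERE email = ?", ("user42@example.com",)
)

# Search-like query: a leading wildcard on an unindexed column forces a scan
# of every row in the table.
search_plan = plan(
    "SELECT * FROM customers WHERE notes LIKE ?", ("%note 42%",)
)

print(keyed_plan)
print(search_plan)
```

The first plan should report a search using `idx_customers_email`, while the second should report a scan of the whole table, which is the pattern that becomes painful as the data grows.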

So I guess that means the database is dead, right? Well, not so fast… While the key/value stores are extremely scalable, great at fast search retrieval and able to deal with inconsistent data structures (or even unstructured data), they aren’t particularly efficient at managing transactional application-oriented access. They are really designed for pulling back everything you know about something based on a keyword. When you know exactly what multiple related things you want (like the accounts for a specific customer), relational databases are much more efficient (and much more predictable).
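The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dictionary stands in for a key/value store, the built-in `sqlite3` module stands in for an RDBMS, and every key, table, and column name is made up for the example.

```python
import sqlite3

# Key/value style: one free-form document per key, with no shared schema.
# A plain dict is used here as a stand-in for a real key/value store.
kv_store = {
    "customer:1": {
        "name": "Ada",
        "accounts": [{"type": "checking", "balance": 120.0}],
        "tags": ["vip"],
    },
    "customer:2": {"name": "Grace", "notes": "prefers email"},
}

# "Pull back everything you know about something" is a single key lookup.
everything_about_ada = kv_store["customer:1"]

# Relational style: a fixed schema with an explicit customer/account relationship.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE accounts (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        type        TEXT NOT NULL,
        balance     REAL NOT NULL
    );
    CREATE INDEX idx_accounts_customer ON accounts (customer_id);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO accounts VALUES
        (1, 1, 'checking', 120.0),
        (2, 1, 'savings', 900.0),
        (3, 2, 'checking', 45.5);
    """
)

# "The accounts for a specific customer" is a precise, indexed join.
ada_accounts = conn.execute(
    """
    SELECT a.type, a.balance
    FROM customers AS c
    JOIN accounts AS a ON a.customer_id = c.id
    WHERE c.name = ?
    ORDER BY a.type
    """,
    ("Ada",),
).fetchall()

print(everything_about_ada)
print(ada_accounts)
```

Note that the two documents in `kv_store` do not share a structure, which the key/value side tolerates without complaint, while the relational side returns exactly the related rows that were asked for and nothing else.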

In addition, things like transaction integrity, security, compression, and workload management are much more advanced in RDBMSs – and all of these are table stakes for most business applications. I liken the comparison to the initial debate about REST vs. SOAP. REST, with its lack of restrictions and overhead, appeared poised to overtake standards-happy SOAP. In the end, they both found their niche – SOAP in places where the depth and control were needed, and REST in places where that stuff doesn’t matter.

That said, there is nothing to prevent these new kinds of databases from developing advanced capabilities. And the advantages of free-structured search, massive scalability, and data type independence make key/value stores a very attractive addition to a data management portfolio. So, while I think it will be a while before they mature enough to be able to deal with mainstream application processing, I do think that more and more companies will be adopting them as a complement to their RDBMSs – and doing so soon.

2 thoughts on “Is the database dead?”

  1. […] Is the database dead? « Michael Curry: Information Explosion […]

  2. Nice blog post, Michael. I agree 100%. The NoSQL movement has enveloped a number of products that were developed to address situations where the RDBMS was not the right answer. However, the RDBMS continues to be the right answer for many, many situations. You only have to look at the relative addressable markets for RDBMS and NoSQL products to get an idea of the huge gulf that exists between them. While the NoSQL movement might be catchy, the fact remains that it is a collection of products that are still in the “emerging technology” category in terms of product and market maturity, and also that the NoSQL products that eventually cross the chasm will be adopted primarily in the situations they were designed for (i.e. situations where RDBMS is not the right answer).
