The Register has a long and proud history of publishing snarky, provocative articles. Recently a piece on NoSQL and Google’s Spanner database appeared (Google goes back to the future with SQL F1 database) that I felt deserved a response.
First, let me provide my own snarky summary of the piece:
Opening: NoSQL is a dead end.
Middle: Hey, look how Google has enhanced its own highly scalable NoSQL technologies to make awesome stuff.
Closing: NoSQL is not necessary for scalability.
Now, the article is more nuanced than that, and after exchanging emails with its author, Jack Clark, I know my characterization is unfair. I'm not sure, however, that readers (and customers) will recognize that. So, let's talk about where we've been and where we're going.
When computers were still young, it was obvious that storing and retrieving data was among the most vital concerns. You can’t process data that you don’t have.
As Dr. Eric Brewer discussed at RICON 2012, what we now call NoSQL actually predates the SQL model. These ideas have been around for a long time, but the relational model was so powerful (and the ability to use a declarative language, SQL, so attractive) that it has been the focus of industry and academia for many decades.
Over the last several years, the companies that are pushing the edge of data processing on a global scale (e.g., Google, Amazon, Yahoo, Facebook) have found it necessary to rethink data storage. The relational model starts to show its rough edges as soon as you need two servers for redundancy or increased capacity, much less thousands of servers in multiple data centers.
With the explosion of the web came a stream of research papers from Google and later Amazon that form the foundation for much of today’s NoSQL world:
And to add to the mix:
From these ideas (and from pioneering work by Leslie Lamport and others) sprang Hadoop, Riak, Cassandra, and many other platforms.
NoSQL is problematic as both a label (see the Basho blog) and a technology.
To achieve scalability, the guarantees and declarative model of relational databases were largely abandoned. Instead, scalable data stores provide more granular primitives that are easier to reason about for a distributed architecture but place more work at the feet of application developers.
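To make "more work at the feet of application developers" concrete, here is a toy, in-memory sketch of a Dynamo-style store. Every name in it (`ToyStore`, `Sibling`, `resolve_cart`, and so on) is hypothetical; this is not Riak's client API. Writes carry version vectors, concurrent writes to the same key are kept side by side as "siblings," and the store cannot choose a winner, so the application has to supply the merge. The shopping-cart union follows the example in Amazon's Dynamo paper; treat it as one possible policy, not a general answer.

```python
from dataclasses import dataclass

# Toy model only: these names are hypothetical, not Riak's or Dynamo's API.
Clock = dict[str, int]  # version vector: node id -> number of writes coordinated


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every write recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def increment(clock: Clock, node: str) -> Clock:
    """Return a copy of clock with one more write coordinated by node."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


@dataclass
class Sibling:
    value: frozenset
    clock: Clock


class ToyStore:
    """In-memory store that keeps concurrent writes to a key as siblings."""

    def __init__(self) -> None:
        self.data: dict[str, list[Sibling]] = {}

    def get(self, key: str) -> list[Sibling]:
        return list(self.data.get(key, []))

    def put(self, key: str, value, clock: Clock) -> None:
        current = self.data.get(key, [])
        # A write whose clock is already covered by a stored version is stale.
        if any(descends(s.clock, clock) for s in current):
            return
        # Drop versions the new write has seen; the rest are concurrent siblings.
        kept = [s for s in current if not descends(clock, s.clock)]
        kept.append(Sibling(frozenset(value), dict(clock)))
        self.data[key] = kept


def resolve_cart(siblings: list[Sibling]) -> tuple[frozenset, Clock]:
    """Application-specific merge: union the items, take the max of each clock entry."""
    items = frozenset().union(*(s.value for s in siblings))
    merged: Clock = {}
    for s in siblings:
        for node, count in s.clock.items():
            merged[node] = max(merged.get(node, 0), count)
    return items, merged


store = ToyStore()
store.put("cart", {"book"}, {"a": 1})
base = store.get("cart")[0].clock

# Two clients read the same version, then write through different nodes.
store.put("cart", {"book", "pen"}, increment(base, "a"))
store.put("cart", {"book", "mug"}, increment(base, "b"))
conflicts = store.get("cart")  # two siblings: neither clock descends the other

# The store cannot pick a winner; the application merges and writes back.
items, clock = resolve_cart(conflicts)
store.put("cart", items, increment(clock, "a"))
final = store.get("cart")
```

The union merge is the simplest policy, but it cannot express removals: an item deleted in one sibling reappears after the merge, a caveat the Dynamo paper itself notes. A single-node relational database would typically serialize the two transactions instead, sparing the application this decision.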
Furthermore, data modeling is more complex, and doing it well is vital; otherwise the application may fail to scale properly (or work at all).
And Jack Clark highlights a valid concern: it does seem as though some NoSQL companies do too good a job of marketing their technology, without regard for the very real challenges that still exist.
(And developers are often a little too eager to embrace new technologies. On Friday morning at a conference, I encouraged someone to rethink using Hadoop or Riak for a 2 GB/day analytics problem. NoSQL is rarely the answer if your data can fit on a USB stick.)
The relational world has had decades to standardize and refine tools, and train generations of administrators and developers on its best practices and perils. The NoSQL world lacks that level of maturity.
Developing powerful tools and infiltrating the academic world (in other words, following the same path that the relational databases started down many years ago) will help, but will take quite a while.
An obvious approach to making NoSQL more palatable to developers and thus more attractive to customers is to add features that bring it closer to the relational model. Lo and behold, this is exactly what Google is doing with Spanner.
Conversely, PostgreSQL last year announced new NoSQL-like features.
So disregard the hyperbole, treat marketing claims from any company with healthy skepticism, and treat terminology like NoSQL (or NewSQL) as less of a classification and more of a shorthand for some basic ideas that are as relevant today as they were five years ago. The relational and non-relational worlds have much in common, much to learn from each other, and it’s inevitable that databases will increasingly leverage ideas from both traditions.