The current high-availability options for file sharing and SQL services seem shockingly primitive compared to what’s now available for modern web APIs and key/value stores like Amazon S3. “State of the art” servers like ownCloud are often in the Stone Age when it comes to single points of failure. For example, ownCloud points to GlusterFS as a way of making its shared filesystem robust against a server failure, but if you read the GlusterFS docs carefully, it looks like recovering from a failure is a seriously involved process that requires manually logging in (!) to fix things and get out of “degraded mode”.
Come on, it’s 2015! Individual server instances must be treated as disposable minions that can come and go at will. S3 doesn’t just “go down” when a single machine fails somewhere. It might glitch for a few seconds or require retrying a request, but that’s about it. There’s no “degraded mode”, just slightly more or less redundancy at any given time.
Even the stolid SQL databases like MySQL and Postgres offer very limited high-availability options (without exotic third-party plugins), basically consisting of a pair of duplicate servers where one can fail over to the other. The fail-over process is delicate; if something fouls up along the way, you could be left offline for hours while you resynchronize tables or rebuild them from backups. Surely this is not robust enough to handle something like a username/password database for Facebook.
After dabbling with MongoDB and looking at etcd for a while, I’ve begun to take for granted the notion of a self-healing cluster of individually unreliable server instances. When is this approach going to become an easy-to-use part of off-the-shelf databases? Standardizing this form of high availability is important because, while you could write your own reliability layer on top of a set of unreliable servers, doing so is labor-intensive to develop and difficult to maintain.
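To make the "write your own reliability layer" option concrete, here is a minimal Python sketch of the easiest piece of one: a read that fails over between replicas and retries with backoff. Everything in it is hypothetical and for illustration only (`ReplicaDown`, `read_with_failover`, `make_replica`, and the in-memory "replicas" are assumptions, not any real database's API).

```python
import random
import time


class ReplicaDown(Exception):
    """Raised by a (hypothetical) replica that cannot serve the request."""


def read_with_failover(replicas, key, attempts=3, base_delay=0.01):
    """Try each replica in random order; back off and retry if all fail."""
    last_error = None
    for attempt in range(attempts):
        # Shuffle a copy so that no single replica is treated as "special".
        for replica in random.sample(replicas, len(replicas)):
            try:
                return replica(key)
            except ReplicaDown as exc:
                last_error = exc  # remember the failure, try the next replica
        # Every replica failed this round: wait before the next round.
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all replicas failed for {key!r}") from last_error


def make_replica(data, alive=True):
    """Build a fake in-memory replica (illustrative stand-in for a server)."""
    def replica(key):
        if not alive:
            raise ReplicaDown("replica unreachable")
        return data[key]
    return replica


data = {"alice": "hash1"}
replicas = [make_replica(data, alive=False), make_replica(data, alive=True)]
print(read_with_failover(replicas, "alice"))  # prints "hash1"
```

Even this toy only covers reads. Keeping writes consistent across replicas, detecting which servers are alive, and re-replicating data after a failure are the hard parts, and they are exactly the work you would rather have the database do for you.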
It turns out a lot of activity is taking place in this area now, and there’s even a name for the trend: “NewSQL”. The USENIX talk linked below is a good high-level overview of why and how databases are evolving in this direction: