Flocker Adds Storage Virtualization To Docker Containers
The Docker application container system developed by platform cloud provider dotCloud, now known as Docker Software, has been gaining momentum in recent months and is on its way to becoming an alternative virtualization method alongside full-on server virtualization. Docker is far from complete, however, and some hosting experts who know a thing or two about data management on public and private clouds are applying that expertise to help virtualize the storage that underlies Docker containers.
Docker is a means of encapsulating a Linux operating system runtime and an application stack in a software container with specific resources allocated to it. The containers allow for application stacks to be managed as a whole and for multiple application stacks to be run side-by-side on physical servers, much as you can do with a server virtualization hypervisor such as KVM from Red Hat, Xen from Citrix Systems, ESXi from VMware, or Hyper-V from Microsoft. The important thing about Docker is that it is a much thinner layer of software and therefore imposes much less of a performance penalty compared to hypervisors. The commercialized version of Docker 1.0 was announced back in June, including the Docker Engine, which provides the hardware abstraction layer, and Docker Hub, a repository for software images that run inside of containers.
While Docker does a fine job of abstracting the compute and memory resources and doling them out to containers, Docker really only covers a subset of the application stack. So, for instance, Docker is fine for Web server front ends or API servers that have access to shared storage and that are replicated in the application stack for high availability. But, says Luke Marsden, CEO at ClusterHQ, Docker cannot offer the same flexibility and portability for application components such as message queuing servers, NoSQL databases, or relational databases.
"Docker is great for stateless applications that do not write data to a file system," Marsden explains to EnterpriseTech. "But as soon as you put an application into a container and you mount a file system, that container gets stuck on that machine."
All of the same things that have been added to server virtualization hypervisors to virtualize storage and networking in addition to compute and memory now have to be added to the Docker stack. And ClusterHQ is taking on the storage issues first. This means virtualizing local storage for Docker containers and giving them backup and restore capability as well as failover and high availability for both the container and its associated storage. It also means virtualizing storage so containers and their associated storage can be moved around as workloads change on a cluster, and allowing for clustering and orchestration for complex multi-tier applications, treating them as a unit instead of disparate components.
ClusterHQ has some experience in this area already, which is why it is taking on the storage virtualization challenge for Docker with a product it is calling Flocker. The company spent five years developing a hosting stack called HybridCluster, which was based on the FreeBSD variant of Unix and mashed up FreeBSD's own container implementation, called Jails, with the open source implementation of the former Sun Microsystems' Zettabyte File System (ZFS) to provide exactly the needed capabilities. This was all done on local storage without resorting to a shared storage area network hooked into all nodes in the cluster (which is the easy but expensive way to do it). The HybridCluster software allows Jails to be live migrated, and because the migration is synchronized with ZFS, the data underpinning a Jail moves along with the application in the container. As a result, you can do high availability and distributed resource management on HybridCluster in a way that is not possible today with Docker. (VMware's vSphere extensions and vCloud Suite add-ons do the same for the ESXi hypervisor, and have for years. This layer of management software is a big part of the company's revenue stream these days, although VMware does not say how much in its financial reports.)
HybridCluster can run on a private cloud in your own datacenter atop OpenStack/KVM or VMware vSphere, or it can be layered on top of Amazon Web Services, Google Compute Engine, or Rackspace Cloud. The HybridCluster software was designed by hosters for hosters, as Marsden puts it, and its ideas have been battle-tested supporting thousands of applications in production. They are also the foundation of the Flocker companion to Docker that ClusterHQ has just launched.
Flocker 0.1 is not a complete product yet, but it will nonetheless be useful for early adopters of Docker technology looking for a storage virtualization layer. At the moment, Flocker requires ZFS to be used on the compute/storage nodes in the cluster. Flocker is based on the open source implementation of ZFS, and Marsden says the company is heavily involved in the OpenZFS project and contributes heavily to the Linux variant. ClusterHQ is looking at making the storage back-end for Flocker pluggable so other file systems can underpin Flocker, but at the moment Marsden says that BTRFS, the other up-and-coming advanced file system for Linux machines, does not have all of the necessary features to do what Flocker needs.
The way Flocker works is simple enough to explain. The tool creates a virtual disk volume, called a Flocker volume, that rides atop ZFS local storage. This is where the persistent data behind a database server or a NoSQL datastore node resides. Docker rides atop the Linux kernel, partitioning up the operating system and laying down containers for applications. Flocker then drops a network proxy on top of this stack. The proxy links all of the server nodes in the cluster to each other, all of the containers on a single node to each other, and all of the containers to the outside world where users access the applications. This proxy is the secret sauce because it manages all of the links between containers, storage, and users as both the containers and their storage are live migrated around a cluster.
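To make the role of that proxy concrete, here is a toy sketch in Python of the idea described above: a container and its volume move together from one node to another, and a routing table is updated so that requests arriving at any node still reach the container. This is not Flocker's code or API. The `Node` and `Cluster` classes, the `deploy`, `migrate`, and `request` methods, and the `-vol` naming scheme are all hypothetical names invented for illustration, and in-memory dictionaries stand in for ZFS-backed volumes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server in the cluster (hypothetical model, not a Flocker type)."""
    name: str
    volumes: dict = field(default_factory=dict)     # volume name -> stored data
    containers: dict = field(default_factory=dict)  # container name -> volume name


class Cluster:
    """Toy cluster whose routing table plays the part of the network proxy."""

    def __init__(self, node_names):
        self.nodes = {name: Node(name) for name in node_names}
        self.routes = {}  # container name -> name of the node hosting it

    def deploy(self, container, node_name, data=None):
        """Create a container with its own volume on the given node."""
        node = self.nodes[node_name]
        volume = f"{container}-vol"  # assumed naming scheme
        node.volumes[volume] = dict(data or {})
        node.containers[container] = volume
        self.routes[container] = node_name

    def migrate(self, container, target_name):
        """Move a container and its volume together, then repoint the route."""
        source = self.nodes[self.routes[container]]
        target = self.nodes[target_name]
        if source is target:
            return
        volume = source.containers.pop(container)
        target.volumes[volume] = source.volumes.pop(volume)
        target.containers[container] = volume
        self.routes[container] = target_name

    def request(self, entry_node, container, key):
        """Handle a request arriving at any node by forwarding it to the host."""
        if entry_node not in self.nodes:
            raise ValueError(f"unknown node: {entry_node}")
        host = self.nodes[self.routes[container]]
        volume = host.containers[container]
        return host.name, host.volumes[volume].get(key)


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2"])
    cluster.deploy("db", "node1", {"user:1": "alice"})
    print(cluster.request("node2", "db", "user:1"))  # served by node1
    cluster.migrate("db", "node2")
    print(cluster.request("node1", "db", "user:1"))  # same data, now on node2
```

The point of the sketch is that the caller never needs to know where the container lives. Only the routing table changes during a migration, which is why the proxy can keep users connected while containers and storage move around the cluster.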
At the moment, Flocker is just a command line tool, and Marsden says it will take somewhere between 12 and 18 months to get Flocker fully developed to a 1.0 release, which will include a runtime environment and features that allow the movement of data and containers to be fully automated. Flocker 0.1 is free and open source for now, but at some point there will likely be a subscription-based enterprise edition with extra goodies, as well as support subscriptions for the open source edition.
Marsden says that Flocker could be adapted to support FreeBSD Jails in addition to the plain vanilla LXC containers that are part of the current Linux server editions from Red Hat, SUSE Linux, Canonical, Oracle, and others. But the question is whether any of these will be necessary, given the popularity of Docker. And the real question is when Docker Software just bites the bullet and acquires ClusterHQ to unite the Docker and Flocker tools. Docker has just raised $40 million, so it has the cash if it wants to.
ClusterHQ has also raised $3 million in seed money to fund the development of Flocker, so it doesn't need to be acquired. The company is based in Bristol, England and had a dozen employees as it launched the Flocker effort. In addition to Marsden, who was a software engineer at TweetDeck (acquired by Twitter three years ago for £25 million) before becoming CEO and co-founder at ClusterHQ, the Flocker team includes chief architect Jean-Paul Calderone, who is heavily involved in the Twisted event-driven network engine project, and chairman Mark Davis, who was a founder and CEO at storage virtualization company Virsto Software, which VMware acquired in February 2013 for an undisclosed sum.
Maybe VMware should buy Docker Software and ClusterHQ before this Docker container craze gets even crazier. At the moment, the majority of VMware's server virtualization business is still driven by the need to partition servers and aggregate Windows Server workloads, but if Docker takes off as many expect, it could become the preferred method of virtualization for Linux. And that would eat into VMware's business. Alternatively, Red Hat could swoop in and position the Docker/Flocker combo as an alternative to its own KVM as well as other hypervisors. Docker has a valuation of around $400 million, and at five times its seed funding, ClusterHQ could probably be had for around $15 million. Call it a cool $600 million for a nascent – but important – software technology. Anything that emulates the way that Google partitions and manages its workloads, which Docker plus the open source Kubernetes container manager created by Google does, has to command attention in the market.