Docker as a High-Performance Enabler for Cloud Storage
The container platform Docker is gaining massive momentum in the industry, and for many good reasons. Much of Docker's current appeal stems from the drawbacks of running VMs, namely that each VM must be provisioned with its own virtualized memory and storage resources. Containers make it far easier to run enterprise-grade services and address concerns around data portability, scaling, processing, performance, extensibility and latency.
When it comes to storage, however, there's a rub. Early adopters running storage-centric applications on Docker report that accessing this storage is the most significant challenge in working with Docker, apart from designing the containers themselves. Typically, a container performs a task on the data stored within it and then disappears, taking that data with it, which is a problem when the data has to outlive the container.
Storage options for Docker implementations are nascent, yet the emerging approaches fall into two distinct camps. The usual approach is to place the storage inside the compute. Flip this concept around and place the compute inside the storage, and we get into a very interesting area of new options for high-performance, enterprise-grade cloud storage.
Docker Storage Options in the Cloud - Storage in Compute
This is currently the most common way of using Docker, mostly because it is relatively simple to implement within existing architectures. Docker is pointed at the local storage of the server on which it's running (this is how one would typically use Docker in a public cloud like Amazon Web Services or Microsoft Azure).
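As a minimal sketch of this pattern, the snippet below uses the docker-py SDK to run a container against a directory on the server's local disk; the image, host path and command are illustrative placeholders, not any particular provider's layout.

```python
import docker

client = docker.from_env()

# Mount a directory on the host's local disk into the container.
# Whatever is written here lives, and dies, with this one server.
client.containers.run(
    "alpine:3",
    command=["sh", "-c", "echo processed > /data/result.txt"],
    volumes={"/srv/local-disk": {"bind": "/data", "mode": "rw"}},
    remove=True,  # the container disappears; the data stays on local disk only
)
```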
Given that server storage is local, as implemented by most cloud providers, this approach offers neither high availability (HA) nor sharing. Another downside is that this approach creates isolated islands of storage within the physical servers. When one application runs out of space, it cannot access excess capacity allocated to another application.
An additional concern is that in widely distributed systems, even with Docker containers, there can be a lack of privacy and isolation. Both matter for applications that host sensitive information or require high Quality of Service (QoS), which noisy neighbors can undermine.
Compute inside the Storage
When it comes to cloud storage, a perception exists that performance and QoS will be lower than with traditional on-premises arrays. While this perception is not necessarily accurate, improving overall performance is generally a priority, regardless of the architecture. There are multiple ways to achieve this, and enabling Docker right inside the storage is one unique approach. By placing the compute right inside the storage, users can bypass networking and achieve practically zero latency between the processor and the storage in the cloud.
There are several important advantages to this approach, especially for high availability and high performance needs.
- Reduced Network Latency
If the application runs inside the storage and writes its output directly to it, the data never has to travel over the network that connects servers and storage.
- Sharing
One of the main advantages of centralized storage is that it's shared, meaning the same data set can be accessed by multiple containers and/or virtual machines. For data sets like map data, this is a requirement; without sharing, each container would need its own copy of the map data.
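As an illustrative sketch of that point, again with the docker-py SDK, several containers can mount one shared data set read-only instead of each carrying its own copy; the volume name, image and commands are assumptions.

```python
import docker

client = docker.from_env()

# One shared, named volume standing in for a centrally stored data set
# (e.g. map data); the volume name is illustrative.
client.volumes.create(name="map-data")

# Any number of containers can mount the same data set read-only,
# rather than each holding a private copy of it.
for job in ("tile-render", "route-calc", "geo-search"):
    client.containers.run(
        "alpine:3",
        command=["ls", "/maps"],
        volumes={"map-data": {"bind": "/maps", "mode": "ro"}},
        remove=True,
    )
```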
- Persistence
Despite many Docker use cases being ephemeral and local, certain common applications, such as media transcoding, require persistence. In those instances, the data needs to be shared among multiple VMs and containers, and the same data needs to persist (the resulting transcoded videos must be available for streaming after the container has finished transcoding). Docker is an elegant solution here, because it can be programmed to launch when needed (e.g., when a new media file is added) and then terminate upon completion (e.g., when the transcoding is done).
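A minimal sketch of that pattern follows, using the docker-py SDK: a container is launched for one transcoding job and removed when it finishes, while its output persists on a shared volume. The image, volume name, paths and ffmpeg arguments are illustrative assumptions.

```python
import docker

client = docker.from_env()

def transcode(filename: str) -> None:
    """Launch a short-lived container for a single transcoding job.

    The media library lives on shared, persistent storage; the
    container is created on demand and removed when the job ends,
    but its output stays behind, ready for streaming.
    """
    client.containers.run(
        "jrottenberg/ffmpeg:4.4-alpine",
        command=["-i", f"/media/incoming/{filename}",
                 f"/media/streams/{filename}.mp4"],
        volumes={"media-library": {"bind": "/media", "mode": "rw"}},
        remove=True,  # ephemeral compute, persistent data
    )

# e.g. called from a storage event hook when a new file is added:
# transcode("raw-upload.mov")
```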
- Extensibility
Since the containers run inside the storage, users can write code that extends the storage's functionality. By introducing containers with code for new functionality, customers can quickly modify, expand and customize the storage's behavior and properties.
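As one hypothetical example of such an extension, the sketch below adds content checksums for integrity verification, a feature the storage itself may not ship with, by walking the mounted data. The mount point and the sidecar-file convention are assumptions.

```python
import hashlib
import pathlib

STORAGE_ROOT = pathlib.Path("/storage")  # hypothetical in-storage mount point

def checksum_new_files() -> None:
    """Write a .sha256 sidecar file for every stored file that lacks one."""
    for path in STORAGE_ROOT.rglob("*"):
        if path.is_file() and path.suffix != ".sha256":
            sidecar = path.parent / (path.name + ".sha256")
            if not sidecar.exists():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                sidecar.write_text(digest)

if __name__ == "__main__":
    checksum_new_files()
```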
- Data-Aware Storage
Traditional storage solutions keep and protect the data, but are not aware of the data's content. With Docker, it's relatively easy to write code that runs within the storage and looks into the data, for a number of useful purposes, such as:
- Indexing big data for fast future search
- Searching for illegal content
- Security-related scans
- Data format conversion – for instance, converting multimedia files between different video/image standards and resolutions
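A minimal sketch of the first item on that list: a small indexing job, shipped as code that runs inside the storage, looks at every stored file and writes an index for later search. The mount point and index location are assumptions.

```python
import json
import pathlib

STORAGE_ROOT = pathlib.Path("/storage")       # hypothetical in-storage mount point
INDEX_FILE = STORAGE_ROOT / ".index.json"     # illustrative index location

def build_index() -> None:
    """Record the name, size and type of every stored file for quick search."""
    entries = [
        {"path": str(p.relative_to(STORAGE_ROOT)),
         "bytes": p.stat().st_size,
         "type": p.suffix.lower()}
        for p in STORAGE_ROOT.rglob("*")
        if p.is_file() and p != INDEX_FILE
    ]
    INDEX_FILE.write_text(json.dumps(entries, indent=2))

if __name__ == "__main__":
    build_index()
```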
Containerization is allowing savvy IT teams to extend a storage service in fundamental ways – without forcing users to take a step backward on the enterprise-grade features that are critical to performance. We've only just begun to see its utility and benefits for high-performance applications, as enterprises find new Docker-based solutions to their business challenges every day.
Oded Kellner is Vice President of Product Management at Zadara Storage, an enterprise Storage as a Service (STaaS) player and early adopter of containers for high performance cloud storage.