Adaptive Computing Spans The DigitalGlobe
Given a choice between developing their own cluster and data management software or buying it off the shelf from third parties, most large enterprises would prefer to buy rather than build. This is not always possible, particularly at companies engaged in innovative businesses outside the norm. Such has been the case with satellite imagery provider DigitalGlobe throughout most of its two-decade history, but gradually, as tools mature, the company is replacing homegrown code with outside tools.
One case in point is the cluster management tools for the systems that DigitalGlobe has created to ingest and process images from its fleet of five satellites; another is the homegrown tools the company created to stage the delivery of image files from its massive tape and disk archives so they can be processed and distributed to customers.
DigitalGlobe has worked for many years with Adaptive Computing, a maker of cluster and cloud management tools, to make its clusters more flexible, and the two are working together to adapt the Moab job scheduler at the heart of the Adaptive tools so it can hook into the data scheduling systems, putting both compute and data under the control of the same job scheduler. This effort, called Big Workflow, is part of the Moab 7.5 update that came out this week, and the idea is to commercialize the data scheduling hooks created for DigitalGlobe and a number of other customers. (You can see our detailed analysis of Big Workflow over at HPCwire.)
Everything at DigitalGlobe starts with the constellation of satellites that the company operates, which is the largest private satellite fleet in the world. The company was founded in 1993 and has three satellites that it launched itself – QuickBird, WorldView-1, and WorldView-2. DigitalGlobe was in the process of building its WorldView-3 satellite, due to launch sometime this year, when it acquired its main rival in Earth imagery, GeoEye, for $900 million in cash and stock. GeoEye had two satellites of its own in orbit at the time of the merger – GeoEye-1 and Ikonos – and was building GeoEye-2 to boost its satellite fleet. (One of these new satellites will be launched this year and the other kept in reserve for when imaging capacity is needed.)
These DigitalGlobe satellites fly in polar orbits at between 300 and 500 miles above the surface of the planet and can do a loop in about an hour and a half. With each loop, the Earth spins a bit, giving the satellite a slightly different patch of ground to capture in images. Some of the satellites take black and white images, others can do color imagery, and WorldView-3 will be able to see in other parts of the spectrum outside of the visible light range. Eleven remote ground terminals, whose locations are by necessity not disclosed, are hooked to telecom networks by fiber optic or satellite links so they can pump images back to DigitalGlobe's datacenter at its Longmont, Colorado headquarters.
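To put rough numbers on that shift, here is a back-of-the-envelope sketch; the 90-minute period and equatorial circumference are generic figures, not DigitalGlobe's specifications:

```python
# Back-of-the-envelope estimate of how far the ground track shifts
# between successive ~90-minute polar orbits. Generic figures, not
# DigitalGlobe's specifications.

EARTH_CIRCUMFERENCE_KM = 40_075       # at the equator
SIDEREAL_DAY_MIN = 23 * 60 + 56       # one full rotation of the Earth
ORBIT_PERIOD_MIN = 90                 # "a loop in about an hour and a half"

# Fraction of a full rotation the Earth completes during one orbit
rotation_fraction = ORBIT_PERIOD_MIN / SIDEREAL_DAY_MIN

print(f"Shift per orbit: ~{360 * rotation_fraction:.1f} degrees of longitude, "
      f"~{EARTH_CIRCUMFERENCE_KM * rotation_fraction:,.0f} km at the equator")
# Shift per orbit: ~22.6 degrees of longitude, ~2,512 km at the equator
```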
DigitalGlobe posted $612.7 million in revenues in 2013, an increase of 45 percent thanks in part to the acquisition of GeoEye but also due to the expanded use of satellite imagery by governments and businesses alike. With the world going increasingly mobile, mapping systems that show what is around us are increasingly important. The Earth is constantly changing, too, due to the climate, disasters, man-made construction, and other forces, so images have to be updated frequently to remain relevant. The satellites are constantly taking snapshots of the planet as they make their orbits, but DigitalGlobe is also proactive in that it will do imagery ahead of natural disasters, possible political conflicts, and other areas of interest so it can do before-and-after imagery on behalf of customers. This includes imagery for first responders in the wake of a disaster they knew was coming, like a hurricane, or for insurance companies that use imagery and analysis from DigitalGlobe to assess the damage hurricanes cause.
Equally important to reactive and recent imagery is the archive that DigitalGlobe has amassed, which spans fourteen years. This archive allows the company to provide time-based imagery services to its customers over a much longer span. This is particularly important for land and crop management, or, in a famous example, for one customer that used satellite imagery to count cars in Wal-Mart parking lots to come up with an algorithm to predict the performance of Wal-Mart stock over time.
Jason Bucholtz, principal architect at DigitalGlobe, talked to EnterpriseTech about the challenges of managing the company's image processing clusters, the raw image data they chew on, and the finished image products created from those raw images.
The DigitalGlobe archive has over 4.5 billion square kilometers of total imagery, says Bucholtz, and with the current satellite fleet, it is generating over 1 billion square kilometers of new imagery every year. That works out to about 2 PB of images per year, which is then turned into 8 PB of imagery that it sells to customers. The US government is currently DigitalGlobe's largest customer, accounting for 58.4 percent of revenues in 2013, but the company is branching out its services to commercial customers for mapping systems and for a variety of other applications where geospatial data is raw material.
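A quick sanity check on those figures, treating the round numbers in the description as exact, shows what they imply per square kilometer:

```python
# Sanity-check the article's figures: ~1 billion sq km of new imagery
# per year turning into ~2 PB raw and ~8 PB of finished products.

PB = 10**15                      # petabyte, in bytes
new_area_km2 = 1e9               # "over 1 billion square kilometers" per year
raw_bytes = 2 * PB               # "about 2 PB of images per year"
product_bytes = 8 * PB           # "8 PB of imagery that it sells"

print(f"Raw density:      {raw_bytes / new_area_km2 / 1e6:.1f} MB per sq km")
print(f"Product density:  {product_bytes / new_area_km2 / 1e6:.1f} MB per sq km")
print(f"Expansion factor: {product_bytes / raw_bytes:.0f}x raw to product")
# Raw density:      2.0 MB per sq km
# Product density:  8.0 MB per sq km
# Expansion factor: 4x raw to product
```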
Like many public companies, DigitalGlobe is skittish about naming particular suppliers for the systems that it uses to capture, process, and store images, but Bucholtz walked EnterpriseTech through the basics of its systems and the management issues it is trying to solve with the help of Adaptive Computing.
DigitalGlobe has adopted network-attached storage rather than storage area networks as its main storage platform. The NAS arrays have over 25,000 disk drives and a combined 41 PB of raw capacity, says Bucholtz. Once an image is beamed down from the satellite to the remote ground terminal, it is immediately placed on a NAS array with a lot of flash-based read cache because, as Bucholtz puts it, a new image is highly likely to be turned into a product for a customer. In many cases, these images are pushed out to customers who subscribe to DigitalGlobe's cloud-based Earth imagery service. The company can now turn around such imagery in as little as 90 minutes.
As this fast imagery is being done, copies of that image are placed into the deep archive, which consists of a mix of disk arrays and tape libraries. The images are indexed and wrapped in metadata so they can be searched. As orders come in for time-series imagery sets, the data is fetched from this deep archive. DigitalGlobe has 22 PB of active archive capacity in its tape libraries – again, Bucholtz can't say which libraries the company uses, but Spectra Logic, IBM, and Oracle are the three main players here – and the robots in those libraries have traveled a combined 6,600 miles, moving tapes to drives for data to be pushed from archive to production.
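DigitalGlobe has not published its catalog schema, so the sketch below is only a generic illustration of the pattern Bucholtz describes: every image carries searchable metadata, and tape-resident images are queued for recall before the compute jobs that need them run. All field names here are hypothetical.

```python
# Illustrative sketch of the archive pattern described above: images are
# wrapped in metadata so they can be searched, then staged from tape or
# disk when a time-series order needs them. Field names are hypothetical.

from dataclasses import dataclass
from datetime import date

@dataclass
class ArchivedImage:
    image_id: str
    satellite: str          # e.g. "WorldView-2"
    captured: date
    bbox: tuple             # (min_lon, min_lat, max_lon, max_lat)
    location: str           # "tape" or "disk"

def find_time_series(catalog, bbox, start, end):
    """Return archived images overlapping a bounding box and date range."""
    def overlaps(a, b):
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])
    return [img for img in catalog
            if overlaps(img.bbox, bbox) and start <= img.captured <= end]

def stage(images):
    """Queue tape-resident images for recall before compute jobs start."""
    for img in images:
        if img.location == "tape":
            print(f"recall {img.image_id} from tape library")
        else:
            print(f"{img.image_id} already on disk")

catalog = [ArchivedImage("img-001", "WorldView-2", date(2011, 6, 1),
                         (-105.2, 40.1, -105.0, 40.3), "tape")]
stage(find_time_series(catalog, (-105.3, 40.0, -104.9, 40.4),
                       date(2010, 1, 1), date(2013, 1, 1)))
# recall img-001 from tape library
```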
"We try to leverage as open a protocol as possible, and NAS lends itself very well to that," says Bucholtz. "That allows us to use the best product with the best capabilities for the application that we are running at that time. Some NAS platforms are good at large block sequential I/O, and others may be better at a random I/O pattern. So the layout of the data is actually important, and I think that this is where Big Workflow comes into play. We want to ensure that our data is in the right place, on the right platform, for us to be able to do the transformations that customers demand."
At the moment, DigitalGlobe has a homegrown tool that moves images in and out of production to coordinate with the compute cluster that does the basic sharpening, color correction, orthorectification (tying the image to a precise spot on the globe), and stitching work for images. The company had its own homegrown job scheduler for the compute cluster, but four years ago adopted Moab to replace it for both the Linux and Windows portions of that cluster. About two years ago, DigitalGlobe started using the dynamic provisioning features of Moab to drive its clusters more efficiently and make them respond to changes in the workload. Moab has hooks to talk to the open source xCAT tool for Linux and Windows Compute Cluster Server for Windows, which both do bare-metal provisioning under the direction of Moab. Now, as new work comes in, nodes on the compute cluster, which has just under 300 nodes, can be allocated for whatever work is necessary. The pixel manipulation functions of image processing, which include color tuning, cloud removal, and orthorectification, tend to run on Linux, while the format processing, which includes reformatting the data for GeoTIFF, NTF, or other formats, tends to run on Windows.
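Moab's actual configuration language is not reproduced here; the sketch below just models the routing and reprovisioning behavior described above, with hypothetical job and node structures:

```python
# Conceptual sketch of the dynamic provisioning described above: pixel
# work runs on Linux, format conversion on Windows, and idle nodes are
# reprovisioned to whichever OS the incoming work needs. This models the
# idea only; it is not Moab's or xCAT's actual API.

JOB_OS = {
    "color_tuning": "linux",       # pixel manipulation
    "cloud_removal": "linux",
    "orthorectification": "linux",
    "format_geotiff": "windows",   # format processing
    "format_ntf": "windows",
}

def dispatch(job_type, nodes):
    """Find an idle node with the right OS, reprovisioning one if needed."""
    wanted = JOB_OS[job_type]
    for node in nodes:
        if node["idle"] and node["os"] == wanted:
            node["idle"] = False
            return node["name"]
    for node in nodes:                      # no match: bare-metal reprovision
        if node["idle"]:
            print(f"reprovision {node['name']} -> {wanted} (via xCAT)")
            node["os"], node["idle"] = wanted, False
            return node["name"]
    return None                             # cluster fully busy; job queues

nodes = [{"name": "n001", "os": "windows", "idle": True}]
print(dispatch("orthorectification", nodes))   # reprovisions n001 to linux
```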
An analysis that Bucholtz did internally at DigitalGlobe in early 2012, and presented at a Gartner conference this past December, showed the effect that the dynamic allocation of compute through Moab had on the amount of work DigitalGlobe could push through the cluster.
Dynamic provisioning driven by the job scheduler has been a boon for DigitalGlobe, and so has the adoption of GPU coprocessors to do image processing. The company adopted GPU coprocessors several years ago, and has said in past public statements that it has seen speedups of anywhere from a factor of 10 to 20 for its image processing applications thanks to GPUs. Bucholtz was not at liberty to talk about the details of the hybrid cluster, except to say that the nodes are based on two-socket X86 designs and that at 256 GB per node they have relatively fat memory compared to a typical HPC cluster. (DigitalGlobe is a reference customer for Nvidia's Tesla GPU accelerators, so it stands to reason that however many nodes have GPUs in this compute cluster, the GPUs probably have the Tesla brand on them.) The important bit for DigitalGlobe is that the Moab job scheduler knows how to dispatch work to both the CPUs and GPUs in the system.
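Again as a conceptual sketch rather than Moab's real placement logic, GPU-aware dispatch amounts to tracking GPUs as a countable node resource alongside cores:

```python
# Conceptual sketch of GPU-aware dispatch: the scheduler tracks GPUs as a
# node resource and only places GPU jobs on nodes that have free ones.
# Node and job shapes are invented for illustration.

def place(job, nodes):
    for node in nodes:
        if (node["free_cores"] >= job["cores"]
                and node["free_gpus"] >= job.get("gpus", 0)):
            node["free_cores"] -= job["cores"]
            node["free_gpus"] -= job.get("gpus", 0)
            return node["name"]
    return None   # job waits until a node with enough CPU and GPU frees up

nodes = [
    {"name": "cpu01", "free_cores": 16, "free_gpus": 0},
    {"name": "gpu01", "free_cores": 16, "free_gpus": 2},
]
print(place({"cores": 8, "gpus": 1}, nodes))   # gpu01
print(place({"cores": 8}, nodes))              # cpu01
```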
DigitalGlobe is now testing how to integrate its homegrown image management system with the Moab 7.5 job scheduler's Data Expert feature, which makes the scheduler "data aware." The Data Expert component knows who owns the data, its security restrictions, and where it can be moved so archiving and retrieval software can stage files (in this case images) in concert with the compute jobs that work on them.
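Adaptive has not published Data Expert's interface in this context, so the following is a speculative sketch of what data-aware job release looks like in principle: verify ownership and security policy, stage the data, then run the compute.

```python
# Conceptual sketch of "data aware" scheduling in the spirit of Data
# Expert: a job is held until its input data is staged somewhere its
# owner and security policy allow. Not Adaptive Computing's actual API.

ALLOWED_TARGETS = {                 # where each classification may move
    "public": {"prod-nas", "cloud-cache"},
    "restricted": {"prod-nas"},
}

def release_job(job, dataset):
    """Stage data (if policy permits), then let the compute job run."""
    if job["user"] != dataset["owner"]:
        raise PermissionError(f"{job['user']} does not own {dataset['id']}")
    target = job["target"]
    if target not in ALLOWED_TARGETS[dataset["classification"]]:
        raise PermissionError(f"{dataset['id']} may not move to {target}")
    print(f"stage {dataset['id']} -> {target}")   # data movement first...
    print(f"run {job['name']} on {target}")       # ...then compute

release_job(
    {"name": "ortho-batch-7", "user": "imaging", "target": "prod-nas"},
    {"id": "img-123", "owner": "imaging", "classification": "restricted"},
)
```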
At the moment, says Adaptive Computing, so many companies in varied industries have homegrown tools for this task that it will have to handle such integration through custom services engagements. But over time, as third party tools for data management mature, the idea is to provide hooks into these as well. The near-term goal at DigitalGlobe, in fact, is to find such a third party data management tool, stop supporting its own, and let Moab tell both the compute and the storage what to do and when to do it.