Driverless Cars Generate Massive Amounts of Data. Are We Ready?
When future drivers tell their kids to “get in the car,” they’ll be instructing them to climb into a mobile computer that consumes very large, and growing, quantities of data. Autonomous vehicles (AVs) are essentially rolling computers connected to the environment, infrastructure, and the Internet. These computers continuously produce and process data to keep passengers, pedestrians, and other drivers safe on the way to their destination.
AV data volumes are growing even faster thanks to autonomous test vehicles, each of which generates between 5TB and 20TB of data per day. All this data must be received, stored, protected, and analyzed in real time – and retained for research and legal purposes.
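To put those numbers in perspective, here is an illustrative back-of-the-envelope calculation (the fleet size is a hypothetical assumption; the per-vehicle rates are the figures cited above):

```python
# Back-of-the-envelope math using the 5TB-20TB/day per-vehicle range
# cited above; the fleet size is a hypothetical assumption.
TB_PER_DAY_LOW, TB_PER_DAY_HIGH = 5, 20
FLEET_SIZE = 50
DAYS_PER_YEAR = 365

low = TB_PER_DAY_LOW * FLEET_SIZE * DAYS_PER_YEAR     # TB per year
high = TB_PER_DAY_HIGH * FLEET_SIZE * DAYS_PER_YEAR

print(f"Fleet of {FLEET_SIZE} vehicles: {low / 1000:.0f}-{high / 1000:.0f} PB per year")
# -> Fleet of 50 vehicles: 91-365 PB per year
```

Even a modest test fleet lands squarely in petabyte territory within a single year.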
While most automotive manufacturers and suppliers understand the need for data analysis and retention, few have a strategy for digitizing and managing data at this scale. However, technology solutions exist to make this possible.
The "3-2-1" Rule
The IT infrastructures of global automakers are highly complex and have often grown haphazardly over the decades. One of the most glaring shortcomings of these environments is traditional siloed data structures, which prevent cost-effective, powerful, and agile data management. They also complicate data protection strategies, which are critical for petabyte-sized volumes of AV data with long retention periods.
Intelligent data protection starts with a simple strategy, the classic "3-2-1" rule: keep at least three copies of data, store them on two different media, and hold one copy off-site. New concerns about ransomware suggest that one of those copies should also be offline. Even in the era of mobile and multi-cloud, this remains the best defense against hardware failures, cyber-attacks, data corruption, and data loss.
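The rule is simple enough to encode as a policy check. The sketch below is illustrative; the names and structure are assumptions, not any particular product's API:

```python
from dataclasses import dataclass

@dataclass
class Copy:
    media: str        # e.g. "disk", "tape", "object-store"
    offsite: bool
    offline: bool     # air-gapped, e.g. a tape cartridge on a shelf

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check the classic 3-2-1 rule, plus one offline copy for ransomware."""
    return (
        len(copies) >= 3                            # at least 3 copies
        and len({c.media for c in copies}) >= 2     # on 2 different media
        and any(c.offsite for c in copies)          # 1 copy off-site
        and any(c.offline for c in copies)          # 1 copy offline
    )

copies = [
    Copy("disk", offsite=False, offline=False),         # primary
    Copy("tape", offsite=False, offline=True),          # air-gapped tape
    Copy("object-store", offsite=True, offline=False),  # cloud replica
]
assert satisfies_3_2_1(copies)
```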
Tiered Storage
AV makers and suppliers use high-performance storage to carry out the critical analysis and simulation tasks involved in developing AV artificial intelligence.
For longer-term data retention in cold storage or archives, data does not require high-performance, low-latency access; the primary drivers are low cost and security. Tape remains one of the best media for long-term, on-premises storage of massive data sets. Public cloud solutions have gained a lot of visibility and are commonly deployed where data sets are not long-lived and where storage demand requires a high degree of flexibility. However, IT should do its due diligence around cloud egress charges, migration policies, physical and cyber security, and backup indexing and searchability.
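On the egress point, a rough illustration shows why it deserves scrutiny. The per-GB rate below is an assumption for illustration only; actual pricing varies by provider and tier:

```python
# Hypothetical egress-cost estimate; the per-GB rate is assumed for
# illustration, not quoted from any provider's price list.
DATA_PB = 1.0
EGRESS_RATE_PER_GB = 0.09   # assumed USD per GB

cost = DATA_PB * 1_000_000 * EGRESS_RATE_PER_GB   # 1 PB = 1,000,000 GB
print(f"Pulling {DATA_PB:.0f} PB back out of the cloud: ${cost:,.0f}")
# -> Pulling 1 PB back out of the cloud: $90,000
```

At petabyte scale, repeatedly pulling archived data back out can dwarf the cost of storing it.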
When choosing the storage medium, IT should always follow tiered storage principles, which move data to optimal storage media according to different requirements, including access time, security, performance and cost. The goal is to reduce overall IT costs by assigning high performance media to active data and moving less active data to less expensive storage tiers.
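As a minimal sketch of the principle, a tiering policy can be as simple as mapping a file's last-access age to a tier. The thresholds and tier names below are assumptions for illustration:

```python
import os
import time

# Illustrative age thresholds (assumptions, not vendor defaults).
HOT_DAYS, WARM_DAYS = 30, 180

def choose_tier(path: str) -> str:
    """Assign a storage tier based on the file's last-access age."""
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    if age_days < HOT_DAYS:
        return "nvme"    # active data: high-performance flash
    if age_days < WARM_DAYS:
        return "disk"    # cooling data: capacity disk
    return "tape"        # cold data: lowest cost per TB
```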
Automated Tiering
Although IT can manually tier data among storage devices, this kind of manual labor does not work for long in busy data centers. Storage tiering requires automation to work efficiently, especially across massive data volumes.
A properly implemented automated tiering solution will not be noticed by users. If a file has been automatically moved to a different storage medium, the storage solution enables end-users to locate and open it without knowing where the data physically resides. This is a critical capability in the collaborative environments popular in automotive R&D, including connected vehicle (CV) and AV development.
Solutions such as these enable what is commonly referred to as “active archives,” in which data is always placed on the media best suited to its stage in the lifecycle. Data typically lives most of its life in the archive; being able to retrieve it at will and have it automatically moved to performance storage while it is active is exactly what users need to balance productivity and cost.
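A minimal sketch of this transparent stub-and-recall behavior might look like the following. It uses symlinks as stubs for simplicity; production solutions typically use filesystem-level stubs or a global namespace, and the paths here are hypothetical:

```python
import shutil
from pathlib import Path

PERF_TIER = Path("/mnt/nvme")       # hypothetical mount points
ARCHIVE_TIER = Path("/mnt/archive")

def archive(path: Path) -> None:
    """Move a file to the archive tier, leaving a stub at its old location."""
    dest = ARCHIVE_TIER / path.name
    shutil.move(str(path), str(dest))
    path.symlink_to(dest)   # users still see the file where they left it

def recall(path: Path) -> Path:
    """On access, bring an archived file back to performance storage."""
    if path.is_symlink():
        hot = PERF_TIER / path.name
        shutil.move(str(path.resolve()), str(hot))
        path.unlink()
        path.symlink_to(hot)
    return path.resolve()
```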
Other crucial capabilities include detailed monitoring, alerting, and scan/search functions for demanding workloads in complex IT environments.
Storage Platforms for the 5G Age
The 5G mobile communications standard brings promising developments: demonstrated speeds of 10Gb per second and latencies of less than 10 milliseconds give mobile networks the bandwidth and responsiveness needed to carry massive volumes of AV-related data. Once that data is collected and ingested into storage, the platform should enable error-free communication, immediate protection, and instantaneous analysis.
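A quick worked example shows what those speeds mean for a day of test-vehicle data, assuming an ideal, sustained link:

```python
# Worked example: offloading one day of test-vehicle data over 5G.
# Assumes a sustained 10 Gb/s link, i.e. ideal conditions.
DATA_TB = 20
LINK_GBPS = 10

seconds = DATA_TB * 8_000 / LINK_GBPS   # 1 TB = 8,000 Gb (decimal)
print(f"{DATA_TB} TB at {LINK_GBPS} Gb/s takes about {seconds / 3600:.1f} hours")
# -> 20 TB at 10 Gb/s takes about 4.4 hours
```

Even at 5G speeds, a single vehicle's daily output occupies the link for hours, which is why the ingest platform downstream matters as much as the network.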
While new entrants are just discovering the massive data explosion associated with AV development, data storage solutions have already been proven to address the requirements of a complete end-to-end workflow. To make mass deployment of autonomous vehicles a reality, car manufacturers and technology providers must work together to develop the device intelligence needed to make driving safer and more efficient.
Accomplishing this requires intelligent management of the massive data sets at the heart of AV development. The good news is that proven data management and storage solutions exist to support the entire autonomous vehicle development workflow, from ingest through long-term retention.
Mark Pastor is Director of Products and Solutions, Quantum Corporation.