Your Trusted Data Partner
As the Trusted Data Partner of the transformation programme, Anaeko delivered data optimisation services, beginning with a discovery phase and rapidly deploying teams across data optimisation, hybrid cloud integration, integrated analytics and multicloud DevOps. Each team applied an iterative approach to understanding, analysing and refining data optimisation solutions, which were used to communicate progress, performance and technical trade-offs.
4 Month Project
5 Billion Video Frames
800 Sensor Files per second
60% Cost Savings
The Business Problem
Processing video and sensor data faster than your competitors is critical to becoming the world leader in autonomous driving. An automotive company had a growing fleet of self-driving cars generating huge volumes of data, and the cost and effort to manage this data was spiralling. To maintain a market-leading position the company needed data optimisation to accelerate their analytical research.
The Technical Problem
The infrastructure team needed to optimise data processing, reducing the infrastructure costs of storing and processing data and the operational costs of managing data, so that the analytics team could monitor more vehicles. The infrastructure and analytics teams were under pressure to meet aggressive deadlines, so they needed a partner who could work with their existing platforms and around their busy schedules.
The Data Optimisation Challenge
The automotive company generated hundreds of Petabytes of high-definition video files and high-frequency streaming sensor data. Each test vehicle captured video files from 8 cameras and sensor data from 15 sensors. The test fleet transmitted 800 objects per second and captured over 5 billion video frames from a short trial. The company needed to efficiently search the camera and sensor data to identify common events and outlier incidents; to do this, they needed to intelligently tag and curate all data.
Data Optimisation Services
- using deep inspection to classify research data
- increasing the volume of data that could be processed
- eliminating bottlenecks in the research pipeline
- using machine learning to enrich analytics
- reducing data management overhead
- tiering across hot and warm file and object storage
- maximising network and compute usage
The Data Optimisation Solution
In 4 months Anaeko delivered a data optimisation solution that integrated existing infrastructure and applications with elastic cloud services using a flexible microservice architecture. We provided comprehensive test results and benchmarks from performance, integrity and automated regression test suites so that in-house teams could maintain the solution going forward. We produced configuration and operational user guides, in line with existing processes, for operational teams to manage the solution.
The Data Optimisation Platform
After considering the technical and operational requirements, and leveraging our experience of optimising Petabyte-scale storage platforms, Anaeko developed high-performance Python agents running on a scale-out Kubernetes container platform. Anaeko optimised the system using parallel processing patterns and took advantage of multiple hybrid cloud services including containers, machine learning, and object storage to accelerate development. In our asynchronous processing design, sensor and camera data were first pushed to object storage; software agents then downloaded and deep-inspected the files, generating metadata that was published to a metadata catalogue.
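The asynchronous agent pattern can be sketched as follows. This is a minimal illustration, not Anaeko's production code: the object store and metadata catalogue are simulated with in-memory dictionaries, and the names (deep_inspect, agent) are illustrative.

```python
# Sketch of the asynchronous design: agents pull object keys from a work
# queue, "download" and deep-inspect each object, and publish metadata to
# a catalogue. In-memory stand-ins replace the real object store and catalogue.
import queue
import threading

object_store = {                       # stand-in for the real object store
    "run1/cam3/frame0001.bin": b"\x00" * 64,
    "run1/sensor/imu.json": b'{"yaw": 0.2}',
}
catalogue = {}                         # stand-in for the metadata catalogue
catalogue_lock = threading.Lock()

def deep_inspect(key, payload):
    """Derive simple metadata from an object's key and contents."""
    return {"size_bytes": len(payload),
            "kind": "video" if "cam" in key else "sensor"}

def agent(work):
    """Drain the work queue, inspecting each object and publishing metadata."""
    while True:
        try:
            key = work.get_nowait()
        except queue.Empty:
            return
        meta = deep_inspect(key, object_store[key])   # download + inspect
        with catalogue_lock:
            catalogue[key] = meta                     # publish metadata

work = queue.Queue()
for key in object_store:
    work.put(key)
threads = [threading.Thread(target=agent, args=(work,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the real system the agents ran as Kubernetes containers, so the same pattern scaled out across nodes rather than just across threads.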
A metadata search user interface enabled downstream analysts to search metadata and locate matching files for further processing. We applied a bespoke algorithm for training the machine learning models.
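A metadata search of this kind can be reduced to matching criteria against catalogue entries. The snippet below is a hypothetical helper, since the source does not describe the real catalogue schema or UI; the field names are invented for illustration.

```python
# Illustrative metadata search: return the keys of catalogued objects whose
# metadata matches every supplied criterion. Schema and values are invented.
catalogue = {
    "run1/cam3/frame0001.bin": {"kind": "video", "event": "lane_change"},
    "run1/sensor/imu.json":    {"kind": "sensor", "event": "routine"},
}

def search(catalogue, **criteria):
    """Return sorted keys of objects matching all field=value criteria."""
    return sorted(k for k, meta in catalogue.items()
                  if all(meta.get(f) == v for f, v in criteria.items()))

matches = search(catalogue, kind="video")
```

Analysts would use results like `matches` to locate the underlying files in object storage for further processing.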
To manage cost, files were stored in a cost-effective warm-tier object store when not being actively processed. When the model was being trained, files were transferred from the warm tier to a more performant hot tier backed by faster Network File System (NFS) storage. The warm and hot tiers were connected by a 10 Gbps link that we maintained at 70-80% utilisation while maximising utilisation of compute resources.
We developed deep-inspect algorithms to extract metadata and redact sensitive data including Vehicle Identification Numbers. Our agents enabled analysts to define search tags and patterns that were used to search for matching files and report back file identifiers and locations. To efficiently process millions of files, parallel processing was implemented both at a container level and through multithreading. Our solution maximised utilisation of available compute infrastructure, and monitored network throughput, throttling processing where necessary to avoid network backlog.
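The combination of VIN redaction, tag matching and thread-level parallelism described above can be sketched as below. The VIN pattern (17 alphanumeric characters excluding I, O and Q) is the standard format; the tags, file contents and function names are invented for illustration.

```python
# Sketch of a parallel deep-inspect step: redact VINs, report matching tags,
# and fan the work out across a thread pool (containers scaled this further).
import re
from concurrent.futures import ThreadPoolExecutor

VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")   # 17-char VIN, no I/O/Q

SEARCH_TAGS = ("collision", "lane_change")         # illustrative tag set

def inspect(text):
    """Redact VINs, then report which search tags the content matches."""
    redacted = VIN_RE.sub("[REDACTED-VIN]", text)
    tags = [t for t in SEARCH_TAGS if t in redacted]
    return redacted, tags

files = [
    "lane_change event, vehicle 1HGCM82633A004352",
    "routine telemetry frame",
]
with ThreadPoolExecutor(max_workers=4) as pool:    # thread-level parallelism
    results = list(pool.map(inspect, files))
```

Because each file is inspected independently, the same function parallelises cleanly both across threads and across container replicas.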
The Technical Benefit
Using our solution, the automotive company was able to process 800 files per second while maximising the efficiency of the available network and compute infrastructure.
The extensible architecture enabled custom search and scan agents to be developed rapidly, and support for additional file types to be added, to meet other data analytics pipeline needs. The fully tagged and categorised object store acted as an efficient storage platform that scaled with continued use. The automated test suites and DevOps pipelines acted as a best-practice framework for future projects.
The Business Benefit
The data optimisation delivered an estimated 60% TCO reduction. Combining hot- and warm-tier storage within a software-defined storage architecture reduced the storage cost per file. Not only was the underlying storage hardware cheaper for object storage than for NFS, but the object store required less storage admin effort to manage. The end-to-end solution maximised utilisation of the available network and compute infrastructure, optimising the processing cost. The in-house teams had clear visibility of progress throughout the project and were left with an extensible framework of open-standard and open-source services, integrated and automated within an efficient multicloud DevOps environment.