Challenge

Unify complex datasets in real-time

A major client, the Department for Transport (DfT), faced substantial blockers to getting value from its train performance data. The DfT generates petabytes of complex rail asset and planning data. These datasets are sometimes delivered unpredictably, can contain gaps, and offer conflicting pictures of network performance. As a result, our client was spending time and money organising disparate datasets each day, yet still could not get clarity on key metrics such as service lateness. Decision makers lacked up-to-date, robust insights to guide planning or identify pain points. Travellers were becoming frustrated and mandatory performance reporting was running late, despite the DfT’s data team crunching numbers around the clock. Our client needed a centralised data engine to filter, unify, and make sense of all their data – without sending costs out of control.

Solution

Automated superfast data synthesis engine

Our primary concerns were to fully understand the multiple datasets and deliver a robust data engine. We first performed a data assessment to get to grips with the many complex data feeds. We then examined technology options from our data toolkit, designing three solid architectures able to ingest, process, store and deliver these huge data volumes. Our client opted to work with cutting-edge Azure cloud technology. Our solution used the power of Databricks to deliver gigabytes of data to ultra-secure SQL and NoSQL databases each day. Our data experts developed bespoke, superfast algorithms to clean, combine, and filter datasets – delivering metrics to a next-generation insights platform through custom APIs. The final step was to add a control layer, leveraging Azure Kubernetes Service (AKS) to guarantee performance and control costs through highly elastic scaling.
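To illustrate one step of this kind of pipeline, the sketch below shows how two rail data feeds might be cleaned, joined, and turned into a lateness metric using PySpark on Databricks. The table names, column names, and lateness calculation here are illustrative assumptions for the sketch, not the client's actual schema or algorithms.

```python
# Minimal PySpark sketch: clean, combine, and deliver a lateness metric.
# All table and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # supplied automatically inside Databricks

# Planned timetable and actual movement feeds arrive as separate tables.
planned = spark.table("rail.planned_services").select(
    "service_id", "station_code",
    F.to_timestamp("planned_arrival").alias("planned_arrival"),
)
actual = spark.table("rail.actual_movements").select(
    "service_id", "station_code",
    F.to_timestamp("actual_arrival").alias("actual_arrival"),
)

# Clean: drop records with missing keys and remove duplicate replays.
actual_clean = (
    actual.dropna(subset=["service_id", "actual_arrival"])
          .dropDuplicates(["service_id", "station_code"])
)

# Combine: join the feeds and derive a per-stop lateness metric in minutes.
lateness = (
    planned.join(actual_clean, ["service_id", "station_code"], "inner")
           .withColumn(
               "lateness_minutes",
               (F.col("actual_arrival").cast("long")
                - F.col("planned_arrival").cast("long")) / 60,
           )
)

# Deliver: write a unified metrics table for the insights platform and APIs.
lateness.write.mode("overwrite").saveAsTable("rail.service_lateness")
```

In practice a job like this would run on a schedule or on arrival of new data, with the serving tables exposed to the insights platform through the custom APIs described above.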

Outcome

Ultra-efficient data powerhouse on scalable, zero-downtime cloud

24/7 service uptime

7 multi-petabyte datasets unified

>1 BILLION synthesised datapoints delivered

10x increase in data team development speed