Mitra Bio reduces data processing time by up to 96% and storage costs by 70% with Automat-it

Automat-it took a systematic approach to solving our data management problem, and the results were clear and immediate. Previously, it could take up to four hours to process 1,000 individual data sets; now it takes just 15 minutes.

About Mitra Bio

Mitra Bio is an innovative biotechnology company founded in 2020 that focuses on validating skin therapies using in-vivo epigenetic biomarkers that aim to improve skin health and longevity.

The company has developed a pioneering platform that utilises non-invasive skin sampling and next-generation sequencing to analyze DNA methylation patterns.

The Challenge

Mitra Bio processes a large amount of genomics data, much of it stored in CSV files. As a result, the company frequently has to merge large CSV files to feed its Machine Learning (ML) pipelines and conduct effective analysis of individual data sets.

Each file could be up to 50 MB and contain a dataframe of four million rows, and several hundred files might need to be merged into one. Doing this in Jupyter notebooks consumed expensive hardware resources and could take up to four hours to complete.
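To make the merging task concrete, the sketch below streams rows from many CSV files into a single output file using only the Python standard library, so that no file has to be held in memory in full. It is an illustration of the general problem, not Mitra Bio's notebook code or the approach Automat-it selected. The function name `merge_csv_files` and the assumption that every input file shares the same header row are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional


def merge_csv_files(inputs: Iterable[Path], output: Path) -> int:
    """Stream rows from each input CSV into one output CSV.

    Assumes (hypothetically) that all inputs share the same header row.
    Returns the number of data rows written, excluding the header.
    """
    rows_written = 0
    expected_header: Optional[List[str]] = None

    with output.open("w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in inputs:
            with path.open(newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to merge
                if expected_header is None:
                    # First non-empty file defines the header of the output.
                    expected_header = header
                    writer.writerow(header)
                elif header != expected_header:
                    raise ValueError(f"{path} has a different header")
                # Copy data rows one at a time to keep memory use flat.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `merge_csv_files(sorted(Path("data").glob("*.csv")), Path("merged.csv"))` would merge every CSV in a hypothetical `data` directory. Streaming keeps memory use low, but the work still runs on a single machine, which is one reason a managed query service can be attractive at larger scale.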

As Mitra Bio looked to expand its operations and scale, it needed a more efficient way to merge the files.

The Solution

Automat-it took a transparent and systematic approach, evaluating four techniques that leveraged AWS cloud technologies and identifying the most efficient way of merging large CSV files. The ultimate ambition was to help Mitra Bio speed up the merging process at low cost.

Automat-it conducted a Proof of Concept (PoC) in a Sandbox environment and provided code samples on how to solve this challenge.

Hundreds of terabytes of genomics data are now processed by a workload deployed in Amazon EC2. Amongst other services, the solution leveraged Amazon Athena to perform ad-hoc queries and merge files together. S3 Lifecycle policies have been set up to move older files into a cheaper storage class, meaning a significant part of the data is archived in Amazon S3 Glacier Deep Archive.
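As a rough illustration of the kind of lifecycle rule described above, the sketch below builds an S3 Lifecycle configuration that transitions objects to the `DEEP_ARCHIVE` storage class and, optionally, applies it with boto3's `put_bucket_lifecycle_configuration`. The prefix, the 90-day threshold, the rule ID and the bucket name are assumptions made for the example; the source does not state the values Mitra Bio actually uses.

```python
import json


def build_lifecycle_config(prefix: str, days: int) -> dict:
    """Build an S3 Lifecycle configuration with one transition rule.

    Objects under `prefix` move to S3 Glacier Deep Archive once they are
    `days` days old. Both values are hypothetical example inputs.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": "archive-older-files",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    # Print the rule for review; applying it is a separate, explicit step.
    example = build_lifecycle_config(prefix="raw-csv/", days=90)
    print(json.dumps(example, indent=2))
```

Keeping the rule as plain data makes it easy to review before it is applied, for example with `apply_lifecycle_config("example-genomics-bucket", example)`, where the bucket name is again a placeholder.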

The Results

By implementing the solution provided by Automat-it, Mitra Bio achieved faster processing times in a more cost-efficient manner. Specifically:

  • Data processing times reduced from as long as four hours to just 15 minutes
  • Daily data processing costs reduced to a single-figure amount
  • Data storage costs reduced by approximately 70%

As a result, Mitra Bio is in a stronger position to scale and grow as it works with an increasing number of customers.