The Next Generation Multi-omics Informatics and Systems

Updated: May 3

June 22, 2020

White Paper By

John F. Conway Chief Visioneer Officer 20/15 Visioneers


1. Introduction and Problem Statement
2. Observed Challenges
3. Solution Architecture
4. Next Generation Multi-omics Platform Players
4.1 BlueBee
4.2 DNAnexus
4.3 EPAM
4.4 Qiagen
5. Data Standards
6. Request Management
7. Sample Management
8. LIMS Environment
9. Data Environments
9.1 Genestack
9.2 DataBricks
10. Electronic Laboratory Notebook (ELN)
11. Automation
12. Professional Services
13. Conclusion
14. References
15. Integrated omics: tools, advances and future approaches

John F. Conway Bio

John has spent 30 years in R&D on all sides of the fence: industry, software, services, and consulting. His industry roles include Global Head of R&D&C IT at MedImmune/AstraZeneca and then Global Head of Data Sciences and AI IT at AstraZeneca; Global Director of Discovery Informatics and Chair of the Structural Biology IT domain at GlaxoSmithKline; and positions at Merck and Company, where he worked in the Molecular Modeling group, Cheminformatics group, Biological Data group, and Analytical Vaccines Department. John also spent many years in scientific software at Accelrys (now Biovia, a Dassault Systèmes company) as Senior Director of Solutions and Services and Global Head of Presales, and he was Vice President of Solutions and Services at Schrödinger. Most recently, he was Head of R&D Strategy and Services at LabAnswer, which was acquired by Accenture, where he became Global Head of R&D Thought Leadership and Innovation.

1. Introduction and Problem Statement

The world of biology is vast and diverse. Piecing this world together so that new vaccines (prevention), medicines (treatment), and therapies (cures) improve quality of life is one of the most challenging feats the human species has attempted, and it has been met with some successes and many failures.

Given this complexity and the gaps that remain in our understanding, multi-omics approaches try to “stitch” together understanding in complex biological environments by amalgamating the analysis of the individual “omes”: epigenome, genome, metabolome, microbiome, proteome, transcriptome, and others not mentioned here or soon to be defined (Figure 1). This leads to a better understanding of relationships, biomarkers, and pathways. In doing so, multi-omics integrates diverse omics data to find a coherently matching geno-pheno-envirotype relationship or association.[4] It has become core to understanding and defining early science, translational science, and clinical science. It is driving precision and personalized medicine by providing better disease understanding through molecular action and mechanism, diversity or variation, and, soon, more and better omics/disease knowledge bases.

It is a very dynamic space that is changing rapidly, including new “omics” disciplines and science. For this reason, flexibility and the ability to integrate and eventually change are important factors in any solution approach and implementation. The bottom line is that most existing competitive environments are disparate and multi-channeled, both from a solution architecture perspective and from a scientific data management perspective. Most biopharmaceutical companies need a reboot, or next-generation approach, to scientific data and process analysis and management for multi-omics, and this white paper outlines how to achieve that. This in no way diminishes the challenging, creative, and innovative work others have done and continue to do.

Figure 1

2. Observed Challenges

Based on the February/March 2020 survey that we conducted with multi-omics personas (scientists, researchers, informaticians, and IT), primarily from the biopharmaceutical industry, we found that researchers are struggling with their omics data. Quality, storage, data and process FAIR compliance, integration, and analysis topped the list (Figure 2).

Figure 2

FAIR (Findable, Accessible, Interoperable, Reusable) is a litmus test to help you assess the health of your data and process environment. If data and processes are not FAIR, you will struggle to replicate your organization's work, perform data exploration and mining, and integrate your processes and data. Lastly, you won't be able to get secondary use out of your data, which is not an acceptable outcome in 2020 and beyond. We estimate that 70-80% of biopharmaceutical environments are not FAIR compliant, which is in line with the survey results (Figure 3).
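To make the litmus-test idea concrete, here is a minimal Python sketch of a FAIR self-check applied to one dataset's metadata. Everything in it is an illustrative assumption and not part of the survey or of any FAIR standard: the `DatasetRecord` fields, the `OPEN_FORMATS` list, and the pass/fail rules are hypothetical simplifications, and a real assessment would cover far more criteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one omics dataset (illustrative only)."""
    identifier: Optional[str] = None       # persistent ID such as an accession
    indexed_in_catalog: bool = False       # registered in a searchable catalog
    access_protocol: Optional[str] = None  # e.g. "https" or "sftp"
    file_format: Optional[str] = None      # e.g. "vcf" or "mzml"
    vocabulary_terms: List[str] = field(default_factory=list)  # ontology tags
    license: Optional[str] = None          # terms of reuse
    provenance: Optional[str] = None       # how the data were generated


# Formats treated as open for this sketch only; an assumption, not a standard.
OPEN_FORMATS = {"fastq", "bam", "vcf", "mzml", "csv", "tsv"}


def fair_report(record: DatasetRecord) -> Dict[str, bool]:
    """Return a simplified pass/fail flag for each FAIR principle."""
    file_format = (record.file_format or "").lower()
    return {
        "findable": bool(record.identifier) and record.indexed_in_catalog,
        "accessible": bool(record.access_protocol),
        "interoperable": file_format in OPEN_FORMATS
        and bool(record.vocabulary_terms),
        "reusable": bool(record.license) and bool(record.provenance),
    }


def is_fair(record: DatasetRecord) -> bool:
    """A record passes the litmus test only if all four principles pass."""
    return all(fair_report(record).values())


if __name__ == "__main__":
    # Placeholder values; none of these refer to real datasets.
    record = DatasetRecord(
        identifier="example-accession-0001",
        indexed_in_catalog=True,
        access_protocol="https",
        file_format="xlsx",
        license="internal-use",
    )
    for principle, passed in fair_report(record).items():
        print(f"{principle}: {'pass' if passed else 'fail'}")
```

Running a check like this across a data catalog gives a rough count of how many datasets fail each principle, which is one way to see where remediation effort would need to go.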

Figure 3

From a current multi-omics software perspective, cost, access, support, breadth of coverage (capability stack), and quality topped the list of challenges (Figure 4).

Figure 4

There are several reasons for this, including but probably not limited to new areas of science and technology capabilities, heavy influence from academia and open source, improper change management, and the evolution of processes and understanding. This is not uncommon in a rapidly evolving field.

As the value of multi-omics has grown over the years, the need for proper informatics, sample handling, and process optimization has also grown. As multi-omics has permeated the clinical and translational space, regulatory and patient privacy needs have become part of the workflows and carry an additional burden of expertise and know-how. Software companies that can’t rise to the occasion when it comes to certifications, compliance, and ultimately the security and integration of data