Multi-omics Data Management White Paper

Updated: Jul 19

October 27, 2020

The Urgency to Build a FAIR Multi-Omics Data and Process Environment

John F. Conway* and Yolanda Sanchez§

*Chief Visioneer Officer, 20/15 Visioneers

§PhD, Senior Translational Visioneer, 20/15 Visioneers


A Partnered 20/15 Visioneers-Genestack Multi-Omics Scientific Data Management Industry White Paper

Contents

1. Introduction and Problem Statement
2. Observed and Experienced Challenges
3. Breaking the Poor Data and Process Culture Habits
4. Genestack is an Evolving Leader as a Multi-omics FAIR Data Environment Provider
4.1 Genestack’s Success Story: Revamping the Multi-omics Data Landscape of a Top-5 Pharma and Growing Industry Collaboration
5. Revamping Your Multi-Omics Data Landscape
6. Conclusion
About the authors
References

Introduction and Problem Statement

“You will need to adopt FAIR (Findable, Accessible, Interoperable, Reusable) compliant and scalable data solutions or you will drown in your own data…”

The current complexity of multi-omics data in the pharmaceutical, biotech, agriculture, and consumer packaged goods industries is extensive, and many partner-vendors and services organizations are ill-equipped to handle it. If you do not feel a higher-than-normal sense of urgency to build out your next-generation multi-omics data environment, you may want to start asking your organization some questions. This data environment is a foundational need, and fixing one that was built improperly has been likened to changing an airliner’s wings in mid-flight.

Don’t squander the opportunity to leverage your information optimally. Multi-omics data, information, and derived knowledge will start to bridge the huge gaps in our understanding of biology, but only if they are properly curated, managed, and fully exploited. This data environment is at least an order of magnitude more complex than the average lab or scientific data environment. The rigor needed for analysis requires the detailed data and process contextualization that many organizations are still struggling to produce. If you fail to get this right, the return on your large investment may never be fully realized. In an era where our knowledge of target and pathway interdependencies is still limited, we think that multi-omics approaches will:

1. Enable early therapy ideation and clinical study design

2. Improve the chances of clinical success for new medicines, and better outcomes overall for all verticals

3. Be an integral part of the promise of Translational Science and Precision Medicine

These are obvious objectives that everyone will get behind.

Right now, new technologies can measure molecular, cellular, and patient phenotypes by integrating data on gene function, chromatin structure, and epigenetic regulation. Omics approaches cover a myriad of transcripts, proteins, metabolites, microbes, and more. These technologies are currently being applied both to new drug discovery programs and to the cataloging of exponentially increasing patient information, in an attempt to improve (1) the efficiency of the drug discovery process and (2) the understanding of human disease. Proper management of the multi-omics information generated by drug discovery and patient-understanding efforts is critical to translating preclinical results into clinical outcomes and to fully realizing the advantage of precisely tailoring new therapies to the needs of individual patients. Although not specifically covered in this paper, we recognize that a similar situation and proposition apply to other industries where the generation of new ’omics data is driving innovation. For example, from a plant biology perspective, closing the annotation gaps across the biology space will benefit from a FAIR data environment and result in more streamlined research.

The problem, however, is that if an organization repeats the mistakes of the past and mismanages this wealth of scientific data, it will once again fall short of the immense opportunity to bridge the gap between failure and success.

“To this end, we propose that the multi-omics scientific data and process environment must be FAIR and must be valued and guarded as one of an organization’s top assets.” We want to be clear that this will take culture and change management to be successful, an area we touched on in our previous whitepaper1, where we discussed the whole multi-omics ecosystem and described the challenge of “stitching” together the analyses from the many different (existing and upcoming) “omes”. Multi-omics is driving precision and personalized medicine by providing better disease understanding through molecular mechanisms of action, diversity and variation, and, soon, more and better omics/disease knowledge bases. It is a very dynamic space that is changing rapidly, including the introduction of new “omics” disciplines, science, and analytical methods. As this high-level description shows, the discipline comes with an overwhelming amount of data and detail that must be precisely managed to drive innovative understanding and change (Figure 1).

Figure 1. A single experiment can nowadays generate a wealth of biological information that needs to be handled with new methodologies and technologies. A combination of these experiments (if the data is FAIR) yields an even greater amount of information that can be integrated with other studies, whether preclinical or clinical, to multiply the knowledge generated. Genestack Omics Data Manager (ODM) provides the critical data-metadata integration capabilities that enable rapid cross-study multi-omics data exploration.

For all these reasons it is critical that your scientists become increasingly “data-savvy” and that they have access to intuitive, integrated, and user-friendly data management systems that enable FAIR activities in the ‘omics space. Genestack Omics Data Manager (ODM) is one of a few highly functional multi-omics data environments available on the market today and will dovetail into your current multi-omics ecosystem, whether legacy or new.

Observed and Experienced Challenges

“Let us tell you what we have learned over time...”

We are not underestimating the complexity of multi-omics; indeed, we recognize that there are many challenges to ‘omics data integration. We have tried to capture these challenges here, based on our own experiences and lessons learned and on a global survey we conducted in early 2020. There are always major unexpected challenges when implementing new science and the accompanying technology, especially when the environment is evolving very fast and the amount of information is practically exploding. For example, if you have a pre-existing environment that needs upgrading, you will encounter issues integrating your new approaches from a data, services, and applications perspective and, most importantly, from a change management perspective.

Any solution dovetailing into an existing or new ecosystem will have to support the six basic workflows of a multi-omics ecosystem: Request, Sample, Test, Experiment, Analyze, and Report. Tools already exist to manage the pipelines and/or the workflows. A critical part of making the data FAIR is that the processes themselves must be FAIR. As mentioned above, this “new” environment needs to be FAIR compliant and deliver the “R” (Reusable) as Model Quality Data (MQD) to drive secondary use in innovative hypothesis-testing models.
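To make the idea concrete, the six workflows can each be given explicit metadata requirements that a record must satisfy before its data counts as Reusable. The sketch below is a minimal, generic illustration in Python; all field and stage names are our own hypothetical choices, not any vendor's schema.

```python
# Hypothetical FAIR-readiness check: verify that an omics dataset record
# carries the metadata each of the six workflow stages needs.
REQUIRED_FIELDS = {
    "request":    ["request_id", "requester", "study_id"],
    "sample":     ["sample_id", "organism", "tissue", "collection_date"],
    "test":       ["assay_type", "platform"],
    "experiment": ["experiment_id", "protocol_version"],
    "analyze":    ["pipeline_name", "pipeline_version", "reference_genome"],
    "report":     ["report_id", "approved_by"],
}

def missing_metadata(record: dict) -> dict:
    """Return {stage: [missing fields]} for every incomplete stage."""
    gaps = {}
    for stage, fields in REQUIRED_FIELDS.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            gaps[stage] = absent
    return gaps

record = {
    "request_id": "REQ-001", "requester": "jdoe", "study_id": "STU-42",
    "sample_id": "S-17", "organism": "Homo sapiens", "tissue": "liver",
    "collection_date": "2020-02-11",
    "assay_type": "RNA-seq", "platform": "Illumina NovaSeq",
    "experiment_id": "EXP-9", "protocol_version": "2.1",
    "pipeline_name": "rnaseq-core", "pipeline_version": "1.4",
    "reference_genome": "GRCh38",
    # "report" stage fields deliberately absent
}

print(missing_metadata(record))  # flags the incomplete "report" stage
```

In practice such a gate would sit at the boundary between stages, so that data cannot move from Experiment to Analyze, or Analyze to Report, without the contextualization that later reuse depends on.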

If you have Scientific Data Management challenges now, you have to dig deep and get to the root cause. We are willing to bet it has something to do with one or more of the following factors: a poor data and process culture, a lacking or insufficient Scientific Data and Process Strategy, and/or inadequate technology.

There actually is a proper cadence to this evolution, as shown below in Figure 2, but the cart is often put before the horse: if you don’t start with a foundational approach such as a robust Scientific Data and Process Strategy, you are left figuring out how to achieve your goals out of order.

Figure 2. Ideal sequence of events to culminate in a FAIR Data and Process Environment in a perfect world. Unfortunately, we do not live in a perfect world, and therefore pragmatic, fit-for-your-organization approaches become paramount.

It is worthwhile to emphasize again that your data is one of the most important assets for your organization. You make significant investments in time and money and your scientists work very hard to generate and/or gather the significant and relevant data needed to assemble the institutional information and knowledge required for your business to thrive. Figure 3 illustrates the key points of omics data integration and the infrastructure and mindset required to achieve optimal use of your data.

Figure 3. There is no need to learn lessons the hard way if you invest in the right tools and training for your organization.

Based on the February/March 2020 survey that we conducted with multi-omics personas (scientists, researchers, informaticians, and IT), primarily from the biopharmaceutical industry, we found that researchers are struggling with their ‘omics data6,7. Quality, Storage, FAIR compliance of data and processes, Integration, and Analysis topped the list.

“Beyond infrastructure (i.e., addressing basic storage needs), two critical challenges coming out of our survey were Data Integration and FAIR Compliance, highlighting the need for systems that can address agile data management and effective data mining.”
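The data-integration challenge is easy to state concretely: even the simplest cross-assay question requires joining result tables on shared, consistently managed sample identifiers. The generic Python sketch below (our own illustrative data and function, not Genestack's API) shows how unmatched or inconsistent IDs silently drop samples from an integrated view.

```python
# Illustrative cross-assay integration: join two omics result tables on
# sample ID. Samples whose identifiers do not match across assays are lost.
transcriptomics = {
    "S-17": {"TP53_tpm": 42.1, "BRCA1_tpm": 7.8},
    "S-18": {"TP53_tpm": 39.5, "BRCA1_tpm": 9.2},
}
proteomics = {
    "S-17": {"TP53_abundance": 1.31},
    "S-19": {"TP53_abundance": 0.88},  # no matching transcriptomics sample
}

def integrate(*tables: dict) -> dict:
    """Inner-join tables on sample ID; only samples present in all assays survive."""
    shared = set(tables[0])
    for t in tables[1:]:
        shared &= set(t)
    merged = {}
    for sid in sorted(shared):
        row = {}
        for t in tables:
            row.update(t[sid])
        merged[sid] = row
    return merged

print(integrate(transcriptomics, proteomics))
# Only "S-17" survives the join; every unmatched ID is value silently lost.
```

Multiply this toy example by dozens of assays, studies, and identifier conventions and the case for FAIR, centrally managed sample metadata makes itself.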

Our experience has also taught us, and it pains us to say this, that many “data lake” implementations are just not cutting it. They are simply not FAIR compliant, and they cause unneeded grief for the people who need the data most in their daily jobs. Figure 4 makes this point nicely, whether you are a fisherman or a data scientist.