Industry Perspective: High Throughput Experimentation

Updated: Jul 19

October 7, 2020


What You Need to Know to Drive Better Outcomes from Lessons Learned to Proper Scientific Informatics

By John F. Conway (1), Ralph Rivero, PhD (1), and Laurent Baumes, PhD (2)

(1) 20/15 Visioneers LLC

(2) Exxon Mobil Corp.

What is High Throughput Experimentation (HTE)? How is it used? And how can improved data handling and instrument integration significantly reduce discovery cycle times and costs, and improve outcomes?

High Throughput Experimentation is a self-descriptive tactic that involves some level of automation and comes in several flavors. All flavors entail scientific experimentation in which you conceive a design of experiment(s) and execute the experiment(s) in parallel, or in rapid serial fashion, while altering specific experimental variables or parameters, for example temperature, catalyst, pressure, solvent, or reactants. Much of this requires robotics, rigs, semi-automated kit, multichannel pipettors, solid dispensers, liquid handlers, etc. Like other more targeted experimentation, there needs to be adherence to the Scientific Method: Hypothesis, Methods, Results, and Conclusion. Ideally, well-conceived experiments result in every well, or reaction vessel, generating a wealth of information that is captured, quickly interpretable, and ultimately creates the foundation for better decisions and follow-up experimental designs. The importance of an appropriate IT and informatics infrastructure to fully capture all data in a FAIR-compliant (findable, accessible, interoperable, reusable) fashion cannot be overstated. While capturing raw data, results, and conclusions has historically been the focus of information systems, we believe that the electronic scientific method can be significantly enhanced by Ideation capture, as well as the capture of other design and experimental learnings. This improvement in knowledge management will help you optimize your experimentation and avoid unnecessary failures that current scientific understanding and documented findings could have predicted. An additional benefit of this improved knowledge management is the organization of intellectual property. In many cases we recommend enhancing the “DMTA” (design-make-test-analyze) knowledge cycle to include ideation: “IDMTA”. While ideation may be characterized by some as “soft data”, it should be considered for some experiments as contextual and therefore foundational to any knowledge management system intending to preserve and make available the rationale that inspired the experimental work. Scientific intuition and creative ideas make the world go around!
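As a rough illustration of conceiving a design and executing it in parallel, the sketch below enumerates a full-factorial design over a few variables and maps each run onto a well of a 96-well plate. The factor names, levels, and the helper functions `full_factorial` and `assign_wells` are hypothetical and chosen only for illustration; real campaigns often use fractional or optimized designs instead.

```python
from itertools import product
from string import ascii_uppercase


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def assign_wells(runs, rows=8, cols=12):
    """Map runs onto plate wells (A1, A2, ...) in row-major order."""
    if len(runs) > rows * cols:
        raise ValueError("design does not fit on one plate")
    wells = [f"{ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]
    return dict(zip(wells, runs))


# Hypothetical factors and levels, for illustration only.
factors = {
    "temperature_C": [25, 60, 100],
    "catalyst": ["cat_A", "cat_B", "cat_C", "cat_D"],
    "solvent": ["solvent_1", "solvent_2"],
}
design = full_factorial(factors)   # 3 x 4 x 2 = 24 runs
plate = assign_wells(design)       # well id -> experimental conditions
```

Keeping the well-to-conditions mapping as data, rather than only in a robot worklist, is one simple way to make sure every result can later be traced back to the design that produced it.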

At the most basic level, the tools necessary for a sustainable HTE program are HT equipment for fast and parallel synthesis or testing of materials; computational methods to select experiments, e.g. to design libraries; and a FAIR environment with a well-designed data repository and query system to retrieve the data and reuse it in future ideation and enhanced designs. Biology, chemistry, and materials science, to name just a few, are among the scientific domains that have benefited from the volumes of data generated by High Throughput Experimentation. The advantages of implementing HTE include, but are not limited to, automation-driven reproducibility in science, innovation, and of course major efficiency gains. HTE is ideal for the driven scientist who does not settle for less and wants to accomplish more with less, in less time. For R&D organizations to realize the full benefits of HTE, careful investment in strategy, hardware, and software is required. Too often, however, software platforms, perhaps viewed as less sexy than the hardware, are underfunded or neglected, resulting in lost value and opportunity. As you read on, the recurring questions to consider include: “With the massive investment in HTE, did we get the IT right? Were there some gold nuggets left uncaptured that would have provided long-term value and opportunity for greater institutional learning and reduced rework?” The latter question is extremely important, as world-leading corporations are often recognized by their commitment to creating a culture of continuous improvement and learning. Neither is truly possible if an improper strategy and IT systems are put in place.

Biological Sciences- The More the Better-Wrong!

For close to three decades, High Throughput Biology (a.k.a. High Throughput Screening and, more recently, High Content Screening, which deals with biological imaging) has matured to the point where researchers routinely and rapidly “screen” thousands to millions of molecules in a biochemical or cellular context, using a variety of assay or imaging techniques, to determine endpoints like biological activity, genetic markers, apoptosis, toxicity, binding, and other biochemical, cellular, and tissue readouts. It has become a staple of the drug discovery process and has provided many lessons learned and valuable decision-making knowledge for an industry whose foundation relies on data. While information/LIMS systems have provided tremendous value by facilitating the generation of files that instruct liquid handlers and robots to run high throughput screens, the real value is derived from the efficient capture of decision-making data that allows scientists to turn that data into actionable knowledge and insights. A big lesson learned was that screening anything and everything was probably not a good strategy. A better strategy was to carefully manage the master DOE and infuse scientific and mathematical thinking into the approach. General approaches sometimes yield general or less than general results. In addition, the time and effort spent on the analysis and the overall solution architecture of these massive campaigns may have been underwhelming relative to their cost. These technologies, particularly the hardware from automation vendors, have been transformational in the biological sciences, and it is only natural to anticipate that recent advancements in AI and ML, along with continued instrument advances, will lead to improved data-driven decisions. R&D organizations that embrace those advancements, and prepare for them now, will most certainly emerge as industry leaders.

Chemistry

The adoption of high throughput technologies by scientists, though well established in the biology side of the life sciences, has been somewhat slower to take hold in some synthetic chemistry labs. The powerful ability to explore numerous hypotheses by executing multiple experiments in parallel, or in rapid serial fashion, promised to revolutionize discovery sciences. So, what happened in discovery chemistry? The advent of high throughput synthesis (a.k.a. combinatorial and parallel synthesis in discovery) in the early 1990s, often executed manually with multichannel pipettors and custom reaction plates, created the early market for automation and, more importantly, for the requisite informatics to track samples and capture reaction data, including chemical reactivity, observations, results, etc., in a FAIR-compliant fashion. Early on in the discovery space, specifically discovery synthesis, the “ideation” and “design” components of the ideation-design-make-test-analyze (IDMTA) knowledge cycle were somewhat flawed by the belief that all makeable chemical diversity was equally valuable. It did not take long to realize that just because you can prepare a molecule does not mean you should. Now, in 2020, we apply the learning that not all chemical diversity is biologically relevant or developable, and computational scientists in Pharma have leveraged that valuable knowledge into improved predictive models that routinely inform the ideation and design components of the knowledge cycle: a classic example of turning lemons into lemonade. Undeterred by the slower uptake in the discovery chemistry space, research automation industry leaders have continued to make tremendous technical advances in synthesis equipment and automation platforms. Most of the limitations of early automated synthesizers (often just modified liquid handlers) have been cleverly addressed by these innovative companies, which provide chemists with modular workstations that impose few synthetic restrictions and can be customized as needed for even the most bespoke reaction sequences. The integration of these synthetic and post-synthetic modular workstations with existing company analytical and IT systems, critical to getting the maximum return on investment, is curiously often left to individual organizations, despite the vendors’ ability to provide that service. While it is unclear why this full integration, so critical for maximizing ROI, is not given higher priority, leaving it undone is often a decision that can plague organizations for years to come.

High throughput experimentation has been employed in the analytical chemistry space for decades, as the platforms used closely mimic those used to run high throughput screening. Various vendors provide instruments capable of carrying out analyses in a rapid serial manner, providing rapid turnaround of critical decision-making data. As was the case in the discovery chemistry space, automation exploiting plate-based analyses is initially well integrated with existing IT systems. Integration of these analytical systems and their output with new instruments or new IT systems, however, is often done via intermediary databases, which sometimes slows down processes and critical decision making.

Materials Science- Size Does Matter

Characterization chemistry, a term sometimes used in the materials space, has also been around for a couple of decades and has a large overlap with the previously mentioned hardware and instrument manufacturers. Sometimes the difference here is that anything from microvessels up to multi-liter vessels can be part of the “HTE”. Materials science, and in particular catalysis, is characterized by a scarcity of data compared to other domains. This can be viewed as a consequence of hardware limitations and the related difficulty of setting up new experiments. A more realistic reason is the inverse correlation between parallelization or miniaturization and scale-up. Early HTE reaction screening was highly parallelized or miniaturized, but quickly fell out of favor due to the limited relevance or potential use of the data to drive discovery and optimization at a larger scale. At that time, the community was using HTE in combination with the combinatorial approach borrowed from Pharma. However, the combinatorial method very quickly becomes intractable for materials science. The remaining hardware businesses now focus on larger-scale equipment with relatively modest reactor parallelization (4 to 16), but using conditions that allow easier scale-up.

HTE and combinatorial approaches usually assume a large amount of data. In materials science, and especially catalysis, the amount of experimentation is relatively low due to the scale-up constraints mentioned above. In such domains, it is prudent to balance the selection of the experiments against the value of the generated data. A computational technique called Active Learning is concerned with integrating data collection, design of experiments, and data mining in order to exploit data better. The learner is not treated as a classical passive recipient of the data to be processed. The researcher controls data acquisition and must pay attention to the iterative selection of samples in order to extract the greatest benefit from future data treatments. This is crucial when each data point is costly and the domain knowledge is imperfect.
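The iterative selection described above can be sketched as a pool-based active learning loop. The example below is a minimal sketch under stated assumptions, not a recommended implementation: it uses query-by-committee, where a small committee of nearest-neighbour models is fit on bootstrap resamples of the data collected so far, and the next experiment is the candidate on which the committee disagrees most. The function `run_experiment` is a hypothetical stand-in for a costly measurement, and the one-dimensional candidate pool is invented for illustration.

```python
import random
import statistics


def run_experiment(x):
    """Hypothetical stand-in for a costly test (response vs. one setting)."""
    return -(x - 0.6) ** 2 + 1.0


def knn_predict(train, x, k=2):
    """Average the responses of the k labelled points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def committee_disagreement(train, x, rng, n_models=10):
    """Variance of predictions from models fit on bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x))
    return statistics.pvariance(preds)


def active_learning(pool, n_initial=3, n_rounds=5, seed=0):
    """Label a few random points, then repeatedly query the most contested one."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labelled = [(x, run_experiment(x)) for x in pool[:n_initial]]
    remaining = pool[n_initial:]
    for _ in range(n_rounds):
        query = max(remaining, key=lambda x: committee_disagreement(labelled, x, rng))
        remaining.remove(query)
        labelled.append((query, run_experiment(query)))
    return labelled, remaining


pool = [i / 20 for i in range(21)]          # candidate settings in [0, 1]
labelled, remaining = active_learning(pool)
```

Only eight of the 21 candidates are ever measured here, which is the point when each data point is costly: the model decides where the next measurement is most informative.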

The sampling strategy in HT materials embodies an assessment of where it might be good to collect data in an iterative fashion. Evolutionary algorithms, homogeneous covering, and traditional DoE have all been used. However, those techniques still use a rather large number of samples and are not optimized with respect to the downstream techniques used to learn from the consolidated dataset. The alternative is to optimize libraries based on how efficiently a given technique learns the relationship between the materials space and the response space.
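As one concrete reading of "homogeneous covering", the sketch below selects a small library by greedy maximin (farthest-point) sampling, so that each new sample is as far as possible from those already chosen. This is an illustrative stand-in, not the specific method used in the work described here; the function name `greedy_maximin` and the two-dimensional grid of normalized settings are assumptions.

```python
import math


def greedy_maximin(candidates, n_select):
    """Pick n_select points, each maximising its distance to those already chosen."""
    if not 0 < n_select <= len(candidates):
        raise ValueError("n_select must be between 1 and the number of candidates")
    selected = [candidates[0]]  # arbitrary but deterministic starting point
    while len(selected) < n_select:
        remaining = [c for c in candidates if c not in selected]
        # Score each candidate by its distance to the closest selected point.
        best = max(remaining, key=lambda c: min(math.dist(c, s) for s in selected))
        selected.append(best)
    return selected


# Hypothetical 5 x 5 grid of two normalized synthesis variables.
candidates = [(x / 4, y / 4) for x in range(5) for y in range(5)]
library = greedy_maximin(candidates, 4)
```

A covering design like this spreads samples evenly but knows nothing about the model that will later be trained on the data, which is exactly the limitation noted above.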

Moving the bottleneck- Data Analysis

HTE has helped to generate new materials faster (HT synthesis), test them faster (parallel reactors), and characterize them faster. In such a good-looking scenario, the analysis of the data may become the bottleneck. As mentioned above, the catalysis space does not generate a huge amount of data and experiments. However, even these amounts are enough to make it impossible for a human to ingest the data and make an optimal decision with it. In some cases, characterization or reaction data requires adapted algorithms to decipher the information it contains. Scientists still need to understand what is going on between the solid and reactants, and for that, they need to know, at the atomistic level, how the materials and reactants are interacting with the surface. (Excitingly, there have been some success stories reported where, using advanced algorithms and high-performance computing, the solid structures from HTE have been exploited.)


Finally, conventional catalyst development relies on fundamental knowledge and know-how. The main drawback is that it is very time consuming, and intuition or the initial choice becomes critical. To overcome this, attempts to shorten the process using HTE have been reported for 30 years. HTE is more pragmatically oriented and involves screening a collection of samples. It must be stressed that the relevant parameters are usually unknown and that some of them cannot be directly and individually controlled.

Descriptors and Virtual Screening

The concept of virtual screening using molecular descriptors has co-evolved along with HTE. High-throughput experimentation has become an accepted strategy. While the design of libraries has improved in the drug discovery arena, it is still challenging, especially if vast numbers of catalysts are to be explored. QSAR (quantitative structure-activity relationship) is a powerful method used in drug discovery, for which molecules need to be represented by so-called descriptors. However, transferring descriptor concepts to solids is a challenge, because solids cannot easily be represented when no structural formula can be given. Little success has been demonstrated in that area, even though the identification of efficient descriptors would open the path to virtual screening for solids. Note that there have been few demonstrations of such a concept, and most of them relate to a special type of material, zeolites, which are crystalline and can therefore be more easily described at the structural/atomistic level.
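To make the descriptor idea concrete, the sketch below shows descriptor-based virtual screening in its simplest form: each compound is a numeric descriptor vector, a nearest-neighbour model predicts activity from measured compounds, and an untested library is ranked by predicted activity. All descriptor values, activities, and names (`training_set`, `library`, `cand_1`, and so on) are invented for illustration, and the nearest-neighbour model is a simple stand-in for a real QSAR model. The hard part for solids, as noted above, is obtaining meaningful descriptor vectors in the first place.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_activity(descriptors, training_set, k=2):
    """Inverse-distance-weighted mean activity of the k nearest measured compounds."""
    nearest = sorted(training_set, key=lambda item: euclidean(descriptors, item[0]))[:k]
    weights = [1.0 / (euclidean(descriptors, d) + 1e-9) for d, _ in nearest]
    return sum(w * act for w, (_, act) in zip(weights, nearest)) / sum(weights)


def virtual_screen(library, training_set, top_n=2):
    """Rank untested candidates by predicted activity, best first."""
    scored = {name: predict_activity(desc, training_set) for name, desc in library.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Hypothetical (descriptor vector, measured activity) pairs.
training_set = [
    ((0.1, 0.2), 0.9),
    ((0.2, 0.1), 0.8),
    ((0.9, 0.8), 0.1),
    ((0.8, 0.9), 0.2),
]
# Hypothetical untested candidates, described in the same descriptor space.
library = {"cand_1": (0.15, 0.15), "cand_2": (0.85, 0.85), "cand_3": (0.4, 0.1)}
ranking = virtual_screen(library, training_set, top_n=3)
```

Only the top-ranked candidates would then be sent to the physical HTE workflow, which is how virtual screening reduces the number of experiments.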

Making Materials Data FAIR

Up to now, the focus has been on running experiments for a given “program”, and the general outcome is that it is very hard to reuse the data outside the context of each study. Libraries are too often developed in a silo and without the use of a controlled vocabulary or, even better, an ontology, so retrieving data across multiple programs, materials, or reactions remains challenging. Catalyst development is a long process that involves synthesis, formulation, characterization of the materials pre- and post-reaction (or even in situ for Operando systems), and reaction testing at different scales, all of which generate data. Challenge in consolidating and connecting all that da