
2021 Spring Allotrope™ Connect

Allotrope Connect:

Bringing together Allotrope members and the broader scientific community to discuss how we are delivering faster insights from new and existing data within organizations by improving standardization of data and its interpretation across laboratory and manufacturing operations.

 

With over 30 data models and growing, Allotrope is leading the way in data standardization, enabling quicker access to the insights within data.

Registration Dates and Links - April 19, 22, 26

  • Day 1, Monday April 19, 10:00am-12:00pm EDT: Register

  • Day 2, Thursday April 22, 10:00am-12:00pm EDT: Register

  • Day 3, Monday April 26, 10:00am-12:00pm EDT: Register

Approved Abstracts (Presentation schedule is pending)

Increasing the scientific visibility of the Allotrope Foundation

  • Dennis Della Corte (BYU)

 

Brigham Young University partnered with the Allotrope Foundation in 2020 to write a proceedings and history report for Allotrope Connect Fall 2020. The resulting manuscript is currently under review with Drug Discovery Today, and publication is anticipated soon. This presentation will outline the process, show highlights from the paper, and suggest how it can be used for effective internal communication at member and partner companies. In the second part, we will provide an overview of a new manuscript that performs a use-case-based analysis of ADF and three other data standards. We will share the current state of the analysis and invite member and partner companies to contribute to it. Finally, we will provide an outlook on planned research articles that evaluate the Allotrope standard for other use cases.

Complementing ADF With a Scalable Analysis and Analytics Platform, REVEAL™: AnalyticalDevelopment

  • Srikant Sarangi (Paradigm4)

  • Zachary Pitluk (Paradigm4)

The Allotrope Data Format (ADF) addresses critical needs in big data management: uniform storage of diverse types of scientific instrument data and associated metadata, and the ability to develop consistent, integrated data processing and analysis pipelines. The ADF is vendor-, platform-, and method-agnostic, and can store datasets of nearly unlimited size and complexity in a single file, organized as one or more n-dimensional arrays (1).

 

While the ADF has created a remarkable opportunity to efficiently federate data, the scalability challenges associated with accessing and analyzing large datasets stored in files remain. Here, we present a solution that complements the ADF by incorporating its features into an array-native scientific database management platform. REVEAL: AnalyticalDevelopment enables data selection and computation on imported ADF files using Python and R APIs. REVEAL is innately designed to store data as multi-dimensional arrays, and each array element can hold an arbitrary number of attributes of any data type. REVEAL also supports in-database computation through optimized implementations of algorithms at scale (e.g., regressions, PCA). Additionally, it provides provenance and reproducibility by being an ACID database system.

 

In this presentation, we demonstrate ingestion, processing, and visualization of a representative LC-UV dataset in ADF format from Agilent that was used for sensitivity analysis (Agilent v0-0-0-2 Performance Data). Data were extracted and loaded into the database using a Python API, and we created a simple R Shiny GUI for visualizing the data and data completeness. The dataset consists of measurements on an 8-component mixture separated by HPLC, with 17 replicate separations at each condition and the spectrum from 210 nm to 280 nm sampled at 10 nm increments. Using REVEAL, we perform a DoE analysis to find the combination of experimental variables (detector rate, resolution, and run time) that maximizes peak separation. Higher-resolution data, for instance capturing the rate of increase of absorbance adjacent to absorbance maxima, can also easily be utilized by REVEAL: AnalyticalDevelopment. Among the other datasets available through the Allotrope Foundation, we also discuss our work with Bioanalyzer data provided by Bayer as another example of using the ADF with REVEAL.
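The objective of a DoE analysis like the one described above can be sketched in plain Python. The condition names, retention times, and peak widths below are hypothetical, and this is the textbook chromatographic resolution formula applied per condition, not REVEAL's actual API:

```python
# Hypothetical chromatography data: (retention time, peak width) pairs
# per DoE condition. All names and numbers are illustrative.
def resolution(t1, w1, t2, w2):
    """Chromatographic resolution between two adjacent peaks: Rs = 2(t2 - t1)/(w1 + w2)."""
    return 2.0 * (t2 - t1) / (w1 + w2)

conditions = {
    "5 Hz, 2.5 nm slit": [(1.20, 0.10), (1.45, 0.12), (1.90, 0.14)],
    "10 Hz, 1.2 nm slit": [(1.18, 0.08), (1.47, 0.09), (1.92, 0.10)],
}

def worst_case_resolution(peaks):
    """DoE objective: the smallest resolution over all adjacent peak pairs."""
    peaks = sorted(peaks)
    return min(
        resolution(t1, w1, t2, w2)
        for (t1, w1), (t2, w2) in zip(peaks, peaks[1:])
    )

# Pick the condition that maximizes the worst-case peak separation.
best = max(conditions, key=lambda name: worst_case_resolution(conditions[name]))
```

In practice the peak tables would come from the database query layer rather than a literal dict, but the selection criterion is the same.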

 

In conclusion, REVEAL: AnalyticalDevelopment provides a scalable solution for efficiently working with the ADF and mitigating the challenges posed by in-memory processing of large datasets stored in files.

Ontology for Process Chemistry – Giving Context to Instrument Data Structured by the Allotrope Data Model

  • Wes Schafer, Merck Process Research & Development

  • Oliver He, University of Michigan Medical School

  • Anna Dunn, Pharmaceutical Development, GlaxoSmithKline (GSK)

  • Zachary E. X. Dance, Merck Process Research & Development

 

Allotrope is providing a solid foundation for structuring and standardizing analytical instrumentation settings and their data.  But to fully realize machine learning/artificial intelligence and comprehensive data mining, experimental and workflow details need to be linked to the analytical data to give it context.  An important domain in the industry that produces a large amount of analytical data is process chemistry: the branch of chemistry (including pharmaceutical chemistry) that studies the development and optimization of production processes by scaling up laboratory chemical reactions into commercially viable routes.  Key concepts for the domain are product quality, process robustness, economics, environmental sustainability, regulatory compliance, and safety.

Last year we proposed building the ontological terms needed to tag and structure the contextual data associated with process chemistry workflows.  Here we present the initial version of an Ontology for Process Chemistry that covers route scouting, route optimization, and route validation, with a line of sight to producing regulatory documentation. Use cases for building the ontology included reaction and catalyst screening, kinetic and mechanistic studies, fate and purge studies of process impurities, and polymorph/form studies.  Supporting concepts such as additional synthetic roles, equipment, and unit operations are also included.
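As a rough illustration of the kind of terms such an ontology contains, the Turtle sketch below defines a small class hierarchy for process development activities. The namespace, IRIs, and labels are invented for illustration and are not the ontology's actual terms:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/proc-chem#> .   # hypothetical namespace

ex:ProcessDevelopmentActivity a owl:Class ;
    rdfs:label "process development activity" .

ex:RouteScouting a owl:Class ;
    rdfs:subClassOf ex:ProcessDevelopmentActivity ;
    rdfs:label "route scouting" .

ex:CatalystScreening a owl:Class ;
    rdfs:subClassOf ex:ProcessDevelopmentActivity ;
    rdfs:label "catalyst screening" .
```

Instrument data structured by an Allotrope Data Model could then point at instances of these classes to record which workflow step produced it.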

Harvesting the digital dividends - Allotrope at Bayer

  • Michael Rosskopf, Bayer

The life science industry is undergoing a critical digital transformation that requires developing new business and innovation models, and adapting existing ones, at a new pace. Due to the lack of standardized data access patterns over the past decades, analytical and other wet labs face severe challenges in adopting common digitalization trends such as central data aggregation and modularized, centrally orchestrated workflows combined with data modelling.

Allotrope has the potential to be a central pillar in addressing these points. It can help reach sounder scientific decisions in less time and even help prove hypotheses beyond the initial experimental intention. Due to its high level of maturity, it can break up old structures and simplify system architectures.

 

We at Bayer will show how we plan to make use of Allotrope technologies, discuss ideas for future data workflows, and give insights into initiatives from our Pharmaceuticals and Crop Science divisions.

Methods Hub for HPLC opens new gates into the Intelligent Analysis of Analytical Methods and Results

  • Steve Emrick, USP

  • Dana VanDerwall, Bristol Myers Squibb

  • Gerhard Noelken, Pistoia Alliance

 

The collaboration between the Pistoia Alliance and Allotrope Foundation to produce a digital solution for HPLC-UV analytical methods management is nearing completion.  By mid-2021, the Methods Database (Methods Db) with OpenLab and Empower Chromatography Data System (CDS) adapters will be available for early implementation by Pistoia members. The Methods Db will allow you to move machine-readable versions of your valuable HPLC-UV methods between these two CDSs in a vendor-neutral format with consistent context.  By the end of 2021, the solution will expand its capabilities to include analytical result data adapters, the ability to track the historical analytical performance of HPLC-UV methods and chromatographic columns, and the sample preparation process as part of the digital method, and it will propose a data standard for chromatographic columns.  Community support for the early implementation of this solution is critical to efficiently mature the software and the underlying AFO and ADM content that it leverages, so please attend to learn more about participating in the future.

Recognizing that current methods content is distributed across various media, from instrument software to articles in the literature, the project has also added an important additional phase: by mid-2021, a collaboration with CAS, Elsevier, and USP will demonstrate using Natural Language Processing (NLP) to streamline the import of text-based methods into the Methods Db.  Finally, a cloud-based Methods Hub will bring all this together and allow scientists to freely exchange analytical methods on demand with global partners to accelerate the pace of R&D.

 

End-2-End integration of IDBS E-WorkBook, AGU SDC, and ZONTAL for applications solving common challenges in Product Lifecycle Management

  • Scott Weiss, IDBS

  • Klaus Bruch, AGU

  • Wolfgang Colsman, ZONTAL

Allotrope partner companies IDBS and ZONTAL have teamed up with AGU to provide an end-2-end solution for common problems encountered during the pharmaceutical product lifecycle. Among these problems are automation of experimental design, data capture, data cleaning, and data reporting. We leverage the Allotrope Data Format to increase data quality by contextualizing raw data with descriptive information according to Allotrope Data Models. Further, we create a tight connection between the experiment and researchers by removing artificial boundaries and giving full access to reusable data at all times. Through ADF-supported audit trailing, we keep track of the full data lineage and changes throughout the product lifecycle to ensure a reduction of workflow-related errors. Our presentation will highlight the strengths of this three-way collaboration and will provide tangible insights through a live demonstration.

A Platform for Instrument Integration to the Electronic Notebook Enabled by Data Standardization

  • Vinny Antonucci, Merck (Allotrope Foundation Chair and Director of Informatics at Merck)

  • Doug West, Merck

 

Allotrope Foundation has invested several years in developing robust, automated, and accessible technology to drive community adoption in real-life end-to-end (E2E) workflows.  This short presentation will introduce one such E2E application of the technology and data standardization: a scalable platform for instrument integration with the electronic notebook.  The platform includes automated instrument data collection into the cloud, conversion to the Allotrope Data Format, and a scalable integration pattern between instrument data and an electronic laboratory notebook (ELN), which is also configured with a similar modular, standardized data capture design to better receive the data.  Infrastructure to facilitate advanced data analytics of this connected and contextualized data will also be described.  This presents an opportunity to extend the scope of Allotrope data governance to include simple experiment-based content that complements the current instrument-based content, since both are necessary to standardize critical information across any E2E laboratory workflow.  Ongoing status updates on this effort will be provided in future Allotrope Connect meetings.

Developer’s test suite for company-specific extensions of AFO/ADM

  • Jan Nespor, Merck

Allotrope ontologies (AFO) and data models (ADM) are designed to cover a commonly agreed subset of data for a particular instrument type. This approach helps define and promote the Allotrope standards across the industry. However, use cases at specific pharmaceutical companies typically require the ADF to contain additional data, such as internal system identifiers, or to meet stricter constraints, e.g., tighter cardinalities. This can be achieved by defining company-specific extensions of Allotrope artifacts, in particular the ADM.

Ensuring the correctness and usability of these extensions is critical: constraints need to be expressed as valid SHACL shapes, impose only the restrictions actually intended, and remain additive, i.e., not contradict the standard ADM and ontological constraints. These checks need to be part of continuous integration as the ADM or AFO evolves.

As manual checks of the aforementioned are time-consuming and error-prone, we have developed a test suite that executes SHACL QA tools and uses the MSD extensions (and the original ADM and AFO) to validate samples of valid and invalid data. Using example extensions of the AFO and ADM, this work shows the integration of automated QA into the development workflow.
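A minimal sketch of what an additive company-specific extension shape might look like, in Turtle. All namespaces, class names, and property names here are hypothetical placeholders, not actual ADM or MSD artifacts; the point is that the extension only constrains an added property, so it cannot contradict the standard shapes:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.com/msd-ext#> .   # hypothetical extension namespace
@prefix adm: <http://example.com/adm#> .       # placeholder for a standard ADM namespace

# Additive extension: requires exactly one internal system identifier
# on every sample, without touching any standard ADM constraint.
ex:SampleExtensionShape a sh:NodeShape ;
    sh:targetClass adm:Sample ;
    sh:property [
        sh:path ex:internalSystemIdentifier ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```

The test suite described above would then validate sample data against both the standard shapes and this extension, confirming that conforming data still conforms and that data missing the identifier is rejected.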


Constructing ADMs with SHACL (Shape Constraint Language) to Describe, Validate and Achieve Data Interoperability

  • Amnon Ptashek, Allotrope Foundation

 

Interoperability is fundamental to transforming the acquisition, exchange, and management of laboratory data throughout its complete lifecycle. Using semantic technology, the Allotrope Data Model (ADM), together with a controlled vocabulary, provides a mechanism to model simple as well as rich use case scenarios. SHACL, the W3C Shapes Constraint Language for describing and validating RDF graphs, builds on the concept of "shapes" to describe a data model. SHACL subscribes to the Closed World Assumption (CWA), which typically applies when a system has complete control over the information. As a result, SHACL is a powerful tool for data validation.


The presentation reviews the technical aspects of the "shape" structure.  It covers the technology at a high level while also diving into the fundamentals of SHACL core, such as the types of shapes, constraints, targets, and focus nodes. Following the technical review and a sample exercise, several shapes in a tabular ADM will be demonstrated.
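As a minimal illustration of the SHACL core concepts listed above (shapes, constraints, targets, focus nodes), the node shape below validates instances of an invented class; none of these IRIs come from the AFO or ADM:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/demo#> .   # hypothetical namespace

# Node shape: every instance of ex:Measurement (the target class)
# becomes a focus node validated against the constraints below.
ex:MeasurementShape a sh:NodeShape ;
    sh:targetClass ex:Measurement ;
    sh:property [                       # property shape
        sh:path ex:wavelength ;
        sh:datatype xsd:decimal ;       # value-type constraint
        sh:minInclusive 210 ;           # value-range constraints
        sh:maxInclusive 280 ;
        sh:minCount 1 ;                 # cardinality constraint
    ] .
```

Under the closed-world reading, a measurement with a missing, non-numeric, or out-of-range wavelength produces a validation report rather than being silently accepted.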

Sample Prep Orchestration: Agilent SLIMS in conjunction with Sartorius Cubis II balance

  • Thomas Schink (Sartorius) and Heiko Fessenmayr & Maximilian Schiffer (Agilent)

 

Committed to the Allotrope mission of enabling end-to-end lab workflows with the Allotrope Foundation Framework, we present Agilent SLIMS as an orchestrator connecting to the sample preparation unit operation using a Sartorius Cubis II balance. We demonstrate how to unify the collected workflow data in the Allotrope Data Format (ADF).

Unlock Insights from Experimental Data and Empower Data Science

  • Mike Tarselli, Ph.D., MBA, Chief Scientific Officer & Spin Wang, Ph.D., Chief Technology Officer (TetraScience)

The Waters Empower Data Science Link (EDSL), powered by TetraScience, enables scientists and engineers to do more with their valuable experimental data. From enabling chromatography fleet management to deducing new trends across experiments or injections, making lab data FAIR and usable accelerates the move to a digital lab. This presentation will cover how the data science link supports Allotrope's mission (universal formats, interoperability, simplified IT and system integration) and promotes improved data science while using existing resources.

This session will focus on:

 

  • High-throughput, automatic acquisition and conversion of data into a vendor-agnostic, data-science-compatible format in the cloud (which can be easily converted to the Allotrope Data Format (ADF) through third-party tools)

  • How bidirectional communication between chromatography instrumentation, software, and tools enables sample set method creation, eliminates manual transcription, and increases data integrity

  • Easy query of converted data based on sample ID, method name, and other fields using common data science tools

  • TetraScience’s membership in the Allotrope Partner Network and how it informs analytics on chromatography data system (CDS) data

 
