First Release of the Ontologies is Available

Mar 29, 2018

Allotrope Foundation has released the first version of the Allotrope Foundation Ontology (AFO) suite, which is publicly available on this web site.

The included taxonomies and ontologies establish the basis of the controlled vocabulary and relationships needed to describe and execute measurements in the lab on analytical instruments, and to later interpret the data. Drawing on thought leaders across member companies and the Allotrope Partner Network (APN), the Foundation is developing this standard language for describing equipment, processes, materials, and results to cover a broad range of techniques and instruments, driven by real use cases, in an extensible design.

This release includes mature domain taxonomies and ontologies for chromatography. These provide not only standardized terminology, but terminology in a logically valid data structure: they allow one to encode what humans know, or infer, about the relationships between components, processes and results in a machine-readable structure. The AFO is built upon a powerful upper ontology known as the Basic Formal Ontology (BFO), which also serves as the upper ontology for the Open Biological and Biomedical Ontology (OBO) Foundry. The OBO Foundry contains a set of 50+ biomedical ontologies that, through BFO, share a common upper-level schema with the Allotrope Foundation Ontologies.

The underlying taxonomy of the AFO normalizes the words and definitions we use to describe our results, methods, instruments and materials. This goes a long way toward reducing the complexity of integrating and leveraging our data, and even allows us to embed the terms in our workflows through our informatics tools. Ultimately this can start to reduce manual keyboard entry and improve data quality in the lab, providing better-annotated data that is more useful across the pharma landscape.

Further, the AFO can be used to capture the relationships between the concepts and underlying words that humans use to understand and infer scientific principles, processes, etc. Intuitively, scientists understand that pumps, detectors and columns are part of an HPLC system, and that the system is used in executing an HPLC ‘run’ according to a method that describes a process. They also understand that the solvents described by the mobile phase play a role in that process that is different from the role of the sample injected into the system. The relationships between those components and activities are necessary to fully understand the context in which those words are used, and they have to be captured in a way that allows a computer system to understand them unambiguously, independently of the data models and schemas embedded within our wide array of software applications.

The ontologies provide those relationships, which then allow us to structure the data we generate with the appropriate context to search, aggregate, model, and understand it as a whole. They also establish the kind of data structure that will position an enterprise to leverage “big data” capabilities and apply machine learning to address more sophisticated questions and identify new knowledge across the enterprise. In the end, automated systems can exploit data in ways that encompass basic human conceptual understanding. In other words, computers are given the background conceptual structure, via the ontologies, that represents the same concepts and relationships in scientists’ heads that form their common sense understanding of the world.
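
The part-of and role relationships in the HPLC example can be sketched as a handful of subject-predicate-object triples. The Python below is only an illustration of the idea, not the AFO itself: every name in it (including the extra `flow_cell` part and the predicate names) is a made-up placeholder rather than an actual AFO term, and a real deployment would use an RDF store and a reasoner instead of a Python set.

```python
# Illustrative only: these names are placeholders, not actual AFO terms.
TRIPLES = {
    ("pump", "part_of", "hplc_system"),
    ("detector", "part_of", "hplc_system"),
    ("column", "part_of", "hplc_system"),
    ("flow_cell", "part_of", "detector"),  # hypothetical sub-component
    ("hplc_system", "participates_in", "hplc_run"),
    ("method", "describes", "hplc_run"),
    ("mobile_phase", "has_role", "eluent_role"),
    ("sample", "has_role", "analyte_role"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def all_wholes(part, triples=TRIPLES):
    """Follow part_of links transitively and collect every containing whole."""
    found = set()
    frontier = [part]
    while frontier:
        current = frontier.pop()
        for whole in objects(current, "part_of", triples):
            if whole not in found:
                found.add(whole)
                frontier.append(whole)
    return found


def parts_of(whole, triples=TRIPLES):
    """Return, sorted, everything that is directly or indirectly part of `whole`."""
    candidates = {s for s, p, _ in triples if p == "part_of"}
    return sorted(s for s in candidates if whole in all_wholes(s, triples))


if __name__ == "__main__":
    print(parts_of("hplc_system"))
    print(objects("mobile_phase", "has_role"), objects("sample", "has_role"))
```

Even in this toy form, the machine can answer a question nobody wrote down explicitly: the flow cell is part of the HPLC system because it is part of the detector. The mobile phase and the sample are also kept apart by the roles they play rather than by what they are called.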

This release also contains domains and terms from over forty additional instrument ontologies under development, which are less thoroughly described than the Chromatography domain. The terminology for these other techniques provides value in its current state by enabling indexing, tagging, annotation, and controlled vocabularies that can be used in electronic laboratory notebooks (ELN) and laboratory information management systems (LIMS), even before these ontologies are fully mature. Look for a future blog post describing the governance and tooling being put in place to enable a modular, community-enabled approach to expanding coverage of the ontologies by technique or workflow.
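
As a rough sketch of how a controlled vocabulary supports tagging in an ELN or LIMS, the Python below maps free-text labels and synonyms onto stable term identifiers. The identifiers (`EX:0001` and so on), labels and synonyms are hypothetical placeholders, not AFO IRIs, and the whole-word matching is deliberately simplistic.

```python
import re

# Hypothetical vocabulary: identifiers and synonyms are placeholders, not AFO terms.
VOCABULARY = {
    "EX:0001": {
        "label": "high performance liquid chromatography",
        "synonyms": ["HPLC"],
    },
    "EX:0002": {"label": "ultraviolet detector", "synonyms": ["UV detector"]},
    "EX:0003": {"label": "mobile phase", "synonyms": ["eluent"]},
}


def build_index(vocabulary):
    """Map every lower-cased label and synonym to its term identifier."""
    index = {}
    for term_id, entry in vocabulary.items():
        for name in [entry["label"], *entry["synonyms"]]:
            index[name.lower()] = term_id
    return index


def tag(text, vocabulary=VOCABULARY):
    """Return the sorted identifiers of all vocabulary terms mentioned in `text`."""
    lowered = text.lower()
    found = set()
    for name, term_id in build_index(vocabulary).items():
        # Whole-word match so that "hplc" does not fire inside "uhplc".
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.add(term_id)
    return sorted(found)


if __name__ == "__main__":
    print(tag("Ran HPLC with a UV detector; eluent was 60% methanol."))
    print(tag("High performance liquid chromatography, ultraviolet detector, mobile phase."))
```

Two notebook entries that use different wording for the same things end up with the same identifiers, which is what makes them searchable and aggregable together.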

The next major deliverable will be the Allotrope Data Models (ADM), which provide a mechanism to define data structures (schemas, templates) that describe how to use the ontologies for a given purpose in a standardized (i.e. reproducible, predictable, verifiable) way. The ADM was originally slated for release at the end of 2017, but it was determined that additional design and proof-of-concept (PoC) work was needed to define a standard approach to using ontologies and SHACL (the Shapes Constraint Language) to define data models, one that balances robustness with complexity and scalability. A corresponding data model for Chromatography will be released later in 2018.
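
To give a feel for what a data model adds on top of an ontology, the sketch below checks a record against a simple "shape" with minimum and maximum cardinalities, loosely in the spirit of SHACL's `sh:minCount` and `sh:maxCount` constraints. It is not SHACL and not the ADM: the shape, the property names and the record layout are all assumptions made up for illustration.

```python
# Hypothetical shape, loosely modelled on SHACL cardinality constraints.
# Class and property names are placeholders, not ADM or AFO terms.
SHAPE = {
    "target_class": "ChromatographyRun",
    "properties": {
        "has_column": {"min_count": 1, "max_count": 1},
        "has_detector": {"min_count": 1},
        "has_mobile_phase": {"min_count": 1},
    },
}


def validate(record, shape=SHAPE):
    """Return a list of violation messages; an empty list means the record conforms."""
    if record.get("type") != shape["target_class"]:
        return []  # the shape does not target this record
    violations = []
    for prop, rule in shape["properties"].items():
        count = len(record.get(prop, []))
        minimum = rule.get("min_count", 0)
        maximum = rule.get("max_count")
        if count < minimum:
            violations.append(f"{prop}: expected at least {minimum}, found {count}")
        if maximum is not None and count > maximum:
            violations.append(f"{prop}: expected at most {maximum}, found {count}")
    return violations


if __name__ == "__main__":
    good = {
        "type": "ChromatographyRun",
        "has_column": ["column_1"],
        "has_detector": ["uv_detector"],
        "has_mobile_phase": ["solvent_a", "solvent_b"],
    }
    bad = {
        "type": "ChromatographyRun",
        "has_column": ["column_1", "column_2"],
        "has_mobile_phase": ["solvent_a"],
    }
    print(validate(good))
    print(validate(bad))
```

The ontology says what a column or a detector is; a shape like this says which of them a given kind of record must contain, which is what makes the result predictable and verifiable.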

Finally, the Allotrope Ontology suite is made publicly available for use by anyone under the Creative Commons Attribution License. The Foundation wants to ensure that Allotrope Ontologies contribute to the advancement of science through the creation of linked data as they are used and integrated with other semantic sources in the public domain. While this initial release is distributed via the Allotrope web page, we will distribute it via other appropriate mechanisms at a future date.

You can read the full press release from Allotrope Foundation here.

Don’t miss the upcoming Allotrope Connect conference, hosted by Agilent in Waldbronn, Germany, on April 15, 2018. More details and registration here.

(Thank you to Eric Little @Osthus for review and feedback on the semantics)