Allotrope Simple Model (ASM)

Introduction to Allotrope Simple Model (ASM)

The Allotrope Simple Model (ASM) is a JavaScript Object Notation (JSON)-based standard for the structure of instrument data. Because JSON is the de facto standard by which computers on the internet share data, the data in an ASM is easy for any modern software system to read, write, and transmit. An ASM follows an intuitive tabular approach: it is a set of key/value pairs in which the keys describe the types of values represented and are taken from the formal vocabulary defined in the Allotrope Foundation Ontologies (AFO) to ensure consistency and standardization. This tabular approach can easily be extended to collect (aggregate) related pieces of data under a unified context and to provide a simple way to index a collection of similar data patterns (such as a set of peaks from a chromatogram, or a series of timepoint measurements).

From a software engineering perspective, a concrete ASM data instance represents the single business object which is being measured and all of the measurements directly related to this object. In doing so, the ASM provides a way to capture all of the relevant experimental data from a laboratory result in a simple, human-readable way.
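As a minimal sketch of the key/value idea, the Python snippet below builds a small ASM-like document and round-trips it through JSON using only the standard library. The key names and values are illustrative assumptions and are not taken from a published ASM schema.

```python
import json

# Hypothetical ASM-like instance: the keys are human-readable labels
# (in a real ASM they would come from the AFO vocabulary).
asm_instance = {
    "device identifier": "conductivity-meter-01",      # assumed key and value
    "sample identifier": "sample-42",                  # assumed key and value
    "conductivity": {"value": 12.5, "unit": "mS/cm"},  # assumed key and value
}

# Serialize to JSON text and parse it back again.
text = json.dumps(asm_instance, indent=2)
round_tripped = json.loads(text)

print(round_tripped["conductivity"]["value"])
```

The single top-level object holds the measured item and the measurements that belong to it, which is the shape the paragraph above describes.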

To ensure consistency, all ASMs can be machine-validated using standardized JSON schemas. The ASM schemas are maintained and published by Allotrope Foundation and are used to describe the structure of an ASM and to validate it against the set of constraints that make up the ASM standard for a particular laboratory domain.

 
asm_conductivity_00.PNG
 

ASM Standardization and Consistency

ASMs rely upon terms found in the Allotrope Foundation Ontologies (AFO), which provide a single comprehensive resource for controlled vocabulary, term relationships, and contextual metadata for laboratory analytical processes. Built collaboratively by the Allotrope community of scientists, semantic experts, and software engineers from academia and industry, the standard language and vocabulary are constantly being extended, in an extensible design driven by real use cases, to cover an ever broader range of techniques and instruments. The ongoing community effort includes:

 

  • Organizations from the pharmaceutical and chemical domains, and some from the wider life sciences

  • Instrument and solution providers

  • Academia and research institutes

  • Scientists and technologists

  • Ontologists and modelers

  • Data scientists and cloud experts

 

ASM data instances use the unique labels from AFO in their elements. With this vocabulary in place, scientists and engineers can describe tests, measurements, and data within a consistent approach based on the Basic Formal Ontology (BFO), which makes it easy to reuse many well-established ontologies, such as Chemical Entities of Biological Interest (ChEBI), Chemical Methods Ontology (CHMO), and many others.

asm_florescence_luminescence_00.PNG
 
ASM and Aggregate Information

ASM Schemas have a well-defined way to handle aggregate information, where the same kind of data is repeated (as in a set of measurements or multiple observations). This pattern aggregates data facets (the information about each sub-measurement) under an aggregate object. These facets can in turn be aggregates themselves, which builds a tree-like structure.
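The aggregate pattern can be sketched in Python as nested dictionaries and lists. The key names below ("measurement aggregate document", "measurement document") and all values are assumptions chosen for illustration; the helper simply walks the resulting tree.

```python
# Hypothetical aggregate: one aggregate object holds shared context plus a
# list of facets; a facet could itself be an aggregate, which gives a tree.
aggregate = {
    "measurement aggregate document": {       # assumed key name
        "analyst": "A. Example",              # context shared by all facets
        "measurement document": [             # assumed key name
            {"sample identifier": "S1",
             "absorbance": {"value": 0.12, "unit": "mAU"}},
            {"sample identifier": "S2",
             "absorbance": {"value": 0.34, "unit": "mAU"}},
        ],
    }
}


def count_leaves(node):
    """Count the scalar values stored anywhere in a nested dict/list tree."""
    if isinstance(node, dict):
        return sum(count_leaves(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_leaves(value) for value in node)
    return 1


print(count_leaves(aggregate))
```

Because every level is either an object, a list, or a plain value, the same recursive walk works no matter how deeply aggregates are nested.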

asm_ftir_00.PNG
 

ASM and Indexed Information

ASM Schemas also employ a well-defined structure to handle indexed information automatically using standard JSON arrays. This approach indexes the data whether every measurement in the list is of the same type or the list contains different types; in the latter case, each entry simply includes a specific marker.
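A rough Python sketch of the indexed pattern follows. The list position acts as the index, and a marker key distinguishes entries of different types. The marker name "item type" and the entries themselves are hypothetical; the text above does not name the actual marker used by ASM.

```python
# A JSON array maps to a Python list; the position in the list is the index.
mixed_items = [
    {"item type": "peak", "retention time": {"value": 1.8, "unit": "min"}},
    {"item type": "baseline", "offset": {"value": 0.02, "unit": "mAU"}},
    {"item type": "peak", "retention time": {"value": 3.1, "unit": "min"}},
]


def group_by_marker(items, marker="item type"):
    """Map each marker value to the list positions (indexes) where it occurs."""
    groups = {}
    for index, item in enumerate(items):
        groups.setdefault(item[marker], []).append(index)
    return groups


print(group_by_marker(mixed_items))
```

A homogeneous list, such as a plain peak list, needs no marker at all; the marker only matters when entries of different types share one array.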

asm_light_obcuration_00.PNG

ASM and Multi-Dimensional Arrays (Data Cubes)

 

ASM Schemas also have a standard structure to handle multi-dimensional arrays (with one or more controlled variables such as time or wavelength) and multiple measures (where one or more variables are recorded for each point). The Allotrope data framework borrows from the well-understood concept of a data cube as originally defined by the W3C. Within the ASM data result, this data cube structure includes its labeling, its structure (dimensions and measures), and the actual data values, storing all of the critical information in an easily accessible manner.
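The three parts of a data cube (labeling, structure, and data values) can be sketched as follows. The key names ("label", "cube-structure", "dimensions", "measures", "data", "concept") and the column-per-variable layout are assumptions made for this illustration, as are the values.

```python
# Hypothetical data cube: one controlled variable (time) and one measure.
cube = {
    "label": "fluorescence vs. time",
    "cube-structure": {
        "dimensions": [{"concept": "elapsed time", "unit": "s"}],
        "measures": [{"concept": "fluorescence", "unit": "RFU"}],
    },
    "data": {
        # One column per dimension and per measure, one entry per point.
        "dimensions": [[0.0, 1.0, 2.0]],
        "measures": [[10.5, 11.2, 13.9]],
    },
}


def cube_rows(cube):
    """Return one dict per data point, keyed by dimension and measure concepts."""
    structure = cube["cube-structure"]
    names = [c["concept"] for c in structure["dimensions"] + structure["measures"]]
    columns = cube["data"]["dimensions"] + cube["data"]["measures"]
    return [dict(zip(names, point)) for point in zip(*columns)]


rows = cube_rows(cube)
print(rows[0])
```

Adding a second dimension or measure in this sketch only means adding another descriptor to the structure and another column to the data.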

asm_qpcr_00.PNG

ASM and Binary Data

While the ASM is ideal for storing simple numerical data and measurements, many analytical techniques produce binary data, such as images, as results. ASM Schemas handle this data by referencing it outside of the ASM, much as a URL identifies a website. For example, the reference can give a path relative to the base location of the ASM JSON file.

ASM binary data references support common file paths on Windows and UNIX (POSIX), along with URLs to the data:

  • File references

    • "POSIX path": "/path/file"

    • "UNC path": "//host/path/file"

    • "file path": "x:y:z"

  • Web references

  • Semantic identifiers

    • "experimental data identifier": "my experiment"

    • "assay data identifier": "my assay"
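As a sketch of how a consumer might resolve a relative POSIX-style file reference against the location of the ASM JSON file, consider the helper below. The function name, the file names, and the resolution rule (relative paths are joined to the ASM file's directory) are assumptions for illustration, not a documented ASM algorithm.

```python
from pathlib import PurePosixPath


def resolve_reference(asm_file, reference):
    """Resolve a file reference against the directory containing the ASM file.

    Absolute references are returned unchanged; relative ones are joined to
    the directory of the ASM JSON file. No ".." normalization is attempted.
    """
    ref = PurePosixPath(reference)
    if ref.is_absolute():
        return ref
    return PurePosixPath(asm_file).parent / ref


# Hypothetical paths, used only to exercise the helper.
resolved = resolve_reference("/data/run-7/result.json", "images/scan-01.tif")
print(resolved)
```

PurePosixPath is used so that the sketch behaves the same on any operating system; Windows or UNC paths would need their own handling.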

asm_sem_00.PNG

ASM Integrated Reported Results with Processed Data

Because all ASMs work with a common framework provided by Allotrope’s ontologies and have been built specifically to be modular, it is simple to combine multiple ASMs into a single file to represent more complex data. A good example of this can be seen within the liquid chromatography domain, where raw instrument output in the form of a chromatogram can be stored alongside the processed list of peaks derived from that chromatogram.

This modularity also occurs at a lower level in assembling the ASM schemas for each domain, where data patterns provide reusable basic building blocks to represent recurring structures found in the underlying data.  For example, a chromatogram can be represented by a data cube pattern, and the peak list employs the indexed information pattern.

asm_liquid_chromatography_00.PNG

ASM Schema Modularity

Although you don’t need to understand them to read or write data in ASM format, ASMs are built with a number of modern techniques that make them easier to develop and maintain. Building upon the latest Internet Engineering Task Force specifications that power data across the web, each domain covered by an ASM also has a corresponding JSON schema that describes the ASM’s structure and enables validation.

 

The ASM Schema is defined using:

  • Technique specific schema: a JSON Schema that contains the domain specific rules. It references the core declarations rather than having each technique define its own.

  • Core schema: a JSON Schema that contains reusable, domain independent rules that can be applied across multiple ASM types to ensure consistency. For example, the Core schema defines value ranges for all possible values that may be used in tabular models.  In turn, it relies on a number of other reusable schemas that represent basic building blocks of ASMs.

  • Other reusable schemas: reusable, domain independent rules, such as unit schema, cube schema and so on that define basic recurring ASM data patterns.

 

By using a modular approach, ASM schemas can be easily extended in future revisions without changing each technique specific schema.
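The layering can be illustrated with a small Python sketch in which a technique schema points at a shared core definition through JSON Schema's "$ref" keyword. The URI, the definition name "quantity", and the tiny resolver are all hypothetical; real validators resolve references themselves, and this helper ignores JSON Pointer escaping.

```python
CORE_URI = "https://example.org/schemas/core.json"  # hypothetical URI

# Registry of reusable schemas, keyed by URI.
schemas = {
    CORE_URI: {
        "$defs": {
            "quantity": {                 # assumed reusable building block
                "type": "object",
                "required": ["value", "unit"],
            }
        }
    },
}

# Technique schema: reuses the core definition instead of redefining it.
technique_schema = {
    "type": "object",
    "properties": {
        "conductivity": {"$ref": CORE_URI + "#/$defs/quantity"},
        "temperature": {"$ref": CORE_URI + "#/$defs/quantity"},
    },
}


def resolve_ref(ref, registry):
    """Look up a '<uri>#/<path>' reference in a registry of schemas."""
    uri, _, pointer = ref.partition("#")
    node = registry[uri]
    for part in pointer.strip("/").split("/"):
        if part:
            node = node[part]
    return node


target = resolve_ref(technique_schema["properties"]["conductivity"]["$ref"], schemas)
print(target["required"])
```

Both properties resolve to the same core definition, so tightening that definition once would affect every technique schema that references it.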

asm_schema_00.PNG

ASM Schema for Instance Data Validation

As mentioned previously, ASM schemas can also be used to validate the content of a specific ASM instance against a set of constraints. Because the ASM schemas are implemented using JSON Schema, validation tools can be obtained from many different sources and used off the shelf. To validate an ASM, simply provide the ASM data along with the ASM Schema to the validator. The validator will check the ASM and either return a passing result or provide a list of errors to be corrected, making it easy to check the validity of data before using it in further analysis. With JSON Schema, it is also possible to define constraints on specific ASM instance data values, and so to validate those values in addition to the structure of the document.
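The validate-and-report flow can be sketched in Python. To keep the example self-contained, a small hand-rolled checker stands in for a real off-the-shelf JSON Schema validator; it supports only a few keywords ("type", "required", "properties", "minimum"), and the schema and instances are invented for illustration.

```python
def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed.

    Stand-in for a real validator: handles only type (object, number, string),
    required, properties, and minimum.
    """
    types = {"object": dict, "number": (int, float), "string": str}
    expected = schema.get("type")
    if expected in types:
        ok = isinstance(instance, types[expected])
        if expected == "number" and isinstance(instance, bool):
            ok = False  # bool is a subclass of int in Python, but not a JSON number
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "minimum" in schema and is_number and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    return errors


# Hypothetical schema: a structural rule (required keys) and a value rule (minimum).
schema = {
    "type": "object",
    "required": ["sample identifier", "pH"],
    "properties": {
        "sample identifier": {"type": "string"},
        "pH": {"type": "number", "minimum": 0},
    },
}

good = {"sample identifier": "S1", "pH": 7.2}
bad = {"pH": -1}

print(validate(good, schema))
for error in validate(bad, schema):
    print(error)
```

The passing instance yields an empty list, while the failing one yields one structural error and one value error, mirroring the pass-or-list-of-errors behavior described above.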

asm_schema_validation_00.PNG

For more information on JSON Schema and an up-to-date implementation of validators, schema generators, utilities, and other tools, please refer to the homepage of JSON Schema. It also provides a simple step-by-step tutorial.

Advantages of Allotrope Simple Models

The Allotrope Simple Models build upon Allotrope’s in-depth modeling of over 50 different laboratory domains and make it easy for generators and consumers of scientific data to represent their data in a simple yet comprehensive way that leverages Allotrope’s expansive ontologies to ensure a consistent nomenclature across any technique.  In addition:

  • ASMs leverage the collective wisdom of Allotrope’s community of scientists and engineers who work across all levels of industry, ensuring that the standard is truly comprehensive, and not just the work of a single company.

  • ASMs are interchangeable with Allotrope’s catalog of semantically-verified models, and provide a simple way to get started representing scientific information that can grow with your usage, paving the way to more sophisticated usage over time.

  • ASMs utilize JSON, which makes the data both easy to work with and easy to validate. There is no more need for complicated XML parsers, namespaces, and definitions: JSON is the standard for data interchange on today’s internet and is supported out of the box in every major programming language, requiring no additional libraries or optimized approaches. Additionally, with the power of JSON Schema, it’s possible to validate the individual data values themselves and not just the structure of the document.

  • Beyond Allotrope’s list of 50+ data domains which have already been defined, the community is growing and evolving, and our engineers make it easy to develop new areas into Allotrope’s community standards.