The Allotrope Framework
Working with Osthus, our technology partner, we conducted an extensive analysis of existing data standards, architectures, and related taxonomy and ontology initiatives. This included developing and deploying proof-of-concept applications to test concepts and standards against real use cases. The net result of this iterative research, evaluation and design is a holistic approach to the file format, APIs and semantic capabilities, collectively referred to as the Allotrope Framework.
The Allotrope™ Data Format
The Allotrope Data Format (ADF) is an innovative federation of standards that can store datasets of nearly unlimited size and complexity in a single file, organized as one or more n-dimensional arrays that record the measurements of the experiments, including time series and hyper-dimensional data. The ADF can also store the metadata describing the context of each test and measurement event, as well as all the instrument settings. The ADF is portable, allowing easy file transfer and use across operating systems and vendor platforms.
The ADF is built on the well-established HDF5 file specification for storage of data in a binary format, within which acquired data are stored in one or more Data Cube(s). Metadata are stored in the Data Description using an RDF Data Model for process, material, instrument and result details, as well as the metadata describing the Data Cube and Data Package layers, all based on semantic web and linked data concepts (and appropriate W3C specifications). This represents an extensible foundation to leverage big data platforms, semantic technologies and state-of-the-art analytics. In order to seamlessly incorporate companion files to the acquired data, the Data Package component of the ADF provides a virtual file system.
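The three layers described above can be pictured with a small conceptual model. The sketch below is plain Python using only the standard library. It is not the Allotrope API and does not read or write HDF5. All class, field and term names (AdfSketch, DataCube, the ex: identifiers) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical names throughout: a conceptual model, not the Allotrope API.
Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class DataCube:
    """An n-dimensional array of measurements, stored flat in row-major order."""
    shape: Tuple[int, ...]
    values: List[float]

    def __post_init__(self) -> None:
        # The number of stored values must equal the product of the extents.
        expected = 1
        for extent in self.shape:
            expected *= extent
        if expected != len(self.values):
            raise ValueError("values do not match the declared shape")

    def at(self, *index: int) -> float:
        """Return the measurement at the given n-dimensional index."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset = 0
        for i, extent in zip(index, self.shape):
            if not 0 <= i < extent:
                raise IndexError("index out of range")
            offset = offset * extent + i
        return self.values[offset]


@dataclass
class AdfSketch:
    """Stand-in for one ADF file with its three components."""
    data_cubes: Dict[str, DataCube] = field(default_factory=dict)
    data_description: List[Triple] = field(default_factory=list)  # RDF-style metadata
    data_package: Dict[str, bytes] = field(default_factory=dict)  # path -> companion file


def build_example() -> AdfSketch:
    adf = AdfSketch()
    # Acquired data: 2 time points x 3 wavelengths.
    adf.data_cubes["absorbance"] = DataCube(
        shape=(2, 3),
        values=[0.10, 0.20, 0.30, 0.11, 0.21, 0.31],
    )
    # Metadata as subject-predicate-object statements.
    adf.data_description.append(("ex:run1", "ex:usedInstrument", "ex:spectrometer7"))
    adf.data_description.append(("ex:run1", "ex:hasDataCube", "absorbance"))
    # Companion file stored under a virtual path.
    adf.data_package["reports/summary.txt"] = b"companion file"
    return adf
```

In this sketch, build_example().data_cubes["absorbance"].at(1, 2) looks up the third wavelength at the second time point. In a real ADF file the arrays would live in HDF5 datasets and the statements in an RDF graph; the model only shows how the three components sit side by side in one container.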
The Allotrope Data Format (ADF) Explorer is a lightweight software application that uses the Allotrope APIs and class libraries and provides the ability to view and interact with the three components of an ADF file (i.e. data cube, data description, and data package) regardless of the application that generated the ADF file. The ADF Explorer will be available from Allotrope Foundation free of charge.
The ADF Explorer will serve two main functions: 1) as a tool for software developers and engineers to verify that their native software application correctly creates or processes ADF files, helping them deliver “Allotrope Framework enabled software”; and 2) as a viewer for scientists and end users to explore the contents of an ADF file independently of the native software application. These functions will be delivered through a simple and intuitive user interface, created by our APN member EPAM* in direct consultation with representatives of Allotrope Foundation and the Allotrope Partner Network. (*EPAM was selected by Allotrope Foundation to deliver the ADF Explorer via an open Request for Proposals (RFP) process in 2016.)
The availability of the ADF Explorer is an important step on the path from development of the Allotrope Framework to implementation of the Allotrope Data Standards. While Allotrope Foundation does not plan to extend the read-only capabilities of the ADF Explorer, its modular architecture will allow the user community to create extensions and enhanced capabilities, and will allow developers to integrate the ADF Explorer technology into their own product offerings.
To lower the barrier to adoption and to ensure an unambiguous, consistent implementation of the standards, software tools (Java and .NET class libraries) are provided to read, write, and perform the common low-level operations required for working with experimental data files. These libraries can be used to build or adapt software that implements the ADF and Framework capabilities. They include I/O APIs for the data cube, data package and data description (as well as Apache Jena and triple store libraries).
Taxonomies & Ontologies
The Allotrope Taxonomies are modeled by domain; the domains currently include material, equipment, process, result, and properties. This design provides for novel technique combinations and amalgams, allows low-level concepts to be modeled with the appropriate level of consistency, coherence and reusability, and accommodates extensions as required for future use cases. Thus any workflow or technique in the ADF draws the appropriate terms and properties from across the domains, and the terms in a particular domain can be employed and combined to describe any number of different techniques or workflows. Moreover, the Allotrope Taxonomies are maintained separately from the code base, facilitating future growth, extension and evolution as IT technology, business needs, and the underlying science and instrument hardware evolve. The taxonomies have grown to a collective ~5,000 terms and properties, with maturity varying according to the sophistication and complexity of the underlying use cases that have driven the development. The initial versions of the taxonomies were based on harvesting existing taxonomies, standards and knowledge sources, whereas subsequent extensions and updates have been driven primarily by the projects implementing the Framework in member companies, which contribute those updates back to the Allotrope community.
To provide structure to the ADF that can be auto-verified by computer systems, and provide reproducibility and predictability to the model for metadata across platforms, ‘data shapes’ (Shapes Constraint Language or SHACL, a W3C standard expressed as RDF triples) are used as a syntactic framework to provide an unambiguous data model, or structure, for the Data Description and Data Cube layers.
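The idea of a machine-checkable data shape can be illustrated with a toy validator. The sketch below is not a SHACL engine and does not use Allotrope's actual shapes; it only mimics SHACL's target class, sh:minCount and sh:maxCount constraints over a list of triples, using the Python standard library. The ex: terms, the SHAPES table and the validate function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"

# Hypothetical shape table: target class -> {predicate: (min_count, max_count)}.
# A max_count of None means "no upper bound".
Shapes = Dict[str, Dict[str, Tuple[int, Optional[int]]]]

SHAPES: Shapes = {
    "ex:Measurement": {
        "ex:hasUnit": (1, 1),      # exactly one unit
        "ex:hasValue": (1, None),  # at least one value
    }
}


def validate(triples: List[Triple], shapes: Shapes) -> List[str]:
    """Return one message per cardinality violation; an empty list means conformance."""
    violations: List[str] = []
    for target_class, constraints in shapes.items():
        # Focus nodes are all subjects typed with the target class.
        focus_nodes = sorted(
            {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
        )
        for node in focus_nodes:
            for predicate, (min_count, max_count) in constraints.items():
                count = sum(1 for s, p, _ in triples if s == node and p == predicate)
                if count < min_count:
                    violations.append(
                        f"{node}: {predicate} occurs {count} time(s), "
                        f"expected at least {min_count}"
                    )
                elif max_count is not None and count > max_count:
                    violations.append(
                        f"{node}: {predicate} occurs {count} time(s), "
                        f"expected at most {max_count}"
                    )
    return violations


# A conforming description and one that is missing its unit.
GOOD: List[Triple] = [
    ("ex:m1", RDF_TYPE, "ex:Measurement"),
    ("ex:m1", "ex:hasUnit", "ex:milligram"),
    ("ex:m1", "ex:hasValue", "4.2"),
]
BAD: List[Triple] = [
    ("ex:m2", RDF_TYPE, "ex:Measurement"),
    ("ex:m2", "ex:hasValue", "4.2"),
]
```

Calling validate(GOOD, SHAPES) yields an empty list, while validate(BAD, SHAPES) reports the missing ex:hasUnit statement. Real SHACL shapes are themselves RDF graphs and support many more constraint types; the table here only conveys why a declared shape makes metadata verifiable by software.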