The Allotrope Framework
The Allotrope™ Data Format
The Allotrope Data Format (ADF) is a federation of standards that can store datasets of nearly unlimited size and complexity in a single file. Experimental measurements are organized as one or more n-dimensional arrays, which accommodates time series and hyper-dimensional data. The ADF can also store the metadata describing the context of each test and measurement event, as well as all the instrument settings. The ADF is portable, allowing easy file transfer and use across operating systems and vendor platforms.
The ADF is built on the well-established HDF5 file specification for binary data storage, within which acquired data are stored in one or more Data Cubes. Metadata are stored in the Data Description using an RDF Data Model for process, material, instrument and result details, as well as the metadata describing the Data Cube and Data Package layers. All of this is based on semantic web and linked data concepts and the relevant W3C specifications. It provides an extensible foundation for big data platforms, semantic technologies and state-of-the-art analytics. To incorporate companion files alongside the acquired data, the Data Package component of the ADF provides a virtual file system.
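The three layers described above can be pictured with a minimal sketch in plain Python. It is not the ADF API and does not write HDF5. The class name `AdfSketch`, the helper `shape_of`, the `ex:` terms, and the file path are all hypothetical, chosen only to show how a Data Cube (an n-dimensional array), a Data Description (RDF-style triples), and a Data Package (a virtual file system) can sit side by side in one container.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for an ADF file. The real format stores
# these layers in HDF5 and is accessed through the Allotrope class libraries.
@dataclass
class AdfSketch:
    data_cubes: dict = field(default_factory=dict)      # name -> nested lists (n-dimensional array)
    data_description: set = field(default_factory=set)  # (subject, predicate, object) triples
    data_package: dict = field(default_factory=dict)    # virtual path -> companion file bytes

    def add_cube(self, name, values):
        self.data_cubes[name] = values
        # Record the cube in the metadata layer so other triples can refer to it.
        self.data_description.add((f"ex:{name}", "rdf:type", "ex:DataCube"))

    def describe(self, subject, predicate, obj):
        self.data_description.add((subject, predicate, obj))

    def attach(self, path, content):
        self.data_package[path] = content


def shape_of(values):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(values, list):
        dims.append(len(values))
        values = values[0] if values else None
    return tuple(dims)


adf = AdfSketch()
# A 2-D cube: 3 time points x 2 detector channels (illustrative numbers only).
adf.add_cube("absorbance", [[0.10, 0.12], [0.35, 0.40], [0.20, 0.22]])
adf.describe("ex:absorbance", "ex:measuredBy", "ex:instrument1")
adf.attach("/reports/method.txt", b"companion file contents")

print(shape_of(adf.data_cubes["absorbance"]))  # (3, 2)
```

Keeping the measurements, the triples that describe them, and the companion files in one object mirrors the single-file design of the ADF: the metadata can refer to a cube by name without the array data being duplicated in the description.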
To lower the barrier to adoption and to ensure an unambiguous, consistent implementation of the standards, software tools (Java and .NET class libraries) are provided to read, write, and perform the common low-level operations required for working with experimental data files. These libraries can be used to build or adapt software that implements the ADF and Framework capabilities. They include I/O APIs for the Data Cube, Data Package and Data Description (as well as Apache Jena and triple store libraries).
Taxonomies & Ontologies
The Allotrope Taxonomies are organized as modeled domains, which currently include material, equipment, process, result, and properties. This design provides for novel technique combinations and amalgams, allows low-level concepts to be modeled with the appropriate level of consistency, coherence and reusability, and accommodates extensions as required for future use cases. Any workflow or technique in the ADF therefore draws the appropriate terms and properties from across the domains, and the terms in a particular domain can be employed and combined to describe any number of different techniques or workflows. Moreover, the Allotrope Taxonomies are maintained separately from the code base, which facilitates future growth, extension and evolution as information technology, business needs, and the underlying science and instrument hardware evolve. The taxonomies have grown to a collective ~5,000 terms and properties, with maturity varying according to the sophistication and complexity of the underlying use cases that have driven their development. The initial versions of the taxonomies were based on harvesting existing taxonomies, standards and knowledge sources, whereas subsequent extensions and updates have been driven primarily by projects that implement the Framework in member companies and contribute those updates back to the Allotrope community.
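As a rough illustration of how a technique description draws terms from several domains while the vocabulary lives apart from the code, the sketch below keeps a tiny vocabulary as plain data and resolves each term of a description to its domain. The five domain names come from the text above. Every term (`hplc_column`, `peak_area`, and so on), the two example descriptions, and the function `domains_used` are hypothetical placeholders, not actual Allotrope terms or tooling.

```python
# Vocabulary kept as plain data, separate from the logic that uses it.
# Domain names follow the text; the terms themselves are invented.
TAXONOMY = {
    "material": {"sample", "solvent"},
    "equipment": {"hplc_column", "uv_detector"},
    "process": {"injection", "gradient_elution"},
    "result": {"chromatogram", "peak_area"},
    "properties": {"wavelength", "flow_rate"},
}


def domains_used(description):
    """Map each term of a technique description to the domain that defines it.

    Raises KeyError for a term that no domain defines.
    """
    lookup = {term: domain for domain, terms in TAXONOMY.items() for term in terms}
    return {term: lookup[term] for term in description}


# One technique combines terms from several domains...
technique_a = ["sample", "hplc_column", "uv_detector", "gradient_elution", "peak_area", "wavelength"]
# ...and a different technique reuses some of the same terms.
technique_b = ["sample", "solvent", "uv_detector", "wavelength"]

print(sorted(set(domains_used(technique_a).values())))
print(sorted(set(domains_used(technique_b).values())))
```

Because the vocabulary is data rather than code, adding a term or a whole domain changes only the `TAXONOMY` table, which is the property the text attributes to keeping the taxonomies separate from the code base.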
To give the ADF a structure that computer systems can verify automatically, and to make the metadata model reproducible and predictable across platforms, ‘data shapes’ are used as a syntactic framework that provides an unambiguous data model, or structure, for the Data Description and Data Cube layers. The shapes are written in the Shapes Constraint Language (SHACL), a W3C standard expressed as RDF triples.
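To make the idea of a machine-checkable shape concrete, here is a minimal sketch in plain Python. It is not SHACL and does not use a SHACL engine. It only mimics two kinds of constraint that SHACL can express: a minimum number of values for a property, and a required class for each value. The `ex:` terms, the layout of the `SHAPE` dictionary, and the `validate` function are assumptions made for illustration.

```python
# RDF-style triples standing in for a Data Description (hypothetical terms).
TRIPLES = {
    ("ex:run1", "rdf:type", "ex:Measurement"),
    ("ex:run1", "ex:hasInstrument", "ex:instrument1"),
    ("ex:instrument1", "rdf:type", "ex:Instrument"),
    ("ex:run2", "rdf:type", "ex:Measurement"),  # no instrument recorded
}

# A hand-rolled "shape": every ex:Measurement needs at least one
# ex:hasInstrument value, and each value must be typed ex:Instrument.
SHAPE = {
    "target_class": "ex:Measurement",
    "property": "ex:hasInstrument",
    "min_count": 1,
    "value_class": "ex:Instrument",
}


def validate(triples, shape):
    """Return a sorted list of readable violations (empty if the data conform)."""
    violations = []
    targets = {s for s, p, o in triples if p == "rdf:type" and o == shape["target_class"]}
    for node in sorted(targets):
        values = [o for s, p, o in triples if s == node and p == shape["property"]]
        if len(values) < shape["min_count"]:
            violations.append(
                f"{node}: fewer than {shape['min_count']} value(s) for {shape['property']}"
            )
        for value in values:
            if (value, "rdf:type", shape["value_class"]) not in triples:
                violations.append(f"{node}: {value} is not a {shape['value_class']}")
    return sorted(violations)


for problem in validate(TRIPLES, SHAPE):
    print(problem)
```

Here `ex:run1` conforms and `ex:run2` is reported because it has no instrument. In an actual ADF workflow the shape would itself be RDF and the check would be carried out by a SHACL processor rather than by hand-written code like this.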