The FAIR Data Principles apply to metadata, data, and supporting infrastructure (e.g., search engines). Most of the requirements for findability and accessibility can be achieved at the metadata level. Interoperability and reuse require more effort at the data level. The scheme below depicts the FAIRification process adopted by GO FAIR, focusing on data, but also indicating the required work for metadata:
The FAIRification process consists of the following steps:
- Retrieve non-FAIR data: gain access to the data to be FAIRified.
- Analyse the retrieved data: inspect the content of the data. Which concepts are represented? What is the structure of the data? What are the relations between the data elements? Different data distributions require different methods for identification and analysis. For instance, if the dataset is in a relational database, the relational schema provides information about the dataset structure, the types involved (the field names), cardinality, etc.
- Define the semantic model: define a ‘semantic model’ for the dataset, which describes the meaning of entities and relations in the dataset accurately, unambiguously, and in a computer-actionable way. Depending on the dataset, defining a proper semantic model may require a significant effort, even for experienced data modellers. A good semantic model should represent a consensus view in a particular domain, for a particular purpose. Therefore, it is good practice to search for existing models. Semantic models often contain multiple terms from existing ontologies and vocabularies. A vocabulary is a computer-readable file that captures terms, their URIs, and descriptions. An ontology can be roughly described as a vocabulary with hierarchies, meaningful relations among concepts, and their constraints. These conceptual models allow us to classify our data models and data items using the provided terms, concepts, and conceptual structures.
- Make data linkable: The non-FAIR data can be transformed into linkable data by applying the semantic model defined in step 3 (Define the semantic model). Currently, this is done using Semantic Web and Linked Data technologies. This step promotes interoperability and reuse, facilitating the integration of the data with other types of data and systems. However, the user should evaluate the feasibility of this step for the given data. It is sensible for many types of data (e.g., structured data), but it may not be relevant for others (e.g., the raw pixels of images and videos, or the samples of audio recordings). The annotations about images, audio, and video (e.g., data about identified regions of an image, or about parts of an audio file) can very well be made linkable.
- Assign license: Although license information is part of the metadata, we have incorporated the license assignment as a separate step in the FAIRification process to highlight its importance. The absence of an explicit license may prevent others from reusing the data, even if the data is intended to be open access.
- Define metadata for the dataset: As explained by many of the FAIR principles, proper and rich metadata support all aspects of FAIR. (Read the GO FAIR recommendation for metadata.)
- Deploy FAIR data resource: deploy or publish the FAIRified data, together with relevant metadata and a license, so that the metadata can be indexed by search engines and the data can be accessed, even if authentication and authorisation are required.
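To make the steps above more concrete, the following is a minimal sketch of how the analysis, semantic model, linkable data, license, and metadata steps could fit together for a small relational dataset. It is an illustration under stated assumptions, not the GO FAIR tooling: the `person` table, the `https://example.org/` namespace, the helper names (`analyse`, `to_triples`, `dataset_metadata`), and the choice of FOAF, DCAT, and Dublin Core terms are all hypothetical choices made for this example. The triples are written as N-Triples using only the Python standard library; a real project would more likely use a dedicated RDF library and a community-agreed semantic model.

```python
import sqlite3

# Hypothetical namespace for the example; replace with a namespace you control.
BASE = "https://example.org/"

# Semantic model (assumed for this sketch): each column maps to a term
# from an existing vocabulary, here FOAF.
SEMANTIC_MODEL = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "age": "http://xmlns.com/foaf/0.1/age",
}


def analyse(conn, table):
    """Analyse step: read column names and declared types from the schema."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def literal(value):
    """Serialise a Python int or str as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<http://www.w3.org/2001/XMLSchema#integer>'
    escaped = (
        str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def to_triples(conn, table, key, model):
    """Linkable-data step: apply the semantic model to every row."""
    columns = [name for name, _ in analyse(conn, table)]
    triples = []
    for row in conn.execute(f"SELECT * FROM {table}"):
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, predicate in model.items():
            triples.append(f"{subject} <{predicate}> {literal(record[column])} .")
    return triples


def dataset_metadata(dataset_uri, title, license_uri):
    """License and metadata steps: describe the dataset with DCAT and DC Terms."""
    dct = "http://purl.org/dc/terms/"
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    return [
        f"<{dataset_uri}> <{rdf_type}> <http://www.w3.org/ns/dcat#Dataset> .",
        f"<{dataset_uri}> <{dct}title> {literal(title)} .",
        f"<{dataset_uri}> <{dct}license> <{license_uri}> .",
    ]


# Stand-in for the "retrieve non-FAIR data" step: a tiny in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 36)")

schema = analyse(conn, "person")
triples = to_triples(conn, "person", "id", SEMANTIC_MODEL)
metadata = dataset_metadata(
    BASE + "dataset/people",
    "People (demo)",
    "https://creativecommons.org/licenses/by/4.0/",
)

for line in triples + metadata:
    print(line)
```

The semantic model is kept as a plain mapping from column names to vocabulary URIs so that the modelling decisions stay visible and separate from the conversion logic. The deployment step is deliberately left out, since it depends on the hosting infrastructure.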