The scalable and transparent ‘routing’ of data, tools, and compute (to run the tools on) is a central feature of the envisioned Internet of FAIR Data & Services (IFDS).
GO FAIR has identified the need to ensure a solid and sustainable infrastructure for the ‘core’ of the propeller image above, which is briefly explained here:
- The original ‘hourglass model’ that underpins the successful and scalable growth of the Internet as we know it and its main application, the World Wide Web, will need a rough equivalent in the IFDS. Nothing will be fully ‘identical’, as the IFDS does not start in a greenfield and will build on current internet infrastructure wherever possible. However, there are clear similarities. In the ‘classical’ hourglass model, TCP/IP is usually placed in the narrow centre of the hourglass. All blocks below it can be broadly classified as ‘underlying network infrastructure’, and all levels above the narrow waist lead to a wide variety of applications, with both sides having maximum freedom to make implementation choices. This is a basic ‘mantra’ in GO FAIR as well: ‘Only set minimal necessary protocols and standards and support a wide variety of implementation choices for data, tools, and compute elements to participate in the growing IFDS’.
- If we now try to translate the hourglass model to the IFDS, we deal with three fundamentally different basic elements that must be ‘routed’ to find each other at the right time and place and to be used and reused as effectively as possible. We have grouped these into three broad categories: DATA, TOOLS, and COMPUTE. There are grey areas, as software code (mainly covered under executable tools) can also be regarded as ‘data’, and middleware could be classified as part of the ‘compute’ infrastructure. We also realise that these boundaries may blur even further as data-driven and computationally assisted science develops exponentially in the decades to come. However, for all practical purposes, we follow earlier definitions, and we basically want to treat all ‘digital objects’ in the IFDS according to the same principles, including the need for sufficiently rich machine-actionable metadata as elaborated in the FAIR principles and in several follow-up publications. Tools are defined mostly as software-type services that ‘act on data’, such as virtual machines packaged to travel the IFDS for distributed data analytics, but also, for instance, data repositories. So, we deal with three classes of ‘top levels’ in three hourglasses, each with its own under-the-hood network and routing infrastructure:
- Intuitively, the IFDS would function most fluently if the infrastructure (where possible the existing internet infrastructure) operated on a strong, common, and globally interoperable networking and routing engine that could efficiently route data to tools, tools to data, and both to the needed compute, as these three elements will increasingly no longer reside in large centralised super storage and HPC facilities but will be distributed ‘all over the internet’. Many performance and security issues will have to be addressed, and they will be addressed in other GO FAIR INs and elsewhere, but these are not the focus of this MEMO. We visualise the basics of the IFDS here as a ‘propeller’ with the engine ‘under the hood’:
- A first very important aspect of our further reasoning is that we adopt the basics of the Digital Object Model and consider each digital object (from a single concept reference, such as an identifier, to a single machine-readable assertion, to an entire database or software package) according to the following simplified scheme: The first obvious prerequisite for the IFDS is that each digital object is assigned (and findable through) a Unique, Persistent and Resolvable Identifier (UPRI). The specific addition of the term ‘resolvable’ here indicates the need to accept that multiple UPRIs may point to the same concept. There are several initiatives underway to repair the current undesirable situation where most data and services do not even fulfil this first criterion to participate in Open Science and the IFDS in general. We will rely on these initiatives when they become community-adopted, follow them, and contribute to their development wherever appropriate, but this aspect, too, is not the core of the proposed service layer discussed in this memo. We assume that digital objects as ‘containers’ have a UPRI.
- However, in order to intelligently route data to tools, tools to data, and both to compute (and in the future likely even mobile compute?), we need more than just UPRIs for the containers. We need to describe the containers with sufficiently rich metadata in machine-readable format for both machines and humans (with lingual interface outputs and search capabilities for the latter) to Find, Access, Interoperate, and thus effectively Reuse these components of the IFDS in a myriad of combinations in near real-time. As said, for each and every concept referred to in the metadata as well as, where possible, in the data themselves, we need to enforce the use of UPRIs, but the choice of various UPRIs (even within the same domain) for the same concept will persist at least for the foreseeable future and belongs to the first degree of ‘freedom to operate’ away from the centre of the hourglass. However, to enable this critical degree of freedom in the IFDS, which will be even more important when we really want to support interdisciplinary research and innovation, we need very high-quality, robust, and sustainable mapping services between UPRIs and human-readable terms that denote the same concept in digital objects. These ‘mapping tables’ are critical infrastructure in the ‘centre of the propeller’. A major problem is that currently, such services (for example BioPortal in the life sciences, OLS, and FAIRsharing) are built and maintained largely by academic efforts and funded through volatile, few-year cycles of public funding, frequently even in fierce competition with ‘rocket science’. A key feature of GO FAIR as a movement is that we mobilise existing networks of excellence (gems) to converge and ‘speak with one voice’ to the funders (both public and private) of research and innovation about previously controversial issues such as the lack of sustainable investment in the ‘rocket launcher’ (the underlying infrastructure for ‘rocket-open-science’).
We recognise that the case for each individual service component (such as BioPortal, a single ontology, FAIRsharing, ISA tools, etc.) is difficult to make and is impeded even further by each of the academic groups running for the last possible funding source to keep the service up and running for another few months or years. It should be obvious that this is severe malpractice and may all by itself prevent the IFDS from developing rapidly unless we find a collective and sustainable solution. The first step we want to cover in this text is to place these individual ‘core resources’ in a much more comprehensive and internally consistent context. Mapping tables, protocols, and other emerging community standards should not only find a ‘home’ (such as FAIRsharing), but should also be collectively endorsed and used in practice by much more coherent communities. A very important aspect of GO FAIR will be to support the process of coordination within and across implementation, training, and certification networks to minimise the reinvention of redundant infrastructure components, including such things as thesauri, domain-specific or generic ontologies, protocols, and other standards-related elements of the IFDS. But, as said, we have learned that, classically, domains operate in silos and that even within domains multiple standards, vocabularies, languages, and approaches will continue to emerge. This is not only a nuisance and a sign of a lack of coordination and discipline; it is also an intrinsic part of the creative process that should be supported in order to further our knowledge and drive innovation. This means that ‘mapping tables’, ‘libraries to choose from’, ‘community standards registries’, etc. will continue to be crucial elements of the IFDS support infrastructure.
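To make the idea of a ‘mapping table’ more concrete, the following is a minimal sketch in Python of how several UPRIs and human-readable terms could be resolved to the same concept. It is an illustration under stated assumptions, not a description of how BioPortal, OLS, FAIRsharing, or any other existing service works: the class names `Concept` and `MappingTable`, the choice of one ‘canonical’ UPRI per concept, and the `example.org` identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Concept:
    """One concept, known under several UPRIs and human-readable labels."""
    canonical: str                                   # UPRI chosen as the reference form (assumption)
    upris: Set[str] = field(default_factory=set)     # all equivalent UPRIs, including the canonical one
    labels: Set[str] = field(default_factory=set)    # human-readable terms for the same concept


class MappingTable:
    """Hypothetical in-memory mapping between UPRIs, labels, and concepts."""

    def __init__(self) -> None:
        self._by_upri: Dict[str, Concept] = {}
        self._by_label: Dict[str, Concept] = {}

    def add(self, canonical: str,
            upris: Iterable[str] = (),
            labels: Iterable[str] = ()) -> Concept:
        """Register a concept under its canonical UPRI, equivalent UPRIs, and labels."""
        concept = Concept(canonical, {canonical, *upris}, set(labels))
        for upri in concept.upris:
            self._by_upri[upri] = concept
        for label in concept.labels:
            # Labels are matched case-insensitively for human input.
            self._by_label[label.casefold()] = concept
        return concept

    def resolve(self, key: str) -> Optional[str]:
        """Return the canonical UPRI for a UPRI or a label, or None if unknown."""
        concept = self._by_upri.get(key)
        if concept is None:
            concept = self._by_label.get(key.casefold())
        return concept.canonical if concept is not None else None

    def same_concept(self, a: str, b: str) -> bool:
        """True if both keys (UPRIs or labels) are known and denote the same concept."""
        resolved = self.resolve(a)
        return resolved is not None and resolved == self.resolve(b)


# Illustrative identifiers only; these are not real vocabulary entries.
table = MappingTable()
table.add(
    "https://example.org/vocab-a/C001",
    upris=["https://example.org/vocab-b/0042"],
    labels=["myocardial infarction", "heart attack"],
)
print(table.resolve("Heart Attack"))
print(table.same_concept("https://example.org/vocab-b/0042", "myocardial infarction"))
```

In this sketch, each community keeps its own UPRIs and terms (the ‘freedom to operate’ away from the centre of the hourglass), while the table in the centre only records which of them denote the same concept. A sustained, community-endorsed service would additionally need provenance, versioning, and governance for each mapping, which are outside the scope of this example.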