GO FAIR Virus Outbreak Data Network (VODAN) as one of the COVID-19 related activities of the Data Together organisations
The COVID-19 pandemic presents a major test for our science system and our research and data infrastructures. The Data Together organisations (CODATA, GO FAIR, RDA and WDS) jointly contend that data and science platforms and infrastructures must be based on the FAIR Principles to meet both the immediate needs and the long-term objectives of global science. This will maximise the ability to combine, visualise, and use data from many sources; facilitate fine-grained data access and protection; and allow for decentralised, machine-assisted analysis.
Three COVID-19 activities are highlighted in the Data Together statement:
- GO FAIR VODAN Implementation Network
- RDA COVID-19 Working Group
- CODATA global programme ‘Making Data Work for Cross-Domain Grand Challenges’
The following boundary considerations for an open science and FAIR data platform focused on COVID-19 were outlined:
- Much COVID-19-related data has political or institutional sensitivities, and some has personal components. Access to sensitive and personal data must be restricted in various ways, and the institutions that control such datasets cannot release them openly without additional processing or access controls. Such data can generally only be accessed partially and in controlled circumstances, and for this to be achieved, the data must be FAIR. We need, therefore, to make a rigorous effort in favour of FAIR, while continuing to emphasise the policy: as open as possible, as closed as necessary.
- In such circumstances (and many others) a centralised data-warehousing approach is neither possible nor fit for purpose. As the data are distributed, rich machine-actionable FAIR metadata is necessary to enable controlled, computational access for analysis or visualisation.
- Very large quantities of data are being generated in relation to the pandemic. There are also significant challenges in ensuring data quality, as well as risks of false and misleading information being disseminated as ‘fact’. This poses challenges for science and society: there should be a mechanism to mitigate such dangers.
- Consequently, we need to facilitate and further enhance methods for ‘distributed deep learning’, and make sure the algorithms and services on which such approaches are based can work effectively with FAIR metadata and, where possible, with FAIR data.
- The current circumstances call for the urgent development of a ‘community annotation system’, built into the interfaces of many services and components, that enables objective assessment of new claims and information. Decision makers, funders and advisory groups can then rely on a vast community of trusted experts to review new and existing claims relevant for COVID-19 interventions.
- It is essential that no single organisation, public or private, be able to monopolise the applications or the FAIR ecosystem. Therefore, quality control and a minimal certification scheme for all components must be in place as part of the effort.
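
The considerations above on distributed data and machine-actionable metadata can be made concrete with a small sketch. The Python example below is only an illustration under stated assumptions, not a description of how VODAN or any Data Together service is implemented: the `Site` class, the `can_answer` and `run_count` methods, the `distributed_count` function, and all site names and records are hypothetical and made up for the example. It shows one possible shape of the idea: a question is routed using metadata alone, the analysis runs where the data resides, and only an aggregate leaves each site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical data holder; its records are only read by its own methods."""
    name: str
    metadata: dict  # machine-readable description of what the site holds
    _records: list = field(default_factory=list, repr=False)

    def can_answer(self, variable: str) -> bool:
        # Decide from the metadata alone whether the question applies here.
        return variable in self.metadata.get("variables", [])

    def run_count(self, variable: str, value) -> int:
        # The analysis runs locally; only an aggregate count is returned.
        return sum(1 for r in self._records if r.get(variable) == value)


def distributed_count(sites, variable, value):
    """Visit each site whose metadata declares the variable and sum the counts."""
    per_site = {
        s.name: s.run_count(variable, value)
        for s in sites
        if s.can_answer(variable)
    }
    return per_site, sum(per_site.values())


# Illustrative, made-up sites and records (assumptions for this sketch only).
sites = [
    Site("hospital_a", {"variables": ["outcome", "age_group"]},
         [{"outcome": "recovered", "age_group": "60+"},
          {"outcome": "deceased", "age_group": "60+"}]),
    Site("hospital_b", {"variables": ["outcome"]},
         [{"outcome": "recovered"}, {"outcome": "recovered"}]),
    Site("registry_c", {"variables": ["vaccination_status"]},
         [{"vaccination_status": "unknown"}]),
]

per_site, total = distributed_count(sites, "outcome", "recovered")
print(per_site, total)
```

In this toy setup the third site is never queried, because its metadata does not declare the requested variable. A real deployment would additionally need authentication, access control and disclosure safeguards at each site, in line with the principle of being as open as possible, as closed as necessary.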