What does this mean?
Principle F1 is arguably the most important because it will be hard to achieve other aspects of FAIR without globally unique and persistent identifiers. Hence, compliance with F1 will already take you a long way towards publishing FAIR data (see 10 ways identifiers can help with data integration).
Globally unique and persistent identifiers remove ambiguity in the meaning of your published data by assigning a unique identifier to every element of metadata and every concept/measurement in your dataset. In this context, an identifier is an internet link (e.g., a URL that resolves to a web page defining a concept, such as a particular human protein). Many data repositories automatically generate globally unique and persistent identifiers for deposited datasets. Identifiers help other people understand exactly what you mean, and they allow computers to interpret your data in a meaningful way (i.e., computers that are searching for your data or trying to automatically integrate them). Identifiers are essential to the human-machine interoperation that is key to the vision of Open Science. In addition, identifiers help others to properly cite your work when reusing your data.
Of course, identifiers are one thing, but their meaning is another (see principles I1-I3). F1 stipulates two conditions for your identifier:
- It must be globally unique (i.e., no one else can reuse or reassign the same identifier without referring to your data). You can obtain globally unique identifiers from a registry service that uses algorithms guaranteeing the uniqueness of newly minted identifiers.
- It must be persistent. It takes time and money to keep web links active, so links tend to become invalid over time. Registry services guarantee resolvability of that link into the future, at least to some degree.
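The uniqueness condition can be illustrated with a minimal sketch that uses Python's standard `uuid` module to mint a random (version 4) UUID. The helper name `mint_identifier` is hypothetical and introduced only for this illustration. Note that this addresses only the first condition: a UUID generated locally is practically unique, but it is neither resolvable nor persistent until a registry service or repository takes responsibility for it.

```python
import uuid


def mint_identifier() -> str:
    """Return a freshly generated random (version 4) UUID as a URN string.

    Hypothetical helper for illustration: this gives practical global
    uniqueness only. Persistence and resolvability still depend on a
    registry service or repository.
    """
    return uuid.uuid4().urn


first = mint_identifier()
second = mint_identifier()

print(first)            # a string of the form urn:uuid:xxxxxxxx-xxxx-...
print(first != second)  # two independently minted identifiers differ
```

In practice you would usually let a data repository or registry service mint the identifier for you, since it then also commits to keeping the identifier resolvable.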
Examples of globally unique and persistent identifiers
- One particular person on planet Earth has this globally unique and persistent identifier: https://orcid.org/0000-0001-8888-635X
- Here is an identifier that uniquely links to the results of a study estimating the FAIRness of different data repositories: doi:10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f
- The human polycystin-1 protein has a globally unique and persistent identifier given by the UniProt database: http://www.uniprot.org/uniprot/P98161
- Polycystic kidney disease Type 1 has a globally unique and persistent identifier given by the OMIM database: http://omim.org/entry/173900
- The number 163483 refers to the undergraduate student ID of Mark Wilkinson, the NCBI gi number for a bovine protease, and a part number for a Singer sewing machine. Hence, this is a poor example of F1!
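The contrast between the good and poor examples above can be sketched in code. The snippet below expands a prefixed identifier into a full URL and rejects a bare number such as 163483, which cannot be interpreted without knowing which namespace it belongs to. The `PREFIXES` map and the `expand` function are assumptions made for this illustration, not an official registry or API; the URL patterns are taken from the examples above, plus the standard https://doi.org/ resolver for DOIs.

```python
# Illustrative prefix map (an assumption for this sketch, not an official registry).
PREFIXES = {
    "orcid": "https://orcid.org/",
    "doi": "https://doi.org/",
    "uniprot": "http://www.uniprot.org/uniprot/",
    "omim": "http://omim.org/entry/",
}


def expand(identifier: str) -> str:
    """Expand a 'prefix:local_id' identifier into a full URL.

    Raises ValueError when the identifier has no known prefix, because a
    bare local ID is ambiguous outside its namespace.
    """
    # Split on the first colon only, so local IDs may contain colons themselves.
    prefix, separator, local_id = identifier.partition(":")
    if not separator or prefix.lower() not in PREFIXES:
        raise ValueError(f"ambiguous or unknown identifier: {identifier!r}")
    return PREFIXES[prefix.lower()] + local_id


print(expand("uniprot:P98161"))
print(expand("doi:10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f"))

try:
    expand("163483")
except ValueError as error:
    print(error)
```

The design point is that the namespace travels with the identifier: the same local number can exist in many databases, but the combination of namespace and local ID points to exactly one thing.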
Example services that supply globally unique and persistent identifiers
- Identifiers.org provides resolvable identifiers in the form of URIs and CURIEs: http://identifiers.org
- Universally unique identifier: https://en.wikipedia.org/wiki/Universally_unique_identifier
- Persistent URLs: http://www.purlz.org
- Digital Object Identifier: http://www.doi.org
- Archival Resource Key: https://escholarship.org/uc/item/9p9863nc
- Research Resource Identifiers: https://scicrunch.org/resources
- Identifiers for funding organisations (see F3 & R1): https://www.crossref.org/services/funder-registry/
- Identifiers for the world’s research organisations (see F3 & R1): https://www.grid.ac