TY - JOUR
T1 - Identifiers for the 21st century
T2 - How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data.
AU - McMurry, Julie A
AU - Juty, Nick
AU - Blomberg, Niklas
AU - Burdett, Tony
AU - Conlin, Tom
AU - Conte, Nathalie
AU - Courtot, Mélanie
AU - Deck, John
AU - Dumontier, Michel
AU - Fellows, Donal
AU - Gonzalez-Beltran, Alejandra
AU - Gormanns, Philipp
AU - Grethe, Jeffrey
AU - Hastings, Janna
AU - Hériché, Jean-Karim
AU - Hermjakob, Henning
AU - Ison, Jon C
AU - Jimenez, Rafael C.
AU - Jupp, Simon
AU - Kunze, John
AU - Laibe, Camille
AU - Le Novère, Nicolas
AU - Malone, James
AU - Martin, Maria Jesus
AU - McEntyre, Johanna R.
AU - Morris, Chris
AU - Muilu, Juha
AU - Müller, Wolfgang
AU - Rocca-Serra, Philippe
AU - Sansone, Susanna Assunta
AU - Sariyar, Murat
AU - Snoep, Jacky L
AU - Soiland-Reyes, Stian
AU - Stanford, Natalie J
AU - Swainston, Neil
AU - Washington, Nicole
AU - Williams, Alan R.
AU - Wimalaratne, Sarala M.
AU - Winfree, Lilly M.
AU - Wolstencroft, Katherine
AU - Goble, Carole
AU - Mungall, Christopher J
AU - Haendel, Melissa A
AU - Parkinson, Helen
N1 - This article was submitted to PLOS Biology on 2016-10-04; a revised version was submitted on 2017-03-07 after peer review. Published in PLOS Biology on 2017-06-29.
PY - 2017/6/29
Y1 - 2017/6/29
N2 - In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
UR - https://groups.google.com/forum/#!forum/id21c
U2 - 10.1371/journal.pbio.2001414
DO - 10.1371/journal.pbio.2001414
M3 - Article
SN - 1545-7885
VL - 15
JO - PLOS Biology
JF - PLOS Biology
IS - 6
M1 - e2001414
ER -