F3: Metadata clearly and explicitly include the identifier of the data they describe

Literature references with and without DOIs. Tables of data in articles with and without a unique identifier in each row for the thing that row is about. The magic of including identifiers in the metadata you share.

Hello, and welcome to Machine-Centric Science. My name is Donny Winston, and I'm here to talk about the FAIR principles in practice for scientists who want to compound their impacts, not their errors. Today, we're going to be talking about the third of the FAIR principles, F3: metadata clearly and explicitly include the identifier of the data they describe.

I can't tell you how many times I've gotten a literature reference without a DOI. You copy and paste it into Google or something, and usually that works, but sometimes it doesn't: depending on how the reference was formatted, the journal name may be abbreviated, or some other detail may be off. You'll probably find the paper eventually, but when I get a reference with a DOI, I more or less ignore everything else. It's still great to have all that other information for confirmation: once I load the DOI, I can double-check by eye that it's the right author and the right title, and that nobody mis-copied the DOI. It's so great when a citation, which is metadata for a publication (who's the author, what's the title, when was it published?), clearly and explicitly includes the identifier of what it's describing: in this case, the DOI of the publication.

Once I have a DOI, I can easily add the reference to my Zotero bibliography manager and do all sorts of things with it. I can format it in any citation style I want. It's just great to have the identifier.

Similarly, when I see a table of data in an article without identifiers, it's unclear what each row refers to. I might be redirected to get the data from some other source, or told to go to a website and search for something there. There might be data connected to a material, say, where only the chemical formula is given, which means I need to disambiguate among the different crystal structures that share that formula. I might need to dig through the main text, because the disambiguation isn't necessarily given in the table or its caption. It's just so nice to see a table of data where one column is a unique identifier for the thing that each row is an observation of.
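To make the table point concrete, here is a minimal Python sketch, assuming a small CSV table. The column names, the `example:` identifiers, the numeric values, and the two lookup functions are all hypothetical placeholders, not real measurements or real database IDs. Rutile and anatase are two crystal structures of TiO2, so a lookup by formula alone returns two rows, while a lookup by identifier returns exactly one.

```python
import csv
import io

# Hypothetical table: identifiers and values are illustrative placeholders,
# not real measurements or real database IDs.
TABLE_CSV = """material_id,formula,structure,value
example:TiO2-rutile,TiO2,rutile,1.0
example:TiO2-anatase,TiO2,anatase,2.0
"""

rows = list(csv.DictReader(io.StringIO(TABLE_CSV)))


def rows_by_formula(formula: str) -> list[dict]:
    """Return every row whose formula matches; there may be more than one."""
    return [row for row in rows if row["formula"] == formula]


def row_by_id(material_id: str) -> dict:
    """Return the single row for a unique identifier, or raise if absent or duplicated."""
    matches = [row for row in rows if row["material_id"] == material_id]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one row for {material_id!r}, found {len(matches)}")
    return matches[0]


# The formula alone is ambiguous: two polymorphs share "TiO2".
print(len(rows_by_formula("TiO2")))  # 2
# The identifier picks out exactly one row.
print(row_by_id("example:TiO2-anatase")["structure"])  # anatase
```

The identifier column does the disambiguation that would otherwise require reading the surrounding text.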

F3 is the equivalent of that, but with an emphasis on machine-actionability. It's something we appreciate as humans, but the connection between the metadata and the identifier should also be given in a formal manner. For example, in the case of RDF metadata, you can use the dcat:Dataset class from the Data Catalog Vocabulary (DCAT) to state clearly that this identifier denotes a dataset. Then everything else you say about it, like the author and so on, is unambiguously about a dataset.
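As a minimal sketch of what that can look like, here is a standard-library-only Python snippet that builds RDF metadata serialized as JSON-LD. The `dcat:` and `dcterms:` prefixes map to the real DCAT and Dublin Core Terms namespaces; the example.org IRIs, the title, and the `dataset_metadata` function are hypothetical. The point is that the record's subject (`@id`) is the dataset's identifier, and its `@type` is `dcat:Dataset`.

```python
import json

# Hypothetical IRIs for illustration only; substitute your dataset's real identifier.
DATASET_ID = "https://example.org/datasets/example-dataset"
CREATOR_ID = "https://example.org/people/jane-doe"


def dataset_metadata(dataset_id: str, title: str, creator_id: str) -> dict:
    """Build a JSON-LD record whose subject is the dataset identifier itself."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        # The record is explicitly *about* this identifier...
        "@id": dataset_id,
        # ...and types it as a dcat:Dataset, so every other statement
        # in the record is known to describe a dataset.
        "@type": "dcat:Dataset",
        "dcterms:identifier": dataset_id,
        "dcterms:title": title,
        "dcterms:creator": {"@id": creator_id},
    }


record = dataset_metadata(DATASET_ID, "Example dataset", CREATOR_ID)
print(json.dumps(record, indent=2))
```

Loaded into a JSON-LD-aware RDF library, this yields triples whose subject is the dataset IRI, one of which states that the IRI is of type dcat:Dataset.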

That's it for today. I'm Donny Winston and I hope you join me again next time for Machine-Centric Science.