Machine-Centric Science | Transcript: A1.2: The protocol allows for authentication and authorisation where necessary

April 13, 2022 • 5 Minutes

A1.2: The protocol allows for authentication and authorisation where necessary

FAIR does not mean open. You're certainly allowed to authenticate and to authorize. The HTTP protocol is pretty great for this.

Hello, and welcome to Machine-Centric Science. My name is Donny Winston, and I am here to talk about the FAIR principles in practice for scientists who want to compound their impacts, not their errors. Today, we're going to be talking about the seventh of the 15 FAIR principles, A1.2: the protocol allows for authentication and authorization where necessary.

This is important for the FAIR principles, because FAIR does not mean open. You're certainly allowed to authenticate, that is, know who the requester is and confirm that the requester is who they say they are, and to authorize. So, given that you know who that person is, are they authorized to see this particular resource?

This is perfectly okay for FAIR. The important part is that it's a protocol, that it's machine-accessible, machine-actionable. It's not arbitrary.

You have to know whether something is behind authentication or authorization and if so, how to get at it.

The HTTP protocol is pretty great at this.

It accomplishes this in a lot of cases through headers. So you might send a request with a URL. If you're posting a form, then you'll send the URL and you'll also send a post body. But with any kind of request, you can also send a set of key-value pairs known as headers.
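As a minimal sketch of the request shape just described, the following Python builds (but does not send) a form POST with a URL, a post body, and a set of headers. The endpoint, form field, and header values are placeholders invented for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form field, for illustration only.
url = "https://example.org/datasets/search"
body = urlencode({"q": "perovskite"}).encode("ascii")

# Headers are key-value pairs that travel alongside the URL and the body.
request = Request(
    url,
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    },
)

# Nothing is sent here; we only inspect what the request would carry.
print(request.get_method())
print(request.full_url)
print(request.data)
for name, value in request.header_items():
    print(f"{name}: {value}")
```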

And one such header is the Authorization header. And this is how you can communicate information about who you are so that the server can know who you are and whether you're authorized to do what you're asking. It's a protocol because if a request requires authentication and you don't supply it, you will get a predictable response in return, a 401 status code. And you'll get a WWW-Authenticate header with information about how to authenticate, how the server wants you to authenticate.
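To make the 401 exchange concrete, here is a small self-contained sketch that starts a throwaway server on the local machine and calls it twice: once without credentials and once with them. The token, the realm name, and the choice of the Bearer scheme are assumptions made for this demo, and a real service would run over HTTPS.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credential for this demo only.
EXPECTED = "Bearer demo-token"


class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"protected metadata\n"
            self.send_response(200)
        else:
            body = b"authentication required\n"
            self.send_response(401)
            # Tell the client which scheme the server expects.
            self.send_header("WWW-Authenticate", 'Bearer realm="demo"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any system proxy settings, since the server is local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, authorization=None):
    """Return (status code, WWW-Authenticate challenge or None)."""
    headers = {"Authorization": authorization} if authorization else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request) as response:
            return response.status, response.headers.get("WWW-Authenticate")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status and headers are on the error.
        return error.code, error.headers.get("WWW-Authenticate")


server = HTTPServer(("127.0.0.1", 0), ProtectedHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/resource"

anonymous = fetch(url)
authenticated = fetch(url, EXPECTED)
server.shutdown()
server.server_close()

print(anonymous)      # status 401 plus the challenge from the server
print(authenticated)  # status 200 and no challenge
```

The point of the sketch is that a client with no prior knowledge can read the status code and the challenge header and decide what to do next, without a human reading a login page.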

And there are a few different ways that are standard to authenticate. One is Basic authentication with a username and password. So the server is telling you, hey, I need basic authentication. So send me a username and password in the Authorization header. It could be Bearer-based authentication where you need to be bearing a token. So this is common with OAuth2, where you'll have a website that will require a token and you might log into your Google account or something like that. And Google will give you a bearer token so that the application knows, yes, this person is a valid Google user, and this is their email address, that sort of thing.
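The two schemes mentioned above differ mainly in what goes into the Authorization header. A rough sketch follows; the helper names are made up here, the username and password are a commonly cited textbook pair, and the bearer token is a placeholder that in practice would be issued by an identity provider.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build a Basic header value: 'Basic ' + base64('username:password')."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def bearer_authorization(token: str) -> str:
    """Build a Bearer header value from a token that was already issued."""
    return f"Bearer {token}"


# Illustrative values only.
headers_basic = {"Authorization": basic_authorization("Aladdin", "open sesame")}
headers_bearer = {"Authorization": bearer_authorization("demo-token")}

print(headers_basic["Authorization"])   # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(headers_bearer["Authorization"])  # Bearer demo-token
```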

So, there are a variety of ways within HTTP that are standardized for authentication. It's sort of up to the server to authorize for a given resource.
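One way to picture the split between authenticating and authorizing is a server-side check like the sketch below. The token table, the access lists, and the function name are all hypothetical. The 403 status is not mentioned in the episode; it is the standard HTTP response for a requester who is known but not allowed.

```python
# Hypothetical token-to-user table and per-resource access lists.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ACCESS = {
    "/datasets/public": None,          # None: open to everyone
    "/datasets/embargoed": {"alice"},  # only these users may read
}


def status_for(path, token=None):
    """Return the HTTP status a server might send for this request."""
    if path not in ACCESS:
        return 404
    allowed = ACCESS[path]
    if allowed is None:
        return 200  # open resource, no authentication needed
    user = TOKENS.get(token)
    if user is None:
        return 401  # authentication: we don't know who you are
    if user not in allowed:
        return 403  # authorization: we know you, but you may not see this
    return 200


print(status_for("/datasets/public"))                    # 200
print(status_for("/datasets/embargoed"))                 # 401
print(status_for("/datasets/embargoed", "token-bob"))    # 403
print(status_for("/datasets/embargoed", "token-alice"))  # 200
```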

There are different ways of authenticating securely, the most basic of which is to encrypt the connection. If you're using Basic authentication and you're not going over HTTPS, that is, the server isn't using an SSL certificate, then your username and password are sent effectively in plain text, so you don't want to do that.
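To see why Basic authentication needs an encrypted connection, note that its Base64 encoding is not encryption: anyone who sees the header can reverse it without a key. The sketch below shows that, plus a simple guard a client could apply. The function names and URLs are illustrative assumptions.

```python
import base64
from urllib.parse import urlsplit


def decode_basic(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a Basic header value; no key needed."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def safe_for_basic_auth(url: str) -> bool:
    """Only allow Basic credentials over an encrypted (https) connection."""
    return urlsplit(url).scheme == "https"


# An eavesdropper on plain HTTP could do exactly this:
print(decode_basic("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))  # ('Aladdin', 'open sesame')
print(safe_for_basic_auth("http://example.org/data"))      # False
print(safe_for_basic_auth("https://example.org/data"))     # True
```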

But other than that, it's very common to have a shared secret that only you and the server know. You use that secret to compute a code over your request, and the server can recompute it to check. These are hash-based message authentication codes, with shared secrets.
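A hash-based message authentication code can be sketched with Python's standard library as follows. The secret, the choice of SHA-256, and the way the method, path, and body are joined into one message are assumptions for this example; real services each define their own signing recipe.

```python
import hashlib
import hmac

# Hypothetical secret, known to both the client and the server.
SHARED_SECRET = b"demo-shared-secret"


def sign(method: str, path: str, body: bytes, secret: bytes) -> str:
    """Compute an HMAC-SHA256 over the parts of the request we want to protect."""
    message = method.encode("utf-8") + b"\n" + path.encode("utf-8") + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(method: str, path: str, body: bytes, signature: str, secret: bytes) -> bool:
    """Server side: recompute the code and compare in constant time."""
    expected = sign(method, path, body, secret)
    return hmac.compare_digest(expected, signature)


signature = sign("GET", "/datasets/42", b"", SHARED_SECRET)

print(verify("GET", "/datasets/42", b"", signature, SHARED_SECRET))  # True
print(verify("GET", "/datasets/43", b"", signature, SHARED_SECRET))  # False: request was altered
print(verify("GET", "/datasets/42", b"", signature, b"wrong-secret"))  # False: wrong secret
```

Because both sides hold the same secret, either one could have produced the code, which is why this approach does not give you non-repudiation.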

The other way to do things, particularly if you desire what's known as non-repudiation, which means that the sender of a request cannot later claim that they did not send it, is asymmetric cryptography.

In this case you want things like digital signatures and request fingerprinting. That's beyond the scope of this episode.

In summary, the protocol should allow for authentication and authorization where necessary.

It should be explicit. Just like for the access protocol, the metadata should tell you what you need to know in order to access. And whether or not you are actually able to successfully authenticate and authorize for the resource, the protocol for how to do this should be straightforward and machine-actionable.

And this is what this part of FAIR means. So FAIR doesn't necessarily mean open, openly accessible, but again, the method of access and the way that you verify that you are who you say you are, and that you're allowed to access what you're asking for, should be explicit in the metadata and machine-actionable.

That'll be it for today. I'm Donny Winston, and I hope you join me again next time for Machine-Centric Science.