
Store and manage credentials separate from datasets #6612

Closed
normanrz opened this issue Nov 7, 2022 · 8 comments · Fixed by #6646

Comments

@normanrz
Member

normanrz commented Nov 7, 2022

Detailed Description

Credentials for remote datasets should be stored and managed separately from datasets. The credentials should be created by dataset managers or admins. Credentials should never be sent back to the client. Credentials can then be attached to layers (or mags).
I am not sure whether we should allow deleting credentials that are still referenced by a dataset. Pro: it makes revoking access very easy. Con: it leaves the dataset unusable, and it may be hard to find out why.

@normanrz
Member Author

Please note that there are different types of credentials that we want to support:

  • HTTP basic auth (username + password)
  • S3 Access Key ID + Secret Access Key
  • HTTP token (maybe)
  • GCS credentials (I think they have .pem files?)
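
For illustration, these variants could be modeled roughly like this (a sketch only; the names below are made up and not an existing API):

```scala
// Rough sketch of the credential variants listed above (hypothetical names)
sealed trait RemoteCredential

final case class HttpBasicAuthCredential(username: String, password: String) extends RemoteCredential

final case class S3AccessKeyCredential(accessKeyId: String, secretAccessKey: String) extends RemoteCredential

final case class HttpTokenCredential(token: String) extends RemoteCredential

// GCS service account keys could be stored as the raw contents of the key file
final case class GoogleServiceAccountCredential(keyFileContents: String) extends RemoteCredential
```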

@fm3
Member

fm3 commented Nov 10, 2022

Currently, the MagLocator case class has a credentials field (case class FileSystemCredentials) – this means the credentials are stored directly in the datasource-properties.json. Instead, I’d suggest storing a credentialsReference there, an id pointing to a row in the postgres database.
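
Roughly, the change could look like this (a simplified sketch; class and field names below are not final):

```scala
// Simplified sketch: currently the credentials themselves end up in datasource-properties.json
final case class FileSystemCredentials(user: String, password: Option[String])

final case class MagLocator(
  path: Option[String],                      // other existing fields omitted
  credentials: Option[FileSystemCredentials] // serialized directly into the json today
)

// Suggested shape: serialize only an id that points to a credentials row in postgres
final case class MagLocatorWithReference(    // hypothetical name, stands in for the updated MagLocator
  path: Option[String],
  credentialsReference: Option[String]
)
```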

The explore route should insert supplied credentials into the database and return a json containing the matching reference.

Later, when the datastore uses the MagLocator to load data via zarrMag.remoteSource, the credentials should be looked up via an RPC to webKnossos (with caching), so a get route needs to be added to the wkRemoteDatastoreController, and a matching client method to the datastore.
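
To make the caching idea concrete (names here are made up; the actual implementation would go through the datastore’s existing RPC client and cache utilities):

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.Future

// Hypothetical lookup wrapper: resolves a credentialsReference via an RPC to webKnossos, caching results
class CredentialCache[C](fetchViaRpc: String => Future[C]) {
  private val cache = TrieMap.empty[String, Future[C]]

  def getCredential(credentialId: String): Future[C] =
    cache.getOrElseUpdate(credentialId, fetchViaRpc(credentialId))
}
```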

I’m not sure yet how accessing this client method can work in the ZarrBucketProvider; we’ll have to think about this.

But I’d say you could already start designing the postgres entries.

I’m not sure this is the time to add support for other kinds of credentials that are not essentially user/password (as we have now with basic auth and S3). But we should keep in mind that this will be added later, and we need to be able to adapt without having to rewrite the existing datasource-properties.json files.

@normanrz
Member Author

> I’m not sure this is the time to add support for other kinds of credentials that are not essentially user/password (as we have now with basic auth and S3). But we should keep in mind that this will be added later, and we need to be able to adapt without having to rewrite the existing datasource-properties.json files.

I think we should care about the GCS case in this iteration. Maybe the pem file could be represented as a password string.

I think it would be useful if the credential object had a field indicating which type it belongs to (e.g. S3, basic auth, GCS), along with a scope (e.g. S3 bucket, HTTP domain) and an optional user-definable name. That will make it easier for users to manage these credential objects.

@frcroth
Member

frcroth commented Nov 12, 2022

So, as I understand it, there should be a field "credential" in the datasource-properties.json containing an object id. This object id is then used to look up the credential in the db when accessing the dataset. Since different credential types have to be supported, should this be done via subclassing, using a different table for each type, or with more generic table columns and a credential type enum?

@fm3
Member

fm3 commented Nov 14, 2022

Your understanding is right :) I’d say credentialsId would be a good name in the json.

I’d say in Scala, subclassing sounds fair; in postgres, I’d say it’s better to have it in one table, with some nullable fields and a way to distinguish what the type is (an enum may be a solution). I don’t have a super strong opinion here, though. If during the implementation you find that using different tables seems better, feel free to go that way too.
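
As a sketch, the single-table variant could look like this when expressed as a row model (column names are only suggestions):

```scala
// One table for all credential kinds: an enum column for the type, nullable columns for the rest
object CredentialType extends Enumeration {
  val HttpBasicAuth, S3AccessKey, HttpToken, GoogleServiceAccount = Value
}

final case class CredentialRow(
  _id: String,
  credentialType: CredentialType.Value, // maps to a postgres enum column
  name: Option[String],                 // optional user-definable name
  scope: Option[String],                // e.g. an S3 bucket or HTTP domain
  identifier: Option[String],           // basic auth user / S3 access key id; null where not applicable
  secret: Option[String]                // password / secret access key / token / key file contents
)
```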

@frcroth
Member

frcroth commented Dec 13, 2022

#6646 progress report

S3 access key and http basic auth now work.

> a scope (e.g. S3 bucket, HTTP domain)

There is an optional scope value in the db; however, it is not yet validated. Should use of the credential fail if the scope does not match the request?
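
If it should fail, the check itself could be quite simple, e.g. something like this (purely hypothetical, assuming the scope is stored as a URI prefix):

```scala
import java.net.URI

// Hypothetical scope check: accept the credential only if the requested URI starts with the stored scope
def scopeMatches(scope: Option[String], requestUri: URI): Boolean =
  scope.forall(prefix => requestUri.toString.startsWith(prefix))
```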

> HTTP token (maybe)
> GCS credentials (I think they have .pem files?)

This is not yet implemented. Are there examples for datasets with these credentials?

Also, while the routes to create credentials are already working, credentials are not yet automatically created when exploring a dataset from the frontend.

@fm3
Member

fm3 commented Dec 13, 2022

> S3 access key and http basic auth now work.

Wohoo! 🎉

> While the routes to create credentials are already working, credentials are not yet automatically created when exploring a dataset from the frontend

I think this should be implemented in this iteration. The json returned by the explore routes should contain a reference to the newly created credentials object (and no longer the credentials themselves). This is exactly the benefit of this change.
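
Something along these lines, just to make the intended flow concrete (names are made up):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical explore-time flow: persist the supplied credentials first, then put only the new id
// into the json that is returned to the client
def credentialIdForExplore[C](
    suppliedCredential: Option[C],
    insertCredential: C => Future[String] // stores the credential in postgres, returns its id
)(implicit ec: ExecutionContext): Future[Option[String]] =
  suppliedCredential match {
    case Some(credential) => insertCredential(credential).map(Some(_))
    case None             => Future.successful(None)
  }
```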

I’d forward the other two questions to @normanrz

@normanrz
Member Author

> There is an optional scope value in the db; however, it is not yet validated. Should use of the credential fail if the scope does not match the request?

How are these scopes defined?

> HTTP token (maybe)
> GCS credentials (I think they have .pem files?)

> This is not yet implemented. Are there examples for datasets with these credentials?

I have GCS credentials, but GCS is not yet implemented as a filesystem provider. For the HTTP token, the wk auth token would be a candidate (not sure if it is accepted via the Authorization header yet).
