
Store and manage credentials separate from datasets #6612

Closed
normanrz opened this issue Nov 7, 2022 · 8 comments · Fixed by #6646

Comments

@normanrz
Member

normanrz commented Nov 7, 2022

Detailed Description

Credentials for remote datasets should be stored and managed separately from datasets. The credentials should be created by dataset managers or admins. Credentials should never be sent back to the client. Credentials can then be attached to layers (or mags).
I am not sure whether we should allow deleting credentials that are still referenced by a dataset. Pro: it makes revoking access very easy. Con: it leaves the dataset unusable, and it may be hard to find out why.

@normanrz
Member Author

Please note that there are different types of credentials that we want to support:

  • HTTP basic auth (username + password)
  • S3 Access Key ID + Secret Access Key
  • HTTP token (maybe)
  • GCS credentials (I think they have .pem files?)
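
For illustration, these variants could be modeled roughly like this (a sketch only; the names below are made up and not an existing API):

```scala
// Rough sketch of the credential variants listed above (hypothetical names)
sealed trait RemoteCredential

final case class HttpBasicAuthCredential(username: String, password: String) extends RemoteCredential

final case class S3AccessKeyCredential(accessKeyId: String, secretAccessKey: String) extends RemoteCredential

final case class HttpTokenCredential(token: String) extends RemoteCredential

// GCS service account keys could be stored as the raw contents of the key file
final case class GoogleServiceAccountCredential(keyFileContents: String) extends RemoteCredential
```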

@fm3
Member

fm3 commented Nov 10, 2022

Currently, the MagLocator case class has a credentials field (case class FileSystemCredentials) – this means the credentials are stored directly in the datasource-properties.json. Instead, I’d suggest storing a credentialsReference there, an id pointing to a row in the postgres database.
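
Roughly, the change could look like this (a simplified sketch; class and field names below are not final):

```scala
// Simplified sketch: currently the credentials themselves end up in datasource-properties.json
final case class FileSystemCredentials(user: String, password: Option[String])

final case class MagLocator(
  path: Option[String],                      // other existing fields omitted
  credentials: Option[FileSystemCredentials] // serialized directly into the json today
)

// Suggested shape: serialize only an id that points to a credentials row in postgres
final case class MagLocatorWithReference(    // hypothetical name, stands in for the updated MagLocator
  path: Option[String],
  credentialsReference: Option[String]
)
```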

The explore route should insert supplied credentials into the database and return a json containing the matching reference.

Later, when the datastore uses the MagLocator to load data via zarrMag.remoteSource, the credentials should be looked up via an RPC to webKnossos (with caching), so a get route needs to be added to the wkRemoteDatastoreController, and a matching client method to the datastore.
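
To make the caching idea concrete (names here are made up; the actual implementation would go through the datastore’s existing RPC client and cache utilities):

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.Future

// Hypothetical lookup wrapper: resolves a credentialsReference via an RPC to webKnossos, caching results
class CredentialCache[C](fetchViaRpc: String => Future[C]) {
  private val cache = TrieMap.empty[String, Future[C]]

  def getCredential(credentialId: String): Future[C] =
    cache.getOrElseUpdate(credentialId, fetchViaRpc(credentialId))
}
```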

I’m not sure yet how accessing this client method can work in the ZarrBucketProvider; we’ll have to think about this.

But I’d say you could already start designing the postgres entries.

I’m not sure this is the time to add support for other kinds of credentials that are not essentially user/password (as we have now with basic auth and S3). But we should keep in mind that this will be added later, and we need to be able to adapt without having to rewrite the existing datasource-properties.json files.

@normanrz
Member Author

> I’m not sure this is the time to add support for other kinds of credentials that are not essentially user/password (as we have now with basic auth and S3). But we should keep in mind that this will be added later, and we need to be able to adapt without having to rewrite the existing datasource-properties.json files.

I think we should care about the GCS case in this iteration. Maybe the pem file could be represented as a password string.

I think it would be useful if the credential object had a field indicating which type it belongs to (e.g. S3, basic auth, GCS), along with a scope (e.g. S3 bucket, HTTP domain) and an optional user-definable name. That will make it easier for users to manage these credential objects.

@frcroth
Member

frcroth commented Nov 12, 2022

So, as I understand it, there should be a field "credential" in the datasource-properties.json containing an object id. This object id is then used to look up the credential in the db when accessing the dataset. Since different credential types have to be supported, should this be done via subclassing, using a different table for each type, or with more generic table columns and a credential type enum?

@fm3
Member

fm3 commented Nov 14, 2022

Your understanding is right :) I’d say credentialsId would be a good name in the json.

I’d say in Scala, subclassing sounds fair; in postgres, I’d say it’s better to have it in one table, with some nullable fields and a way to distinguish what the type is (an enum may be a solution). I don’t have a super strong opinion here, though. If during the implementation you find that using different tables seems better, feel free to go that way too.
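
As a sketch, the single-table variant could look like this when expressed as a row model (column names are only suggestions):

```scala
// One table for all credential kinds: an enum column for the type, nullable columns for the rest
object CredentialType extends Enumeration {
  val HttpBasicAuth, S3AccessKey, HttpToken, GoogleServiceAccount = Value
}

final case class CredentialRow(
  _id: String,
  credentialType: CredentialType.Value, // maps to a postgres enum column
  name: Option[String],                 // optional user-definable name
  scope: Option[String],                // e.g. an S3 bucket or HTTP domain
  identifier: Option[String],           // basic auth user / S3 access key id; null where not applicable
  secret: Option[String]                // password / secret access key / token / key file contents
)
```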

@frcroth
Member

frcroth commented Dec 13, 2022

#6646 progress report

S3 access key and http basic auth now work.

> a scope (e.g. S3 bucket, HTTP domain)

There is an optional scope value in the db; however, it is not yet validated. Should use of the credential fail if the scope does not match the request?
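
If it should fail, the check itself could be quite simple, e.g. something like this (purely hypothetical, assuming the scope is stored as a URI prefix):

```scala
import java.net.URI

// Hypothetical scope check: accept the credential only if the requested URI starts with the stored scope
def scopeMatches(scope: Option[String], requestUri: URI): Boolean =
  scope.forall(prefix => requestUri.toString.startsWith(prefix))
```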

> HTTP token (maybe)
> GCS credentials (I think they have .pem files?)

This is not yet implemented. Are there examples for datasets with these credentials?

Also, while the routes to create credentials are already working, credentials are not yet automatically created when exploring a dataset from the frontend.

@fm3
Member

fm3 commented Dec 13, 2022

> S3 access key and http basic auth now work.

Wohoo! 🎉

> While the routes to create credentials are already working, credentials are not yet automatically created when exploring a dataset from the frontend

I think this should be implemented in this iteration. The json returned by the explore routes should contain a reference to the newly created credentials object (and no longer the credentials themselves). This is exactly the benefit of this change.
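
Something along these lines, just to make the intended flow concrete (names are made up):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical explore-time flow: persist the supplied credentials first, then put only the new id
// into the json that is returned to the client
def credentialIdForExplore[C](
    suppliedCredential: Option[C],
    insertCredential: C => Future[String] // stores the credential in postgres, returns its id
)(implicit ec: ExecutionContext): Future[Option[String]] =
  suppliedCredential match {
    case Some(credential) => insertCredential(credential).map(Some(_))
    case None             => Future.successful(None)
  }
```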

I’d forward the other two questions to @normanrz

@normanrz
Member Author

> There is an optional scope value in the db; however, it is not yet validated. Should use of the credential fail if the scope does not match the request?

How are these scopes defined?

> HTTP token (maybe)
> GCS credentials (I think they have .pem files?)

> This is not yet implemented. Are there examples for datasets with these credentials?

I have GCS credentials, but GCS is not yet implemented as a filesystem provider. For the HTTP token, the wk auth token would be a candidate (not sure if it is accepted via the Authorization header yet).
