Allow to rename datasets #6613

Closed
normanrz opened this issue Nov 7, 2022 · 8 comments · Fixed by #8075

Comments

@normanrz
Member

normanrz commented Nov 7, 2022

Detailed Description

Datasets should be renamable. For that, we should decouple the URL from the dataset name in Notion style (i.e. <name>-<id>, where only the id part is actually used). I would suggest adding a new dataset name field that is separate from the name used to locate the dataset on disk (maybe rename that one to "path"?). I'm not sure whether we should reuse the displayName field. We should check its current occurrences and impose the same restrictions as on dataset names. For backwards compatibility, we should resolve the name in the URL on a best-effort basis.
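A minimal sketch of the proposed Notion-style lookup with best-effort fallback, assuming that ids never contain a hyphen (so the segment can be split at its last `-`) and that legacy URLs carry only the bare dataset name. `Dataset`, `resolve_dataset`, and the in-memory lookup dicts are hypothetical stand-ins, not existing APIs of this project.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Dataset:
    # Hypothetical minimal record; real datasets carry many more fields.
    id: str
    name: str


def resolve_dataset(
    segment: str,
    by_id: Dict[str, Dataset],
    by_name: Dict[str, Dataset],
) -> Optional[Dataset]:
    """Resolve a URL segment of the form '<name>-<id>' or a legacy '<name>'.

    Assumes ids never contain '-', so everything after the last hyphen is the id.
    The name part is decorative and ignored, so renames do not break links.
    """
    # rpartition returns ('', '', segment) when there is no hyphen,
    # in which case id_part is the whole segment.
    _, _, id_part = segment.rpartition("-")
    if id_part in by_id:
        return by_id[id_part]
    # Best-effort fallback for old URLs that contain only the dataset name.
    return by_name.get(segment)
```

With a dataset `Dataset("5f1a2b3c", "l4_sample")`, both `l4_sample-5f1a2b3c` and a post-rename `cortex-5f1a2b3c` resolve via the id, while the legacy `l4_sample` resolves through the name fallback.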

@fm3
Member

fm3 commented Jan 18, 2023

unassigning myself for the moment, since sprint priorities have changed

fm3 removed their assignment Jan 18, 2023
@fm3
Member

fm3 commented Feb 13, 2023

Note: In this context, we could also create a way to delete datasets without breaking their annotations (i.e. the annotations can still be listed and downloaded).

@fm3
Member

fm3 commented Aug 5, 2024

I don’t think we can easily reuse displayName, because several existing values contain spaces, parentheses, etc., unless we decide that we can just drop or convert these.

If we want to keep that option, we will need these fields:

  • id
  • nameOnDisk (or path? directory? note that it’s not the full path)
  • name (for use in URIs, strict restrictions, but no longer unique)
  • displayName (no restrictions, free text field, not unique; as before)
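The four-field model above could be sketched as a record type like this. The field names mirror the list (in snake_case); `DatasetFields` and `NAME_PATTERN` are hypothetical, and the regex is only a placeholder for the "strict restrictions" on `name`, which the discussion does not spell out.

```python
import re
from dataclasses import dataclass

# Placeholder URI-safe rule for `name` (assumption; the real restrictions are unspecified).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class DatasetFields:
    id: str            # unique; used for lookup
    name_on_disk: str  # directory name, not the full path
    name: str          # for use in URIs; strictly restricted, no longer unique
    display_name: str  # free text; no restrictions, not unique

    def __post_init__(self) -> None:
        # Only `name` is validated; `display_name` may contain spaces, parentheses, etc.
        if not NAME_PATTERN.fullmatch(self.name):
            raise ValueError(f"invalid dataset name: {self.name!r}")
```

Keeping validation on `name` only reflects the point that existing displayName values cannot satisfy URI restrictions.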

@fm3
Copy link
Member

fm3 commented Aug 5, 2024

Another thought: since we still need the nameOnDisk anyway, could we not use that for most APIs as well? e.g. in datastore and worker jobs.

We could just make the display name (or maybe another name field) a more prominent feature 🤔
And switch to the id-based URI format, with fallback to the old format.

I get the feeling I don’t yet know well enough what the desired outcome is here. Depending on that, we may have to change either almost all APIs or only a few.

@normanrz
Member Author

normanrz commented Aug 5, 2024

> since we still need the nameOnDisk anyway, could we not use that for most APIs as well? e.g. in datastore and worker jobs.

I don't think that is a good idea. In the mid-term, I'd like to make datasets virtual, i.e. they won't need a folder on disk. For that, we need id-based URIs. It doesn't seem wise to refactor to nameOnDisk when we want to change that to ids in a few months.

@normanrz
Member Author

normanrz commented Aug 5, 2024

Keeping both displayName and name going forward seems unnecessary

@fm3
Copy link
Member

fm3 commented Aug 5, 2024

Thoughts

  • The dataset id also needs to be serialized to NMLs.
  • Heuristic to select a dataset from ambiguous names: take the oldest (hypothesis: old NMLs/URIs were generated before renaming was allowed, so they probably refer to the old dataset).
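The "take the oldest" heuristic could look roughly like this, assuming each dataset has a creation timestamp. `DatasetInfo` and `pick_oldest_by_name` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    # Hypothetical minimal record with a creation timestamp.
    id: str
    name: str
    created: datetime


def pick_oldest_by_name(name: str, datasets: List[DatasetInfo]) -> Optional[DatasetInfo]:
    """Resolve an ambiguous legacy name to the oldest dataset carrying it.

    Rationale: NMLs/URIs that only contain a name were probably created
    before renaming was possible, so they most likely refer to the dataset
    that had the name first.
    """
    matches = [d for d in datasets if d.name == name]
    if not matches:
        return None
    return min(matches, key=lambda d: d.created)
```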

@MichaelBuessemeyer
Copy link
Contributor

MichaelBuessemeyer commented Sep 5, 2024

Notes from my talk with @normanrz:

  • The main motivation is that datasets should no longer be referenced by their name but by their id. This allows multiple datasets to have the same name. The on-disk directory in which a dataset is stored should be decoupled from its name. For readability, we use an addressing scheme like Notion's, as described in the issue description.
  • A column is needed that stores the directory name, e.g. "path".
  • name & displayName are more or less duplicates; just keep displayName. In the URL, replace all special characters with something friendlier like - or ., or just omit them. Take a look at how Notion handles this.
  • Problem: only a few datasets have a display name set. => Use displayName where possible, else fall back to name.
  • Problem: to still be able to locate a dataset on disk, keep a "copy" of the old name column and call it, e.g., path; kind of like a legacy name.
  • When creating a new dataset, use the name given by the user as path if it is still unique; otherwise, use the new dataset's id.
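The naming rules from these notes can be sketched as three small functions. This assumes ASCII-only labels (every run of characters other than letters, digits, and `_` becomes a single `-`) and ids without hyphens; `url_label`, `url_segment`, and `choose_directory` are hypothetical helpers, and Notion's actual slug rules may differ.

```python
import re
from typing import Optional, Set


def url_label(display_name: Optional[str], name: str) -> str:
    """Friendly URL label: displayName if set, else name as fallback."""
    label = display_name or name
    # Collapse each run of characters other than ASCII letters, digits, and '_'
    # into a single '-', then trim leading/trailing hyphens.
    return re.sub(r"[^A-Za-z0-9_]+", "-", label).strip("-")


def url_segment(display_name: Optional[str], name: str, dataset_id: str) -> str:
    """Notion-style '<label>-<id>'; falls back to the bare id if the label is empty."""
    label = url_label(display_name, name)
    return f"{label}-{dataset_id}" if label else dataset_id


def choose_directory(requested_name: str, existing_dirs: Set[str], new_id: str) -> str:
    """Use the user's name as the on-disk path if it is still free, else the new id."""
    return new_id if requested_name in existing_dirs else requested_name
```

For example, a display name of `Mouse cortex (v2)` yields the label `Mouse-cortex-v2`, and a label made only of special characters degrades to the bare id, which still resolves because only the id part is used for lookup.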
