Add metadata version #11479

Open
4e6 opened this issue Nov 4, 2024 · 7 comments

@4e6
Contributor

4e6 commented Nov 4, 2024

Extracted from #11390 (comment)

Issue

Add a version to the metadata section of the file. Possible implementations:

Separate section

Relates to the whole metadata section. Can change the way the rest of the metadata is parsed.

#### METADATA ####
{"version":1}
[]
{}
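
A minimal sketch of how a reader could handle this layout, assuming the version object is the first line after the header and that files without such a line are treated as version 0. The function name `split_metadata` and the version-0 convention are illustrative assumptions, not the actual Enso implementation.

```python
import json

HEADER = "#### METADATA ####"

def split_metadata(source: str):
    """Return (code, version, metadata_lines) for the 'separate section' layout."""
    code, sep, tail = source.partition("\n" + HEADER + "\n")
    if not sep:
        return source, None, []  # no metadata section at all
    lines = [line for line in tail.splitlines() if line.strip()]
    version = 0  # assumption: legacy files without a version line count as version 0
    if lines:
        first = json.loads(lines[0])
        # A version line is a JSON object with a "version" key;
        # legacy files start directly with the array line.
        if isinstance(first, dict) and "version" in first:
            version = first["version"]
            lines = lines[1:]
    return code, version, lines
```

Because the version is known before the remaining lines are decoded, the caller can pick a different decoder for them depending on its value.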

Encoded in METADATA string

Same as the previous but different encoding.

#### METADATA v1 ####
[]
{}
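
For this variant the version could be read from the header line alone. A small sketch, assuming the header grammar is exactly `#### METADATA ####` (legacy) or `#### METADATA v<N> ####`; the helper `header_version` is hypothetical.

```python
import re

# Assumed header grammar: legacy header, or the same header with " v<N>" inserted.
HEADER_RE = re.compile(r"^#### METADATA(?: v(\d+))? ####$")

def header_version(line: str):
    """Return the encoded version, 0 for the legacy header, None if not a header."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)) if match.group(1) else 0
```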

Extend the metadata

Semantically relates to the last metadata line but does not require changing the parser.

#### METADATA ####
[]
{"version":1,"ide":{}}
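
With this variant an existing reader only has to look up one extra key. A sketch, assuming a missing `version` key means a legacy file (treated here as version 0, which is an assumption); `ide_metadata_version` is a hypothetical helper.

```python
import json

def ide_metadata_version(last_metadata_line: str, default: int = 0) -> int:
    """Read the version from the last metadata line, falling back to `default`."""
    data = json.loads(last_metadata_line)
    if isinstance(data, dict):
        return data.get("version", default)
    return default
```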

Make the METADATA section a comment

Make the metadata section an Enso comment since we're changing the parser anyway.

#### METADATA ####
  {"version":1}
  []
  {}
@kazcw
Contributor

kazcw commented Nov 4, 2024

External metadata

There are a few reasons to have the metadata inside the file as we do now:

  • Atomicity: Ensure it is in sync with the file.
  • Transparency: Enso developers who know the format can use it for debugging.
  • Portability: Moving or copying a file doesn't cause metadata loss.

I think that each of these reasons could be (or already has been) addressed without needing the metadata to be part of the file.

Moving the metadata out of the file would enable large efficiency improvements--for example, it removes the need for the format to be text-safe.

Atomicity

Atomicity is less of a concern now that the metadata format is resilient--if the metadata and the source file end up slightly out of sync (e.g. due to a sudden process exit), this should cause little or no disruption to metadata usability.

Transparency

The metadata format has never been very human-readable; we can probably address this use case better with improved tooling.

Portability

Portability can be achieved without keeping all the data in the file; we just need unique file IDs:

# ENSO file-id: 44e510 #

In the metadata database, we would look up metadata primarily by file-id. Each file-id would have one "origin" FS path; if we find a file-id at a different path, we read the data according to the claimed file-id, then we assign a new ID for the new path--this way copies would share data when initially read, but evolve independently.
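
The lookup rule above can be sketched as follows. This is an in-memory stand-in under stated assumptions: the class `MetadataStore`, its method names, and the 6-character hex IDs are all hypothetical, not an existing Enso API.

```python
import copy
import uuid
from dataclasses import dataclass, field

@dataclass
class MetadataStore:
    """In-memory stand-in for the metadata database (hypothetical API)."""
    origin: dict = field(default_factory=dict)  # file-id -> origin path
    data: dict = field(default_factory=dict)    # file-id -> metadata

    def _new_id(self) -> str:
        while True:
            candidate = uuid.uuid4().hex[:6]
            if candidate not in self.origin:
                return candidate

    def register(self, path: str, metadata: dict) -> str:
        file_id = self._new_id()
        self.origin[file_id] = path
        self.data[file_id] = metadata
        return file_id

    def lookup(self, path: str, claimed_id: str):
        """Return (file_id, metadata) for a file at `path` that claims `claimed_id`."""
        metadata = self.data.get(claimed_id, {})
        if self.origin.get(claimed_id) == path:
            return claimed_id, metadata
        # Known ID at a different path: read the data under the claimed ID,
        # then give this path its own ID so the copies evolve independently.
        new_id = self.register(path, copy.deepcopy(metadata))
        return new_id, self.data[new_id]
```

The returned ID is the one that would be written back into the file's `file-id` line; the deep copy is what lets a copied file start from the shared data without later edits leaking back to the original.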

This approach would allow metadata to "follow" moved or copied files as well as it does now, within a local filesystem. It wouldn't work when sending a file between computers, but the IDE cannot operate on one file in isolation anyway; users already need to import/export the project, in which case we could include metadata in the project file.

Side-note: Comment type

If we want metadata to be ignored by the parser without the parser needing to recognize it specifically, a doc comment (starting with ##) is the wrong kind of comment. During translation we do some work to assemble a doc comment into an abstracted text string, and then we place it in the IR; in the future we are likely to introduce a warning for unused documentation. Plain comments are more "ignored"--the parser represents them exactly, and does nothing else with them.

@JaroslavTulach
Member

JaroslavTulach commented Nov 5, 2024

External metadata

I'd rather move forward in smaller steps, e.g. versioning and turning the section into a comment to begin with, and any other changes later.

If we want metadata to be ignored by the parser without the parser needing to recognize it specifically, a doc comment (starting with ##) is the wrong kind of comment.

OK, so what do you suggest? Is:

#*** META-DATA 2.0 ***#
  [json1...]
  [json2...]

better?

@farmaazon
Contributor

My two pence:

  1. Transparency - the metadata is currently not readable, but we have talked about making it better, for example by identifying nodes by name and storing it in YAML format.
  2. Portability - the proposed solution still breaks if someone sends their friend just a file, without a project.
  3. Also, there is a problem with version control - the file metadata should also be versioned, and having everything in a single file simplifies any action (moving, checkout, etc.).

Of course, we can solve those problems, but I personally don't see any efficiency improvements large enough to justify the effort.

@JaroslavTulach
Member

@jdunkerley told me that there is a problem:

  • use the latest development IDE and save a project
  • send this project to someone using the last release IDE
  • the person will not be able to open the project
  • the latest release is not able to parse the file due to metadata changes

My 2 Kč fix:

  • necessary - turn the stored metadata into a comment so that the last release IDE ignores it and does not try to parse it as metadata, as that would fail
  • future proof - add a version to the metadata, and if it is newer than expected (produced by some future IDE), don't read it at all
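
A minimal sketch of the "future proof" rule, assuming the version is stored as a JSON object on its own line and that this IDE understands versions up to `SUPPORTED_VERSION`. Both the constant and the function `read_metadata` are hypothetical.

```python
import json

SUPPORTED_VERSION = 1  # hypothetical: highest metadata version this IDE understands

def read_metadata(version_line: str, payload_lines: list):
    """Parse the payload only if its declared version is not newer than supported."""
    declared = json.loads(version_line).get("version", 0)
    if declared > SUPPORTED_VERSION:
        # Written by a future IDE: skip the metadata rather than fail to open the file.
        return None
    return [json.loads(line) for line in payload_lines]
```

Note that the payload is never touched when the version is too new, so a format this IDE cannot parse does not produce an error.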

CCing @kazcw, @hubertp.

@farmaazon
Contributor

@jdunkerley told me that there is a problem:

  • use the latest development IDE and save a project
  • send this project to someone using the last release IDE
  • the person will not be able to open the project
  • the latest release is not able to parse the file due to metadata changes

A particular case of this problem is tracked by #11742

@JaroslavTulach
Member

Is there anything we can do during the metadata change so that future changes do not break the ability of an old IDE to read the metadata at least partially?

Would removing the metadata handling from the Rust parser side and letting the Ydoc server do all such work help? What if the Ydoc server uses a proper JSON.parse, could it ignore newer fields? Do we need semantic versioning for the metadata version then? E.g. a versioning scheme that can distinguish a major metadata format change from a minor one (adding a new field to an existing structure)?
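
One possible shape for such a scheme, as a sketch only: a `major.minor` version where a differing major means "do not read" and a newer minor means "read, ignoring unknown fields". The names `MetadataVersion`, `parse_version` and `compatibility`, and the three result labels, are assumptions for illustration.

```python
from typing import NamedTuple

class MetadataVersion(NamedTuple):
    major: int
    minor: int

def parse_version(text: str) -> MetadataVersion:
    major, minor = text.split(".")
    return MetadataVersion(int(major), int(minor))

def compatibility(reader: MetadataVersion, written: MetadataVersion) -> str:
    """Classify how well `reader` can handle metadata written as `written`."""
    if reader.major != written.major:
        return "none"     # major change: the format is not understood
    if written.minor > reader.minor:
        return "partial"  # minor change: newer fields exist and are ignored
    return "full"
```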

@farmaazon
Contributor

Would removing the metadata handling from the Rust parser side and letting the Ydoc server do all such work help?

That's exactly how it's done now, AFAIK.

What if Ydoc server uses proper JSON.parse, could it ignore newer fields?

It actually does. The problem with #11742 was not that new fields were added, but rather that some old field was removed, and an older version of the Ydoc server could not cope with that. So I was able to make a fix on develop; a PR is incoming.
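
The two failure modes can be illustrated with a tolerant decoder: unknown (newer) fields are dropped, and missing (removed) fields fall back to defaults instead of raising. This is a sketch, with Python's `json.loads` standing in for `JSON.parse`; the field names and defaults are hypothetical, not the real metadata schema.

```python
import json

# Hypothetical fields and defaults, for illustration only.
DEFAULTS = {"position": None, "visualization": None}

def decode_node_metadata(line: str) -> dict:
    """Decode one metadata object, tolerating both added and removed fields."""
    raw = json.loads(line)
    # Unknown (newer) fields are dropped; missing (removed) fields use defaults.
    return {key: raw.get(key, default) for key, default in DEFAULTS.items()}
```

A reader written this way survives a field being removed by a newer writer, which is the case that simply ignoring extra fields does not cover.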

Do we need semantic versioning for the metadata version then? E.g. use a versioning scheme that is able to identify a major metadata format change as well as minor (adding a new field to existing structure) metadata format change?

Yes, I think it is a good idea. I think we need to finally discuss all the versioning and compatibility topics after this release.
