Add metadata version #11479

4e6 · 2024-11-04T11:57:02Z

Issue

Add a version to the metadata section of the file. The implementation could take one of the following forms.

Separate section

Relates to the whole metadata section. Can change the way the rest of the metadata is parsed.

#### METADATA ####
{"version":1}
[]
{}
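
As an illustration only, a reader for this variant might look like the sketch below. The function name `read_separate_section` and the fallback to version 0 for unversioned files are assumptions, not part of the proposal.

```python
import json

MARKER = "#### METADATA ####"


def read_separate_section(source: str):
    """Return (version, metadata_lines) for the 'separate section' variant.

    Files written before versioning have no version line; this sketch
    treats them as version 0 (an assumption, not an agreed convention).
    """
    lines = source.splitlines()
    if MARKER not in lines:
        return None, []
    body = lines[lines.index(MARKER) + 1:]
    version = 0
    if body:
        first = json.loads(body[0])
        # The version line is a JSON object with a "version" key, while the
        # legacy first line is a JSON array, so the two can be told apart.
        if isinstance(first, dict) and "version" in first:
            version = first["version"]
            body = body[1:]
    return version, body
```

Because the version line comes first, the reader can decide how to parse the remaining lines before touching them.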

Encoded in METADATA string

Same as the previous option, but with a different encoding.

#### METADATA v1 ####
[]
{}
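
A rough sketch of reading the version from the marker line itself. The regular expression and the name `header_version` are hypothetical, and treating the unversioned marker as version 0 is an assumption.

```python
import re

# Hypothetical header pattern: the " v<N>" part is optional so that the
# existing unversioned marker still matches.
HEADER = re.compile(r"^#### METADATA(?: v(\d+))? ####$")


def header_version(line: str):
    """Return the marker's version, 0 if unversioned, None if not a marker."""
    match = HEADER.match(line)
    if match is None:
        return None
    return int(match.group(1)) if match.group(1) else 0
```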

Extend the metadata

Semantically relates to the last metadata line but does not require changing the parser.

#### METADATA ####
[]
{"version":1,"ide":{}}
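
For this variant the version is just another key in the last JSON object, so existing line-based parsing stays as it is. A minimal sketch, where the helper name and the default of 0 for a missing key are assumptions:

```python
import json


def version_from_last_line(metadata_lines):
    """Read "version" from the final JSON object, defaulting to 0 when absent."""
    last = json.loads(metadata_lines[-1])
    return last.get("version", 0)
```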

Make the METADATA section a comment

Make the metadata section an Enso comment since we're changing the parser anyway.

#### METADATA ####
  {"version":1}
  []
  {}


kazcw · 2024-11-04T15:27:19Z

External metadata

There are a few reasons to have the metadata inside the file as we do now:

Atomicity: Ensure it is in sync with the file.
Transparency: Enso developers who know the format can use it for debugging.
Portability: Moving or copying a file doesn't cause metadata loss.

I think that each of these reasons could be (or already has been) addressed without needing the metadata to be part of the file.

Moving the metadata out of the file would enable large efficiency improvements--for example, it removes the need for the format to be text-safe.

Atomicity

Atomicity is less of a concern now that the metadata format is resilient--if the metadata and the source file end up slightly out of sync (e.g. due to a sudden process exit), this should cause little or no disruption to metadata usability.

Transparency

The metadata format has never been very human-readable; we can probably address this use case better with improved tooling.

Portability

Portability can be achieved without keeping all the data in the file; we just need unique file IDs:

# ENSO file-id: 44e510 #

In the metadata database, we would look up metadata primarily by file-id. Each file-id would have one "origin" FS path; if we find a file-id at a different path, we read the data according to the claimed file-id, then we assign a new ID for the new path--this way copies would share data when initially read, but evolve independently.
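
The lookup rule above can be sketched roughly as follows. `MetadataStore`, its methods, and the generated ID format are hypothetical stand-ins for whatever the real database would be; only the `# ENSO file-id: ... #` line comes from the example above.

```python
import copy
import itertools
import re

FILE_ID = re.compile(r"^# ENSO file-id: (\w+) #$", re.MULTILINE)


def read_file_id(source: str):
    """Extract the file-id from a source file, or None if it has none."""
    match = FILE_ID.search(source)
    return match.group(1) if match else None


class MetadataStore:
    """In-memory stand-in for the metadata database (hypothetical)."""

    def __init__(self):
        self._origin = {}  # file-id -> origin path
        self._data = {}    # file-id -> metadata
        self._counter = itertools.count(1)

    def register(self, path, metadata):
        file_id = f"id{next(self._counter)}"
        self._origin[file_id] = path
        self._data[file_id] = metadata
        return file_id

    def open(self, path, claimed_id):
        """Return (file_id, metadata) to use for the file at `path`."""
        if self._origin[claimed_id] == path:
            return claimed_id, self._data[claimed_id]
        # Same ID seen at a different path: start from the claimed ID's
        # data, but store it under a new ID so both files evolve
        # independently from here on.
        new_id = self.register(path, copy.deepcopy(self._data[claimed_id]))
        return new_id, self._data[new_id]
```

In this sketch the caller would still need to write the new ID back into the file at the new path.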

This approach would allow metadata to "follow" moved or copied files as well as it does now, within a local filesystem. It wouldn't work when sending a file between computers, but the IDE cannot operate on one file in isolation anyway; users already need to import/export a project, in which case we could include the metadata in the project file.

Side-note: Comment type

If we want metadata to be ignored by the parser without the parser needing to recognize it specifically, a doc comment (starting with ##) is the wrong kind of comment. During translation we do some work to assemble a doc comment into an abstracted text string, and then we place it in the IR; in the future we are likely to introduce a warning for unused documentation. Plain comments are more "ignored"--the parser represents them exactly, and does nothing else with them.

JaroslavTulach · 2024-11-05T04:10:27Z

External metadata

I'd rather move forward in smaller steps, e.g. versioning and turning the section into a comment to begin with, and any other changes later.

"If we want metadata to be ignored by the parser without the parser needing to recognize it specifically, a doc comment (starting with ##) is the wrong kind of comment."

OK, so what do you suggest? Is:

#*** META-DATA 2.0 ***#
  [json1...]
  [json2...]

better?

farmaazon · 2024-11-05T09:42:30Z

My twopenny:

Transparency - the metadata is currently not readable, but we have talked about making it better, for example by identifying nodes by name and storing it in YAML format.
Portability - the proposed solution still breaks if someone sends their friend just a file, without a project.
Also, there is a problem with version control - the file metadata should also be versioned, and having it in a single file simplifies any action (moving, checkout, etc.).

Of course, we can solve those problems, but I personally don't see any efficiency improvements large enough to justify the effort.

JaroslavTulach · 2024-12-03T10:19:46Z

@jdunkerley told me that there is a problem:

use the latest development IDE and save a project
send this project to someone using the last release IDE
the person will not open the project
the latest release is not able to parse the file due to metadata changes

My 2 Kč fix:

necessary - turn the stored metadata into a comment so that the last release IDE ignores it and does not try to parse it as metadata, as that would fail
future proof - add a version to the metadata, and if it is newer than expected (produced by some future IDE), don't read it at all
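
The "future proof" rule could be as small as the sketch below. `SUPPORTED_VERSION` and `load_metadata` are hypothetical names, and the version is assumed to be a plain integer.

```python
SUPPORTED_VERSION = 1  # hypothetical: highest version this IDE build understands


def load_metadata(version: int, raw_metadata):
    """Return the metadata if this build can read it, else None.

    Returning None means the file is opened as if it had no metadata,
    instead of failing to open the project.
    """
    if version > SUPPORTED_VERSION:
        return None
    return raw_metadata
```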

CCing @kazcw, @hubertp.

farmaazon · 2024-12-03T15:23:40Z

@jdunkerley told me that there is a problem:

use the latest development IDE and save a project

send this project to someone using the last release IDE

the person will not open the project

the latest release is not able to parse the file due to metadata changes

A particular case of this problem is tracked by #11742

JaroslavTulach · 2024-12-06T03:29:32Z

Is there anything to be done during the metadata change so that future changes do not break the ability of an old IDE to read the metadata at least partially, as "Opening workflow created in 2024.5.1-rc2 in 2024.4.2 breaks layout" (#11742) demands?

Would removing the metadata handling from the Rust parser side and letting the Ydoc server do all such work help? What if Ydoc server uses proper JSON.parse, could it ignore newer fields? Do we need semantic versioning for the metadata version then? E.g. use a versioning scheme that is able to identify a major metadata format change as well as a minor one (adding a new field to an existing structure)?

farmaazon · 2024-12-06T10:49:30Z

Would removing the metadata handling from the Rust parser side and letting the Ydoc server do all such work help?

It's exactly as it's done now AFAIK.

What if Ydoc server uses proper JSON.parse, could it ignore newer fields?

It actually does. The problem with #11742 was not that new fields were added, but rather that some old field was removed, which affects the older version of the Ydoc server. So I was able to make a fix on develop: a PR is incoming.
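
To illustrate both directions of that problem, here is a sketch of a tolerant reader: unknown keys are dropped and missing keys fall back to defaults. It is written in Python rather than the Ydoc server's own code, and the field names in `IdeMetadata` are made up for the example.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class IdeMetadata:
    # Hypothetical field names; every field has a default so that a key
    # missing from the file is not fatal.
    nodes: dict = field(default_factory=dict)
    snapshot: str = ""


def parse_tolerant(text: str) -> IdeMetadata:
    raw = json.loads(text)
    known = {f.name for f in fields(IdeMetadata)}
    # Keys this build does not know (added by a newer IDE) are dropped;
    # keys a newer IDE stopped writing fall back to the defaults above.
    return IdeMetadata(**{k: v for k, v in raw.items() if k in known})
```

Ignoring new fields comes for free with most JSON parsers; tolerating removed fields only works if the older reader was written with defaults from the start.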

Do we need semantic versioning for the metadata version then? E.g. use a versioning scheme that is able to identify a major metadata format change as well as minor (adding a new field to existing structure) metadata format change?

Yes, I think it is a good idea. I think we need to finally discuss all the versioning and compatibility topics after this release.
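
One possible shape for such a scheme, as a sketch only: a `major.minor` string where a major bump means "do not read" and a minor bump means "read, ignoring unknown fields". The names, the supported version, and the rule that any major mismatch is incompatible are all assumptions.

```python
SUPPORTED = (1, 2)  # hypothetical (major, minor) understood by this build


def parse_version(text: str):
    major, minor = text.split(".")
    return int(major), int(minor)


def compatibility(file_version, supported=SUPPORTED):
    """Classify a file's metadata version relative to this build."""
    major, minor = file_version
    if major != supported[0]:
        return "incompatible"  # breaking change: do not read the metadata
    if minor > supported[1]:
        return "partial"       # newer fields exist: read, ignore unknown ones
    return "full"
```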

4e6 added the -parser label Nov 4, 2024

4e6 self-assigned this Nov 4, 2024

github-project-automation bot added this to Issues Board Nov 4, 2024

github-project-automation bot moved this to ❓New in Issues Board Nov 4, 2024

JaroslavTulach added the -compiler label Nov 4, 2024

4e6 mentioned this issue Nov 4, 2024

Add compression to the metadata code snapshot #11470

Merged

3 tasks

JaroslavTulach mentioned this issue Dec 4, 2024

Opening workflow created in 2024.5.1-rc2 in 2024.4.2 breaks layout #11742

Closed
