Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow source modifications with minimal changes #207

Closed
fkirc opened this issue Oct 28, 2020 · 10 comments · Fixed by #252
Closed

Allow source modifications with minimal changes #207

fkirc opened this issue Oct 28, 2020 · 10 comments · Fixed by #252
Labels
enhancement New feature or request

Comments

@fkirc
Copy link

fkirc commented Oct 28, 2020

First of all, thank you for writing this yaml package, it works great 👍
The yaml package has become a critical part of attranslate: https://github.com/fkirc/attranslate

attranslate has the requirement of "whitespace and comment preservation".
This means that attranslate compares two different YML-trees, modifies values of scalar nodes, and then serialize a file again with as little changes as possible.

I was able to implement this behavior with the following recursive tree-walk: https://github.com/fkirc/attranslate/blob/master/src/file-formats/yaml/yaml-manipulation.ts

However, there are still some problems.
The preservation of comments and single-linebreaks works correctly, but the preservation of multi-linebreaks doesn't work.

Could you point me in a direction to preserve multi-linebreaks?

@fkirc fkirc mentioned this issue Oct 28, 2020
14 tasks
@eemeli
Copy link
Owner

eemeli commented Oct 29, 2020

Supporting that at the document level would require the data on multiple linebreaks to be retained somewhere, which it currently isn't. You may also find that preserving variable indent levels to require data that isn't currently retained.

The API tries to balance things between power and simplicity, and that does mean that some compromises are made; one such compromise is the expressibility of line breaks preceding a node, particularly in association with comment lines. If you would like for this API to change and thereby be able to retain additional information, I think you ought to propose what the alternative API ought to look like, and how it would work?

@fkirc
Copy link
Author

fkirc commented Oct 29, 2020

I understand that such features should not lead to an excessive complication of parser-internals, therefore I will try to keep it simple.

Right now, I am doing the following to preserve comments and single-linebreaks (this code-snippet is taken from https://github.com/fkirc/attranslate/blob/master/src/file-formats/yaml/yaml-manipulation.ts):

  if (context.oldTargetNode?.comment) {
    node.comment = context.oldTargetNode.comment;
  }
  if (context.oldTargetNode?.commentBefore) {
    node.commentBefore = context.oldTargetNode.commentBefore;
  }
  if (context.oldTargetNode?.spaceBefore) {
    node.spaceBefore = context.oldTargetNode.spaceBefore;
  }

As we can see, this is a selective transfer of properties from one Node to another Node (during a recursive double-tree-walk).

Because of this working snippet, I am thinking about augmenting Node with additional properties:

  • Preceding whitespace lines and comments (could be a string-array)
  • Variable indent levels, as long as it doesn't break YML-syntax

@fkirc
Copy link
Author

fkirc commented Oct 29, 2020

One more sidenote:

I also experimented with transferring CST-nodes from one Node to another Node:

  if (context.oldTargetNode?.cstNode) {
     node.cstNode = context.oldTargetNode.cstNode;
  }

However, this didn't have any observable effects during my experiments.
Nevertheless, I would be willing to re-assign CST-nodes if it can help to preserve multi-linebreaks.

@eemeli
Copy link
Owner

eemeli commented Oct 30, 2020

The CST nodes are not used by any of the library internals once the AST structure has been parsed; they're primarily meant for users that need to access the actual source of where a particular node came from. Therefore modifying them won't affect the output in any way.

When proposing your new API for retaining all data on whitespace & comments, please keep in mind that a completely empty line and an empty comment line are different.

@fkirc
Copy link
Author

fkirc commented Oct 30, 2020

The CST nodes are not used by any of the library internals once the AST structure has been parsed; they're primarily meant for users that need to access the actual source of where a particular node came from. Therefore modifying them won't affect the output in any way.

Thanks for the explanation, so it seems that I have reached the maximum preservation with the current documents-API.

When proposing your new API for retaining all data on whitespace & comments, please keep in mind that a completely empty line and an empty comment line are different.

I would try to keep the API as lean as possible.
Perhaps it would be sufficient to have a string-array previousLines[] for each document-node, and those previous lines could be either empty lines, blank lines, empty comments or non-empty comments.

@eemeli
Copy link
Owner

eemeli commented Feb 21, 2021

Given that I've not heard from any other users a desire for a more powerful empty-line API, I'm rather tempted to close this as wontfix.

@fkirc
Copy link
Author

fkirc commented Feb 21, 2021

Whitespace preservation is probably not the highest priority right now, but I am sure that it would be a nice feature for future updates.
It would be beneficial for all users that aim to modify an existing YAML-file with as little changes as possible.

@pjeby
Copy link

pjeby commented Mar 20, 2021

Minimal modification is an important feature for working with automated changes to markdown front matter. I recently wrote a plugin for Obsidian.md that needs to modify strings and arrays in front matter, but even when only changing the value of strings, yaml@2 removes blank lines, removes most whitespace before comments, and 2.0.0-4 sometimes moves what were once line-end comments to their own lines as well. These kinds of changes are problematic since it's affecting the human-readable parts of the document.

For example consider this trivial bit of YAML:

thing1: something          # this is where you put thing1
thing2: something more     # thing2 has more though
thing3: whatever

Round-tripping this document, even if you only change thing3 or add thing4, will strip out all the whitespace that aligns the comments on the thing1 and thing2 lines. And if you add in blank lines to separate sections, they will be reduced to at most one blank line. And so on.

There is very much a use case for minimal edits to existing YAML: any tool for making automated changes to human-generated, human-edited YAML needs it in order not to globally destroy a user's intentional formatting of the original document.

That being said, this package's 2.x does a better job than most (though 2.0.0-4 seems a slight regression compared to 2.0.0-3, with respect to comment positioning).

Also, I imagine it's possible that there could be some solution I've overlooked to perform more minimal edits by using positional information from the parsed document? (But I haven't looked at this in any detail since the 2.0.0-4 refactoring.)

@eemeli
Copy link
Owner

eemeli commented Mar 20, 2021

Operating on the CST rather than the AST is probably the only way to guarantee minimal changes to the source file. I agree that the comment association is sub-optimal, but even if it were perfect, issues would still remain with the simplifications that are made in the AST composition (e.g. losing multi-space indents on comments).

To that end, the round-tripping use case is entirely valid, and should have a decent API at least for applying small changes to a YAML source document.

@eemeli eemeli changed the title Whitespace Preservation - Usage for attranslate Allow source modifications with minimal changes Mar 20, 2021
@eemeli eemeli added the enhancement New feature or request label Mar 20, 2021
@eemeli eemeli mentioned this issue Apr 3, 2021
2 tasks
@eemeli
Copy link
Owner

eemeli commented Apr 3, 2021

PR #252 adds a bunch of tools for working with the CST, which will hopefully answer at least some of the desires expressed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants