Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use toml_edit's to_string_in_original_order() #312

Merged

Conversation

alsuren
Copy link
Contributor

@alsuren alsuren commented Jun 15, 2019

uses toml_edit from this PR: toml-rs/toml#63

solves #218 pretty convincingly

previously it was:

1457 (2%) also did something else (mostly moving sections of the file around, but ~10 also made whitespace changes, like removing spaces from section headers)

Now it is:

13 (0.02%) also did something else (mostly removing whitespace from section headers but a handful also fixed unix/windows line endings that I failed to clean up properly before starting).

@alsuren alsuren mentioned this pull request Jun 15, 2019
11 tasks
@alsuren alsuren changed the title PoC to use toml_edit's to_string_in_original_order() WIP: PoC to use toml_edit's to_string_in_original_order() Jun 15, 2019
bors bot added a commit to toml-rs/toml that referenced this pull request Jul 1, 2019
63: Avoid reordering tables r=ordian a=alsuren

Adds Document::to_string_in_original_order()

This is intended to help with killercup/cargo-edit#218
    
We save .position = Some(i) on each Table, starting with i = 0 for
the root of the Document, and then incrementing each time we see a
table header.

When serialising, ~we start looking for a Table with position 0,
which we should find at the root of the document. We then continue
scanning, looking for position 1 and so on. If the document doesn't
need re-ordering then we will on need to do a single scan.~ we start by 
doing a walk over the tree and working out which positions exist
(storing them in a Vec and then sorting them). We then do subsequent
walks to find each of the known positions in turn. Each walk of the tree
is guaranteed to yield at least one table, but will probably yield more.
If the document doesn't need re-ordering then we will on need to do a single scan.

If a Table has been added using the API rather than parsed then it
will have position = None. We only want to serialise each Table
once, so we only print tables with position = None if the table
we visited immediately before it had the right position.

If we didn't manage to serialise all Tables after a single loop
then we keep looping until we are done. ~There is code to make sure
we don't get stuck when someone causes a gap in the numbers by
deleting a Table.~ It is impossible for the loop to get stuck, because
it only searches for position values that it already knows are in the tree.

Known issues:
- [x] ~If you have created your Document by parsing two .toml files and
  merging the results together, this method may silently skip some
  tables. Please use Document.to_string() instead, which doesn't have
  this problem.~ fixed
- [ ] The best case performance of this function is a bit worse than
  Document.to_string(). It will be significantly worse if you have lots of
  tables that are in strange orders.


Also adds `pretty_assertions` in some tests and color-backtrace (as separate commits) because I found debugging test failures more tricky without them.

Things to do before merging:
- [x] make a version of cargo-edit which uses this patch, and make sure that it actually works on my test samples at https://github.com/alsuren/rust-repos/tree/cargo.toml-samples
  - see killercup/cargo-edit#312
  - went from 1457 funky files down to 13 (and I think that some of them are because I didn't clean up the unix/windows line endings correctly)
- [x] Write a few more integration tests, describing interesting edge cases from the repo. For example:
  - [x] An example with trailing comments after the end of the document.
  - [x] An example where `[package.metadata.deb]` is at the bottom of the file and it stays there.
    * The table_reordering test already does this.
  - [x] A test with the following document, where you add `[dependencies.somethingelse]` and it is added after `[dependencies.opencl]` rather than before `[[example]]`.
```
[package]
[dependencies]
[[example]]
[dependencies.opencl]
[dev-dependencies]
```
- [x] Is the API okay? Display.to_string() seems to swallow the error case (possibly by panicking). Should we do the same? (answer: yes)
- [x] (edit: added a commit to use this new algorithm) Is it enough to document the issue where it can silently skip tables that are parsed from multiple files, or should I check for it? If I want to fix it properly then the best way to do it is to:
  * walk the tree once, gathering all of the known `position` numbers unto a vec
  * Sort the positions vec
  * Walk the tree and pop off items from the positions vec as I go (rather than incrementing an integer)
  * Stop looping when the vec is empty.
  * This is probably simpler than my current implementation, but forces me to allocate another vec, and then do at least two passes of the tree, even if the tables are in the correct order already.
- [x] update the limitations section of the readme to explain that you should use Document::to_string_in_original_order() if you care about not forcing nested tables to be grouped next to their parents.
  - [ ] Once this PR is merged, make another PR to point `https://github.com/ordian/toml_edit/blob/f09bd5d075fdb7d2ef8d9bb3270a34506c276753/tests/test_valid.rs#L84` at the updated test.

Co-authored-by: David Laban <[email protected]>
Cargo.toml Outdated Show resolved Hide resolved
@alsuren alsuren force-pushed the toml_edit-with-to_string_in_original_order branch from c091a9f to 0b89f82 Compare July 19, 2019 21:17
@alsuren alsuren force-pushed the toml_edit-with-to_string_in_original_order branch from 0b89f82 to e25f246 Compare July 19, 2019 21:22
@alsuren alsuren changed the title WIP: PoC to use toml_edit's to_string_in_original_order() Use toml_edit's to_string_in_original_order() Jul 19, 2019
@alsuren
Copy link
Contributor Author

alsuren commented Jul 20, 2019

I have rebased on top of master, and double-checked that it transforms my rust-repos dataset in the same way as the original PoC (everything is identical). I think it's ready for merging as-is:

cargo add and friends probably want to default to using to_string_in_original_order() (if a tool mangles your file on your first attempt to use it then you're more likely to drop it on the floor than go hunting in the docs for a --dont-mangle-my-shit flag).

I thought about adding a --group-nested-toml-sections flag to restore the old behaviour, but I think that this would add more complexity than it's worth. Anyone who relies on this behaviour should probably be using a dedicated tidying tool like toml-fmt instead.

@ordian
Copy link
Collaborator

ordian commented Jul 20, 2019

Awesome, thank you!

bors r+

bors bot added a commit that referenced this pull request Jul 20, 2019
312: Use toml_edit's to_string_in_original_order() r=ordian a=alsuren

uses toml_edit from this PR: toml-rs/toml#63

solves #218 pretty convincingly 

previously it was:
> 1457 (2%) also did something else (mostly moving sections of the file around, but ~10 also made whitespace changes, like removing spaces from section headers)

Now it is:
> 13 (0.02%) also did something else (mostly removing whitespace from section headers but a handful also fixed unix/windows line endings that I failed to clean up properly before starting).

Co-authored-by: David Laban <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jul 20, 2019

@bors bors bot merged commit e25f246 into killercup:master Jul 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants