Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sorting nested results by system time #2361

Closed
wants to merge 7 commits into from
Closed

fix: sorting nested results by system time #2361

wants to merge 7 commits into from

Conversation

nguyenvukhang
Copy link
Contributor

Bug is aptly described in #2243.

This PR aims to address this by delaying result sorting to after every match has been found.

@nguyenvukhang nguyenvukhang marked this pull request as ready for review November 28, 2022 07:11
@nguyenvukhang nguyenvukhang marked this pull request as draft November 28, 2022 07:11
@nguyenvukhang nguyenvukhang changed the title fix: nested results sorted wrongly fix: sorting nested results by system time Nov 28, 2022
Copy link
Owner

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work but yeah, there is a fair bit of work left here. The two big items:

  1. It is a hard requirement that if --sort is not given, then there should be no additional overhead here. But in this PR, it looks like you always collect all of the paths before starting a search. Today, ripgrep will start searching well before it finishes directory traversal.
  2. There are a number of places where errors are being swallowed. We can't do that.

I'm also not sure about moving sorting to the main functions. I think it's probably fine, although I was trying to keep the main functions as simple as I could. But if we are going to deal with sorting in main.rs, then I think it would be nice to add a short comment in the parallel searcher stating why it doesn't handle sorting. (Because if sorting is enabled, then parallel searching is disabled in args.rs.)

crates/core/args.rs Outdated Show resolved Hide resolved
crates/core/main.rs Show resolved Hide resolved
crates/core/main.rs Show resolved Hide resolved
@nguyenvukhang
Copy link
Contributor Author

  1. It is a hard requirement that if --sort is not given, then there should be no additional overhead here. But in this PR, it looks like you always collect all of the paths before starting a search. Today, ripgrep will start searching well before it finishes directory traversal.

Yep. Fixed it. Top-level search() and files() functions will now specifically check if a sort that requires calls to stat is required.

The latest implementation using new functions continue_search() and continue_files() is a result of trying to coerce the types of the two possible states of preliminary search results:

  1. uncollected and unconsumed iterator. (no sort)
  2. collected, sorted, and re-made into an iterator.

I'd prefer to stick to having just one function for search and files each, though I'm not sure if that's possible.

  1. There are a number of places where errors are being swallowed. We can't do that.

Agreed. I noticed that the previously reviewed code swallowed the Result<Walk> that args.walker()? emitted, and that has been fixed.

#2361 (comment) has also been addressed.

@BurntSushi BurntSushi added the rollup A PR that has been merged with many others in a rollup. label Jul 9, 2023
BurntSushi pushed a commit that referenced this pull request Jul 9, 2023
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
BurntSushi pushed a commit that referenced this pull request Jul 9, 2023
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
@BurntSushi
Copy link
Owner

You're all set. I already did the fixups needed to get this PR into shape. You can see it in #2555.

BurntSushi pushed a commit that referenced this pull request Jul 9, 2023
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
BurntSushi pushed a commit that referenced this pull request Jul 9, 2023
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
@nguyenvukhang
Copy link
Contributor Author

I see. Thanks for patching up my tests and adding delays so the stat calls are correct.

@BurntSushi BurntSushi closed this in 6abb962 Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rollup A PR that has been merged with many others in a rollup.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants