Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tsort refactoring proposal #5968

Merged
merged 2 commits into from
Oct 19, 2024
Merged

tsort refactoring proposal #5968

merged 2 commits into from
Oct 19, 2024

Conversation

anastygnome
Copy link
Contributor

@anastygnome anastygnome commented Feb 11, 2024

This pull request is an attempt to rework the current tsort implementation in order to allow for better performance.
I finally had some time to give it a shot after I mentionned it in #4931 with @tertsdiepraam

It also tries to mimic gnu tsort output.

Please share any suggestions.

@sylvestre
Copy link
Contributor

I think we should keep the gnu ordering. I value compatibility over speed

@anastygnome
Copy link
Contributor Author

anastygnome commented Feb 11, 2024

I think we should keep the gnu ordering. I value compatibility over speed

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

Keeping the same order forces us to implement a solution that's 4x slower.
The order in the provided solution is still mathematically correct.

If you switch to rust coreutils and you rely on implementation related details, you're in trouble anyway
So I don't believe it's a compatibility issue in and of itself

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Feb 11, 2024

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?

Also, I'd love to see some benchmarks on this using hyperfine. Counting memory copies is a nice approximation, but actual measurements are even more helpful :)

Edit: I've done a measurement:

Benchmark 1: ./coreutils-main tsort in.txt
  Time (mean ± σ):      2.428 s ±  0.100 s    [User: 2.355 s, System: 0.066 s]
  Range (min … max):    2.334 s …  2.684 s    10 runs
 
Benchmark 2: ./coreutils-new tsort in.txt
  Time (mean ± σ):      2.658 s ±  0.092 s    [User: 2.589 s, System: 0.063 s]
  Range (min … max):    2.562 s …  2.824 s    10 runs
 
Benchmark 3: tsort in.txt
  Time (mean ± σ):     517.4 ms ±   5.6 ms    [User: 501.2 ms, System: 12.6 ms]
  Range (min … max):   511.5 ms … 528.9 ms    10 runs
 
Summary
  tsort in.txt ran
    4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
    5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt

Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.

@anastygnome
Copy link
Contributor Author

anastygnome commented Feb 11, 2024

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?

well, the test case with the dependencies is such an example. My implementation sorts the nodes alphabetically before running tsort.

 Benchmark 1: ./coreutils-main tsort in.txt
   Time (mean ± σ):      2.428 s ±  0.100 s    [User: 2.355 s, System: 0.066 s]
 Range (min … max):    2.334 s …  2.684 s    10 runs 
 Benchmark 2: ./coreutils-new tsort in.txt
  Time (mean ± σ):      2.658 s ±  0.092 s    [User: 2.589 s, System: 0.063 s]
   Range (min … max):    2.562 s …  2.824 s    10 runs

 Benchmark 3: tsort in.txt
   Time (mean ± σ):     517.4 ms ±   5.6 ms    [User: 501.2 ms, System: 12.6 ms]
   Range (min … max):   511.5 ms … 528.9 ms    10 runs
  
 Summary
   tsort in.txt ran
     4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
     5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt

Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.

this is still a WIP. I'm trying to get better performance indeed.
Can you share the in.txt file so that we have a common bench?

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Feb 11, 2024

Sure, I've generated the input with this:

import random

N = 10000
for i in range(100*N):
    a = random.randint(0,N)
    b = random.randint(0,N)
    if a != b:
        print(f"{min(a, b)} {max(a, b)}")

@anastygnome
Copy link
Contributor Author

@tertsdiepraam
I hope you're both doing well. I've been quite busy lately but have managed to make significant progress on the refactor I've been working on.

The code still passes all tests and shows a substantial performance improvement, being up to 3 times faster than the current implementation in almost all cases I have tested. I didn't have the time to document it sadly, put simply, I've used another algorithm that leverages the tree nature of the graph in a better way, ensuring less copies and sorting.

I believe we can potentially achieve even greater speed improvements by switching from a BTree to a Red-Black tree.

I am currently unable to continue this PR, so I hope someone can reuse my work :)

@anastygnome anastygnome force-pushed the tsort branch 7 times, most recently from 8c9306a to d8b784b Compare June 23, 2024 09:53
@tertsdiepraam
Copy link
Member

Remind me on Tuesday! I'd love to take a look then

@anastygnome anastygnome force-pushed the tsort branch 4 times, most recently from 2bcf91b to 76abc9d Compare June 24, 2024 20:40
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@anastygnome
Copy link
Contributor Author

anastygnome commented Jun 25, 2024

Remind me on Tuesday! I'd love to take a look then
@tertsdiepraam
There you go.

For reference, the proposed code in the first commit is approximately 1.5 times slower than GNU tsort.
(still a 2x improvement)

Benchmark 1: target/release/coreutils tsort terts_10000_seed0
  Time (mean ± σ):     763.7 ms ±  12.9 ms    [User: 736.1 ms, System: 22.8 ms]
  Range (min … max):   742.4 ms … 781.6 ms    10 runs
 
Benchmark 2: tsort terts_10000_seed0
  Time (mean ± σ):     509.4 ms ±   4.2 ms    [User: 485.3 ms, System: 21.1 ms]
  Range (min … max):   503.0 ms … 514.9 ms    10 runs
 
Summary
  tsort terts_10000_seed0 ran
    1.50 ± 0.03 times faster than target/release/coreutils tsort terts_10000_seed0

The last commit is a work in progress and serves as a reference to illustrate that the performance cost is mainly driven by the data structure used for nodes. Switching to a hashmap and some manual sort results in a topological order very close to GNU tsort (though it still fails in some cases).

Benchmark 1: target/release/coreutils tsort terts_10000_seed0
  Time (mean ± σ):     222.5 ms ±   4.5 ms    [User: 202.3 ms, System: 19.0 ms]
  Range (min … max):   215.8 ms … 229.7 ms    13 runs
 
Benchmark 2: tsort terts_10000_seed0
  Time (mean ± σ):     511.4 ms ±   6.5 ms    [User: 490.1 ms, System: 19.3 ms]
  Range (min … max):   500.5 ms … 519.5 ms    10 runs
 
Summary
  target/release/coreutils tsort terts_10000_seed0 ran
    2.30 ± 0.05 times faster than tsort terts_10000_seed0

The proposed implementation uses Knuth's Algorithm T (TAOCP vol. 1) for topological sorting.

This algorithm works by first listing all nodes and counting how many nodes each one depends on (its predecessors). nodes with no predecessors are added to a queue. We then take the node at the front of the queue, remove it from the graph and append it to the result. Any nodes depending on it have their predecessor count reduced.

If any of those nodes now have no predecessors, they are added at the back of the queue. This continues until the whole queue is processed. If any nodes remain with predecessors at the end, it means there's a cycle, which we find with a DFS on the remaining subgraph.

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I think you actually settled on something very close to the current version on main, but with a much better implementation. Most of my comments are easy to fix I think. The compat is interesting. I bet we can fix this to be more compatible, but we'd have to investigate more.

src/uu/tsort/src/tsort.rs Outdated Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Outdated Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Outdated Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Show resolved Hide resolved
src/uu/tsort/src/tsort.rs Outdated Show resolved Hide resolved
@anastygnome anastygnome marked this pull request as ready for review June 28, 2024 17:42
@anastygnome anastygnome force-pushed the tsort branch 3 times, most recently from 516ef13 to 42f102e Compare June 29, 2024 06:56
@anastygnome
Copy link
Contributor Author

@tertsdiepraam ding dong!

@tertsdiepraam
Copy link
Member

Sorry I don't have access to a computer this week. Ill try to look at it soonish

Copy link

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@anastygnome anastygnome force-pushed the tsort branch 2 times, most recently from ffbdf45 to f2f15d7 Compare September 24, 2024 18:28
@anastygnome
Copy link
Contributor Author

anastygnome commented Sep 24, 2024

@tertsdiepraam Hey there, I've corrected typos & clarified the comments. I'm also working on testing replacing strings with IDs, but that will be for another PR.

Copy link

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@anastygnome
Copy link
Contributor Author

@sylvestre Hey there, I would also like your input on this one, given the benchmarks,
I fear the output of tsort are very structure dependent and it would require a rewrite of the tree as is.
I believe @tertsdiepraam is correct and it should be merged as is.

There is still work to be done on tsort, but I believe this one makes it at least good enough for now.

Copy link

github-actions bot commented Oct 5, 2024

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@sylvestre
Copy link
Contributor

If Terts is happy, I am happy. Could you please add your benchmark in a BENCHMARKING.md file?
See other for example

This commit is a complete rewrite of the tsort util, resulting in a x10 performance improvement
BREAKING CHANGE: the order is no longer the same as gnu tsort in some fringe cases.
@anastygnome anastygnome force-pushed the tsort branch 2 times, most recently from 3adbf08 to a7633ec Compare October 19, 2024 13:07
@anastygnome
Copy link
Contributor Author

Hello @sylvestre, I have added the benchmarking routine in the md file.

@sylvestre sylvestre merged commit 9a6f552 into uutils:main Oct 19, 2024
68 checks passed
@sylvestre
Copy link
Contributor

Thanks! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants