tsort: Switch to BTreeMap and BTreeSet #4931

Merged
merged 2 commits into from
Jun 5, 2023

@cazou cazou commented Jun 2, 2023

Using HashMap and HashSet gives a valid topological sort, but the output changes randomly from run to run.

BTree-based structures guarantee that the output is always ordered the same way.
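As a rough illustration of the difference (a sketch only, not the code touched by this PR; the node names and helper functions are made up for the example), iterating a `BTreeMap` always yields keys in sorted order, while a `HashMap` built with the default hasher is randomly seeded and may yield a different order on each run:

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Hypothetical helper: builds an adjacency map with BTree containers and
/// returns the node names in iteration order, which is always sorted.
fn btree_key_order(edges: &[(&str, &str)]) -> Vec<String> {
    let mut graph: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().insert(to);
        // Make sure nodes that only appear as targets are present too.
        graph.entry(to).or_default();
    }
    graph.keys().map(|k| k.to_string()).collect()
}

/// Hypothetical helper: same idea with HashMap. The iteration order depends
/// on the randomly seeded default hasher, so it is not stable across runs.
fn hash_key_order(edges: &[(&str, &str)]) -> Vec<String> {
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().push(to);
        graph.entry(to).or_default();
    }
    graph.keys().map(|k| k.to_string()).collect()
}

fn main() {
    // Made-up node names, for illustration only.
    let edges = [("udev", "lvm2"), ("busybox", "udev"), ("kmod", "udev")];
    // Always prints: BTreeMap: ["busybox", "kmod", "lvm2", "udev"]
    println!("BTreeMap: {:?}", btree_key_order(&edges));
    // Order may differ between runs.
    println!("HashMap:  {:?}", hash_key_order(&edges));
}
```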

This also makes the output similar to the output of the C version of the tools, on which some applications rely.

One example of an application that relies on this is the initramfs-update program on Debian, which uses tsort to determine the order in which hooks are executed. The floating nodes still need to be executed alphabetically to avoid ending up with the busybox version of some tools in the initramfs instead of the full versions. See this issue on Apertis.

Although the issue could be fixed in initramfs-update, other tools and programs could rely on this ordered output, and a stable output for a given input makes sense.
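To make the idea concrete, here is a minimal sketch of a topological sort (Kahn's algorithm) that keeps every container BTree-based, so nodes that become ready at the same time are emitted alphabetically. This is an assumption-laden illustration, not the uutils implementation: `topo_sort` and the hook names are hypothetical, and the exact order is not claimed to match what the C version prints.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: returns a deterministic topological order,
/// or None if the input contains a cycle.
fn topo_sort(edges: &[(&str, &str)]) -> Option<Vec<String>> {
    // successors[n] holds the nodes that must come after n.
    let mut successors: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    let mut in_degree: BTreeMap<&str, usize> = BTreeMap::new();
    for &(from, to) in edges {
        in_degree.entry(from).or_insert(0);
        in_degree.entry(to).or_insert(0);
        // A pair of identical names only declares the node; duplicate
        // edges are counted once because insert() returns false for them.
        if from != to && successors.entry(from).or_default().insert(to) {
            *in_degree.get_mut(to).unwrap() += 1;
        }
    }

    // Nodes without predecessors, kept sorted by the BTreeSet.
    let mut ready: BTreeSet<&str> = BTreeSet::new();
    for (&node, &degree) in &in_degree {
        if degree == 0 {
            ready.insert(node);
        }
    }

    let mut order = Vec::new();
    // pop_first() always takes the alphabetically smallest ready node.
    while let Some(node) = ready.pop_first() {
        order.push(node.to_string());
        if let Some(next) = successors.get(node) {
            for &succ in next {
                let degree = in_degree.get_mut(succ).unwrap();
                *degree -= 1;
                if *degree == 0 {
                    ready.insert(succ);
                }
            }
        }
    }

    // Any node left unprocessed is part of a cycle.
    if order.len() == in_degree.len() {
        Some(order)
    } else {
        None
    }
}

fn main() {
    // Made-up hook names; "x x" pairs declare floating nodes.
    let edges = [
        ("busybox", "udev"),
        ("kmod", "udev"),
        ("udev", "lvm2"),
        ("zz-hook", "zz-hook"),
        ("aa-hook", "aa-hook"),
    ];
    println!("{:?}", topo_sort(&edges));
}
```

Because nothing here depends on hash seeds, running it twice on the same input gives the same order.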

@sylvestre
Contributor

oh, super interesting.
could you please add a test to make sure we don't regress in the future ? thanks

@tertsdiepraam
Member

tertsdiepraam commented Jun 3, 2023

While I think we should merge this because correctness is more important than speed, I was curious about the performance difference. This change seems to make tsort twice as slow[1], which puts us at 4 times slower than GNU instead of 2 times. I bet some people would be interested in figuring out how to speed this up if we open an issue for it.

[1]: On a test case generated with this Python script:

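import random

# Node labels are drawn from 0..N inclusive.
N = 10000
# Emit up to 100*N random edges, one "from to" pair per line.
for i in range(100*N):
    a = random.randint(0,N)
    b = random.randint(0,N)
    # Skip self-pairs; printing the smaller label first keeps the graph acyclic.
    if a != b:
        print(f"{min(a, b)} {max(a, b)}")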
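For anyone who wants to poke at the container cost in isolation, the following is a rough micro-benchmark sketch, not the measurement reported above. It generates edges the same way as the script (using a small xorshift generator as a stand-in for Python's `random`, since the standard library has no random number generator) and times only the construction of the adjacency structure. All function names are hypothetical, and the timings will vary by machine and say nothing definitive about tsort as a whole.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};
use std::time::Instant;

/// Simple xorshift step; the state must be non-zero.
fn next_rand(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Generates up to `count` edges with labels in 0..=n, smaller label first,
/// mirroring the structure of the Python script.
fn random_edges(n: u64, count: usize, seed: u64) -> Vec<(u64, u64)> {
    let mut state = seed;
    let mut edges = Vec::with_capacity(count);
    for _ in 0..count {
        let a = next_rand(&mut state) % (n + 1);
        let b = next_rand(&mut state) % (n + 1);
        if a != b {
            edges.push((a.min(b), a.max(b)));
        }
    }
    edges
}

/// Builds the graph with hash containers and returns the unique edge count.
fn count_edges_hash(edges: &[(u64, u64)]) -> usize {
    let mut graph: HashMap<u64, HashSet<u64>> = HashMap::new();
    for &(a, b) in edges {
        graph.entry(a).or_default().insert(b);
    }
    graph.values().map(|s| s.len()).sum()
}

/// Builds the graph with BTree containers and returns the unique edge count.
fn count_edges_btree(edges: &[(u64, u64)]) -> usize {
    let mut graph: BTreeMap<u64, BTreeSet<u64>> = BTreeMap::new();
    for &(a, b) in edges {
        graph.entry(a).or_default().insert(b);
    }
    graph.values().map(|s| s.len()).sum()
}

fn main() {
    let edges = random_edges(10_000, 1_000_000, 42);

    let start = Instant::now();
    let hash_count = count_edges_hash(&edges);
    println!("HashMap/HashSet:   {} edges in {:?}", hash_count, start.elapsed());

    let start = Instant::now();
    let btree_count = count_edges_btree(&edges);
    println!("BTreeMap/BTreeSet: {} edges in {:?}", btree_count, start.elapsed());
}
```

Build with optimizations (for example `rustc -O`) before drawing any conclusions; debug builds distort this kind of comparison.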

@tertsdiepraam tertsdiepraam changed the title from "tsort: Switch to BTreeHash and BTreeSet" to "tsort: Switch to BTreeMap and BTreeSet" on Jun 3, 2023
Contributor

@sylvestre sylvestre left a comment

needs a test

@anastygnome
Contributor

anastygnome commented Jun 5, 2023

Although this PR fixes the correctness issue, wouldn't it be better to use a graph library such as petgraph (MIT-licensed), which supports topological sort?

Otherwise, isn't there a linked hash set crate or data structure in Rust? That way, the ordering would be constant.
Maybe we would get better performance than with a tree structure.
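For reference, the insertion-ordered idea can be sketched with the standard library alone by pairing a `HashSet` (for membership checks) with a `Vec` (for order). `InsertionOrderedSet` is a hypothetical type written for this illustration; crates such as indexmap provide more complete versions of the same concept. Note that the resulting order is stable across runs but follows the input order rather than alphabetical order, so whether it would satisfy the use case in the description is an open question.

```rust
use std::collections::HashSet;

/// Hypothetical minimal set that remembers insertion order.
struct InsertionOrderedSet {
    seen: HashSet<String>,
    order: Vec<String>,
}

impl InsertionOrderedSet {
    fn new() -> Self {
        InsertionOrderedSet {
            seen: HashSet::new(),
            order: Vec::new(),
        }
    }

    /// Inserts the item if it is new; returns true when it was added.
    fn insert(&mut self, item: &str) -> bool {
        if self.seen.insert(item.to_string()) {
            self.order.push(item.to_string());
            true
        } else {
            false
        }
    }

    /// Items in the order they were first inserted.
    fn as_slice(&self) -> &[String] {
        &self.order
    }
}

fn main() {
    let mut set = InsertionOrderedSet::new();
    // Made-up node names; the duplicate "udev" is ignored.
    for name in ["udev", "busybox", "udev", "kmod"] {
        set.insert(name);
    }
    // Prints: ["udev", "busybox", "kmod"]
    println!("{:?}", set.as_slice());
}
```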

@tertsdiepraam
Member

Maybe, but since this is a very small PR, we can accept this first and optimize later.

cazou added 2 commits June 5, 2023 11:06
Using HashMap and HashSet gives a valid topological sort, but the output
will change randomly at each run.

BTree based structures will guarantee that the output is always ordered
in the same way.

This also makes the output similar to the output of the C version of the
tools, on which some applications rely.
@cazou
Author

cazou commented Jun 5, 2023

Any idea why inotify-dir-recreate would fail ?

@sylvestre
Contributor

unrelated
we have an intermittent issue on this one :(

@uutils uutils deleted a comment from github-actions bot Jun 5, 2023
@sylvestre sylvestre merged commit 2804af2 into uutils:main Jun 5, 2023
4 participants