Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the REUSE.toml performance regression (+ do a whole lot of refactoring) #1047

Merged
merged 10 commits into from
Sep 26, 2024

Conversation

carmenbianca
Copy link
Member

@carmenbianca carmenbianca commented Jul 8, 2024

Fixes #1033

Relevant to fsfe/reuse-website#85

This is my attempt at fixing #1033. It necessitates a whole lot more shuffling things around than I had anticipated.

  • Added a change log entry in changelog.d/<directory>/.
  • Added self to copyright blurb of touched files.
  • Wrote tests.
  • My changes do not contradict
    the current specification.
  • I agree to license my contribution under the licenses indicated in the
    changed files.

@carmenbianca carmenbianca marked this pull request as draft July 8, 2024 15:28
@carmenbianca
Copy link
Member Author

carmenbianca commented Jul 8, 2024

I want to stop passing (root, include_submodules, include_meson_subprojects, vcs_strategy) everywhere. I'm going to see if I can turn that information into an interface of Project, so that I can just pass the object to global_licensing.

edit: I was not successful.

@carmenbianca
Copy link
Member Author

There is a light contradiction with the spec: REUSE.tomls ignored by VCS are ignored by the tool.

Hooray! This had always been bad design.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
This is kind of annoying, but it is necessary for some work I want to do
in global_licensing. Fortunately this was surprisingly easy; iter_files
is the first function ever written for REUSE, so I expected that it
would be more tangled up.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
Signed-off-by: Carmen Bianca BAKKER <[email protected]>
Previously, the glob **/REUSE.toml would search _all_ directories,
including big directories containing build artefacts that are otherwise
ignored by VCS. This commit uses the same logic to find REUSE.toml as
any other file is found. It's not super rapid, but it does a heap more
filtering.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
Signed-off-by: Carmen Bianca BAKKER <[email protected]>
In _main, find_global_licensing was called to find the file that
contained some parsing error. This may have contained false information
in the case of multiple REUSE.tomls.

Instead of needlessly calling this function, the errant file is now an
attribute on the exception.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
Previously this function would be called three times on a lint:

- once by NestedReuseTOML.find_reuse_tomls in Project.find_global_licensing
- once by NestedReuseTOML.find_reuse_tomls in  NestedReuseTOML.from_file
- once to lint all the files

I now no longer use NestedReuseTOML.from_file. It's still not ideal to
go over all files twice, but it's better than thrice.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
@carmenbianca carmenbianca force-pushed the performance-toml branch 2 times, most recently from fef5f20 to dccf077 Compare September 18, 2024 13:15
Signed-off-by: Carmen Bianca BAKKER <[email protected]>
This cleanup is the result of a rebase. It would be more effort to
incorporate these fixes into their original commits.

Signed-off-by: Carmen Bianca BAKKER <[email protected]>
@carmenbianca carmenbianca merged commit 40cb533 into fsfe:main Sep 26, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reuse 4 is slow
1 participant