Optimize performance when Git is used as storage repository #1121
Avoid calling `git log` with a wildcard `*` pattern
Thank you for the benchmarks, this looks amazing!
Thanks a lot @michielbdejong for the experiments and suggestions, I'm thrilled to see this land and am looking forward to the impact this will have on tracking performance across collections!
return this.recordVersion(terms, extractOnly);
await this.recordVersion(terms, extractOnly);

terms.sourceDocuments.forEach(sourceDocument => {
Why can't we just do `terms.sourceDocuments = []` or `terms.sourceDocuments = null`?
Because doing so would lose information needed for the next run, such as `location`, `contentSelectors`, …
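To illustrate the point above, here is a minimal sketch of clearing only the heavy payload of each source document while keeping its declaration. The helper name `releaseFetchedContent` and the field names `content` and `mimeType` are assumptions made for illustration; only `location` and `contentSelectors` are named in this thread.

```javascript
// Hypothetical helper, not the PR's actual code.
// Assumption: fetched data lives in `content` and `mimeType` on each source document.
function releaseFetchedContent(terms) {
  terms.sourceDocuments.forEach(sourceDocument => {
    // Drop only the fetched payload so it can be garbage-collected.
    sourceDocument.content = null;
    sourceDocument.mimeType = null;
  });

  return terms;
}

// Example declaration with one source document (illustrative values).
const terms = {
  sourceDocuments: [
    {
      location: 'https://example.com/tos',
      contentSelectors: 'main',
      content: '<html><main>Terms</main></html>',
      mimeType: 'text/html',
    },
  ],
};

releaseFetchedContent(terms);

// The declaration survives for the next run; only the payload is gone.
console.log(terms.sourceDocuments[0].location);
console.log(terms.sourceDocuments[0].content);
```

Replacing the whole array with `[]` or `null` would free the same memory but would also discard `location` and `contentSelectors`, which is the concern raised in the reply.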
Co-authored-by: Matti Schneider <[email protected]>
This PR implements several optimizations to improve performance and memory usage:
Technical improvements:
- `git log` commands with specific paths
- `commit-graph` for faster commit history traversal (fixes "Consider taking advantage of git commit-graph" #1101)

Performance benchmarks:
Testing was performed on two distinct datasets:
- ToS;DR Collection
- P2B Compliance Collection
The optimizations show greater impact on collections with larger numbers of declared services and terms.
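As a rough sketch of the two Git-side ideas in this description (path-specific `git log` and the commit-graph), the snippet below builds argument arrays that could be passed to any Git runner. The function name `buildLogArgs`, its options, and the `commitGraphSetup` list are hypothetical; the PR's actual wiring is not shown here. The Git options themselves (`core.commitGraph`, `gc.writeCommitGraph`, `git commit-graph write --reachable --changed-paths`) are standard Git features, with `--changed-paths` requiring a reasonably recent Git version.

```javascript
// Hypothetical helper: builds `git log` arguments restricted to explicit paths,
// rejecting a `*` wildcard pathspec that would force Git to match every file.
function buildLogArgs(paths, { maxCount } = {}) {
  if (!Array.isArray(paths) || paths.length === 0) {
    throw new Error('At least one explicit path is required');
  }

  if (paths.some(path => path.includes('*'))) {
    throw new Error('Wildcard pathspecs are not allowed');
  }

  const args = [ 'log', '--format=%H' ];

  if (maxCount) {
    args.push(`--max-count=${maxCount}`);
  }

  // `--` separates options from pathspecs.
  return [ ...args, '--', ...paths ];
}

// Assumed one-time setup: enable reading the commit-graph, keep it updated on gc,
// and write it with changed-path filters, which help path-limited history walks.
const commitGraphSetup = [
  [ 'config', 'core.commitGraph', 'true' ],
  [ 'config', 'gc.writeCommitGraph', 'true' ],
  [ 'commit-graph', 'write', '--reachable', '--changed-paths' ],
];

console.log(buildLogArgs([ 'Service/Terms of Service.md' ], { maxCount: 1 }).join(' '));
commitGraphSetup.forEach(args => console.log([ 'git', ...args ].join(' ')));
```

Returning argument arrays rather than a shell string avoids quoting problems with paths that contain spaces, which are common for service and terms names.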
Fixes #1101. Special thanks to @michielbdejong for reporting the issue, investigating, and providing valuable suggestions 🙏