Significant performance improvements, new scheduler #107
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

| | main | #107 | +/- |
|---|---:|---:|---:|
| Coverage | 99.12% | 99.21% | +0.09% |
| Files | 12 | 12 | |
| Lines | 1137 | 1271 | +134 |
| Branches | 113 | 132 | +19 |
| Hits | 1127 | 1261 | +134 |
| Misses | 5 | 5 | |
| Partials | 5 | 5 | |
After further performance testing of the collapsed-loop changes, it appears that performance may be worse for average use cases. Holding off on merging this until more data can be collected.
Division-by-multiplication was removed from collapsed loops and replaced with manual iteration. This avoids expensive operations such as division, modulo, and multiplication when computing loop indices on each iteration. The performance improvement is dramatic: well over 2x across the board for the different loop approaches.
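To illustrate the manual-iteration idea, here is a small Python sketch. It is not the project's code: the function names are hypothetical, and it assumes row-major ordering (the last index varies fastest). The first function recovers every multi-index from its flat index with `divmod`, which is the per-iteration cost being avoided. The second decomposes only the chunk's starting index once and then steps through the rest like an odometer, using only increments and comparisons.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def indices_by_division(shape: Sequence[int], start: int, stop: int) -> Iterator[Tuple[int, ...]]:
    """Recover each multi-index from its flat index using divmod on every iteration."""
    for flat in range(start, stop):
        idx = []
        for extent in reversed(shape):
            flat, r = divmod(flat, extent)
            idx.append(r)
        yield tuple(reversed(idx))


def indices_by_increment(shape: Sequence[int], start: int, stop: int) -> Iterator[Tuple[int, ...]]:
    """Decompose only the chunk's first flat index, then advance like an odometer."""
    if start >= stop:
        return
    # One-time decomposition of the chunk's starting position.
    idx = list(next(indices_by_division(shape, start, start + 1)))
    for _ in range(start, stop):
        yield tuple(idx)
        # Increment the innermost index and carry outward; no division or modulo.
        dim = len(shape) - 1
        while dim >= 0:
            idx[dim] += 1
            if idx[dim] < shape[dim]:
                break
            idx[dim] = 0
            dim -= 1


if __name__ == "__main__":
    shape = (3, 4, 5)
    expected = list(product(*(range(n) for n in shape)))[7:29]
    assert list(indices_by_increment(shape, 7, 29)) == expected
    print(list(indices_by_increment(shape, 7, 10)))  # [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
```

The odometer has to walk sequentially from the chunk's start, which suits chunked schedulers where each worker processes a contiguous range of flat indices.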
Optimizing high-dimension collapsed loops shouldn't be too difficult if there are requests for it, and it's certainly a far easier approach than previous iterations of the collapsed chunk executor. I'll open an issue and do this later.
Which issue are you addressing?
Significant performance improvements, new work-stealing scheduler.
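For context, the work-stealing scheduler mentioned above follows a well-known pattern, sketched here in Python. This illustrates the general technique and is not the PR's `WorkStealingScheduler`; `parallel_for_work_stealing` and its parameters are hypothetical. Each worker owns a deque of chunks and pops from its back. When its own deque is empty, it steals from the front of another worker's deque. Locks keep the sketch simple. Production schedulers typically use lock-free deques such as the Chase-Lev deque.

```python
import random
import threading
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Chunk = Tuple[int, int]  # half-open range [start, stop) of loop indices


def parallel_for_work_stealing(
    n: int,
    body: Callable[[int], None],
    num_workers: int = 4,
    chunk_size: int = 1,
) -> None:
    """Call body(i) once for every i in range(n) using per-worker deques with stealing."""
    if chunk_size < 1 or num_workers < 1:
        raise ValueError("chunk_size and num_workers must be >= 1")

    # Split the iteration space into chunks and deal them round-robin to workers.
    deques: List[Deque[Chunk]] = [deque() for _ in range(num_workers)]
    for k, start in enumerate(range(0, n, chunk_size)):
        deques[k % num_workers].append((start, min(start + chunk_size, n)))
    locks = [threading.Lock() for _ in range(num_workers)]

    def take(worker: int) -> Optional[Chunk]:
        # The owner pops from the back of its own deque.
        with locks[worker]:
            if deques[worker]:
                return deques[worker].pop()
        # Otherwise, try to steal from the front of other workers' deques.
        victims = [v for v in range(num_workers) if v != worker]
        random.shuffle(victims)
        for v in victims:
            with locks[v]:
                if deques[v]:
                    return deques[v].popleft()
        # No chunks are added after startup, so finding every deque empty means done.
        return None

    def run(worker: int) -> None:
        while (chunk := take(worker)) is not None:
            for i in range(*chunk):
                body(i)

    threads = [threading.Thread(target=run, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the iteration space is fixed up front, a worker that finds every deque empty can safely exit. In CPython the GIL limits real speedup for pure-Python loop bodies, so the sketch demonstrates only the scheduling logic.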
How have you addressed the issue?
This PR implements the `WorkStealingScheduler` class, which runs parallel for loops using a work-stealing scheduler. Much of the scheduling code has been heavily optimized, including a 40% improvement in one benchmark for `static` scheduling with `chunk_size=1`. Collapsed loops were also improved by incorporating division-by-multiplication; more testing is required here.
How have you tested your patch?
Unit tests have been written where necessary, and all unit tests pass.
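The PR description mentions division-by-multiplication for collapsed loops. This technique replaces `n // d` with a multiply and a shift, using a "magic" multiplier precomputed for a fixed divisor. A collapsed loop's extents stay fixed for the whole loop, so the multipliers could be computed once per loop and reused for every flat index. The sketch below is a hypothetical illustration, not the project's implementation, and it assumes unsigned 32-bit flat indices. It uses the simple round-up variant: shift `s = 32 + ceil(log2 d)` and multiplier `m = ceil(2**s / d)`. That multiplier always needs 33 bits. Python's integers absorb this, but C or C++ implementations usually use a refined variant to stay within machine words.

```python
from typing import Tuple


def magic_for(d: int, bits: int = 32) -> Tuple[int, int]:
    """Return (m, s) such that (n * m) >> s == n // d for all 0 <= n < 2**bits."""
    if d < 1:
        raise ValueError("divisor must be a positive integer")
    lg = (d - 1).bit_length()         # ceil(log2(d)) for d >= 1
    s = bits + lg                     # shift amount
    m = ((1 << s) + d - 1) // d       # ceil(2**s / d); m >= 2**bits, so wider than a bits-wide word
    # Why it works: m = (2**s + e) / d with 0 <= e < d, so
    # n * m / 2**s = n / d + n * e / (d * 2**s), and the extra term is
    # below 2**bits / 2**s = 2**-lg <= 1/d, which cannot push floor(n / d) up.
    return m, s


def fast_div(n: int, m: int, s: int) -> int:
    """Divide using one multiply and one shift (n must be below 2**bits)."""
    return (n * m) >> s


if __name__ == "__main__":
    # Precompute once per loop extent, then reuse for every flat index.
    extent = 7
    m, s = magic_for(extent)
    for flat in (0, 6, 7, 1000, 2**32 - 1):
        assert fast_div(flat, m, s) == flat // extent
```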