
Significant performance improvements, new scheduler #107

Merged: 16 commits were merged on Nov 7, 2023.

computablee (Owner)

Which issue are you addressing?

Significant performance improvements, new work-stealing scheduler.

How have you addressed the issue?

This PR implements the WorkStealingScheduler class, which lets parallel for loops use a work-stealing scheduler. Much of the scheduling code has also been heavily optimized, including a 40% improvement on one benchmark for static scheduling with chunk_size=1. Collapsed loops were improved as well by incorporating division-by-multiplication. More testing is required here.
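For readers unfamiliar with work stealing, here is a minimal sketch of the general idea. It is not DotMP's WorkStealingScheduler; the function name `work_stealing_for` and its parameters are hypothetical. Each worker owns a deque of iteration chunks, takes work from the back of its own deque, and, once that runs dry, steals from the front of another worker's deque.

```python
import threading
from collections import deque


def work_stealing_for(start, end, chunk_size, num_threads, body):
    """Hypothetical work-stealing parallel for over [start, end).

    Illustrative sketch only; not DotMP's implementation.
    """
    # Pre-split the iteration space into chunks, dealt round-robin to workers.
    queues = [deque() for _ in range(num_threads)]
    locks = [threading.Lock() for _ in range(num_threads)]
    for n, lo in enumerate(range(start, end, chunk_size)):
        queues[n % num_threads].append((lo, min(lo + chunk_size, end)))

    def take(tid):
        # The owner pops from the back of its own deque.
        with locks[tid]:
            if queues[tid]:
                return queues[tid].pop()
        # A thief steals from the front of other workers' deques.
        for offset in range(1, num_threads):
            victim = (tid + offset) % num_threads
            with locks[victim]:
                if queues[victim]:
                    return queues[victim].popleft()
        # No chunks are ever added after startup, so empty now means done.
        return None

    def worker(tid):
        while (chunk := take(tid)) is not None:
            lo, hi = chunk
            for i in range(lo, hi):
                body(i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Having owners and thieves work at opposite ends of a deque keeps them from contending for the same chunk most of the time. Because of CPython's GIL, this sketch shows the scheduling logic rather than a real speedup for CPU-bound loop bodies.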

How have you tested your patch?

Unit tests have been written where necessary, and all unit tests pass.


codecov bot commented Nov 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e140dc3) 99.12% compared to head (ad2c447) 99.21%.
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   99.12%   99.21%   +0.09%     
==========================================
  Files          12       12              
  Lines        1137     1271     +134     
  Branches      113      132      +19     
==========================================
+ Hits         1127     1261     +134     
  Misses          5        5              
  Partials        5        5              
Files                Coverage Δ
DotMP/Parallel.cs    98.97% <100.00%> (+<0.01%) ⬆️
DotMP/Schedule.cs    96.66% <100.00%> (+4.35%) ⬆️
DotMP/WorkShare.cs   99.00% <100.00%> (-0.08%) ⬇️
DotMP/Wrappers.cs    100.00% <100.00%> (ø)


@computablee (Owner, Author)

After further performance testing of the collapsed-loop changes, it appears that performance may actually be worse for typical use cases. I'm holding off on merging this until more data can be collected.
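The collapsed-loop change being evaluated here is the division-by-multiplication technique mentioned in the PR description. The sketch below shows the general idea: a division by a divisor fixed for the whole loop is replaced with a multiply and a shift by a precomputed "magic" constant. It is a generic illustration, not DotMP's code; `magic_divider`, `fast_div`, and `unflatten2` are hypothetical names, and 32-bit unsigned operands are assumed.

```python
def magic_divider(d, n_bits=32):
    """Return (m, k) with (x * m) >> k == x // d for all 0 <= x < 2**n_bits."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    l = (d - 1).bit_length()   # ceil(log2(d))
    k = n_bits + l
    m = -(-(1 << k) // d)      # ceil(2**k / d); may need n_bits + 1 bits
    return m, k


def fast_div(x, m, k):
    """Divide x by the divisor encoded in (m, k) using a multiply and a shift."""
    return (x * m) >> k


def unflatten2(flat, nj, m, k):
    """Split a flat index of a 2-D collapsed loop into (i, j).

    (m, k) must come from magic_divider(nj).
    """
    i = fast_div(flat, m, k)   # flat // nj without a hardware divide
    return i, flat - i * nj    # flat % nj via multiply-and-subtract
```

Even with this trick, every iteration still pays for a multiply, a shift, and a subtraction, and on real hardware the magic constant can overflow the native word size, which needs extra handling.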

@computablee (Owner, Author)

Division-by-multiplication has been removed from collapsed loops, and manual iteration is implemented instead. This avoids expensive per-iteration arithmetic such as division, modulo, and multiplication. The performance improvements are dramatic: well over 2x across the board for the different loop approaches.
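To illustrate what "manual iteration" means for Collapse(2), here is a minimal sketch that contrasts it with recovering indices by division. The function names are hypothetical, and zero-based loop bounds with inner extent `nj` are assumed. The manual version decomposes only the first index of a chunk and then walks `(i, j)` with an increment and a wrap-around check. The PR does not say whether DotMP also avoids that one per-chunk decomposition.

```python
def collapse2_divmod(lo, hi, nj):
    """Recover (i, j) for every flat index with one divmod per iteration."""
    return [divmod(flat, nj) for flat in range(lo, hi)]


def collapse2_manual(lo, hi, nj):
    """Decompose only the chunk's first index, then advance by increments."""
    i, j = divmod(lo, nj)  # one division per chunk, not per iteration
    out = []
    for _ in range(hi - lo):
        out.append((i, j))
        j += 1
        if j == nj:        # inner index wrapped: reset it and carry outward
            j = 0
            i += 1
    return out
```

Both functions produce identical index sequences, but inside the loop the manual version needs only an add and a compare per iteration.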

Collapse(3) was also optimized, although it remains untested; I would be very surprised if the gains were anything less than 3x. Collapse(4) and Collapse(n) remain unoptimized because of code complexity. There should be a writeup discussing the "yes"es and "no"s of the library with regard to performance: Collapse(4) or higher is definitely a "no" for lightweight loops, due to the extreme overhead of calculating indices.
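The index-calculation overhead for high collapse counts comes from recovering every loop index from a single flat iteration number. The sketch below is a generic illustration of that cost, not DotMP's unoptimized Collapse(n) code, which the PR does not show. `unflatten` is a hypothetical helper assuming row-major order and zero-based bounds, and it spends `len(extents) - 1` divmod operations on every iteration.

```python
def unflatten(flat, extents):
    """Split a row-major flat index into one index per collapsed dimension.

    Performs len(extents) - 1 divmod operations per call, i.e. per loop
    iteration when used naively inside a collapsed loop.
    """
    indices = [0] * len(extents)
    for axis in range(len(extents) - 1, 0, -1):
        flat, indices[axis] = divmod(flat, extents[axis])
    indices[0] = flat
    return tuple(indices)
```

For example, `unflatten(23, (2, 3, 4))` gives `(1, 2, 3)`, since 1·12 + 2·4 + 3 = 23. A Collapse(4) loop pays three divisions and three modulos on every iteration this way, which can easily dominate a lightweight loop body.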

Optimizing higher-dimensional collapsed loops shouldn't be too difficult if there are requests for it, and it is certainly far easier than it would have been with earlier iterations of the collapsed chunk executor. I'll open an issue and do this later.
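One way manual iteration could generalize to Collapse(n) is an "odometer": decompose the chunk's starting index once, then bump the innermost counter and carry into outer dimensions whenever a counter reaches its extent. This is a hypothetical sketch, not a design from the PR. `odometer_chunk` is an illustrative name, and row-major order with zero-based bounds is assumed.

```python
def odometer_chunk(lo, hi, extents):
    """Yield n-dimensional indices for flat range [lo, hi) by incrementing.

    Only the starting index is decomposed with divmod; every later index is
    produced by bumping the innermost counter and carrying outward.
    """
    idx = [0] * len(extents)
    flat = lo
    for axis in range(len(extents) - 1, 0, -1):
        flat, idx[axis] = divmod(flat, extents[axis])
    idx[0] = flat
    for _ in range(hi - lo):
        yield tuple(idx)
        axis = len(extents) - 1
        idx[axis] += 1
        # Carry into outer dimensions while an inner counter overflows.
        while axis > 0 and idx[axis] == extents[axis]:
            idx[axis] = 0
            axis -= 1
            idx[axis] += 1
```

Carries are rare: the innermost counter overflows only once every `extents[-1]` steps, so the amortized per-iteration cost is a small constant regardless of how many loops are collapsed.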

@computablee merged commit c5c28a9 into main on Nov 7, 2023
5 checks passed