Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for openblas + openmp threading layer #114

Merged
merged 16 commits into from
Jan 28, 2022

Conversation

jeremiedbb
Copy link
Collaborator

No description provided.

@ogrisel
Copy link
Contributor

ogrisel commented Jan 5, 2022

@ogrisel
Copy link
Contributor

ogrisel commented Jan 12, 2022

I think we need a dedicated API to protect about the sequential_blas_under_openmp oversubscription case that we need in scikit-learn.

  • For an OpenBLAS BLAS implementation with an OpenMP threading layer and if there is a single OpenMP runtime loaded, assume that OpenMP will already handle the case natively and do nothing;
  • For an OpenBLAS implementation with a threading layer that is either native or OpenMP but there are several active OpenMP runtimes linked to the Python process, set limit=1 on this OpenBLAS;
  • For MKL and BLIS with OpenMP threading layers, we need to check.

This should be a new context manager, both for the functional and the pre-loaded object oriented API.

WDYT @jeremiedbb?

@jeremiedbb jeremiedbb changed the title CI job with openblas + openmp threading layer Workaround for openblas + openmp threading layer Jan 20, 2022
@jeremiedbb
Copy link
Collaborator Author

So at first I tried this new context manager idea but I was very unsatisfied with it. I think I came up with a more satisfying solution. When we set the limit to blas only, we check if the internal state of openmp has changed and if so we set it back to the original state (this does not modify the state of the blas, I checked).

I thinks it's better to just have this kind of workaround for a behavior that is unexpected at least and should even be considered a bug imo, than adding a new whole public object just to handle that. What do you think ?

@jeremiedbb jeremiedbb changed the title Workaround for openblas + openmp threading layer [WIP] Workaround for openblas + openmp threading layer Jan 20, 2022
@jeremiedbb
Copy link
Collaborator Author

Hum, actually setting back openmp to its original state is bad when we don't want to set the limits for the nested openmp openblas case... It will run as if no limit was set even if openblas_get_num_threads returns 1.

@jeremiedbb
Copy link
Collaborator Author

Ok so I think I came up with an acceptable alternative: a new helper function that gives the appropriate params to pass to threadpool_limits only for the use case of nested blas inside an openmp parallel section. There's no need to introduce a new context manager.

@ogrisel
Copy link
Contributor

ogrisel commented Jan 24, 2022

Nice!

Maybe this could also be somehow exposed in the .limit function? For instance

with threadpoolctl.threadpool_limits(limits="sequential_blas_under_openmp"):
    # code protected against over-subscription

WDYT @jeremiedbb?

If you agree, please update the README to document that high level policy.

@jeremiedbb
Copy link
Collaborator Author

Maybe this could also be somehow exposed in the .limit function?

I like this Idea. Actually I think it should be the only public way to do that and I would make the helper private. Ok for you ?

@ogrisel
Copy link
Contributor

ogrisel commented Jan 28, 2022

I agree. Let's make the getter private. The changelog will need to be updated accordingly to only mention the new limit option.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ogrisel
Copy link
Contributor

ogrisel commented Jan 28, 2022

This PR is missing a changelog entry though.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the construction of the first sentence in the changelog entry a bit weird. See below:

CHANGES.md Outdated Show resolved Hide resolved
@ogrisel
Copy link
Contributor

ogrisel commented Jan 28, 2022

Still [WIP]? If not, +1 for merging and maybe Friday evening release ;) Or maybe not releasing today.

CHANGES.md Outdated
@@ -4,6 +4,13 @@
- Fixed a detection issue of the BLAS libraires packaged by conda-forge on Windows.
https://github.com/joblib/threadpoolctl/pull/112

- `threadpool_info` and `ThreadpoolController.limit` now accept the string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `threadpool_info` and `ThreadpoolController.limit` now accept the string
- `threadpool_limit` and `ThreadpoolController.limit` now accept the string

@jeremiedbb jeremiedbb changed the title [WIP] Workaround for openblas + openmp threading layer Workaround for openblas + openmp threading layer Jan 28, 2022
@jeremiedbb
Copy link
Collaborator Author

maybe Friday evening release

I can't think of a better time to do this. But unfortunately I don't have time and I'm still not crazy enough to make a half release a friday evening and finish it during the week-end :D

@jeremiedbb
Copy link
Collaborator Author

Ok all green. Let's merge

@jeremiedbb jeremiedbb merged commit 9dad513 into joblib:master Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants