Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support registering user-defined LibController #137

Closed
alugowski opened this issue Jun 29, 2023 · 11 comments · Fixed by #138
Closed

Support registering user-defined LibController #137

alugowski opened this issue Jun 29, 2023 · 11 comments · Fixed by #138

Comments

@alugowski
Copy link

Very useful library.

I've gotten a request for my own library to be able to be controlled via threadpoolctl, and I agree that would be a positive thing to support, but it's currently impossible. While threadpoolctl is well designed and it appears to be easy to support new parallel libraries, the list of LibControllers is static.

Would it be possible to add a feature where libraries could add support to register themselves with threadpoolctl?

For example, something like

try:
    import threadpoolctl
    threadpoolctl.register(MyLibController)
except ImportError:
    pass

I imagine it's a big burden for the threadpoolctl authors to support all popular libraries out there, but with such a feature those library authors (or their users) could write such hooks ourselves.

There are several open requests for Apple Accelerate, arrow, TensorFlow, and I would be interested in taskflow or Eigen's threadpool, too. Additionally there are many smaller libraries, or ones using Python threads, that could easily be supported by bespoke LibControllers with no effort on your part.

@tomMoral
Copy link
Collaborator

I think this would be interesting to have this feature, as this would make it simpler to support various libs compared to having a centralized implems here.

If anyone wanna give it a try, feel free to propose a PR.

@jeremiedbb
Copy link
Collaborator

I also find this approach interesting for libraries that we're not likely to support directly. I'm checking if it would be easy to implement.

@ogrisel
Copy link
Contributor

ogrisel commented Jul 11, 2023

@alugowski #138 was merged with a fix. Please let us know if that good enough for your use case.

@alugowski
Copy link
Author

alugowski commented Jul 11, 2023

@ogrisel It should work, but my controller methods do not get called at all.

I installed via pip install --upgrade git+https://github.com/joblib/threadpoolctl

My controller code: https://github.com/alugowski/fast_matrix_market/blob/954331f9d65e4ce8b56dc8009cd09cc70e93f08e/python/src/fast_matrix_market/__init__.py#L49

I copy/pasted the example then modified it to fit my usecase, which is to just modify a global variable that I already had.

and I call it like this:

>>> from fast_matrix_market import mmread
registered with threadpoolctl
>>>
>>> from threadpoolctl import threadpool_limits
>>>
>>> with threadpool_limits(limits=1, user_api='fast_matrix_market'):
...    mmread("asdf.mtx")
...
parallelism: 0

Note that none of the controller print statements have printed. Not the version getter, and neither set or get num threads.

@ogrisel
Copy link
Contributor

ogrisel commented Jul 12, 2023

It's important to specify the known filename prefixe(s) if the dynamic library (e.g. the beginning of the filename that ends in .so on linux, .dylib on masos, or .dll on windows) that holds the symbols.

@jeremiedbb
Copy link
Collaborator

Looking at the list of loaded libraries when I import fast_matrix_market, I guess the name of the dso is _core.xxx.so, meaning the prefixes should be ("_core",). _core might be a common dso name, so this an argument to implement additional symbol checking. You could also consider giving it a more informative name like libfmm or lib_fast_matrix_market, to make runtime introspection of the linked libraries easier 😉

@alugowski
Copy link
Author

Ok, so adding the library name to filename_prefixes does make it work.

But why is it necessary? My controller does not call any methods from the library, nor does it need to. The library itself is stateless, and I'd prefer to keep it that way. The thread control is purely on the Python side.

How would a pure-Python library implement a controller?

@jeremiedbb
Copy link
Collaborator

jeremiedbb commented Jul 12, 2023

As explained here #134 (comment), this is not the purpose of threadpoolctl.

threadpoolctl is essentially a tool to introspect the dynamic shared objects loaded by a python program in a robust and cross-platform manner, in order to call some functions from the libraries implementing their own threadpools, packaged in a simple python context manager. Libraries that can be controlled from python are not in the original scope of this project since for those it's already easy to provide such a context manager.

@alugowski
Copy link
Author

As explained here #134 (comment), this is not the purpose of threadpoolctl.

threadpoolctl is essentially a tool to introspect the dynamic shared objects loaded by a python program in a robust and cross-platform manner, in order to call some functions from the libraries implementing their own threadpools, packaged in a simple python context manager. Libraries that can be controlled from python are not in the original scope of this project since for those it's already easy to provide such a context manager.

That's fair.

Note it's an argument for closing nearly all of the outstanding library support tickets, though. TF, Arrow, TBB, all have Python-accessible ways to control parallelism. A parallel library that exposes itself to Python but without parallelism control is an outlier, and you currently support most such outliers.

My thoughts are echoed by your very next post in that thread:

For instance threadpoolctl provides a way to limit all supported libraries at once. Not having to write custom context managers for all libraries is nice.

That's very powerful.

Powerful in several ways. The obvious one is a common interface for something that varies a lot. Some libraries accept an argument for each method (is it num_threads, workers, parallelism?), some supply a static method, some expect you to set a variable, etc.

Another power is that some parallel methods are called as dependencies, and the middle layer may or may not expose a way to control parallelism of its dependencies. threadpoolctl may be the only way in such cases.

For example, the library I linked earlier is currently under review for inclusion in SciPy. Nearly every parallel operation in SciPy is controlled by threadpoolctl, so why shouldn't that one be? Just because it's possible to do it another way? Currently I accept a parallelism argument, which is fine by me, but I suspect a SciPy user would rather just do what they do for every other op instead of looking up the docs.

@jeremiedbb
Copy link
Collaborator

These are all valid points and I'm definitely not opposed to have threadpoolctl being used like that. My previous comment was more to explain the reason and give some context about why giving the name of the dso is necessary. In addition, the name of the dso is necessary to be able to figure out which shared libraries are actually loaded in order to instanciate their corresponding controller.

Anyway, I don't think that having to provide the filename prefix is an issue for your controller. For this use case we can just see that as to provide a way for threadpoolctl to find out if the underlying shared library is loaded or not.

@alugowski
Copy link
Author

Sounds good :) Thanks again for the new feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants