-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a manual override for Katana CPU topology and thread pinning #71
Comments
Seems reasonable to me if only to provide a path for faking configs for testing. Small suggestion for enhancement: make the format of "katana topo override" value an implementation detail subject to future change. libgalois used to respond to an environment variable along the lines of "DEBUG_PRINT_TOPO", which would print the mappings it had inferred. I thought it was a little weird to have this functionality in the library itself and stuck into every program using libgalois, so this capability is now just available in the topo unit test. It might make sense to promote this to an actual tool that (1) prints out the current inferred information (human, json) and (2) accepts some json data and spits out the internal override value suitable for an environment variable. Pros:
Cons:
Separately, it might make sense to use hwloc directly: |
If we have a tool like this, why not just allow the user to provide a json filename in the environment variable and parse it in libgalois? No reason to have the indirection since libgalois already depends on a json parser via libtsuba. |
That's fine too. Though, I'd start with something that takes the contents of the json config (or base64 encoding thereof) in an environment variable first. In a lot of cases, it is easier to hack in an environment variable than it is a file (e.g., multiple hosts, inside a container). |
That's a good point. And I would much prefer to have a simple format that is expert readable than a totally opaque base64 blob. So I think your initial idea is good.We could definitely just put a little json data into an environment variable. We would get a format like |
We should provide a way to manually provide CPU topology, overriding the parsing of
/proc
and maybe even libnuma.Previous issues (e.g., #64) have come up around incorrectly reading CPU information. A manual override would allow us to work around future issues of this kind quickly, and it would enable experimentation with non-standard thread pinnings and that kind of thing (e.g., I could tell Katana that my local machine has 4 NUMA nodes with 4 cores each for testing and debugging NUMA-specific crashes).
As a strawman, I propose the following environment variable:
Meaning "treat cores 0, 1, 2, 3 as one NUMA node and 4, 5, 6, 7 as another, ignore all other cores". In this mode Katana would ignore all other information about hyperthreading or memory and just do as it's told.
More specifically, the environment variable
KATANA_OVERRIDE_CORES
is a list of "fake NUMA nodes" separated by:
(colon). Each fake NUMA node is a list of core numbers separated by,
(comma). Or generally:a1,...,an:...:b1,...,bm
. All core numbers are the numbers used bysched_setaffinity
, with no remapping, to avoid bugs in the mapping itself. This provides enough information for the Katana thread pool to set things up correctly. There is a little more information available inHWTopoInfo
, but it's not used and we should be able to generate reasonable fake information even if it was.Issues with this simple approach:
The text was updated successfully, but these errors were encountered: