Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a manual override for Katana CPU topology and thread pinning #71

Open
arthurp opened this issue Feb 9, 2021 · 4 comments
Open
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@arthurp
Copy link
Contributor

arthurp commented Feb 9, 2021

We should provide a way to manually provide CPU topology, overriding the parsing of /proc and maybe even libnuma.
Previous issues (e.g., #64) have come up around incorrectly reading CPU information. A manual override would allow us to work around future issues of this kind quickly, and it would enable experimentation with non-standard thread pinnings and that kind of thing (e.g., I could tell Katana that my local machine has 4 NUMA nodes with 4 cores each for testing and debugging NUMA-specific crashes).

As a strawman, I propose the following environment variable:

KATANA_OVERRIDE_CORES="0,1,2,3:4,5,6,7"

Meaning "treat cores 0, 1, 2, 3 as one NUMA node and 4, 5, 6, 7 as another, ignore all other cores". In this mode Katana would ignore all other information about hyperthreading or memory and just do as it's told.

More specifically, the environment variable KATANA_OVERRIDE_CORES is a list of "fake NUMA nodes" separated by : (colon). Each fake NUMA node is a list of core numbers separated by , (comma). Or generally: a1,...,an:...:b1,...,bm. All core numbers are the numbers used by sched_setaffinity, with no remapping, to avoid bugs in the mapping itself. This provides enough information for the Katana thread pool to set things up correctly. There is a little more information available in HWTopoInfo, but it's not used and we should be able to generate reasonable fake information even if it was.

Issues with this simple approach:

  • It doesn't provide any way to specify hyperthreads.
  • It is very opaque and wouldn't be useful for end users. Though honestly, end users shouldn't be overriding these things anyway.
@arthurp arthurp added enhancement New feature or request good first issue Good for newcomers labels Feb 9, 2021
@ddn0
Copy link
Contributor

ddn0 commented Feb 9, 2021

Seems reasonable to me if only to provide a path for faking configs for testing.

Small suggestion for enhancement: make the format of "katana topo override" value an implementation detail subject to future change.

libgalois used to respond to an environment variable along the lines of "DEBUG_PRINT_TOPO", which would print the mappings it had inferred. I thought it was a little weird to have this functionality in the library itself and stuck into every program using libgalois, so this capability is now just available in the topo unit test.

It might make sense to promote this to an actual tool that (1) prints out the current inferred information (human, json) and (2) accepts some json data and spits out the internal override value suitable for an environment variable.

Pros:

  • Slightly more self-documenting UX
  • Puts the reading of topo information next to the configuring of topo information for a user
  • Better error handling when a tool can spit out error messages

Cons:

  • The tool version may be different than the library version (probably not a major concern for a developer tool)

Separately, it might make sense to use hwloc directly:
https://www.open-mpi.org/projects/hwloc/

@arthurp
Copy link
Contributor Author

arthurp commented Feb 9, 2021

It might make sense to promote this to an actual tool that (1) prints out the current inferred information (human, json) and (2) accepts some json data and spits out the internal override value suitable for an environment variable.

If we have a tool like this, why not just allow the user to provide a json filename in the environment variable and parse it in libgalois? No reason to have the indirection since libgalois already depends on a json parser via libtsuba.

@ddn0
Copy link
Contributor

ddn0 commented Feb 9, 2021

That's fine too. Though, I'd start with something that takes the contents of the json config (or base64 encoding thereof) in an environment variable first. In a lot of cases, it is easier to hack in an environment variable than it is a file (e.g., multiple hosts, inside a container).

@arthurp
Copy link
Contributor Author

arthurp commented Feb 9, 2021

That's a good point. And I would much prefer to have a simple format that is expert readable than a totally opaque base64 blob. So I think your initial idea is good.We could definitely just put a little json data into an environment variable. We would get a format like [[0,1,2,3],[4,5,6,7]] which is reasonably simple and wouldn't require tricks to quote properly on a shell. We could support more complex things too if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants