
[FEA] Consider using different default values for cluster configurations #348

Open
randerzander opened this issue Jul 24, 2020 · 7 comments
Labels
inactive-30d, inactive-90d, proposal (a code change suggestion to be considered or discussed)

Comments

@randerzander
Contributor

randerzander commented Jul 24, 2020

In our gpu-bdb benchmarking, we found we needed to configure quite a few bash environment variables when setting up a dask-cuda cluster for optimal performance.

For example, if you're not using UCX, communication between GPU workers over TCP will likely have higher latency and possibly lower throughput. To avoid TCP connection failures we needed to reconfigure Dask's "COMM" settings:

export DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT="100s"
export DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP="600s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MIN="1s"
export DASK_DISTRIBUTED__COMM__RETRY__DELAY__MAX="60s"

I suspect a great many dask-cuda users will be using TCP rather than UCX, so they would benefit from dask-cuda automatically reconfiguring Dask's TCP defaults.
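
For reference, a minimal sketch of the same overrides applied programmatically through dask.config, mirroring the environment variables above (the values are just what we used for the benchmark, not a proposed default):

import dask

# Same settings as the environment variables above, expressed as dask
# config keys (DASK_DISTRIBUTED__COMM__... maps to distributed.comm....).
dask.config.set({
    "distributed.comm.timeouts.connect": "100s",
    "distributed.comm.timeouts.tcp": "600s",
    "distributed.comm.retry.delay.min": "1s",
    "distributed.comm.retry.delay.max": "60s",
})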

We also need to set --memory-limit, --device-memory-limit, and --rmm-pool-size values:

MAX_SYSTEM_MEMORY=$(free -m | awk '/^Mem:/{print $2}')M
DEVICE_MEMORY_LIMIT="25GB"
POOL_SIZE="30GB"

dask-cuda-worker --device-memory-limit $DEVICE_MEMORY_LIMIT --rmm-pool-size=$POOL_SIZE --memory-limit=$MAX_SYSTEM_MEMORY ...

It would be nice if dask-cuda included logic for detecting the memory per GPU and setting default values for the device memory limit and RMM pool size. As a starting point, for a 32GB card, a 25GB device memory limit with a 30GB pool size works well for the 30 tpcx-bb queries. I believe that's a decent representation of typical workflows, and those values could be scaled proportionally for other card memory sizes as well.
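
A rough sketch of what that detection could look like, assuming pynvml is available, and treating the 25GB/30GB-on-a-32GB-card numbers above as fractions of total device memory (the ratios are illustrative, not tuned recommendations):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
total = pynvml.nvmlDeviceGetMemoryInfo(handle).total  # total device memory in bytes

# Scale the 25GB / 30GB values observed on a 32GB card to this card's size.
device_memory_limit = int(total * 25 / 32)  # ~78% of the card
rmm_pool_size = int(total * 30 / 32)        # ~94% of the card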

@quasiben
Member

I think having some of these defaults makes sense. In dask core, --memory-limit is set to auto by default.

In [1]: import pynvml

In [2]: pynvml.nvmlInit()

In [3]: handle = pynvml.nvmlDeviceGetHandleByIndex(0)

In [4]: info = pynvml.nvmlDeviceGetMemoryInfo(handle)

In [5]: info.total
Out[5]: 34089730048

Maybe rmm-pool-size = total * 90%? @pentschev @jakirkham do you have thoughts here?
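
Continuing the session above, 90% of that 32GB card works out to roughly 30.7GB, which lines up with the 30GB pool size used in the benchmarks earlier (a sketch, reusing the same info object):

In [6]: int(info.total * 0.9)
Out[6]: 30680757043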

@randerzander
Contributor Author

@mt-jones may remember why we had to set --memory-limit specifically. There might have been a bug where it wasn't being set properly, or the heuristic Dask uses may be inappropriate for, say, a DGX-1.

@pentschev
Member

While I understand how changing such defaults makes sense for TPCx-BB, I'm not so sure it makes sense for everyone. It won't be difficult to find people trying to run dask-cuda alongside other applications that also require memory (either device or host), and such defaults could be a bit annoying for those users. Think, for example, of someone running this on a workstation where one of the GPUs also drives the display. That's not to say I'm totally against the idea; I'm just a bit concerned about the broader usability of dask-cuda.

The --memory-limit issue was really a bug that got fixed in #269 .

As for the communication variables, I think they were not for the workers (or not only, perhaps), but for the scheduler, am I mistaken? Changing the defaults for the scheduler would have to go upstream to distributed. As for workers, if we change those values we should probably check what the appropriate numbers really are; I have a feeling those were just set arbitrarily high to make things work, was that not the case? And when we scale up, would we have to scale those numbers too?

@pentschev
Member

I was just thinking, after writing the comment above: maybe we should have some sort of "default recipes" for different use cases? E.g., the TPCx-BB case could use a "performance" recipe, while today's defaults could be something like a "conservative" recipe, and so on. I think this would reduce complexity across multiple use cases while still providing defaults that work mostly anywhere and are easy to switch between.
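
Purely as a hypothetical sketch of the recipe idea (none of these names exist in dask-cuda today; the values are the ones discussed in this thread), a recipe could simply be a named set of config overrides:

import dask

# Hypothetical recipe overlays -- not an existing dask-cuda API.
RECIPES = {
    # "conservative": keep today's defaults, no overrides.
    "conservative": {},
    # "performance": the TPCx-BB style settings from earlier in this thread.
    "performance": {
        "distributed.comm.timeouts.connect": "100s",
        "distributed.comm.timeouts.tcp": "600s",
        "distributed.comm.retry.delay.min": "1s",
        "distributed.comm.retry.delay.max": "60s",
    },
}

def apply_recipe(name):
    # Apply the chosen overrides on top of the current dask config.
    dask.config.set(RECIPES[name])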

@pentschev
Member

I forgot to mention another related (potentially duplicate) issue: #334 .

@pentschev added the proposal label on Jan 8, 2021
@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
