Dashboard returns 404 error in Dask-MPI #126

Closed
alessandrocornacchia opened this issue Jun 27, 2024 · 3 comments · Fixed by #127

Comments

alessandrocornacchia commented Jun 27, 2024

I am trying to access the Bokeh dashboard in an HPC environment managed by a Slurm scheduler.

I installed dask_mpi using conda. I ran the following in the Slurm submission script:

mpirun -np $SLURM_NTASKS dask-mpi --scheduler-file scheduler.json

The scheduler starts correctly, and I can also connect with a Client.

INFO: localdir at /scratch/974298.acornacchia
INFO: your job will run on local system.
2024-06-27 00:23:00,531 - distributed.scheduler - INFO - State start
2024-06-27 00:23:00,585 - distributed.scheduler - INFO -   Scheduler at:   tcp://192.168.7.50:8786
2024-06-27 00:23:00,585 - distributed.scheduler - INFO -   dashboard at:  http://192.168.7.50:8787/status
2024-06-27 00:23:00,641 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-06-27 00:23:00,927 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.69:46705'
2024-06-27 00:23:00,928 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.73:41865'
2024-06-27 00:23:00,944 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.70:44559'
2024-06-27 00:23:45,142 - distributed.scheduler - INFO - Receive client connection: Client-bfa9d376-340a-11ef-944a-e4434b640dd8
2024-06-27 00:24:36,898 - distributed.core - INFO - Starting established connection to tcp://192.168.7.254:49934
2024-06-27 00:24:36,898 - distributed.core - INFO - Connection to tcp://192.168.7.254:49934 has been closed.
2024-06-27 00:24:36,898 - distributed.scheduler - INFO - Remove client Client-bfa9d376-340a-11ef-944a-e4434b640dd8

However, the dashboard returns an HTTP 404 error when I try to access its URL:

wget http://192.168.7.50:8787/status

--2024-06-27 00:26:31--  http://192.168.7.50:8787/status
Connecting to 192.168.7.50:8787... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-06-27 00:26:31 ERROR 404: Not Found.
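
The same check can be scripted when wget is not available on the node. The following is a minimal sketch using only the Python standard library; the helper name `dashboard_status` is hypothetical and not part of Dask or dask-mpi.

```python
import urllib.error
import urllib.request


def dashboard_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`.

    Hypothetical helper for illustration; not part of Dask or dask-mpi.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception and is what we want to report.
        return err.code
```

Calling `dashboard_status("http://192.168.7.50:8787/status")` against the scheduler above would be expected to return 404, matching the wget output. A connection failure (for example, nothing listening on the port) raises `urllib.error.URLError` instead of returning a code.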

Environment:

  • Dask version:
conda list -n try-dask | grep dask
# packages in environment at /home/acornacchia/miniconda3/envs/try-dask:
dask                      2024.5.0                 pypi_0    pypi
dask-core                 2024.5.0        py310h06a4308_0  
dask-expr                 1.1.0                    pypi_0    pypi
dask-jobqueue             0.8.5              pyhd8ed1ab_0    conda-forge
dask-labextension         7.0.0                    pypi_0    pypi
dask-mpi                  2022.4.0                 pypi_0    pypi
  • Bokeh version: 3.4.1
  • Python version: 3.10.14
  • Operating System: CentOS Linux 7 (Core)

Additional notes:
This does not happen with dask-jobqueue; there the dashboard runs correctly.

alessandrocornacchia changed the title from "Dashboard returns 404 error in MPI" to "Dashboard returns 404 error in Dask-MPI" on Jun 27, 2024

@mrocklin
Member

cc @kmpaul @jacobtomlinson

@jacobtomlinson
Member

I can confirm I am able to reproduce this on my machine. I'm going to transfer this issue over to dask-mpi as it seems to be related to how that library is starting up the scheduler.

jacobtomlinson transferred this issue from dask/distributed on Jun 27, 2024

@jacobtomlinson
Member

Ah, it looks like you need to explicitly specify the dashboard address.

The following works for me:

mpirun -np $SLURM_NTASKS dask-mpi --scheduler-file scheduler.json --dashboard-address :8787

This is a little unintuitive. I'll open a PR to enable the dashboard by default and add a flag to disable it; this is the same way dask scheduler works on the CLI.
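
Until that change lands, a job script generator can pass the dashboard address explicitly so it is never left out. This is a small sketch, not part of dask-mpi: the function name `dask_mpi_command` is hypothetical, and it uses only the flags shown in this thread (`--scheduler-file`, `--dashboard-address`).

```python
import shlex


def dask_mpi_command(ntasks, scheduler_file="scheduler.json",
                     dashboard_address=":8787"):
    """Build the mpirun/dask-mpi argument list with an explicit dashboard address.

    Hypothetical helper for illustration; the flags are the ones used above.
    """
    return [
        "mpirun", "-np", str(ntasks),
        "dask-mpi",
        "--scheduler-file", scheduler_file,
        # Without this flag, the affected dask-mpi version serves a 404
        # at /status, as reported in this issue.
        "--dashboard-address", dashboard_address,
    ]


def as_shell(argv):
    """Render the argument list as a single shell-safe command line."""
    return shlex.join(argv)
```

For example, `as_shell(dask_mpi_command(4))` produces a line that can be written into a Slurm submission script, and the list form can be passed directly to `subprocess.run`.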
