HCCL Demo does not work for multi-node. #42

Open
veritas9872 opened this issue Dec 31, 2024 · 4 comments

Comments

@veritas9872

Hello. I am trying to set up the HCCL demo for multiple nodes.
However, I get the following error messages when trying to run the commands in a container.

  1. MPI requires the --allow-run-as-root flag to run as the root user.
  2. HCCL demo error: [find_interface] MPI requires --mca btl_tcp_if_include <interface>.

I think that both issues should be addressed for demos.

@ytava
Collaborator

ytava commented Dec 31, 2024

@veritas9872 thank you for reporting.
Does this happen only with the latest demo code? We had an issue with the latest upload, which should be fixed now.
Can you please re-test?

@veritas9872
Author

Hello. I have been testing with the version updated three weeks ago, where the make command still worked.

@veritas9872
Author

I will try with the latest version. However, I think that this is fundamentally because MPI does not allow running as root by default.

@gad-arbel

gad-arbel commented Dec 31, 2024

Hi @veritas9872,

These are MPI-related issues that should be handled by the Python wrapper (see detailed usage explanation in the README). The correct way to use hccl_demo is by running it with the Python wrapper. If it doesn't solve your issues, please share your command.

Regarding your issues:

Issue 1: MPI Requires --allow-run-as-root Flag to Run as Root User
When running MPI applications as the root user, you need to include the --allow-run-as-root flag to permit execution. This is a security measure to prevent accidental execution of MPI programs with root privileges.
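For cases where mpirun is launched directly rather than through the Python wrapper, the rule above could be sketched roughly as follows. This is only an illustration, not the wrapper's actual code: `build_mpirun_command` is a hypothetical helper, and `./hccl_demo` is a placeholder for the real demo invocation.

```python
import os
import shlex


def build_mpirun_command(app_args, euid=None):
    """Return an mpirun argv list, adding --allow-run-as-root only for root.

    Hypothetical helper for illustration; not part of hccl_demo.
    """
    if euid is None:
        euid = os.geteuid()  # effective user ID of the current process
    command = ["mpirun"]
    if euid == 0:
        # Open MPI refuses to start as root unless this flag is present.
        command.append("--allow-run-as-root")
    command.extend(app_args)
    return command


# "./hccl_demo" is a placeholder, not the real demo command line.
print(shlex.join(build_mpirun_command(["./hccl_demo"], euid=0)))
```

Adding the flag only when the effective user is root keeps the safety check in place for non-root runs outside the container.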

Issue 2: HCCL Demo Error: [find_interface] MPI requires --mca btl_tcp_if_include
This error indicates that MPI requires you to specify the network interface to use for TCP communication. The --mca btl_tcp_if_include flag allows you to specify the desired network interface.

Important note regarding container usage: The Python wrapper uses the route command to identify the network interface. If the container does not support the route command, you must explicitly add the --mca btl_tcp_if_include <interface> option to your command.
Identify the network interface you want to use (e.g., eth0, eth1, etc.) and replace <interface> accordingly.
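If the route command is missing in the container, one possible way to find the interface is to read the kernel routing table from /proc/net/route on Linux. The sketch below is an assumption about how this could be done, not what the wrapper does: `default_route_interface` and `tcp_interface_args` are hypothetical helpers, and `SAMPLE` is made-up data in the format of that file.

```python
def default_route_interface(route_table):
    """Return the interface of the default route, or None if there is none.

    `route_table` is the text content of /proc/net/route.
    """
    lines = route_table.splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # Column 0 is the interface name, column 1 the destination in hex;
        # an all-zero destination marks the default route.
        if len(fields) >= 2 and fields[1] == "00000000":
            return fields[0]
    return None


def tcp_interface_args(interface):
    """Return the mpirun arguments that pin TCP traffic to one interface."""
    return ["--mca", "btl_tcp_if_include", interface]


# Made-up sample in the /proc/net/route format, for illustration only.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)

print(tcp_interface_args(default_route_interface(SAMPLE)))
```

On a real host, the input would come from `pathlib.Path("/proc/net/route").read_text()`. On multi-homed nodes the default-route interface is not necessarily the one intended for MPI traffic, so the result should be checked before use.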

We hope these solutions resolve the issues you are experiencing. If you have any further questions or need additional assistance, please do not hesitate to contact us.


3 participants