“Error in PennyLane Lightning: custatevec dynamic library load failure” #875

Open
Shikairan opened this issue Aug 28, 2024 · 11 comments

@Shikairan

Shikairan commented Aug 28, 2024

I can't pass the mpitests with either command: "mpirun -np 2 -env UCX_NET_DEVICES=eth0 python -m pytest mpitests --tb=short" or "mpirun -np 2 python -m pytest mpitests --tb=short".

pytest returns the error: "pennylane_lightning.lightning_gpu_ops.LightningException: [/home/pl/pl5/pennylane-lightning-master/pennylane_lightning/core/src/simulators/lightning_gpu/MPIWorker.hpp][Line:178][Method:make_shared_mpi_worker]: Error in PennyLane Lightning: custatevec dynamic library load failure".
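
One way to narrow this down from Python is to try loading the shared libraries directly with ctypes and see which one the dynamic loader rejects. This is a minimal sketch, not the check that Lightning performs internally; `try_load` and `report` are hypothetical helpers, and the library names used in the example call (`libcustatevec.so.1`, `libmpi.so`) are assumptions that may differ on your system.

```python
import ctypes


def try_load(names):
    """Try to load each shared library; map name -> None on success, or the loader error."""
    results = {}
    for name in names:
        try:
            # RTLD_GLOBAL makes the library's symbols visible to libraries loaded later.
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


def report(results):
    """Format the results as one line per library."""
    return [
        f"{name}: {'OK' if err is None else 'FAILED (' + err + ')'}"
        for name, err in results.items()
    ]


if __name__ == "__main__":
    # Assumed library names; adjust them to match your installation.
    print("\n".join(report(try_load(["libcustatevec.so.1", "libmpi.so"]))))
```

Running the script under the same `mpirun` command as the tests shows what each rank's loader can see, which may differ from an interactive shell.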

I compiled MPI, UCX, and lightning.gpu with MPI in the Docker image <nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04> (IMAGE ID: bc9059f96b2a).

1: Compiled mpich-4.2.2 from source with: ./configure --prefix=/my/path --with-device=ch4:ucx --with-cuda=/my/cuda/path
I can pass the examples in the MPI package, including the , , <cuda/cudapi test>

2: Compiled ucx-1.7.0 from source with: ../configure --prefix=/my/own/path
It passes the test run with "mpirun -np 2 -env UCX_NET_DEVICES=eth0 ./cuda/cudapi" from the MPI example tests.

So the base environment seems to work.

Then I followed the steps in pennylane-lightning to install lightning.gpu with MPI.
1: Installed requirements.txt and requirements-dev.txt with pip in separate conda environments; I tried both requirement files.
2: Followed the steps in the Lightning-GPU installation guide.
After that, I still can't pass the MPI tests. The error details are above.

If I use pip to install lightning.gpu (without MPI, GPU-only version), I can pass the pytest suite in tests. So custatevec itself seems to work.

The installation log:
mpilightning.gpu.install.log

@alister

alister commented Aug 28, 2024

A Warning about the above reply and the link to malware on mediafire (from the author of Curl): https://mastodon.social/@bagder/113038399943924413

@Shikairan
Author

> A Warning about the above reply and the link to malware on mediafire (from the author of Curl): https://mastodon.social/@bagder/113038399943924413

Thank you!

github-staff deleted a comment from Shikairan Aug 28, 2024

@maliasadi
Member

Hi @Shikairan, thank you for reporting this! Lightning-GPU is bound by the system support of the NVIDIA cuQuantum libraries, and cuStateVec supports CUDA-capable GPUs of generation SM 7.0 (Volta) and newer. Can you try compiling Lightning-GPU + MPI on NVIDIA GPUs with compute capability 7.0+? You may want to use CMAKE_CUDA_ARCHITECTURES to specify the CUDA architecture at compile time.
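
To check this on a given machine, the compute capability reported by the driver can be compared against 7.0 and turned into a value for CMAKE_CUDA_ARCHITECTURES. A rough sketch follows; the helper names are hypothetical, and it assumes a driver recent enough for `nvidia-smi --query-gpu=compute_cap` to be available.

```python
import subprocess


def parse_compute_caps(text):
    """Parse lines such as '8.6' into (major, minor) tuples."""
    caps = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            major, minor = line.split(".")
            caps.append((int(major), int(minor)))
    return caps


def is_supported(cap, minimum=(7, 0)):
    """True if the capability meets the SM 7.0 (Volta) minimum mentioned above."""
    return cap >= minimum


def cmake_architectures(caps):
    """Build a CMAKE_CUDA_ARCHITECTURES value, e.g. [(8, 6)] -> '86'."""
    return ";".join(sorted({f"{major}{minor}" for major, minor in caps}))


def query_compute_caps():
    """Ask nvidia-smi for the compute capability of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_compute_caps(out)
```

The string returned by `cmake_architectures(query_compute_caps())` could then be passed at build time, for example inside `CMAKE_ARGS="-DENABLE_MPI=ON -DCMAKE_CUDA_ARCHITECTURES=<value>"`.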

@Shikairan
Author

> CMAKE_CUDA_ARCHITECTURES

I tested this Docker image and compiled lightning.gpu on 4090/4080/3090 Ti/A800/A100 GPUs; the mpitests fail on all of them.

@maliasadi
Member

It is unclear from your report whether you followed the latest installation guide or the stable one to build lightning.gpu+MPI.

To install the master version of lightning.gpu+MPI in editable mode, you need to use the --config-settings editable_mode=compat pip option as shown below:

PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install -e . --config-settings editable_mode=compat -vv

If this doesn't resolve the problem, try installing lightning.gpu regularly with CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . to ensure the package can be found and loaded from site-packages across nodes.

Please let us know if none of the above resolves your issue and don't hesitate to send us the complete build steps and logs in case of failure.
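
To confirm which copy of the package each MPI rank actually imports, a small check like the following can help. It is only a sketch: `where_is` and `in_site_packages` are hypothetical helpers, and `pennylane_lightning` is assumed to be the import name, as it appears in the error message above.

```python
import importlib.util
import os


def where_is(package):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    if spec.submodule_search_locations:
        return os.path.abspath(list(spec.submodule_search_locations)[0])
    if spec.origin and os.path.exists(spec.origin):
        return os.path.dirname(os.path.abspath(spec.origin))
    return None


def in_site_packages(path):
    """True if the path lies inside a site-packages (or dist-packages) directory."""
    parts = os.path.normpath(path).split(os.sep)
    return "site-packages" in parts or "dist-packages" in parts


if __name__ == "__main__":
    location = where_is("pennylane_lightning")
    if location is None:
        print("pennylane_lightning is not importable")
    else:
        kind = "site-packages" if in_site_packages(location) else "outside site-packages"
        print(f"{location} ({kind})")
```

Saved to a file and launched with the same `mpirun -np 2 python ...` prefix as the tests, this prints one line per rank, so a rank that picks up a different copy stands out.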

@Shikairan
Author

> This is unclear from your report if you followed the latest installation guideline or the stable one to build lightning.gpu+MPI.
>
> To install the master version of lightning.gpu+MPI in editable mode, you need to use the --config-settings editable_mode=compat pip option as shown below:
>
> PL_BACKEND="lightning_gpu" python scripts/configure_pyproject_toml.py
> CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install -e . --config-settings editable_mode=compat -vv
>
> If this didn't resolve the problem, try to install lightning.gpu regularly with CMAKE_ARGS="-DENABLE_MPI=ON" python -m pip install . to ensure the package can be found and loaded from site_packages across nodes.
>
> Please let us know if none of the above resolves your issue and don't hesitate to send us the complete build steps and logs in case of failure.

I have been trying to compile this project since last month. I tried both install commands, and both fail with the same error I mentioned above.
I will compile again in Docker to collect all the logs and upload them next week. The project will be compiled on a machine with two 3090 Ti GPUs.

@maliasadi
Member

Hey @Shikairan, I'm just following up on this issue. Were you able to compile and test Lightning-GPU with MPI?

@Shikairan
Author

Shikairan commented Sep 26, 2024

> Hey @Shikairan, I'm just following up on this issue. Were you able to compile and test Lightning-GPU with MPI?

Here are the latest logs:
base env.txt
penny-lightning install log.txt

Each step in the logs is separated by the string "================================================================"

@kevzos

kevzos commented Nov 5, 2024

I encountered the same issue.

mpirun -np 2 python -m pytest mpitests --tb=short -x
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.3.3, pluggy-1.5.0
rootdir: /data/whc/pennylane-lightning
configfile: pyproject.toml
plugins: flaky-3.8.1, xdist-3.6.1, mock-3.14.0, cov-6.0.0
collected 3736 items

mpitests/test_adjoint_jacobian.py ============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-8.3.3, pluggy-1.5.0
rootdir: /data/whc/pennylane-lightning
configfile: pyproject.toml
plugins: flaky-3.8.1, xdist-3.6.1, mock-3.14.0, cov-6.0.0
collected 3736 items

mpitests/test_adjoint_jacobian.py EE

==================================== ERRORS ====================================
_______ ERROR at setup of TestAdjointJacobian.test_not_expval[dev0-True] _______
mpitests/test_adjoint_jacobian.py:51: in fixture_dev
    return qml.device(
/root/anaconda3/envs/mpi310/lib/python3.10/site-packages/pennylane/devices/device_constructor.py:280: in device
    dev = plugin_device_class(*args, **options)
pennylane_lightning/lightning_gpu/lightning_gpu.py:354: in __init__
    self._statevector = self.LightningStateVector(
pennylane_lightning/lightning_gpu/_state_vector.py:101: in __init__
    self._qubit_state = self._state_dtype()(
E   pennylane_lightning.lightning_gpu_ops.LightningException: [/data/whc/pennylane-lightning/pennylane_lightning/core/src/simulators/lightning_gpu/MPIWorker.hpp][Line:178][Method:make_shared_mpi_worker]: Error in PennyLane Lightning: custatevec dynamic library load failure

I followed the instructions at https://docs.pennylane.ai/projects/lightning/en/stable/lightning_gpu/installation.html#id1 to compile from source, and ran the test cases on CentOS with two 3090 Ti GPUs and CUDA 12.1.

@kevzos

kevzos commented Nov 5, 2024

I also tested in the cuQuantum container, installing with pip, and got this error:
File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/pennylane_lightning/lightning_gpu/lightning_gpu.py", line 297, in _mpi_init_helper
raise ImportError("MPI related APIs are not found.")
ImportError: MPI related APIs are not found.
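
One possible reading of this error is that the installed binary module was built without MPI support. A quick way to test that hypothesis is to see which names the compiled module exposes. The sketch below is generic: `missing_names` is a hypothetical helper, and treating `MPIManager` as the name to look for in `pennylane_lightning.lightning_gpu_ops` is an assumption about the bindings, not something confirmed in this thread.

```python
import importlib


def missing_names(module_name, names):
    """Return the entries of `names` that the module does not define.

    Raises ImportError if the module itself cannot be imported.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        # Assumed name; adjust to whatever your build is expected to export.
        missing = missing_names("pennylane_lightning.lightning_gpu_ops", ["MPIManager"])
        print("missing:", missing or "nothing")
    except ImportError as exc:
        print("module not importable:", exc)
```

If the name is reported as missing, building from source with MPI enabled (as in the install commands earlier in this thread) would be the next thing to try.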

@multiphaseCFD
Member

multiphaseCFD commented Nov 11, 2024

Hey @kevzos and @Shikairan ,

Thanks for your interest in distributed Lightning.GPU and for reporting the issue.

Could you please check whether adding the directory that contains libmpi.so (path/to/libmpi.so) to the LD_LIBRARY_PATH environment variable works?
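
A small sketch of that check: list which entries of LD_LIBRARY_PATH actually contain an MPI library. LD_LIBRARY_PATH holds directories, so the entry to add is the directory containing libmpi.so rather than the file itself. The helper name `dirs_containing` and the `libmpi.so*` pattern are assumptions for illustration.

```python
import glob
import os


def dirs_containing(pattern, search_path):
    """Return the directories in an os.pathsep-separated search path that hold a matching file."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries, which appear with leading, trailing, or doubled separators.
        if directory and glob.glob(os.path.join(directory, pattern)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = dirs_containing("libmpi.so*", os.environ.get("LD_LIBRARY_PATH", ""))
    print("libmpi found in:", found or "no LD_LIBRARY_PATH entry")
```

An empty result under `mpirun` would suggest that the MPI library directory is not being propagated to the ranks.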

Please feel free to reach out if there is any issue.

Thanks,

multiphaseCFD added a commit that referenced this issue Nov 25, 2024
…te (#993)

**Context:**
[SC-72314]

This PR is related to the [issue](#875).

**Description of the Change:**

**Benefits:**

**Possible Drawbacks:**

**Related GitHub Issues:**

---------

Co-authored-by: ringo-but-quantum
Co-authored-by: Ali Asadi