Modules fail for Dreambooth example #10888

Open
paulaserna16 opened this issue Oct 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@paulaserna16

I'm trying to run the DreamBooth tutorial, and running dreambooth.py raises a library-related error, for example:

[NeMo W 2024-10-08 09:12:02 megatron_lm_encoder_decoder_model:78] Megatron num_microbatches_calculator not found, using Apex version.
Traceback (most recent call last):
File "/opt/NeMo/clip/convert_external_clip_to_nemo.py", line 53, in
from nemo.collections.multimodal.models.vision_language_foundation.clip.megatron_clip_models import MegatronCLIPModel
File "/usr/local/lib/python3.10/dist-packages/nemo/collections/multimodal/models/vision_language_foundation/clip/megatron_clip_models.py", line 311, in
class SiglipMHAPoolingHead(TransformerLayer):
NameError: name 'TransformerLayer' is not defined

The tutorial does not seem to specify which NeMo image is compatible with this example. Two months ago it worked with the image nvcr.io/nvidia/nemo:24.02, but now it fails with the error above.
Could you please tell me which image works with this tutorial, or which library versions are needed?
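
A NameError like the one above can appear when a module guards its Megatron imports with a try/except and carries on after a failure, so a later class definition refers to a name that was never bound. Whether that is what happens here is an assumption, not something confirmed in this thread. A minimal sketch for surfacing the underlying import error inside the container follows; the module paths listed are guesses at where TransformerLayer and related layers might live, not a verified list.

```python
import importlib


def probe_imports(module_names):
    """Try importing each module; return {name: None or 'ExcType: message'}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # report any failure instead of hiding it
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Assumed candidate modules; adjust to match the installed Megatron-LM.
    candidates = [
        "megatron.core",
        "megatron.core.transformer.transformer_layer",
        "megatron.core.extensions.transformer_engine",
    ]
    for name, error in probe_imports(candidates).items():
        print(f"{name}: {'OK' if error is None else error}")
```

Any entry that does not print OK shows the exception that a guarded import would otherwise have swallowed.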

Thank you so much.

@paulaserna16 paulaserna16 added the bug Something isn't working label Oct 15, 2024
@paulaserna16
Author

Update on the procedure I'm following:

  • I use the NeMo Framework 24.07 image
  • Launch a container with the image (with a couple of volumes):
    docker run --runtime=nvidia --gpus all -it --rm \
      -v /mnt/shared_demos/dreambooth/nemo:/opt/NeMo \
      -v /mnt/shared_models/huggingface/cache/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9:/opt/models \
      --shm-size=8g -p 8888:8888 \
      --ulimit memlock=-1 --ulimit stack=67108864 \
      nvcr.io/nvidia/nemo:24.05
  • I install the main branch from the NeMo repository in order to get the NeMo toolkit installed: python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all]
  • The command above installs the Megatron library in /opt/megatron-lm. However, this is version 0.8, and if I go to the path where it's located, the contents of /megatron/core are not up to date; for example, there's no folder called extensions.
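
To check which Megatron copy Python actually picks up, the installed version, import location, and presence of the extensions subpackage can be queried from inside the container. This is a rough sketch: the distribution name "megatron-core" and the helper name describe_package are assumptions for illustration, and the real distribution name in the image may differ.

```python
import importlib.util
from importlib import metadata


def describe_package(dist_name, module_name, submodule=None):
    """Return installed version, import location, and whether a submodule exists."""
    info = {"version": None, "location": None, "has_submodule": None}

    # Version as recorded in the installed distribution metadata, if any.
    try:
        info["version"] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        pass

    # Where the module would be imported from (this imports parent packages).
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    if spec is not None:
        locations = list(spec.submodule_search_locations or [])
        info["location"] = spec.origin or (locations[0] if locations else None)

    # Whether the named subpackage or submodule can be found.
    if submodule is not None:
        try:
            sub = importlib.util.find_spec(f"{module_name}.{submodule}")
        except (ImportError, ValueError):
            sub = None
        info["has_submodule"] = sub is not None
    return info


if __name__ == "__main__":
    # "megatron-core" is an assumed distribution name.
    print(describe_package("megatron-core", "megatron.core", "extensions"))
```

If the reported location is under /opt/megatron-lm while the version is 0.8 and has_submodule is False, that would match the situation described in the last bullet.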

Thus, it installs something that cannot be used for the dreambooth example.

There's also a section (Megatron-LM) that says to run the following in order to use it:

This updates the versions but, I believe, installs a version of Megatron (0.10) that is incompatible with the DreamBooth files and the image.

Could you help me with this issue, please? Is it related to the branch I'm installing, or to the image?
