[Bug] RuntimeError: No backend type associated with device type cpu #18803

Open
shenoynikhil opened this issue Oct 16, 2023 · 28 comments
Labels
bug Something isn't working ver: 2.1.x working as intended Working as intended

Comments

@shenoynikhil
Contributor

shenoynikhil commented Oct 16, 2023

Bug description

After upgrading both torch and lightning to 2.1.0, running DDP leads to the following error trace:

# Error messages and logs here please
Traceback (most recent call last):
  File "/home/nikhil_valencediscovery_com/projects/openMLIP/src/mlip/train.py", line 126, in main
    train(cfg)
  File "/home/nikhil_valencediscovery_com/projects/openMLIP/src/mlip/train.py", line 102, in train
    trainer.fit(model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
           ^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 331, in _on_evaluation_epoch_end
    trainer._logger_connector.on_epoch_end()
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 187, in on_epoch_end
    metrics = self.metrics
              ^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 226, in metrics
    return self.trainer._results.metrics(on_step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 471, in metrics
    value = self._get_cache(result_metric, on_step)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 435, in _get_cache
    result_metric.compute()
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 280, in wrapped_func
    self._computed = compute(*args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 243, in compute
    value = self.meta.sync(self.value.clone())  # `clone` because `sync` is in-place
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/pytorch/strategies/ddp.py", line 330, in reduce
    return _sync_ddp_if_available(tensor, group, reduce_op=reduce_op)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/fabric/utilities/distributed.py", line 171, in _sync_ddp_if_available
    return _sync_ddp(result, group=group, reduce_op=reduce_op)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/lightning/fabric/utilities/distributed.py", line 221, in _sync_ddp
    torch.distributed.all_reduce(result, op=op, group=group, async_op=False)
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nikhil_valencediscovery_com/local/conda/envs/mlip4/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2050, in all_reduce
    work = group.allreduce([tensor], opts)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: No backend type associated with device type cpu
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

On downgrading lightning to 2.0.1, the error goes away.

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

Environment

<details>
  <summary>Current environment</summary>

* CUDA:
	- GPU:               None
	- available:         False
	- version:           11.8
* Lightning:
	- lightning:         2.0.1.post0
	- lightning-cloud:   0.5.42
	- lightning-utilities: 0.9.0
	- pytorch-lightning: 2.1.0
	- torch:             2.1.0
	- torch-cluster:     1.6.3
	- torch-geometric:   2.4.0
	- torch-scatter:     2.1.2
	- torch-sparse:      0.6.18
	- torchmetrics:      1.2.0
* Packages:
	- absl-py:           2.0.0
	- aiobotocore:       2.5.4
	- aiohttp:           3.8.6
	- aioitertools:      0.11.0
	- aiosignal:         1.3.1
	- antlr4-python3-runtime: 4.9.3
	- anyio:             3.7.1
	- appdirs:           1.4.4
	- argon2-cffi:       23.1.0
	- argon2-cffi-bindings: 21.2.0
	- arrow:             1.3.0
	- ase:               3.22.1
	- asttokens:         2.4.0
	- async-lru:         2.0.4
	- async-timeout:     4.0.3
	- attrs:             23.1.0
	- babel:             2.13.0
	- backcall:          0.2.0
	- backoff:           2.2.1
	- backports.cached-property: 1.0.2
	- backports.functools-lru-cache: 1.6.5
	- beautifulsoup4:    4.12.2
	- black:             23.9.1
	- bleach:            6.1.0
	- blessed:           1.19.1
	- blinker:           1.6.3
	- boto3:             1.28.17
	- botocore:          1.31.17
	- brotli:            1.1.0
	- build:             0.10.0
	- cachecontrol:      0.12.14
	- cached-property:   1.5.2
	- cachetools:        5.3.1
	- certifi:           2023.7.22
	- cffi:              1.16.0
	- cfgv:              3.3.1
	- charset-normalizer: 3.3.0
	- cleo:              2.0.1
	- click:             8.1.7
	- colorama:          0.4.6
	- comm:              0.1.4
	- contourpy:         1.1.1
	- coverage:          7.3.2
	- crashtest:         0.4.1
	- croniter:          1.3.15
	- cryptography:      41.0.4
	- cycler:            0.12.1
	- datamol:           0.0.0
	- dateutils:         0.6.12
	- debugpy:           1.8.0
	- decorator:         5.1.1
	- deepdiff:          6.6.0
	- defusedxml:        0.7.1
	- distlib:           0.3.7
	- docker-pycreds:    0.4.0
	- dulwich:           0.21.6
	- e3nn:              0.5.1
	- einops:            0.6.0
	- entrypoints:       0.4
	- exceptiongroup:    1.1.3
	- executing:         1.2.0
	- fastapi:           0.88.0
	- fastjsonschema:    2.18.1
	- filelock:          3.12.4
	- flask:             3.0.0
	- fonttools:         4.43.1
	- fqdn:              1.5.1
	- freetype-py:       2.3.0
	- frozenlist:        1.4.0
	- fsspec:            2023.9.2
	- gcsfs:             2023.9.2
	- gitdb:             4.0.10
	- gitpython:         3.1.37
	- gmpy2:             2.1.2
	- google-api-core:   2.12.0
	- google-auth:       2.23.3
	- google-auth-oauthlib: 0.4.6
	- google-cloud-core: 2.3.3
	- google-cloud-storage: 2.12.0
	- google-crc32c:     1.1.2
	- google-resumable-media: 2.6.0
	- googleapis-common-protos: 1.61.0
	- greenlet:          3.0.0
	- grpcio:            1.59.1
	- h11:               0.14.0
	- h5py:              3.10.0
	- html5lib:          1.1
	- hydra-core:        1.3.2
	- identify:          2.5.30
	- idna:              3.4
	- importlib-metadata: 6.8.0
	- importlib-resources: 6.1.0
	- iniconfig:         2.0.0
	- inquirer:          3.1.3
	- installer:         0.7.0
	- ipdb:              0.13.13
	- ipykernel:         6.25.2
	- ipython:           8.16.1
	- ipywidgets:        8.1.1
	- isoduration:       20.11.0
	- itsdangerous:      2.1.2
	- jaraco.classes:    3.3.0
	- jedi:              0.19.1
	- jeepney:           0.8.0
	- jinja2:            3.1.2
	- jmespath:          1.0.1
	- joblib:            1.3.2
	- json5:             0.9.14
	- jsonpointer:       2.4
	- jsonschema:        4.19.1
	- jsonschema-specifications: 2023.7.1
	- jupyter-client:    8.4.0
	- jupyter-core:      5.4.0
	- jupyter-events:    0.7.0
	- jupyter-lsp:       2.2.0
	- jupyter-server:    2.7.3
	- jupyter-server-terminals: 0.4.4
	- jupyterlab:        4.0.7
	- jupyterlab-pygments: 0.2.2
	- jupyterlab-server: 2.25.0
	- jupyterlab-widgets: 3.0.9
	- keyring:           23.13.1
	- kiwisolver:        1.4.5
	- lightning:         2.0.1.post0
	- lightning-cloud:   0.5.42
	- lightning-utilities: 0.9.0
	- lockfile:          0.12.2
	- loguru:            0.7.2
	- markdown:          3.5
	- markdown-it-py:    3.0.0
	- markupsafe:        2.1.3
	- matplotlib:        3.8.0
	- matplotlib-inline: 0.1.6
	- matscipy:          0.7.0
	- mdurl:             0.1.0
	- mistune:           3.0.1
	- mlip:              0.0.1.dev157+gc3d9c0b.d20231016
	- more-itertools:    10.1.0
	- mpmath:            1.3.0
	- msgpack:           1.0.6
	- multidict:         6.0.4
	- munkres:           1.1.4
	- mypy-extensions:   1.0.0
	- nbclient:          0.8.0
	- nbconvert:         7.9.2
	- nbformat:          5.9.2
	- nest-asyncio:      1.5.8
	- networkx:          3.1
	- nodeenv:           1.8.0
	- notebook-shim:     0.2.3
	- numpy:             1.26.0
	- oauthlib:          3.2.2
	- omegaconf:         2.3.0
	- openqdc:           0.0.0
	- opt-einsum:        3.3.0
	- opt-einsum-fx:     0.1.4
	- ordered-set:       4.1.0
	- orjson:            3.9.8
	- overrides:         7.4.0
	- packaging:         23.2
	- pandas:            2.1.1
	- pandocfilters:     1.5.0
	- parso:             0.8.3
	- pathspec:          0.11.2
	- pathtools:         0.1.2
	- patsy:             0.5.3
	- pexpect:           4.8.0
	- pickleshare:       0.7.5
	- pillow:            10.1.0
	- pip:               23.3
	- pkginfo:           1.9.6
	- pkgutil-resolve-name: 1.3.10
	- platformdirs:      3.11.0
	- pluggy:            1.3.0
	- ply:               3.11
	- poetry:            1.5.1
	- poetry-core:       1.6.1
	- poetry-plugin-export: 1.5.0
	- pre-commit:        3.5.0
	- prettytable:       3.9.0
	- prometheus-client: 0.17.1
	- prompt-toolkit:    3.0.39
	- protobuf:          4.24.4
	- psutil:            5.9.5
	- ptyprocess:        0.7.0
	- pure-eval:         0.2.2
	- pyasn1:            0.5.0
	- pyasn1-modules:    0.3.0
	- pycairo:           1.25.0
	- pycparser:         2.21
	- pydantic:          1.10.13
	- pygments:          2.16.1
	- pyjwt:             2.8.0
	- pyopenssl:         23.2.0
	- pyparsing:         3.1.1
	- pyproject-hooks:   1.0.0
	- pyqt5:             5.15.9
	- pyqt5-sip:         12.12.2
	- pyrootutils:       1.0.4
	- pysocks:           1.7.1
	- pytest:            7.4.2
	- pytest-cov:        4.1.0
	- python-dateutil:   2.8.2
	- python-dotenv:     1.0.0
	- python-editor:     1.0.4
	- python-json-logger: 2.0.7
	- python-multipart:  0.0.6
	- pytorch-lightning: 2.1.0
	- pytz:              2023.3.post1
	- pyu2f:             0.1.5
	- pyyaml:            6.0.1
	- pyzmq:             25.1.1
	- rapidfuzz:         2.15.2
	- readchar:          4.0.5.dev0
	- referencing:       0.30.2
	- reportlab:         4.0.6
	- requests:          2.31.0
	- requests-oauthlib: 1.3.1
	- requests-toolbelt: 1.0.0
	- rfc3339-validator: 0.1.4
	- rfc3986-validator: 0.1.1
	- rich:              13.6.0
	- rlpycairo:         0.2.0
	- rpds-py:           0.10.6
	- rsa:               4.9
	- ruff:              0.0.292
	- s3fs:              2023.9.2
	- s3transfer:        0.6.2
	- scikit-learn:      1.3.1
	- scipy:             1.11.3
	- seaborn:           0.13.0
	- secretstorage:     3.3.3
	- selfies:           2.1.1
	- send2trash:        1.8.2
	- sentry-sdk:        1.32.0
	- setproctitle:      1.3.3
	- setuptools:        68.2.2
	- shellingham:       1.5.3
	- sip:               6.7.12
	- six:               1.16.0
	- smmap:             3.0.5
	- sniffio:           1.3.0
	- soupsieve:         2.5
	- sqlalchemy:        2.0.22
	- stack-data:        0.6.2
	- starlette:         0.22.0
	- starsessions:      1.3.0
	- statsmodels:       0.14.0
	- sympy:             1.12
	- tensorboard:       2.11.2
	- tensorboard-data-server: 0.6.1
	- tensorboard-plugin-wit: 1.8.1
	- terminado:         0.17.1
	- threadpoolctl:     3.2.0
	- tinycss2:          1.2.1
	- toml:              0.10.2
	- tomli:             2.0.1
	- tomlkit:           0.12.1
	- torch:             2.1.0
	- torch-cluster:     1.6.3
	- torch-geometric:   2.4.0
	- torch-scatter:     2.1.2
	- torch-sparse:      0.6.18
	- torchmetrics:      1.2.0
	- tornado:           6.3.3
	- tqdm:              4.66.1
	- traitlets:         5.11.2
	- triton:            2.1.0
	- trove-classifiers: 2023.9.19
	- types-python-dateutil: 2.8.19.14
	- typing-extensions: 4.8.0
	- typing-utils:      0.1.0
	- tzdata:            2023.3
	- ukkonen:           1.0.1
	- uri-template:      1.3.0
	- urllib3:           1.26.17
	- uvicorn:           0.23.2
	- virtualenv:        20.24.4
	- wandb:             0.15.12
	- wcwidth:           0.2.8
	- webcolors:         1.13
	- webencodings:      0.5.1
	- websocket-client:  1.6.4
	- websockets:        11.0.3
	- werkzeug:          3.0.0
	- wheel:             0.41.2
	- widgetsnbextension: 4.0.9
	- wrapt:             1.15.0
	- yarl:              1.9.2
	- zipp:              3.17.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.11.6
	- release:           5.15.0-1032-gcp
	- version:           #40~20.04.1-Ubuntu SMP Tue Apr 11 02:49:52 UTC 2023

</details>

More info

No response

@shenoynikhil shenoynikhil added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Oct 16, 2023
@andwaal-esmart

I can confirm that I am also experiencing this bug. Downgrading to 2.0.8 fixes it.

Current env:

lightning                     2.1.0
lightning-cloud               0.5.42
lightning-utilities           0.9.0
pytorch-lightning             2.1.0

torch                         2.1.0+cu118
torchmetrics                  1.2.0
torchvision                   0.16.0+cu118

Stack:

File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 331, in _on_evaluation_epoch_end
    trainer._logger_connector.on_epoch_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 187, in on_epoch_end
Traceback (most recent call last):
    metrics = self.metrics
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 226, in metrics
    return self.trainer._results.metrics(on_step)
  File "/mnt/hdd1/users/Documents/dev/sandbox_dsm_scripts/train/train.py", line 91, in <module>
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 471, in metrics
    value = self._get_cache(result_metric, on_step)
    pytroch()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 435, in _get_cache
  File "/mnt/hdd1/users/Documents/dev/sandbox_dsm_scripts/train/train.py", line 73, in pytroch
    result_metric.compute()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 280, in wrapped_func
    trainer.train_cls_locally(
  File "/mnt/hdd1/users/Documents/dev/trainer.py", line 927, in train_cls_locally
    self._computed = compute(*args, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 245, in compute
    cumulated_batch_size = self.meta.sync(self.cumulated_batch_size)
    trainer.fit(model, datamodule=datamodule)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/strategies/ddp.py", line 330, in reduce
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 545, in fit
    return _sync_ddp_if_available(tensor, group, reduce_op=reduce_op)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/fabric/utilities/distributed.py", line 171, in _sync_ddp_if_available
    call._call_and_handle_interrupt(
    return _sync_ddp(result, group=group, reduce_op=reduce_op)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/fabric/utilities/distributed.py", line 221, in _sync_ddp
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
    torch.distributed.all_reduce(result, op=op, group=group, async_op=False)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1536, in all_reduce
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
    work = group.allreduce([tensor], opts)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
RuntimeError: Tensors must be CUDA and dense
    self._run(model, ckpt_path=ckpt_path)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 141, in run
    return self.on_run_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 253, in on_run_end
    self._on_evaluation_epoch_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 331, in _on_evaluation_epoch_end
    trainer._logger_connector.on_epoch_end()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 187, in on_epoch_end
    metrics = self.metrics
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py", line 226, in metrics
    return self.trainer._results.metrics(on_step)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 471, in metrics
    value = self._get_cache(result_metric, on_step)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 435, in _get_cache
    result_metric.compute()
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 280, in wrapped_func
    self._computed = compute(*args, **kwargs)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py", line 245, in compute
    cumulated_batch_size = self.meta.sync(self.cumulated_batch_size)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/pytorch/strategies/ddp.py", line 330, in reduce
    return _sync_ddp_if_available(tensor, group, reduce_op=reduce_op)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/fabric/utilities/distributed.py", line 171, in _sync_ddp_if_available
    return _sync_ddp(result, group=group, reduce_op=reduce_op)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/lightning/fabric/utilities/distributed.py", line 221, in _sync_ddp
    torch.distributed.all_reduce(result, op=op, group=group, async_op=False)
  File "/home/miniconda3/envs/pt-lght/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1536, in all_reduce
    work = group.allreduce([tensor], opts)
RuntimeError: Tensors must be CUDA and dense

@awaelchli awaelchli added repro needed The issue is missing a reproducible example and removed needs triage Waiting to be triaged by maintainers labels Oct 18, 2023
@kleinhenz

I see this as well

@senarvi
Contributor

senarvi commented Oct 19, 2023

I started seeing this error but couldn't figure out what has caused it. It appears after the first validation epoch, apparently when computing a metric in the on_epoch_end callback. Downgrading to 2.0.8 helped.

@pableeto

pableeto commented Oct 20, 2023

Same bug here after upgrading to torch==2.1.0 and lightning==2.1.0.

This bug appeared when running Metric.compute() of a torchmetric after a validation epoch.

Edit: I am using lightning fabric instead of lightning trainer. This bug is also triggered.

@fakufaku

Same for me. Downgrading to pytorch-lightning==2.0.8 fixed the issue.

@samils7

samils7 commented Oct 25, 2023

I got the same error on torch==2.1.0 and lightning==2.1.0, and it was fixed by downgrading to pytorch_lightning==2.0.8.

@emannix

emannix commented Oct 27, 2023

I also just ran across this error. It seems like the self.log(key, val) calls have changed in some way, as in my case the error went away if I manually moved val to the GPU in every call of self.log in my code
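
The workaround described above can be sketched as a small wrapper, so the device move does not have to be repeated at every call site. This is only an illustration: `to_module_device` and `DeviceSafeLogMixin` are hypothetical names, not part of Lightning, and the sketch relies only on the `self.log` and `self.device` members that `LightningModule` provides.

```python
def to_module_device(value, device):
    """Move tensor-like values (anything with a callable ``.to``) to ``device``.

    Plain Python numbers have no ``.to`` and are returned unchanged.
    """
    to = getattr(value, "to", None)
    return to(device) if callable(to) else value


class DeviceSafeLogMixin:
    """Hypothetical mixin, meant to be combined with LightningModule."""

    def log_on_device(self, name, value, **kwargs):
        # ``self.device`` and ``self.log`` are expected to come from LightningModule.
        self.log(name, to_module_device(value, self.device), **kwargs)
```

In a module declared as `class LitModel(DeviceSafeLogMixin, LightningModule)`, a call such as `self.log_on_device("foo", val, sync_dist=True)` would stand in for moving `val` by hand before each `self.log`.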

@dsuess

dsuess commented Oct 27, 2023

My feeling is that the DDP strategy in lightning==2.0.8 initialized distributed backends for both CPU and GPU when running with device=GPU. Below is a minimal example that works with 2.0.8, but crashes in 2.1.0:

import torch
from lightning import Trainer, LightningModule
from torch.utils.data import DataLoader


class LitModel(LightningModule):
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(1, 1)

    def training_step(self, x):
        # Everything but the next line is just dummy-code to make it run
        self.log(
            "foo", value=torch.zeros(1, device="cpu"), on_step=True, sync_dist=True
        )
        loss = self.layer(x).mean()
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(torch.randn(32, 1), batch_size=1)


def main():
    model = LitModel()
    trainer = Trainer(devices=2, accelerator="gpu", max_epochs=2)
    trainer.fit(model)


if __name__ == "__main__":
    main()

Note that this isn't restricted to distributed code that's run by lightning. We have some functionality that uses torch.distributed directly and are running into the same exact issue when we try to broadcast non-CUDA tensors.
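
For code that calls torch.distributed directly, one possible workaround is to stage CPU tensors on the GPU for the collective and copy the result back. The sketch below is an assumption about how that might look, not the approach used in the project mentioned above: `broadcast_any_device` is a hypothetical helper, and it assumes each rank has already selected its GPU (for example via `torch.cuda.set_device`).

```python
import torch
import torch.distributed as dist


def broadcast_any_device(tensor: torch.Tensor, src: int = 0) -> torch.Tensor:
    """Broadcast ``tensor`` from rank ``src`` and return it on its original device."""
    if not (dist.is_available() and dist.is_initialized()):
        return tensor  # no process group: nothing to synchronize

    if dist.get_backend() == "nccl" and not tensor.is_cuda:
        # NCCL only handles CUDA tensors, so stage a copy on this rank's GPU.
        staged = tensor.to(torch.device("cuda", torch.cuda.current_device()))
        dist.broadcast(staged, src=src)
        return staged.to(tensor.device)

    dist.broadcast(tensor, src=src)  # in place when the backend accepts the device
    return tensor
```

The extra host-to-device copy is the cost of this approach. If the process group is created by your own code, the `init_process_group` documentation also describes a per-device backend string such as `cpu:gloo,cuda:nccl`, which may be worth checking as an alternative.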

@egoetz

egoetz commented Nov 21, 2023

Has this issue been addressed in nightly? I was really trying to stick to either pip or conda versions and it looks like 2.0.8 is not available on either.

@celpas

celpas commented Nov 23, 2023

Same issue with PyTorch 2.1.1 and Lightning 2.1.2

@awaelchli
Contributor

It looks like the change comes from this PR: #17334 (found by git-bisecting with the code sample by @dsuess)

@awaelchli
Contributor

awaelchli commented Nov 27, 2023

It looks like the change was intentional. The changelog says:

self.log-ed tensors are now kept in the original device to reduce unnecessary host-to-device synchronizations (#17334)

This means if you pass in the tensor, it already needs to be on the right device and the user needs to explicitly perform the .to() call.

cc @carmocca

@awaelchli awaelchli added working as intended Working as intended and removed repro needed The issue is missing a reproducible example labels Nov 27, 2023
@RuABraun

RuABraun commented Dec 4, 2023

The resolution is not clear to me. I'm getting the message "RuntimeError: No backend type associated with device type cpu". If I was logging 20 things, some of them on CPU and some on GPU, what should I be doing? From your comment @awaelchli I would've thought adding .to('cpu') calls, but the error message makes me think the opposite (but moving CPU results back to GPU also seems silly).

@senarvi
Contributor

senarvi commented Dec 4, 2023

If I understood correctly, when using self.log(..., sync_dist=True) with DDP, you have to transfer the tensor to the GPU before logging.

Is it possible to move the tensors to the correct device automatically in LightningModule.log()? If not, I feel like this should be mentioned in the documentation, and it would be good to give a better error message. Currently the 15-minute Lightning tutorial instructs users to remove any .cuda() or device calls, because LightningModules are hardware agnostic.
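
Until a clearer message exists upstream, a check like the following could run in user code before each synced `self.log` call. This is only a sketch: `check_log_device` is a hypothetical helper, not a Lightning API, and it assumes the value exposes a `device` with a `type` attribute, as `torch.Tensor` does.

```python
def check_log_device(name, value, expected_device_type):
    """Raise a descriptive error if a tensor-like ``value`` is on the wrong device type.

    Values without a ``device`` attribute (e.g. Python floats) are not checked.
    """
    device = getattr(value, "device", None)
    actual = getattr(device, "type", None)
    if actual is not None and actual != expected_device_type:
        raise ValueError(
            f"Metric {name!r} is on device type {actual!r}, but the module is on "
            f"{expected_device_type!r}. Move it with .to(self.device) before "
            f"calling self.log(..., sync_dist=True)."
        )
```

Inside a LightningModule this might be called as `check_log_device("foo", val, self.device.type)`, which turns the backend error into one that names the offending metric.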

@dsuess

dsuess commented Dec 5, 2023

@awaelchli Thanks for clarifying. I've found another corner case where the new behaviour breaks existing code: If you re-use a trainer instance multiple times (e.g. for evaluating multiple epochs), you can end up with metrics moved to CPU even if you log them with GPU tensors.

The reason is that the logger connector moves all intermediate results to CPU on teardown. So on the second call to trainer.validate, the helper state (e.g. cumulated_batch_size) of the cached results is on CPU. This can be fixed by removing all cached results through

trainer.validate_loop._results.clear()

Here's a full example to reproduce this:

import torch
from lightning import Trainer, LightningModule
from torch.utils.data import DataLoader


class LitModel(LightningModule):
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(1, 1)

    def training_step(self, x):
        loss = self.layer(x).mean()
        return loss

    def validation_step(self, *args, **kwargs):
        self.log(
            "foo", value=torch.zeros(1, device=self.device), on_step=True, sync_dist=True
        )
        return super().validation_step(*args, **kwargs)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def val_dataloader(self):
        return DataLoader(torch.randn(32, 1), batch_size=1)


def main():
    model = LitModel()
    trainer = Trainer(devices=2, accelerator="gpu", max_epochs=2)
    trainer.validate(model)
    # Uncomment the following line to fix the issue
    #trainer.validate_loop._results.clear()
    trainer.validate(model)


if __name__ == "__main__":
    main()

zhehuaichen added a commit to zhehuaichen/NeMo that referenced this issue Dec 22, 2023
@vitusbenson

The reason is that the logger connector moves all intermediate results to CPU on teardown. So on the second call to trainer.validate, the helper state (e.g. cumulated_batch_size) of the cached results is on CPU. This can be fixed by removing all cached results through

trainer.validate_loop._results.clear()

If you want to call trainer.fit twice, the analogue fix is:

trainer.fit_loop.epoch_loop.val_loop._results.clear()
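
For anyone reusing one trainer for several fit/validate calls, the two one-liners above could be wrapped in a small helper. This is only a sketch: the function name clear_cached_val_results is made up here, and both attribute paths are the private ones quoted in this thread, so they may change between Lightning versions.

```python
def clear_cached_val_results(trainer):
    """Drop cached logging results so a reused trainer starts from a clean state.

    Assumption: `trainer` exposes the two private attribute paths quoted in
    this thread. They are not public API and may differ in other versions.
    """
    # Cached results used by trainer.validate(...)
    trainer.validate_loop._results.clear()
    # Cached results of the validation loop that runs inside trainer.fit(...)
    trainer.fit_loop.epoch_loop.val_loop._results.clear()
```

You would call it between runs, e.g. trainer.validate(model), then clear_cached_val_results(trainer), then trainer.validate(model) again.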

@yirending

I'm having the same issue with the latest version and resolved it by downgrading to lightning==2.0.9.

@ouioui199

I've solved the issue on lightning==2.1.3. When overriding any epoch_end function, if you log, make sure that the tensor is on the GPU device. If you create a new tensor, initialize it with device=self.device.
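
A rough sketch of that advice as a reusable helper, in case the logged values come from code that returns CPU tensors. The name move_metrics_to_device is hypothetical (it is not part of Lightning), and the helper only assumes that tensor-like values have a .to(device) method.

```python
def move_metrics_to_device(metrics, device):
    """Return a copy of `metrics` with every tensor-like value moved to `device`.

    Hypothetical helper, not a Lightning API. Values without a `.to` method
    (e.g. plain Python floats) are passed through unchanged.
    """
    moved = {}
    for name, value in metrics.items():
        # Tensors expose .to(device); anything else is kept as-is.
        moved[name] = value.to(device) if hasattr(value, "to") else value
    return moved
```

Inside a LightningModule hook it could be used as self.log_dict(move_metrics_to_device(metrics, self.device), sync_dist=True), where metrics is whatever dict you were logging before.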

@xzklwj

xzklwj commented Feb 1, 2024

> I've solved the issue on lightning==2.1.3 . When rewriting any epoch_end function, if you log, just make sure that the tensor is on gpu device. If you initialize new tensor, initialize it with device=self.device

@ouioui199's suggestion works. I changed my code from

self.log_dict(
    {f"test_map_{label}": value for label, value in zip(self.id2label.values(), mAP_per_class)},
    sync_dist=True,
)

to

self.log_dict(
    {f"test_map_{label}": value.to("cuda") for label, value in zip(self.id2label.values(), mAP_per_class)},
    sync_dist=True,
)

ziw-liu added a commit to mehta-lab/VisCy that referenced this issue Apr 8, 2024
* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

@asusdisciple

Got the same error when trying to compute a confusion matrix in a callback, at the point where I call metric.plot():

def on_test_epoch_end(self) -> None:
     metric = MulticlassConfusionMatrix(num_classes=self.num_classes).to("cpu")

     outputs = torch.cat(self.x_test, dim=0).to("cpu")
     labels = torch.cat(self.y_test, dim=0).to("cpu")
     outputs = torch.softmax(outputs, dim=1).argmax(dim=1)
     metric.update(outputs, labels)
     pl = ["Latin", "Russian", "Arabic", "Chinese"]
     fig_, ax_ = metric.plot(labels=pl)
     fig_.savefig("test.png")

titu1994 added a commit to NVIDIA/NeMo that referenced this issue Jun 7, 2024
* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803


* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support lora weight tying

Signed-off-by: jasonwan <[email protected]>

* add copyright header

Signed-off-by: jasonwan <[email protected]>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <[email protected]>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <[email protected]>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: wrong config path when using relative ckpt path

Signed-off-by: Hongbin Liu <[email protected]>

* revert mcore config change

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* clean up ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* rollback git merge errors

Signed-off-by: jasonwan <[email protected]>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <[email protected]>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <[email protected]>

* one more fix for optional test set

Signed-off-by: jasonwan <[email protected]>

* support merging lora weights in mcore

Signed-off-by: jasonwan <[email protected]>

* update mcore for cpu init

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <[email protected]>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <[email protected]>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <[email protected]>

* disable dist ckpt in peft

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <[email protected]>

* add ckpt conversion to CI

Signed-off-by: jasonwan <[email protected]>

* update CI

Signed-off-by: jasonwan <[email protected]>

* mcore_mixin docstring

Signed-off-by: jasonwan <[email protected]>

* minor change in mcore peft error message

Signed-off-by: jasonwan <[email protected]>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <[email protected]>

* correct mcore fp8 config

Signed-off-by: jasonwan <[email protected]>

* add TE installation

Signed-off-by: jasonwan <[email protected]>

* support mcore adapter tuning

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <[email protected]>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <[email protected]>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <[email protected]>

* minor fix for prompt learning

Signed-off-by: jasonwan <[email protected]>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <[email protected]>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: Kelvin Liu <[email protected]>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <[email protected]>

* adjust key names based on O2

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <[email protected]>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <[email protected]>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <[email protected]>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <[email protected]>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <[email protected]>

* Fix minor typo

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <[email protected]>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

---------

Signed-off-by: omahs <[email protected]>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <[email protected]>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <[email protected]>

* use mcore

Signed-off-by: eharper <[email protected]>

* revert test change

Signed-off-by: eharper <[email protected]>

* pleasefixme

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <[email protected]>

* comment test temporarily

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <[email protected]>

* revert comment (#7368)

Signed-off-by: eharper <[email protected]>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <[email protected]>

* upper bounding ptl (#7370)

Signed-off-by: eharper <[email protected]>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <[email protected]>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <[email protected]>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <[email protected]>

* add to llama inference config

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* unused import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

* fix forward with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progres…
marcromeyn added a commit to NVIDIA/NeMo that referenced this issue Jun 7, 2024
* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* ensure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data names; keep asr model tokenizer by default to enable adding canary dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality
* Update zh_cn_pinyin.py

added detailed instruction on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

Remove pkuseg as a default dependency of NeMo TTS and instead direct users to install pkuseg manually if they need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there are no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if it's a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support lora weight tying

Signed-off-by: jasonwan <[email protected]>

* add copyright header

Signed-off-by: jasonwan <[email protected]>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <[email protected]>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <[email protected]>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: wrong config path when using relative ckpt path

Signed-off-by: Hongbin Liu <[email protected]>

* revert mcore config change

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* clean up ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* rollback git merge errors

Signed-off-by: jasonwan <[email protected]>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <[email protected]>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <[email protected]>

* one more fix for optional test set

Signed-off-by: jasonwan <[email protected]>

* support merging lora weights in mcore

Signed-off-by: jasonwan <[email protected]>

* update mcore for cpu init

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <[email protected]>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <[email protected]>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <[email protected]>

* disable dist ckpt in peft

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <[email protected]>

* add ckpt conversion to CI

Signed-off-by: jasonwan <[email protected]>

* update CI

Signed-off-by: jasonwan <[email protected]>

* mcore_mixin docstring

Signed-off-by: jasonwan <[email protected]>

* minor change in mcore peft error message

Signed-off-by: jasonwan <[email protected]>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <[email protected]>

* correct mcore fp8 config

Signed-off-by: jasonwan <[email protected]>

* add TE installation

Signed-off-by: jasonwan <[email protected]>

* support mcore adapter tuning

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <[email protected]>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <[email protected]>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <[email protected]>

* minor fix for prompt learning

Signed-off-by: jasonwan <[email protected]>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <[email protected]>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: Kelvin Liu <[email protected]>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <[email protected]>

* adjust key names based on O2

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <[email protected]>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <[email protected]>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <[email protected]>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <[email protected]>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <[email protected]>

* Fix minor typo

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <[email protected]>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

---------

Signed-off-by: omahs <[email protected]>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <[email protected]>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <[email protected]>

* use mcore

Signed-off-by: eharper <[email protected]>

* revert test change

Signed-off-by: eharper <[email protected]>

* pleasefixme

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <[email protected]>

* comment test temporarily

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <[email protected]>

* revert comment (#7368)

Signed-off-by: eharper <[email protected]>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <[email protected]>

* upper bounding ptl (#7370)

Signed-off-by: eharper <[email protected]>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <[email protected]>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <[email protected]>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <[email protected]>

* add to llama inference config

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* unused import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progress bar to reflect total microbatch cnt

Signed-off-by: Abhishree <[email protected]>

* Modify CustomProgressBar class

1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch
2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder

Signed-off-by: Abhishree <[email protected]>

* Add CustomProgressBar callback to tuning files

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Set Activation Checkpointing Defaults (#7404)

* Set Activation Checkpointing Defaults

Signed-off-by: Abhinav Khattar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for None

Signed-off-by: Abhinav Khattar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* make loss mask default to false (#7407)

Signed-off-by: eharper <[email protected]>

* Add dummy userbuffer config files (#7408)

Signed-off-by: Sangkug Lym <[email protected]>

* add missing ubconf files (#7412)

Signed-off-by: Abhinav Khattar <[email protected]>

* New tutorial on Speech Data Explorer (#7405)

* Added Google Colab based tutorial on Speech Data Explorer

Signed-off-by: George Zelenfroynd <[email protected]>

* Update ptl training ckpt conversion script to work with dist ckpt (#7416)

* update ptl convert script

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* don't break legacy

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Allow disabling sanity checking when num_sanity_val_steps=0 (#7413)

* Allow disabling sanity checking when num_sanity_val_steps=0

Signed-off-by: Abhishree <[email protected]>

* Update num_sanity_val_steps to be a multiple of num_microbatches

Signed-off-by: Abhishree Thittenamane <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more informa…
akoumpa added a commit to NVIDIA/NeMo that referenced this issue Jun 10, 2024
…rategy (#9387)

* Integrating mcore's DistributedDataParallel into MegatronStrategy

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Apply ddp-hooks from pytorch only when needed

Signed-off-by: Marc Romeyn <[email protected]>

* bugfix if using mcore distOpt with sft (#9356)

* bugfix if using mcore distOpt

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* fix typo infer_seq_lenght -> infer_seq_length (#9370)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Rachitg/ag (#9083)

* Rachitg/ag (#9081)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix

* bugfix

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Adding the original change made for label_models (#9377) (#9378)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253)

* Lazily warn about using greedy strategy instead of greedy_batch
strategy.

Previously, the warning would often run spuriously, since several
existing code paths simply call "change_decoding_strategy()" after
having first initialized a Module, rather than changing the config
before initializing the Module. This can be confusing.

The only problem I can see with this is that using logging inside a
forward() method might interfere with some compiler toolkits like
TorchScript or thunder.compile. Presumably it would be easy to add a
conditional statement to avoid this statement in a compiler context if
necessary.
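
The lazy warning described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not NeMo's actual code: the class `GreedyDecoderSketch`, its `strategy` attribute, and the `_maybe_warn` helper are hypothetical names, and the argmax loop only stands in for real decoding. The point is that the check runs at call time, so changing the strategy after construction does not trigger a spurious warning.

```python
import logging

logger = logging.getLogger("decoding_sketch")


class GreedyDecoderSketch:
    """Hypothetical stand-in for a decoding module (not a NeMo class)."""

    def __init__(self, strategy: str = "greedy"):
        # No warning here: callers may still change the strategy
        # after construction, before the first forward call.
        self.strategy = strategy
        self._warned = False

    def _maybe_warn(self) -> None:
        # Warn at most once, and only if 'greedy' is actually used.
        if self.strategy == "greedy" and not self._warned:
            logger.warning("Using 'greedy' strategy; consider 'greedy_batch'.")
            self._warned = True

    def forward(self, frames):
        self._maybe_warn()
        # Placeholder decoding: index of the largest score per frame.
        return [max(range(len(f)), key=f.__getitem__) for f in frames]
```

If logging inside `forward` interferes with a compiler toolkit, the call to `_maybe_warn` is the single place where a conditional guard would go.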

Signed-off-by: Daniel Galvez <[email protected]>
Co-authored-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Update README.rst (#9393)

Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05. Please verify all changes.

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* a2a fix removed tp world size and group from init (#8944) (#8952)

Signed-off-by: Anmol Gupta <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Add config option for FP32 embedding grads (#8953)

* Add config option for FP32 embedding grads (#8946)

Signed-off-by: Tim Moon <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Changes to enable CUDA graph for LLM (#8955)

* Changes to enable CUDA graph for LLM (#8751)

* Use next instead of get_batch

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* CUDA graph changes

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change to enable CG with weight caching

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Use next instead of get_batch"

This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"

This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Remove skip_weight_update argument

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Bug fix + cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Use new TE API for FP8 Param transpose

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change config param cuda_graph to enable_cuda_graph

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Enable TE RNGStatesTracker through config

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change te_rng_tracker to use_te_rng_tracker

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* FP8 weight transpose handled inside TE

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""

This reverts commit e31862481216f9adf7fa584a0c0262916c935639.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: vasunvidia <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Enhance Distributed Adam (#9051)

* Enhance Distributed Adam (#9037)

* Fix deprecated env.

Signed-off-by: Wil Kong <[email protected]>

* Use user desired value for distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Preserve memory format in parameter buffer of distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather.

Signed-off-by: Wil Kong <[email protected]>

* Provide API to lock SHArP tree for distributed adam within nodes.

Signed-off-by: Wil Kong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Wil Kong <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390)

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Apply isort and black reformatting

---------

Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: tango4j <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Properly catch failed tests by introduction of workflow templates (#9324)

* ci: Refactor tests into reusable template

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Fix sending alerts on failure

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* disable slack

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix alerting

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Increase timeout for `L0_Unit_Tests_CPU`

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout for `Speech_Checkpoints_tests`

Signed-off-by: Oliver Koenig <[email protected]>

* improve readability

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* finalize

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* add missing rm statement for `L2_PTQ_Llama2_Export_Only`

Signed-off-by: Oliver Koenig <[email protected]>

* all your comments are belong to us

Signed-off-by: Oliver Koenig <[email protected]>

* remove github output

Signed-off-by: Oliver Koenig <[email protected]>

* revive more comments

Signed-off-by: Oliver Koenig <[email protected]>

* add L2: ASR dev run - part two

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix T5 G2P Input and Output Types (#9224) (#9269)

* fix t5 g2p model

* Apply isort and black reformatting

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198)

* Fix the "cast ping pong" problem when we run AMP inference.

This has been tested only for Parakeet-CTC-1.1B right now. This
problem certainly exists elsewhere.

Automatic mixed precision and inference do not play well together.

First, automatic mixed precision was created back when neural networks
were much simpler. In particular, they did not have softmax and layer
norm as frequent operations. In the era of transformers, softmax and
layer norm are very common. AMP will unconditionally output fp32
outputs from these operations, even if their inputs are fp16. See
here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32

This is no longer necessary, now that layer norm does accumulation in
fp32 in pytorch, even if the input is fp16:
https://github.com/pytorch/pytorch/issues/66707

Do inference by casting model to bfloat16, not by using AMP.

Do feature preprocessing in float32 for accuracy. Warn if someone
tries to input a non-float32 tensor.

Always create the output in the type the rest of the model expects.

Sort manifests by duration.

Signed-off-by: Daniel Galvez <[email protected]>

* Always cast softmax inputs to float32 when in training mode.

While we don't need this for accurate results in b/float16, this is a
safety precaution to make sure that training accuracy does not
regress.

Signed-off-by: Daniel Galvez <[email protected]>

---------

Signed-off-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Huvu/rag pipeline citest (#9384)

* huvu/NeMo_rag_citest first commit

* adding llama-index to dependency

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjusting data/models path in ci-test to dependency

* putting llama-index to optional

* update cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <[email protected]>

* Re-org export code (#9353)

* reorg the export code

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* replaced log with raise

Signed-off-by: Onur Yilmaz <[email protected]>

* add converter and loader folders

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_ckpt_convert into the converter folder

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_file into loader folder

Signed-off-by: Onur Yilmaz <[email protected]>

* reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo file back into nemo folder

Signed-off-by: Onur Yilmaz <[email protected]>

* renamed nemo folder to nemo_ckpt_loader

Signed-off-by: Onur Yilmaz <[email protected]>

* remove unused function

Signed-off-by: Onur Yilmaz <[email protected]>

* removed nemo file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* moved a function to tensorrt_llm_run file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* Remove unused imports

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* import csv added

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399)

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* disable overlap for qkv (#9079)

* disable overlap for qkv (#9072)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix circular import for MM dataprep notebook (#9287) (#9292)

* update launcher name and fix mm circular import

* Apply isort and black reformatting

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* add check if num layers is divisible by pp size (#9208) (#9298)

* add check if num_layers % pp == 0

* Apply isort and black reformatting

* move num_layers / pp check to build_transformer_config

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
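
The check named in the commit above (`num_layers % pp == 0`) can be sketched as below. This is only an illustration under stated assumptions, not the code from the referenced PR: the function name `layers_per_pipeline_stage` and its parameter names are hypothetical, and the real check lives in `build_transformer_config` according to the commit message.

```python
def layers_per_pipeline_stage(num_layers: int, pipeline_parallel_size: int) -> int:
    """Return the number of layers per stage, or raise if they do not divide evenly.

    Hypothetical helper for illustration; names are not taken from NeMo.
    """
    if pipeline_parallel_size < 1:
        raise ValueError("pipeline_parallel_size must be at least 1")
    if num_layers % pipeline_parallel_size != 0:
        raise ValueError(
            f"num_layers ({num_layers}) must be divisible by "
            f"pipeline_parallel_size ({pipeline_parallel_size})"
        )
    return num_layers // pipeline_parallel_size
```

Failing early at configuration time gives a clear message instead of a shape or indexing error later during model construction.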

* Add HF siglip vision encoder (#9185)

* temp save

Signed-off-by: yaoyu-33 <[email protected]>

* temp save 2

Signed-off-by: yaoyu-33 <[email protected]>

* update code

Signed-off-by: yaoyu-33 <[email protected]>

* enable seq packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix neva and clip

Signed-off-by: yaoyu-33 <[email protected]>

* Enable parallel seq packing algo and few other fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Pipeline parallel support

Signed-off-by: yaoyu-33 <[email protected]>

* Update data preprocess

Signed-off-by: yaoyu-33 <[email protected]>

* fix few pp issues

Signed-off-by: yaoyu-33 <[email protected]>

* enable sequence packing w/ PP

Signed-off-by: yaoyu-33 <[email protected]>

* Fix cu_seqlens in inputs

Signed-off-by: yaoyu-33 <[email protected]>

* add assert

Signed-off-by: yaoyu-33 <[email protected]>

* Depend on PP to decide whether do padding

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few PP evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Address comments

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add llama3 template

Signed-off-by: yaoyu-33 <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* Fix license

Signed-off-by: yaoyu-33 <[email protected]>

* Fix llama3

Signed-off-by: yaoyu-33 <[email protected]>

* Few fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* llama3 inference fix

Signed-off-by: yaoyu-33 <[email protected]>

* Force vision encoder to run in fp32

Signed-off-by: yaoyu-33 <[email protected]>

* Revert "Force vision encoder to run in fp32"

This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Try adding distributed format of checkpoint

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Allow dist checkpoint to be non-strict

Signed-off-by: yaoyu-33 <[email protected]>

* Fix

Signed-off-by: yaoyu-33 <[email protected]>

* Some fixes for PP + dist ckpt in Neva

Signed-off-by: yaoyu-33 <[email protected]>

* fix peft

Signed-off-by: yaoyu-33 <[email protected]>

* few fixes for lora

Signed-off-by: yaoyu-33 <[email protected]>

* checkpoint updates

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* bug fix

Signed-off-by: yaoyu-33 <[email protected]>

* Add HF siglip vision encoder

Signed-off-by: HuiyingLi <[email protected]>

* handle steerlm label in nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* Add neva dist checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix CLEAN RESPONSE logic to not use last EOS

Signed-off-by: HuiyingLi <[email protected]>

* strip extra_id_1 from clean response

Signed-off-by: HuiyingLi <[email protected]>

* change inference time image processor

Signed-off-by: HuiyingLi <[email protected]>

* resolve comments

Signed-off-by: yaoyu-33 <[email protected]>

* remove open_clip vision encoder for siglip

Signed-off-by: HuiyingLi <[email protected]>

* update neva dist ckpt apis

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix return

Signed-off-by: yaoyu-33 <[email protected]>

* resolve CLEAN RESPONSE multiturn issue

Signed-off-by: HuiyingLi <[email protected]>

* code format

Signed-off-by: HuiyingLi <[email protected]>

* fixes for isort

Signed-off-by: HuiyingLi <[email protected]>

* refac image processor loading to util

Signed-off-by: HuiyingLi <[email protected]>

* black and isort

Signed-off-by: HuiyingLi <[email protected]>

* move crop size assertion

Signed-off-by: HuiyingLi <[email protected]>

* few neva fixes

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* [Nemo CICD] timeouts fix (#9407)

* timeouts fix

* timeouts fix

Signed-off-by: Marc Romeyn <[email protected]>

* Removing un-used ModelConfig class (#9389)

Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* make sure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality
* Update zh_cn_pinyin.py

added detailed instruction on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as the default dependency of NeMo TTS, and instead direct users to install pkuseg manually if they really need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there are no unexpected keys in the state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if its a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support …
ziw-liu added a commit to mehta-lab/VisCy that referenced this issue Jun 11, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e., aics dataset)

* update loading scripts

* fix CombineMode

* always use untrainable head for FCMAE

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* custom head

* ddp caching fixes

* fix caching when using combined loader

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* prefetch more in validation

* fix collate when multi-sample transform is not used

* typing fixes

* fix test dataset

* fix invert transform

* add ddp prepare flag for combined data module

* remove redundant operations

* filter empty detections

* pass trainer to underlying data modules in concatenated

* hack: add test dataloader for LiveCell dataset

* test datasets for livecell and ctmc

* fix merge error

* fix mAP default for over 100 detections

* bump torchmetric

* fix combined loader training for virtual staining task

* fix non-combined data loader training

* add fcmae to graph script

* fix type hint

* format

* add back convolution option for fcmae head

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
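
The VisCy commit above lists "move log values to GPU before syncing" as its response to this issue. A minimal sketch of that idea follows. It assumes the error appears when a CPU value is logged with `sync_dist=True` while the process group uses a GPU-only backend such as NCCL. `prepare_for_sync` is a hypothetical helper name used for illustration; it is not part of Lightning or VisCy.

```python
import torch


def prepare_for_sync(value, device):
    """Return `value` as a tensor placed on `device`.

    Hypothetical helper: a GPU-only backend cannot reduce CPU tensors, so the
    value is moved to the module's device before it is logged and synced.
    """
    if not isinstance(value, torch.Tensor):
        # Plain Python numbers become float32 scalar tensors.
        value = torch.tensor(value, dtype=torch.float32)
    return value.to(device)


# On a machine without a GPU this falls back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss = prepare_for_sync(0.25, device)
```

Inside a `LightningModule`, the same helper would wrap the value passed to `self.log(..., sync_dist=True)`, with `self.device` as the target device.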
janekl added a commit to NVIDIA/NeMo that referenced this issue Jun 12, 2024
* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397 and https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* ensure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality

* Update zh_cn_pinyin.py

added detailed instructions on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as a default dependency of NeMo TTS and instead direct users to install pkuseg manually if they need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there are no unexpected keys in the state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if it's a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support lora weight tying

Signed-off-by: jasonwan <[email protected]>

* add copyright header

Signed-off-by: jasonwan <[email protected]>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <[email protected]>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <[email protected]>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: wrong config path when using relative ckpt path

Signed-off-by: Hongbin Liu <[email protected]>

* revert mcore config change

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* clean up ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* rollback git merge errors

Signed-off-by: jasonwan <[email protected]>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <[email protected]>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <[email protected]>

* one more fix for optional test set

Signed-off-by: jasonwan <[email protected]>

* support merging lora weights in mcore

Signed-off-by: jasonwan <[email protected]>

* update mcore for cpu init

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <[email protected]>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <[email protected]>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <[email protected]>

* disable dist ckpt in peft

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <[email protected]>

* add ckpt conversion to CI

Signed-off-by: jasonwan <[email protected]>

* update CI

Signed-off-by: jasonwan <[email protected]>

* mcore_mixin docstring

Signed-off-by: jasonwan <[email protected]>

* minor change in mcore peft error message

Signed-off-by: jasonwan <[email protected]>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <[email protected]>

* correct mcore fp8 config

Signed-off-by: jasonwan <[email protected]>

* add TE installation

Signed-off-by: jasonwan <[email protected]>

* support mcore adapter tuning

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <[email protected]>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <[email protected]>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <[email protected]>

* minor fix for prompt learning

Signed-off-by: jasonwan <[email protected]>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <[email protected]>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: Kelvin Liu <[email protected]>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <[email protected]>

* adjust key names based on O2

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <[email protected]>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <[email protected]>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <[email protected]>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <[email protected]>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <[email protected]>

* Fix minor typo

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <[email protected]>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

---------

Signed-off-by: omahs <[email protected]>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <[email protected]>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <[email protected]>

* use mcore

Signed-off-by: eharper <[email protected]>

* revert test change

Signed-off-by: eharper <[email protected]>

* pleasefixme

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <[email protected]>

* comment test temporarily

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <[email protected]>

* revert comment (#7368)

Signed-off-by: eharper <[email protected]>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <[email protected]>

* upper bounding ptl (#7370)

Signed-off-by: eharper <[email protected]>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <[email protected]>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <[email protected]>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <[email protected]>

* add to llama inference config

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* unused import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progres…
janekl added a commit to NVIDIA/NeMo that referenced this issue Jun 12, 2024
…rategy (#9387)

* Integrating mcore's DistributedDataParallel into MegatronStrategy

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Apply ddp-hooks from pytorch only when needed

Signed-off-by: Marc Romeyn <[email protected]>

* bugfix if using mcore distOpt with sft (#9356)

* bugfix if using mcore distOpt

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* fix typo infer_seq_lenght -> infer_seq_length (#9370)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Rachitg/ag (#9083)

* Rachitg/ag (#9081)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix

* bugfix

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Adding the original change made for label_models (#9377) (#9378)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253)

* Lazily warn about using greedy strategy instead of greedy_batch
strategy.

Previously, the warning would often run spuriously, since several
existing code paths simply call "change_decoding_strategy()" after
having first initialized a Module, rather than changing the config
before initializing the Module. This can be confusing.

The only problem I can see with this is that using logging inside a
forward() method might interfere with some compiler toolkits like
Torchscript or thunder.compile. Presumably it would be easy to add a
conditional statement to avoid this statement in a compiler context if
necessary.

Signed-off-by: Daniel Galvez <[email protected]>
Co-authored-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Update README.rst (#9393)

Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05. Please verify all changes.

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* a2a fix removed tp world size and group from init (#8944) (#8952)

Signed-off-by: Anmol Gupta <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Add config option for FP32 embedding grads (#8953)

* Add config option for FP32 embedding grads (#8946)

Signed-off-by: Tim Moon <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Changes to enable CUDA graph for LLM (#8955)

* Changes to enable CUDA graph for LLM (#8751)

* Use next instead of get_batch

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* CUDA graph changes

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change to enable CG with weight caching

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Use next instead of get_batch"

This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"

This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Remove skip_weight_update argument

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Bug fix + cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Use new TE API for FP8 Param transpose

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change config param cuda_graph to enable_cuda_graph

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Enable TE RNGStatesTracker through config

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change te_rng_tracker to use_te_rng_tracker

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* FP8 weight transpose handled inside TE

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""

This reverts commit e31862481216f9adf7fa584a0c0262916c935639.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: vasunvidia <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Enhance Distributed Adam (#9051)

* Enhance Distributed Adam (#9037)

* Fix deprecated env.

Signed-off-by: Wil Kong <[email protected]>

* Use user desired value for distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Preserve memory format in parameter buffer of distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather.

Signed-off-by: Wil Kong <[email protected]>

* Provide API to lock SHArP tree for distributed adam within nodes.

Signed-off-by: Wil Kong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Wil Kong <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390)

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Apply isort and black reformatting

---------

Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: tango4j <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Properly catch failed tests by introduction of workflow templates (#9324)

* ci: Refactor tests into reusable template

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Fix sending alerts on failure

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* disable slack

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix alerting

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Increase timeout for `L0_Unit_Tests_CPU`

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout for `Speech_Checkpoints_tests`

Signed-off-by: Oliver Koenig <[email protected]>

* improve readability

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* finalize

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* add missing rm statement for `L2_PTQ_Llama2_Export_Only`

Signed-off-by: Oliver Koenig <[email protected]>

* all your comments are belong to us

Signed-off-by: Oliver Koenig <[email protected]>

* remove github output

Signed-off-by: Oliver Koenig <[email protected]>

* revive more comments

Signed-off-by: Oliver Koenig <[email protected]>

* add L2: ASR dev run - part two

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix T5 G2P Input and Output Types (#9224) (#9269)

* fix t5 g2p model

* Apply isort and black reformatting

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198)

* Fix the "cast ping pong" problem when we run AMP inference.

This has been tested only for Parakeet-CTC-1.1B right now. This
problem certainly exists elsewhere.

Automatic mixed precision and inference do not play well together.

First, automatic mixed precision was created back when neural networks
were much simpler. In particular, they did not have softmax and layer
norm as frequent operations. In the era of transformers, softmax and
layer norm are very common. AMP will unconditionally produce fp32
outputs from these operations, even if their inputs are fp16. See
here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32

This is no longer necessary, now that layer norm does accumulation in
fp32 in pytorch, even if the input is fp16:
https://github.com/pytorch/pytorch/issues/66707

Do inference by casting the model to bfloat16, not by using AMP.

Do feature preprocessing in float32 for accuracy. Warn if someone
tries to input a non-float32 tensor.

Always create the output in the type the rest of the model expects.

Sort manifests by duration.

Signed-off-by: Daniel Galvez <[email protected]>

* Always cast softmax inputs to float32 when in training mode.

While we don't need this for accurate results in b/float16, this is a
safety precaution to make sure that training accuracy does not
regress.

Signed-off-by: Daniel Galvez <[email protected]>

---------

Signed-off-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Huvu/rag pipeline citest (#9384)

* huvu/NeMo_rag_citest first commit

* adding llama-index to dependency

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjusting data/models path in ci-test to dependency

* putting llama-index to optional

* update cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <[email protected]>

* Re-org export code (#9353)

* reorg the export code

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* replaced log with raise

Signed-off-by: Onur Yilmaz <[email protected]>

* add converter and loader folders

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_ckpt_convert into the converter folder

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_file into loader folder

Signed-off-by: Onur Yilmaz <[email protected]>

* reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo file back into nemo folder

Signed-off-by: Onur Yilmaz <[email protected]>

* renamed nemo folder to nemo_ckpt_loader

Signed-off-by: Onur Yilmaz <[email protected]>

* remove unused function

Signed-off-by: Onur Yilmaz <[email protected]>

* removed nemo file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* moved a function to tensorrt_llm_run file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* Remove unused imports

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* import csv added

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399)

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* disable overlap for qkv (#9079)

* disable overlap for qkv (#9072)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix circular import for MM dataprep notebook (#9287) (#9292)

* update launcher name and fix mm circular import

* Apply isort and black reformatting

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* add check if num layers is divisible by pp size (#9208) (#9298)

* add check if num_layers % pp == 0

* Apply isort and black reformatting

* move num_layers / pp check to build_transformer_config

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Add HF siglip vision encoder (#9185)

* temp save

Signed-off-by: yaoyu-33 <[email protected]>

* temp save 2

Signed-off-by: yaoyu-33 <[email protected]>

* update code

Signed-off-by: yaoyu-33 <[email protected]>

* enable seq packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix neva and clip

Signed-off-by: yaoyu-33 <[email protected]>

* Enable parallel seq packing algo and few other fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Pipeline parallel support

Signed-off-by: yaoyu-33 <[email protected]>

* Update data preprocess

Signed-off-by: yaoyu-33 <[email protected]>

* fix few pp issues

Signed-off-by: yaoyu-33 <[email protected]>

* enable sequence packing w/ PP

Signed-off-by: yaoyu-33 <[email protected]>

* Fix cu_seqlens in inputs

Signed-off-by: yaoyu-33 <[email protected]>

* add assert

Signed-off-by: yaoyu-33 <[email protected]>

* Depend on PP to decide whether do padding

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few PP evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Address comments

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add llama3 template

Signed-off-by: yaoyu-33 <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* Fix license

Signed-off-by: yaoyu-33 <[email protected]>

* Fix llama3

Signed-off-by: yaoyu-33 <[email protected]>

* Few fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* llama3 inference fix

Signed-off-by: yaoyu-33 <[email protected]>

* Force vision encoder to run in fp32

Signed-off-by: yaoyu-33 <[email protected]>

* Revert "Force vision encoder to run in fp32"

This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Try adding distributed format of checkpoint

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Allow dist checkpoint to be non-strict

Signed-off-by: yaoyu-33 <[email protected]>

* Fix

Signed-off-by: yaoyu-33 <[email protected]>

* Some fixes for PP + dist ckpt in Neva

Signed-off-by: yaoyu-33 <[email protected]>

* fix peft

Signed-off-by: yaoyu-33 <[email protected]>

* few fixes for lora

Signed-off-by: yaoyu-33 <[email protected]>

* checkpoint updates

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* bug fix

Signed-off-by: yaoyu-33 <[email protected]>

* Add HF siglip vision encoder

Signed-off-by: HuiyingLi <[email protected]>

* handle steerlm label in nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* Add neva dist checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix CLEAN RESPONSE logic to not use last EOS

Signed-off-by: HuiyingLi <[email protected]>

* strip extra_id_1 from clean response

Signed-off-by: HuiyingLi <[email protected]>

* change inference time image processor

Signed-off-by: HuiyingLi <[email protected]>

* resolve comments

Signed-off-by: yaoyu-33 <[email protected]>

* remove open_clip vision encoder for siglip

Signed-off-by: HuiyingLi <[email protected]>

* update neva dist ckpt apis

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix return

Signed-off-by: yaoyu-33 <[email protected]>

* resolve CLEAN RESPONSE multiturn issue

Signed-off-by: HuiyingLi <[email protected]>

* code format

Signed-off-by: HuiyingLi <[email protected]>

* fixes for isort

Signed-off-by: HuiyingLi <[email protected]>

* refac image processor loading to util

Signed-off-by: HuiyingLi <[email protected]>

* black and isort

Signed-off-by: HuiyingLi <[email protected]>

* move crop size assertion

Signed-off-by: HuiyingLi <[email protected]>

* few neva fixes

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* [Nemo CICD] timeouts fix (#9407)

* timeouts fix

* timeouts fix

Signed-off-by: Marc Romeyn <[email protected]>

* Removing un-used ModelConfig class (#9389)

Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* make sure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality
* Update zh_cn_pinyin.py

Added detailed instructions on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

Remove pkuseg as a default dependency of NeMo TTS and instead direct users to install pkuseg manually if they need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script  (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there is no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if its a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Dics

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING  for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support …
edyoshikun added a commit to mehta-lab/VisCy that referenced this issue Jun 12, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e aics dataset)

* update loading scripts

* fix CombineMode

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* removing normalize_source from configs.

* typing fixes

* fix test data path

* fix test dataset

* add docstring for ConcatDataModule

* format

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
edyoshikun added a commit to mehta-lab/VisCy that referenced this issue Jun 12, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e aics dataset)

* update loading scripts

* fix CombineMode

* always use untrainable head for FCMAE

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* custom head

* ddp caching fixes

* fix caching when using combined loader

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix normalization in example config

* prefetch more in validation

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* typing fixes

* fix test dataset

* fix invert transform

* add ddp prepare flag for combined data module

* remove redundant operations

* filter empty detections

* pass trainer to underlying data modules in concatenated

* hack: add test dataloader for LiveCell dataset

* test datasets for livecell and ctmc

* fix merge error

* fix merge error

* fix mAP default for over 100 detections

* bump torchmetric

* fix combined loader training for virtual staining task

* fix non-combined data loader training

* add fcmae to graph script

* fix type hint

* format

* add back convolution option for fcmae head

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
edyoshikun added a commit to mehta-lab/VisCy that referenced this issue Jun 12, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e. aics dataset)

* update loading scripts

* fix CombineMode

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* removing normalize_source from configs.

* typing fixes

* fix test data path

* fix test dataset

* add docstring for ConcatDataModule

* format

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
edyoshikun added a commit to mehta-lab/VisCy that referenced this issue Jun 12, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e. aics dataset)

* update loading scripts

* fix CombineMode

* always use untrainable head for FCMAE

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* custom head

* ddp caching fixes

* fix caching when using combined loader

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix normalization in example config

* prefetch more in validation

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* typing fixes

* fix test dataset

* fix invert transform

* add ddp prepare flag for combined data module

* remove redundant operations

* filter empty detections

* pass trainer to underlying data modules in concatenated

* hack: add test dataloader for LiveCell dataset

* test datasets for livecell and ctmc

* fix merge error

* fix merge error

* fix mAP default for over 100 detections

* bump torchmetrics

* fix combined loader training for virtual staining task

* fix non-combined data loader training

* add fcmae to graph script

* fix type hint

* format

* add back convolution option for fcmae head

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
edyoshikun added a commit to mehta-lab/VisCy that referenced this issue Jun 18, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e. aics dataset)

* update loading scripts

* fix CombineMode

* always use untrainable head for FCMAE

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* custom head

* ddp caching fixes

* fix caching when using combined loader

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix normalization in example config

* prefetch more in validation

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* typing fixes

* fix test dataset

* fix invert transform

* add ddp prepare flag for combined data module

* remove redundant operations

* filter empty detections

* pass trainer to underlying data modules in concatenated

* hack: add test dataloader for LiveCell dataset

* test datasets for livecell and ctmc

* fix merge error

* fix merge error

* fix mAP default for over 100 detections

* bump torchmetrics

* fix combined loader training for virtual staining task

* fix non-combined data loader training

* add fcmae to graph script

* fix type hint

* format

* add back convolution option for fcmae head

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
JesusPaz pushed a commit to JesusPaz/NeMo that referenced this issue Jun 18, 2024
…DIA#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* ensure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality

* Update zh_cn_pinyin.py

added detailed instructions on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there are no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if it's a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support lora weight tying

Signed-off-by: jasonwan <[email protected]>

* add copyright header

Signed-off-by: jasonwan <[email protected]>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <[email protected]>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <[email protected]>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: wrong config path when using relative ckpt path

Signed-off-by: Hongbin Liu <[email protected]>

* revert mcore config change

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* clean up ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* rollback git merge errors

Signed-off-by: jasonwan <[email protected]>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <[email protected]>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <[email protected]>

* one more fix for optional test set

Signed-off-by: jasonwan <[email protected]>

* support merging lora weights in mcore

Signed-off-by: jasonwan <[email protected]>

* update mcore for cpu init

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <[email protected]>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <[email protected]>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <[email protected]>

* disable dist ckpt in peft

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <[email protected]>

* add ckpt conversion to CI

Signed-off-by: jasonwan <[email protected]>

* update CI

Signed-off-by: jasonwan <[email protected]>

* mcore_mixin docstring

Signed-off-by: jasonwan <[email protected]>

* minor change in mcore peft error message

Signed-off-by: jasonwan <[email protected]>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <[email protected]>

* correct mcore fp8 config

Signed-off-by: jasonwan <[email protected]>

* add TE installation

Signed-off-by: jasonwan <[email protected]>

* support mcore adapter tuning

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <[email protected]>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <[email protected]>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <[email protected]>

* minor fix for prompt learning

Signed-off-by: jasonwan <[email protected]>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <[email protected]>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: Kelvin Liu <[email protected]>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <[email protected]>

* adjust key names based on O2

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible with old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <[email protected]>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <[email protected]>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <[email protected]>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <[email protected]>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <[email protected]>

* Fix minor typo

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <[email protected]>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

---------

Signed-off-by: omahs <[email protected]>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <[email protected]>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <[email protected]>

* use mcore

Signed-off-by: eharper <[email protected]>

* revert test change

Signed-off-by: eharper <[email protected]>

* pleasefixme

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <[email protected]>

* comment test temporarily

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <[email protected]>

* revert comment (#7368)

Signed-off-by: eharper <[email protected]>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <[email protected]>

* upper bounding ptl (#7370)

Signed-off-by: eharper <[email protected]>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <[email protected]>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <[email protected]>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <[email protected]>

* add to llama inference config

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* unused import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progres…
JesusPaz pushed a commit to JesusPaz/NeMo that referenced this issue Jun 18, 2024
…rategy (NVIDIA#9387)

* Integrating mcore's DistributedDataParallel into MegatronStrategy

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Apply ddp-hooks from pytorch only when needed

Signed-off-by: Marc Romeyn <[email protected]>

* bugfix if using mcore distOpt with sft (#9356)

* bugfix if using mcore distOpt

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* fix typo infer_seq_lenght -> infer_seq_length (#9370)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Rachitg/ag (#9083)

* Rachitg/ag (#9081)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix

* bugfix

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Adding the original change made for label_models (#9377) (#9378)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253)

* Lazily warn about using greedy strategy instead of greedy_batch
strategy.

Previously, the warning would often run spuriously, since several
existing code paths simply call "change_decoding_strategy()" after
having first initialized a Module, rather than changing the config
before initializing the Module. This can be confusing.

The only problem I can see with this is that using logging inside a
forward() method might interfere with some compiler toolkits like
Torchscript or thunder.compile. Presumably it would be easy to add a
conditional statement to avoid this statement in a compiler context if
necessary.

Signed-off-by: Daniel Galvez <[email protected]>
Co-authored-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
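
The lazy-warning idea described in the commit above can be sketched roughly as follows. This is a minimal illustration, not NeMo's actual code: the class `GreedyDecoderSketch`, its `_warned` flag, and the warning text are hypothetical. Only the method name `change_decoding_strategy()` and the strategy names `greedy` and `greedy_batch` come from the commit message.

```python
import logging

logger = logging.getLogger("decoding_sketch")


class GreedyDecoderSketch:
    """Hypothetical stand-in for a decoding module; not NeMo's actual class."""

    def __init__(self, strategy: str = "greedy"):
        self.strategy = strategy
        self._warned = False  # ensures the warning fires at most once per setting

    def change_decoding_strategy(self, strategy: str) -> None:
        # Changing the strategy after construction logs nothing by itself,
        # so a module created with "greedy" and then switched stays silent.
        self.strategy = strategy
        self._warned = False

    def forward(self, tokens):
        # Warn lazily: only when the "greedy" strategy is actually used.
        if self.strategy == "greedy" and not self._warned:
            logger.warning("Strategy 'greedy' is in use; consider 'greedy_batch'.")
            self._warned = True
        return list(tokens)
```

As the commit message notes, logging inside `forward()` may not suit compiler toolkits, so a real implementation would presumably guard this branch in a compiled context.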

* Update README.rst (#9393)

Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05. Please verify all changes.

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* a2a fix removed tp world size and group from init (#8944) (#8952)

Signed-off-by: Anmol Gupta <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Add config option for FP32 embedding grads (#8953)

* Add config option for FP32 embedding grads (#8946)

Signed-off-by: Tim Moon <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Changes to enable CUDA graph for LLM (#8955)

* Changes to enable CUDA graph for LLM (#8751)

* Use next instead of get_batch

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* CUDA graph changes

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change to enable CG with weight caching

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Use next instead of get_batch"

This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"

This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Remove skip_weight_update argument

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Bug fix + cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Use new TE API for FP8 Param transpose

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change config param cuda_graph to enable_cuda_graph

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Enable TE RNGStatesTracker through config

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change te_rng_tracker to use_te_rng_tracker

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* FP8 weight transpose handled inside TE

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""

This reverts commit e31862481216f9adf7fa584a0c0262916c935639.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: vasunvidia <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Enhance Distributed Adam (#9051)

* Enhance Distributed Adam (#9037)

* Fix deprecated env.

Signed-off-by: Wil Kong <[email protected]>

* Use user desired value for distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Preserve memory format in parameter buffer of distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather.

Signed-off-by: Wil Kong <[email protected]>

* Provide API to lock SHArP tree for distributed adam within nodes.

Signed-off-by: Wil Kong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Wil Kong <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390)

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Apply isort and black reformatting

---------

Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: tango4j <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Properly catch failed tests by introduction of workflow templates (#9324)

* ci: Refactor tests into reusable template

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Fix sending alerts on failure

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* disable slack

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix alerting

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Increase timeout for `L0_Unit_Tests_CPU`

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout for `Speech_Checkpoints_tests`

Signed-off-by: Oliver Koenig <[email protected]>

* improve readability

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* finalize

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* add missing rm statement for `L2_PTQ_Llama2_Export_Only`

Signed-off-by: Oliver Koenig <[email protected]>

* all your comments are belong to us

Signed-off-by: Oliver Koenig <[email protected]>

* remove github output

Signed-off-by: Oliver Koenig <[email protected]>

* revive more comments

Signed-off-by: Oliver Koenig <[email protected]>

* add L2: ASR dev run - part two

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix T5 G2P Input and Output Types (#9224) (#9269)

* fix t5 g2p model

* Apply isort and black reformatting

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198)

* Fix the "cast ping pong" problem when we run AMP inference.

This has been tested only for Parakeet-CTC-1.1B right now. This
problem certainly exists elsewhere.

Automatic mixed precision and inference do not play well together.

First, automatic mixed precision was created back when neural networks
were much simpler. In particular, they did not have softmax and layer
norm as frequent operations. In the era of transformers, softmax and
layer norm are very common. AMP will unconditionally output fp32
outputs from these operations, even if their inputs are fp16. See
here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32

This is no longer necessary, now that layer norm does accumulation in
fp32 in pytorch, even if the input is fp16:
https://github.com/pytorch/pytorch/issues/66707

Do inference by casting model to bfloat16, not by using AMP.

Do feature preprocessing in float32 for accuracy. Warn if someone
tries to input a non-float32 tensor.

Always create the output in the type the rest of the model expects.

Sort manifests by duration.

Signed-off-by: Daniel Galvez <[email protected]>

* Always cast softmax inputs to float32 when in training mode.

While we don't need this for accurate results in b/float16, this is a
safety precaution to make sure that training accuracy does not
regress.

Signed-off-by: Daniel Galvez <[email protected]>

---------

Signed-off-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Huvu/rag pipeline citest (#9384)

* huvu/NeMo_rag_citest first commit

* adding llama-index to dependency

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjusting data/models path in ci-test to dependency

* putting llama-index to optional

* update cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <[email protected]>

* Re-org export code (#9353)

* reorg the export code

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* replaced log with raise

Signed-off-by: Onur Yilmaz <[email protected]>

* add converter and loader folders

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_ckpt_convert into the converter folder

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_file into loader folder

Signed-off-by: Onur Yilmaz <[email protected]>

* reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo file back into nemo folder

Signed-off-by: Onur Yilmaz <[email protected]>

* renamed nemo folder to nemo_ckpt_loader

Signed-off-by: Onur Yilmaz <[email protected]>

* remove unused function

Signed-off-by: Onur Yilmaz <[email protected]>

* removed nemo file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* moved a function to tensorrt_llm_run file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* Remove unused imports

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* import csv added

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399)

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* disable overlap for qkv (#9079)

* disable overlap for qkv (#9072)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix circular import for MM dataprep notebook (#9287) (#9292)

* update launcher name and fix mm circular import

* Apply isort and black reformatting

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* add check if num layers is divisible by pp size (#9208) (#9298)

* add check if num_layers % pp == 0

* Apply isort and black reformatting

* move num_layers / pp check to build_transformer_config

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
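
The `num_layers % pp == 0` check named in the commit above could look roughly like this. This is a sketch under assumptions: the function name `check_layers_divisible` and its parameter names are hypothetical, and the actual check lives in `build_transformer_config` according to the commit message.

```python
def check_layers_divisible(num_layers: int, pipeline_parallel_size: int) -> int:
    """Return the number of layers per pipeline stage, or raise if uneven.

    Hypothetical helper for illustration; not the NeMo implementation.
    """
    if pipeline_parallel_size < 1:
        raise ValueError("pipeline_parallel_size must be at least 1")
    if num_layers % pipeline_parallel_size != 0:
        raise ValueError(
            f"num_layers ({num_layers}) must be divisible by "
            f"pipeline_parallel_size ({pipeline_parallel_size})"
        )
    return num_layers // pipeline_parallel_size
```

Failing early at configuration time surfaces the mismatch before any model shards are built.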

* Add HF siglip vision encoder (#9185)

* temp save

Signed-off-by: yaoyu-33 <[email protected]>

* temp save 2

Signed-off-by: yaoyu-33 <[email protected]>

* update code

Signed-off-by: yaoyu-33 <[email protected]>

* enable seq packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix neva and clip

Signed-off-by: yaoyu-33 <[email protected]>

* Enable parallel seq packing algo and few other fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Pipeline parallel support

Signed-off-by: yaoyu-33 <[email protected]>

* Update data preprocess

Signed-off-by: yaoyu-33 <[email protected]>

* fix few pp issues

Signed-off-by: yaoyu-33 <[email protected]>

* enable sequence packing w/ PP

Signed-off-by: yaoyu-33 <[email protected]>

* Fix cu_seqlens in inputs

Signed-off-by: yaoyu-33 <[email protected]>

* add assert

Signed-off-by: yaoyu-33 <[email protected]>

* Depend on PP to decide whether do padding

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few PP evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Address comments

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add llama3 template

Signed-off-by: yaoyu-33 <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* Fix license

Signed-off-by: yaoyu-33 <[email protected]>

* Fix llama3

Signed-off-by: yaoyu-33 <[email protected]>

* Few fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* llama3 inference fix

Signed-off-by: yaoyu-33 <[email protected]>

* Force vision encoder to run in fp32

Signed-off-by: yaoyu-33 <[email protected]>

* Revert "Force vision encoder to run in fp32"

This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Try adding distributed format of checkpoint

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Allow dist checkpoint to be non-strict

Signed-off-by: yaoyu-33 <[email protected]>

* Fix

Signed-off-by: yaoyu-33 <[email protected]>

* Some fixes for PP + dist ckpt in Neva

Signed-off-by: yaoyu-33 <[email protected]>

* fix peft

Signed-off-by: yaoyu-33 <[email protected]>

* few fixes for lora

Signed-off-by: yaoyu-33 <[email protected]>

* checkpoint updates

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* bug fix

Signed-off-by: yaoyu-33 <[email protected]>

* Add HF siglip vision encoder

Signed-off-by: HuiyingLi <[email protected]>

* handle steerlm label in nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* Add neva dist checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix CLEAN RESPONSE logic to not use last EOS

Signed-off-by: HuiyingLi <[email protected]>

* strip extra_id_1 from clean response

Signed-off-by: HuiyingLi <[email protected]>

* change inference time image processor

Signed-off-by: HuiyingLi <[email protected]>

* resolve comments

Signed-off-by: yaoyu-33 <[email protected]>

* remove open_clip vision encoder for siglip

Signed-off-by: HuiyingLi <[email protected]>

* update neva dist ckpt apis

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix return

Signed-off-by: yaoyu-33 <[email protected]>

* resolve CLEAN RESPONSE multiturn issue

Signed-off-by: HuiyingLi <[email protected]>

* code format

Signed-off-by: HuiyingLi <[email protected]>

* fixes for isort

Signed-off-by: HuiyingLi <[email protected]>

* refac image processor loading to util

Signed-off-by: HuiyingLi <[email protected]>

* black and isort

Signed-off-by: HuiyingLi <[email protected]>

* move crop size assertion

Signed-off-by: HuiyingLi <[email protected]>

* few neva fixes

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* [Nemo CICD] timeouts fix (#9407)

* timeouts fix

* timeouts fix

Signed-off-by: Marc Romeyn <[email protected]>

* Removing un-used ModelConfig class (#9389)

Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* ensure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality
* Update zh_cn_pinyin.py

added detailed instructions on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as a default dependency of NeMo TTS and instead direct users to install pkuseg manually if they really need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there is no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if it's a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extract hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support …
rohitrango pushed a commit to rohitrango/NeMo that referenced this issue Jun 25, 2024
…DIA#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* make sure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality

* Update zh_cn_pinyin.py

added detailed instruction on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there is no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit c24bb454bf1fa6f5820f1805c6387254a73220b9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if its a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extrac hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Docs

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mixins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support lora weight tying

Signed-off-by: jasonwan <[email protected]>

* add copyright header

Signed-off-by: jasonwan <[email protected]>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <[email protected]>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <[email protected]>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug: wrong config path when using relative ckpt path

Signed-off-by: Hongbin Liu <[email protected]>

* revert mcore config change

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* clean up ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* rollback git merge errors

Signed-off-by: jasonwan <[email protected]>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <[email protected]>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <[email protected]>

* one more fix for optional test set

Signed-off-by: jasonwan <[email protected]>

* support merging lora weights in mcore

Signed-off-by: jasonwan <[email protected]>

* update mcore for cpu init

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <[email protected]>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <[email protected]>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <[email protected]>

* disable dist ckpt in peft

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <[email protected]>

* add ckpt conversion to CI

Signed-off-by: jasonwan <[email protected]>

* update CI

Signed-off-by: jasonwan <[email protected]>

* mcore_mixin docstring

Signed-off-by: jasonwan <[email protected]>

* minor change in mcore peft error message

Signed-off-by: jasonwan <[email protected]>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <[email protected]>

* correct mcore fp8 config

Signed-off-by: jasonwan <[email protected]>

* add TE installation

Signed-off-by: jasonwan <[email protected]>

* support mcore adapter tuning

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <[email protected]>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <[email protected]>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <[email protected]>

* minor fix for prompt learning

Signed-off-by: jasonwan <[email protected]>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <[email protected]>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: Kelvin Liu <[email protected]>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <[email protected]>

* adjust key names based on O2

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <[email protected]>

* list of fields for context

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible with old ckpt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error and use re

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <[email protected]>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <[email protected]>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <[email protected]>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <[email protected]>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <[email protected]>

* Fix minor typo

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <[email protected]>
Co-authored-by: Alexander Jipa <[email protected]>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <[email protected]>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typos

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

* fix typo

Signed-off-by: omahs <[email protected]>

---------

Signed-off-by: omahs <[email protected]>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <[email protected]>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <[email protected]>

* use mcore

Signed-off-by: eharper <[email protected]>

* revert test change

Signed-off-by: eharper <[email protected]>

* pleasefixme

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <[email protected]>

* comment test temporarily

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <[email protected]>

* revert comment (#7368)

Signed-off-by: eharper <[email protected]>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <[email protected]>

* upper bounding ptl (#7370)

Signed-off-by: eharper <[email protected]>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <[email protected]>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <[email protected]>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <[email protected]>

* add to llama inference config

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <[email protected]>

* remove copy from other models

Signed-off-by: Maanu Grover <[email protected]>

* modify attribute not arg

Signed-off-by: Maanu Grover <[email protected]>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <[email protected]>

* rename function and add docstring

Signed-off-by: Maanu Grover <[email protected]>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <[email protected]>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <[email protected]>

* set default value

Signed-off-by: Maanu Grover <[email protected]>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <[email protected]>

* rename mapping function

Signed-off-by: Maanu Grover <[email protected]>

* unused import

Signed-off-by: Maanu Grover <[email protected]>

* save torch datatype to model

Signed-off-by: Maanu Grover <[email protected]>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <[email protected]>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <[email protected]>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <[email protected]>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <[email protected]>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <[email protected]>

* unused imports

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <[email protected]>

* Fix typo

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <[email protected]>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in README and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <[email protected]>

* Update Jenkinsfile

Signed-off-by: Jason Wang <[email protected]>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <[email protected]>

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <[email protected]>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <[email protected]>

* update commit

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <[email protected]>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <[email protected]>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <[email protected]>

* use trainer arg

Signed-off-by: Maanu Grover <[email protected]>

---------

Signed-off-by: Maanu Grover <[email protected]>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <[email protected]>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <[email protected]>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* Refactor

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <[email protected]>

* move dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <[email protected]>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <[email protected]>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <[email protected]>

* fix load dist ckpt

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <[email protected]>
Co-authored-by: Jimmy Zhang <[email protected]>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progres…
rohitrango pushed a commit to rohitrango/NeMo that referenced this issue Jun 25, 2024
…rategy (NVIDIA#9387)

* Integrating mcore's DistributedDataParallel into MegatronStrategy

Signed-off-by: Marc Romeyn <[email protected]>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Apply ddp-hooks from pytorch only when needed

Signed-off-by: Marc Romeyn <[email protected]>

* bugfix if using mcore distOpt with sft (#9356)

* bugfix if using mcore distOpt

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Apply isort and black reformatting

Signed-off-by: akoumpa <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* fix typo infer_seq_lenght -> infer_seq_length (#9370)

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Rachitg/ag (#9083)

* Rachitg/ag (#9081)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix

* bugfix

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Adding the original change made for label_models (#9377) (#9378)

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Dgalvez/fix greedy batch strategy name r2.0.0rc0 (#9243) (#9253)

* Lazily warn about using greedy strategy instead of greedy_batch
strategy.

Previously, the warning would often run spuriously, since several
existing code paths simply call "change_decoding_strategy()" after
having first initialized a Module, rather than changing the config
before initializing the Module. This can be confusing.

The only problem I can see with this is that using logging inside a
forward() method might interfere with some compiler toolkits like
Torchscript or thunder.compile. Presumably it would be easy to add a
conditional statement to avoid this statement in a compiler context if
necessary.

Signed-off-by: Daniel Galvez <[email protected]>
Co-authored-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Update README.rst (#9393)

Revised content per https://gitlab-master.nvidia.com/nemo-framework-tme/documentation/-/issues/25. Also removed reference to NIMs in LLMs and MMs Deployment and Optimization. It should be NVIDIA NeMo Microservices and not NIM. Removed nemo:24.03.framework and nemo:24.01.speech in Docker Containers section and replaced with 24.05. Please verify all changes.

Signed-off-by: jgerh <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* a2a fix removed tp world size and group from init (#8944) (#8952)

Signed-off-by: Anmol Gupta <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Add config option for FP32 embedding grads (#8953)

* Add config option for FP32 embedding grads (#8946)

Signed-off-by: Tim Moon <[email protected]>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Changes to enable CUDA graph for LLM (#8955)

* Changes to enable CUDA graph for LLM (#8751)

* Use next instead of get_batch

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* CUDA graph changes

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change to enable CG with weight caching

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Use next instead of get_batch"

This reverts commit 0021bb444cdd1b27674fc0cfea909c1a42475336.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py

Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py"

This reverts commit b4f736ed2b39f6c48d2868ac3febb82c763ab3fb.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Remove skip_weight_update argument

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Bug fix + cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Use new TE API for FP8 Param transpose

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change config param cuda_graph to enable_cuda_graph

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Enable TE RNGStatesTracker through config

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Change te_rng_tracker to use_te_rng_tracker

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* FP8 weight transpose handled inside TE

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Cleanup

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Revert "Revert "Copy jbaczek/mcore_parallel_state_api_change branch leaving out changes to nemo/export/quantize/quantizer.py""

This reverts commit e31862481216f9adf7fa584a0c0262916c935639.

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

* Fix merge conflicts

Signed-off-by: Vasudevan Rengasamy <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Jan Baczek <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: vasunvidia <[email protected]>
Co-authored-by: Jaemin Choi <[email protected]>
Co-authored-by: Jan Baczek <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Enhance Distributed Adam (#9051)

* Enhance Distributed Adam (#9037)

* Fix deprecated env.

Signed-off-by: Wil Kong <[email protected]>

* Use user desired value for distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Preserve memory format in parameter buffer of distributed adam.

Signed-off-by: Wil Kong <[email protected]>

* Fix the contiguous_param_buffer bug about bprop overlap and redundant copy after all-gather.

Signed-off-by: Wil Kong <[email protected]>

* Provide API to lock SHArP tree for distributed adam within nodes.

Signed-off-by: Wil Kong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Wil Kong <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: Wil Kong <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Wil Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ericharper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Force diarizer to use CUDA if cuda is available and if device=None. (#9380) (#9390)

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Fixed clustering diarizer to load MSDD to GPU by default if cuda on

* Apply isort and black reformatting

---------

Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: tango4j <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Properly catch failed tests by introduction of workflow templates (#9324)

* ci: Refactor tests into reusable template

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Fix sending alerts on failure

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* disable slack

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix alerting

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* ci: Increase timeout for `L0_Unit_Tests_CPU`

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout

Signed-off-by: Oliver Koenig <[email protected]>

* increase timeout for `Speech_Checkpoints_tests`

Signed-off-by: Oliver Koenig <[email protected]>

* improve readability

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* test

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* finalize

Signed-off-by: Oliver Koenig <[email protected]>

* fix

Signed-off-by: Oliver Koenig <[email protected]>

* add missing rm statement for `L2_PTQ_Llama2_Export_Only`

Signed-off-by: Oliver Koenig <[email protected]>

* all your comments are belong to us

Signed-off-by: Oliver Koenig <[email protected]>

* remove github output

Signed-off-by: Oliver Koenig <[email protected]>

* revive more comments

Signed-off-by: Oliver Koenig <[email protected]>

* add L2: ASR dev run - part two

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Pablo Garay <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix T5 G2P Input and Output Types (#9224) (#9269)

* fix t5 g2p model

* Apply isort and black reformatting

---------

Signed-off-by: Jason <[email protected]>
Signed-off-by: blisc <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: blisc <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Use model-cast-to-bfloat16 rather than AMP-to-bfloat16 for inference. (#9198)

* Fix the "cast ping pong" problem when we run AMP inference.

This has been tested only for Parakeet-CTC-1.1B right now. This
problem certainly exists elsewhere.

Automatic mixed precision and inference do not play well together.

First, automatic mixed precision was created back when neural networks
were much simpler. In particular, they did not have softmax and layer
norm as frequent operations. In the era of transformers, softmax and
layer norm are very common. AMP will unconditionally output fp32
outputs from these operations, even if their inputs are fp16. See
here: https://pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float32

This is no longer necessary, now that layer norm does accumulation in
fp32 in pytorch, even if the input is fp16:
https://github.com/pytorch/pytorch/issues/66707

Do inference by casting model to bfloat16, not by using AMP.

Do feature preprocessing in float32 for accuracy. Warn if someone
tries to input a non-float32 tensor.

Always create the output in the type the rest of the model expects.

Sort manifests by duration.

Signed-off-by: Daniel Galvez <[email protected]>

* Always cast softmax inputs to float32 when in training mode.

While we don't need this for accurate results in b/float16, this is a
safety precaution to make sure that training accuracy does not
regress.

Signed-off-by: Daniel Galvez <[email protected]>

---------

Signed-off-by: Daniel Galvez <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
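
The model-cast approach described in the entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NeMo change itself: `TinyEncoder` and `run_inference` are hypothetical names, and the toy model stands in for a real ASR encoder. Features stay in float32 through preprocessing, the model weights are cast to bfloat16 once, and the input is cast to the model's dtype at the boundary, so no autocast context is involved.

```python
import warnings

import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a model containing layer norm and softmax."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.norm(self.proj(x)), dim=-1)


def run_inference(model: nn.Module, features: torch.Tensor) -> torch.Tensor:
    # Preprocessing is expected to produce float32; warn otherwise.
    if features.dtype != torch.float32:
        warnings.warn("expected float32 features from preprocessing")
    # Cast once at the model boundary instead of relying on autocast.
    model_dtype = next(model.parameters()).dtype
    with torch.inference_mode():
        return model(features.to(model_dtype))


model = TinyEncoder().eval().to(torch.bfloat16)  # cast weights once
features = torch.randn(2, 8)                     # float32 "preprocessed" features
output = run_inference(model, features)          # stays in bfloat16 end to end
```

Because every op sees bfloat16 inputs and bfloat16 weights, there is no per-op casting between fp32 and the reduced-precision type, which is the "cast ping pong" the entry refers to.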

* Huvu/rag pipeline citest (#9384)

* huvu/NeMo_rag_citest first commit

* adding llama-index to dependency

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adjusting data/models path in ci-test to dependency

* putting llama-index to optional

* update cicd-main.yml

---------

Co-authored-by: Huy Vu2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Marc Romeyn <[email protected]>

* Re-org export code (#9353)

* reorg the export code

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* replaced log with raise

Signed-off-by: Onur Yilmaz <[email protected]>

* add converter and loader folders

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_ckpt_convert into the converter folder

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo_file into loader folder

Signed-off-by: Onur Yilmaz <[email protected]>

* reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg converter

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* continue to reorg

Signed-off-by: Onur Yilmaz <[email protected]>

* move nemo file back into nemo folder

Signed-off-by: Onur Yilmaz <[email protected]>

* renamed nemo folder to nemo_ckpt_loader

Signed-off-by: Onur Yilmaz <[email protected]>

* remove unused function

Signed-off-by: Onur Yilmaz <[email protected]>

* removed nemo file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* moved a function to tensorrt_llm_run file

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* Remove unused imports

Signed-off-by: Onur Yilmaz <[email protected]>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <[email protected]>

* import csv added

Signed-off-by: Onur Yilmaz <[email protected]>

---------

Signed-off-by: Onur Yilmaz <[email protected]>
Signed-off-by: oyilmaz-nvidia <[email protected]>
Co-authored-by: oyilmaz-nvidia <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* ci: Fix `L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav` (#9399)

Signed-off-by: Oliver Koenig <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* disable overlap for qkv (#9079)

* disable overlap for qkv (#9072)

* disable overlap for qkv

Signed-off-by: Rachit Garg <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: michal2409 <[email protected]>

---------

Signed-off-by: Rachit Garg <[email protected]>
Signed-off-by: michal2409 <[email protected]>
Signed-off-by: Michal Futrega <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: Rachit Garg <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <[email protected]>
Co-authored-by: michal2409 <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Fix circular import for MM dataprep notebook (#9287) (#9292)

* update launcher name and fix mm circular import

* Apply isort and black reformatting

---------

Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* add check if num layers is divisible by pp size (#9208) (#9298)

* add check if num_layers % pp == 0

* Apply isort and black reformatting

* move num_layers / pp check to build_transformer_config

---------

Signed-off-by: dimapihtar <[email protected]>
Signed-off-by: dimapihtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: dimapihtar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
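
A minimal sketch of the kind of check the entry above describes, assuming the rule is simply that the layer count must divide evenly across pipeline stages. The function name `check_pipeline_split` and its error messages are hypothetical; the real check lives in `build_transformer_config` and may differ.

```python
def check_pipeline_split(num_layers: int, pipeline_parallel_size: int) -> int:
    """Return layers per pipeline stage, or raise if the split is uneven."""
    if pipeline_parallel_size < 1:
        raise ValueError("pipeline_parallel_size must be at least 1")
    if num_layers % pipeline_parallel_size != 0:
        raise ValueError(
            f"num_layers ({num_layers}) must be divisible by "
            f"pipeline parallel size ({pipeline_parallel_size})"
        )
    return num_layers // pipeline_parallel_size
```

For example, `check_pipeline_split(32, 4)` yields 8 layers per stage, while `check_pipeline_split(30, 4)` raises `ValueError` before any model is built.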

* Add HF siglip vision encoder (#9185)

* temp save

Signed-off-by: yaoyu-33 <[email protected]>

* temp save 2

Signed-off-by: yaoyu-33 <[email protected]>

* update code

Signed-off-by: yaoyu-33 <[email protected]>

* enable seq packing

Signed-off-by: yaoyu-33 <[email protected]>

* fix neva and clip

Signed-off-by: yaoyu-33 <[email protected]>

* Enable parallel seq packing algo and few other fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Pipeline parallel support

Signed-off-by: yaoyu-33 <[email protected]>

* Update data preprocess

Signed-off-by: yaoyu-33 <[email protected]>

* fix few pp issues

Signed-off-by: yaoyu-33 <[email protected]>

* enable sequence packing w/ PP

Signed-off-by: yaoyu-33 <[email protected]>

* Fix cu_seqlens in inputs

Signed-off-by: yaoyu-33 <[email protected]>

* add assert

Signed-off-by: yaoyu-33 <[email protected]>

* Depend on PP to decide whether to do padding

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add docstring

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Fix few PP evaluation issues

Signed-off-by: yaoyu-33 <[email protected]>

* Address comments

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add llama3 template

Signed-off-by: yaoyu-33 <[email protected]>

* address comments

Signed-off-by: yaoyu-33 <[email protected]>

* Fix license

Signed-off-by: yaoyu-33 <[email protected]>

* Fix llama3

Signed-off-by: yaoyu-33 <[email protected]>

* Few fixes

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Few neva bugs

Signed-off-by: yaoyu-33 <[email protected]>

* llama3 inference fix

Signed-off-by: yaoyu-33 <[email protected]>

* Force vision encoder to run in fp32

Signed-off-by: yaoyu-33 <[email protected]>

* Revert "Force vision encoder to run in fp32"

This reverts commit 9d2160d96cb3e2a27a18538950ef43b4482c04da.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Try adding distributed format of checkpoint

Signed-off-by: yaoyu-33 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Allow dist checkpoint to be non-strict

Signed-off-by: yaoyu-33 <[email protected]>

* Fix

Signed-off-by: yaoyu-33 <[email protected]>

* Some fixes for PP + dist ckpt in Neva

Signed-off-by: yaoyu-33 <[email protected]>

* fix peft

Signed-off-by: yaoyu-33 <[email protected]>

* few fixes for lora

Signed-off-by: yaoyu-33 <[email protected]>

* checkpoint updates

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* bug fix

Signed-off-by: yaoyu-33 <[email protected]>

* Add HF siglip vision encoder

Signed-off-by: HuiyingLi <[email protected]>

* handle steerlm label in nv_dpo template

Signed-off-by: HuiyingLi <[email protected]>

* Add neva dist checkpoint converter

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix CLEAN RESPONSE logic to not use last EOS

Signed-off-by: HuiyingLi <[email protected]>

* strip extra_id_1 from clean response

Signed-off-by: HuiyingLi <[email protected]>

* change inference time image processor

Signed-off-by: HuiyingLi <[email protected]>

* resolve comments

Signed-off-by: yaoyu-33 <[email protected]>

* remove open_clip vision encoder for siglip

Signed-off-by: HuiyingLi <[email protected]>

* update neva dist ckpt apis

Signed-off-by: yaoyu-33 <[email protected]>

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <[email protected]>

* fix return

Signed-off-by: yaoyu-33 <[email protected]>

* resolve CLEAN RESPONSE multiturn issue

Signed-off-by: HuiyingLi <[email protected]>

* code format

Signed-off-by: HuiyingLi <[email protected]>

* fixes for isort

Signed-off-by: HuiyingLi <[email protected]>

* refac image processor loading to util

Signed-off-by: HuiyingLi <[email protected]>

* black and isort

Signed-off-by: HuiyingLi <[email protected]>

* move crop size assertion

Signed-off-by: HuiyingLi <[email protected]>

* few neva fixes

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>

---------

Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* [Nemo CICD] timeouts fix (#9407)

* timeouts fix

* timeouts fix

Signed-off-by: Marc Romeyn <[email protected]>

* Removing un-used ModelConfig class (#9389)

Co-authored-by: Chen Cui <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>

* Extend multimodal/speech_llm with lhotse, t5 and bestow supports (#9169)

* Fixes

* Docs fix

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* Add support for custom NeMo fields in Lhotse-NeMo adapters (attach to cut.custom)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* support distributed_fused_adam

Signed-off-by: zhehuaichen <[email protected]>

* Add support for sharded NeMo manifest files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support megatron_amp_O2

Signed-off-by: zhehuaichen <[email protected]>

* Support heterogeneous sampling rates in non tarred NeMo manifests

* migrate to PTL2.0

Signed-off-by: stevehuang52 <[email protected]>

* clean up

Signed-off-by: stevehuang52 <[email protected]>

* update manifest util

Signed-off-by: stevehuang52 <[email protected]>

* Support multiple tokenizer/parser types, aggregate tokenizers, and custom language fields

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* agg and normal tokenizers actually work

* Support weights for NeMo tarred manifests

* Temporarily hardcoded pnc stripping/lowercasing

* fix

* make pnc hack configurable from the config and disabled by default

* fix the hack

* migrate to ptl2.1 to support multiple dataloaders

Signed-off-by: stevehuang52 <[email protected]>

* support encoder overwrite

Signed-off-by: zhehuaichen <[email protected]>

* update misc

Signed-off-by: stevehuang52 <[email protected]>

* fix eval and clean up

Signed-off-by: stevehuang52 <[email protected]>

* support add_sep for perception model

Signed-off-by: zhehuaichen <[email protected]>

* fix https://github.com/Lightning-AI/pytorch-lightning/issues/18803

Signed-off-by: zhehuaichen <[email protected]>

* add_bos

Signed-off-by: zhehuaichen <[email protected]>

* Transformer decoder with conditioning for canary (#8091)

* initial commit for multi-task conf-enc transf-dec for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing decoder states caching during training

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option to limit the number of open streams (#8095)

* audio signal support in multi

Signed-off-by: zhehuaichen <[email protected]>

* update asr evaluator

Signed-off-by: stevehuang52 <[email protected]>

* fix from
https://github.com/NVIDIA/NeMo/commit/fcc0f9f6ff7947c3c7fba3ed17d8ec8af6391397
and
https://github.com/NVIDIA/NeMo/commit/f97c9016e6438ca4174b66bf9c3e248b28197aaa

Signed-off-by: zhehuaichen <[email protected]>

* transcribe fn for Canary models (#8110)

* improve readability

Signed-off-by: Krishna Puvvada <[email protected]>

* adding context in transcribe function for ConfTransfModels

Signed-off-by: Krishna Puvvada <[email protected]>

* supporting relative paths in transcribe function for canary

Signed-off-by: Krishna Puvvada <[email protected]>

* removing cuts.sort_by_duration in __getitem__ to maintain manifest order during inference

Signed-off-by: Krishna Puvvada <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Krishna Puvvada <[email protected]>
Co-authored-by: Krishna Puvvada <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* update for eval

Signed-off-by: stevehuang52 <[email protected]>

* update for evaluation

Signed-off-by: stevehuang52 <[email protected]>

* fix bleu

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Add missing audio_filepath validation for Canary (#8119)

* Add missing audio_filepath validation for Canary

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add default concat_sampling_probabilities

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse dataset in speechllm

Signed-off-by: zhehuaichen <[email protected]>

* bypass get_iterator_k_split

Signed-off-by: zhehuaichen <[email protected]>

* tmp fix

Signed-off-by: zhehuaichen <[email protected]>

* try to use fixed batch with megatron

Signed-off-by: zhehuaichen <[email protected]>

* add batch logging

Signed-off-by: zhehuaichen <[email protected]>

* support unfrozen llm

Signed-off-by: zhehuaichen <[email protected]>

* Create README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* Update README.md

Signed-off-by: He Huang (Steve) <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* rename

Signed-off-by: stevehuang52 <[email protected]>

* add llama prompt template

Signed-off-by: zhehuaichen <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* support sample alpha

Signed-off-by: zhehuaichen <[email protected]>

* support lhotse validation set and canary pretrained ckpt with pseudo label

Signed-off-by: zhehuaichen <[email protected]>

* ensure backward compatibility

Signed-off-by: zhehuaichen <[email protected]>

* remove pad

Signed-off-by: zhehuaichen <[email protected]>

* make sure asr_model is frozen

Signed-off-by: zhehuaichen <[email protected]>

* support greedy decoding

Signed-off-by: zhehuaichen <[email protected]>

* valid on lhotse

Signed-off-by: zhehuaichen <[email protected]>

* fix multi dataloader in val case for lhotse SALM; add default data
names; keep asr model tokenizer by default to enable adding canary
dataset

Signed-off-by: zhehuaichen <[email protected]>

* remove the bruteforce _keep_special_tokens implementation

Signed-off-by: zhehuaichen <[email protected]>

* decoding_ratio and convert_canary_prompt_to_text support

Signed-off-by: zhehuaichen <[email protected]>

* canary_tokens_augment_ratio

Signed-off-by: zhehuaichen <[email protected]>

* debug

Signed-off-by: zhehuaichen <[email protected]>

* bug fix

Signed-off-by: zhehuaichen <[email protected]>

* fix lhotse based eval of llama canary model

Signed-off-by: zhehuaichen <[email protected]>

* support some overwrite for eval

Signed-off-by: zhehuaichen <[email protected]>

* support zero shot prompt in training

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* support cross attention based SALM

Signed-off-by: zhehuaichen <[email protected]>

* fix for batch train/valid of cross

Signed-off-by: zhehuaichen <[email protected]>

* support learnable gate and plotting

Signed-off-by: zhehuaichen <[email protected]>

* support using pseudo label in prompt rather than cross att

Signed-off-by: zhehuaichen <[email protected]>

* bug fix for perception cfg and context tokens shift

Signed-off-by: zhehuaichen <[email protected]>

* DentityConnectorsAdd

Signed-off-by: zhehuaichen <[email protected]>

* fix ckpt saving

Signed-off-by: zhehuaichen <[email protected]>

* Support RnnGatedCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* add include_ffw and fix _optimizer_param_groups for all unfrozen run

Signed-off-by: zhehuaichen <[email protected]>

* support grad acc when using bucket

Signed-off-by: zhehuaichen <[email protected]>

* support TransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ProjectTransformerCrossAttention

Signed-off-by: zhehuaichen <[email protected]>

* support ++model.use_am_tokenizer ++model.override_vocab_size ++model.override.hidden_size

Signed-off-by: zhehuaichen <[email protected]>

* support question set on val without canary

Signed-off-by: zhehuaichen <[email protected]>

* support load_audio_encoder and wip in optim_param_groups

Signed-off-by: zhehuaichen <[email protected]>

* minor fix for audio pretrain model init

Signed-off-by: zhehuaichen <[email protected]>

* simplify canary_tokens_augment

Signed-off-by: zhehuaichen <[email protected]>

* use question in the manifest if it exists

Signed-off-by: zhehuaichen <[email protected]>

* support dataset weighting for non tar

Signed-off-by: zhehuaichen <[email protected]>

* Update SpeechLLM code (#8475)

* add pleasefixme marker for potential failed nightly tests. (#7678)

Signed-off-by: Xuesong Yang <[email protected]>

* Add new text segmentation library for better TTS quality (#7645)

* Add new text segmentation library for better TTS quality
* Update zh_cn_pinyin.py

added detailed instruction on how to install pkuseg.

Signed-off-by: Xuesong Yang <[email protected]>

* Update requirements_tts.txt

remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need it.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <[email protected]>

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774)

* Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer

* Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add '32-true' for precision values

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(clustering_diarizer.py): fix typo (#7772)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* fix(diarization-README): typo (#7771)

Signed-off-by: Jean-Louis Queguiner <[email protected]>

* Fix bug wrt change decoding strategy for bpe models (#7762) (#7764)

* Fix bug wrt change decoding strategy for bpe models

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove incorrect extra argument for load_from_checkpoint_dir() (#7500)

Signed-off-by: Robin Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add nemo to mcore GPT conversion script  (#7730)

* add conversion script

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove references to 'ckpt'

Signed-off-by: Chen Cui <[email protected]>

* add one more sanity check to make sure there are no unexpected keys in state dict

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make cpu loading work

Signed-off-by: Chen Cui <[email protected]>

* make script work for llama2 models

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address code check

Signed-off-by: Chen Cui <[email protected]>

* remove trainer precision (was for old sanity check)

Signed-off-by: Chen Cui <[email protected]>

* fix script for llama2 model

Signed-off-by: Chen Cui <[email protected]>

* remove commented code

Signed-off-by: Chen Cui <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785)

Signed-off-by: anferico <[email protected]>

* Add some docs and update scripts for ASR (#7790)

* Add some docs and update scripts

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* set context for text memmap to fork (#7784)

* set context for text memmap to fork

Signed-off-by: arendu <[email protected]>

* typo

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>

* add training with multiple audios

Signed-off-by: stevehuang52 <[email protected]>

* Support flash decoding (#7744)

* Add flash-decoding

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <[email protected]>

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761)

* Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747)

* Change accelerator to auto

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in nlp_checkpoint_port.py

Signed-off-by: Abhishree <[email protected]>

* Pass omegaconf object to trainer in export.py

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* docs: fix typos (#7758)

Signed-off-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Abhishree <[email protected]>

* Snake act (#7736)

Signed-off-by: Abhishree <[email protected]>

* Update gpt_dataset.py (#6963)

Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Signed-off-by: shuoer86 <[email protected]>
Signed-off-by: Xin Yao <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: shuoer86 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Xin Yao <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788)

* add selection criteria for reference audios

Signed-off-by: anferico <[email protected]>

* Update configuration files

Signed-off-by: anferico <[email protected]>

* add informative comment in config files

Signed-off-by: anferico <[email protected]>

* sample random index for reference audio selection

Signed-off-by: anferico <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: anferico <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update text server to support compute logprobs (#7733)

* update text server to support compute logprobs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add multi-layer feat extract and fix random question insertion

Signed-off-by: stevehuang52 <[email protected]>

* Configure MCore logger (#7781)

Signed-off-by: Mikołaj Błaż <[email protected]>

* Revert "PEFT eval fix (#7626) (#7638)" (#7693)

This reverts commit c24bb454bf1fa6f5820f1805c6387254a73220b9.

* remove TN from ctc_segm tut (#7807)

Signed-off-by: Evelina <[email protected]>

* [TTS] Support audio offsets in TTS data loaders (#7156)

* [TTS] Support audio offsets in TTS data loaders

Signed-off-by: Ryan <[email protected]>

* [TTS] Change docstring mentions of .pt to .npy

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Update Apex install command in Dockerfile (#7794) (#7804)

* move core install to /workspace (#7706)

* update apex install in dockerfile

* use fetch head

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Nemo to HF converter for LLaMA model (#7770)

* Create config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Add files via upload

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update config_llama_truncate.yaml

Signed-off-by: Utkarsh <[email protected]>

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update convert_nemo_llama_to_hf.py

Signed-off-by: Utkarsh <[email protected]>

* clean up trainer

* remove dependency on yaml config. load config from nemo file instead.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* enable ckpt saving into other precision formats

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support 70b + cleanup qkv slice logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

* move hf model folder code from comment to function and add instruction to run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Utkarsh <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Chen Cui <[email protected]>

* Save best NeMo model only when necessary (#7836)

Signed-off-by: Ante Jukić <[email protected]>

* add guard if its a distributed checkpoint (#7845)

Signed-off-by: Gerald Shen <[email protected]>

* Fix tn duplex (#7808)

* fix duplex tn infer

Signed-off-by: Evelina <[email protected]>

* fix typo

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix TN docs

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update transformers cache on Jenkins (#7854)

* update transformers cache

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* add cd

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>

* Update README.rst for container update (#7844)

Signed-off-by: fayejf <[email protected]>

* Add support for finetuning with huggingface datasets (#7834)

* add finetune with huggingface dataset

Signed-off-by: stevehuang52 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update yaml

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* add extrac hf text and update

Signed-off-by: stevehuang52 <[email protected]>

* update and refactor

Signed-off-by: stevehuang52 <[email protected]>

* move dataset dependency to common

Signed-off-by: stevehuang52 <[email protected]>

* add docstring

Signed-off-by: stevehuang52 <[email protected]>

* Add to Dics

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add ci test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add max steps in jenkins

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* reduce max steps

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* jenkins test

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* add bs=2

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <[email protected]>

* Multimodal merge (#7728)

* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING  for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <[email protected]>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <[email protected]>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <[email protected]>

Signed-off-by: jasonwan <[email protected]>

* fix config

Signed-off-by: jasonwan <[email protected]>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <[email protected]>

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <[email protected]>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <[email protected]>

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* set vp size to none if it is 1

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <[email protected]>

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <[email protected]>

* small clean up

Signed-off-by: ericharper <[email protected]>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <[email protected]>

* update module args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add config to test

Signed-off-by: ericharper <[email protected]>

* get hidden_size from config

Signed-off-by: ericharper <[email protected]>

* add try except

Signed-off-by: ericharper <[email protected]>

* use default

Signed-off-by: ericharper <[email protected]>

* update config with hidden size

Signed-off-by: ericharper <[email protected]>

* remove arg

Signed-off-by: ericharper <[email protected]>

* comment out jenkins test

Signed-off-by: ericharper <[email protected]>

* revert import

Signed-off-by: ericharper <[email protected]>

* remove optimizer_idx

Signed-off-by: eharper <[email protected]>

* prefetch num microbatches

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <[email protected]>

* set model parallel config

Signed-off-by: ericharper <[email protected]>

* use model parallel config object

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <[email protected]>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <[email protected]>

* change layer names for SFT

Signed-off-by: Hongbin Liu <[email protected]>

* fix bug in SFT

Signed-off-by: Hongbin Liu <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: Hongbin Liu <[email protected]>
Co-authored-by: Hongbin Liu <[email protected]>
Co-authored-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <[email protected]>

* revert to model parallel config

Signed-off-by: ericharper <[email protected]>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <[email protected]>

* remove imports

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <[email protected]>

* add config to self

Signed-off-by: ericharper <[email protected]>

* build transformer config

Signed-off-by: ericharper <[email protected]>

* add model to provider func

Signed-off-by: ericharper <[email protected]>

* update forward and float16 wrapper

Signed-off-by: ericharper <[email protected]>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <[email protected]>

* set virtual rank

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <[email protected]>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>

---------

Signed-off-by: jasonwan <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* remove import

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <[email protected]>

* update for dist adam

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <[email protected]>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <[email protected]>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <[email protected]>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <[email protected]>

* fix variable names

Signed-off-by: jasonwan <[email protected]>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <[email protected]>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <[email protected]>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mxins

Signed-off-by: jasonwan <[email protected]>

* small fix for lora and ptuning

Signed-off-by: jasonwan <[email protected]>

* support layerwise peft

Signed-off-by: jasonwan <[email protected]>

* support multiple target layers

Signed-off-by: jasonwan <[email protected]>

* support lora GQA

Signed-off-by: jasonwan <[email protected]>

* support amp O2

Signed-off-by: jasonwan <[email protected]>

* revert & more O2 fix

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <[email protected]>

* support …
@ringohoffman
Contributor

Same bug here after upgrade to torch==2.1.0 and lightning==2.1.0.

This bug appeared when running Metric.compute() of a torchmetric after a validation epoch.

Edit: I am using lightning fabric instead of lightning trainer. This bug is also triggered.

I also saw this on Metric.compute(). It happened while I was running integration tests where one test used a DDPStrategy and the other used a single-process strategy on the CPU. Once distributed process groups have been created, computing a metric on the CPU seems to raise the error.

import lightning
import torch
import torchmetrics
from torch import nn

fabric = lightning.Fabric(accelerator="cuda", devices=2)
fabric.launch()
module = nn.Linear(2, 1)
module = fabric.setup(module)

metric = torchmetrics.Accuracy(task="multiclass", num_classes=2)
metric.update(torch.tensor([0., 1.]), torch.tensor([0, 1]))
metric.compute()

RuntimeError: No backend type associated with device type cpu

It seems to happen because torchmetrics uses torch.distributed.group.WORLD as the process group for CPU metrics.
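If that is the cause, one possible workaround is to keep the metric's state on the same device type as the backend of the default process group, so that the collective call in compute() never sees a CPU tensor. A minimal sketch, assuming the process group was created with a GPU-only backend such as NCCL and that the metric exposes the `.device` property and `.to()` method that torchmetrics metrics provide; the helper name `align_metric_device` is hypothetical:

```python
def align_metric_device(metric, target_device):
    """Move `metric` to `target_device` unless it is already there.

    Hypothetical helper: it relies only on `metric.device` and `metric.to()`,
    and compares devices by their string form (e.g. "cpu", "cuda:0").
    Returns the metric and a flag saying whether a move was performed.
    """
    if str(metric.device) == str(target_device):
        return metric, False
    return metric.to(target_device), True
```

In the reproduction above this could be called as `metric, moved = align_metric_device(metric, fabric.device)` before `metric.update(...)`; the tensors passed to `update` would then also need to live on that device.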

@kapoorlab

Same for me. Downgrading to pytorch-lightning==2.0.8 fixed the issue.

This is the only thing that works to date. Is there any update from the Lightning devs on when this issue will be properly resolved? Otherwise we will be tied to that version forever.

drewnutt added a commit to abhinadduri/panspecies-dti that referenced this issue Aug 30, 2024
Had to rollback lightning due to this issue: Lightning-AI/pytorch-lightning#18803
Now uses the `exp-id` to save the best model checkpoint (also fixed DDP
issues with saving to wandb). Removed `device` from model call.
@Holer90

Holer90 commented Sep 17, 2024

I was having the same error message when using MeanAveragePrecision() on Databricks.

What worked for me was adding the following three kwargs when initializing the metric:

  • compute_on_cpu=False
  • sync_on_compute=False
  • dist_sync_on_step=True

All three arguments are needed to solve it in my case.

My code now looks like:

from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(
    iou_type="segm",
    class_metrics=True,
    compute_on_cpu=False,
    sync_on_compute=False,
    dist_sync_on_step=True,
)

@isaacgerg

@Holer90 This also "fixes" the issue for me. Thank you for posting it. Do you have any intuition on why these kwargs fix the issue?
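A rough intuition, offered as an interpretation of how torchmetrics documents these arguments rather than a statement from the maintainers: `compute_on_cpu=False` keeps list states on the metric's own device, `sync_on_compute=False` skips the cross-process synchronization that `compute()` would otherwise trigger, and `dist_sync_on_step=True` moves synchronization into `forward()`. With synchronization at `compute()` disabled, the returned value may reflect only the local rank's accumulated state, so this is worth checking before relying on it. The sketch below only collects the three kwargs in one reusable place; `SYNC_WORKAROUND_KWARGS` and `with_sync_workaround` are hypothetical names:

```python
# Hypothetical names; the three keys are the kwargs quoted in the comment above.
SYNC_WORKAROUND_KWARGS = {
    "compute_on_cpu": False,
    "sync_on_compute": False,
    "dist_sync_on_step": True,
}


def with_sync_workaround(**metric_kwargs):
    """Merge metric-specific kwargs with the workaround kwargs.

    Explicitly passed arguments win, so a caller can still override one flag.
    """
    return {**SYNC_WORKAROUND_KWARGS, **metric_kwargs}


kwargs = with_sync_workaround(iou_type="segm", class_metrics=True)
# These could then be passed on, e.g. MeanAveragePrecision(**kwargs).
```

Keeping the flags in a single dict makes it easier to remove the workaround later, once a proper fix is available.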

@edmcman

edmcman commented Oct 29, 2024

It would be nice if the error message revealed which metric was the problem.
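Until the message names the metric, one way to narrow it down is to list which metrics still hold their state on the CPU before compute() is called. A minimal sketch, assuming each metric exposes a `.device` attribute with a `.type` field (as torchmetrics metrics and `torch.device` do); `metrics_on_device` is a hypothetical helper, and the stand-in objects below only mimic that interface:

```python
from types import SimpleNamespace


def metrics_on_device(metrics, device_type="cpu"):
    """Return the sorted names of metrics whose `.device.type` matches.

    `metrics` is a mapping of name -> metric-like object. Objects without
    a `.device` attribute are ignored.
    """
    found = []
    for name, metric in metrics.items():
        device = getattr(metric, "device", None)
        if device is not None and getattr(device, "type", None) == device_type:
            found.append(name)
    return sorted(found)


# Stand-ins for real metrics, used here only to show the call shape.
example_metrics = {
    "accuracy": SimpleNamespace(device=SimpleNamespace(type="cpu")),
    "f1": SimpleNamespace(device=SimpleNamespace(type="cuda")),
}
suspects = metrics_on_device(example_metrics)
print(suspects)
```

Logging the result on each rank just before the validation epoch ends would show which metric was never moved to the accelerator.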

@mdifatta

This error can also be reproduced with torchmetrics 1.3.2 when lists of tensors are stored on the CPU (compute_on_cpu=True).

This is probably due to a bug in the documentation of sync_dist and in how the warning is raised: TorchMetrics already has the right logic internally to do the reduction, so it does not need that parameter, and sync_dist=True is not required to correctly reduce metrics across devices. See: #20153
