From 0c2b6837872447029200b55ede3cb8a5dd3af001 Mon Sep 17 00:00:00 2001
From: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Date: Tue, 17 Jan 2023 13:36:22 -0800
Subject: [PATCH] NeMo Forced Aligner (#5571)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Merge r1.13.0 main (#5570) * update branch Signed-off-by: ericharper * Rename Speech Dataset Processor to Speech Data Processor (#5378) Signed-off-by: Elena Rastorgueva Signed-off-by: Elena Rastorgueva * Megatron Export Update (#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad * added comment Signed-off-by: David Mosallanezhad * reverting changes on FloatCast Signed-off-by: David Mosallanezhad * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad * updated del onnx_model place Signed-off-by: David Mosallanezhad * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad Signed-off-by: David Mosallanezhad Signed-off-by: Boris Fomitchev Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev * Disable sync_batch_comm in validation_step for GPT (#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin * Empty Signed-off-by: MaximumEntropy * Comment out test Signed-off-by: MaximumEntropy Signed-off-by: ericharper Signed-off-by: Markel Sanz Ausin Signed-off-by: MaximumEntropy Signed-off-by: Oleksii Kuchaiev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Markel Sanz Ausin Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev * Radtts 1.13 (#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * [TTS] add CI test for RADTTS training recipe.
Signed-off-by: Boris Fomitchev Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478) * Initial refactor Signed-off-by: MaximumEntropy * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy * Fixes for eval Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy * Refactor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy * Remove comments Signed-off-by: MaximumEntropy * Minor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy * Remove old comment Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (#5480) * updated export_utils Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev * Cleaning up training_utils Signed-off-by: Boris Fomitchev Signed-off-by: Boris Fomitchev * added set_start_method + function param bugfix (#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper Signed-off-by: David Mosallanezhad Signed-off-by: ericharper Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper * remove notebook (#5548) Signed-off-by: ericharper Signed-off-by: ericharper * update readme Signed-off-by: ericharper * update branch Signed-off-by: ericharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper * revert Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Elena Rastorgueva Signed-off-by: David Mosallanezhad Signed-off-by: Boris Fomitchev Signed-off-by: Markel Sanz Ausin Signed-off-by: MaximumEntropy Signed-off-by: Oleksii Kuchaiev Signed-off-by: Xuesong Yang 
<1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: David Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Markel Sanz Ausin Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev Co-authored-by: Boris Fomitchev Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Optimized loop and bugfix in SDE (#5573) - Fixed bug with loading custom data attributes from JSON in Speech Data Explorer Signed-off-by: George Zelenfroynd Signed-off-by: Elena Rastorgueva * Update torchmetrics (#5566) * add task arg Signed-off-by: nithinraok * update state Signed-off-by: nithinraok Signed-off-by: nithinraok Co-authored-by: Taejin Park Signed-off-by: Elena Rastorgueva * remove useless files. (#5580) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * add initial NFA code Signed-off-by: Elena Rastorgueva * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva * Make use of the specified device during viterbi decoding Signed-off-by: Elena Rastorgueva * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva * Fix CodeQL notes Signed-off-by: Elena Rastorgueva * Fix CodeQL warning Signed-off-by: Elena Rastorgueva * Add an option to defer data setup from ``__init__`` to ``setup`` (#5569) * Add an option to defer dataloader setup from __init__ to setup Signed-off-by: Ante Jukić * Updated doc Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Make utt_id specified by number of parts of audio_filepath user wishes to use Signed-off-by: Elena Rastorgueva * remove audio_sr TODO - reduce risk of silent bugs Signed-off-by: Elena Rastorgueva * Add check that model is CTC Signed-off-by: Elena Rastorgueva * Remove unused import Signed-off-by: Elena Rastorgueva * Text generation improvement (UI client, data parallel support) (#5437) * Squashed commit of the following: commit a5e124f34be31bd6eafe5e5fdf5bedcd0d50915c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Oct 13 15:07:42 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 35b424044fe80c3081e7756ab21244f701716f7e Author: Yi Dong Date: Thu Oct 13 08:04:49 2022 -0700 get rid of base Signed-off-by: Yi Dong commit 2955210e2311791543538cfbb5ad26b79414c954 Merge: d52edef8c eaf6757ca Author: Yi Dong Date: Thu Oct 13 13:17:02 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit d52edef8cd7b36593838fb270047e80f8ccb652e Author: Yi Dong Date: Thu Oct 13 13:16:24 2022 +0000 align with main Signed-off-by: Yi Dong commit eaf6757ca5be8e099492f57c81d984429b0ad49c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Oct 13 13:12:11 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit c4b86d97626ea0721bf8fb4c0a45dec5becc94c9 Author: Yi Dong Date: Thu Oct 13 13:10:58 2022 +0000 same as main Signed-off-by: Yi Dong commit e335de51bcc0d681c58b568c3d8c238bc5687c3b Merge: 
c231086e0 4463a9fe9 Author: Yi Dong Date: Thu Oct 13 13:08:09 2022 +0000 Merge branch 'main' into universal_prompt commit c231086e057f1efaa915f691d84664cb3d5aad85 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Oct 12 19:59:12 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 6a821a4b49a23dd3408a706a2a3dd393149b0bb1 Author: Yi Dong Date: Wed Oct 12 19:56:17 2022 +0000 default to pad Signed-off-by: Yi Dong commit 9d908e39fef1beed9ba2da4d1a6806161eb7ef25 Author: Yi Dong Date: Wed Oct 12 19:55:44 2022 +0000 add the option to pad the tokens Signed-off-by: Yi Dong commit 876dc395b43fdeeaa2bcbbe13c76523633764c33 Merge: fbb0f4035 fe3c77ee9 Author: Yi Dong Date: Wed Oct 12 19:20:47 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit fe3c77ee93ab6cf3ea152db68cb6beefcac2a392 Author: Yi Dong Date: Wed Oct 12 18:59:49 2022 +0000 fix import again Signed-off-by: Yi Dong commit fbb0f4035c6cd6bfefed50a20605503de8c1dccb Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed Oct 12 16:00:24 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 372ca8c0d7988f2339b15888dc72aa21f4fb6937 Author: Yi Dong Date: Wed Oct 12 15:58:32 2022 +0000 enable server Signed-off-by: Yi Dong commit cbe05d9fbc978f812cfbb671f45f147f300713c4 Author: Yi Dong Date: Wed Oct 12 13:07:28 2022 +0000 fix comment error Signed-off-by: Yi Dong commit 1948048922e726ec6131e44b1a745389f18d4ef2 Merge: 232c2cce3 984f5c09a Author: Yi Dong Date: Wed Oct 12 13:05:30 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit 232c2cce34d7a8b902da406706f3dd9b39475091 Merge: 34c8a68df 658243fb6 Author: Yi Dong Date: Wed Oct 12 12:50:00 2022 +0000 Merge branch 'fix_global_init' into universal_prompt commit 984f5c09a6dbf1d1fb5aa30ed9b0df188e66a50f Merge: 658243fb6 3fda5de46 Author: Yi Dong <43824965+yidong72@users.noreply.github.com> Date: Wed Oct 12 08:42:11 2022 -0400 Merge branch 'main' into fix_global_init commit 658243fb6580191b5d60edd30cde16dcc23cbb85 Author: Yi Dong Date: Wed Oct 12 12:40:57 2022 +0000 fix import error Signed-off-by: Yi Dong commit 8e0fe1cad05ec288ec122b3cd0e139a96872e08c Author: Yi Dong Date: Tue Oct 11 22:44:12 2022 +0000 update the fused kernel Signed-off-by: Yi Dong commit 536cf6bef9447b75843fad630729c47a2fba35f3 Author: Yi Dong Date: Tue Oct 11 14:44:52 2022 -0700 add the missing file Signed-off-by: Yi Dong commit 1b437ec41dc5e354453ce0a089bca0171cbcb6c2 Author: Yi Dong Date: Tue Oct 11 14:43:14 2022 -0700 fix fused softmax Signed-off-by: Yi Dong commit 7813f60e05f9783af61f8c14ec1cb0c6c4f1f263 Author: Yi Dong Date: Tue Oct 11 14:16:48 2022 -0700 move global step to base Signed-off-by: Yi Dong commit 34c8a68df084b18d377e84415d9f07b2cd6673dd Author: Yi Dong Date: Thu Oct 6 13:50:11 2022 +0000 fix pipeline for eval Signed-off-by: Yi Dong commit eee5d38218f26660c3ffebe9f615c850c80a1f0d Author: Yi Dong Date: Thu Oct 6 13:48:22 2022 +0000 fix for pipleline parallel Signed-off-by: Yi Dong commit 323bca73e7ef6099ee79c0a2fffac7b709ed6c5d Merge: 125e49947 e3b4c4d1f Author: Yi Dong Date: Wed Oct 5 19:29:13 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit 125e4994760448ff75dd9328395813eda1c87547 Author: Yi Dong Date: Wed Oct 5 19:29:04 2022 +0000 add share option Signed-off-by: Yi Dong commit e3b4c4d1f7346c9fa596f3cca6d4df0a9e05c368 Author: Yi 
Dong Date: Wed Oct 5 11:43:48 2022 -0700 make sure consolidation works Signed-off-by: Yi Dong commit a5c833964ecf05dc460ca1da69275c4019742150 Merge: 2a07ab52d abcb74be2 Author: Yi Dong Date: Wed Oct 5 18:40:29 2022 +0000 Merge branch 'universal_prompt' of github.com:NVIDIA/NeMo into universal_prompt commit 2a07ab52d95f15ba666823028c69e23825666c05 Author: Yi Dong Date: Wed Oct 5 18:40:23 2022 +0000 added requirement Signed-off-by: Yi Dong commit 3abecd9dd1611993a87c537636abe7f7e6a9b04c Author: Yi Dong Date: Wed Oct 5 18:39:42 2022 +0000 added a simple web server Signed-off-by: Yi Dong commit abcb74be2caf1cdec40eb9ba2be4dde4d45a3b4b Author: Yi Dong Date: Wed Oct 5 06:54:12 2022 -0700 fix empty val loss Signed-off-by: Yi Dong commit b8eb92ac4a0d665570af75e34c9ba3c2e2420c26 Author: Yi Dong Date: Tue Oct 4 19:25:30 2022 -0700 text gen working Signed-off-by: Yi Dong commit d59f3e3f3a6fd19736d1c5706fed65a3dd4049ba Author: Yi Dong Date: Tue Oct 4 16:08:40 2022 -0700 first change Signed-off-by: Yi Dong commit 59d077585e6962a669b824af58f64e8a0bea6547 Author: Yi Dong Date: Tue Oct 4 15:00:40 2022 -0700 revert Signed-off-by: Yi Dong commit 12a0f3902d99e9179403644bd951c045df716ca7 Author: Yi Dong Date: Tue Oct 4 21:26:23 2022 +0000 init imp Signed-off-by: Yi Dong commit 62a15dfd943cc48be495ac61b9f2f00995775c5f Merge: 82c90d2cd e0cc6b767 Author: Yi Dong Date: Tue Oct 4 11:58:26 2022 -0700 Merge branch 'main' into universal_prompt commit 82c90d2cd0fd156f16a4b899f8c741d598f33990 Author: Yi Dong Date: Tue Oct 4 11:17:13 2022 -0700 add sync Signed-off-by: Yi Dong commit 9819b703eef877d90cd1257bf3610c69de9b4d7e Author: Yi Dong Date: Sun Oct 2 17:52:34 2022 -0700 fix save model Signed-off-by: root commit e4937e2fc5fb7d70754c97668416e4a69c3079fe Author: Yi Dong Date: Sat Oct 1 18:56:09 2022 +0000 working Signed-off-by: Yi Dong commit b73b06d1c7cf5417a6d87cb33d8ed83a57e38b7b Author: Yi Dong Date: Sat Oct 1 17:34:03 2022 +0000 calcuate the mask Signed-off-by: Yi Dong commit 9db3bc13eb65a94a475b837603351da68e3745bc Author: Yi Dong Date: Fri Sep 30 23:26:32 2022 +0000 fix bug in datasets Signed-off-by: Yi Dong commit f289900375d4412f53f8110be00fec6587627550 Author: Yi Dong Date: Fri Sep 30 22:29:40 2022 +0000 update the code Signed-off-by: Yi Dong commit 8e28a1f208aabaab72dbe769e72756baada04d99 Author: Yi Dong Date: Fri Sep 30 21:52:52 2022 +0000 added new ds Signed-off-by: Yi Dong commit 8d41315bab7ce90e200a8a7d1023c34f8e046897 Author: Yi Dong Date: Fri Sep 30 18:57:09 2022 +0000 added new files Signed-off-by: Yi Dong commit 984e0e94e15e16323c1ba1ca2efeabd84f69463f Merge: cbe8b7ab1 fa6cd8588 Author: Yi Dong Date: Thu Sep 29 21:43:29 2022 +0000 Merge branch 'llm-prompt-learning-improvements' into universal_prompt commit fa6cd858839277939446afe7275976078d54c512 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Sep 29 16:47:30 2022 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 78ba46e5d6fde1be53c08e1e30a54cce59824be0 Merge: 7d6d46742 8d670bc77 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Thu Sep 29 09:43:27 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 7d6d46742170a66758287a207d67e1b1bfd15613 Author: Virginia Adams Date: Thu Sep 29 16:42:43 2022 +0000 Removed inference step and added sentence peice check to predict step Signed-off-by: Virginia Adams commit 20fd265acd6f7f9912cf52155fe66ccfa6b201a2 Author: Virginia Adams Date: Thu Sep 29 15:26:32 2022 
+0000 fixed first stage check for pipeline parallel T5 pt Signed-off-by: Virginia Adams commit 3637be2b258c8d9028856f9971edb7da4a8121f0 Merge: a3ea722fd 986a76612 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Wed Sep 28 10:23:30 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit a3ea722fdc12fbcc5989b76ef5643a574b763bc4 Merge: 770967a52 971485ce7 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Mon Sep 26 13:35:52 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 770967a5251a474b6dcc2d44bf9a2076adbcb604 Merge: d23bf6c30 e3ac280a8 Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Mon Sep 26 10:17:03 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit d23bf6c30acc0e3f6af9b4e24547669866a34d62 Merge: de6a31651 333d2b749 Author: Virginia Adams Date: Mon Sep 26 10:05:16 2022 -0700 Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements commit de6a31651e63d88a42b971794d93f18ff5a3cdff Author: Virginia Adams Date: Mon Sep 26 17:00:53 2022 +0000 Updated PP check to be on first stage pipeline only Signed-off-by: Virginia Adams commit 333d2b7498e6742ce66436f733c980a74616900c Merge: 592c0986a a39fc925a Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Fri Sep 23 16:11:21 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 592c0986a476a91b57b8605d7b70830d7acfa021 Author: Virginia Adams Date: Fri Sep 23 23:08:41 2022 +0000 Fixed unused import and CI test bug Signed-off-by: Virginia Adams commit ea9cd82d85638bc60ae4ad7ef105db931c8e3455 Merge: ce4b72c8c b566c2d0e Author: Virginia Adams Date: Fri Sep 23 18:57:25 2022 +0000 Merge branch 'llm-prompt-learning-improvements' of https://github.com/NVIDIA/NeMo into llm-prompt-learning-improvements commit ce4b72c8c52f32be336e323dd78a38089edc3e7c Author: Virginia Adams Date: Fri Sep 23 18:57:16 2022 +0000 Switch to import from base class Signed-off-by: Virginia Adams commit b566c2d0e35a068f758fd1310bc620a47be4590b Merge: 6621f2854 e872061ac Author: Virginia Adams <78445382+vadam5@users.noreply.github.com> Date: Fri Sep 23 10:09:03 2022 -0700 Merge branch 'main' into llm-prompt-learning-improvements commit 6621f28543828a48484a5637f6c9f3ccb23a5b02 Author: Virginia Adams Date: Wed Sep 14 20:47:35 2022 +0000 python format fix Signed-off-by: Virginia Adams commit 8deafc8987b6af5f7b99a250310f57a40198c37f Author: Virginia Adams Date: Wed Sep 14 20:28:02 2022 +0000 Save .nemo on new best val score Signed-off-by: Virginia Adams commit 761bd36969cb465d6a129e9eee6ce1f883d3cf41 Author: Virginia Adams Date: Wed Sep 14 18:03:19 2022 +0000 Added automatic checkpoint to nemo file method Signed-off-by: Virginia Adams commit 3be4ed57b6cd3ddfe4876d78650dfe8fe794598b Author: Virginia Adams Date: Wed Sep 14 02:11:56 2022 +0000 Make GPT use base prompt learning model class: Signed-off-by: Virginia Adams Signed-off-by: Yi Dong * fix LGTM Signed-off-by: Yi Dong * fix validation Signed-off-by: Yi Dong * change for the lm eval Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make text generation work in data parallel environment Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * implement the service with rest service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * surpress log Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy * Fixes Signed-off-by: MaximumEntropy * Update config Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore function needed for NMT Signed-off-by: MaximumEntropy * handles no answer only Signed-off-by: Yi Dong * Fix config Signed-off-by: MaximumEntropy * added knn to web Signed-off-by: Yi Dong * fix lgtm.com comments Signed-off-by: Yi Dong * output the retrieved context Signed-off-by: Yi Dong * allow no neighbor query Signed-off-by: Yi Dong * remove the imports Signed-off-by: Yi Dong * warn only once Signed-off-by: Yi Dong * Change output file format from JSON to JSONL Signed-off-by: MaximumEntropy * new t0 dataset Signed-off-by: Yi Dong * Add T0 data preproc scripts Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge and multiprocessing Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for is_correct Signed-off-by: MaximumEntropy * fix epoch > 2 Signed-off-by: Yi Dong * handles multiple dataloader Signed-off-by: Yi Dong * remove template Signed-off-by: Yi Dong * Refactor T0 dataset Signed-off-by: MaximumEntropy * Add script to merge train folder into individual training files to minimize number of blends Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added on the fly service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add combo instance Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added combo service Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * send weights back to server Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix index store Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor changes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add reset button Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add add eos Signed-off-by: Yi Dong * use a seperate bert service Signed-off-by: Yi Dong * no loss of accuracy Signed-off-by: Yi Dong * pin the gradio version Signed-off-by: Yi Dong * Remove bin compat Signed-off-by: MaximumEntropy * Fix header lines Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * evaluate based on text generation Signed-off-by: Yi Dong * exact match result aggregation Signed-off-by: Yi Dong * working SP and SA Signed-off-by: Yi Dong * sync Signed-off-by: Yi Dong * fix checkpoint Signed-off-by: Yi Dong * fix 
eval Signed-off-by: Yi Dong * backup states Signed-off-by: Yi Dong * backup states reset Signed-off-by: Yi Dong * fix the bug Signed-off-by: Yi Dong * fix evaluation for sentence piece Signed-off-by: Yi Dong * fix a bug Signed-off-by: Yi Dong * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * potential fix in the future Signed-off-by: Yi Dong * remove the universal codes Signed-off-by: Yi Dong * remove universal strategy Signed-off-by: Yi Dong * address reviewer comment Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: MaximumEntropy Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva * Add align function docstrings and make most args optional Signed-off-by: Elena Rastorgueva * Remove redundant returns of viterbi and log probs matrices Signed-off-by: Elena Rastorgueva * Rename h# to Signed-off-by: Elena Rastorgueva * Update manifest format description in README Signed-off-by: Elena Rastorgueva * always remove any spaces from utt_id Signed-off-by: Elena Rastorgueva * Patch the hanging of threads on very large stderr (#5589) (#5590) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * O2 style amp for gpt3 ptuning (#5246) * enable amp o2 plugin Signed-off-by: Jimmy Zhang * only create master param if param requires gradient Signed-off-by: Jimmy Zhang * remove pytorch autocast Signed-off-by: Jimmy Zhang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jimmy Zhang * Update optimizer_with_main_params.py Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> * create master grad only if param group requires grad Signed-off-by: Jimmy Zhang * fix grad scaler for pp > 1 Signed-off-by: Jimmy Zhang Signed-off-by: Jimmy Zhang Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva * Better patch hydra (#5591) (#5592) * Readd buffereing and thread drain to Hydra Launcher Signed-off-by: smajumdar * Readd buffereing and thread drain to Hydra Launcher Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Yet another fix with hydra multirun (#5594) (#5595) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * Add RETRO model documentation (#5578) * added retro doc Signed-off-by: Yi Dong * finish data part Signed-off-by: Yi Dong * added the data format Signed-off-by: Yi Dong * added training script Signed-off-by: Yi Dong * added training and evaluation steps Signed-off-by: Yi Dong * edit the text Signed-off-by: Yi Dong * added the images Signed-off-by: Yi Dong * fix beginning Signed-off-by: Yi Dong * fix the grammar Signed-off-by: Yi Dong * trim it down Signed-off-by: Yi 
Dong * add wandb option Signed-off-by: Yi Dong * add reference Signed-off-by: Yi Dong * fix path Signed-off-by: Yi Dong * added the parameters table Signed-off-by: Yi Dong * fix section Signed-off-by: Yi Dong Signed-off-by: Yi Dong Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Fix: setup_multiple validation/test data (#5585) Fix: setup_multiple validation/test data (#5585) Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Move to optimizer based EMA implementation (#5169) * Move to optimizer Signed-off-by: SeanNaren * Fix replacing weights Signed-off-by: SeanNaren * Allow swapping of weights be optional Signed-off-by: SeanNaren * Save 2 models Signed-off-by: SeanNaren * Use different hook Signed-off-by: SeanNaren * Expose cpu device Signed-off-by: SeanNaren * Add clause to see if this fixes issue with O2 optimizer Signed-off-by: SeanNaren * Try to get O2 working Signed-off-by: SeanNaren * WIP Signed-off-by: SeanNaren * Fixes Signed-off-by: SeanNaren * Fixes to tests Signed-off-by: SeanNaren * Add guard Signed-off-by: SeanNaren * Remove import Signed-off-by: SeanNaren * Add guard Signed-off-by: SeanNaren * Add comment Signed-off-by: SeanNaren * Remove overwrite Signed-off-by: SeanNaren * Add BatchNorm, currently tests fail Signed-off-by: SeanNaren * Fix tests/functionality for batch norm Signed-off-by: SeanNaren * Get rid of NLP changes Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva * AIStore for ASR datasets (#5462) AIStore for ASR datasets Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Add support for MHA adapters to ASR (#5396) * Convert AbstractAdapterModule to AbstractAdapterMixin Signed-off-by: smajumdar * Temporary fixes to new signature of mixin Signed-off-by: smajumdar * Add adapter util for constants, add all mha adapters. 
Signed-off-by: smajumdar * Update name of function Signed-off-by: smajumdar * Roll back changes to convASR Signed-off-by: smajumdar * Convert AbstractAdapterModule to AbstractAdapterMixin Signed-off-by: smajumdar * First draft of Conformer support for MHA attention Signed-off-by: smajumdar * Add some preliminary tests Signed-off-by: smajumdar * Add support for projection of the hidden dimension for attention Signed-off-by: smajumdar * Add support for squeezeformer Signed-off-by: smajumdar * Update train adapter config Signed-off-by: smajumdar * Add tests for squeezeformer and unit tests for new modules Signed-off-by: smajumdar * Update config for hp search,set limits on modules for conformer and squeezeformer, update adapter mixin, add cache to import_from_class_path Signed-off-by: smajumdar * Update location of adapters Signed-off-by: smajumdar * Add pre_norm for proper attention learning, Fix the issue with nan/inf in pos_bias_u and pos_bias_v Signed-off-by: smajumdar * Update expmanager to clean up checkpoints Signed-off-by: smajumdar * Fix style Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * Add docstrings and update tests Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update training scripts Signed-off-by: smajumdar * Update config and docs Signed-off-by: smajumdar * Expose nemo delete function Signed-off-by: smajumdar * Correct adapter partial state saving Signed-off-by: smajumdar * Correct a bug with state management of adapter tokens Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Pull down EMA test Signed-off-by: smajumdar * Correct name of adapter module utility class Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Remove unused TTS eval functions w/ pesq and pystoi dependencies (#5605) (#5606) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Elena Rastorgueva * Create separator parameter Signed-off-by: Elena Rastorgueva * Call align function with hydra config Signed-off-by: Elena Rastorgueva * update usage example Signed-off-by: Elena Rastorgueva * Update Dockerfile (#5614) (#5616) Pinned to use `numba==0.53.1` to avoid crashing in training with `num_workers > 0`. This is just a temporary workaround, still need to fix it in the future. 
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Make separate pretrained_name and model_path parameters Signed-off-by: Elena Rastorgueva * make "optional" tags bold in markdown Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Move non-main functions to utils dir Signed-off-by: Elena Rastorgueva * Temp workaround: Disable test with cache_audio=True since it is failing in CI (#5607) (#5615) Signed-off-by: Ante Jukić Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS] fix ranges of char set for accented letters. (#5607) * [TTS] fix ranges of char set for accented letters. * remove digits pattern and added unit tests for math operators. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Change success message to reduce confusion (#5621) Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva * Update documentation and tutorials for Adapters (#5610) * Improve docs for adapter and tests Signed-off-by: smajumdar * Improve docs for adapter and tests Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rename test file Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS] add type hints and change variable names for tokenizers and g2p (#5602) * [TTS] add type hints and change variable names for tokenizers and g2p Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * 1. Added missing import for gather_objects. (#5627) Signed-off-by: Micha Livne Signed-off-by: Micha Livne Co-authored-by: Micha Livne Signed-off-by: Elena Rastorgueva * [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. (#5596) (#5625) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fixed RadTTS unit test (#5572) Signed-off-by: Boris Fomitchev Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva * remove tests (#5633) Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Elena Rastorgueva * [TTS][DOC] add notes about automatic conversion to target sampling rates.
(#5624) (#5634) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Conformer local attention (#5525) * local attn and merge Signed-off-by: sam1373 * optional Signed-off-by: sam1373 * override Signed-off-by: sam1373 * incorporate comments Signed-off-by: sam1373 * update Signed-off-by: sam1373 * fix Signed-off-by: sam1373 * comment Signed-off-by: sam1373 * changes, test Signed-off-by: sam1373 * changes Signed-off-by: sam1373 * check att context Signed-off-by: sam1373 * readme link Signed-off-by: sam1373 * utils Signed-off-by: sam1373 * update Signed-off-by: sam1373 Signed-off-by: sam1373 Signed-off-by: Samuel Kriman Co-authored-by: Vahid Noroozi Signed-off-by: Elena Rastorgueva * Add core classes and functions for online clustering diarizer part 1 (#5526) * Add core classes and functions for online clustering diarizer Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add audio to labels code Signed-off-by: Taejin Park * resolve type errors Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added unit=tests for very short audio Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Filled all missing docstrings Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved conflict and added missing docstrings Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed unit-test errors Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the wrongly added file - megatron_gpt_model.py Signed-off-by: Taejin Park * Fix wrongly included file - megatron_gpt_model.py Signed-off-by: Taejin Park * resolve code quality issue Signed-off-by: Taejin Park * Fixed unit-test errors and bugs Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed total_sec for offline_clustering toy_data in unit-tests Signed-off-by: Taejin Park * fixed merging index offset bug Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only including part 1 files Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused function Signed-off-by: Taejin Park * fixed unused imports Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * divided nmesc_clustering.py into two and reflected first-pass comments Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding offline/online_clustering.py Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code QL autocomment Signed-off-by: Taejin Park * Removed unused imports Signed-off-by: Taejin Park * Update nemo/collections/asr/parts/utils/online_clustering.py Co-authored-by: Sean Naren Signed-off-by: Taejin Park * Reflected comments Signed-off-by: Taejin Park * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved code scanning issue Signed-off-by: Taejin Park * Update nemo/collections/asr/parts/utils/offline_clustering.py Co-authored-by: Sean Naren Signed-off-by: Taejin Park Signed-off-by: Taejin Park Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Co-authored-by: Sean Naren Signed-off-by: Elena Rastorgueva * [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models (#5639) (#5641) * add stt_eo_conformer_ctc_large model * stt_eo_conformer_transducer_large Co-authored-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Removed unused import Signed-off-by: Elena Rastorgueva * Specify that filepaths need to be absolute Signed-off-by: Elena Rastorgueva * replaces any spaces in utt_id with dashes Signed-off-by: Elena Rastorgueva * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Elena Rastorgueva * Make hydra script callable by another script Signed-off-by: Elena Rastorgueva * do not specify default model or model_downsample_factor Signed-off-by: Elena Rastorgueva * [Dockerfile] Remove AIS archive from docker image (#5629) Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Measure audio_sr from audio instead of needing to specify Signed-off-by: Elena Rastorgueva * [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch (#5541) * Chinese TTS replaces default pypinyin dict * Add jieba word segmenter as an option Signed-off-by: Yuekai Zhang Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Make separate parameters for device of transcription and viterbi steps Signed-off-by: Elena Rastorgueva * Add mention of gecko Signed-off-by: Elena Rastorgueva * [workflow] add exclude labels option to ignore cherry-picks in release changelog. (#5645) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. (#5643) (#5647) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [Add] ASR+VAD Inference Pipeline (#5575) Added offline ASR+VAD inference pipeline that matches with what's in RIVA, along with some feature-based ASR and classification datasets. 
Signed-off-by: stevehuang52 Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * rename separator to ctm_grouping_separator and refactor Signed-off-by: Elena Rastorgueva * Bert interleaved (#5556) * Adding SP and SAR support Bert * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding SP and SAR support Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding SP and SAR support Bert * Adding SP and SAR support Bert * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adding Sequence parallel support to Bert * Update bert_model.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Adding tests * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * Adding interleaved pipeline parallelism * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Addressing Eric's comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Addressing Eric's comments * Fix bug fix sequence parallel and Interleaved * Fix bug fix sequence parallel and Interleaved Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Add duration padding support for RADTTS inference (#5650) * Added duration padding support for RADTTS inference * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Kevin Shih Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Add remove_blank_tokens_from_ctm parameter Signed-off-by: Elena Rastorgueva * Dont save initial_silence line in CTM Signed-off-by: Elena Rastorgueva * Add DLLogger support to exp_manager (#5658) * Add DLLogger support to exp_manager Signed-off-by: Alexandre Milesi * Move dllogger to separate file and check import Signed-off-by: Alexandre Milesi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Alexandre Milesi Signed-off-by: Alexandre Milesi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * add minimum_timestamp_duration parameter Signed-off-by: Elena Rastorgueva * add suggestion about removing blanks to README Signed-off-by: Elena Rastorgueva * reorder args Signed-off-by: Elena Rastorgueva * clarify description of ctm_grouping_separator in README Signed-off-by: Elena Rastorgueva * update docstring Signed-off-by: Elena 
Rastorgueva * [TTS][ZH] bugfix for ngc cli installation. (#5652) (#5664) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Port stateless timer to exp manager (#5584) * Port stateless timer to exp manager Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes and remove from all megatron code Signed-off-by: MaximumEntropy * Fixes Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change message Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Fix EMA restart by allowing device to be set by the class init (#5668) Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva * Remove SDP (moved to separate repo) - merge to main (#5630) * Remove sdp files from tools folder Signed-off-by: Elena Rastorgueva * Add page to docs with new SDP location Signed-off-by: Elena Rastorgueva Signed-off-by: Elena Rastorgueva * Add interface for making amax reduction optional for FP8 (#5447) * add TE interface for making amax reduction optional Signed-off-by: Kirthi Shankar Sivamani * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Kirthi Shankar Sivamani Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * [TTS] add tts dict cust notebook (#5662) * add tts dict cust notebook Signed-off-by: ekmb * review Signed-off-by: ekmb * fixed audio links Signed-off-by: ekmb * remove old notebook Signed-off-by: ekmb * fix typo Signed-off-by: ekmb Signed-off-by: ekmb Signed-off-by: Elena Rastorgueva * [ASR] Audio processing base, multi-channel enhancement models (#5356) * Audio processing base model, enc-mask-dec enhancement, tests and modules Signed-off-by: Ante Jukić * Addressed review comments Signed-off-by: Ante Jukić * Fixed CodeQL warnings Signed-off-by: Ante Jukić * Addressed PR comments Signed-off-by: Ante Jukić * Addressed PR comments: - renamed AudioProcessingModel to AudioToAudioModel - various small modifications - updated unit tests Signed-off-by: Ante Jukić * Addressed comments - Moved spectrogram to audio_preprocessing - Renamed MultichannelFeatures - Updated config and unit tests Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Expose ClusteringDiarizer device (#5681) * Expose device for users to set Signed-off-by: SeanNaren * Expose device for users to set Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva * Add Beam Search support to ASR transcribe() (#5443) * Add support for beam decoding via high level API. 
Signed-off-by: smajumdar * Add ctc decoding section Signed-off-by: smajumdar * Update ctc transcribe API to return results from beam search Signed-off-by: smajumdar * Add argument to preserve arpa file Signed-off-by: smajumdar * Update script to use hydra config, add some support for future compute timesteps, add doc for ctc decoding Signed-off-by: smajumdar * Update eval script and doc to use new API Signed-off-by: smajumdar * Add tests for ctc greedy decoding Signed-off-by: smajumdar * Address reviewer comments and add docstrings Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix changes and address comments Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: smajumdar Co-authored-by: Samuel Kriman Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Propagate attention_dropout flag for GPT-3 (#5669) * Propagate attention_dropout flag for GPT-3 Signed-off-by: Mikołaj Błaż * Add default to megatron_gpt_config Signed-off-by: Mikołaj Błaż Signed-off-by: Mikołaj Błaż Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Enc-Dec model size reporting fixes (#5623) * Update for enc-dec models Signed-off-by: MaximumEntropy * Fix for bert as well Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for PP Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Multiblank Transducer (#5527) * multi-blank transducers Signed-off-by: Hainan Xu * one line bug fix Signed-off-by: Hainan Xu * change interface of RNNTDecoding class to extract num-extra-output from joint instead of constructor Signed-off-by: Hainan Xu * addressed PR comments Signed-off-by: Hainan Xu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Hainan Xu Co-authored-by: Hainan Xu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS][ZH] fix broken link for the script. (#5680) * change to main branch. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TN/TTS docs] TN customization, g2p docs moved to tts (#5683) * TN customization, g2p docs moved to tts Signed-off-by: ekmb * link new TTS tutorial Signed-off-by: ekmb * combine 3 and 4 Signed-off-by: ekmb * remove note Signed-off-by: ekmb Signed-off-by: ekmb Signed-off-by: Elena Rastorgueva * Add prompt learning tests (#5649) * patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu * added gpt prompt learning and t5 prompt learning, made them run one after the other Signed-off-by: arendu * fixed changes Signed-off-by: arendu * gave unique names Signed-off-by: arendu * num workers set to 0 Signed-off-by: arendu * fixes to make num_workers>0 fast by using persistent_workers flag in dataloaders Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated to num_workers 8 Signed-off-by: arendu * updates to make num_workers arg in gpt/t5 inference/training work Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: arendu * add num_workers arg in jenkins Signed-off-by: arendu * bs fix Signed-off-by: arendu * numworkers > 0 added for gpt prompt learning eval Signed-off-by: arendu * added num_workers Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * remove output (#5689) (#5690) Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: ericharper Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Minor fixes (#5691) Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: Elena Rastorgueva * temp disable speaker reco CI (#5696) Signed-off-by: fayejf Signed-off-by: fayejf Signed-off-by: Elena Rastorgueva * some tokenizers do not have additional_special_tokens_ids attribute (#5642) (#5648) Signed-off-by: arendu Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Bump setuptools from 59.5.0 to 65.5.1 in /requirements (#5704) Bumps [setuptools](https://github.com/pypa/setuptools) from 59.5.0 to 65.5.1. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst) - [Commits](https://github.com/pypa/setuptools/compare/v59.5.0...v65.5.1) --- updated-dependencies: - dependency-name: setuptools dependency-type: direct:production ... Signed-off-by: dependabot[bot] Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Merge 1.14.0 main (#5705) * update branch Signed-off-by: ericharper * [TTS][ZH] fix broken link for the script. 
(#5666) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * update readme Signed-off-by: ericharper * update branch Signed-off-by: ericharper * update package info Signed-off-by: ericharper * unpin lightning Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Don't print exp_manager warning when max_steps == -1 (#5725) Signed-off-by: Alexandre Milesi Signed-off-by: Elena Rastorgueva * pin torchmetrics version (#5720) * fix torchmetrics version Signed-off-by: nithinraok * add lower bound Signed-off-by: nithinraok Signed-off-by: nithinraok Signed-off-by: Elena Rastorgueva * Update to pytorch 22.12 container (#5694) * update to pytorch 22.12 container Signed-off-by: ericharper * please fix waveglow export in 22.12 container Signed-off-by: ericharper * Update torch.stft() calls due to deprecation of return_complex=False (#5729) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang * Update ASR torch.stft() call to use return_complex=True (#5730) Signed-off-by: Jocelyn Huang Signed-off-by: Jocelyn Huang Signed-off-by: ericharper Signed-off-by: Jocelyn Huang Co-authored-by: Jocelyn Signed-off-by: Elena Rastorgueva * add keep_initializers_as_inputs to _export method (#5731) Signed-off-by: Patrick Simianer Signed-off-by: Patrick Simianer Signed-off-by: Elena Rastorgueva * added tab former doc to the index page (#5733) Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: Elena Rastorgueva * ALiBi Positional Embeddings (#5467) * 1. Working on alibi positional embeddings. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Added encoder and decoder alibi classes. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Simplified code. 2. Added bidirectional support. Signed-off-by: Micha Livne * 1. Added support in config to alibi. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Added Jenkins tests. Signed-off-by: Micha Livne * 1. Added missing file. Signed-off-by: Micha Livne * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. Signed-off-by: Micha Livne * 1. Debugging. 
Signed-off-by: Micha Livne Signed-off-by: Micha Livne Co-authored-by: Micha Livne Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Ensure EMA checkpoints are also deleted when normal checkpoints are (#5724) * Ensure EMA checkpoints are also deleted when normal checkpoints are Signed-off-by: SeanNaren * Simplify test Signed-off-by: SeanNaren * Remove comment Signed-off-by: SeanNaren * Fix bug where `save_best_model` caused a crash Signed-off-by: SeanNaren * Swap to logging only on rank 0 Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Elena Rastorgueva * Fix P-Tuning Truncation (#5663) * untokenize truncated field Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated truncation method arugments Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva * Update 00_NeMo_Primer.ipynb (#5740) Fixed a minor typo in primer tutorial. Signed-off-by: schaltung Signed-off-by: schaltung Signed-off-by: Elena Rastorgueva * Support non-standard padding token id (#5543) * Support non-standard padding token id Read the id of the padding token from the tokenizer when creating the embedding, rather than always defaulting to 0. This allows use of (admittedly bizarre) non-standard tokenizer models that don't give the padding token the id 0. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Numeri Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian Signed-off-by: Elena Rastorgueva * typo and link fixed (#5741) (#5744) Signed-off-by: ekmb Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * link fixed (#5745) (#5746) Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [TTS] Update Spanish TTS model to 1.15 (#5742) Signed-off-by: Ryan Signed-off-by: Elena Rastorgueva * Fix for incorrect computation of batched alignment in transducers (#5692) * Fix rnnt alignment bug and add test Signed-off-by: Igor Gitman * Add tests/fixes for more decoding configurations Signed-off-by: Igor Gitman * Add tests/fixes for frame confidence computation Signed-off-by: Igor Gitman * Rename test file to avoid local execution Signed-off-by: Igor Gitman * Add test to jenkinsfile Signed-off-by: Igor Gitman * Proper fix for alignments + remove code duplication Signed-off-by: Igor Gitman * Return back separate mask processing Signed-off-by: Igor Gitman * Override cleanup fixture Signed-off-by: Igor Gitman * Add a TODO for multiblank RNNT Signed-off-by: Igor Gitman Signed-off-by: Igor Gitman Signed-off-by: Elena Rastorgueva * Move Attention and MLP classes to a separate file in Megatron transformers (#5453) * Move attention and mlp to separate files Signed-off-by: MaximumEntropy * Add new attention and mlp files Signed-off-by: MaximumEntropy * Fix import in tests Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove unused imports in attention Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix missing import Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Co-authored-by: Oleksii Kuchaiev Signed-off-by: Elena Rastorgueva * Adithyare/prompt learning seed (#5749) * patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu * seeding for param-efficient learning methods Signed-off-by: arendu * seeding the datasampler Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed seed_everything Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Set the stream position to 0 for pydub (#5752) Signed-off-by: Jonghwan Hyeon Signed-off-by: Jonghwan Hyeon Signed-off-by: Elena Rastorgueva * Fix: conformer encoder forward when length is None (#5761) Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Update Tacotron2 NGC checkpoint load to latest version (#5760) (#5762) Signed-off-by: Jocelyn Huang Signed-off-by: Elena Rastorgueva * [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. (#5753) * refine GermanCharsTokenizer to support only graphemes as inputs by removing sentence-level phoneme representation; * refine GermanCharsTokenizer to preserve mixed cases from the original input graphemes; * add a new Thorsten's 22.10 dataset; * revise thorsten voice neutral datasets preparation script to support two versions of thorsten's voice datasets in a single script; Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Refactor so token, word and additonal segment-level alignments are generated in the same run Signed-off-by: Elena Rastorgueva * change CTM rounding to remove unnecessary decimal figures Signed-off-by: Elena Rastorgueva * Move obtaining start and end of batch line IDs to separate util function Signed-off-by: Elena Rastorgueva * Sanitize params before DLLogger log_hyperparams (#5736) * Sanitize params before DLLogger log_hyperparams Signed-off-by: Alexandre Milesi * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Alexandre Milesi Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Elena Rastorgueva * Allow to run alignment on transcribed pred_text Signed-off-by: Elena Rastorgueva * Update README Signed-off-by: Elena Rastorgueva * update README Signed-off-by: Elena Rastorgueva * Rename output_ctm_folder to output_dir Signed-off-by: Elena Rastorgueva * rename n_parts_for_ctm to audio_filepath_parts_in_utt_id Signed-off-by: Elena Rastorgueva * Rename some variables to improve readability Signed-off-by: Elena Rastorgueva * move constants to separate file Signed-off-by: Elena Rastorgueva * Add extra data args to support proper finetuning of HF converted T5 checkpoints (#5719) * Initial addition of extra args Signed-off-by: MaximumEntropy * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change defaults Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Rename some functions Signed-off-by: Elena Rastorgueva * update year Signed-off-by: Elena Rastorgueva * No-script TS export, prepared for ONNX export (#5653) * Changed unfold to reshape, merged padding chenges * Almost working ONNX export of RadTTS * restored radtts function * Added explicit assume_padded flag * Fixing attn_mask * Fixing unfold * Trying no hx * Back with hx * Made fx only for tracing * Tests annotated * Fully working no-script TS export, prepared for ONNX export * Restored no-autocast block, addressed code review * Fine-tuning autocast option * Protecting InstanceNorm * Forcing eval and param freeze on export Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva * ASR evaluator (#5728) * backbone Signed-off-by: fayejf * engineer and analyzer Signed-off-by: fayejf * offline_by_chunked Signed-off-by: fayejf * test_ds wip Signed-off-by: fayejf * temp remove inference Signed-off-by: fayejf * mandarin yaml Signed-off-by: fayejf * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * augmentor and a few updates Signed-off-by: fayejf * address alerts and revert unnecessary changes Signed-off-by: fayejf * Add readme Signed-off-by: fayejf * rename Signed-off-by: fayejf * typo fix Signed-off-by: fayejf * small fix Signed-off-by: fayejf * add missing header Signed-off-by: fayejf * rename augmentor_config to augmentor Signed-off-by: fayejf * raname inference_mode to inference Signed-off-by: fayejf * move utils.py Signed-off-by: fayejf * update temp file Signed-off-by: fayejf * make wer cer clear Signed-off-by: fayejf * seed_everything Signed-off-by: fayejf * fix missing rn augmentor_config in rnnt Signed-off-by: fayejf * fix rnnt transcribe Signed-off-by: fayejf * add more docstring and style fix Signed-off-by: fayejf * address codeQL Signed-off-by: fayejf * reflect comments Signed-off-by: fayejf * update readme Signed-off-by: fayejf * clearer Signed-off-by: fayejf Signed-off-by: fayejf Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * Docs g2p update (#5769) (#5775) * links update, riva docs link Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * adding back tar script for decoder dataset for duplex (#5773) * adding back tar script for decoder dataset for duplex Signed-off-by: Yang Zhang * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Yang Zhang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * [ASR][Test] Enable test for cache audio with a single worker (#5763) Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Elena Rastorgueva * Fixing masking in RadTTS bottleneck layer (#5771) * Fixing masking in RadTTS bottleneck layer Signed-off-by: Boris Fomitchev Signed-off-by: Elena Rastorgueva * Update torchaudio dependency version for tutorials (#5781) (#5782) Signed-off-by: smajumdar 
Co-authored-by: Somshubra Majumdar Signed-off-by: Elena Rastorgueva * [TTS][ZH] bugfix import jieba errors. (#5776) (#5784) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Elena Rastorgueva * fix typos Signed-off-by: Elena Rastorgueva * update requirements.txt Signed-off-by: Elena Rastorgueva * Make default devices None and set to GPU if it is available Signed-off-by: Elena Rastorgueva * add warning for non-zero minimum_timestamp_duration Signed-off-by: Elena Rastorgueva * Clarify phrasing in README regarding raising error if pred_text exists Signed-off-by: Elena Rastorgueva * Update README section on evaluating alignment accuracy Signed-off-by: Elena Rastorgueva * fix some code in creating segments Signed-off-by: Elena Rastorgueva * Add some unit tests for NFA boundary_info creation Signed-off-by: Elena Rastorgueva * Added test for function adding t_start and t_end Signed-off-by: Elena Rastorgueva * add comments to get_y_and_boundary_info_for_utt and remove redundant variables Signed-off-by: Elena Rastorgueva * add comments to get_batch_tensors_and_boundary_info Signed-off-by: Elena Rastorgueva * Add comments to make_output_files.py Signed-off-by: Elena Rastorgueva * add comments to viterbi decoding code Signed-off-by: Elena Rastorgueva * Add copyright headers Signed-off-by: Elena Rastorgueva * Change req to nemo_toolkit[all] Signed-off-by: Elena Rastorgueva Signed-off-by: ericharper Signed-off-by: Elena Rastorgueva Signed-off-by: David Mosallanezhad Signed-off-by: Boris Fomitchev Signed-off-by: Markel Sanz Ausin Signed-off-by: MaximumEntropy Signed-off-by: Oleksii Kuchaiev Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: George Zelenfroynd Signed-off-by: nithinraok Signed-off-by: Ante Jukić Signed-off-by: Yi Dong Signed-off-by: smajumdar Signed-off-by: Jimmy Zhang Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Signed-off-by: SeanNaren Signed-off-by: Jocelyn Huang Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Micha Livne Signed-off-by: sam1373 Signed-off-by: Samuel Kriman Signed-off-by: Taejin Park Signed-off-by: Yuekai Zhang Signed-off-by: stevehuang52 Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Alexandre Milesi Signed-off-by: Kirthi Shankar Sivamani Signed-off-by: ekmb Signed-off-by: Mikołaj Błaż Signed-off-by: Hainan Xu Signed-off-by: arendu Signed-off-by: fayejf Signed-off-by: Patrick Simianer Signed-off-by: Virginia Adams Signed-off-by: schaltung Signed-off-by: Ryan Signed-off-by: Igor Gitman Signed-off-by: Jonghwan Hyeon Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Yang Zhang Co-authored-by: Eric Harper Co-authored-by: David Co-authored-by: David Mosallanezhad Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Markel Sanz Ausin Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev Co-authored-by: Boris Fomitchev Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Nithin Rao Co-authored-by: Taejin Park Co-authored-by: anteju 
<108555623+anteju@users.noreply.github.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com> Co-authored-by: Jimmy Zhang Co-authored-by: Sean Naren Co-authored-by: Jocelyn Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Micha Livne Co-authored-by: Micha Livne Co-authored-by: Samuel Kriman Co-authored-by: Vahid Noroozi Co-authored-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com> Co-authored-by: Yuekai Zhang Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: kevjshih Co-authored-by: Kevin Shih Co-authored-by: milesial Co-authored-by: Kirthi Shankar Sivamani Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: mikolajblaz Co-authored-by: Hainan Xu Co-authored-by: Hainan Xu Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: pks Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: schaltung Co-authored-by: Kaden Uhlig Co-authored-by: Numeri Co-authored-by: Ryan Langman Co-authored-by: Igor Gitman Co-authored-by: Jonghwan Hyeon Co-authored-by: Yang Zhang --- tools/nemo_forced_aligner/README.md | 84 ++++ tools/nemo_forced_aligner/align.py | 287 +++++++++++++ tools/nemo_forced_aligner/requirements.txt | 2 + .../test_add_t_start_end_to_boundary_info.py | 121 ++++++ .../test_get_y_and_boundary_info_for_utt.py | 158 +++++++ tools/nemo_forced_aligner/utils/constants.py | 19 + tools/nemo_forced_aligner/utils/data_prep.py | 385 ++++++++++++++++++ .../utils/make_output_files.py | 210 ++++++++++ .../utils/viterbi_decoding.py | 136 +++++++ 9 files changed, 1402 insertions(+) create mode 100644 tools/nemo_forced_aligner/README.md create mode 100644 tools/nemo_forced_aligner/align.py create mode 100644 tools/nemo_forced_aligner/requirements.txt create mode 100644 tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py create mode 100644 tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py create mode 100644 tools/nemo_forced_aligner/utils/constants.py create mode 100644 tools/nemo_forced_aligner/utils/data_prep.py create mode 100644 tools/nemo_forced_aligner/utils/make_output_files.py create mode 100644 tools/nemo_forced_aligner/utils/viterbi_decoding.py diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/README.md new file mode 100644 index 0000000000000..1f96eba988871 --- /dev/null +++ b/tools/nemo_forced_aligner/README.md @@ -0,0 +1,84 @@ +# NeMo Forced Aligner (NFA) + +A tool for doing Forced Alignment using Viterbi decoding of NeMo CTC-based models. + +## Usage example + +``` bash +python /tools/nemo_forced_aligner/align.py \ + pretrained_name="stt_en_citrinet_1024_gamma_0_25" \ + model_downsample_factor=8 \ + manifest_filepath= \ + output_dir= +``` + +## How do I use NeMo Forced Aligner? +To use NFA, all you need to provide is a correct NeMo manifest (with `"audio_filepath"` and `"text"` fields). 
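For instance, a manifest with two utterances could be produced with a short Python snippet like the one below (a minimal sketch; the audio paths and transcriptions are placeholders you would replace with your own data):

```python
import json

# Hypothetical utterances: absolute audio paths and their ground-truth transcriptions.
utterances = [
    {"audio_filepath": "/data/audio/utt1.wav", "text": "hi world"},
    {"audio_filepath": "/data/audio/utt2.wav", "text": "hey"},
]

# NFA expects one JSON object per line ("JSON lines" format).
with open("manifest.json", "w", encoding="utf-8") as f:
    for utt in utterances:
        f.write(json.dumps(utt) + "\n")
```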
+ +Call the `align.py` script, specifying the parameters as follows: + +* `pretrained_name`: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded from NGC and used for generating the log-probs which we will use to do alignment. Any QuartzNet, Citrinet, or Conformer CTC model should work, in any language (only English has been tested so far). If `model_path` is specified, `pretrained_name` must not be specified. +>Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use a Conformer CTC model, as that will likely give Out Of Memory errors. + +* `model_path`: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the log-probs which we will use to do alignment. If `pretrained_name` is specified, `model_path` must not be specified. +>Note: NFA can only use CTC models (not Transducer models) at the moment. If you want to transcribe a long audio file (longer than ~5-10 mins), do not use a Conformer CTC model, as that will likely give Out Of Memory errors. + +* `model_downsample_factor`: the downsample factor of the ASR model. It should be 2 if your model is QuartzNet, 4 if it is Conformer CTC, 8 if it is Citrinet. + +* `manifest_filepath`: The path to the manifest of the data you want to align, containing `'audio_filepath'` and `'text'` fields. The audio filepaths need to be absolute paths. + +* `output_dir`: The folder in which to save the CTM files containing the generated alignments and the new JSON manifest containing paths to those CTM files. There will be one CTM file per utterance (i.e. one CTM file per line in the manifest). The files will be called `<output_dir>/{tokens,words,additional_segments}/<utt_id>.ctm` and each line in each file will start with `<utt_id>`. By default, `utt_id` will be the stem of the audio_filepath. This can be changed by overriding `audio_filepath_parts_in_utt_id`. The new JSON manifest will be at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. + +* **[OPTIONAL]** `align_using_pred_text`: if True, will transcribe the audio using the ASR model (specified by `pretrained_name` or `model_path`) and then use that transcription as the 'ground truth' for the forced alignment. The `"pred_text"` will be saved in the output JSON manifest at `<output_dir>/{original manifest name}_with_ctm_paths.json`. To avoid overwriting other transcribed texts, if there are already `"pred_text"` entries in the original manifest, the program will exit without attempting to generate alignments. (Default: False). + +* **[OPTIONAL]** `transcribe_device`: The device that will be used for generating log-probs (i.e. transcribing). If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified, `transcribe_device` needs to be a string that can be input to the `torch.device()` method. (Default: `None`). + +* **[OPTIONAL]** `viterbi_device`: The device that will be used for doing Viterbi decoding. If None, NFA will set it to 'cuda' if it is available (otherwise will set it to 'cpu'). If specified, `viterbi_device` needs to be a string that can be input to the `torch.device()` method. (Default: `None`). + +* **[OPTIONAL]** `batch_size`: The batch size that will be used for generating log-probs and doing Viterbi decoding. (Default: 1). + +* **[OPTIONAL]** `additional_ctm_grouping_separator`: the string used to separate CTM segments if you want to obtain CTM files at a level that is not the token level or the word level.
NFA will always produce token-level and word-level CTM files in: `<output_dir>/tokens/<utt_id>.ctm` and `<output_dir>/words/<utt_id>.ctm`. If `additional_ctm_grouping_separator` is specified, an additional set of CTM files will be created at `<output_dir>/additional_segments/<utt_id>.ctm` containing CTMs for `additional_ctm_grouping_separator`-separated segments. (Default: `None`. Cannot be empty string or space (" "), as space-separated word-level CTMs will always be saved in `<output_dir>/words/<utt_id>.ctm`.) +> Note: the `additional_ctm_grouping_separator` will be removed from the ground truth text and all the output CTMs, i.e. it is treated as a marker which is not part of the ground truth. The separator will essentially be treated as a space, and any additional spaces around it will be amalgamated into one, i.e. if `additional_ctm_grouping_separator="|"`, the following texts will be treated equivalently: `“abc|def”`, `“abc |def”`, `“abc| def”`, `“abc | def”`. + +* **[OPTIONAL]** `remove_blank_tokens_from_ctm`: a boolean denoting whether to remove blank tokens from token-level output CTMs. (Default: False). + +* **[OPTIONAL]** `audio_filepath_parts_in_utt_id`: This specifies how many of the 'parts' of the audio_filepath we will use (starting from the final part of the audio_filepath) to determine the utt_id that will be used in the CTM files. (Default: 1, i.e. utt_id will be the stem of the basename of audio_filepath). Note also that any spaces that are present in the audio_filepath will be replaced with dashes, so as not to change the number of space-separated elements in the CTM files. + +* **[OPTIONAL]** `minimum_timestamp_duration`: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio file. Note that this may cause timestamps to overlap. (Default: 0, i.e. no modifications to predicted duration). + +# Input manifest file format +By default, NFA needs to be provided with a 'manifest' file where each line specifies the absolute "audio_filepath" and "text" of each utterance that you wish to produce alignments for, like the format below: +```json +{"audio_filepath": "/absolute/path/to/audio.wav", "text": "the transcription of the utterance"} +``` + +You can omit the `"text"` field from the manifest if you specify `align_using_pred_text=true`. In that case, any `"text"` fields in the manifest will be ignored: the ASR model at `pretrained_name` or `model_path` will be used to transcribe the audio and obtain `"pred_text"`, which will be used as the 'ground truth' for the forced alignment process. The `"pred_text"` will also be saved in the output manifest JSON file at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. To remove the possibility of overwriting `"pred_text"`, NFA will raise an error if `align_using_pred_text=true` and there are existing `"pred_text"` fields in the original manifest. + +> Note: NFA does not require `"duration"` fields in the manifest, and can align long audio files without running out of memory. Depending on your machine specs, you can align audios up to 5-10 minutes long with Conformer CTC models, up to around 1.5 hours with QuartzNet models, and up to several hours with Citrinet models. NFA will also produce better alignments the more accurate the ground-truth `"text"` is.
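Before launching an alignment run, it can be worth checking that your manifest obeys the rules above. The sketch below (assumptions: the manifest path is a placeholder, and the checks simply mirror the requirements described in this README) raises an error on the first offending line:

```python
import json
import os

def check_nfa_manifest(manifest_filepath: str, align_using_pred_text: bool = False) -> None:
    """Raise ValueError if a manifest line breaks the rules described above."""
    with open(manifest_filepath, "r", encoding="utf-8") as f:
        for line_i, line in enumerate(f):
            data = json.loads(line)

            # Every line needs an absolute 'audio_filepath'.
            if "audio_filepath" not in data or not os.path.isabs(data["audio_filepath"]):
                raise ValueError(f"line {line_i}: missing or non-absolute 'audio_filepath'")

            if align_using_pred_text:
                # NFA refuses to run if 'pred_text' entries already exist.
                if "pred_text" in data:
                    raise ValueError(f"line {line_i}: 'pred_text' already present")
            else:
                # Ground-truth 'text' is required when align_using_pred_text is False.
                if "text" not in data:
                    raise ValueError(f"line {line_i}: missing 'text'")

check_nfa_manifest("/data/manifest.json", align_using_pred_text=False)
```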
+ + +# Output CTM file format +For each utterance specified in a line of `manifest_filepath`, several CTM files will be generated: +* a CTM file containing token-level alignments at `<output_dir>/tokens/<utt_id>.ctm`, +* a CTM file containing word-level alignments at `<output_dir>/words/<utt_id>.ctm`, +* if `additional_ctm_grouping_separator` is specified, there will also be a CTM file containing those segments at `<output_dir>/additional_segments/<utt_id>.ctm`. +Each CTM file will contain lines of the format: +`<utt_id> 1 <start_time> <duration> <text>`. +Note the second item in the line (the 'channel ID', which is required by the CTM file format) is always 1, as NFA operates on single-channel audio. + +# Output JSON manifest file format +A new manifest file will be saved at `<output_dir>/<original manifest file name>_with_ctm_paths.json`. It will contain the same fields as the original manifest, and additionally: +* `"token_level_ctm_filepath"` +* `"word_level_ctm_filepath"` +* `"additonal_segment_level_ctm_filepath"` (if `additional_ctm_grouping_separator` is specified) +* `"pred_text"` (if `align_using_pred_text=true`) + + +# How do I evaluate the alignment accuracy? +Ideally you would have some 'true' CTM files to compare with your generated CTM files. With these you could obtain metrics such as the mean (absolute) errors between the predicted starts/ends and the 'true' starts/ends of the segments. + +Alternatively (or additionally), you can visualize the quality of alignments using tools such as Gecko, which can play your audio file and display the predicted alignments at the same time. The Gecko tool requires you to upload an audio file and at least one CTM file. The Gecko tool can be accessed here: https://gong-io.github.io/gecko/. More information about the Gecko tool can be found on its GitHub page here: https://github.com/gong-io/gecko. + +**Note**: the following may help improve your experience viewing the CTMs in Gecko: +* setting `minimum_timestamp_duration` to a larger number, as Gecko may not display some tokens/words/segments properly if their timestamps are too short. +* setting `remove_blank_tokens_from_ctm=true` if you are analyzing token-level CTMs, as it will make the Gecko visualization less cluttered. diff --git a/tools/nemo_forced_aligner/README.md b/tools/nemo_forced_aligner/align.py b/tools/nemo_forced_aligner/align.py new file mode 100644 index 0000000000000..5f2a781a381fe --- /dev/null +++ b/tools/nemo_forced_aligner/align.py @@ -0,0 +1,287 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
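As a concrete illustration of the CTM-comparison approach described in the evaluation section above, the sketch below reads a generated word-level CTM and a reference CTM with the same `<utt_id> 1 <start_time> <duration> <text>` layout and reports the mean absolute start-time error (a sketch only: the file paths are placeholders, and it assumes both files list the same words in the same order and use the same time unit):

```python
def read_ctm(ctm_filepath):
    """Return a list of (start, duration, text) tuples from a CTM file."""
    entries = []
    with open(ctm_filepath, "r", encoding="utf-8") as f:
        for line in f:
            utt_id, channel, start, duration, text = line.split()[:5]
            entries.append((float(start), float(duration), text))
    return entries

predicted = read_ctm("/data/nfa_output/words/utt1.ctm")
reference = read_ctm("/data/reference_ctms/utt1.ctm")

# Mean absolute difference between predicted and reference start times.
errors = [abs(p[0] - r[0]) for p, r in zip(predicted, reference)]
print(f"mean absolute start-time error: {sum(errors) / len(errors):.3f}")
```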
+ +import os +from dataclasses import dataclass, is_dataclass +from typing import Optional + +import torch +from omegaconf import OmegaConf +from utils.data_prep import ( + get_audio_sr, + get_batch_starts_ends, + get_batch_tensors_and_boundary_info, + get_manifest_lines_batch, + is_entry_in_all_lines, + is_entry_in_any_lines, +) +from utils.make_output_files import make_ctm, make_new_manifest +from utils.viterbi_decoding import viterbi_decoding + +from nemo.collections.asr.models.ctc_models import EncDecCTCModel +from nemo.collections.asr.parts.utils.transcribe_utils import setup_model +from nemo.core.config import hydra_runner +from nemo.utils import logging + + +""" +Align the utterances in manifest_filepath. +Results are saved in CTM files in output_dir. + +Arguments: + pretrained_name: string specifying the name of a CTC NeMo ASR model which will be automatically downloaded + from NGC and used for generating the log-probs which we will use to do alignment. + Note: NFA can only use CTC models (not Transducer models) at the moment. + model_path: string specifying the local filepath to a CTC NeMo ASR model which will be used to generate the + log-probs which we will use to do alignment. + Note: NFA can only use CTC models (not Transducer models) at the moment. + Note: only one of model_path and pretrained_name may be specified. + model_downsample_factor: an int indicating the downsample factor of the ASR model, i.e. the ratio of input + timesteps to output timesteps. + If the ASR model is a QuartzNet model, its downsample factor is 2. + If the ASR model is a Conformer CTC model, its downsample factor is 4. + If the ASR model is a Citrinet model, its downsample factor is 8. + manifest_filepath: filepath to the manifest of the data you want to align, + containing 'audio_filepath' and 'text' fields. + output_dir: the folder where output CTM files and the new JSON manifest will be saved. + align_using_pred_text: if True, will transcribe the audio using the specified model and then use that transcription + as the 'ground truth' for the forced alignment. + transcribe_device: None, or a string specifying the device that will be used for generating log-probs (i.e. "transcribing"). + The string needs to be in a format recognized by torch.device(). If None, NFA will set it to 'cuda' if it is available + (otherwise will set it to 'cpu'). + viterbi_device: None, or a string specifying the device that will be used for doing Viterbi decoding. + The string needs to be in a format recognized by torch.device(). If None, NFA will set it to 'cuda' if it is available + (otherwise will set it to 'cpu'). + batch_size: int specifying the batch size that will be used for generating log-probs and doing Viterbi decoding. + additional_ctm_grouping_separator: the string used to separate CTM segments if you want to obtain CTM files at a + level that is not the token level or the word level. NFA will always produce token-level and word-level CTM + files in: `<output_dir>/tokens/<utt_id>.ctm` and `<output_dir>/words/<utt_id>.ctm`. + If `additional_ctm_grouping_separator` is specified, an additional set of CTM files will be created at + `<output_dir>/additional_segments/<utt_id>.ctm` containing CTMs + for `additional_ctm_grouping_separator`-separated segments. + remove_blank_tokens_from_ctm: a boolean denoting whether to remove blank tokens from token-level output CTMs.
+ audio_filepath_parts_in_utt_id: int specifying how many of the 'parts' of the audio_filepath + we will use (starting from the final part of the audio_filepath) to determine the + utt_id that will be used in the CTM files. Note also that any spaces that are present in the audio_filepath + will be replaced with dashes, so as not to change the number of space-separated elements in the + CTM files. + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 1 => utt_id will be "e-1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 2 => utt_id will be "d_e-1" + e.g. if audio_filepath is "/a/b/c/d/e 1.wav" and audio_filepath_parts_in_utt_id is 3 => utt_id will be "c_d_e-1" + minimum_timestamp_duration: a float indicating a minimum duration (in seconds) for timestamps in the CTM. If any + line in the CTM has a duration lower than the `minimum_timestamp_duration`, it will be enlarged from the + middle outwards until it meets the minimum_timestamp_duration, or reaches the beginning or end of the audio + file. Note that this may cause timestamps to overlap. +""" + + +@dataclass +class AlignmentConfig: + # Required configs + pretrained_name: Optional[str] = None + model_path: Optional[str] = None + model_downsample_factor: Optional[int] = None + manifest_filepath: Optional[str] = None + output_dir: Optional[str] = None + + # General configs + align_using_pred_text: bool = False + transcribe_device: Optional[str] = None + viterbi_device: Optional[str] = None + batch_size: int = 1 + additional_ctm_grouping_separator: Optional[str] = None + remove_blank_tokens_from_ctm: bool = False + minimum_timestamp_duration: float = 0 + audio_filepath_parts_in_utt_id: int = 1 + + +@hydra_runner(config_name="AlignmentConfig", schema=AlignmentConfig) +def main(cfg: AlignmentConfig): + + logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}') + + if is_dataclass(cfg): + cfg = OmegaConf.structured(cfg) + + # Validate config + if cfg.model_path is None and cfg.pretrained_name is None: + raise ValueError("cfg.model_path and cfg.pretrained_name cannot both be None") + + if cfg.model_path is not None and cfg.pretrained_name is not None: + raise ValueError("One of cfg.model_path and cfg.pretrained_name must be None") + + if cfg.model_downsample_factor is None: + raise ValueError("cfg.model_downsample_factor must be specified") + + if cfg.manifest_filepath is None: + raise ValueError("cfg.manifest_filepath must be specified") + + if cfg.output_dir is None: + raise ValueError("cfg.output_dir must be specified") + + if cfg.batch_size < 1: + raise ValueError("cfg.batch_size cannot be zero or a negative number") + + if cfg.additional_ctm_grouping_separator == "" or cfg.additional_ctm_grouping_separator == " ": + raise ValueError("cfg.additional_ctm_grouping_separator cannot be empty string or space character") + + if cfg.minimum_timestamp_duration < 0: + raise ValueError("cfg.minimum_timestamp_duration cannot be a negative number") + + # Validate manifest contents + if not is_entry_in_all_lines(cfg.manifest_filepath, "audio_filepath"): + raise RuntimeError( + "At least one line in cfg.manifest_filepath does not contain an 'audio_filepath' entry. " + "All lines must contain an 'audio_filepath' entry." + ) + + if cfg.align_using_pred_text: + if is_entry_in_any_lines(cfg.manifest_filepath, "pred_text"): + raise RuntimeError( + "Cannot specify cfg.align_using_pred_text=True when the manifest at cfg.manifest_filepath " + "contains 'pred_text' entries.
This is because the audio will be transcribed and may produce " + "a different 'pred_text'. This may cause confusion." + ) + else: + if not is_entry_in_all_lines(cfg.manifest_filepath, "text"): + raise RuntimeError( + "At least one line in cfg.manifest_filepath does not contain a 'text' entry. " + "NFA requires all lines to contain a 'text' entry when cfg.align_using_pred_text=False." + ) + + # init devices + if cfg.transcribe_device is None: + transcribe_device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + else: + transcribe_device = torch.device(cfg.transcribe_device) + logging.info(f"Device to be used for transcription step (`transcribe_device`) is {transcribe_device}") + + if cfg.viterbi_device is None: + viterbi_device = torch.device("cuda" if torch.cuda.is_available() else "cpu") + else: + viterbi_device = torch.device(cfg.viterbi_device) + logging.info(f"Device to be used for viterbi step (`viterbi_device`) is {viterbi_device}") + + if transcribe_device.type == 'cuda' or viterbi_device.type == 'cuda': + logging.warning( + 'One or both of transcribe_device and viterbi_device are GPUs. If you run into OOM errors ' + 'it may help to change both devices to be the CPU.' + ) + + # load model + model, _ = setup_model(cfg, transcribe_device) + + if not isinstance(model, EncDecCTCModel): + raise NotImplementedError( + f"Model {cfg.model_path if cfg.model_path else cfg.pretrained_name} is not an instance of NeMo EncDecCTCModel." + " Currently only instances of EncDecCTCModels are supported" + ) + + audio_sr = get_audio_sr(cfg.manifest_filepath) + logging.info( + f"Detected audio sampling rate {audio_sr}Hz in first audio in manifest at {cfg.manifest_filepath}. " + "Will assume all audios in manifest have this sampling rate. Sampling rate will be used to determine " + "timestamps in output CTM." + ) + + if cfg.minimum_timestamp_duration > 0: + logging.warning( + f"cfg.minimum_timestamp_duration has been set to {cfg.minimum_timestamp_duration} seconds. " + "This may cause the alignments for some tokens/words/additional segments to be overlapping."
+ ) + + # get start and end line IDs of batches + starts, ends = get_batch_starts_ends(cfg.manifest_filepath, cfg.batch_size) + + if cfg.align_using_pred_text: + # record pred_texts to save them in the new manifest at the end of this script + pred_text_all_lines = [] + else: + pred_text_all_lines = None + + # get alignment and save in CTM batch-by-batch + for start, end in zip(starts, ends): + manifest_lines_batch = get_manifest_lines_batch(cfg.manifest_filepath, start, end) + + ( + log_probs_batch, + y_batch, + T_batch, + U_batch, + token_info_batch, + word_info_batch, + segment_info_batch, + pred_text_batch, + ) = get_batch_tensors_and_boundary_info( + manifest_lines_batch, model, cfg.additional_ctm_grouping_separator, cfg.align_using_pred_text, + ) + + if cfg.align_using_pred_text: + pred_text_all_lines.extend(pred_text_batch) + + alignments_batch = viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device) + + make_ctm( + token_info_batch, + alignments_batch, + manifest_lines_batch, + model, + cfg.model_downsample_factor, + os.path.join(cfg.output_dir, "tokens"), + cfg.remove_blank_tokens_from_ctm, + cfg.audio_filepath_parts_in_utt_id, + cfg.minimum_timestamp_duration, + audio_sr, + ) + + make_ctm( + word_info_batch, + alignments_batch, + manifest_lines_batch, + model, + cfg.model_downsample_factor, + os.path.join(cfg.output_dir, "words"), + False, # dont try to remove blank tokens because we dont expect them to be there anyway + cfg.audio_filepath_parts_in_utt_id, + cfg.minimum_timestamp_duration, + audio_sr, + ) + + if cfg.additional_ctm_grouping_separator: + make_ctm( + segment_info_batch, + alignments_batch, + manifest_lines_batch, + model, + cfg.model_downsample_factor, + os.path.join(cfg.output_dir, "additional_segments"), + False, # dont try to remove blank tokens because we dont expect them to be there anyway + cfg.audio_filepath_parts_in_utt_id, + cfg.minimum_timestamp_duration, + audio_sr, + ) + + make_new_manifest( + cfg.output_dir, + cfg.manifest_filepath, + cfg.additional_ctm_grouping_separator, + cfg.audio_filepath_parts_in_utt_id, + pred_text_all_lines, + ) + + return None + + +if __name__ == "__main__": + main() diff --git a/tools/nemo_forced_aligner/requirements.txt b/tools/nemo_forced_aligner/requirements.txt new file mode 100644 index 0000000000000..3af8ebf1b4881 --- /dev/null +++ b/tools/nemo_forced_aligner/requirements.txt @@ -0,0 +1,2 @@ +nemo_toolkit[all] +pytest diff --git a/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py new file mode 100644 index 0000000000000..406c4be1fb702 --- /dev/null +++ b/tools/nemo_forced_aligner/tests/test_add_t_start_end_to_boundary_info.py @@ -0,0 +1,121 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import pytest +from utils.make_output_files import add_t_start_end_to_boundary_info + +ALIGNMENT = [ + 1, + 1, + 3, + 3, + 4, + 5, + 7, + 7, + 9, + 10, + 11, + 12, + 13, + 15, + 17, + 17, + 19, + 21, + 23, + 23, +] + +INPUT_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': 'h', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': 'i', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'w', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': 'o', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': 'r', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': 'l', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': 'd', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, + {'text': '', 's_start': 17, 's_end': 17}, + {'text': '', 's_start': 18, 's_end': 18}, + {'text': 'h', 's_start': 19, 's_end': 19}, + {'text': '', 's_start': 20, 's_end': 20}, + {'text': 'e', 's_start': 21, 's_end': 21}, + {'text': '', 's_start': 22, 's_end': 22}, + {'text': 'y', 's_start': 23, 's_end': 23}, + {'text': '', 's_start': 24, 's_end': 24}, +] + +EXPECTED_OUTPUT_TOKEN_INFO = [ + {'text': 'h', 's_start': 1, 's_end': 1, 't_start': 0, 't_end': 1}, + {'text': 'i', 's_start': 3, 's_end': 3, 't_start': 2, 't_end': 3}, + {'text': '', 's_start': 4, 's_end': 4, 't_start': 4, 't_end': 4}, + {'text': '', 's_start': 5, 's_end': 5, 't_start': 5, 't_end': 5}, + {'text': 'w', 's_start': 7, 's_end': 7, 't_start': 6, 't_end': 7}, + {'text': 'o', 's_start': 9, 's_end': 9, 't_start': 8, 't_end': 8}, + {'text': '', 's_start': 10, 's_end': 10, 't_start': 9, 't_end': 9}, + {'text': 'r', 's_start': 11, 's_end': 11, 't_start': 10, 't_end': 10}, + {'text': '', 's_start': 12, 's_end': 12, 't_start': 11, 't_end': 11}, + {'text': 'l', 's_start': 13, 's_end': 13, 't_start': 12, 't_end': 12}, + {'text': 'd', 's_start': 15, 's_end': 15, 't_start': 13, 't_end': 13}, + {'text': '', 's_start': 17, 's_end': 17, 't_start': 14, 't_end': 15}, + {'text': 'h', 's_start': 19, 's_end': 19, 't_start': 16, 't_end': 16}, + {'text': 'e', 's_start': 21, 's_end': 21, 't_start': 17, 't_end': 17}, + {'text': 'y', 's_start': 23, 's_end': 23, 't_start': 18, 't_end': 19}, +] + + +INPUT_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EXPECTED_OUTPUT_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3, 't_start': 0, 't_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15, 't_start': 6, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, +] + +INPUT_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EXPECTED_OUTPUT_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15, 't_start': 0, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, +] + + +@pytest.mark.parametrize( + "input_boundary_info_utt,alignment_utt,expected_output_boundary_info_utt", + [ + (INPUT_TOKEN_INFO, ALIGNMENT, EXPECTED_OUTPUT_TOKEN_INFO), + (INPUT_WORD_INFO, ALIGNMENT, EXPECTED_OUTPUT_WORD_INFO), + (INPUT_SEGMENT_INFO, ALIGNMENT, EXPECTED_OUTPUT_SEGMENT_INFO), + ], +) +def test_add_t_start_end_to_boundary_info(input_boundary_info_utt, 
alignment_utt, expected_output_boundary_info_utt): + output_boundary_info_utt = add_t_start_end_to_boundary_info(input_boundary_info_utt, alignment_utt) + assert output_boundary_info_utt == expected_output_boundary_info_utt diff --git a/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py new file mode 100644 index 0000000000000..f5bc722d5a1c7 --- /dev/null +++ b/tools/nemo_forced_aligner/tests/test_get_y_and_boundary_info_for_utt.py @@ -0,0 +1,158 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import pytest +from utils.data_prep import get_y_and_boundary_info_for_utt + +from nemo.collections.asr.models import ASRModel + +EN_TEXT = "hi world | hey" + +EN_QN_EXPECTED_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': 'h', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': 'i', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'w', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': 'o', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': 'r', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': 'l', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': 'd', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, + {'text': '', 's_start': 17, 's_end': 17}, + {'text': '', 's_start': 18, 's_end': 18}, + {'text': 'h', 's_start': 19, 's_end': 19}, + {'text': '', 's_start': 20, 's_end': 20}, + {'text': 'e', 's_start': 21, 's_end': 21}, + {'text': '', 's_start': 22, 's_end': 22}, + {'text': 'y', 's_start': 23, 's_end': 23}, + {'text': '', 's_start': 24, 's_end': 24}, +] + +EN_QN_EXPECTED_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EN_QN_EXPECTED_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, +] + +EN_CN_EXPECTED_TOKEN_INFO = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': '▁hi', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '▁world', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '▁he', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'y', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, +] + +EN_CN_EXPECTED_WORD_INFO = [ + {'text': 'hi', 's_start': 1, 's_end': 1}, + {'text': 'world', 's_start': 3, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, +] + +EN_CN_EXPECTED_SEGMENT_INFO = [ + {'text': 'hi world', 's_start': 1, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, +] + + +ZH_TEXT = "人工 智能|技术" + +ZH_EXPECTED_TOKEN_INFO = [ + 
{'text': '', 's_start': 0, 's_end': 0}, + {'text': '人', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '工', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': '智', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + {'text': '能', 's_start': 9, 's_end': 9}, + {'text': '', 's_start': 10, 's_end': 10}, + {'text': '', 's_start': 11, 's_end': 11}, + {'text': '', 's_start': 12, 's_end': 12}, + {'text': '技', 's_start': 13, 's_end': 13}, + {'text': '', 's_start': 14, 's_end': 14}, + {'text': '术', 's_start': 15, 's_end': 15}, + {'text': '', 's_start': 16, 's_end': 16}, +] + +ZH_EXPECTED_WORD_INFO = [ + {'text': '人工', 's_start': 1, 's_end': 3}, + {'text': '智能', 's_start': 7, 's_end': 9}, + {'text': '技术', 's_start': 13, 's_end': 15}, +] + +ZH_EXPECTED_SEGMENT_INFO = [ + {'text': '人工 智能', 's_start': 1, 's_end': 9}, + {'text': '技术', 's_start': 13, 's_end': 15}, +] + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_token_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_TOKEN_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_TOKEN_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_TOKEN_INFO), + ], +) +def test_token_info(text, model_pretrained_name, separator, expected_token_info): + model = ASRModel.from_pretrained(model_pretrained_name) + _, token_info, *_ = get_y_and_boundary_info_for_utt(text, model, separator) + assert token_info == expected_token_info + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_word_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_WORD_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_WORD_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_WORD_INFO), + ], +) +def test_word_info(text, model_pretrained_name, separator, expected_word_info): + model = ASRModel.from_pretrained(model_pretrained_name) + _, _, word_info, _ = get_y_and_boundary_info_for_utt(text, model, separator) + assert word_info == expected_word_info + + +@pytest.mark.parametrize( + "text,model_pretrained_name,separator,expected_segment_info", + [ + (EN_TEXT, "stt_en_quartznet15x5", "|", EN_QN_EXPECTED_SEGMENT_INFO), + (EN_TEXT, "stt_en_citrinet_256_gamma_0_25", "|", EN_CN_EXPECTED_SEGMENT_INFO), + (ZH_TEXT, "stt_zh_citrinet_512", "|", ZH_EXPECTED_SEGMENT_INFO), + ], +) +def test_segment_info(text, model_pretrained_name, separator, expected_segment_info): + model = ASRModel.from_pretrained(model_pretrained_name) + *_, segment_info = get_y_and_boundary_info_for_utt(text, model, separator) + assert segment_info == expected_segment_info diff --git a/tools/nemo_forced_aligner/utils/constants.py b/tools/nemo_forced_aligner/utils/constants.py new file mode 100644 index 0000000000000..894f880401cbc --- /dev/null +++ b/tools/nemo_forced_aligner/utils/constants.py @@ -0,0 +1,19 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +BLANK_TOKEN = "<b>" + +SPACE_TOKEN = "<space>" + +V_NEGATIVE_NUM = -1e30 diff --git a/tools/nemo_forced_aligner/utils/data_prep.py b/tools/nemo_forced_aligner/utils/data_prep.py new file mode 100644 index 0000000000000..26d8a328b50d4 --- /dev/null +++ b/tools/nemo_forced_aligner/utils/data_prep.py @@ -0,0 +1,385 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import json +import os + +import soundfile as sf +import torch +from utils.constants import BLANK_TOKEN, SPACE_TOKEN, V_NEGATIVE_NUM + + +def get_batch_starts_ends(manifest_filepath, batch_size): + """ + Get the start and end ids of the lines we will use for each 'batch'. + """ + + with open(manifest_filepath, 'r') as f: + num_lines_in_manifest = sum(1 for _ in f) + + starts = [x for x in range(0, num_lines_in_manifest, batch_size)] + ends = [x - 1 for x in starts] + ends.pop(0) + ends.append(num_lines_in_manifest) + + return starts, ends + + +def is_entry_in_any_lines(manifest_filepath, entry): + """ + Returns True if entry is a key in any of the JSON lines in manifest_filepath + """ + + entry_in_manifest = False + + with open(manifest_filepath, 'r') as f: + for line in f: + data = json.loads(line) + + if entry in data: + entry_in_manifest = True + + return entry_in_manifest + + +def is_entry_in_all_lines(manifest_filepath, entry): + """ + Returns True if entry is a key in all of the JSON lines in manifest_filepath.
+ """ + with open(manifest_filepath, 'r') as f: + for line in f: + data = json.loads(line) + + if entry not in data: + return False + + return True + + +def get_audio_sr(manifest_filepath): + """ + Measure the sampling rate of the audio file in the first line + of the manifest at manifest_filepath + """ + with open(manifest_filepath, "r") as f_manifest: + first_line = json.loads(f_manifest.readline()) + + audio_file = first_line["audio_filepath"] + if not os.path.exists(audio_file): + raise RuntimeError(f"Did not find filepath {audio_file} which was specified in manifest {manifest_filepath}.") + + with sf.SoundFile(audio_file, "r") as f_audio: + return f_audio.samplerate + + +def get_manifest_lines_batch(manifest_filepath, start, end): + manifest_lines_batch = [] + with open(manifest_filepath, "r") as f: + for line_i, line in enumerate(f): + if line_i == start and line_i == end: + manifest_lines_batch.append(json.loads(line)) + break + + if line_i == end: + break + if line_i >= start: + manifest_lines_batch.append(json.loads(line)) + return manifest_lines_batch + + +def get_char_tokens(text, model): + tokens = [] + for character in text: + if character in model.decoder.vocabulary: + tokens.append(model.decoder.vocabulary.index(character)) + else: + tokens.append(len(model.decoder.vocabulary)) # return unk token (same as blank token) + + return tokens + + +def get_y_and_boundary_info_for_utt(text, model, separator): + """ + Get y_token_ids_with_blanks, token_info, word_info and segment_info for the text provided, tokenized + by the model provided. + y_token_ids_with_blanks is a list of the indices of the text tokens with the blank token id in between every + text token. + token_info, word_info and segment_info are lists of dictionaries containing information about + where the tokens/words/segments start and end. + For example, 'hi world | hey ' with separator = '|' and tokenized by a BPE tokenizer can have token_info like: + token_info = [ + {'text': '', 's_start': 0, 's_end': 0}, + {'text': '▁hi', 's_start': 1, 's_end': 1}, + {'text': '', 's_start': 2, 's_end': 2}, + {'text': '▁world', 's_start': 3, 's_end': 3}, + {'text': '', 's_start': 4, 's_end': 4}, + {'text': '▁he', 's_start': 5, 's_end': 5}, + {'text': '', 's_start': 6, 's_end': 6}, + {'text': 'y', 's_start': 7, 's_end': 7}, + {'text': '', 's_start': 8, 's_end': 8}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each token start and end. + + The word_info will be as follows: + word_info = [ + {'text': 'hi', 's_start': 1, 's_end': 1}, + {'text': 'world', 's_start': 3, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each word start and end. + + segment_info will be as follows: + segment_info = [ + {'text': 'hi world', 's_start': 1, 's_end': 3}, + {'text': 'hey', 's_start': 5, 's_end': 7}, + ] + 's_start' and 's_end' indicate where in the sequence of tokens does each segment start and end. 
+ """ + + if not separator: # if separator is not defined - treat the whole text as one segment + segments = [text] + else: + segments = text.split(separator) + + # remove any spaces at start and end of segments + segments = [seg.strip() for seg in segments] + + if hasattr(model, 'tokenizer'): + + BLANK_ID = len(model.decoder.vocabulary) # TODO: check + + y_token_ids_with_blanks = [BLANK_ID] + token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] + word_info = [] + segment_info = [] + + segment_s_pointer = 1 # first segment will start at s=1 because s=0 is a blank + word_s_pointer = 1 # first word will start at s=1 because s=0 is a blank + + for segment in segments: + words = segment.split(" ") # we define words to be space-separated sub-strings + for word in words: + + word_tokens = model.tokenizer.text_to_tokens(word) + word_ids = model.tokenizer.text_to_ids(word) + for token, id_ in zip(word_tokens, word_ids): + # add the text token and the blank that follows it + # to our token-based variables + y_token_ids_with_blanks.extend([id_, BLANK_ID]) + token_info.extend( + [ + { + "text": token, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ] + ) + + # add the word to word_info and increment the word_s_pointer + word_info.append( + { + "text": word, + "s_start": word_s_pointer, + "s_end": word_s_pointer + (len(word_tokens) - 1) * 2, # TODO check this, + } + ) + word_s_pointer += len(word_tokens) * 2 # TODO check this + + # add the segment to segment_info and increment the segment_s_pointer + segment_tokens = model.tokenizer.text_to_tokens(segment) + segment_info.append( + { + "text": segment, + "s_start": segment_s_pointer, + "s_end": segment_s_pointer + (len(segment_tokens) - 1) * 2, + } + ) + segment_s_pointer += len(segment_tokens) * 2 + + return y_token_ids_with_blanks, token_info, word_info, segment_info + + elif hasattr(model.decoder, "vocabulary"): # i.e. 
tokenization is simply character-based + + BLANK_ID = len(model.decoder.vocabulary) # TODO: check this is correct + SPACE_ID = model.decoder.vocabulary.index(" ") + + y_token_ids_with_blanks = [BLANK_ID] + token_info = [{"text": BLANK_TOKEN, "s_start": 0, "s_end": 0,}] + word_info = [] + segment_info = [] + + segment_s_pointer = 1 # first segment will start at s=1 because s=0 is a blank + word_s_pointer = 1 # first word will start at s=1 because s=0 is a blank + + for i_segment, segment in enumerate(segments): + words = segment.split(" ") # we define words to be space-separated characters + for i_word, word in enumerate(words): + + # convert string to list of characters + word_tokens = list(word) + # convert list of characters to list of their ids in the vocabulary + word_ids = get_char_tokens(word, model) + for token, id_ in zip(word_tokens, word_ids): + # add the text token and the blank that follows it + # to our token-based variables + y_token_ids_with_blanks.extend([id_, BLANK_ID]) + token_info.extend( + [ + { + "text": token, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ] + ) + + # add space token (and the blank after it) unless this is the final word in the final segment + if not (i_segment == len(segments) - 1 and i_word == len(words) - 1): + y_token_ids_with_blanks.extend([SPACE_ID, BLANK_ID]) + token_info.extend( + ( + { + "text": SPACE_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 2, + "s_end": len(y_token_ids_with_blanks) - 2, + }, + { + "text": BLANK_TOKEN, + "s_start": len(y_token_ids_with_blanks) - 1, + "s_end": len(y_token_ids_with_blanks) - 1, + }, + ) + ) + # add the word to word_info and increment the word_s_pointer + word_info.append( + { + "text": word, + "s_start": word_s_pointer, + "s_end": word_s_pointer + len(word_tokens) * 2 - 2, # TODO check this, + } + ) + word_s_pointer += len(word_tokens) * 2 + 2 # TODO check this + + # add the segment to segment_info and increment the segment_s_pointer + segment_tokens = get_char_tokens(segment, model) + segment_info.append( + { + "text": segment, + "s_start": segment_s_pointer, + "s_end": segment_s_pointer + (len(segment_tokens) - 1) * 2, + } + ) + segment_s_pointer += len(segment_tokens) * 2 + 2 + + return y_token_ids_with_blanks, token_info, word_info, segment_info + + else: + raise RuntimeError("Cannot get tokens of this model.") + + +def get_batch_tensors_and_boundary_info(manifest_lines_batch, model, separator, align_using_pred_text): + """ + Returns: + log_probs, y, T, U (y and U are s.t. every other token is a blank) - these are the tensors we will need + during Viterbi decoding. + token_info_list, word_info_list, segment_info_list - these are lists of dictionaries which we will need + for writing the CTM files with the human-readable alignments. + pred_text_list - this is a list of the transcriptions from our model which we will save to our output JSON + file if align_using_pred_text is True. 
+ """ + + # get hypotheses by calling 'transcribe' + # we will use the output log_probs, the duration of the log_probs, + # and (optionally) the predicted ASR text from the hypotheses + audio_filepaths_batch = [line["audio_filepath"] for line in manifest_lines_batch] + B = len(audio_filepaths_batch) + with torch.no_grad(): + hypotheses = model.transcribe(audio_filepaths_batch, return_hypotheses=True, batch_size=B) + + log_probs_list_batch = [] + T_list_batch = [] + pred_text_batch = [] + for hypothesis in hypotheses: + log_probs_list_batch.append(hypothesis.y_sequence) + T_list_batch.append(hypothesis.y_sequence.shape[0]) + pred_text_batch.append(hypothesis.text) + + # we loop over every line in the manifest that is in our current batch, + # and record the y (list of tokens, including blanks), U (list of lengths of y) and + # token_info_batch, word_info_batch, segment_info_batch + y_list_batch = [] + U_list_batch = [] + token_info_batch = [] + word_info_batch = [] + segment_info_batch = [] + + for i_line, line in enumerate(manifest_lines_batch): + if align_using_pred_text: + gt_text_for_alignment = pred_text_batch[i_line] + else: + gt_text_for_alignment = line["text"] + y_utt, token_info_utt, word_info_utt, segment_info_utt = get_y_and_boundary_info_for_utt( + gt_text_for_alignment, model, separator + ) + + y_list_batch.append(y_utt) + U_list_batch.append(len(y_utt)) + token_info_batch.append(token_info_utt) + word_info_batch.append(word_info_utt) + segment_info_batch.append(segment_info_utt) + + # turn log_probs, y, T, U into dense tensors for fast computation during Viterbi decoding + T_max = max(T_list_batch) + U_max = max(U_list_batch) + # V = the number of tokens in the vocabulary + 1 for the blank token. + V = len(model.decoder.vocabulary) + 1 + T_batch = torch.tensor(T_list_batch) + U_batch = torch.tensor(U_list_batch) + + # make log_probs_batch tensor of shape (B x T_max x V) + log_probs_batch = V_NEGATIVE_NUM * torch.ones((B, T_max, V)) + for b, log_probs_utt in enumerate(log_probs_list_batch): + t = log_probs_utt.shape[0] + log_probs_batch[b, :t, :] = log_probs_utt + + # make y tensor of shape (B x U_max) + # populate it initially with all 'V' numbers so that the 'V's will remain in the areas that + # are 'padding'. This will be useful for when we make 'log_probs_reorderd' during Viterbi decoding + # in a different function. + y_batch = V * torch.ones((B, U_max), dtype=torch.int64) + for b, y_utt in enumerate(y_list_batch): + U_utt = U_batch[b] + y_batch[b, :U_utt] = torch.tensor(y_utt) + + return ( + log_probs_batch, + y_batch, + T_batch, + U_batch, + token_info_batch, + word_info_batch, + segment_info_batch, + pred_text_batch, + ) diff --git a/tools/nemo_forced_aligner/utils/make_output_files.py b/tools/nemo_forced_aligner/utils/make_output_files.py new file mode 100644 index 0000000000000..830bf476ff2f8 --- /dev/null +++ b/tools/nemo_forced_aligner/utils/make_output_files.py @@ -0,0 +1,210 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import json +import os +from pathlib import Path + +import soundfile as sf +from utils.constants import BLANK_TOKEN, SPACE_TOKEN + + +def _get_utt_id(audio_filepath, audio_filepath_parts_in_utt_id): + fp_parts = Path(audio_filepath).parts[-audio_filepath_parts_in_utt_id:] + utt_id = Path("_".join(fp_parts)).stem + utt_id = utt_id.replace(" ", "-") # replace any spaces in the filepath with dashes + return utt_id + + +def add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt): + """ + We use the list of alignments to add the timesteps where each token/word/segment is predicted to + start and end. + boundary_info_utt can be any one of the variables referred to as `token_info`, `word_info`, `segment_info` + in other parts of the code. + + e.g. the input boundary info could be + boundary_info_utt = [ + {'text': 'hi', 's_start': 1, 's_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15}, + {'text': 'hey', 's_start': 19, 's_end': 23}, + ] + + and the alignment could be + alignment_utt = [ 1, 1, 3, 3, 4, 5, 7, 7, 9, 10, 11, 12, 13, 15, 17, 17, 19, 21, 23, 23] + + in which case the output would be: + boundary_info_utt = [ + {'text': 'hi', 's_start': 1, 's_end': 3, 't_start': 0, 't_end': 3}, + {'text': 'world', 's_start': 7, 's_end': 15, 't_start': 6, 't_end': 13}, + {'text': 'hey', 's_start': 19, 's_end': 23, 't_start': 16, 't_end': 19}, + ] + """ + # first remove boundary_info of any items that are not in the alignment + # the only items we expect not to be in the alignment are blanks that the alignment chooses to skip + # we will iterate boundary_info in reverse order for this to make popping the items simple + s_in_alignment = set(alignment_utt) + for boundary_info_pointer in range(len(boundary_info_utt) - 1, -1, -1): + s_in_boundary_info = set( + range( + boundary_info_utt[boundary_info_pointer]["s_start"], + boundary_info_utt[boundary_info_pointer]["s_end"] + 1, + ) + ) + item_not_in_alignment = True + for s_ in s_in_boundary_info: + if s_ in s_in_alignment: + item_not_in_alignment = False + + if item_not_in_alignment: + boundary_info_utt.pop(boundary_info_pointer) + + # now update boundary_info with t_start and t_end + boundary_info_pointer = 0 + for t, s_at_t in enumerate(alignment_utt): + if s_at_t == boundary_info_utt[boundary_info_pointer]["s_start"]: + if "t_start" not in boundary_info_utt[boundary_info_pointer]: + # we have just reached the start of the word/token/segment in the alignment => update t_start + boundary_info_utt[boundary_info_pointer]["t_start"] = t + + if t < len(alignment_utt) - 1: # this if is to avoid accessing an index that is not in the list + if alignment_utt[t + 1] > boundary_info_utt[boundary_info_pointer]["s_end"]: + if "t_end" not in boundary_info_utt[boundary_info_pointer]: + boundary_info_utt[boundary_info_pointer]["t_end"] = t + + boundary_info_pointer += 1 + else: # i.e. t == len(alignment) - 1, i.e. 
we are at the final element in alignment
+                # add final t_end if we haven't already
+                if "t_end" not in boundary_info_utt[boundary_info_pointer]:
+                    boundary_info_utt[boundary_info_pointer]["t_end"] = t
+
+        if boundary_info_pointer == len(boundary_info_utt):
+            # we have finished populating boundary_info with t_start and t_end,
+            # but we might have some final remaining elements (blanks) in the alignment which we don't care about
+            # => break, so as not to cause issues trying to access boundary_info[boundary_info_pointer]
+            break
+
+    return boundary_info_utt
+
+
+def make_ctm(
+    boundary_info_batch,
+    alignments_batch,
+    manifest_lines_batch,
+    model,
+    model_downsample_factor,
+    output_dir,
+    remove_blank_tokens_from_ctm,
+    audio_filepath_parts_in_utt_id,
+    minimum_timestamp_duration,
+    audio_sr,
+):
+    """
+    Function to save CTM files for all the utterances in the incoming batch.
+    """
+
+    assert len(boundary_info_batch) == len(alignments_batch) == len(manifest_lines_batch)
+    # we also assume that utterances are in the same order in boundary_info_batch, alignments_batch
+    # and manifest_lines_batch - this should be the case unless there is a strange bug upstream in the
+    # code
+
+    os.makedirs(output_dir, exist_ok=True)
+
+    # the ratio to convert from timesteps (the units of 't_start' and 't_end' in boundary_info_utt)
+    # to the number of samples ('samples' in the sense of 16000 'samples' per second)
+    timestep_to_sample_ratio = model.preprocessor.featurizer.hop_length * model_downsample_factor
+
+    for boundary_info_utt, alignment_utt, manifest_line in zip(
+        boundary_info_batch, alignments_batch, manifest_lines_batch
+    ):
+
+        boundary_info_utt = add_t_start_end_to_boundary_info(boundary_info_utt, alignment_utt)
+
+        # get utt_id that will be used for saving the CTM file as {utt_id}.ctm
+        utt_id = _get_utt_id(manifest_line['audio_filepath'], audio_filepath_parts_in_utt_id)
+
+        # get audio file duration if we will need it later
+        if minimum_timestamp_duration > 0:
+            with sf.SoundFile(manifest_line["audio_filepath"]) as f:
+                audio_file_duration = f.frames / f.samplerate
+
+        with open(os.path.join(output_dir, f"{utt_id}.ctm"), "w") as f_ctm:
+            for boundary_info_ in boundary_info_utt:  # loop over every token/word/segment
+                text = boundary_info_["text"]
+                start_sample = boundary_info_["t_start"] * timestep_to_sample_ratio
+                end_sample = (boundary_info_["t_end"] + 1) * timestep_to_sample_ratio - 1
+
+                start_time = start_sample / audio_sr
+                end_time = end_sample / audio_sr
+
+                if minimum_timestamp_duration > 0 and minimum_timestamp_duration > end_time - start_time:
+                    # make the predicted duration of the token/word/segment longer, growing it outwards by equal
+                    # amounts from the predicted center of the token/word/segment
+                    token_mid_point = (start_time + end_time) / 2
+                    start_time = max(token_mid_point - minimum_timestamp_duration / 2, 0)
+                    end_time = min(token_mid_point + minimum_timestamp_duration / 2, audio_file_duration)
+
+                if not (text == BLANK_TOKEN and remove_blank_tokens_from_ctm):  # don't save blanks if we don't want to
+                    # replace any spaces with SPACE_TOKEN so we don't introduce extra space characters into our CTM files
+                    text = text.replace(" ", SPACE_TOKEN)
+
+                    f_ctm.write(f"{utt_id} 1 {start_time:.2f} {end_time - start_time:.2f} {text}\n")
+
+    return None
+
+
+def make_new_manifest(
+    output_dir,
+    original_manifest_filepath,
+    additional_ctm_grouping_separator,
+    audio_filepath_parts_in_utt_id,
+    pred_text_all_lines,
+):
+    """
+    Function to save a new manifest with the same info as the original manifest, but also
the paths to the + CTM files for each utterance and the "pred_text" if it was used for the alignment. + """ + if pred_text_all_lines: + with open(original_manifest_filepath, 'r') as f: + num_lines_in_manifest = sum(1 for _ in f) + + if not num_lines_in_manifest == len(pred_text_all_lines): + raise RuntimeError( + f"Number of lines in the original manifest ({num_lines_in_manifest}) does not match " + f"the number of pred_texts we have ({len(pred_text_all_lines)}). Something has gone wrong." + ) + + tgt_manifest_name = str(Path(original_manifest_filepath).stem) + "_with_ctm_paths.json" + tgt_manifest_filepath = str(Path(output_dir) / tgt_manifest_name) + + with open(original_manifest_filepath, 'r') as fin, open(tgt_manifest_filepath, 'w') as fout: + for i_line, line in enumerate(fin): + data = json.loads(line) + + utt_id = _get_utt_id(data["audio_filepath"], audio_filepath_parts_in_utt_id) + + data["token_level_ctm_filepath"] = str(Path(output_dir) / "tokens" / f"{utt_id}.ctm") + data["word_level_ctm_filepath"] = str(Path(output_dir) / "words" / f"{utt_id}.ctm") + + if additional_ctm_grouping_separator: + data["additional_segment_level_ctm_filepath"] = str( + Path(output_dir) / "additional_segments" / f"{utt_id}.ctm" + ) + + if pred_text_all_lines: + data['pred_text'] = pred_text_all_lines[i_line] + + new_line = json.dumps(data) + + fout.write(f"{new_line}\n") diff --git a/tools/nemo_forced_aligner/utils/viterbi_decoding.py b/tools/nemo_forced_aligner/utils/viterbi_decoding.py new file mode 100644 index 0000000000000..bc9a45dda527b --- /dev/null +++ b/tools/nemo_forced_aligner/utils/viterbi_decoding.py @@ -0,0 +1,136 @@ +# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import torch +from utils.constants import V_NEGATIVE_NUM + + +def viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device): + """ + Do Viterbi decoding with an efficient algorithm (the only for-loop in the 'forward pass' is over the time dimension). + Args: + log_probs_batch: tensor of shape (B, T_max, V). The parts of log_probs_batch which are 'padding' are filled + with 'V_NEGATIVE_NUM' - a large negative number which represents a very low probability. + y_batch: tensor of shape (B, U_max) - contains token IDs including blanks in every other position. The parts of + y_batch which are padding are filled with the number 'V'. V = the number of tokens in the vocabulary + 1 for + the blank token. + T_batch: tensor of shape (B, 1) - contains the durations of the log_probs_batch (so we can ignore the + parts of log_probs_batch which are padding) + U_batch: tensor of shape (B, 1) - contains the lengths of y_batch (so we can ignore the parts of y_batch + which are padding). + viterbi_device: the torch device on which Viterbi decoding will be done. + + Returns: + alignments_batch: list of lists containing locations for the tokens we align to at each timestep. + Looks like: [[0, 0, 1, 2, 2, 3, 3, ..., ], ..., [0, 1, 2, 2, 2, 3, 4, ....]]. 
+        Each list inside alignments_batch is of length T_batch[location of utt in batch].
+    """
+    B, T_max, _ = log_probs_batch.shape
+    U_max = y_batch.shape[1]
+
+    # transfer all tensors to viterbi_device
+    log_probs_batch = log_probs_batch.to(viterbi_device)
+    y_batch = y_batch.to(viterbi_device)
+    T_batch = T_batch.to(viterbi_device)
+    U_batch = U_batch.to(viterbi_device)
+
+    # make tensor that we will put at timesteps beyond the duration of the audio
+    padding_for_log_probs = V_NEGATIVE_NUM * torch.ones((B, T_max, 1), device=viterbi_device)
+    # make log_probs_padded tensor of shape (B, T_max, V+1) where all of
+    # log_probs_padded[:,:,-1] is the 'V_NEGATIVE_NUM'
+    log_probs_padded = torch.cat((log_probs_batch, padding_for_log_probs), dim=2)
+    # make log_probs_reordered tensor of shape (B, T_max, U_max)
+    # it contains the log_probs for only the tokens that are in the Ground Truth, and in the order
+    # that they occur
+    log_probs_reordered = torch.gather(input=log_probs_padded, dim=2, index=y_batch.unsqueeze(1).repeat(1, T_max, 1))
+
+    # initialize tensors of viterbi probabilities and backpointers
+    v_matrix = V_NEGATIVE_NUM * torch.ones_like(log_probs_reordered)
+    backpointers = -999 * torch.ones_like(v_matrix)
+    v_matrix[:, 0, :2] = log_probs_reordered[:, 0, :2]
+
+    # Make a letter_repetition_mask the same shape as y_batch.
+    # The letter_repetition_mask will have 'True' where the token (including blanks) is the same
+    # as the token two places before it in the ground truth (and 'False' everywhere else).
+    # We will use letter_repetition_mask to determine whether the Viterbi algorithm needs to look two tokens back or
+    # three tokens back
+    y_shifted_left = torch.roll(y_batch, shifts=2, dims=1)
+    letter_repetition_mask = y_batch - y_shifted_left
+    letter_repetition_mask[:, :2] = 1  # make sure we don't apply the mask to the first 2 tokens
+    letter_repetition_mask = letter_repetition_mask == 0
+
+    # bp_absolute_template is a tensor we will need during the Viterbi decoding to convert our argmaxes from indices between 0 and 2
+    # to indices in the range (0, U_max-1) indicating from which token the most likely path up to that point came from.
+    # It is a tensor of shape (B, U_max) that looks like
+    # bp_absolute_template = [
+    #     [0, 1, 2, ..., U_max - 1],
+    #     [0, 1, 2, ..., U_max - 1],
+    #     [0, 1, 2, ..., U_max - 1],
+    #     ... rows repeated so there are B rows in total
+    # ]
+    bp_absolute_template = torch.arange(U_max, device=viterbi_device).unsqueeze(0).repeat(B, 1)
+
+    for t in range(1, T_max):
+
+        # e_current is a tensor of shape (B, U_max) of the log probs of every possible token at the current timestep
+        e_current = log_probs_reordered[:, t, :]
+
+        # v_prev is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and in the same token position
+        v_prev = v_matrix[:, t - 1, :]
+
+        # v_prev_shifted is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and 1 token position back
+        v_prev_shifted = torch.roll(v_prev, shifts=1, dims=1)
+        # by doing a roll shift of size 1, we have brought the viterbi probability in the final token position to the
+        # first token position - let's overcome this by 'zeroing out' the probabilities in the first token position
+        v_prev_shifted[:, 0] = V_NEGATIVE_NUM
+
+        # v_prev_shifted2 is a tensor of shape (B, U_max) of the viterbi probabilities 1 timestep back and 2 token positions back
+        v_prev_shifted2 = torch.roll(v_prev, shifts=2, dims=1)
+        v_prev_shifted2[:, :2] = V_NEGATIVE_NUM  # zero out as we did for v_prev_shifted
+        # use our letter_repetition_mask to remove the connections between 2 blanks (so we don't skip over a letter)
+        # and to remove the connections between 2 identical consecutive letters (so we don't skip over a blank)
+        v_prev_shifted2.masked_fill_(letter_repetition_mask, V_NEGATIVE_NUM)
+
+        # we need this v_prev_dup tensor so we can calculate the viterbi probability of every possible
+        # token position simultaneously
+        v_prev_dup = torch.cat(
+            (v_prev.unsqueeze(2), v_prev_shifted.unsqueeze(2), v_prev_shifted2.unsqueeze(2),), dim=2,
+        )
+
+        # candidates_v_current are our candidate viterbi probabilities for every token position, from which
+        # we will pick the max and record the argmax
+        candidates_v_current = v_prev_dup + e_current.unsqueeze(2)
+        v_current, bp_relative = torch.max(candidates_v_current, dim=2)
+
+        # convert our argmaxes from indices between 0 and 2, to indices in the range (0, U_max-1) indicating
+        # from which token the most likely path up to that point came from
+        bp_absolute = bp_absolute_template - bp_relative
+
+        # update our tensors containing all the viterbi probabilities and backpointers
+        v_matrix[:, t, :] = v_current
+        backpointers[:, t, :] = bp_absolute
+
+    # trace backpointers TODO: parallelize over batch_size
+    alignments_batch = []
+    for b in range(B):
+        T_b = int(T_batch[b])
+        U_b = int(U_batch[b])
+
+        final_state = int(torch.argmax(v_matrix[b, T_b - 1, U_b - 2 : U_b])) + U_b - 2
+        alignment_b = [final_state]
+        for t in range(T_b - 1, 0, -1):
+            alignment_b.insert(0, int(backpointers[b, t, alignment_b[0]]))
+        alignments_batch.append(alignment_b)
+
+    return alignments_batch
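
[Editor's note] A minimal, self-contained sanity check of the Viterbi decoding above (illustrative only, not part of the patch). It builds a toy batch with a three-character vocabulary plus blank, hand-crafted frame scores, and the ground-truth text "ab", then checks that the recovered alignment walks through blank, 'a', blank, 'b'. The import path assumes the snippet is run from tools/nemo_forced_aligner/.

# Illustrative sanity check only - not part of this PR.
import torch

from utils.viterbi_decoding import viterbi_decoding

# toy setup: vocabulary ['a', 'b', 'c'] -> BLANK_ID = 3, so V = 4 classes in log_probs
BLANK_ID = 3
V = 4

# ground-truth text "ab" interleaved with blanks: [blank, 'a', blank, 'b', blank]
y_batch = torch.tensor([[BLANK_ID, 0, BLANK_ID, 1, BLANK_ID]])  # shape (B=1, U=5)
U_batch = torch.tensor([5])

# hand-made frame scores for 6 frames favouring: blank, 'a', 'a', blank, 'b', 'b'
# (unnormalized scores are fine here - Viterbi only compares relative values)
T_max, favoured = 6, [BLANK_ID, 0, 0, BLANK_ID, 1, 1]
log_probs_batch = torch.full((1, T_max, V), -10.0)
for t, tok in enumerate(favoured):
    log_probs_batch[0, t, tok] = 0.0
T_batch = torch.tensor([T_max])

alignments = viterbi_decoding(log_probs_batch, y_batch, T_batch, U_batch, viterbi_device="cpu")
print(alignments)  # expected: [[0, 1, 1, 2, 3, 3]], i.e. blank, 'a', 'a', blank, 'b', 'b'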
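
[Editor's note] For context, a rough sketch of how the utilities added in this diff compose end to end. The real entry point is the aligner's main script elsewhere in this PR; the `utils.data_prep` module path, the pretrained model name, and model_downsample_factor=8 are assumptions made purely for illustration.

# Rough composition sketch, not the PR's actual entry point.
import json

import torch
from nemo.collections.asr.models import ASRModel

from utils.data_prep import get_batch_tensors_and_boundary_info  # assumed module path
from utils.make_output_files import make_ctm, make_new_manifest
from utils.viterbi_decoding import viterbi_decoding

model = ASRModel.from_pretrained("stt_en_citrinet_1024")  # any NeMo CTC model with .transcribe()
model.eval()

manifest_filepath = "manifest.json"  # placeholder path
with open(manifest_filepath, "r") as f:
    manifest_lines_batch = [json.loads(line) for line in f]

(log_probs, y, T, U, token_info, word_info, segment_info, pred_text) = get_batch_tensors_and_boundary_info(
    manifest_lines_batch, model, separator=None, align_using_pred_text=False
)

alignments = viterbi_decoding(log_probs, y, T, U, viterbi_device=torch.device("cpu"))

# write word-level CTM files; token- and segment-level CTMs would be produced the same way
make_ctm(
    word_info,
    alignments,
    manifest_lines_batch,
    model,
    model_downsample_factor=8,  # assumed: depends on the model architecture
    output_dir="output/words",
    remove_blank_tokens_from_ctm=True,
    audio_filepath_parts_in_utt_id=1,
    minimum_timestamp_duration=0,
    audio_sr=16000,
)

make_new_manifest("output", manifest_filepath, None, 1, pred_text_all_lines=None)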