notebooks' link, typo and import fix #4158

fayejf · 2022-05-12T07:01:34Z

What does this PR do ?

Bug fix for notebooks: fixes links, typos, and imports.

Collection: ASR

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: fayejf <[email protected]>

fayejf · 2022-05-12T07:02:59Z

Removed the constantly changing, non-essential links in some notebooks. Please let me know if you have any objections.
@titu1994 @nithinraok

* redo missing pr 4007 Signed-off-by: fayejf <[email protected]>
* remove extremely unreliable links Signed-off-by: fayejf <[email protected]>

* update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * Restored tests previously disabled for 22.03 base (#4109) Signed-off-by: Boris Fomitchev <[email protected]> * add augmentation to label models (#4113) * add augmentation to label models Signed-off-by: nithinraok <[email protected]> * duration fix Signed-off-by: nithinraok <[email protected]> * Call register_bert_model after assigning self.bert_model variable (#4116) Signed-off-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> * Tutorial on ITN with Thutmose tagger and small fixes (#4117) * 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output Signed-off-by: Alexandra Antonova <[email protected]> * fixes for code review Signed-off-by: Alexandra Antonova <[email protected]> * Add tutorial to tutorials.rst Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> * update the default (#4135) Signed-off-by: ekmb <[email protected]> * Fix/punctuation avoid overwritting tmp files (#4144) * Add draft of fixing tmp files overwritting Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: 
PeganovAnton <[email protected]> * Use built-in tempfile library Signed-off-by: PeganovAnton <[email protected]> * Fix code style Signed-off-by: PeganovAnton <[email protected]> * bug_fix_diarization_manifest_creation (#4125) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * 
Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * WaveGlow input type fixes (#4151) Signed-off-by: Jocelyn Huang <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * Thutmose tagger bug fixes (#4162) * add pretrained ngc model, small fixes Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * 1. fix typos. 2. 
write magic functions without space Signed-off-by: Alexandra Antonova <[email protected]> * add example of inference with pretrained model Signed-off-by: Alexandra Antonova <[email protected]> * changed model location to nemo Signed-off-by: Alexandra Antonova <[email protected]> * style fix Signed-off-by: Alexandra Antonova <[email protected]> * fix space Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * set plugin to None when no apex (#4171) Signed-off-by: ekmb <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * rename folder VAD->vad (#4163) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update package info and dockerfile Signed-off-by: ericharper <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]>


* update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: 
treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' 
link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for 
convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download messagae (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email 
protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: 
Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]>


* Update container to 22.05 (#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove dbug statements Signed-off-by: ericharper <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * Merge r1.9.0 main (#4331) * update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * 
Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * 
Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * 
Add docs for Thutmose Tagger (#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email 
protected]> * Typo fix, branch change, better download messagae (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email 
protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * fix full_randn bucket hang Signed-off-by: stevehuang52 <[email protected]> * remove unused variables Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> 
Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]>

* stateless RNNT working Signed-off-by: Hainan Xu <[email protected]> * batch decode working Signed-off-by: Hainan Xu <[email protected]> * working backup Signed-off-by: Hainan Xu <[email protected]> * good working version Signed-off-by: Hainan Xu <[email protected]> * temporarily make norm layer have affine Signed-off-by: Hainan Xu <[email protected]> * temp Signed-off-by: Hainan Xu <[email protected]> * temp Signed-off-by: Hainan Xu <[email protected]> * [TTS] add staticmethod decoration for BetaBinomialInterpolator (#4319) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] remove redundant lines and declare global variables and capture (#4320) exception of non-supported windows. Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Finetune T5 on the prefix-lm objective (#4328) * Add script and yaml config Signed-off-by: MaximumEntropy <[email protected]> * Fix yaml config Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Update yaml to remove hardcoded model path Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fuse bias with geglu in ParallelMLP (#4213) * add code of fused_bias_geglu * call fused_bias_geglu in ParallelMLP * fix some bugs * change biad_gelu_activation to bias_activation_fusion * fix the setting of bias_actication_fusion for T5 * delete bias_gelu_fusion from T5 example config * push reformatted files * hto4h gemms fusion * remove hto4h gemms fusion * push reformatted files * disable bias_activation_fusion while activation is not geglu * add bias_activation_fusion in yaml config file * add bias_gelu_fusion in T5 config yaml file to pass CI test * change bias_gelu_fusion to bias_activation_fusion for T5 CI test * recover latest change Co-authored-by: Sandeep Subramanian <[email protected]> 
Signed-off-by: Hainan Xu <[email protected]> * Support larger datasets for question answering (#4205) * refactor dialogue state tracking for modelling/dataset interoperability Signed-off-by: Zhilin Wang <[email protected]> * fix style changes Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * fix style raised by lgtm Signed-off-by: Zhilin Wang <[email protected]> * fix style formatting Signed-off-by: Zhilin Wang <[email protected]> * update template to include description of intent Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * changes based on requests in review Signed-off-by: Zhilin Wang <[email protected]> * add compatibility with assistant dataset Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * remove dialogue_state_tracking Signed-off-by: Zhilin Wang <[email protected]> * update huggingface utils for dialogue Signed-off-by: Zhilin Wang <[email protected]> * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * fix style Signed-off-by: Zhilin Wang <[email protected]> * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * add docstrings for assistant data processsor Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins for 
SGDGEN local checkpoint Signed-off-by: Zhilin Wang <[email protected]> * update style Signed-off-by: Zhilin Wang <[email protected]> * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * patch for Jenkins CI using local file Signed-off-by: Zhilin Wang <[email protected]> * add slot filling prediction and metrics Signed-off-by: Zhilin Wang <[email protected]> * remove unused code Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * refactor metrics code out of Dialogue GPT Model Signed-off-by: Zhilin Wang <[email protected]> * integrate backward compatible support for IntentSlotClassificationModel (bert model) Signed-off-by: Zhilin Wang <[email protected]> * save prediction file for IntentSlotClassification Signed-off-by: Zhilin Wang <[email protected]> * update dialogue gpt model training for megatron gpt Signed-off-by: Zhilin Wang <[email protected]> * remove batch generate for HF GPT2, which causes lower performance Signed-off-by: Zhilin Wang <[email protected]> * add few shot capability to dialogue gpt model Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile and remove unused import Signed-off-by: Zhilin Wang <[email protected]> * update code description and clarity Signed-off-by: Zhilin Wang <[email protected]> * address PR comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * integrate compatibility with ZeroShotIntentModel Signed-off-by: Zhilin Wang <[email protected]> * rename folder to dialogue due to increased scope and further refactor for clarity Signed-off-by: Zhilin Wang <[email protected]> * added dialogue GPT for sequence generation task (e.g. answer extender) Signed-off-by: Zhilin Wang <[email protected]> * add CI test for DialogueGPTGenerationModel Signed-off-by: Zhilin Wang <[email protected]> * integrate DialogueS2SGenerationModel for generation task (e.g. 
answer extender) Signed-off-by: Zhilin Wang <[email protected]> * modify huggingface utils to support HF t5/BART models Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * remove unused imports Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update bleu metric Signed-off-by: Zhilin Wang <[email protected]> * fix bleu metric style Signed-off-by: Zhilin Wang <[email protected]> * debug bleu metric Signed-off-by: Zhilin Wang <[email protected]> * debug bleu metric Signed-off-by: Zhilin Wang <[email protected]> * update based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * update 2 based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * update 3 based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * integrate sgd generation based on user user utterance and system slot-values to generate system utterance Signed-off-by: Zhilin Wang <[email protected]> * add validation model saving capabilities Signed-off-by: Zhilin Wang <[email protected]> * cleaned up code for SGD Based Answer extender Signed-off-by: Zhilin Wang <[email protected]> * update Dialogue Generation CI Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * fix Jenkins CI issue" Signed-off-by: Zhilin Wang <[email protected]> * add support for design dataset Signed-off-by: Zhilin Wang <[email protected]> * remove unnecessary imports Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin 
Wang <[email protected]> * support megatron for dialogue_s2s_generation_model Signed-off-by: Zhilin Wang <[email protected]> * reduce loaded samples in MSMarcoDataProcessor to 64 when cfg.model.dataset.debug_mode=True Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update CI Signed-off-by: Zhilin Wang <[email protected]> * update checkpoint and predictions filename to include epoch number Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * integrate HF BART MNLI into zero shot intent model Signed-off-by: Zhilin Wang <[email protected]> * integrate Dialogue Nearest Neighbour Model Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * refactor Dialogue SGD Data Processor to make interface for models cleaner Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update Dialogue S2S Generation model for DialogueSGDDataProcessor interface Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * support sgd and drive thru datasets by zero shot model and nearest neighbour model Signed-off-by: Zhilin Wang <[email protected]> * add prediction saving code to nearest neighbour and zero shot intent models Signed-off-by: Zhilin Wang <[email protected]> * fix typo in sgd data processor Signed-off-by: Zhilin Wang <[email protected]> * integrate Dialogue Mellon QA Data Processor Signed-off-by: Zhilin Wang <[email protected]> * update mellon qa Signed-off-by: Zhilin Wang <[email protected]> * update dialogue.py to remove outdated info Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin 
Wang <[email protected]> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <[email protected]> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <[email protected]> * add dialogue docs Signed-off-by: Zhilin Wang <[email protected]> * address review comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix for cfg Signed-off-by: Zhilin Wang <[email protected]> * make dependency on apex optional Signed-off-by: Zhilin Wang <[email protected]> * change NLPDDPluggin calling logic to make it possible to run without apex Signed-off-by: Zhilin Wang <[email protected]> * add first draft of tutorial Signed-off-by: Zhilin Wang <[email protected]> * reduce ms marco size by removing lines without wellFormedAnswers Signed-off-by: Zhilin Wang <[email protected]> * address pr comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update colab tutorial link in dialogue docs Signed-off-by: Zhilin Wang <[email protected]> * include unit test and some refactor to facilitate unit test Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * address pr issues Signed-off-by: Zhilin Wang <[email protected]> * remove typos in dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * support larger files for question answering Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * remove unnecessary artifacts to reduce memory use Signed-off-by: Zhilin Wang <[email protected]> * put 0 tensor 
to device Signed-off-by: Zhilin Wang <[email protected]> * update link within dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * restore previously delete files Signed-off-by: Zhilin Wang <[email protected]> * update error handling when loss = nan Signed-off-by: Zhilin Wang <[email protected]> * update nan handling Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update spanning loss func Signed-off-by: Zhilin Wang <[email protected]> * update spanning loss Signed-off-by: Zhilin Wang <[email protected]> * fix type error raised in qa_dataset.py Signed-off-by: Zhilin Wang <[email protected]> * add error checking message Signed-off-by: Zhilin Wang <[email protected]> * revert back to float32 Signed-off-by: Zhilin Wang <[email protected]> * revert back to float32 Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update exp logging Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <[email protected]> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <[email protected]> * limit number of negative samples Signed-off-by: Zhilin Wang <[email protected]> * revert post processing Signed-off-by: Zhilin Wang <[email protected]> * revert post processing Signed-off-by: Zhilin Wang <[email protected]> * remove 
unused methods and style fix Signed-off-by: Zhilin Wang <[email protected]> * add more documentation Signed-off-by: Zhilin Wang <[email protected]> * remove unused imports Signed-off-by: Zhilin Wang <[email protected]> * changes base on PR review Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fix bugs in indexed dataset exam script (#4325) * fix the typo Signed-off-by: Yi Dong <[email protected]> * add neighbors option Signed-off-by: Yi Dong <[email protected]> * change the argument name Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Adding docs for ASR SSL (#4303) * Initial commit for SSL docs Signed-off-by: Krishna Puvvada <[email protected]> * ssl docs update-1 Signed-off-by: Krishna Puvvada <[email protected]> * ssl docs update-2 Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fuse grad division into async grad allreduce (#4327) * O2 runs but O1 does not Signed-off-by: ericharper <[email protected]> * disable async for O1 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update async flag in configure_optimizers Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * update _require if using async Signed-off-by: ericharper <[email protected]> * clean comments Signed-off-by: ericharper <[email protected]> * always all_reduce Signed-off-by: ericharper <[email protected]> * add async grad allreduce and chunk optimization to T5 * push 
reformatted files after style check * set chunk size as 0 while async grad allreduce is off * more experiments show that 125MB is a better default chunk size for most cases * add grad_allreduce_chunk_size_mb for GPT-3 * at the end of each training step, wait until all async grad allreduce works are done * replace individual allreduce work.wait() with a single GPU device synchronization * add code of fused_bias_geglu * call fused_bias_geglu in ParallelMLP * record the status of each allreduce work seems too much for perf * add more comments * push a reformatted file * fix some bugs * change biad_gelu_activation to bias_activation_fusion * fix the setting of bias_activation_fusion for T5 * delete bias_gelu_fusion from T5 example config * push reformatted files * fuse grad scale with allreduce * push reformatted files * hto4h gemms fusion * remove hto4h gemms fusion * add grad_scale_ar_fusion into GPT-3 * push reformatted files * push reformatted files * rename grad_scale_ar_fusion to grad_div_ar_fusion * disable bias_activation_fusion while activation is not geglu * add bias_activation_fusion in yaml config file * add bias_gelu_fusion in T5 config yaml file to pass CI test * change bias_gelu_fusion to bias_activation_fusion for T5 CI test * recover latest change * add grad_div_ar_fusion in config yaml file * remove a redundant float() Co-authored-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Update container to 22.05 (#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove dbug statements Signed-off-by: ericharper <[email protected]> 
Signed-off-by: Hainan Xu <[email protected]> * Torchaudio installation fix (#4330) * separate installer added Signed-off-by: Aleksandr Laptev <[email protected]> * apply suggestions, minor fixes Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] enforced pin_memory = True (#4341) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Merge r1.9.0 main (#4331) * update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker 
<[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * 
Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (#4173) * Add docs for 
Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download 
messagae (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper 
<[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels (#4266) * initial commit Signed-off-by: Akshit Arora <[email protected]> * cleared notebook outputs Signed-off-by: Akshit Arora <[email protected]> * formatting errors Signed-off-by: Akshit Arora <[email protected]> * formatting Signed-off-by: Akshit Arora <[email protected]> * addressed comments Signed-off-by: Akshit Arora <[email protected]> * addressed comments on tutorial Signed-off-by: Akshit Arora <[email protected]> * updated tutorial Signed-off-by: Akshit Arora <[email protected]> * updated grammar and fastpitch description Signed-off-by: Akshit Arora <[email protected]> * updated with feedback Signed-off-by: Akshit Arora <[email protected]> * updated with feedback Signed-off-by: Akshit Arora <[email protected]> * removed unused imports Signed-off-by: Akshit Arora <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Speedup the speech commands 
dataset processing script (#4347) * Add multiprocessing support to the google speech commands dataset processing script Signed-off-by: Shantanu Acharya <[email protected]> * fix number of args error with __extract_all_files function Signed-off-by: Shantanu Acharya <[email protected]> * fix styling issues Signed-off-by: Shantanu Acharya <[email protected]> * fix bugs with silence set construction and update librosa output write to use soundfile write Signed-off-by: Shantanu Acharya <[email protected]> * add docstrings and return values in __construct_filepaths as dictionary Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix wrong requirement (#4349) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Refactored path to manifest (#4251) Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * IPA support for TTS (#4310) * IPA tokenizer and G2P untested draft Signed-off-by: Jocelyn Huang <[email protected]> * Add IPA CMUdict and new heteronyms list Signed-off-by: Jocelyn Huang <[email protected]> * Add draft FastPitch IPA config Signed-off-by: Jocelyn Huang <[email protected]> * Minor bugfixes for IPA training Signed-off-by: Jocelyn Huang <[email protected]> * Add phoneme_probability to IPA G2P Signed-off-by: Jocelyn Huang <[email protected]> * Updates to IPA FastPitch training config Signed-off-by: Jocelyn Huang <[email protected]> * Update IPA dict and heteronyms file Signed-off-by: Jocelyn Huang <[email protected]> * Adjust default lr for IPA FastPitch to 1e-3 Signed-off-by: Jocelyn Huang <[email protected]> * Rename IPA CMUdict to reflect date Signed-off-by: Jocelyn Huang <[email protected]> * Add docstrings for IPA tokenizer and G2P, update CMUdict path for config Signed-off-by: Jocelyn Huang <[email protected]> * Fix IPA vocab ordering, add options to uppercase graphemes and 
remove stress symbols Signed-off-by: Jocelyn Huang <[email protected]> * Mark IPA classes as experimental Signed-off-by: Jocelyn Huang <[email protected]> * Update apostrophe-S cases Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Tn install (#4055) * remove conda pynini requirement Signed-off-by: Yang Zhang <[email protected]> * remove remnants Signed-off-by: Yang Zhang <[email protected]> * merge with main Signed-off-by: Yang Zhang <[email protected]> * removing nlp collection dependency from text processing and thus breaking cyclyc imports Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix wrong requirement Signed-off-by: Yang Zhang <[email protected]> * fix bug in vi Signed-off-by: Yang Zhang <[email protected]> * update jenkins folders Signed-off-by: ekmb <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ekmb <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix tutorial (#4352) Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix the post ln (#4350) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [Fix] Hanging for Fully Randomized Bucketing (#4348) * Update container to 22.05 (#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove dbug statements Signed-off-by: ericharper <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * Merge r1.9.0 main (#4331) * update branch Signed-off-by: ericharper <[email protected]> * update 
package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: 
treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: 
fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email 
protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download messagae (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for 
datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email 
protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * fix full_randn bucket hang Signed-off-by: stevehuang52 <[email protected]> * remove unused variables Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Bits of RADTTS support (#4343) * Bits of RADTTS support Signed-off-by: Boris Fomitchev <[email protected]> * Fixed args mismatch Signed-off-by: Boris Fomitchev <[email protected]> * Style Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> * More review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email 
protected]> * Prompt Learning Pipeline Parallel (#4291) * Added get_forward_output_and_loss_func and updated train/val steps Signed-off-by: Virginia Adams <[email protected]> * Added preprocess flag before prompt table/encoder access Signed-off-by: Virginia Adams <[email protected]> * Made two optimizer groups, one for frozen, one for soft prompt Signed-off-by: Virginia Adams <[email protected]> * Pipeline parallel working Signed-off-by: Virginia Adams <[email protected]> * Still figuring out setting lr/sched for one param group Signed-off-by: Virginia Adams <[email protected]> * Set betas to zero Signed-off-by: Virginia Adams <[email protected]> * Only unfreeze one sublayer with lr 0.0 Signed-off-by: Virginia Adams <[email protected]> * Pipeline parallel working w/ one optimizer Signed-off-by: Virginia Adams <[email protected]> * Trying to fix Jenkins file Signed-off-by: Virginia Adams <[email protected]> * Trying to fix Jenkins file Signed-off-by: Virginia Adams <[email protected]> * Getting updated jenkins test to work Signed-off-by: Virginia Adams <[email protected]> * Getting updated jenkins test to work Signed-off-by: Virginia Adams <[email protected]> * added prompt learning tp and pp CI tests Signed-off-by: Virginia Adams <[email protected]> * Added amp_o2 model compatibility Signed-off-by: Virginia Adams <[email protected]> * Made CI test smaller Signed-off-by: Virginia Adams <[email protected]> * Still trying to get Jenkins to work Signed-off-by: Virginia Adams <[email protected]> * Still trying to get Jenkins to work Signed-off-by: Virginia Adams <[email protected]> * Temporarily moving prompt learning CI test to beginning Signed-off-by: Virginia Adams <[email protected]> * Changing the layer being unfrozen Signed-off-by: Virginia Adams <[email protected]> * debug jenkins Signed-off-by: Virginia Adams <[email protected]> * Move pp unfreeze to init Signed-off-by: Virginia Adams <[email protected]> * Try to make Jenkins test parallel Signed-off-by: 
Virginia Adams <[email protected]> * Fix python formatting Signed-off-by: Virginia Adams <[email protected]> * Moved prompt learning tests back to where they belong Signed-off-by: Virginia Adams <[email protected]> * add back checkpoint convertion CI test Signed-off-by: Virginia Adams <[email protected]> * Revert "add back checkpoint convertion CI test" This reverts commit 61e2ffcdefe964c8e74b74d8c10906ae29f32b6d. * Add back checkpoint conversion test Signed-off-by: Virginia Adams <[email protected]> * Setting requires grad to True everywhere Signed-off-by: Virginia Adams <[email protected]> * Updated config comments and simplified param group code Signed-off-by: Virginia Adams <[email protected]> * Added comment on frozen_model having lr=0.0 Signed-off-by: Virginia Adams <[email protected]> * Added configure optimizers methods Signed-off-by: Virginia Adams <[email protected]> * Set amp_o2 to false Signed-off-by: Virginia Adams <[email protected]> * removed o2 code Signed-off-by: Virginia Adams <[email protected]> * Python formatting fix Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * auto switch conformer encoder adapter in_features (#4354) Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Dataloader, collector, loss and metric for multiscale diarization decoder (#4187) * First commit Signed-off-by: Taejin Park <[email protected]> * Checked funtionality and imports Signed-off-by: Taejin Park <[email protected]> * fixed import issues Signed-off-by: Taejin Park <[email protected]> * Removed the changed made by mistake Signed-off-by: Taejin Park <[email protected]> * Style fix Signed-off-by: Taejin Park <[email protected]> * Fixed LGTM errors 001 Signed-off-by: Taejin Park <[email protected]> * Fixed LGTM and style fix Signed-off-by: Taejin Park <[email protected]> * Changed docstrings Signed-off-by: Taejin Park <[email 
protected]> * LGTM again Signed-off-by: Taejin Park <[email protected]> * Removed unnecessary torch setting lines Signed-off-by: Taejin Park <[email protected]> * Style fix and isort Signed-off-by: Taejin Park <[email protected]> * jbalam-nv comments reflected Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * Reflected comments and created _diar_label.py Signed-off-by: Taejin Park <[email protected]> * Typo fix and style fix Signed-off-by: Taejin Park <[email protected]> * Fixed target_spks[0] index error Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * LGTM unused import IterDataset Signed-off-by: Taejin Park <[email protected]> * revert collection doc year Signed-off-by: Taejin Park <[email protected]> * Code format error in collections.py Signed-off-by: Taejin Park <[email protected]> * fix collections space format error Signed-off-by: Taejin Park <[email protected]> * merged main correctly Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * Reflected all comments and tested Signed-off-by: Taejin Park <[email protected]> * style fix and LGTM Signed-off-by: Taejin Park <[email protected]> * rttm_filepath to rttm_file and removed self included funcs, tested Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Add ASR CTC Decoding module (#4342) * Initial commit Signed-off-by: smajumdar <[email protected]> * Full support for decoding strategy Signed-off-by: smajumdar <[email protected]> * Temp Signed-off-by: smajumdar <[email protected]> * Fix labels of y_sequence Signed-off-by: smajumdar <[email protected]> * Set support for sentencepiece subword merging Signed-off-by: smajumdar <[email protected]> * Fix char and word based token merge alignment Signed-off-by: smajumdar <[email protected]> * Revert incorrect 
change Signed-off-by: smajumdar <[email protected]> * Update docstring Signed-off-by: smajumdar <[email protected]> * Improve compatibility with greedy tokens and log probs Signed-off-by: smajumdar <[email protected]> * Update scripts to use decoding strategy Signed-off-by: smajumdar <[email protected]> * Add tests and docs Signed-off-by: smajumdar <[email protected]> * Add tests and docs Signed-off-by: smajumdar <[email protected]> * Fix speaker decoder timestamps Signed-off-by: smajumdar <[email protected]> * Fix speaker decoder timestamps Signed-off-by: smajumdar <[email protected]> * Fix decoding of ctc models Signed-off-by: smajumdar <[email protected]> * Address reviewer comments Signed-off-by: smajumdar <[email protected]> * Address reviewer comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Option to disable mp in VAD via num_workers=1 (#4317) * Option to disable mp in VAD via num_workers=1 In certain environments python multiprocessing can deadlock. This adds a convenient version to disable by setting num_workers to 1. 
Signed-off-by: Georg Kucsko <[email protected]> * add none handling Signed-off-by: Georg Kucsko <[email protected]> * additional none handling Signed-off-by: Georg Kucsko <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * remove redundant bias expand (#4382) * remove redundant bias expand Signed-off-by: Xiaowei Ren <[email protected]> * delete redundant code Signed-off-by: Xiaowei Ren <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Add option for specifying wandb save_dir from config (#4379) * give option to user to specify wandb save dir via config Signed-off-by: Shantanu Acharya <[email protected]> * create save_dir directory for wandb logger if not exists Signed-off-by: Shantanu Acharya <[email protected]> * update save_dir get method with a default value Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Quick wav2vec fix. In-place operation adding convolutional positions to encoder was overwriting leaf history. Wasn't caught on previous torch versions. (#4383) Signed-off-by: tbartley94 <[email protected]> Co-authored-by: tbartley94 <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. 
(#4388) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Merge r1.10.0 main (#4398) * update branch Signed-off-by: ericharper <[email protected]> * Set headscale false (#4364) Signed-off-by: MaximumEntropy <[email protected]> * Add wandb as dependency (#4365) Signed-off-by: smajumdar <[email protected]> * Raise trainer error (#4356) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Set headscale false (#4364) (#4366) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: smajumdar <[email protected]> * Finetuning changes for BART (#4003) * Temp Signed-off-by: MaximumEntropy <[email protected]> * Checkpoint converter to nemo for bart Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Make position embedding expansion specific to a batch to avoid checkpoint size mismatches (#4357) * Style Signed-off-by: MaximumEntropy <[email protected]> * Fix logging warning Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Fix electronic bug, new time ITN rule (#4355) * fix electronic bug Signed-off-by: ekmb <[email protected]> * add new itn time rule Signed-off-by: ekmb <[email protected]> * revert domain changes Signed-off-by: ekmb <[email protected]> * remove repetition Signed-off-by: ekmb <[email protected]> * Correct support for dataclasses in default module dim (#4372) * Correct support for dataclasses in default module dim Signed-off-by: smajumdar <[email protected]> * Fix path for save of results Signed-off-by: smajumdar <[email protected]> * fix pad id bug (#4377) Signed-off-by: Yi Dong <[email protected]> * Question answering bug fix (#4381) * refactor dialogue state tracking for modelling/dataset interoperability Signed-off-by: Zhilin Wang <[email protected]> * fix style changes Signed-off-by: Zhilin Wang <[email 
protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * fix style raised by lgtm Signed-off-by: Zhilin Wang <[email protected]> * fix style formatting Signed-off-by: Zhilin Wang <[email protected]> * update template to include description of intent Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * changes based on requests in review Signed-off-by: Zhilin Wang <[email protected]> * add compatibility with assistant dataset Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * remove dialogue_state_tracking Signed-off-by: Zhilin Wang <[email protected]> * update huggingface utils for dialogue Signed-off-by: Zhilin Wang <[email protected]> * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * fix style Signed-off-by: Zhilin Wang <[email protected]> * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * add docstrings for assistant data processsor Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins for SGDGEN local checkpoint Signed-off-by: Zhilin Wang <[email protected]> * update style Signed-off-by: Zhilin Wang <[email protected]> * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * patch for Jenkins CI using local file Signed-off-by: 
Zhilin Wang <[email protected]> * add slot filling prediction and metrics Signed-off-by: Zhilin Wang <[email protected]> * remove unused code Signed-off-by: Zhilin Wang <[email protected]> * style fix …
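
Note on "Option to disable mp in VAD via num_workers=1 (#4317)" above: the commit message says that Python multiprocessing can deadlock in certain environments and that setting num_workers to 1 disables it. Below is a minimal sketch of that general pattern, not the actual NeMo VAD code. The helper name `run_tasks` is hypothetical, and treating `None` the same as 1 is an assumption suggested only by the "add none handling" commits.

```python
from multiprocessing import Pool
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_tasks(
    func: Callable[[T], R],
    items: Iterable[T],
    num_workers: Optional[int] = None,
) -> List[R]:
    """Apply func to every item; hypothetical helper, not part of NeMo.

    Assumption: num_workers of None or <= 1 means "do not use multiprocessing".
    """
    items = list(items)
    if num_workers is None or num_workers <= 1:
        # Serial path: no worker processes are created, so the
        # multiprocessing deadlock described in the commit cannot occur here.
        return [func(item) for item in items]
    # Parallel path: func must be picklable (defined at module top level).
    with Pool(processes=num_workers) as pool:
        return pool.map(func, items)
```

With `num_workers=1` the call reduces to a plain list comprehension in the calling process, which also makes failures easier to debug than in a worker pool.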

* update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (NVIDIA#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (NVIDIA#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version 
update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (NVIDIA#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: 
nithinraok <[email protected]> * notebooks' link, typo and import fix (NVIDIA#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (NVIDIA#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (NVIDIA#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (NVIDIA#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (NVIDIA#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (NVIDIA#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (NVIDIA#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (NVIDIA#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (NVIDIA#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (NVIDIA#4218) * fix 
bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (NVIDIA#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (NVIDIA#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (NVIDIA#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (NVIDIA#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (NVIDIA#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (NVIDIA#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download messagae (NVIDIA#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (NVIDIA#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain 
keywords for datasets (NVIDIA#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (NVIDIA#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (NVIDIA#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (NVIDIA#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (NVIDIA#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (NVIDIA#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (NVIDIA#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (NVIDIA#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep 
Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]>

* Update container to 22.05 (NVIDIA#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove dbug statements Signed-off-by: ericharper <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * Merge r1.9.0 main (NVIDIA#4331) * update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (NVIDIA#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (NVIDIA#4103) * fix yaml 
Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * 
Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (NVIDIA#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (NVIDIA#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (NVIDIA#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (NVIDIA#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (NVIDIA#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (NVIDIA#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (NVIDIA#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (NVIDIA#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (NVIDIA#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email 
protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (NVIDIA#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (NVIDIA#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (NVIDIA#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (NVIDIA#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (NVIDIA#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (NVIDIA#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (NVIDIA#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi 
<[email protected]> * fix broken link in the tutorial (NVIDIA#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download messagae (NVIDIA#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (NVIDIA#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (NVIDIA#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (NVIDIA#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (NVIDIA#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (NVIDIA#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (NVIDIA#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (NVIDIA#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (NVIDIA#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (NVIDIA#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning 
Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * fix full_randn bucket hang Signed-off-by: stevehuang52 <[email protected]> * remove unused variables Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> 
Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]>

* stateless RNNT working Signed-off-by: Hainan Xu <[email protected]> * batch decode working Signed-off-by: Hainan Xu <[email protected]> * working backup Signed-off-by: Hainan Xu <[email protected]> * good working version Signed-off-by: Hainan Xu <[email protected]> * temporarily make norm layer have affine Signed-off-by: Hainan Xu <[email protected]> * temp Signed-off-by: Hainan Xu <[email protected]> * temp Signed-off-by: Hainan Xu <[email protected]> * [TTS] add staticmethod decoration for BetaBinomialInterpolator (#4319) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] remove redundant lines and declare global variables and capture (#4320) exception of non-supported windows. Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Finetune T5 on the prefix-lm objective (#4328) * Add script and yaml config Signed-off-by: MaximumEntropy <[email protected]> * Fix yaml config Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Update yaml to remove hardcoded model path Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fuse bias with geglu in ParallelMLP (#4213) * add code of fused_bias_geglu * call fused_bias_geglu in ParallelMLP * fix some bugs * change bias_gelu_activation to bias_activation_fusion * fix the setting of bias_activation_fusion for T5 * delete bias_gelu_fusion from T5 example config * push reformatted files * hto4h gemms fusion * remove hto4h gemms fusion * push reformatted files * disable bias_activation_fusion while activation is not geglu * add bias_activation_fusion in yaml config file * add bias_gelu_fusion in T5 config yaml file to pass CI test * change bias_gelu_fusion to bias_activation_fusion for T5 CI test * recover latest change Co-authored-by: Sandeep Subramanian <[email protected]> 
Signed-off-by: Hainan Xu <[email protected]> * Support larger datasets for question answering (#4205) * refactor dialogue state tracking for modelling/dataset interoperability Signed-off-by: Zhilin Wang <[email protected]> * fix style changes Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * fix style raised by lgtm Signed-off-by: Zhilin Wang <[email protected]> * fix style formatting Signed-off-by: Zhilin Wang <[email protected]> * update template to include description of intent Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * changes based on requests in review Signed-off-by: Zhilin Wang <[email protected]> * add compatibility with assistant dataset Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * remove dialogue_state_tracking Signed-off-by: Zhilin Wang <[email protected]> * update huggingface utils for dialogue Signed-off-by: Zhilin Wang <[email protected]> * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * fix style Signed-off-by: Zhilin Wang <[email protected]> * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * add docstrings for assistant data processsor Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins for 
SGDGEN local checkpoint Signed-off-by: Zhilin Wang <[email protected]> * update style Signed-off-by: Zhilin Wang <[email protected]> * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * patch for Jenkins CI using local file Signed-off-by: Zhilin Wang <[email protected]> * add slot filling prediction and metrics Signed-off-by: Zhilin Wang <[email protected]> * remove unused code Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * refactor metrics code out of Dialogue GPT Model Signed-off-by: Zhilin Wang <[email protected]> * integrate backward compatible support for IntentSlotClassificationModel (bert model) Signed-off-by: Zhilin Wang <[email protected]> * save prediction file for IntentSlotClassification Signed-off-by: Zhilin Wang <[email protected]> * update dialogue gpt model training for megatron gpt Signed-off-by: Zhilin Wang <[email protected]> * remove batch generate for HF GPT2, which causes lower performance Signed-off-by: Zhilin Wang <[email protected]> * add few shot capability to dialogue gpt model Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile and remove unused import Signed-off-by: Zhilin Wang <[email protected]> * update code description and clarity Signed-off-by: Zhilin Wang <[email protected]> * address PR comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * integrate compatibility with ZeroShotIntentModel Signed-off-by: Zhilin Wang <[email protected]> * rename folder to dialogue due to increased scope and further refactor for clarity Signed-off-by: Zhilin Wang <[email protected]> * added dialogue GPT for sequence generation task (e.g. answer extender) Signed-off-by: Zhilin Wang <[email protected]> * add CI test for DialogueGPTGenerationModel Signed-off-by: Zhilin Wang <[email protected]> * integrate DialogueS2SGenerationModel for generation task (e.g. 
answer extender) Signed-off-by: Zhilin Wang <[email protected]> * modify huggingface utils to support HF t5/BART models Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * remove unused imports Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update bleu metric Signed-off-by: Zhilin Wang <[email protected]> * fix bleu metric style Signed-off-by: Zhilin Wang <[email protected]> * debug bleu metric Signed-off-by: Zhilin Wang <[email protected]> * debug bleu metric Signed-off-by: Zhilin Wang <[email protected]> * update based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * update 2 based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * update 3 based on PR #3893 Signed-off-by: Zhilin Wang <[email protected]> * integrate sgd generation based on user user utterance and system slot-values to generate system utterance Signed-off-by: Zhilin Wang <[email protected]> * add validation model saving capabilities Signed-off-by: Zhilin Wang <[email protected]> * cleaned up code for SGD Based Answer extender Signed-off-by: Zhilin Wang <[email protected]> * update Dialogue Generation CI Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * fix Jenkins CI issue" Signed-off-by: Zhilin Wang <[email protected]> * add support for design dataset Signed-off-by: Zhilin Wang <[email protected]> * remove unnecessary imports Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin 
Wang <[email protected]> * support megatron for dialogue_s2s_generation_model Signed-off-by: Zhilin Wang <[email protected]> * reduce loaded samples in MSMarcoDataProcessor to 64 when cfg.model.dataset.debug_mode=True Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update CI Signed-off-by: Zhilin Wang <[email protected]> * update checkpoint and predictions filename to include epoch number Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * integrate HF BART MNLI into zero shot intent model Signed-off-by: Zhilin Wang <[email protected]> * integrate Dialogue Nearest Neighbour Model Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * refactor Dialogue SGD Data Processor to make interface for models cleaner Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update Dialogue S2S Generation model for DialogueSGDDataProcessor interface Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * update jenkins Signed-off-by: Zhilin Wang <[email protected]> * support sgd and drive thru datasets by zero shot model and nearest neighbour model Signed-off-by: Zhilin Wang <[email protected]> * add prediction saving code to nearest neighbour and zero shot intent models Signed-off-by: Zhilin Wang <[email protected]> * fix typo in sgd data processor Signed-off-by: Zhilin Wang <[email protected]> * integrate Dialogue Mellon QA Data Processor Signed-off-by: Zhilin Wang <[email protected]> * update mellon qa Signed-off-by: Zhilin Wang <[email protected]> * update dialogue.py to remove outdated info Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin 
Wang <[email protected]> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <[email protected]> * update dialogue_config.yaml Signed-off-by: Zhilin Wang <[email protected]> * add dialogue docs Signed-off-by: Zhilin Wang <[email protected]> * address review comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix for cfg Signed-off-by: Zhilin Wang <[email protected]> * make dependency on apex optional Signed-off-by: Zhilin Wang <[email protected]> * change NLPDDPluggin calling logic to make it possible to run without apex Signed-off-by: Zhilin Wang <[email protected]> * add first draft of tutorial Signed-off-by: Zhilin Wang <[email protected]> * reduce ms marco size by removing lines without wellFormedAnswers Signed-off-by: Zhilin Wang <[email protected]> * address pr comments Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update colab tutorial link in dialogue docs Signed-off-by: Zhilin Wang <[email protected]> * include unit test and some refactor to facilitate unit test Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * address pr issues Signed-off-by: Zhilin Wang <[email protected]> * remove typos in dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * support larger files for question answering Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * remove unnecessary artifacts to reduce memory use Signed-off-by: Zhilin Wang <[email protected]> * put 0 tensor 
to device Signed-off-by: Zhilin Wang <[email protected]> * update link within dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * restore previously delete files Signed-off-by: Zhilin Wang <[email protected]> * update error handling when loss = nan Signed-off-by: Zhilin Wang <[email protected]> * update nan handling Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update spanning loss func Signed-off-by: Zhilin Wang <[email protected]> * update spanning loss Signed-off-by: Zhilin Wang <[email protected]> * fix type error raised in qa_dataset.py Signed-off-by: Zhilin Wang <[email protected]> * add error checking message Signed-off-by: Zhilin Wang <[email protected]> * revert back to float32 Signed-off-by: Zhilin Wang <[email protected]> * revert back to float32 Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update exp logging Signed-off-by: Zhilin Wang <[email protected]> * update error msgs Signed-off-by: Zhilin Wang <[email protected]> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <[email protected]> * update loading of large file from pickle to json Signed-off-by: Zhilin Wang <[email protected]> * limit number of negative samples Signed-off-by: Zhilin Wang <[email protected]> * revert post processing Signed-off-by: Zhilin Wang <[email protected]> * revert post processing Signed-off-by: Zhilin Wang <[email protected]> * remove 
unused methods and style fix Signed-off-by: Zhilin Wang <[email protected]> * add more documentation Signed-off-by: Zhilin Wang <[email protected]> * remove unused imports Signed-off-by: Zhilin Wang <[email protected]> * changes base on PR review Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fix bugs in indexed dataset exam script (#4325) * fix the typo Signed-off-by: Yi Dong <[email protected]> * add neighbors option Signed-off-by: Yi Dong <[email protected]> * change the argument name Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Adding docs for ASR SSL (#4303) * Initial commit for SSL docs Signed-off-by: Krishna Puvvada <[email protected]> * ssl docs update-1 Signed-off-by: Krishna Puvvada <[email protected]> * ssl docs update-2 Signed-off-by: Krishna Puvvada <[email protected]> Co-authored-by: Krishna Puvvada <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Fuse grad division into async grad allreduce (#4327) * O2 runs but O1 does not Signed-off-by: ericharper <[email protected]> * disable async for O1 Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update async flag in configure_optimizers Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * update _require if using async Signed-off-by: ericharper <[email protected]> * clean comments Signed-off-by: ericharper <[email protected]> * always all_reduce Signed-off-by: ericharper <[email protected]> * add async grad allreduce and chunk optimization to T5 * push 
reformatted files after style check * set chunk size as 0 while async grad allreduce is off * more experiments show that 125MB is a better default chunk size for most cases * add grad_allreduce_chunk_size_mb for GPT-3 * at the end of each training step, wait until all async grad allreduce works are done * replace individual allreduce work.wait() with a single GPU device synchronization * add code of fused_bias_geglu * call fused_bias_geglu in ParallelMLP * record the status of each allreduce work seems too much for perf * add more comments * push a reformatted file * fix some bugs * change bias_gelu_activation to bias_activation_fusion * fix the setting of bias_activation_fusion for T5 * delete bias_gelu_fusion from T5 example config * push reformatted files * fuse grad scale with allreduce * push reformatted files * hto4h gemms fusion * remove hto4h gemms fusion * add grad_scale_ar_fusion into GPT-3 * push reformatted files * push reformatted files * rename grad_scale_ar_fusion to grad_div_ar_fusion * disable bias_activation_fusion while activation is not geglu * add bias_activation_fusion in yaml config file * add bias_gelu_fusion in T5 config yaml file to pass CI test * change bias_gelu_fusion to bias_activation_fusion for T5 CI test * recover latest change * add grad_div_ar_fusion in config yaml file * remove a redundant float() Co-authored-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Update container to 22.05 (#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove debug statements Signed-off-by: ericharper <[email protected]> 
Signed-off-by: Hainan Xu <[email protected]> * Torchaudio installation fix (#4330) * separate installer added Signed-off-by: Aleksandr Laptev <[email protected]> * apply suggestions, minor fixes Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] enforced pin_memory = True (#4341) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Merge r1.9.0 main (#4331) * update branch Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker 
<[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * 
Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace globa_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (#4173) * Add docs for 
Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download 
messagae (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocesing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper 
<[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels (#4266) * initial commit Signed-off-by: Akshit Arora <[email protected]> * cleared notebook outputs Signed-off-by: Akshit Arora <[email protected]> * formatting errors Signed-off-by: Akshit Arora <[email protected]> * formatting Signed-off-by: Akshit Arora <[email protected]> * addressed comments Signed-off-by: Akshit Arora <[email protected]> * addressed comments on tutorial Signed-off-by: Akshit Arora <[email protected]> * updated tutorial Signed-off-by: Akshit Arora <[email protected]> * updated grammar and fastpitch description Signed-off-by: Akshit Arora <[email protected]> * updated with feedback Signed-off-by: Akshit Arora <[email protected]> * updated with feedback Signed-off-by: Akshit Arora <[email protected]> * removed unused imports Signed-off-by: Akshit Arora <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Speedup the speech commands 
dataset processing script (#4347) * Add multiprocessing support to the google speech commands dataset processing script Signed-off-by: Shantanu Acharya <[email protected]> * fix number of args error with __extract_all_files function Signed-off-by: Shantanu Acharya <[email protected]> * fix styling issues Signed-off-by: Shantanu Acharya <[email protected]> * fix bugs with silence set construction and update librosa output write to use soundfile write Signed-off-by: Shantanu Acharya <[email protected]> * add docstrings and return values in __construct_filepaths as dictionary Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix wrong requirement (#4349) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Refactored path to manifest (#4251) Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * IPA support for TTS (#4310) * IPA tokenizer and G2P untested draft Signed-off-by: Jocelyn Huang <[email protected]> * Add IPA CMUdict and new heteronyms list Signed-off-by: Jocelyn Huang <[email protected]> * Add draft FastPitch IPA config Signed-off-by: Jocelyn Huang <[email protected]> * Minor bugfixes for IPA training Signed-off-by: Jocelyn Huang <[email protected]> * Add phoneme_probability to IPA G2P Signed-off-by: Jocelyn Huang <[email protected]> * Updates to IPA FastPitch training config Signed-off-by: Jocelyn Huang <[email protected]> * Update IPA dict and heteronyms file Signed-off-by: Jocelyn Huang <[email protected]> * Adjust default lr for IPA FastPitch to 1e-3 Signed-off-by: Jocelyn Huang <[email protected]> * Rename IPA CMUdict to reflect date Signed-off-by: Jocelyn Huang <[email protected]> * Add docstrings for IPA tokenizer and G2P, update CMUdict path for config Signed-off-by: Jocelyn Huang <[email protected]> * Fix IPA vocab ordering, add options to uppercase graphemes and 
remove stress symbols Signed-off-by: Jocelyn Huang <[email protected]> * Mark IPA classes as experimental Signed-off-by: Jocelyn Huang <[email protected]> * Update apostrophe-S cases Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Tn install (#4055) * remove conda pynini requirement Signed-off-by: Yang Zhang <[email protected]> * remove remnants Signed-off-by: Yang Zhang <[email protected]> * merge with main Signed-off-by: Yang Zhang <[email protected]> * removing nlp collection dependency from text processing and thus breaking cyclic imports Signed-off-by: Yang Zhang <[email protected]> * fix Signed-off-by: Yang Zhang <[email protected]> * fix wrong requirement Signed-off-by: Yang Zhang <[email protected]> * fix bug in vi Signed-off-by: Yang Zhang <[email protected]> * update jenkins folders Signed-off-by: ekmb <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ekmb <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix tutorial (#4352) Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * fix the post ln (#4350) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [Fix] Hanging for Fully Randomized Bucketing (#4348) * Update container to 22.05 (#4329) * update container to 22.05 Signed-off-by: ericharper <[email protected]> * try adding safe directory Signed-off-by: ericharper <[email protected]> * try env var Signed-off-by: ericharper <[email protected]> * printenv Signed-off-by: ericharper <[email protected]> * try GIT_BRANCH Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * remove debug statements Signed-off-by: ericharper <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * Merge r1.9.0 main (#4331) * update branch Signed-off-by: ericharper <[email protected]> * update 
package info Signed-off-by: ericharper <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: 
treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: 
fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * small fix (#4180) Signed-off-by: fayejf <[email protected]> * fix the server key value problem (#4196) Signed-off-by: Yi Dong <[email protected]> * Fix/punctuation/trainer required for setting test data (#4199) * Draft of fix Signed-off-by: PeganovAnton <[email protected]> * Add warnings and replace global_step with current_epoch Signed-off-by: PeganovAnton <[email protected]> * Small improvements to warnings Signed-off-by: PeganovAnton <[email protected]> * Error and warning messages improvements Signed-off-by: PeganovAnton <[email protected]> * Replace self.trainer with self._trainer Signed-off-by: PeganovAnton <[email protected]> * Update ContextNet version (#4207) Signed-off-by: smajumdar <[email protected]> * fix bugs for dialogue tutorial (#4211) Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4214) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * Add docs for Thutmose Tagger (#4173) * Add docs for Thutmose Tagger Signed-off-by: Alexandra Antonova <[email protected]> * add level in docs Signed-off-by: Alexandra Antonova <[email protected]> * delete folder to avoid error with running when folder exists from previous run Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> * Dialogue tutorial fix (#4218) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email 
protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * Dialogue tutorial fix (#4221) * fix bugs for dialogue tutorial Signed-off-by: Zhilin Wang <[email protected]> * update path for convert_datasets.py due to conflict PR Signed-off-by: Zhilin Wang <[email protected]> * restore previously deleted files Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * update tutorial Signed-off-by: Zhilin Wang <[email protected]> * fix syntax error in ipynb-file (#4228) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * fix json serialize (#4235) Signed-off-by: Yi Dong <[email protected]> * Prompt Learning Typo Fixes (#4238) * Prompt tuning notebook typo fixes Signed-off-by: Virginia Adams <[email protected]> * Update tutorials.rst * Update prompt_learning.rst * Update prompt_learning.rst * fixing bug 3642622 (#4250) * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> * fixing bug 3642622 Signed-off-by: Ghasem Pasandi <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> * fix broken link in the tutorial (#4257) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * Typo fix, branch change, better download message (#4262) Signed-off-by: Virginia Adams <[email protected]> * Raise error if bicleaner is not installed in NMT Data preprocessing notebook (#4264) * Raise error if bicleaner is not installed Signed-off-by: MaximumEntropy <[email protected]> * Clear cells Signed-off-by: MaximumEntropy <[email protected]> * Fix missing validation dataset, whitelist certain keywords for datasets (#4269) * Fix missing validation dataset, whitelist certain keywords for datasets Signed-off-by: smajumdar <[email protected]> * Fix missing validation dataset, whitelist certain keywords for 
datasets Signed-off-by: smajumdar <[email protected]> * Update asr configs with num_workers and pin_memory (#4270) Signed-off-by: smajumdar <[email protected]> * Fix epoch end (#4265) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Set Save on train end to false (#4274) * Set Save on train end to false Signed-off-by: Virginia Adams <[email protected]> * Update prompt_learning.rst * Update prompt_learning.rst * Update YAML (#4261) Signed-off-by: MaximumEntropy <[email protected]> * Updated config to fix CI test OOM error (#4279) * Updated config to fix CI test issue Signed-off-by: Virginia Adams <[email protected]> * Increased num workers Signed-off-by: Virginia Adams <[email protected]> * verbose k2 install, skip if failed (#4289) Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> * Changed total virtual prompt tokens (#4295) * Changed total virtual prompt tokens Signed-off-by: Virginia Adams <[email protected]> * put number of workers back Signed-off-by: Virginia Adams <[email protected]> * upper bound lightning Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * update config Signed-off-by: ericharper <[email protected]> * remove duplicate test Signed-off-by: ericharper <[email protected]> * fix tn test cases Signed-off-by: ericharper <[email protected]> * add another safe.directory Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email 
protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> * fix full_randn bucket hang Signed-off-by: stevehuang52 <[email protected]> * remove unused variables Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: treacker <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: ekmb <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Ghasem <[email protected]> Co-authored-by: Ghasem Pasandi <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Bits of RADTTS support (#4343) * Bits of RADTTS support Signed-off-by: Boris Fomitchev <[email protected]> * Fixed args mismatch Signed-off-by: Boris Fomitchev <[email protected]> * Style Signed-off-by: Boris Fomitchev <[email protected]> * Addressed review comments Signed-off-by: Boris Fomitchev <[email protected]> * More review comments Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email 
protected]> * Prompt Learning Pipeline Parallel (#4291) * Added get_forward_output_and_loss_func and updated train/val steps Signed-off-by: Virginia Adams <[email protected]> * Added preprocess flag before prompt table/encoder access Signed-off-by: Virginia Adams <[email protected]> * Made two optimizer groups, one for frozen, one for soft prompt Signed-off-by: Virginia Adams <[email protected]> * Pipeline parallel working Signed-off-by: Virginia Adams <[email protected]> * Still figuring out setting lr/sched for one param group Signed-off-by: Virginia Adams <[email protected]> * Set betas to zero Signed-off-by: Virginia Adams <[email protected]> * Only unfreeze one sublayer with lr 0.0 Signed-off-by: Virginia Adams <[email protected]> * Pipeline parallel working w/ one optimizer Signed-off-by: Virginia Adams <[email protected]> * Trying to fix Jenkins file Signed-off-by: Virginia Adams <[email protected]> * Trying to fix Jenkins file Signed-off-by: Virginia Adams <[email protected]> * Getting updated jenkins test to work Signed-off-by: Virginia Adams <[email protected]> * Getting updated jenkins test to work Signed-off-by: Virginia Adams <[email protected]> * added prompt learning tp and pp CI tests Signed-off-by: Virginia Adams <[email protected]> * Added amp_o2 model compatibility Signed-off-by: Virginia Adams <[email protected]> * Made CI test smaller Signed-off-by: Virginia Adams <[email protected]> * Still trying to get Jenkins to work Signed-off-by: Virginia Adams <[email protected]> * Still trying to get Jenkins to work Signed-off-by: Virginia Adams <[email protected]> * Temporarily moving prompt learning CI test to beginning Signed-off-by: Virginia Adams <[email protected]> * Changing the layer being unfrozen Signed-off-by: Virginia Adams <[email protected]> * debug jenkins Signed-off-by: Virginia Adams <[email protected]> * Move pp unfreeze to init Signed-off-by: Virginia Adams <[email protected]> * Try to make Jenkins test parallel Signed-off-by: 
Virginia Adams <[email protected]> * Fix python formatting Signed-off-by: Virginia Adams <[email protected]> * Moved prompt learning tests back to where they belong Signed-off-by: Virginia Adams <[email protected]> * add back checkpoint conversion CI test Signed-off-by: Virginia Adams <[email protected]> * Revert "add back checkpoint conversion CI test" This reverts commit 61e2ffcdefe964c8e74b74d8c10906ae29f32b6d. * Add back checkpoint conversion test Signed-off-by: Virginia Adams <[email protected]> * Setting requires grad to True everywhere Signed-off-by: Virginia Adams <[email protected]> * Updated config comments and simplified param group code Signed-off-by: Virginia Adams <[email protected]> * Added comment on frozen_model having lr=0.0 Signed-off-by: Virginia Adams <[email protected]> * Added configure optimizers methods Signed-off-by: Virginia Adams <[email protected]> * Set amp_o2 to false Signed-off-by: Virginia Adams <[email protected]> * removed o2 code Signed-off-by: Virginia Adams <[email protected]> * Python formatting fix Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * auto switch conformer encoder adapter in_features (#4354) Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Dataloader, collector, loss and metric for multiscale diarization decoder (#4187) * First commit Signed-off-by: Taejin Park <[email protected]> * Checked functionality and imports Signed-off-by: Taejin Park <[email protected]> * fixed import issues Signed-off-by: Taejin Park <[email protected]> * Removed the changes made by mistake Signed-off-by: Taejin Park <[email protected]> * Style fix Signed-off-by: Taejin Park <[email protected]> * Fixed LGTM errors 001 Signed-off-by: Taejin Park <[email protected]> * Fixed LGTM and style fix Signed-off-by: Taejin Park <[email protected]> * Changed docstrings Signed-off-by: Taejin Park <[email 
protected]> * LGTM again Signed-off-by: Taejin Park <[email protected]> * Removed unnecessary torch setting lines Signed-off-by: Taejin Park <[email protected]> * Style fix and isort Signed-off-by: Taejin Park <[email protected]> * jbalam-nv comments reflected Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * Reflected comments and created _diar_label.py Signed-off-by: Taejin Park <[email protected]> * Typo fix and style fix Signed-off-by: Taejin Park <[email protected]> * Fixed target_spks[0] index error Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * LGTM unused import IterDataset Signed-off-by: Taejin Park <[email protected]> * revert collection doc year Signed-off-by: Taejin Park <[email protected]> * Code format error in collections.py Signed-off-by: Taejin Park <[email protected]> * fix collections space format error Signed-off-by: Taejin Park <[email protected]> * merged main correctly Signed-off-by: Taejin Park <[email protected]> * style fix Signed-off-by: Taejin Park <[email protected]> * Reflected all comments and tested Signed-off-by: Taejin Park <[email protected]> * style fix and LGTM Signed-off-by: Taejin Park <[email protected]> * rttm_filepath to rttm_file and removed self included funcs, tested Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Add ASR CTC Decoding module (#4342) * Initial commit Signed-off-by: smajumdar <[email protected]> * Full support for decoding strategy Signed-off-by: smajumdar <[email protected]> * Temp Signed-off-by: smajumdar <[email protected]> * Fix labels of y_sequence Signed-off-by: smajumdar <[email protected]> * Set support for sentencepiece subword merging Signed-off-by: smajumdar <[email protected]> * Fix char and word based token merge alignment Signed-off-by: smajumdar <[email protected]> * Revert incorrect 
change Signed-off-by: smajumdar <[email protected]> * Update docstring Signed-off-by: smajumdar <[email protected]> * Improve compatibility with greedy tokens and log probs Signed-off-by: smajumdar <[email protected]> * Update scripts to use decoding strategy Signed-off-by: smajumdar <[email protected]> * Add tests and docs Signed-off-by: smajumdar <[email protected]> * Add tests and docs Signed-off-by: smajumdar <[email protected]> * Fix speaker decoder timestamps Signed-off-by: smajumdar <[email protected]> * Fix speaker decoder timestamps Signed-off-by: smajumdar <[email protected]> * Fix decoding of ctc models Signed-off-by: smajumdar <[email protected]> * Address reviewer comments Signed-off-by: smajumdar <[email protected]> * Address reviewer comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Option to disable mp in VAD via num_workers=1 (#4317) * Option to disable mp in VAD via num_workers=1 In certain environments Python multiprocessing can deadlock. This adds a convenient way to disable it by setting num_workers to 1. 
Signed-off-by: Georg Kucsko <[email protected]> * add none handling Signed-off-by: Georg Kucsko <[email protected]> * additional none handling Signed-off-by: Georg Kucsko <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * remove redundant bias expand (#4382) * remove redundant bias expand Signed-off-by: Xiaowei Ren <[email protected]> * delete redundant code Signed-off-by: Xiaowei Ren <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Add option for specifying wandb save_dir from config (#4379) * give option to user to specify wandb save dir via config Signed-off-by: Shantanu Acharya <[email protected]> * create save_dir directory for wandb logger if not exists Signed-off-by: Shantanu Acharya <[email protected]> * update save_dir get method with a default value Signed-off-by: Shantanu Acharya <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Quick wav2vec fix. In-place operation adding convolutional positions to encoder was overwriting leaf history. Wasn't caught on previous torch versions. (#4383) Signed-off-by: tbartley94 <[email protected]> Co-authored-by: tbartley94 <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. 
(#4388) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Hainan Xu <[email protected]> * Merge r1.10.0 main (#4398) * update branch Signed-off-by: ericharper <[email protected]> * Set headscale false (#4364) Signed-off-by: MaximumEntropy <[email protected]> * Add wandb as dependency (#4365) Signed-off-by: smajumdar <[email protected]> * Raise trainer error (#4356) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Set headscale false (#4364) (#4366) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: smajumdar <[email protected]> * Finetuning changes for BART (#4003) * Temp Signed-off-by: MaximumEntropy <[email protected]> * Checkpoint converter to nemo for bart Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Make position embedding expansion specific to a batch to avoid checkpoint size mismatches (#4357) * Style Signed-off-by: MaximumEntropy <[email protected]> * Fix logging warning Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Micha Livne <[email protected]> * Fix electronic bug, new time ITN rule (#4355) * fix electronic bug Signed-off-by: ekmb <[email protected]> * add new itn time rule Signed-off-by: ekmb <[email protected]> * revert domain changes Signed-off-by: ekmb <[email protected]> * remove repetition Signed-off-by: ekmb <[email protected]> * Correct support for dataclasses in default module dim (#4372) * Correct support for dataclasses in default module dim Signed-off-by: smajumdar <[email protected]> * Fix path for save of results Signed-off-by: smajumdar <[email protected]> * fix pad id bug (#4377) Signed-off-by: Yi Dong <[email protected]> * Question answering bug fix (#4381) * refactor dialogue state tracking for modelling/dataset interoperability Signed-off-by: Zhilin Wang <[email protected]> * fix style changes Signed-off-by: Zhilin Wang <[email 
protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * fix style raised by lgtm Signed-off-by: Zhilin Wang <[email protected]> * fix style formatting Signed-off-by: Zhilin Wang <[email protected]> * update template to include description of intent Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * changes based on requests in review Signed-off-by: Zhilin Wang <[email protected]> * add compatibility with assistant dataset Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins Signed-off-by: Zhilin Wang <[email protected]> * remove dialogue_state_tracking Signed-off-by: Zhilin Wang <[email protected]> * update huggingface utils for dialogue Signed-off-by: Zhilin Wang <[email protected]> * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang <[email protected]> * style fix Signed-off-by: Zhilin Wang <[email protected]> * fix style Signed-off-by: Zhilin Wang <[email protected]> * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang <[email protected]> * fix typo Signed-off-by: Zhilin Wang <[email protected]> * add docstrings for assistant data processor Signed-off-by: Zhilin Wang <[email protected]> * update Jenkins for SGDGEN local checkpoint Signed-off-by: Zhilin Wang <[email protected]> * update style Signed-off-by: Zhilin Wang <[email protected]> * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang <[email protected]> * patch for Jenkins CI using local file Signed-off-by: 
Zhilin Wang <[email protected]> * add slot filling prediction and metrics Signed-off-by: Zhilin Wang <[email protected]> * remove unused code Signed-off-by: Zhilin Wang <[email protected]> * style fix …

* Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount * new structure for tts datasets in script folder Signed-off-by: Oktai Tatanov <[email protected]> * remove cmudict downloading Signed-off-by: Oktai Tatanov <[email protected]> * rename mixertts dataset, add vocoder dataset Signed-off-by: Oktai Tatanov <[email protected]> * add libritts processing Signed-off-by: Oktai Tatanov <[email protected]> * update tts dataset and libritts get data Signed-off-by: Oktai Tatanov <[email protected]> * fix bugs in vocoder ds Signed-off-by: Oktai Tatanov <[email protected]> * add ds * changed vits yaml * rm yaml * fix yaml and model * Added scaler * refactored yaml * managed to run in fp16 * refactoring Signed-off-by: Oktai Tatanov <[email protected]> * fix small bugs and add new todos Signed-off-by: Oktai Tatanov <[email protected]> * fix optimizers Signed-off-by: Oktai Tatanov <[email protected]> * Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (#6) * Add vits files Add vits_losses.py, vits_modules.py and vits.py. * Move non-vits models to modules * Add vits.yaml * Add _loader to vits.py * Add basic template for vits * Update vits.yaml with vits parameters * Remove extra space * Add top level training script * Add some variables to vits yaml * Add forward and training methods * Fix imports * Added validation step * Log training losses * Update loss calls to use class attributes * Add VITS to models list * Fix all imports * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. 
validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Fix imports for VITS * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Add parameters from original VITS config * Fix config file * Fix imports and generate spec from audio * Fix incorrect dimensions * Progress update * Fix loss * Fix cuda thing * Fix monotonic align import * Fix typos in vits.py * Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Jason <[email protected]> * make new commit Signed-off-by: Jason <[email protected]> * add copyright headers Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * rename README Signed-off-by: Oktai Tatanov <[email protected]> * fix style without vits_modules Signed-off-by: Oktai Tatanov <[email protected]> * add numba code, fix style and add todos Signed-off-by: Oktai Tatanov <[email protected]> * small fix * fix some todos * added numba mas * added DDP sampler * specified versions * fixed for new librosa version * added feature loss * added IPA phonemizer * refactored IPA g2p * added vits losses * some ref * fix * added checkpointing * cp * cfg * merged some 1.8.0 fixes * plt fix * fix logging * fix checkpoint loading * refactored inference * fp32 run * update branch 
Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * new exp * update branch Signed-off-by: ericharper <[email protected]> * Restored tests previously disabled for 22.03 base (#4109) Signed-off-by: Boris Fomitchev <[email protected]> * add augmentation to label models (#4113) * add augmentation to label models Signed-off-by: nithinraok <[email protected]> * duration fix Signed-off-by: nithinraok <[email protected]> * Call register_bert_model after assigning self.bert_model variable (#4116) Signed-off-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> * Tutorial on ITN with Thutmose tagger and small fixes (#4117) * 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output Signed-off-by: Alexandra Antonova <[email protected]> * fixes for code review Signed-off-by: Alexandra Antonova <[email protected]> * Add tutorial to tutorials.rst Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> * update the default (#4135) Signed-off-by: ekmb <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to 
model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix typo (#4140) Signed-off-by: Yang Zhang <[email protected]> * Fix/punctuation avoid overwritting tmp files (#4144) * Add draft of fixing tmp files overwritting Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Use built-in tempfile library Signed-off-by: PeganovAnton <[email protected]> * Fix code style Signed-off-by: PeganovAnton <[email protected]> * bug_fix_diarization_manifest_creation (#4125) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * 
refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * WaveGlow input type fixes (#4151) 
Signed-off-by: Jocelyn Huang <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * Thutmose tagger bug fixes (#4162) * add pretrained ngc model, small fixes Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * 1. fix typos. 2. write magic functions without space Signed-off-by: Alexandra Antonova <[email protected]> * add example of inference with pretrained model Signed-off-by: Alexandra Antonova <[email protected]> * changed model location to nemo Signed-off-by: Alexandra Antonova <[email protected]> * style fix Signed-off-by: Alexandra Antonova <[email protected]> * fix space Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * changed to vits g2p * refactoring * added cosineLR * Updated whitelist path * added vanilla torch grad scaler * Fixed lightning version * added warmup and wd * switched to cosineLR * refactored data classes for vits * some fixes * fixed import * changeg train loop * fixed scheduler bug * refactoring for exps * Refactored loss logic * Ref for exps * added coqui stuff * exps * bugfix * added side file * bugfix * reverted * fixed sampler behaviour * updated for ptl 1.7.2 * refactored dataloader func * some cleaning * reverted to vanilla loss * modified for pickling * added dataset class * fixed torch version * added autocast for fp training * removed coqui files * 'Fixed tokenizer' * Fix tokenizer * update branch Signed-off-by: ericharper <[email protected]> * 
Fix link to inference notebook (#5247) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Update ASR scores table (#5254) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Fix links to speaker identification notebook (#5260) Signed-off-by: SeanNaren <[email protected]> Signed-off-by: SeanNaren <[email protected]> * Minor typo fixes in TTS tutorial (#5266) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Pcla tutorial fixes (#5271) * Fixed typos Signed-off-by: Matvei Novikov <[email protected]> * Fixed cell type and tatoeba reference Signed-off-by: Matvei Novikov <[email protected]> * Fixed typo Signed-off-by: Matvei Novikov <[email protected]> * Fixed branch variable Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix bug into Dialogue tutorial (#5277) * Typo fix (#5288) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix dialogue tutorial bug (#5297) * set add_pooling_layer=False for huggingface bert model * remove add_pooling_layer=False and set find_unused_parameters=True * set num_prompt_tokens to 0 for huggingface * small bugfix for r1.13.0 (#5310) * typo fix Signed-off-by: fayejf <[email protected]> * udpate transcribe Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> * Add italian model checkpoints (#5316) Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Igor Gitman <[email protected]> * [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (#5340) * [STT] Add stt_ru_conformer_ctc_large Signed-off-by: Sasha Meister <[email protected]> * [STT] Add stt_ru_conformer_transducer_large Add stt_ru_conformer_transducer_large Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pcla tutorial fixes (#5313) * fixes Signed-off-by: Matvei Novikov <[email protected]> * fixes Signed-off-by: Matvei Novikov <[email protected]> * moved `create_text_and_labels` to token_classification_utils.py Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * a lot of refactoring * strict ptl version * strict ptl version * reverted plt version * Added base text2audio class * Fix issue with HF Model upload tutorial (#5359) * Add Gradio App to ASR Docs (#5270) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> (cherry picked from commit e4b6a38) * Fix issue with normalized config for dataset name Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * tutorial fixes (#5354) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Add SDP documentation (#5274) * Add details to SDP README.md Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to WriteManifest processor Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to CreateInitialManifestMLS Signed-off-by: Elena Rastorgueva <[email protected]> * Add ModifyManifestTextProcessor docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add ASRInference docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add base_processor docstrings Signed-off-by: Elena Rastorgueva <[email protected]> * Add minimal SDP docs page Signed-off-by: Elena Rastorgueva <[email protected]> * Update tools/speech_dataset_processor/README.md Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Write simple README for SDP and move complex explanations to docs Signed-off-by: Elena Rastorgueva <[email protected]> * 
Remove incorrect type hints Signed-off-by: Elena Rastorgueva <[email protected]> * Make config example less confusing Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typo Signed-off-by: Elena Rastorgueva <[email protected]> * Clarify that YAML file is config file in README Signed-off-by: Elena Rastorgueva <[email protected]> * Remove unused imports Signed-off-by: Elena Rastorgueva <[email protected]> * Remove SDP docs for now Signed-off-by: Elena Rastorgueva <[email protected]> * Remove links to docs in SDP README Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Igor Gitman <[email protected]> * [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 (#5375) * Fix minor error in notebook Signed-off-by: Taejin Park <[email protected]> * changed branch name in tutorial notebook Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Rename Speech Dataset Processor to Speech Data Processor (#5378) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix for num worker 0 causing issues in losses after 1 epoch (#5379) * Fixed bug in notebook (#5382) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Virginia Adams <[email protected]> * Force MHA QKV onto fp32 (#5391) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Added scheduling variety * ref * Fix for prompt table restore error (#5393) * Fix for prompt table restore error Signed-off-by: Virginia Adams <[email protected]> * Added more saftey checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added more condition checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix args (#5410) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * bugfix * import tests * Add temporary fix for CUDA issue in Dockerfile (#5421) Signed-off-by: Yu Yao <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Megatron Export Update (#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad <[email protected]> * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad <[email protected]> * added comment Signed-off-by: David Mosallanezhad <[email protected]> * reverting changes on FloatCast Signed-off-by: David Mosallanezhad <[email protected]> * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev <[email protected]> * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad <[email protected]> * updated del onnx_model place Signed-off-by: David Mosallanezhad <[email protected]> * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev <[email protected]> * disable pc test (#5426) Signed-off-by: ekmb <[email protected]> Signed-off-by: ekmb <[email protected]> * Fix GPT generation when using sentencepiece tokenizer (#5413) * Fix Signed-off-by: MaximumEntropy 
<[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Disable sync_batch_comm in validation_step for GPT (#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper <[email protected]> * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin <[email protected]> * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Comment out test Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Revert "Add temporary fix for CUDA issue in Dockerfile (#5421)" (#5431) This reverts commit 0718b17. * Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (#5420) * Revert workers workaround Signed-off-by: MaximumEntropy <[email protected]> * Fix in config Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Fixed discrepancies * updated Jenkisfile * updated Jenkisfile * Cleaning * fixed the onnx bug in conformer for non-streaming models. 
(#5242) (#5446) Signed-off-by: Vahid <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> * Set sync_batch_comm in other places (#5448) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Radtts 1.13 (#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * [TTS] add CI test for RADTTS training recipe. Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Radtts 1.13 plus (#5457) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * Fixing RADTTS training - removing view buffer and fixing accuracy issue * Fixes for Torchscript/Triton * Added autocast to radtts UT * using cuda() for training example Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Add num layers check (#5470) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Change to kwargs (#5475) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478) * Initial refactor Signed-off-by: MaximumEntropy <[email protected]> * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy <[email protected]> * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy <[email protected]> * Fixes for eval Signed-off-by: MaximumEntropy <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy <[email protected]> * Remove comments Signed-off-by: MaximumEntropy <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Remove old comment Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (#5480) * updated export_utils Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev <[email protected]> * Cleaning up training_utils Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> * minor 
bug fix (#5521) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * added set_start_method + function param bugfix (#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> * remove notebook (#5548) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * Remove broadcast (#5558) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * cleaning * Fix all gather while writing to a file during T5 finetuning (#5561) * Gather from data parallel only instead of all ranks Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added copyright * fixed imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed filesize check * last cleaning Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated cmudict path * fixed merge bug 
Signed-off-by: Evgeniy Shabalin <[email protected]> * warnings fix * fix warnings Signed-off-by: Evgeniy Shabalin <[email protected]> * storing * updated version Signed-off-by: Evgeniy Shabalin <[email protected]> * update Jenkinsfile versions Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed issues Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed more issues * more fixes Signed-off-by: Evgeniy Shabalin <[email protected]> * added experimental tag * Clarification updates Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * imports fix Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Evgeniy Shabalin <[email protected]> * excessive comtutations fix Signed-off-by: Evgeniy Shabalin <[email protected]> * typecheck fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Small refactoring * Small refactoring Signed-off-by: Evgeniy Shabalin <[email protected]> * reversed exp_manager params Signed-off-by: Evgeniy Shabalin <[email protected]> * Fixed call for new function signature Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: PeganovAnton <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Yu Yao <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: bene-ges <[email 
protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

* Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount * new structure for tts datasets in script folder Signed-off-by: Oktai Tatanov <[email protected]> * remove cmudict downloading Signed-off-by: Oktai Tatanov <[email protected]> * rename mixertts dataset, add vocoder dataset Signed-off-by: Oktai Tatanov <[email protected]> * add libritts processing Signed-off-by: Oktai Tatanov <[email protected]> * update tts dataset and libritts get data Signed-off-by: Oktai Tatanov <[email protected]> * fix bugs in vocoder ds Signed-off-by: Oktai Tatanov <[email protected]> * add ds * changed vits yaml * rm yaml * fix yaml and model * Added scaler * refactored yaml * managed to run in fp16 * refactoring Signed-off-by: Oktai Tatanov <[email protected]> * fix small bugs and add new todos Signed-off-by: Oktai Tatanov <[email protected]> * fix optimizers Signed-off-by: Oktai Tatanov <[email protected]> * Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (NVIDIA#6) * Add vits files Add vits_losses.py, vits_modules.py and vits.py. * Move non-vits models to modules * Add vits.yaml * Add _loader to vits.py * Add basic template for vits * Update vits.yaml with vits parameters * Remove extra space * Add top level training script * Add some variables to vits yaml * Add forward and training methods * Fix imports * Added validation step * Log training losses * Update loss calls to use class attributes * Add VITS to models list * Fix all imports * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. 
validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Fix imports for VITS * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Add parameters from original VITS config * Fix config file * Fix imports and generate spec from audio * Fix incorrect dimensions * Progress update * Fix loss * Fix cuda thing * Fix monotonic align import * Fix typos in vits.py * Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Jason <[email protected]> * make new commit Signed-off-by: Jason <[email protected]> * add copyright headers Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * rename README Signed-off-by: Oktai Tatanov <[email protected]> * fix style without vits_modules Signed-off-by: Oktai Tatanov <[email protected]> * add numba code, fix style and add todos Signed-off-by: Oktai Tatanov <[email protected]> * small fix * fix some todos * added numba mas * added DDP sampler * specified versions * fixed for new librosa version * added feature loss * added IPA phonemizer * refactored IPA g2p * added vits losses * some ref * fix * added checkpointing * cp * cfg * merged some 1.8.0 fixes * plt fix * fix logging * fix checkpoint loading * refactored inference * fp32 run * update branch 
Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * new exp * update branch Signed-off-by: ericharper <[email protected]> * Restored tests previously disabled for 22.03 base (NVIDIA#4109) Signed-off-by: Boris Fomitchev <[email protected]> * add augmentation to label models (NVIDIA#4113) * add augmentation to label models Signed-off-by: nithinraok <[email protected]> * duration fix Signed-off-by: nithinraok <[email protected]> * Call register_bert_model after assigning self.bert_model variable (NVIDIA#4116) Signed-off-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> * Tutorial on ITN with Thutmose tagger and small fixes (NVIDIA#4117) * 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output Signed-off-by: Alexandra Antonova <[email protected]> * fixes for code review Signed-off-by: Alexandra Antonova <[email protected]> * Add tutorial to tutorials.rst Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> * update the default (NVIDIA#4135) Signed-off-by: ekmb <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * 
Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix typo (NVIDIA#4140) Signed-off-by: Yang Zhang <[email protected]> * Fix/punctuation avoid overwritting tmp files (NVIDIA#4144) * Add draft of fixing tmp files overwritting Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Use built-in tempfile library Signed-off-by: PeganovAnton <[email protected]> * Fix code style Signed-off-by: PeganovAnton <[email protected]> * bug_fix_diarization_manifest_creation (NVIDIA#4125) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * fix doc (NVIDIA#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (NVIDIA#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email 
protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (NVIDIA#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to 
filelist Signed-off-by: nithinraok <[email protected]> * WaveGlow input type fixes (NVIDIA#4151) Signed-off-by: Jocelyn Huang <[email protected]> * notebooks' link, typo and import fix (NVIDIA#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * Thutmose tagger bug fixes (NVIDIA#4162) * add pretrained ngc model, small fixes Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * 1. fix typos. 2. write magic functions without space Signed-off-by: Alexandra Antonova <[email protected]> * add example of inference with pretrained model Signed-off-by: Alexandra Antonova <[email protected]> * changed model location to nemo Signed-off-by: Alexandra Antonova <[email protected]> * style fix Signed-off-by: Alexandra Antonova <[email protected]> * fix space Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * update speaker docs (NVIDIA#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * changed to vits g2p * refactoring * added cosineLR * Updated whitelist path * added vanilla torch grad scaler * Fixed lightning version * added warmup and wd * switched to cosineLR * refactored data classes for vits * some fixes * fixed import * changeg train loop * fixed scheduler bug * refactoring for exps * Refactored loss logic * Ref for exps * added coqui stuff * exps * bugfix * added side file * bugfix * reverted * fixed sampler behaviour * updated for ptl 1.7.2 * refactored dataloader func * some cleaning * reverted to vanilla loss * modified for pickling * added dataset class * fixed torch version * added autocast for fp training * 
removed coqui files * 'Fixed tokenizer' * Fix tokenizer * update branch Signed-off-by: ericharper <[email protected]> * Fix link to inference notebook (NVIDIA#5247) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Update ASR scores table (NVIDIA#5254) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Fix links to speaker identification notebook (NVIDIA#5260) Signed-off-by: SeanNaren <[email protected]> Signed-off-by: SeanNaren <[email protected]> * Minor typo fixes in TTS tutorial (NVIDIA#5266) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Pcla tutorial fixes (NVIDIA#5271) * Fixed typos Signed-off-by: Matvei Novikov <[email protected]> * Fixed cell type and tatoeba reference Signed-off-by: Matvei Novikov <[email protected]> * Fixed typo Signed-off-by: Matvei Novikov <[email protected]> * Fixed branch variable Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix bug into Dialogue tutorial (NVIDIA#5277) * Typo fix (NVIDIA#5288) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix dialogue tutorial bug (NVIDIA#5297) * set add_pooling_layer=False for huggingface bert model * remove add_pooling_layer=False and set find_unused_parameters=True * set num_prompt_tokens to 0 for huggingface * small bugfix for r1.13.0 (NVIDIA#5310) * typo fix Signed-off-by: fayejf <[email protected]> * udpate transcribe Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> * Add italian model checkpoints (NVIDIA#5316) Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Igor Gitman <[email protected]> * [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (NVIDIA#5340) * [STT] Add stt_ru_conformer_ctc_large Signed-off-by: Sasha Meister <[email protected]> * [STT] Add stt_ru_conformer_transducer_large 
Add stt_ru_conformer_transducer_large Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pcla tutorial fixes (NVIDIA#5313) * fixes Signed-off-by: Matvei Novikov <[email protected]> * fixes Signed-off-by: Matvei Novikov <[email protected]> * moved `create_text_and_labels` to token_classification_utils.py Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * a lot of refactoring * strict ptl version * strict ptl version * reverted plt version * Added base text2audio class * Fix issue with HF Model upload tutorial (NVIDIA#5359) * Add Gradio App to ASR Docs (NVIDIA#5270) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> (cherry picked from commit e4b6a38) * Fix issue with normalized config for dataset name Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * tutorial fixes (NVIDIA#5354) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Add SDP documentation (NVIDIA#5274) * Add details to SDP README.md Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to WriteManifest processor Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to CreateInitialManifestMLS Signed-off-by: Elena Rastorgueva <[email protected]> * Add ModifyManifestTextProcessor docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add ASRInference docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add base_processor docstrings Signed-off-by: Elena Rastorgueva <[email protected]> * Add minimal SDP docs page Signed-off-by: Elena Rastorgueva <[email protected]> * Update tools/speech_dataset_processor/README.md Co-authored-by: Igor 
Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Write simple README for SDP and move complex explanations to docs Signed-off-by: Elena Rastorgueva <[email protected]> * Remove incorrect type hints Signed-off-by: Elena Rastorgueva <[email protected]> * Make config example less confusing Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typo Signed-off-by: Elena Rastorgueva <[email protected]> * Clarify that YAML file is config file in README Signed-off-by: Elena Rastorgueva <[email protected]> * Remove unused imports Signed-off-by: Elena Rastorgueva <[email protected]> * Remove SDP docs for now Signed-off-by: Elena Rastorgueva <[email protected]> * Remove links to docs in SDP README Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Igor Gitman <[email protected]> * [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 (NVIDIA#5375) * Fix minor error in notebook Signed-off-by: Taejin Park <[email protected]> * changed branch name in tutorial notebook Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Rename Speech Dataset Processor to Speech Data Processor (NVIDIA#5378) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix for num worker 0 causing issues in losses after 1 epoch (NVIDIA#5379) * Fixed bug in notebook (NVIDIA#5382) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Virginia Adams <[email protected]> * Force MHA QKV onto fp32 (NVIDIA#5391) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Added scheduling variety * ref * Fix for prompt table restore error (NVIDIA#5393) * Fix for prompt table restore error Signed-off-by: Virginia Adams <[email protected]> * Added more saftey checks Signed-off-by: Virginia 
Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added more condition checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix args (NVIDIA#5410) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * bugfix * import tests * Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421) Signed-off-by: Yu Yao <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Megatron Export Update (NVIDIA#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad <[email protected]> * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad <[email protected]> * added comment Signed-off-by: David Mosallanezhad <[email protected]> * reverting changes on FloatCast Signed-off-by: David Mosallanezhad <[email protected]> * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev <[email protected]> * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad <[email protected]> * updated del onnx_model place Signed-off-by: David Mosallanezhad <[email protected]> * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev <[email protected]> * disable pc test (NVIDIA#5426) Signed-off-by: ekmb <[email protected]> Signed-off-by: ekmb <[email protected]> * Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413) * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Disable sync_batch_comm in validation_step for GPT (NVIDIA#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper <[email protected]> * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin <[email protected]> * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Comment out test Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431) This reverts commit 0718b17. 
* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420) * Revert workers workaround Signed-off-by: MaximumEntropy <[email protected]> * Fix in config Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Fixed discrepancies * updated Jenkinsfile * updated Jenkinsfile * Cleaning * fixed the onnx bug in conformer for non-streaming models. (NVIDIA#5242) (NVIDIA#5446) Signed-off-by: Vahid <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> * Set sync_batch_comm in other places (NVIDIA#5448) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Radtts 1.13 (NVIDIA#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358) * [TTS] add CI test for RADTTS training recipe. 
Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Radtts 1.13 plus (NVIDIA#5457) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358) * Fixing RADTTS training - removing view buffer and fixing accuracy issue * Fixes for Torchscript/Triton * Added autocast to radtts UT * using cuda() for training example Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Add num layers check (NVIDIA#5470) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Change to kwargs (NVIDIA#5475) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339) (NVIDIA#5478) * Initial refactor Signed-off-by: MaximumEntropy <[email protected]> * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy <[email protected]> * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy <[email protected]> * Fixes for eval Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy <[email protected]> * Remove comments Signed-off-by: MaximumEntropy <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Remove old comment Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (NVIDIA#5480) * updated export_utils Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (NVIDIA#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev <[email protected]> * Cleaning up training_utils Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> * minor bug fix (NVIDIA#5521) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * added set_start_method + function param bugfix (NVIDIA#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper <[email protected]> Signed-off-by: 
David Mosallanezhad <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> * remove notebook (NVIDIA#5548) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * Remove broadcast (NVIDIA#5558) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * cleaning * Fix all gather while writing to a file during T5 finetuning (NVIDIA#5561) * Gather from data parallel only instead of all ranks Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added copyright * fixed imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed filesize check * last cleaning Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated cmudict path * fixed merge bug Signed-off-by: Evgeniy Shabalin <[email protected]> * warnings fix * fix warnings Signed-off-by: Evgeniy Shabalin <[email protected]> * storing * updated version Signed-off-by: Evgeniy Shabalin <[email protected]> * update Jenkinsfile versions Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed issues Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed more issues * more fixes Signed-off-by: Evgeniy Shabalin <[email protected]> * added experimental tag * Clarification updates Signed-off-by: 
Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * imports fix Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Evgeniy Shabalin <[email protected]> * excessive computations fix Signed-off-by: Evgeniy Shabalin <[email protected]> * typecheck fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Small refactoring * Small refactoring Signed-off-by: Evgeniy Shabalin <[email protected]> * reversed exp_manager params Signed-off-by: Evgeniy Shabalin <[email protected]> * Fixed call for new function signature Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: PeganovAnton <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: fayejf 
<[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Yu Yao <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> 
Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

* Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount * new structure for tts datasets in script folder Signed-off-by: Oktai Tatanov <[email protected]> * remove cmudict downloading Signed-off-by: Oktai Tatanov <[email protected]> * rename mixertts dataset, add vocoder dataset Signed-off-by: Oktai Tatanov <[email protected]> * add libritts processing Signed-off-by: Oktai Tatanov <[email protected]> * update tts dataset and libritts get data Signed-off-by: Oktai Tatanov <[email protected]> * fix bugs in vocoder ds Signed-off-by: Oktai Tatanov <[email protected]> * add ds * changed vits yaml * rm yaml * fix yaml and model * Added scaler * refactored yaml * managed to run in fp16 * refactoring Signed-off-by: Oktai Tatanov <[email protected]> * fix small bugs and add new todos Signed-off-by: Oktai Tatanov <[email protected]> * fix optimizers Signed-off-by: Oktai Tatanov <[email protected]> * Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (#6) * Add vits files Add vits_losses.py, vits_modules.py and vits.py. * Move non-vits models to modules * Add vits.yaml * Add _loader to vits.py * Add basic template for vits * Update vits.yaml with vits parameters * Remove extra space * Add top level training script * Add some variables to vits yaml * Add forward and training methods * Fix imports * Added validation step * Log training losses * Update loss calls to use class attributes * Add VITS to models list * Fix all imports * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. 
validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Fix imports for VITS * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Add parameters from original VITS config * Fix config file * Fix imports and generate spec from audio * Fix incorrect dimensions * Progress update * Fix loss * Fix cuda thing * Fix monotonic align import * Fix typos in vits.py * Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Jason <[email protected]> * make new commit Signed-off-by: Jason <[email protected]> * add copyright headers Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * rename README Signed-off-by: Oktai Tatanov <[email protected]> * fix style without vits_modules Signed-off-by: Oktai Tatanov <[email protected]> * add numba code, fix style and add todos Signed-off-by: Oktai Tatanov <[email protected]> * small fix * fix some todos * added numba mas * added DDP sampler * specified versions * fixed for new librosa version * added feature loss * added IPA phonemizer * refactored IPA g2p * added vits losses * some ref * fix * added checkpointing * cp * cfg * merged some 1.8.0 fixes * plt fix * fix logging * fix checkpoint loading * refactored inference * fp32 run * update branch 
Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * new exp * update branch Signed-off-by: ericharper <[email protected]> * Restored tests previously disabled for 22.03 base (#4109) Signed-off-by: Boris Fomitchev <[email protected]> * add augmentation to label models (#4113) * add augmentation to label models Signed-off-by: nithinraok <[email protected]> * duration fix Signed-off-by: nithinraok <[email protected]> * Call register_bert_model after assigning self.bert_model variable (#4116) Signed-off-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> * Tutorial on ITN with Thutmose tagger and small fixes (#4117) * 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output Signed-off-by: Alexandra Antonova <[email protected]> * fixes for code review Signed-off-by: Alexandra Antonova <[email protected]> * Add tutorial to tutorials.rst Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> * update the default (#4135) Signed-off-by: ekmb <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * Add tests for various ways to pass label ids to 
model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix typo (#4140) Signed-off-by: Yang Zhang <[email protected]> * Fix/punctuation avoid overwritting tmp files (#4144) * Add draft of fixing tmp files overwritting Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Use built-in tempfile library Signed-off-by: PeganovAnton <[email protected]> * Fix code style Signed-off-by: PeganovAnton <[email protected]> * bug_fix_diarization_manifest_creation (#4125) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * fix doc (#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email protected]> * reverted version update Signed-off-by: treacker <[email protected]> * 
refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to filelist Signed-off-by: nithinraok <[email protected]> * WaveGlow input type fixes (#4151) 
Signed-off-by: Jocelyn Huang <[email protected]> * notebooks' link, typo and import fix (#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * Thutmose tagger bug fixes (#4162) * add pretrained ngc model, small fixes Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * 1. fix typos. 2. write magic functions without space Signed-off-by: Alexandra Antonova <[email protected]> * add example of inference with pretrained model Signed-off-by: Alexandra Antonova <[email protected]> * changed model location to nemo Signed-off-by: Alexandra Antonova <[email protected]> * style fix Signed-off-by: Alexandra Antonova <[email protected]> * fix space Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * update speaker docs (#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * changed to vits g2p * refactoring * added cosineLR * Updated whitelist path * added vanilla torch grad scaler * Fixed lightning version * added warmup and wd * switched to cosineLR * refactored data classes for vits * some fixes * fixed import * changeg train loop * fixed scheduler bug * refactoring for exps * Refactored loss logic * Ref for exps * added coqui stuff * exps * bugfix * added side file * bugfix * reverted * fixed sampler behaviour * updated for ptl 1.7.2 * refactored dataloader func * some cleaning * reverted to vanilla loss * modified for pickling * added dataset class * fixed torch version * added autocast for fp training * removed coqui files * 'Fixed tokenizer' * Fix tokenizer * update branch Signed-off-by: ericharper <[email protected]> * 
Fix link to inference notebook (#5247) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Update ASR scores table (#5254) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Fix links to speaker identification notebook (#5260) Signed-off-by: SeanNaren <[email protected]> Signed-off-by: SeanNaren <[email protected]> * Minor typo fixes in TTS tutorial (#5266) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Pcla tutorial fixes (#5271) * Fixed typos Signed-off-by: Matvei Novikov <[email protected]> * Fixed cell type and tatoeba reference Signed-off-by: Matvei Novikov <[email protected]> * Fixed typo Signed-off-by: Matvei Novikov <[email protected]> * Fixed branch variable Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix bug into Dialogue tutorial (#5277) * Typo fix (#5288) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix dialogue tutorial bug (#5297) * set add_pooling_layer=False for huggingface bert model * remove add_pooling_layer=False and set find_unused_parameters=True * set num_prompt_tokens to 0 for huggingface * small bugfix for r1.13.0 (#5310) * typo fix Signed-off-by: fayejf <[email protected]> * udpate transcribe Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> * Add italian model checkpoints (#5316) Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Igor Gitman <[email protected]> * [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (#5340) * [STT] Add stt_ru_conformer_ctc_large Signed-off-by: Sasha Meister <[email protected]> * [STT] Add stt_ru_conformer_transducer_large Add stt_ru_conformer_transducer_large Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pcla tutorial fixes (#5313) * fixes Signed-off-by: Matvei Novikov <[email protected]> * fixes Signed-off-by: Matvei Novikov <[email protected]> * moved `create_text_and_labels` to token_classification_utils.py Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * a lot of refactoring * strict ptl version * strict ptl version * reverted plt version * Added base text2audio class * Fix issue with HF Model upload tutorial (#5359) * Add Gradio App to ASR Docs (#5270) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> (cherry picked from commit e4b6a38) * Fix issue with normalized config for dataset name Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * tutorial fixes (#5354) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Add SDP documentation (#5274) * Add details to SDP README.md Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to WriteManifest processor Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to CreateInitialManifestMLS Signed-off-by: Elena Rastorgueva <[email protected]> * Add ModifyManifestTextProcessor docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add ASRInference docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add base_processor docstrings Signed-off-by: Elena Rastorgueva <[email protected]> * Add minimal SDP docs page Signed-off-by: Elena Rastorgueva <[email protected]> * Update tools/speech_dataset_processor/README.md Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Write simple README for SDP and move complex explanations to docs Signed-off-by: Elena Rastorgueva <[email protected]> * 
Remove incorrect type hints Signed-off-by: Elena Rastorgueva <[email protected]> * Make config example less confusing Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typo Signed-off-by: Elena Rastorgueva <[email protected]> * Clarify that YAML file is config file in README Signed-off-by: Elena Rastorgueva <[email protected]> * Remove unused imports Signed-off-by: Elena Rastorgueva <[email protected]> * Remove SDP docs for now Signed-off-by: Elena Rastorgueva <[email protected]> * Remove links to docs in SDP README Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Igor Gitman <[email protected]> * [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 (#5375) * Fix minor error in notebook Signed-off-by: Taejin Park <[email protected]> * changed branch name in tutorial notebook Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Rename Speech Dataset Processor to Speech Data Processor (#5378) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix for num worker 0 causing issues in losses after 1 epoch (#5379) * Fixed bug in notebook (#5382) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Virginia Adams <[email protected]> * Force MHA QKV onto fp32 (#5391) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Added scheduling variety * ref * Fix for prompt table restore error (#5393) * Fix for prompt table restore error Signed-off-by: Virginia Adams <[email protected]> * Added more saftey checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added more condition checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix args (#5410) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * bugfix * import tests * Add temporary fix for CUDA issue in Dockerfile (#5421) Signed-off-by: Yu Yao <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Megatron Export Update (#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad <[email protected]> * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad <[email protected]> * added comment Signed-off-by: David Mosallanezhad <[email protected]> * reverting changes on FloatCast Signed-off-by: David Mosallanezhad <[email protected]> * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev <[email protected]> * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad <[email protected]> * updated del onnx_model place Signed-off-by: David Mosallanezhad <[email protected]> * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev <[email protected]> * disable pc test (#5426) Signed-off-by: ekmb <[email protected]> Signed-off-by: ekmb <[email protected]> * Fix GPT generation when using sentencepiece tokenizer (#5413) * Fix Signed-off-by: MaximumEntropy 
<[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Disable sync_batch_comm in validation_step for GPT (#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper <[email protected]> * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin <[email protected]> * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Comment out test Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Revert "Add temporary fix for CUDA issue in Dockerfile (#5421)" (#5431) This reverts commit 0718b17. * Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (#5420) * Revert workers workaround Signed-off-by: MaximumEntropy <[email protected]> * Fix in config Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Fixed discrepancies * updated Jenkisfile * updated Jenkisfile * Cleaning * fixed the onnx bug in conformer for non-streaming models. 
(#5242) (#5446) Signed-off-by: Vahid <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> * Set sync_batch_comm in other places (#5448) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Radtts 1.13 (#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * [TTS] add CI test for RADTTS training recipe. Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Radtts 1.13 plus (#5457) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (#5358) * Fixing RADTTS training - removing view buffer and fixing accuracy issue * Fixes for Torchscript/Triton * Added autocast to radtts UT * using cuda() for training example Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Add num layers check (#5470) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Change to kwargs (#5475) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) (#5478) * Initial refactor Signed-off-by: MaximumEntropy <[email protected]> * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy <[email protected]> * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy <[email protected]> * Fixes for eval Signed-off-by: MaximumEntropy <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy <[email protected]> * Remove comments Signed-off-by: MaximumEntropy <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Remove old comment Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (#5480) * updated export_utils Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev <[email protected]> * Cleaning up training_utils Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> * minor 
bug fix (#5521) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * added set_start_method + function param bugfix (#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> * remove notebook (#5548) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * Remove broadcast (#5558) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * cleaning * Fix all gather while writing to a file during T5 finetuning (#5561) * Gather from data parallel only instead of all ranks Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added copyright * fixed imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed filesize check * last cleaning Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated cmudict path * fixed merge bug 
Signed-off-by: Evgeniy Shabalin <[email protected]> * warnings fix * fix warnings Signed-off-by: Evgeniy Shabalin <[email protected]> * storing * updated version Signed-off-by: Evgeniy Shabalin <[email protected]> * update Jenkinsfile versions Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed issues Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed more issues * more fixes Signed-off-by: Evgeniy Shabalin <[email protected]> * added experimental tag * Clarification updates Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * imports fix Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Evgeniy Shabalin <[email protected]> * excessive comtutations fix Signed-off-by: Evgeniy Shabalin <[email protected]> * typecheck fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Small refactoring * Small refactoring Signed-off-by: Evgeniy Shabalin <[email protected]> * reversed exp_manager params Signed-off-by: Evgeniy Shabalin <[email protected]> * Fixed call for new function signature Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: PeganovAnton <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Yu Yao <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: bene-ges <[email 
protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

* Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount * new structure for tts datasets in script folder Signed-off-by: Oktai Tatanov <[email protected]> * remove cmudict downloading Signed-off-by: Oktai Tatanov <[email protected]> * rename mixertts dataset, add vocoder dataset Signed-off-by: Oktai Tatanov <[email protected]> * add libritts processing Signed-off-by: Oktai Tatanov <[email protected]> * update tts dataset and libritts get data Signed-off-by: Oktai Tatanov <[email protected]> * fix bugs in vocoder ds Signed-off-by: Oktai Tatanov <[email protected]> * add ds * changed vits yaml * rm yaml * fix yaml and model * Added scaler * refactored yaml * managed to run in fp16 * refactoring Signed-off-by: Oktai Tatanov <[email protected]> * fix small bugs and add new todos Signed-off-by: Oktai Tatanov <[email protected]> * fix optimizers Signed-off-by: Oktai Tatanov <[email protected]> * Port Variational Inference with Adversarial Learning (VITS) to NeMo TTS (NVIDIA#6) * Add vits files Add vits_losses.py, vits_modules.py and vits.py. * Move non-vits models to modules * Add vits.yaml * Add _loader to vits.py * Add basic template for vits * Update vits.yaml with vits parameters * Remove extra space * Add top level training script * Add some variables to vits yaml * Add forward and training methods * Fix imports * Added validation step * Log training losses * Update loss calls to use class attributes * Add VITS to models list * Fix all imports * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. 
validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Fix imports for VITS * Remove old module calls * Fix typo in monotonic align import * Modified validation step 1. reverted to tensorboard 2. validation_step logs audio, mel-spec for batch 0 3. validation_step_alt logs audio, mel-spec for batch 0 and loss_mel * Add parameters from original VITS config * Fix config file * Fix imports and generate spec from audio * Fix incorrect dimensions * Progress update * Fix loss * Fix cuda thing * Fix monotonic align import * Fix typos in vits.py * Disable loss typecheck * Fix spectrogram lengths * Remove Precision 16 requirement * Address lgtm alerts * clean up unused code * Address lgtm alerts * Refactor audio_to_mel_torch method * Use NeMo FilterBank to get melspec Todo: set self.fb * Fix filterbank max frequency to match with original VITS * Fix filterbank features correct length * Address lgtm issues * Remove print statements * Remove stft_pad_amount Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Jason <[email protected]> * make new commit Signed-off-by: Jason <[email protected]> * add copyright headers Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * rename README Signed-off-by: Oktai Tatanov <[email protected]> * fix style without vits_modules Signed-off-by: Oktai Tatanov <[email protected]> * add numba code, fix style and add todos Signed-off-by: Oktai Tatanov <[email protected]> * small fix * fix some todos * added numba mas * added DDP sampler * specified versions * fixed for new librosa version * added feature loss * added IPA phonemizer * refactored IPA g2p * added vits losses * some ref * fix * added checkpointing * cp * cfg * merged some 1.8.0 fixes * plt fix * fix logging * fix checkpoint loading * refactored inference * fp32 run * update branch 
Signed-off-by: ericharper <[email protected]> * update package info Signed-off-by: ericharper <[email protected]> * new exp * update branch Signed-off-by: ericharper <[email protected]> * Restored tests previously disabled for 22.03 base (NVIDIA#4109) Signed-off-by: Boris Fomitchev <[email protected]> * add augmentation to label models (NVIDIA#4113) * add augmentation to label models Signed-off-by: nithinraok <[email protected]> * duration fix Signed-off-by: nithinraok <[email protected]> * Call register_bert_model after assigning self.bert_model variable (NVIDIA#4116) Signed-off-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> * Tutorial on ITN with Thutmose tagger and small fixes (NVIDIA#4117) * 1. Add tutorial. 2. Move a function to fix import in tutorial. 3. Merge multiple spaces into one space in the final output Signed-off-by: Alexandra Antonova <[email protected]> * fixes for code review Signed-off-by: Alexandra Antonova <[email protected]> * Add tutorial to tutorials.rst Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> * update the default (NVIDIA#4135) Signed-off-by: ekmb <[email protected]> * Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (NVIDIA#4136) * Fix restoring from checkpoint with label vocab dir Signed-off-by: PeganovAnton <[email protected]> * 
Add tests for various ways to pass label ids to model Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Fix typo Signed-off-by: PeganovAnton <[email protected]> * Do not create tmp directory Signed-off-by: PeganovAnton <[email protected]> * Fix parameter name Signed-off-by: PeganovAnton <[email protected]> * finish cherry-pick op Signed-off-by: PeganovAnton <[email protected]> * Fix labels errors Signed-off-by: PeganovAnton <[email protected]> * Remove duplicate stage Signed-off-by: PeganovAnton <[email protected]> * Change target branch Signed-off-by: PeganovAnton <[email protected]> * fix typo (NVIDIA#4140) Signed-off-by: Yang Zhang <[email protected]> * Fix/punctuation avoid overwritting tmp files (NVIDIA#4144) * Add draft of fixing tmp files overwritting Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Remove accidental changes Signed-off-by: PeganovAnton <[email protected]> * Use built-in tempfile library Signed-off-by: PeganovAnton <[email protected]> * Fix code style Signed-off-by: PeganovAnton <[email protected]> * bug_fix_diarization_manifest_creation (NVIDIA#4125) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * fix doc (NVIDIA#4146) Signed-off-by: Yang Zhang <[email protected]> * Tacotron2 retrain (NVIDIA#4103) * fix yaml Signed-off-by: treacker <[email protected]> * Fix for new TTSDataset class Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * added wandb logging Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * fix numpy version Signed-off-by: treacker <[email protected]> * inference fix Signed-off-by: treacker <[email protected]> * removed old code Signed-off-by: treacker <[email protected]> * updated parser logic Signed-off-by: treacker <[email 
protected]> * reverted version update Signed-off-by: treacker <[email protected]> * refactored parser logic Signed-off-by: treacker <[email protected]> * Updated Jenkinsfile Signed-off-by: treacker <[email protected]> * Refactored tutorial for Tacotron2 Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Made backward compatibility Signed-off-by: treacker <[email protected]> * Update Jenkinsfile Signed-off-by: treacker <[email protected]> * Update tacotron.yaml Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * cleaned up TN/ ITN doc (NVIDIA#4119) * cleaned up TN/ ITN doc Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> * fix image Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: treacker <[email protected]> * Check implicit grad acc in GLUE dataset building (NVIDIA#4123) * Check implicit grad acc in GLUE dataset building Signed-off-by: MaximumEntropy <[email protected]> * Fix jenkins test for GLUE/XNLI Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Fixed jenkins Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> * Refactoring Signed-off-by: treacker <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Multiprocess improvements (NVIDIA#4127) * initial commit Signed-off-by: nithinraok <[email protected]> * start fix Signed-off-by: nithinraok <[email protected]> * improve multiprocessing speed while creating speaker dataset Signed-off-by: nithinraok <[email protected]> * updated scp to 
filelist Signed-off-by: nithinraok <[email protected]> * WaveGlow input type fixes (NVIDIA#4151) Signed-off-by: Jocelyn Huang <[email protected]> * notebooks' link, typo and import fix (NVIDIA#4158) * redo missing pr 4007 Signed-off-by: fayejf <[email protected]> * remove extremely unreliable links Signed-off-by: fayejf <[email protected]> * Thutmose tagger bug fixes (NVIDIA#4162) * add pretrained ngc model, small fixes Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * fix model location Signed-off-by: Alexandra Antonova <[email protected]> * 1. fix typos. 2. write magic functions without space Signed-off-by: Alexandra Antonova <[email protected]> * add example of inference with pretrained model Signed-off-by: Alexandra Antonova <[email protected]> * changed model location to nemo Signed-off-by: Alexandra Antonova <[email protected]> * style fix Signed-off-by: Alexandra Antonova <[email protected]> * fix space Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> * update speaker docs (NVIDIA#4164) * update speaker docs Signed-off-by: nithinraok <[email protected]> * chunks -> segments Signed-off-by: nithinraok <[email protected]> * Khz -> kHz Signed-off-by: nithinraok <[email protected]> * changed to vits g2p * refactoring * added cosineLR * Updated whitelist path * added vanilla torch grad scaler * Fixed lightning version * added warmup and wd * switched to cosineLR * refactored data classes for vits * some fixes * fixed import * changeg train loop * fixed scheduler bug * refactoring for exps * Refactored loss logic * Ref for exps * added coqui stuff * exps * bugfix * added side file * bugfix * reverted * fixed sampler behaviour * updated for ptl 1.7.2 * refactored dataloader func * some cleaning * reverted to vanilla loss * modified for pickling * added dataset class * fixed torch version * added autocast for fp training * 
removed coqui files * 'Fixed tokenizer' * Fix tokenizer * update branch Signed-off-by: ericharper <[email protected]> * Fix link to inference notebook (NVIDIA#5247) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Update ASR scores table (NVIDIA#5254) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Fix links to speaker identification notebook (NVIDIA#5260) Signed-off-by: SeanNaren <[email protected]> Signed-off-by: SeanNaren <[email protected]> * Minor typo fixes in TTS tutorial (NVIDIA#5266) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> * Pcla tutorial fixes (NVIDIA#5271) * Fixed typos Signed-off-by: Matvei Novikov <[email protected]> * Fixed cell type and tatoeba reference Signed-off-by: Matvei Novikov <[email protected]> * Fixed typo Signed-off-by: Matvei Novikov <[email protected]> * Fixed branch variable Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix bug into Dialogue tutorial (NVIDIA#5277) * Typo fix (NVIDIA#5288) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Fix dialogue tutorial bug (NVIDIA#5297) * set add_pooling_layer=False for huggingface bert model * remove add_pooling_layer=False and set find_unused_parameters=True * set num_prompt_tokens to 0 for huggingface * small bugfix for r1.13.0 (NVIDIA#5310) * typo fix Signed-off-by: fayejf <[email protected]> * udpate transcribe Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> * Add italian model checkpoints (NVIDIA#5316) Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Igor Gitman <[email protected]> * [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer (NVIDIA#5340) * [STT] Add stt_ru_conformer_ctc_large Signed-off-by: Sasha Meister <[email protected]> * [STT] Add stt_ru_conformer_transducer_large 
Add stt_ru_conformer_transducer_large Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Pcla tutorial fixes (NVIDIA#5313) * fixes Signed-off-by: Matvei Novikov <[email protected]> * fixes Signed-off-by: Matvei Novikov <[email protected]> * moved `create_text_and_labels` to token_classification_utils.py Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * a lot of refactoring * strict ptl version * strict ptl version * reverted plt version * Added base text2audio class * Fix issue with HF Model upload tutorial (NVIDIA#5359) * Add Gradio App to ASR Docs (NVIDIA#5270) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> (cherry picked from commit e4b6a38) * Fix issue with normalized config for dataset name Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * tutorial fixes (NVIDIA#5354) Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> * Add SDP documentation (NVIDIA#5274) * Add details to SDP README.md Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to WriteManifest processor Signed-off-by: Elena Rastorgueva <[email protected]> * Add docstring to CreateInitialManifestMLS Signed-off-by: Elena Rastorgueva <[email protected]> * Add ModifyManifestTextProcessor docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add ASRInference docstring Signed-off-by: Elena Rastorgueva <[email protected]> * Add base_processor docstrings Signed-off-by: Elena Rastorgueva <[email protected]> * Add minimal SDP docs page Signed-off-by: Elena Rastorgueva <[email protected]> * Update tools/speech_dataset_processor/README.md Co-authored-by: Igor 
Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Write simple README for SDP and move complex explanations to docs Signed-off-by: Elena Rastorgueva <[email protected]> * Remove incorrect type hints Signed-off-by: Elena Rastorgueva <[email protected]> * Make config example less confusing Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typo Signed-off-by: Elena Rastorgueva <[email protected]> * Clarify that YAML file is config file in README Signed-off-by: Elena Rastorgueva <[email protected]> * Remove unused imports Signed-off-by: Elena Rastorgueva <[email protected]> * Remove SDP docs for now Signed-off-by: Elena Rastorgueva <[email protected]> * Remove links to docs in SDP README Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Igor Gitman <[email protected]> * [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 (NVIDIA#5375) * Fix minor error in notebook Signed-off-by: Taejin Park <[email protected]> * changed branch name in tutorial notebook Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Rename Speech Dataset Processor to Speech Data Processor (NVIDIA#5378) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix for num worker 0 causing issues in losses after 1 epoch (NVIDIA#5379) * Fixed bug in notebook (NVIDIA#5382) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Virginia Adams <[email protected]> * Force MHA QKV onto fp32 (NVIDIA#5391) Signed-off-by: smajumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> * Added scheduling variety * ref * Fix for prompt table restore error (NVIDIA#5393) * Fix for prompt table restore error Signed-off-by: Virginia Adams <[email protected]> * Added more saftey checks Signed-off-by: Virginia 
Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added more condition checks Signed-off-by: Virginia Adams <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix args (NVIDIA#5410) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * bugfix * import tests * Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421) Signed-off-by: Yu Yao <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Megatron Export Update (NVIDIA#5343) * export update for Megatron + change ORT optimization Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated export_utils to use autocast instead of manually casting >:/ Signed-off-by: David Mosallanezhad <[email protected]> * removed dtype from LayerNorm Signed-off-by: David Mosallanezhad <[email protected]> * added comment Signed-off-by: David Mosallanezhad <[email protected]> * reverting changes on FloatCast Signed-off-by: David Mosallanezhad <[email protected]> * Cherry-picked changes from megatron-norm Signed-off-by: Boris Fomitchev <[email protected]> * updated asr_model import to cast_utils Signed-off-by: David Mosallanezhad <[email protected]> * updated del onnx_model place Signed-off-by: David Mosallanezhad <[email protected]> * changed ort optimization to basic -> temp fix Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Boris Fomitchev <[email protected]> * disable pc test (NVIDIA#5426) Signed-off-by: ekmb <[email protected]> Signed-off-by: ekmb <[email protected]> * Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413) * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Disable sync_batch_comm in validation_step for GPT (NVIDIA#5397) * disable sync_batch_comm in validation_step Signed-off-by: ericharper <[email protected]> * Read sync_batch_comm from config or default to False Signed-off-by: Markel Sanz Ausin <[email protected]> * Update megatron_gpt_config to default sync_batch_comm to False to avoid CUDA error Signed-off-by: Markel Sanz Ausin <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Comment out test Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431) This reverts commit 0718b17. 
* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420) * Revert workers workaround Signed-off-by: MaximumEntropy <[email protected]> * Fix in config Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Fixed discrepancies * updated Jenkisfile * updated Jenkisfile * Cleaning * fixed the onnx bug in conformer for non-streaming models. (NVIDIA#5242) (NVIDIA#5446) Signed-off-by: Vahid <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> * Set sync_batch_comm in other places (NVIDIA#5448) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Radtts 1.13 (NVIDIA#5451) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358) * [TTS] add CI test for RADTTS training recipe. 
Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Radtts 1.13 plus (NVIDIA#5457) * [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue (NVIDIA#5358) * Fixing RADTTS training - removing view buffer and fixing accuracy issue * Fixes for Torchscript/Triton * Added autocast to radtts UT * using cuda() for training example Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> * Add num layers check (NVIDIA#5470) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Change to kwargs (NVIDIA#5475) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339) (NVIDIA#5478) * Initial refactor Signed-off-by: MaximumEntropy <[email protected]> * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy <[email protected]> * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy <[email protected]> * Fixes for eval Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy <[email protected]> * Remove comments Signed-off-by: MaximumEntropy <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Remove old comment Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * export_utils bugfix (NVIDIA#5480) * updated export_utils Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Export fixes for Riva (NVIDIA#5496) * Export fixes for Riva Signed-off-by: Boris Fomitchev <[email protected]> * Cleaning up training_utils Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> * minor bug fix (NVIDIA#5521) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * added set_start_method + function param bugfix (NVIDIA#5539) * added set_start_method + function param bugfix Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upper bound torchmetrics Signed-off-by: ericharper <[email protected]> Signed-off-by: 
David Mosallanezhad <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> * remove notebook (NVIDIA#5548) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * Remove broadcast (NVIDIA#5558) Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * cleaning * Fix all gather while writing to a file during T5 finetuning (NVIDIA#5561) * Gather from data parallel only instead of all ranks Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> * update readme Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added copyright * fixed imports * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed filesize check * last cleaning Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated cmudict path * fixed merge bug Signed-off-by: Evgeniy Shabalin <[email protected]> * warnings fix * fix warnings Signed-off-by: Evgeniy Shabalin <[email protected]> * storing * updated version Signed-off-by: Evgeniy Shabalin <[email protected]> * update Jenkinsfile versions Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed issues Signed-off-by: Evgeniy Shabalin <[email protected]> * fixed more issues * more fixes Signed-off-by: Evgeniy Shabalin <[email protected]> * added experimental tag * Clarification updates Signed-off-by: 
Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * remove old cython code Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docstring fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * Enhancements Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * imports fix Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Evgeniy Shabalin <[email protected]> * excessive comtutations fix Signed-off-by: Evgeniy Shabalin <[email protected]> * typecheck fix Signed-off-by: Evgeniy Shabalin <[email protected]> * Small refactoring * Small refactoring Signed-off-by: Evgeniy Shabalin <[email protected]> * reversed exp_manager params Signed-off-by: Evgeniy Shabalin <[email protected]> * Fixed call for new function signature Signed-off-by: Evgeniy Shabalin <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: PeganovAnton <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: fayejf 
<[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Yu Yao <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Markel Sanz Ausin <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: [email protected] <[email protected]> Co-authored-by: Oktai Tatanov <[email protected]> Co-authored-by: jasonjjl1999 <[email protected]> Co-authored-by: martynwei <[email protected]> Co-authored-by: Ryan Hong <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: Ramanathan Arunachalam <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> 
Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Sasha Meister <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

fayejf added 2 commits May 11, 2022 22:48

redo missing pr 4007

70aeaaa

Signed-off-by: fayejf <[email protected]>

remove extremely unreliable links

f743115

Signed-off-by: fayejf <[email protected]>

fayejf requested review from titu1994 and nithinraok May 12, 2022 07:02

titu1994 approved these changes May 12, 2022

nithinraok approved these changes May 12, 2022

fayejf merged commit b34609f into r1.9.0 May 12, 2022

fayejf deleted the nb_fix branch May 12, 2022 17:32

ericharper pushed a commit that referenced this pull request May 18, 2022

notebooks' link, typo and import fix (#4158)

c3b7d33

* redo missing pr 4007
  Signed-off-by: fayejf <[email protected]>
* remove extremely unreliable links
  Signed-off-by: fayejf <[email protected]>

ericharper pushed a commit that referenced this pull request Jun 3, 2022

notebooks' link, typo and import fix (#4158)

afca46a

* redo missing pr 4007
  Signed-off-by: fayejf <[email protected]>
* remove extremely unreliable links
  Signed-off-by: fayejf <[email protected]>
