Stateless decoder for RNN-T (#4710)
* stateless RNNT working

Signed-off-by: Hainan Xu <[email protected]>

* batch decode working

Signed-off-by: Hainan Xu <[email protected]>

* working backup

Signed-off-by: Hainan Xu <[email protected]>

* good working version

Signed-off-by: Hainan Xu <[email protected]>

* temporarily make norm layer have affine

Signed-off-by: Hainan Xu <[email protected]>

* temp

Signed-off-by: Hainan Xu <[email protected]>

* temp

Signed-off-by: Hainan Xu <[email protected]>

* [TTS] add staticmethod decoration for BetaBinomialInterpolator (#4319)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* [TTS] remove redundant lines and declare global variables and capture exception of non-supported windows. (#4320)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Finetune T5 on the prefix-lm objective (#4328)

* Add script and yaml config

Signed-off-by: MaximumEntropy <[email protected]>

* Fix yaml config

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Update yaml to remove hardcoded model path

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Fuse bias with geglu in ParallelMLP (#4213)

* add code of fused_bias_geglu

* call fused_bias_geglu in ParallelMLP

* fix some bugs

* change biad_gelu_activation to bias_activation_fusion

* fix the setting of bias_activation_fusion for T5

* delete bias_gelu_fusion from T5 example config

* push reformatted files

* hto4h gemms fusion

* remove hto4h gemms fusion

* push reformatted files

* disable bias_activation_fusion while activation is not geglu

* add bias_activation_fusion in yaml config file

* add bias_gelu_fusion in T5 config yaml file to pass CI test

* change bias_gelu_fusion to bias_activation_fusion for T5 CI test

* recover latest change

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Support larger datasets for question answering (#4205)

* refactor dialogue state tracking for modelling/dataset interoperability

Signed-off-by: Zhilin Wang <[email protected]>

* fix style changes

Signed-off-by: Zhilin Wang <[email protected]>

* fix typo

Signed-off-by: Zhilin Wang <[email protected]>

* fix style raised by lgtm

Signed-off-by: Zhilin Wang <[email protected]>

* fix style formatting

Signed-off-by: Zhilin Wang <[email protected]>

* update template to include description of intent

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* changes based on requests in review

Signed-off-by: Zhilin Wang <[email protected]>

* add compatibility with assistant dataset

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* remove dialogue_state_tracking

Signed-off-by: Zhilin Wang <[email protected]>

* update huggingface utils for dialogue

Signed-off-by: Zhilin Wang <[email protected]>

* rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* fix style

Signed-off-by: Zhilin Wang <[email protected]>

* style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* fix typo

Signed-off-by: Zhilin Wang <[email protected]>

* add docstrings for assistant data processor

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins for SGDGEN local checkpoint

Signed-off-by: Zhilin Wang <[email protected]>

* update style

Signed-off-by: Zhilin Wang <[email protected]>

* use local vocab file for Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* patch for Jenkins CI using local file

Signed-off-by: Zhilin Wang <[email protected]>

* add slot filling prediction and metrics

Signed-off-by: Zhilin Wang <[email protected]>

* remove unused code

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* refactor metrics code out of Dialogue GPT Model

Signed-off-by: Zhilin Wang <[email protected]>

* integrate backward compatible support for IntentSlotClassificationModel (bert model)

Signed-off-by: Zhilin Wang <[email protected]>

* save prediction file for IntentSlotClassification

Signed-off-by: Zhilin Wang <[email protected]>

* update dialogue gpt model training for megatron gpt

Signed-off-by: Zhilin Wang <[email protected]>

* remove batch generate for HF GPT2, which causes lower performance

Signed-off-by: Zhilin Wang <[email protected]>

* add few shot capability to dialogue gpt model

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile and remove unused import

Signed-off-by: Zhilin Wang <[email protected]>

* update code description and clarity

Signed-off-by: Zhilin Wang <[email protected]>

* address PR comments

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* integrate compatibility with ZeroShotIntentModel

Signed-off-by: Zhilin Wang <[email protected]>

* rename folder to dialogue due to increased scope and further refactor for clarity

Signed-off-by: Zhilin Wang <[email protected]>

* added dialogue GPT for sequence generation task (e.g. answer extender)

Signed-off-by: Zhilin Wang <[email protected]>

* add CI test for DialogueGPTGenerationModel

Signed-off-by: Zhilin Wang <[email protected]>

* integrate DialogueS2SGenerationModel for generation task (e.g. answer extender)

Signed-off-by: Zhilin Wang <[email protected]>

* modify huggingface utils to support HF t5/BART models

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* remove unused imports

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* update bleu metric

Signed-off-by: Zhilin Wang <[email protected]>

* fix bleu metric style

Signed-off-by: Zhilin Wang <[email protected]>

* debug bleu metric

Signed-off-by: Zhilin Wang <[email protected]>

* debug bleu metric

Signed-off-by: Zhilin Wang <[email protected]>

* update based on PR #3893

Signed-off-by: Zhilin Wang <[email protected]>

* update 2 based on PR #3893

Signed-off-by: Zhilin Wang <[email protected]>

* update 3 based on PR #3893

Signed-off-by: Zhilin Wang <[email protected]>

* integrate sgd generation based on user utterance and system slot-values to generate system utterance

Signed-off-by: Zhilin Wang <[email protected]>

* add validation model saving capabilities

Signed-off-by: Zhilin Wang <[email protected]>

* cleaned up code for SGD Based Answer extender

Signed-off-by: Zhilin Wang <[email protected]>

* update Dialogue Generation CI

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* fix Jenkins CI issue

Signed-off-by: Zhilin Wang <[email protected]>

* add support for design dataset

Signed-off-by: Zhilin Wang <[email protected]>

* remove unnecessary imports

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* update jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* update jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* support megatron for dialogue_s2s_generation_model

Signed-off-by: Zhilin Wang <[email protected]>

* reduce loaded samples in MSMarcoDataProcessor to 64 when cfg.model.dataset.debug_mode=True

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update CI

Signed-off-by: Zhilin Wang <[email protected]>

* update checkpoint and predictions filename to include epoch number

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* integrate HF BART MNLI into zero shot intent model

Signed-off-by: Zhilin Wang <[email protected]>

* integrate Dialogue Nearest Neighbour Model

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* refactor Dialogue SGD Data Processor to make interface for models cleaner

Signed-off-by: Zhilin Wang <[email protected]>

* update jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* update Dialogue S2S Generation model for DialogueSGDDataProcessor interface

Signed-off-by: Zhilin Wang <[email protected]>

* update jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* update jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* support sgd and drive thru datasets by zero shot model and nearest neighbour model

Signed-off-by: Zhilin Wang <[email protected]>

* add prediction saving code to nearest neighbour and zero shot intent models

Signed-off-by: Zhilin Wang <[email protected]>

* fix typo in sgd data processor

Signed-off-by: Zhilin Wang <[email protected]>

* integrate Dialogue Mellon QA Data Processor

Signed-off-by: Zhilin Wang <[email protected]>

* update mellon qa

Signed-off-by: Zhilin Wang <[email protected]>

* update dialogue.py to remove outdated info

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update dialogue_config.yaml

Signed-off-by: Zhilin Wang <[email protected]>

* update dialogue_config.yaml

Signed-off-by: Zhilin Wang <[email protected]>

* add dialogue docs

Signed-off-by: Zhilin Wang <[email protected]>

* address review comments

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix for cfg

Signed-off-by: Zhilin Wang <[email protected]>

* make dependency on apex optional

Signed-off-by: Zhilin Wang <[email protected]>

* change NLPDDPPlugin calling logic to make it possible to run without apex

Signed-off-by: Zhilin Wang <[email protected]>

* add first draft of tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* reduce ms marco size by removing lines without wellFormedAnswers

Signed-off-by: Zhilin Wang <[email protected]>

* address pr comments

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update colab tutorial link in dialogue docs

Signed-off-by: Zhilin Wang <[email protected]>

* include unit test and some refactor to facilitate unit test

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* address pr issues

Signed-off-by: Zhilin Wang <[email protected]>

* remove typos in dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* support larger files for question answering

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* remove unnecessary artifacts to reduce memory use

Signed-off-by: Zhilin Wang <[email protected]>

* put 0 tensor to device

Signed-off-by: Zhilin Wang <[email protected]>

* update link within dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* restore previously deleted files

Signed-off-by: Zhilin Wang <[email protected]>

* update error handling when loss = nan

Signed-off-by: Zhilin Wang <[email protected]>

* update nan handling

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update spanning loss func

Signed-off-by: Zhilin Wang <[email protected]>

* update spanning loss

Signed-off-by: Zhilin Wang <[email protected]>

* fix type error raised in qa_dataset.py

Signed-off-by: Zhilin Wang <[email protected]>

* add error checking message

Signed-off-by: Zhilin Wang <[email protected]>

* revert back to float32

Signed-off-by: Zhilin Wang <[email protected]>

* revert back to float32

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update exp logging

Signed-off-by: Zhilin Wang <[email protected]>

* update error msgs

Signed-off-by: Zhilin Wang <[email protected]>

* update loading of large file from pickle to json

Signed-off-by: Zhilin Wang <[email protected]>

* update loading of large file from pickle to json

Signed-off-by: Zhilin Wang <[email protected]>

* limit number of negative samples

Signed-off-by: Zhilin Wang <[email protected]>

* revert post processing

Signed-off-by: Zhilin Wang <[email protected]>

* revert post processing

Signed-off-by: Zhilin Wang <[email protected]>

* remove unused methods and style fix

Signed-off-by: Zhilin Wang <[email protected]>

* add more documentation

Signed-off-by: Zhilin Wang <[email protected]>

* remove unused imports

Signed-off-by: Zhilin Wang <[email protected]>

* changes based on PR review

Signed-off-by: Zhilin Wang <[email protected]>

Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Fix bugs in indexed dataset exam script (#4325)

* fix the typo

Signed-off-by: Yi Dong <[email protected]>

* add neighbors option

Signed-off-by: Yi Dong <[email protected]>

* change the argument name

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Adding docs for ASR SSL (#4303)

* Initial commit for SSL docs

Signed-off-by: Krishna Puvvada <[email protected]>

* ssl docs update-1

Signed-off-by: Krishna Puvvada <[email protected]>

* ssl docs update-2

Signed-off-by: Krishna Puvvada <[email protected]>

Co-authored-by: Krishna Puvvada <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Fuse grad division into async grad allreduce (#4327)

* O2 runs but O1 does not

Signed-off-by: ericharper <[email protected]>

* disable async for O1

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update async flag in configure_optimizers

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* revert

Signed-off-by: ericharper <[email protected]>

* update _require if using async

Signed-off-by: ericharper <[email protected]>

* clean comments

Signed-off-by: ericharper <[email protected]>

* always all_reduce

Signed-off-by: ericharper <[email protected]>

* add async grad allreduce and chunk optimization to T5

* push reformatted files after style check

* set chunk size as 0 while async grad allreduce is off

* more experiments show that 125MB is a better default chunk size for most cases

* add grad_allreduce_chunk_size_mb for GPT-3

* at the end of each training step, wait until all async grad allreduce works are done

* replace individual allreduce work.wait() with a single GPU device synchronization

* add code of fused_bias_geglu

* call fused_bias_geglu in ParallelMLP

* record the status of each allreduce work seems too much for perf

* add more comments

* push a reformatted file

* fix some bugs

* change biad_gelu_activation to bias_activation_fusion

* fix the setting of bias_activation_fusion for T5

* delete bias_gelu_fusion from T5 example config

* push reformatted files

* fuse grad scale with allreduce

* push reformatted files

* hto4h gemms fusion

* remove hto4h gemms fusion

* add grad_scale_ar_fusion into GPT-3

* push reformatted files

* push reformatted files

* rename grad_scale_ar_fusion to grad_div_ar_fusion

* disable bias_activation_fusion while activation is not geglu

* add bias_activation_fusion in yaml config file

* add bias_gelu_fusion in T5 config yaml file to pass CI test

* change bias_gelu_fusion to bias_activation_fusion for T5 CI test

* recover latest change

* add grad_div_ar_fusion in config yaml file

* remove a redundant float()

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Update container to 22.05 (#4329)

* update container to 22.05

Signed-off-by: ericharper <[email protected]>

* try adding safe directory

Signed-off-by: ericharper <[email protected]>

* try env var

Signed-off-by: ericharper <[email protected]>

* printenv

Signed-off-by: ericharper <[email protected]>

* try GIT_BRANCH

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* remove debug statements

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Torchaudio installation fix (#4330)

* separate installer added

Signed-off-by: Aleksandr Laptev <[email protected]>

* apply suggestions, minor fixes

Signed-off-by: Aleksandr Laptev <[email protected]>

Co-authored-by: Aleksandr Laptev <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* [TTS] enforced pin_memory = True (#4341)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Merge r1.9.0 main (#4331)

* update branch

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* cleaned up TN/ITN doc (#4119)

* cleaned up TN/ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136)

* Fix restoring from checkpoint with label vocab dir

Signed-off-by: PeganovAnton <[email protected]>

* Add tests for various ways to pass label ids to model

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Do not create tmp directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix parameter name

Signed-off-by: PeganovAnton <[email protected]>

* finish cherry-pick op

Signed-off-by: PeganovAnton <[email protected]>

* Fix labels errors

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate stage

Signed-off-by: PeganovAnton <[email protected]>

* Change target branch

Signed-off-by: PeganovAnton <[email protected]>

* fix doc (#4146)

Signed-off-by: Yang Zhang <[email protected]>

* Tacotron2 retrain (#4103)

* fix yaml

Signed-off-by: treacker <[email protected]>

* Fix for new TTSDataset class

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* inference fix

Signed-off-by: treacker <[email protected]>

* removed old code

Signed-off-by: treacker <[email protected]>

* updated parser logic

Signed-off-by: treacker <[email protected]>

* reverted version update

Signed-off-by: treacker <[email protected]>

* refactored parser logic

Signed-off-by: treacker <[email protected]>

* Updated Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Refactored tutorial for Tacotron2

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Update Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Update tacotron.yaml

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* cleaned up TN/ITN doc (#4119)

* cleaned up TN/ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: treacker <[email protected]>

* Check implicit grad acc in GLUE dataset building (#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Fixed jenkins

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Multiprocess improvements (#4127)

* initial commit

Signed-off-by: nithinraok <[email protected]>

* start fix

Signed-off-by: nithinraok <[email protected]>

* improve multiprocessing speed while creating speaker dataset

Signed-off-by: nithinraok <[email protected]>

* updated scp to filelist

Signed-off-by: nithinraok <[email protected]>

* notebooks' link, typo and import fix (#4158)

* redo missing pr 4007

Signed-off-by: fayejf <[email protected]>

* remove extremely unreliable links

Signed-off-by: fayejf <[email protected]>

* update speaker docs (#4164)

* update speaker docs

Signed-off-by: nithinraok <[email protected]>

* chunks -> segments

Signed-off-by: nithinraok <[email protected]>

* Khz -> kHz

Signed-off-by: nithinraok <[email protected]>

* small fix (#4180)

Signed-off-by: fayejf <[email protected]>

* fix the server key value problem (#4196)

Signed-off-by: Yi Dong <[email protected]>

* Fix/punctuation/trainer required for setting test data (#4199)

* Draft of fix

Signed-off-by: PeganovAnton <[email protected]>

* Add warnings and replace global_step with current_epoch

Signed-off-by: PeganovAnton <[email protected]>

* Small improvements to warnings

Signed-off-by: PeganovAnton <[email protected]>

* Error and warning messages improvements

Signed-off-by: PeganovAnton <[email protected]>

* Replace self.trainer with self._trainer

Signed-off-by: PeganovAnton <[email protected]>

* Update ContextNet version (#4207)

Signed-off-by: smajumdar <[email protected]>

* fix bugs for dialogue tutorial (#4211)

Signed-off-by: Zhilin Wang <[email protected]>

* Dialogue tutorial fix (#4214)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* Add docs for Thutmose Tagger (#4173)

* Add docs for Thutmose Tagger

Signed-off-by: Alexandra Antonova <[email protected]>

* add level in docs

Signed-off-by: Alexandra Antonova <[email protected]>

* delete folder to avoid error with running when folder exists from previous run

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: ekmb <[email protected]>

* Dialogue tutorial fix (#4218)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* restore previously deleted files

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* Dialogue tutorial fix (#4221)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* restore previously deleted files

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* fix syntax error in ipynb-file (#4228)

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* fix json serialize (#4235)

Signed-off-by: Yi Dong <[email protected]>

* Prompt Learning Typo Fixes (#4238)

* Prompt tuning notebook typo fixes

Signed-off-by: Virginia Adams <[email protected]>

* Update tutorials.rst

* Update prompt_learning.rst

* Update prompt_learning.rst

* fixing bug 3642622 (#4250)

* fixing bug 3642622

Signed-off-by: Ghasem Pasandi <[email protected]>

* fixing bug 3642622

Signed-off-by: Ghasem Pasandi <[email protected]>

Co-authored-by: Ghasem Pasandi <[email protected]>

* fix broken link in the tutorial (#4257)

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* Typo fix, branch change, better download message (#4262)

Signed-off-by: Virginia Adams <[email protected]>

* Raise error if bicleaner is not installed in NMT Data preprocessing notebook (#4264)

* Raise error if bicleaner is not installed

Signed-off-by: MaximumEntropy <[email protected]>

* Clear cells

Signed-off-by: MaximumEntropy <[email protected]>

* Fix missing validation dataset, whitelist certain keywords for datasets (#4269)

* Fix missing validation dataset, whitelist certain keywords for datasets

Signed-off-by: smajumdar <[email protected]>

* Fix missing validation dataset, whitelist certain keywords for datasets

Signed-off-by: smajumdar <[email protected]>

* Update asr configs with num_workers and pin_memory (#4270)

Signed-off-by: smajumdar <[email protected]>

* Fix epoch end (#4265)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>

* Set Save on train end to false (#4274)

* Set Save on train end to false

Signed-off-by: Virginia Adams <[email protected]>

* Update prompt_learning.rst

* Update prompt_learning.rst

* Update YAML (#4261)

Signed-off-by: MaximumEntropy <[email protected]>

* Updated config to fix CI test OOM error (#4279)

* Updated config to fix CI test issue

Signed-off-by: Virginia Adams <[email protected]>

* Increased num workers

Signed-off-by: Virginia Adams <[email protected]>

* verbose k2 install, skip if failed (#4289)

Signed-off-by: Aleksandr Laptev <[email protected]>

Co-authored-by: Aleksandr Laptev <[email protected]>

* Changed total virtual prompt tokens (#4295)

* Changed total virtual prompt tokens

Signed-off-by: Virginia Adams <[email protected]>

* put number of workers back

Signed-off-by: Virginia Adams <[email protected]>

* upper bound lightning

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* remove duplicate test

Signed-off-by: ericharper <[email protected]>

* fix tn test cases

Signed-off-by: ericharper <[email protected]>

* add another safe.directory

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: PeganovAnton <[email protected]>
Co-authored-by: treacker <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: ekmb <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: Ghasem <[email protected]>
Co-authored-by: Ghasem Pasandi <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels (#4266)

* initial commit

Signed-off-by: Akshit Arora <[email protected]>

* cleared notebook outputs

Signed-off-by: Akshit Arora <[email protected]>

* formatting errors

Signed-off-by: Akshit Arora <[email protected]>

* formatting

Signed-off-by: Akshit Arora <[email protected]>

* addressed comments

Signed-off-by: Akshit Arora <[email protected]>

* addressed comments on tutorial

Signed-off-by: Akshit Arora <[email protected]>

* updated tutorial

Signed-off-by: Akshit Arora <[email protected]>

* updated grammar and fastpitch description

Signed-off-by: Akshit Arora <[email protected]>

* updated with feedback

Signed-off-by: Akshit Arora <[email protected]>

* updated with feedback

Signed-off-by: Akshit Arora <[email protected]>

* removed unused imports

Signed-off-by: Akshit Arora <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Speedup the speech commands dataset processing script (#4347)

* Add multiprocessing support to the google speech commands dataset processing script

Signed-off-by: Shantanu Acharya <[email protected]>

* fix number of args error with __extract_all_files function

Signed-off-by: Shantanu Acharya <[email protected]>

* fix styling issues

Signed-off-by: Shantanu Acharya <[email protected]>

* fix bugs with silence set construction and update librosa output write to use soundfile write

Signed-off-by: Shantanu Acharya <[email protected]>

* add docstrings and return values in __construct_filepaths as dictionary

Signed-off-by: Shantanu Acharya <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* fix wrong requirement (#4349)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Refactored path to manifest (#4251)

Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* IPA support for TTS (#4310)

* IPA tokenizer and G2P untested draft

Signed-off-by: Jocelyn Huang <[email protected]>

* Add IPA CMUdict and new heteronyms list

Signed-off-by: Jocelyn Huang <[email protected]>

* Add draft FastPitch IPA config

Signed-off-by: Jocelyn Huang <[email protected]>

* Minor bugfixes for IPA training

Signed-off-by: Jocelyn Huang <[email protected]>

* Add phoneme_probability to IPA G2P

Signed-off-by: Jocelyn Huang <[email protected]>

* Updates to IPA FastPitch training config

Signed-off-by: Jocelyn Huang <[email protected]>

* Update IPA dict and heteronyms file

Signed-off-by: Jocelyn Huang <[email protected]>

* Adjust default lr for IPA FastPitch to 1e-3

Signed-off-by: Jocelyn Huang <[email protected]>

* Rename IPA CMUdict to reflect date

Signed-off-by: Jocelyn Huang <[email protected]>

* Add docstrings for IPA tokenizer and G2P, update CMUdict path for config

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix IPA vocab ordering, add options to uppercase graphemes and remove stress symbols

Signed-off-by: Jocelyn Huang <[email protected]>

* Mark IPA classes as experimental

Signed-off-by: Jocelyn Huang <[email protected]>

* Update apostrophe-S cases

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Tn install (#4055)

* remove conda pynini requirement

Signed-off-by: Yang Zhang <[email protected]>

* remove remnants

Signed-off-by: Yang Zhang <[email protected]>

* merge with main

Signed-off-by: Yang Zhang <[email protected]>

* removing nlp collection dependency from text processing and thus breaking cyclic imports

Signed-off-by: Yang Zhang <[email protected]>

* fix

Signed-off-by: Yang Zhang <[email protected]>

* fix wrong requirement

Signed-off-by: Yang Zhang <[email protected]>

* fix bug in vi

Signed-off-by: Yang Zhang <[email protected]>

* update jenkins folders

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: ekmb <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* fix tutorial (#4352)

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* fix the post ln (#4350)

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* [Fix] Hanging for Fully Randomized Bucketing (#4348)

* Update container to 22.05 (#4329)

* update container to 22.05

Signed-off-by: ericharper <[email protected]>

* try adding safe directory

Signed-off-by: ericharper <[email protected]>

* try env var

Signed-off-by: ericharper <[email protected]>

* printenv

Signed-off-by: ericharper <[email protected]>

* try GIT_BRANCH

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* remove debug statements

Signed-off-by: ericharper <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>

* Merge r1.9.0 main (#4331)

* update branch

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* cleaned up TN/ ITN doc (#4119)

* cleaned up TN/ ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* Draft: Fix restoring from checkpoint for case when `model.common_dataset_parameters.label_vocab_dir` is provided (#4136)

* Fix restoring from checkpoint with label vocab dir

Signed-off-by: PeganovAnton <[email protected]>

* Add tests for various ways to pass label ids to model

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Do not create tmp directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix parameter name

Signed-off-by: PeganovAnton <[email protected]>

* finish cherry-pick op

Signed-off-by: PeganovAnton <[email protected]>

* Fix labels errors

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate stage

Signed-off-by: PeganovAnton <[email protected]>

* Change target branch

Signed-off-by: PeganovAnton <[email protected]>

* fix doc (#4146)

Signed-off-by: Yang Zhang <[email protected]>

* Tacotron2 retrain (#4103)

* fix yaml

Signed-off-by: treacker <[email protected]>

* Fix for new TTSDataset class

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* added wandb logging

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* fix numpy version

Signed-off-by: treacker <[email protected]>

* inference fix

Signed-off-by: treacker <[email protected]>

* removed old code

Signed-off-by: treacker <[email protected]>

* updated parser logic

Signed-off-by: treacker <[email protected]>

* reverted version update

Signed-off-by: treacker <[email protected]>

* refactored parser logic

Signed-off-by: treacker <[email protected]>

* Updated Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Refactored tutorial for Tacotron2

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Made backward compatibility

Signed-off-by: treacker <[email protected]>

* Update Jenkinsfile

Signed-off-by: treacker <[email protected]>

* Update tacotron.yaml

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* cleaned up TN/ ITN doc (#4119)

* cleaned up TN/ ITN doc

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>

* fix image

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: treacker <[email protected]>

* Check implicit grad acc in GLUE dataset building (#4123)

* Check implicit grad acc in GLUE dataset building

Signed-off-by: MaximumEntropy <[email protected]>

* Fix jenkins test for GLUE/XNLI

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Fixed jenkins

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

* Refactoring

Signed-off-by: treacker <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Multiprocess improvements (#4127)

* initial commit

Signed-off-by: nithinraok <[email protected]>

* start fix

Signed-off-by: nithinraok <[email protected]>

* improve multiprocessing speed while creating speaker dataset

Signed-off-by: nithinraok <[email protected]>

* updated scp to filelist

Signed-off-by: nithinraok <[email protected]>

* notebooks' link, typo and import  fix  (#4158)

* redo missing pr 4007

Signed-off-by: fayejf <[email protected]>

* remove extremely unreliable links

Signed-off-by: fayejf <[email protected]>

* update speaker docs (#4164)

* update speaker docs

Signed-off-by: nithinraok <[email protected]>

* chunks -> segments

Signed-off-by: nithinraok <[email protected]>

* Khz -> kHz

Signed-off-by: nithinraok <[email protected]>

* small fix (#4180)

Signed-off-by: fayejf <[email protected]>

* fix the server key value problem (#4196)

Signed-off-by: Yi Dong <[email protected]>

* Fix/punctuation/trainer required for setting test data (#4199)

* Draft of fix

Signed-off-by: PeganovAnton <[email protected]>

* Add warnings and replace global_step with current_epoch

Signed-off-by: PeganovAnton <[email protected]>

* Small improvements to warnings

Signed-off-by: PeganovAnton <[email protected]>

* Error and warning messages improvements

Signed-off-by: PeganovAnton <[email protected]>

* Replace self.trainer with self._trainer

Signed-off-by: PeganovAnton <[email protected]>

* Update ContextNet version (#4207)

Signed-off-by: smajumdar <[email protected]>

* fix bugs for dialogue tutorial (#4211)

Signed-off-by: Zhilin Wang <[email protected]>

* Dialogue tutorial fix (#4214)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* Add docs for Thutmose Tagger (#4173)

* Add docs for Thutmose Tagger

Signed-off-by: Alexandra Antonova <[email protected]>

* add level in docs

Signed-off-by: Alexandra Antonova <[email protected]>

* delete folder to avoid error with running when folder exists from previous run

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: ekmb <[email protected]>

* Dialogue tutorial fix (#4218)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* restore previously deleted files

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* Dialogue tutorial fix (#4221)

* fix bugs for dialogue tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* update path for convert_datasets.py due to conflict PR

Signed-off-by: Zhilin Wang <[email protected]>

* restore previously deleted files

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* update tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* fix syntax error in ipynb-file (#4228)

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* fix json serialize (#4235)

Signed-off-by: Yi Dong <[email protected]>

* Prompt Learning Typo Fixes (#4238)

* Prompt tuning notebook typo fixes

Signed-off-by: Virginia Adams <[email protected]>

* Update tutorials.rst

* Update prompt_learning.rst

* Update prompt_learning.rst

* fixing bug 3642622 (#4250)

* fixing bug 3642622

Signed-off-by: Ghasem Pasandi <[email protected]>

* fixing bug 3642622

Signed-off-by: Ghasem Pasandi <[email protected]>

Co-authored-by: Ghasem Pasandi <[email protected]>

* fix broken link in the tutorial (#4257)

Signed-off-by: Alexandra Antonova <[email protected]>

Co-authored-by: Alexandra Antonova <[email protected]>

* Typo fix, branch change, better download message (#4262)

Signed-off-by: Virginia Adams <[email protected]>

* Raise error if bicleaner is not installed in NMT Data preprocessing notebook (#4264)

* Raise error if bicleaner is not installed

Signed-off-by: MaximumEntropy <[email protected]>

* Clear cells

Signed-off-by: MaximumEntropy <[email protected]>

* Fix missing validation dataset, whitelist certain keywords for datasets (#4269)

* Fix missing validation dataset, whitelist certain keywords for datasets

Signed-off-by: smajumdar <[email protected]>

* Fix missing validation dataset, whitelist certain keywords for datasets

Signed-off-by: smajumdar <[email protected]>

* Update asr configs with num_workers and pin_memory (#4270)

Signed-off-by: smajumdar <[email protected]>

* Fix epoch end (#4265)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>

* Set Save on train end to false (#4274)

* Set Save on train end to false

Signed-off-by: Virginia Adams <[email protected]>

* Update prompt_learning.rst

* Update prompt_learning.rst

* Update YAML (#4261)

Signed-off-by: MaximumEntropy <[email protected]>

* Updated config to fix CI test OOM error (#4279)

* Updated config to fix CI test issue

Signed-off-by: Virginia Adams <[email protected]>

* Increased num workers

Signed-off-by: Virginia Adams <[email protected]>

* verbose k2 install, skip if failed (#4289)

Signed-off-by: Aleksandr Laptev <[email protected]>

Co-authored-by: Aleksandr Laptev <[email protected]>

* Changed total virtual prompt tokens (#4295)

* Changed total virtual prompt tokens

Signed-off-by: Virginia Adams <[email protected]>

* put number of workers back

Signed-off-by: Virginia Adams <[email protected]>

* upper bound lightning

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* remove duplicate test

Signed-off-by: ericharper <[email protected]>

* fix tn test cases

Signed-off-by: ericharper <[email protected]>

* add another safe.directory

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: PeganovAnton <[email protected]>
Co-authored-by: treacker <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: ekmb <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: Ghasem <[email protected]>
Co-authored-by: Ghasem Pasandi <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>

* fix full_randn bucket hang

Signed-off-by: stevehuang52 <[email protected]>

* remove unused variables

Signed-off-by: stevehuang52 <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: PeganovAnton <[email protected]>
Co-authored-by: treacker <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: ekmb <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: Ghasem <[email protected]>
Co-authored-by: Ghasem Pasandi <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Bits of RADTTS support (#4343)

* Bits of RADTTS support

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed args mismatch

Signed-off-by: Boris Fomitchev <[email protected]>

* Style

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressed review comments

Signed-off-by: Boris Fomitchev <[email protected]>

* More review comments

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Prompt Learning Pipeline Parallel (#4291)

* Added get_forward_output_and_loss_func and updated train/val steps

Signed-off-by: Virginia Adams <[email protected]>

* Added preprocess flag before prompt table/encoder access

Signed-off-by: Virginia Adams <[email protected]>

* Made two optimizer groups, one for frozen, one for soft prompt

Signed-off-by: Virginia Adams <[email protected]>

* Pipeline parallel working

Signed-off-by: Virginia Adams <[email protected]>

* Still figuring out setting lr/sched for one param group

Signed-off-by: Virginia Adams <[email protected]>

* Set betas to zero

Signed-off-by: Virginia Adams <[email protected]>

* Only unfreeze one sublayer with lr 0.0

Signed-off-by: Virginia Adams <[email protected]>

* Pipeline parallel working w/ one optimizer

Signed-off-by: Virginia Adams <[email protected]>

* Trying to fix Jenkins file

Signed-off-by: Virginia Adams <[email protected]>

* Trying to fix Jenkins file

Signed-off-by: Virginia Adams <[email protected]>

* Getting updated jenkins test to work

Signed-off-by: Virginia Adams <[email protected]>

* Getting updated jenkins test to work

Signed-off-by: Virginia Adams <[email protected]>

* added prompt learning tp and pp CI tests

Signed-off-by: Virginia Adams <[email protected]>

* Added amp_o2 model compatibility

Signed-off-by: Virginia Adams <[email protected]>

* Made CI test smaller

Signed-off-by: Virginia Adams <[email protected]>

* Still trying to get Jenkins to work

Signed-off-by: Virginia Adams <[email protected]>

* Still trying to get Jenkins to work

Signed-off-by: Virginia Adams <[email protected]>

* Temporarily moving prompt learning CI test to beginning

Signed-off-by: Virginia Adams <[email protected]>

* Changing the layer being unfrozen

Signed-off-by: Virginia Adams <[email protected]>

* debug jenkins

Signed-off-by: Virginia Adams <[email protected]>

* Move pp unfreeze to init

Signed-off-by: Virginia Adams <[email protected]>

* Try to make Jenkins test parallel

Signed-off-by: Virginia Adams <[email protected]>

* Fix python formatting

Signed-off-by: Virginia Adams <[email protected]>

* Moved prompt learning tests back to where they belong

Signed-off-by: Virginia Adams <[email protected]>

* add back checkpoint convertion CI test

Signed-off-by: Virginia Adams <[email protected]>

* Revert "add back checkpoint convertion CI test"

This reverts commit 61e2ffcdefe964c8e74b74d8c10906ae29f32b6d.

* Add back checkpoint conversion test

Signed-off-by: Virginia Adams <[email protected]>

* Setting requires grad to True everywhere

Signed-off-by: Virginia Adams <[email protected]>

* Updated config comments and simplified param group code

Signed-off-by: Virginia Adams <[email protected]>

* Added comment on frozen_model having lr=0.0

Signed-off-by: Virginia Adams <[email protected]>

* Added configure optimizers methods

Signed-off-by: Virginia Adams <[email protected]>

* Set amp_o2 to false

Signed-off-by: Virginia Adams <[email protected]>

* removed o2 code

Signed-off-by: Virginia Adams <[email protected]>

* Python formatting fix

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* auto switch conformer encoder adapter in_features (#4354)

Signed-off-by: Shantanu Acharya <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Dataloader, collector, loss and metric for multiscale diarization decoder  (#4187)

* First commit

Signed-off-by: Taejin Park <[email protected]>

* Checked functionality and imports

Signed-off-by: Taejin Park <[email protected]>

* fixed import issues

Signed-off-by: Taejin Park <[email protected]>

* Removed the changes made by mistake

Signed-off-by: Taejin Park <[email protected]>

* Style fix

Signed-off-by: Taejin Park <[email protected]>

* Fixed LGTM errors 001

Signed-off-by: Taejin Park <[email protected]>

* Fixed LGTM and style fix

Signed-off-by: Taejin Park <[email protected]>

* Changed docstrings

Signed-off-by: Taejin Park <[email protected]>

* LGTM again

Signed-off-by: Taejin Park <[email protected]>

* Removed unnecessary torch setting lines

Signed-off-by: Taejin Park <[email protected]>

* Style fix and isort

Signed-off-by: Taejin Park <[email protected]>

* jbalam-nv comments reflected

Signed-off-by: Taejin Park <[email protected]>

* style fix

Signed-off-by: Taejin Park <[email protected]>

* Reflected comments and created _diar_label.py

Signed-off-by: Taejin Park <[email protected]>

* Typo fix and style fix

Signed-off-by: Taejin Park <[email protected]>

* Fixed target_spks[0] index error

Signed-off-by: Taejin Park <[email protected]>

* style fix

Signed-off-by: Taejin Park <[email protected]>

* LGTM unused import IterDataset

Signed-off-by: Taejin Park <[email protected]>

* revert collection doc year

Signed-off-by: Taejin Park <[email protected]>

* Code format error in collections.py

Signed-off-by: Taejin Park <[email protected]>

* fix collections space format error

Signed-off-by: Taejin Park <[email protected]>

* merged main correctly

Signed-off-by: Taejin Park <[email protected]>

* style fix

Signed-off-by: Taejin Park <[email protected]>

* Reflected all comments and tested

Signed-off-by: Taejin Park <[email protected]>

* style fix and LGTM

Signed-off-by: Taejin Park <[email protected]>

* rttm_filepath to rttm_file and removed self included funcs, tested

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Add ASR CTC Decoding module (#4342)

* Initial commit

Signed-off-by: smajumdar <[email protected]>

* Full support for decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Temp

Signed-off-by: smajumdar <[email protected]>

* Fix labels of y_sequence

Signed-off-by: smajumdar <[email protected]>

* Set support for sentencepiece subword merging

Signed-off-by: smajumdar <[email protected]>

* Fix char and word based token merge alignment

Signed-off-by: smajumdar <[email protected]>

* Revert incorrect change

Signed-off-by: smajumdar <[email protected]>

* Update docstring

Signed-off-by: smajumdar <[email protected]>

* Improve compatibility with greedy tokens and log probs

Signed-off-by: smajumdar <[email protected]>

* Update scripts to use decoding strategy

Signed-off-by: smajumdar <[email protected]>

* Add tests and docs

Signed-off-by: smajumdar <[email protected]>

* Add tests and docs

Signed-off-by: smajumdar <[email protected]>

* Fix speaker decoder timestamps

Signed-off-by: smajumdar <[email protected]>

* Fix speaker decoder timestamps

Signed-off-by: smajumdar <[email protected]>

* Fix decoding of ctc models

Signed-off-by: smajumdar <[email protected]>

* Address reviewer comments

Signed-off-by: smajumdar <[email protected]>

* Address reviewer comments

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Option to disable mp in VAD via num_workers=1 (#4317)

* Option to disable mp in VAD via num_workers=1

In certain environments Python multiprocessing can deadlock. This adds a convenient way to disable it by setting num_workers to 1.

Signed-off-by: Georg Kucsko <[email protected]>

* add none handling

Signed-off-by: Georg Kucsko <[email protected]>

* additional none handling

Signed-off-by: Georg Kucsko <[email protected]>

Co-authored-by: fayejf <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* remove redundant bias expand (#4382)

* remove redundant bias expand

Signed-off-by: Xiaowei Ren <[email protected]>

* delete redundant code

Signed-off-by: Xiaowei Ren <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Add option for specifying wandb save_dir from config (#4379)

* give option to user to specify wandb save dir via config

Signed-off-by: Shantanu Acharya <[email protected]>

* create save_dir directory for wandb logger if not exists

Signed-off-by: Shantanu Acharya <[email protected]>

* update save_dir get method with a default value

Signed-off-by: Shantanu Acharya <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Quick wav2vec fix. In-place operation adding convolutional positions to encoder was overwriting leaf history. Wasn't caught on previous torch versions. (#4383)

Signed-off-by: tbartley94 <[email protected]>

Co-authored-by: tbartley94 <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. (#4388)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Hainan Xu <[email protected]>

* Merge r1.10.0 main (#4398)

* update branch

Signed-off-by: ericharper <[email protected]>

* Set headscale false (#4364)

Signed-off-by: MaximumEntropy <[email protected]>

* Add wandb as dependency (#4365)

Signed-off-by: smajumdar <[email protected]>

* Raise trainer error (#4356)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* Set headscale false (#4364) (#4366)

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: smajumdar <[email protected]>

* Finetuning changes for BART (#4003)

* Temp

Signed-off-by: MaximumEntropy <[email protected]>

* Checkpoint converter to nemo for bart

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* Make position embedding expansion specific to a batch to avoid checkpoint size mismatches (#4357)

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Fix logging warning

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* Fix electronic bug, new time ITN rule (#4355)

* fix electronic bug

Signed-off-by: ekmb <[email protected]>

* add new itn time rule

Signed-off-by: ekmb <[email protected]>

* revert domain changes

Signed-off-by: ekmb <[email protected]>

* remove repetition

Signed-off-by: ekmb <[email protected]>

* Correct support for dataclasses in default module dim (#4372)

* Correct support for dataclasses in default module dim

Signed-off-by: smajumdar <[email protected]>

* Fix path for save of results

Signed-off-by: smajumdar <[email protected]>

* fix pad id bug (#4377)

Signed-off-by: Yi Dong <[email protected]>

* Question answering bug fix (#4381)

* refactor dialogue state tracking for modelling/dataset interoperability

Signed-off-by: Zhilin Wang <[email protected]>

* fix style changes

Signed-off-by: Zhilin Wang <[email protected]>

* fix typo

Signed-off-by: Zhilin Wang <[email protected]>

* fix style raised by lgtm

Signed-off-by: Zhilin Wang <[email protected]>

* fix style formatting

Signed-off-by: Zhilin Wang <[email protected]>

* update template to include description of intent

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* changes based on requests in review

Signed-off-by: Zhilin Wang <[email protected]>

* add compatibility with assistant dataset

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins

Signed-off-by: Zhilin Wang <[email protected]>

* remove dialogue_state_tracking

Signed-off-by: Zhilin Wang <[email protected]>

* update huggingface utils for dialogue

Signed-off-by: Zhilin Wang <[email protected]>

* rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

Signed-off-by: Zhilin Wang <[email protected]>

* fix style

Signed-off-by: Zhilin Wang <[email protected]>

* style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkinsfile for SGDGEN

Signed-off-by: Zhilin Wang <[email protected]>

* fix typo

Signed-off-by: Zhilin Wang <[email protected]>

* add docstrings for assistant data processor

Signed-off-by: Zhilin Wang <[email protected]>

* update Jenkins for SGDGEN local checkpoint

Signed-off-by: Zhilin Wang <[email protected]>

* update style

Signed-off-by: Zhilin Wang <[email protected]>

* use local vocab file for Jenkinsfile

Signed-off-by: Zhilin Wang <[email protected]>

* patch for Jenkins CI using local file

Signed-off-by: Zhilin Wang <[email protected]>

* add slot filling prediction and metrics

Signed-off-by: Zhilin Wang <[email protected]>

* remove unused code

Signed-off-by: Zhilin Wang <[email protected]>

* style fix

…
Showing 5 changed files with 671 additions and 3 deletions.
2 changes: 1 addition & 1 deletion nemo/collections/asr/modules/__init__.py
@@ -35,5 +35,5 @@
 from nemo.collections.asr.modules.lstm_decoder import LSTMDecoder
 from nemo.collections.asr.modules.msdd_diarizer import MSDD_module
 from nemo.collections.asr.modules.rnn_encoder import RNNEncoder
-from nemo.collections.asr.modules.rnnt import RNNTDecoder, RNNTDecoderJointSSL, RNNTJoint
+from nemo.collections.asr.modules.rnnt import RNNTDecoder, RNNTDecoderJointSSL, RNNTJoint, StatelessTransducerDecoder
 from nemo.collections.asr.modules.squeezeformer_encoder import SqueezeformerEncoder, SqueezeformerEncoderAdapter

0 comments on commit c0bfa6f
