
Add recipe for the yes_no dataset. #16

Merged · 11 commits · Aug 23, 2021

Conversation
Conversation

@csukuangfj (Collaborator) commented Aug 21, 2021

There are 60 sound files in the dataset: 30 are used for training and the other 30 for testing.

The decoding log is below:

$ ./tdnn/decode.py --epoch 49
2021-08-21 17:20:27,047 INFO [decode.py:321] Decoding started
2021-08-21 17:20:27,047 INFO [decode.py:322] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 23, 'subsampling_factor': 1, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'method': '1best', 'num_paths': 30, 'epoch': 49, 'avg': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-21 17:20:27,048 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-21 17:20:27,109 INFO [decode.py:331] device: cuda:0
2021-08-21 17:20:31,515 INFO [decode.py:390] averaging ['tdnn/exp/epoch-35.pt', 'tdnn/exp/epoch-36.pt', 'tdnn/exp/epoch-37.pt', 'tdnn/exp/epoch-38.pt', 'tdnn/exp/epoch-39.pt', 'tdnn/exp/epoch-40.pt', 'tdnn/exp/epoch-41.pt', 'tdnn/exp/epoch-42.pt', 'tdnn/exp/epoch-43.pt', 'tdnn/exp/epoch-44.pt', 'tdnn/exp/epoch-45.pt', 'tdnn/exp/epoch-46.pt', 'tdnn/exp/epoch-47.pt', 'tdnn/exp/epoch-48.pt', 'tdnn/exp/epoch-49.pt']
2021-08-21 17:20:31,540 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-21 17:20:31,540 INFO [asr_datamodule.py:243] About to get test cuts
2021-08-21 17:20:31,846 INFO [decode.py:270] batch 0/8, cuts processed until now is 4
2021-08-21 17:20:33,255 INFO [decode.py:285] The transcripts are stored in tdnn/exp/recogs-test-no_rescore.txt
2021-08-21 17:20:33,256 INFO [utils.py:300] [test-no_rescore] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
2021-08-21 17:20:33,258 INFO [decode.py:294] Wrote detailed error stats to tdnn/exp/errs-test-no_rescore.txt
2021-08-21 17:20:33,258 INFO [decode.py:308]
For test, WER of different settings are:
no_rescore      0.42    best for test

2021-08-21 17:20:33,258 INFO [decode.py:418] Done!

You can see there is only 1 deletion error.
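(The reported figure is simply the error count over the 240 reference words: WER = (0 ins + 1 del + 0 sub) / 240 ≈ 0.42%.)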


The dataset is so small that training and decoding can run on the CPU.

It is useful for education and demonstration purposes, as it involves almost all the concepts used in training and decoding, i.e.:

  • data preparation
  • lexicon preparation
  • LM preparation
  • HLG construction
  • CTC training
  • 1best decoding

(It does not include LM rescoring; a sketch of the HLG construction step follows below.)
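To make the HLG step concrete, here is a minimal, simplified sketch using k2. It is not the recipe's actual script: the real one also handles disambiguation symbols and self-loops, and the G file name and token count below are assumptions for illustration only.

    import torch
    import k2

    # Simplified HLG construction: compose the CTC topology (H), the
    # lexicon FST (L), and the grammar/LM FST (G) into one decoding graph.
    L = k2.Fsa.from_dict(torch.load("data/lang_phone/L.pt"))  # lexicon FST
    G = k2.Fsa.from_dict(torch.load("data/lm/G.pt"))          # grammar FST (assumed file name)

    LG = k2.compose(k2.arc_sort(L), k2.arc_sort(G))
    LG = k2.arc_sort(k2.connect(k2.determinize(LG)))

    max_token_id = 3  # hypothetical: highest phone ID in the yesno lexicon
    H = k2.ctc_topo(max_token_id)  # CTC topology
    HLG = k2.arc_sort(k2.compose(H, LG))

    torch.save(HLG.as_dict(), "data/lang_phone/HLG.pt")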


Requires lhotse-speech/lhotse#380

--

TODOs:

  • Refactor the training and decoding code and remove parts that are not needed
  • Add GitHub Actions to run it
  • Run it in a Colab notebook (see the Open In Colab badge)
  • Support inference with a pre-trained model

@csukuangfj (Collaborator, Author)

The code for selecting the training set and test set can be found in
lhotse-speech/lhotse#380

See https://github.com/lhotse-speech/lhotse/blob/ba534a08fc17196f4caf27433587a54779991826/lhotse/recipes/yesno.py#L138-L143

    wave_files = list(corpus_dir.glob("*.wav"))
    assert len(wave_files) == 60

    wave_files.sort()
    train_set = wave_files[::2]   # even-indexed files -> training set
    test_set = wave_files[1::2]   # odd-indexed files -> test set

    assert len(train_set) == 30
    assert len(test_set) == 30
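Since the list is sorted before slicing, the interleaved split is deterministic and does not depend on the order in which glob() returns the files.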

@danpovey (Collaborator)

Cool!


first_token_disambig_id = lexicon.token_table["#0"]
first_word_disambig_id = lexicon.word_table["#0"]

Collaborator:

Do we need to make the following k2 operations run on GPU if there are devices available?

@csukuangfj (Collaborator, Author):

For the yesno dataset, the graphs are tiny, so it is fine to run them on the CPU.

For the librispeech dataset, I think it's worthwhile to have some benchmarks; if the GPU is faster, we can switch to it.
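As a minimal illustration (not code from this PR), switching the graphs to a GPU when one is available needs little more than k2.Fsa's PyTorch-style .to(device):

    import torch
    import k2

    def maybe_move_to_gpu(fsa: k2.Fsa) -> k2.Fsa:
        # k2 FSAs follow PyTorch's device model; subsequent k2 operations
        # on the returned FSA run on the device it lives on.
        device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
        return fsa.to(device)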


"""
This file computes fbank features of the yesno dataset.
Its looks for manifests in the directory data/manifests.
Collaborator:

Its -> It ?

shuffle=self.args.shuffle,
num_buckets=self.args.num_buckets,
bucket_method="equal_duration",
drop_last=True,
Collaborator:

Do we need to make these two arguments configurable?

@csukuangfj (Collaborator, Author):

Yes, I will make them configurable.
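For illustration, one hypothetical way to expose such sampler options through the data module's argument parser (the option names and the str2bool helper are illustrative, not the code eventually merged):

    import argparse

    def str2bool(v: str) -> bool:
        # Parse common textual booleans from the command line.
        return v.lower() in ("yes", "true", "t", "1")

    def add_sampler_arguments(parser: argparse.ArgumentParser) -> None:
        group = parser.add_argument_group("sampler")
        group.add_argument("--bucket-method", type=str, default="equal_duration",
                           help="How the bucketing sampler groups cuts into buckets.")
        group.add_argument("--drop-last", type=str2bool, default=True,
                           help="Whether to drop the last incomplete batch.")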

@csukuangfj (Collaborator, Author)

I just wrote a Colab notebook that runs the yesno recipe on the CPU.

Training for 50 epochs takes under 2 minutes (on the CPU).

See the Open In Colab badge.

You will see the following in the above Colab notebook:

  • Environment setup (installing torch, torchaudio, k2, lhotse, and icefall)
  • Data preparation
  • Training
  • Decoding

Part of the training log is given below:

2021-08-22 15:18:20,422 INFO [train.py:460] Training started
2021-08-22 15:18:20,423 INFO [train.py:461] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.001, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'num_epochs': 50, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-22 15:18:22,039 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-22 15:18:22,187 INFO [asr_datamodule.py:132] About to get train cuts
2021-08-22 15:18:22,188 INFO [asr_datamodule.py:237] About to get train cuts
2021-08-22 15:18:22,191 INFO [asr_datamodule.py:135] About to create train dataset
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:197] Using SingleCutSampler.
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:203] About to create train dataloader
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-22 15:18:22,192 INFO [asr_datamodule.py:243] About to get test cuts
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
2021-08-22 15:18:22,584 INFO [train.py:412] Epoch 0, batch 0, batch avg loss 1.0879, total avg loss: 1.0879, batch size: 4
2021-08-22 15:18:23,268 INFO [train.py:412] Epoch 0, batch 10, batch avg loss 0.5386, total avg loss: 0.7594, batch size: 4
2021-08-22 15:18:23,728 INFO [train.py:428] Epoch 0, valid loss 0.9149, best valid loss: 0.9149 best valid epoch: 0
2021-08-22 15:18:24,213 INFO [train.py:412] Epoch 0, batch 20, batch avg loss 0.3465, total avg loss: 0.6211, batch size: 3
2021-08-22 15:18:24,614 INFO [train.py:428] Epoch 0, valid loss 0.3521, best valid loss: 0.3521 best valid epoch: 0
2021-08-22 15:18:24,628 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-0.pt
2021-08-22 15:18:24,804 INFO [train.py:412] Epoch 1, batch 0, batch avg loss 0.4360, total avg loss: 0.4360, batch size: 5
2021-08-22 15:18:25,460 INFO [train.py:412] Epoch 1, batch 10, batch avg loss 0.2444, total avg loss: 0.3159, batch size: 5
2021-08-22 15:18:25,756 INFO [train.py:428] Epoch 1, valid loss 0.1264, best valid loss: 0.1264 best valid epoch: 1
2021-08-22 15:18:26,288 INFO [train.py:412] Epoch 1, batch 20, batch avg loss 0.2659, total avg loss: 0.2966, batch size: 3
2021-08-22 15:18:26,617 INFO [train.py:428] Epoch 1, valid loss 0.1510, best valid loss: 0.1264 best valid epoch: 1
2021-08-22 15:18:26,635 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-1.pt
2021-08-22 15:18:26,796 INFO [train.py:412] Epoch 2, batch 0, batch avg loss 0.1710, total avg loss: 0.1710, batch size: 4
2021-08-22 15:18:27,411 INFO [train.py:412] Epoch 2, batch 10, batch avg loss 0.2394, total avg loss: 0.2257, batch size: 5
2021-08-22 15:18:27,650 INFO [train.py:428] Epoch 2, valid loss 0.1196, best valid loss: 0.1196 best valid epoch: 2
2021-08-22 15:18:28,214 INFO [train.py:412] Epoch 2, batch 20, batch avg loss 0.2267, total avg loss: 0.2257, batch size: 3
2021-08-22 15:18:28,482 INFO [train.py:428] Epoch 2, valid loss 0.0662, best valid loss: 0.0662 best valid epoch: 2
2021-08-22 15:18:28,496 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-2.pt

...

2021-08-22 15:20:03,495 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-47.pt
2021-08-22 15:20:03,656 INFO [train.py:412] Epoch 48, batch 0, batch avg loss 0.0124, total avg loss: 0.0124, batch size: 4
2021-08-22 15:20:04,250 INFO [train.py:412] Epoch 48, batch 10, batch avg loss 0.0127, total avg loss: 0.0174, batch size: 4
2021-08-22 15:20:04,547 INFO [train.py:428] Epoch 48, valid loss 0.0108, best valid loss: 0.0108 best valid epoch: 48
2021-08-22 15:20:05,095 INFO [train.py:412] Epoch 48, batch 20, batch avg loss 0.0191, total avg loss: 0.0188, batch size: 4
2021-08-22 15:20:05,432 INFO [train.py:428] Epoch 48, valid loss 0.0106, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:05,487 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-48.pt
2021-08-22 15:20:05,686 INFO [train.py:412] Epoch 49, batch 0, batch avg loss 0.0168, total avg loss: 0.0168, batch size: 4
2021-08-22 15:20:06,361 INFO [train.py:412] Epoch 49, batch 10, batch avg loss 0.0193, total avg loss: 0.0228, batch size: 4
2021-08-22 15:20:06,733 INFO [train.py:428] Epoch 49, valid loss 0.0113, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:07,312 INFO [train.py:412] Epoch 49, batch 20, batch avg loss 0.0193, total avg loss: 0.0206, batch size: 3
2021-08-22 15:20:07,680 INFO [train.py:428] Epoch 49, valid loss 0.0109, best valid loss: 0.0106 best valid epoch: 48
2021-08-22 15:20:07,707 INFO [checkpoint.py:45] Saving checkpoint to tdnn/exp/epoch-49.pt
2021-08-22 15:20:07,710 INFO [train.py:532] Done!

The decoding log is:

2021-08-22 15:21:07,711 INFO [decode.py:261] Decoding started
2021-08-22 15:21:07,711 INFO [decode.py:262] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 23, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 49, 'avg': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2}
2021-08-22 15:21:07,712 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-22 15:21:07,727 INFO [decode.py:271] device: cpu
2021-08-22 15:21:07,731 INFO [decode.py:291] averaging ['tdnn/exp/epoch-35.pt', 'tdnn/exp/epoch-36.pt', 'tdnn/exp/epoch-37.pt', 'tdnn/exp/epoch-38.pt', 'tdnn/exp/epoch-39.pt', 'tdnn/exp/epoch-40.pt', 'tdnn/exp/epoch-41.pt', 'tdnn/exp/epoch-42.pt', 'tdnn/exp/epoch-43.pt', 'tdnn/exp/epoch-44.pt', 'tdnn/exp/epoch-45.pt', 'tdnn/exp/epoch-46.pt', 'tdnn/exp/epoch-47.pt', 'tdnn/exp/epoch-48.pt', 'tdnn/exp/epoch-49.pt']
/content/icefall/icefall/checkpoint.py:129: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:450.)
  avg[k] //= n
2021-08-22 15:21:07,755 INFO [asr_datamodule.py:216] About to get test cuts
2021-08-22 15:21:07,755 INFO [asr_datamodule.py:243] About to get test cuts
2021-08-22 15:21:07,891 INFO [decode.py:203] batch 0/8, cuts processed until now is 4
2021-08-22 15:21:08,111 INFO [decode.py:240] The transcripts are stored in tdnn/exp/recogs-test_set.txt
2021-08-22 15:21:08,112 INFO [utils.py:301] [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
2021-08-22 15:21:08,113 INFO [decode.py:248] Wrote detailed error stats to tdnn/exp/errs-test_set.txt
2021-08-22 15:21:08,113 INFO [decode.py:311] Done!

@csukuangfj (Collaborator, Author) commented Aug 22, 2021

@pzelasko

Could you have a look at the above Colab notebook about the installation of lhotse?

The shebang of /usr/local/bin/lhotse is changed from #!/usr/bin/env python3 to #!python after installation, and I have to correct it manually.


[EDITED]: If I don't, it throws the following error while running egs/yesno/ASR/prepare.sh:

2021-08-22 15:55:43 (prepare.sh:24:main) dl_dir: /content/icefall/egs/yesno/ASR/download
2021-08-22 15:55:43 (prepare.sh:27:main) stage 0: Download data
./prepare.sh: /usr/local/bin/lhotse: python: bad interpreter: No such file or directory
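A minimal sketch of such a manual fix, using the path from the error above:

    from pathlib import Path

    # Restore a portable shebang on the installed lhotse entry point.
    script = Path("/usr/local/bin/lhotse")
    text = script.read_text()
    if text.startswith("#!python"):
        script.write_text("#!/usr/bin/env python3" + text[len("#!python"):])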

@pzelasko (Collaborator), replying to the comment above:

Yes, I'll have a look tomorrow.
