
Refactor asr_datamodule. #15

Merged (3 commits) on Aug 21, 2021

Conversation

@csukuangfj (Collaborator) commented Aug 20, 2021

The refactoring is based on the asr datamodule from the gigaspeech recipe in snowfall, but decoding throws the error below. @pzelasko could you please have a look at it? Thanks!

(I am using the latest lhotse, at commit d24e6faa6f26a5034cebf1d97dc1bd933f285a03.)

2021-08-21 00:02:42,414 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-21 00:02:42,552 INFO [decode.py:336] device: cuda:0
2021-08-21 00:02:55,867 INFO [checkpoint.py:75] Loading checkpoint from tdnn_lstm_ctc/exp/epoch-19.pt
/ceph-fj/open-source/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames or max_cuts constraints - we'll return it anyway. Consider increasing max_frames/max_cuts.
  warnings.warn(
ERROR:root:Error while extracting the features for cut with ID 8224-274384-0008-1657-0 -- details:
MonoCut(id='8224-274384-0008-1657-0', start=0, duration=13.42, channel=0, supervisions=[SupervisionSegment(id='8224-274384-0008', recording_id='8224-274384-0008', start=0.0, duration=13.42, channel=0, text='THE GOOD NATURED AUDIENCE IN PITY TO FALLEN MAJESTY SHOWED FOR ONCE GREATER DEFERENCE TO THE KING THAN TO THE MINISTER AND SUNG THE PSALM WHICH THE FORMER HAD CALLED FOR', language='English', speaker='8224', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=1342, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=13.42, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='a83019d1-3639-47a3-8790-ae2d82cde42e', recording_id=None, channels=0), recording=Recording(id='8224-274384-0008', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/8224/274384/8224-274384-0008.flac')], sampling_rate=16000, num_samples=214720, duration=13.42, transforms=None))
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 102, in _get_item
    return _GLOBAL_DATASET_CACHE[cut_ids]
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/speech_recognition.py", line 105, in __getitem__
    inputs, _ = self.input_strategy(cuts)
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/input_strategies.py", line 244, in __call__
    features = self.extractor.extract(samples, cuts[idx].sampling_rate)
AttributeError: 'PrecomputedFeatures' object has no attribute 'extract'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./tdnn_lstm_ctc/decode.py", line 428, in <module>
    main()
  File "/ceph-fj/fangjun/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "./tdnn_lstm_ctc/decode.py", line 411, in main
    results_dict = decode_dataset(
  File "./tdnn_lstm_ctc/decode.py", line 246, in decode_dataset
    for batch_idx, batch in enumerate(dl):
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 85, in __next__
    return self._retrieve_one()
  File "/ceph-fj/open-source/lhotse/lhotse/dataset/dataloading.py", line 79, in _retrieve_one
    return self._futures.popleft().result()
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/root/fangjun/open-source/pyenv/versions/3.8.6/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
AttributeError: 'PrecomputedFeatures' object has no attribute 'extract'
ERROR:root:Error while extracting the features for cut with ID 61-70970-0038-2373-0 -- details:
MonoCut(id='61-70970-0038-2373-0', start=0, duration=10.4, channel=0, supervisions=[SupervisionSegment(id='61-70970-0038', recording_id='61-70970-0038', start=0.0, duration=10.4, channel=0, text='THE OLD SERVANT TOLD HIM QUIETLY AS THEY CREPT BACK TO GAMEWELL THAT THIS PASSAGE WAY LED FROM THE HUT IN THE PLEASANCE TO SHERWOOD AND THAT GEOFFREY FOR THE TIME WAS HIDING WITH THE OUTLAWS IN THE FOREST', language='English', speaker='61', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=1040, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=10.4, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='f6d871ec-b1e5-4a5b-bd1f-0b95f7972946', recording_id=None, channels=0), recording=Recording(id='61-70970-0038', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/61/70970/61-70970-0038.flac')], sampling_rate=16000, num_samples=166400, duration=10.4, transforms=None))
ERROR:root:Error while extracting the features for cut with ID 1995-1837-0024-1526-0 -- details:
MonoCut(id='1995-1837-0024-1526-0', start=0, duration=5.385, channel=0, supervisions=[SupervisionSegment(id='1995-1837-0024', recording_id='1995-1837-0024', start=0.0, duration=5.385, channel=0, text='FOR A WHILE SHE LAY IN HER CHAIR IN HAPPY DREAMY PLEASURE AT SUN AND BIRD AND TREE', language='English', speaker='1995', gender=None, custom=None, alignment=None)], features=Features(type='fbank', num_frames=539, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=5.385, storage_type='lilcom_hdf5', storage_path='data/fbank/feats_test-clean/feats-0.h5', storage_key='0811ff5e-48ed-467d-873c-6a9f742472e0', recording_id=None, channels=0), recording=Recording(id='1995-1837-0024', sources=[AudioSource(type='file', channels=[0], source='data/LibriSpeech/test-clean/1995/1837/1995-1837-0024.flac')], sampling_rate=16000, num_samples=86160, duration=5.385, transforms=None))
ERROR:root:Error while extracting the features for cut with ID 121-127105-0026-2191-0 -- details:

... ... 

-                    Fbank(FbankConfig(num_mel_bins=80))
+                    Fbank(FbankConfig(num_mel_bins=80), num_workers=4)
                     if self.args.on_the_fly_feats
                     else PrecomputedFeatures()
                 ),
Collaborator

I think this closing parenthesis has to be moved 3 lines up, so the code looks like:

input_strategy=OnTheFlyFeatures(
    Fbank(FbankConfig(num_mel_bins=80), num_workers=4)
) if self.args.on_the_fly_feats
else PrecomputedFeatures(),
return_cuts=...

Currently when args.on_the_fly_feats = False, it tries to use OnTheFlyFeatures(PrecomputedFeatures()) which is an error.
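A minimal self-contained sketch of the pitfall (stand-in classes, not the real lhotse ones): without parentheses around the whole conditional expression, the `if`/`else` binds inside the `OnTheFlyFeatures(...)` call, so `PrecomputedFeatures()` gets wrapped instead of selected.

```python
class Fbank:                 # stand-in for lhotse's Fbank
    pass

class PrecomputedFeatures:   # stand-in for lhotse's PrecomputedFeatures
    pass

class OnTheFlyFeatures:      # stand-in for lhotse's OnTheFlyFeatures
    def __init__(self, extractor):
        self.extractor = extractor

on_the_fly_feats = False

# Buggy layout: the conditional is an argument, so PrecomputedFeatures()
# ends up wrapped by OnTheFlyFeatures and later lacks .extract().
buggy = OnTheFlyFeatures(
    Fbank() if on_the_fly_feats else PrecomputedFeatures()
)
assert isinstance(buggy.extractor, PrecomputedFeatures)

# Fixed layout: the conditional selects the whole strategy object.
fixed = (
    OnTheFlyFeatures(Fbank()) if on_the_fly_feats else PrecomputedFeatures()
)
assert isinstance(fixed, PrecomputedFeatures)
```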

Collaborator Author

Thanks!
In that case, I think we should also change snowfall to fix it, since this block of code comes from snowfall.
See
https://github.com/k2-fsa/snowfall/blob/1f79957e9716c3f980c151df5b1d77bc4bb7ce78/egs/gigaspeech/asr/simple_v1/asr_datamodule.py#L337-L344

            test = K2SpeechRecognitionDataset(
                input_strategy=(
                    OnTheFlyFeatures(Fbank(FbankConfig(num_mel_bins=80)), num_workers=8)
                    if self.args.on_the_fly_feats
                    else PrecomputedFeatures()
                ),
                return_cuts=self.args.return_cuts,
            )

Collaborator

Yes, you're right.

# persistent_workers=False,
# )

train_dl = LhotseDataLoader(
Collaborator

I would say be careful with LhotseDataLoader -- it is experimental and I'm hoping to avoid needing to use it in the future. It overcomes some I/O issues with GigaSpeech, but for LibriSpeech you shouldn't see any difference in perf with a regular DataLoader.

The downside of LhotseDataLoader is that it doesn't have the elaborate shutdown mechanisms of PyTorch DataLoader and might leave your script running after the training has finished (i.e., everything runs OK, but the script doesn't exit by itself).

 input_strategy=OnTheFlyFeatures(
-    Fbank(FbankConfig(num_mel_bins=80))
+    Fbank(FbankConfig(num_mel_bins=80), num_workers=4)
Collaborator

For LibriSpeech, remove the num_workers argument from OnTheFlyFeatures -- it will attempt to spawn extra processes that are not needed for LibriSpeech (they help with GigaSpeech, which has long OPUS recordings).

@@ -0,0 +1,362 @@
import argparse
Collaborator

Not sure if it makes sense -- but maybe it's sufficient to have a single copy of this script one level of directories up, and if any recipe requires non-standard processing, it would make its own copy at the "current" directory level?

Collaborator Author

How about putting symlinks to this file in the other model directories?
I was thinking that each model should be as self-contained as possible.
If someone wants to modify this file, they can replace the symlink with a copy of it.
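The layout proposed above could be sketched like this (hypothetical file names under a temporary directory, not the actual repo tree): each recipe directory holds a symlink to the shared asr_datamodule.py one level up, and a user who needs recipe-specific changes swaps the link for a real copy.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()

# The shared script lives one directory level up.
shared = os.path.join(root, "asr_datamodule.py")
with open(shared, "w") as f:
    f.write("# shared datamodule\n")

# A recipe directory (name is illustrative) gets a relative symlink.
recipe = os.path.join(root, "tdnn_lstm_ctc")
os.makedirs(recipe)
link = os.path.join(recipe, "asr_datamodule.py")
os.symlink(os.path.relpath(shared, recipe), link)  # -> ../asr_datamodule.py
assert os.path.islink(link)

# To customize a recipe: replace the symlink with a plain copy of the file.
os.remove(link)
shutil.copy(shared, link)
assert os.path.isfile(link) and not os.path.islink(link)
```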

Collaborator

yeah that makes sense to me

try:
    num_batches = len(dl)
except TypeError:
    num_batches = None
Collaborator

num_batches = '?', which will display more nicely

Collaborator

and batch_str below won't need an extra if
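Putting both suggestions together, a minimal sketch (with a hypothetical loader class standing in for one that lacks `__len__`): falling back to `'?'` lets the progress string interpolate `num_batches` directly, with no extra `if`.

```python
class NoLenLoader:          # hypothetical stand-in, not a lhotse class
    def __iter__(self):
        return iter(range(3))

dl = NoLenLoader()

try:
    num_batches = len(dl)
except TypeError:
    num_batches = "?"       # displays nicely, e.g. "batch 12/?"

batch_str = ""
for batch_idx, _batch in enumerate(dl):
    batch_str = f"batch {batch_idx}/{num_batches}"  # no extra `if` needed

# batch_str == "batch 2/?" after the loop
```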

@csukuangfj csukuangfj changed the title WIP: Refactor asr_datamodule. Refactor asr_datamodule. Aug 21, 2021
@csukuangfj (Collaborator Author)
I've addressed all the comments. @pzelasko Thanks, and please accept the invitation for this repo.


Ready to merge.

2 participants