lhotse download librispeech command is error #20

Closed
shanguanma opened this issue Aug 23, 2021 · 16 comments

Comments

@shanguanma
Contributor

I installed k2-fsa, lhotse, and icefall with the commands below:

1. install k2-fsa
$ conda create -n k2-fsa20210823 python=3.8
$ conda activate k2-fsa20210823
$ conda install -c k2-fsa -c pytorch -c conda-forge k2 python=3.8 cudatoolkit=11.1 pytorch=1.8.1

2. install lhotse
$ pip install git+https://github.com/lhotse-speech/lhotse

3. install icefall
$ git clone https://github.com/k2-fsa/icefall.git
$ cd icefall
$ pip install -r requirements.txt
$ export PYTHONPATH=/home/maduo/w2021/k2-fsa_20210823/icefall:$PYTHONPATH

When I run ./prepare.sh --stage 0 --stop_stage 0, the error is as follows:

(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage 0 --stop_stage 0
2021-08-23 14:32:41 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-23 14:32:41 (prepare.sh:66:main) stage 0: Download data
./prepare.sh: /home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse: python: bad interpreter: No such file or directory

I looked at https://github.com/lhotse-speech/lhotse/blob/2a1410bfd08bc5117d67d09f470fde14b8231521/lhotse/bin/lhotse#L1
The Python interpreter itself is fine, so I don't know what is wrong.

@csukuangfj
Collaborator

What's the output of the following command?

head -n 5 /home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse

If the first line of the output is #!python, you would probably need to manually modify it to #!/usr/bin/env python3
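
For anyone who would rather script that edit than do it by hand, here is a minimal sketch. The function name fix_shebang is hypothetical and not part of lhotse; it assumes the installed script is a UTF-8 text file and only touches the file when the first line is exactly #!python.

```python
from pathlib import Path


def fix_shebang(script_path, interpreter="/usr/bin/env python3"):
    """Replace a bare '#!python' first line with a usable shebang.

    Returns True if the file was changed, False otherwise.
    (Hypothetical helper for illustration; not part of lhotse.)
    """
    path = Path(script_path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    # Only rewrite when the first line is the broken shebang from this thread.
    if lines and lines[0].strip() == "#!python":
        lines[0] = f"#!{interpreter}\n"
        path.write_text("".join(lines), encoding="utf-8")
        return True
    return False
```

Calling fix_shebang("/home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse") would apply the same change as the manual edit described above.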

@shanguanma
Contributor Author

Thanks for your reply. You are right:

(base) maduo@pd:~$ head -n 5 /home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse
#!python
"""
Use this script like:

$ lhotse --help

Yes, I manually modified it.

@csukuangfj
Collaborator

@pzelasko
Looks like the shebang problem is reproducible.

shanguanma reopened this Aug 23, 2021

@shanguanma
Contributor Author

I ran into another error in the command lhotse download librispeech. The error is as follows:

(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage -0 --stop_stage 3
2021-08-23 17:42:44 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-23 17:42:44 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|          | 0/7 [00:00<?, ?it/s]
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: dev-other
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: test-clean
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: test-other
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-23 17:42:45,494 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-23 17:42:45,494 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:57<00:23, 11.54s/it]

Aborted!

@danpovey
Collaborator

Likely a system problem, like full disk, OOM killer, something like that.
If you rerun that command it should hopefully continue from the stage that failed.

@csukuangfj
Collaborator

> If you rerun that command it should hopefully continue from the stage that failed.

Yes, rerunning

./prepare.sh --stage -0 --stop_stage 3

will continue to download train-clean-360.
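
To see in advance which splits a rerun would still have to fetch, one can look for the .completed marker files that appear in the log above. This is only a sketch: pending_parts is a hypothetical helper, not a lhotse function, and it assumes the directory layout shown in the log (download/LibriSpeech/<part>/.completed).

```python
from pathlib import Path

# Split names as they appear in the log output in this thread.
PARTS = (
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
)


def pending_parts(download_dir, parts=PARTS):
    """List the splits that have no '.completed' marker yet.

    (Hypothetical helper for illustration; not part of lhotse.)
    """
    root = Path(download_dir) / "LibriSpeech"
    return [part for part in parts if not (root / part / ".completed").exists()]
```

For the setup in this issue, pending_parts("/mnt/4T/md/icefall_recipes/librispeech/download") would list the splits that are skipped by neither of the "Skipping ... because .completed exists" messages.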

@shanguanma
Contributor Author

Thanks for your reply. I followed your suggestion, but it still fails. The output is as follows:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        3.6T  1.4T  2.1T  41% /mnt/4T
(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ rm -rf /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360
(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage -0 --stop_stage 3
2021-08-23 18:21:09 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-23 18:21:09 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|          | 0/7 [00:00<?, ?it/s]
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: dev-other
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: test-clean
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: test-other
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-23 18:21:10,792 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-23 18:21:10,792 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|██████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 5/7 [00:54<00:21, 10.89s/it]

Aborted!

@danpovey
Collaborator

@pzelasko do you have any ideas?
This is really a Lhotse issue.
Perhaps we need a better way to either continue partial downloads, or better debug why downloads are killed.

@pzelasko
Collaborator

OK, I will look into it. I am not sure what the reason is; maybe there is a timeout somewhere. Will check.

I'll try to take care of the shebang issue first though.

@pzelasko
Collaborator

Was there some exception stack trace?

It looks to me like it didn't even start downloading train-clean-360, otherwise there would have been a partial download message like this:

image

@shanguanma could you try wrapping this whole loop:

https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/librispeech.py#L55

with try/except like the following:

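try:
    for part in dataset_parts:  # stands for the existing loop in librispeech.py; keep it as it is
        ...
except Exception as e:
    # Print the exception type, its message, and the local variables, then re-raise.
    print(type(e))
    print(e)
    print(locals())
    raise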

and see what comes out? I am also not 100% sure that it is a Lhotse error, your job might be getting killed etc., but we can try to debug it.

@shanguanma
Contributor Author

@pzelasko, I followed your suggestion. The error is as follows:

(k2-fsa20210823) maduo@pd:~/w2021/k2-fsa_20210823/icefall/egs/librispeech/ASR$ ./prepare.sh --stage 0 --stop_stage 0
2021-08-24 19:20:54 (prepare.sh:57:main) dl_dir: /mnt/4T/md/icefall_recipes/librispeech/download
2021-08-24 19:20:54 (prepare.sh:66:main) stage 0: Download data
Downloading LibriSpeech parts:   0%|          | 0/7 [00:00<?, ?it/s]
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: dev-other
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: test-clean
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: test-other
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-24 19:20:55,222 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-24 19:20:55,222 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading LibriSpeech parts:  71%|███████▏  | 5/7 [00:57<00:23, 11.59s/it]
<class 'EOFError'>
Compressed file ended before the end-of-stream marker was reached
{'target_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download'), 'dataset_parts': ('dev-clean', 'dev-other', 'test-clean', 'test-other', 'train-clean-100', 'train-clean-360', 'train-other-500'), 'force_download': False, 'alignments': False, 'base_url': 'http://www.openslr.org/resources', 'alignments_url': 'https://drive.google.com/uc?id=1WYfgr31T-PPwMcxuAq09XZfHQO5Mw8fE', 'part': 'train-clean-360', 'url': 'http://www.openslr.org/resources/12', 'part_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360'), 'completed_detector': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-360/.completed'), 'tar_name': 'train-clean-360.tar.gz', 'tar_path': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/train-clean-360.tar.gz'), 'tar': <tarfile.TarFile object at 0x7f5fb3d76f70>, 'e': EOFError('Compressed file ended before the end-of-stream marker was reached')}

Aborted!

@pzelasko
Collaborator

You have a partially downloaded archive. If you remove train-clean-360.tar.gz and re-run it should work. I will change the code to handle this correctly.
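
As a rough way to spot such a truncated archive before extraction, the sketch below reads the gzip stream to the end and reports whether it finished cleanly. The helper gzip_is_complete is hypothetical (it is not how lhotse handles this), and it decompresses the whole file, so on a 20+ GB archive it takes a while.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at path decompresses to its end marker.

    A partial download raises EOFError ("Compressed file ended before the
    end-of-stream marker was reached"), the same error seen in this thread.
    (Hypothetical helper for illustration; not part of lhotse.)
    """
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data; only the absence of errors matters.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

If gzip_is_complete returns False for train-clean-360.tar.gz, deleting the file and re-running is the remedy described above.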

@shanguanma
Contributor Author

@pzelasko, I followed your suggestion (I removed train-clean-360.tar.gz and re-ran it). The error is as follows:

Downloading LibriSpeech parts:   0%|          | 0/7 [00:00<?, ?it/s]
2021-08-24 19:31:09,098 INFO [librispeech.py:56] Processing split: dev-clean
2021-08-24 19:31:09,098 INFO [librispeech.py:69] Skipping dev-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-clean/.completed exists.
2021-08-24 19:31:09,098 INFO [librispeech.py:56] Processing split: dev-other
2021-08-24 19:31:09,098 INFO [librispeech.py:69] Skipping dev-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/dev-other/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: test-clean
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping test-clean because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-clean/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: test-other
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping test-other because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/test-other/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: train-clean-100
2021-08-24 19:31:09,099 INFO [librispeech.py:69] Skipping train-clean-100 because /mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-clean-100/.completed exists.
2021-08-24 19:31:09,099 INFO [librispeech.py:56] Processing split: train-clean-360
Downloading train-clean-360.tar.gz: 21.5GB [1:32:18, 4.16MB/s]
Downloading LibriSpeech parts:  86%|████████▌ | 6/7 [1:37:25<16:14, 974.31s/it]
2021-08-24 21:08:34,933 INFO [librispeech.py:56] Processing split: train-other-500
Downloading train-other-500.tar.gz:  80%|████████  | 22.9G/28.5G [1:50:48<26:51, 3.70MB/s]
Downloading LibriSpeech parts:  86%|████████▌ | 6/7 [3:28:14<34:42, 2082.37s/it]
<class 'urllib.error.ContentTooShortError'>
<urlopen error retrieval incomplete: got only 24626188545 out of 30593501606 bytes>
{'target_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download'), 'dataset_parts': ('dev-clean', 'dev-other', 'test-clean', 'test-other', 'train-clean-100', 'train-clean-360', 'train-other-500'), 'force_download': False, 'alignments': False, 'base_url': 'http://www.openslr.org/resources', 'alignments_url': 'https://drive.google.com/uc?id=1WYfgr31T-PPwMcxuAq09XZfHQO5Mw8fE', 'part': 'train-other-500', 'url': 'http://www.openslr.org/resources/12', 'part_dir': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-other-500'), 'completed_detector': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/LibriSpeech/train-other-500/.completed'), 'tar_name': 'train-other-500.tar.gz', 'tar_path': PosixPath('/mnt/4T/md/icefall_recipes/librispeech/download/train-other-500.tar.gz'), 'tar': <tarfile.TarFile object at 0x7f7bdff7edf0>, 'e': ContentTooShortError('retrieval incomplete: got only 24626188545 out of 30593501606 bytes')}
Traceback (most recent call last):
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/bin/lhotse", line 24, in <module>
    cli()
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/bin/modes/recipes/librispeech.py", line 32, in librispeech
    download_librispeech(target_dir, dataset_parts='librispeech' if full else 'mini_librispeech')
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/recipes/librispeech.py", line 75, in download_librispeech
    urlretrieve_progress(f'{url}/{tar_name}', filename=tar_path, desc=f'Downloading {tar_name}')
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/site-packages/lhotse/utils.py", line 335, in urlretrieve_progress
    return urlretrieve(url=url, filename=filename, reporthook=reporthook, data=data)
  File "/home/maduo/miniconda3/envs/k2-fsa20210823/lib/python3.8/urllib/request.py", line 286, in urlretrieve
    raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 24626188545 out of 30593501606 bytes>

@pkufool
Collaborator

pkufool commented Aug 25, 2021

I hit the download error too. My download of train-clean-360.tar.gz was killed twice at the same position (71%).
image

And this is the error for train-other-500.tar.gz.
image

I guess the problem is a weak network connection: these downloads take a long time, and the connection may have been unstable at some point along the way.

@danpovey
Collaborator

Perhaps it would be better to use another tool for downloading, that allows continuing? E.g. wget?
Maybe urlretrieve is only good for small files.
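
To illustrate what "continuing" a download involves, here is a minimal sketch that uses an HTTP Range request with the standard library. It is not what lhotse does: resume_download and its opener parameter are hypothetical names, and the sketch assumes the server answers Range requests with status 206; if it does not, the file is simply downloaded again from the start.

```python
import os
import urllib.request


def resume_download(url, dest, chunk_size=1 << 20, opener=urllib.request.urlopen):
    """Download url to dest, continuing from any bytes already on disk.

    Returns the size of dest in bytes after the transfer.
    (Hypothetical helper for illustration; not part of lhotse.)
    """
    # Bytes saved by an earlier, interrupted attempt (0 if there was none).
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server for the missing tail of the file only.
        request.add_header("Range", f"bytes={offset}-")
    with opener(request) as response:
        # 206 means the Range header was honoured; any other status means the
        # server is sending the whole file, so the partial copy is overwritten.
        resumed = offset > 0 and response.status == 206
        with open(dest, "ab" if resumed else "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

With the URL pieces printed in the log above, a call would look like resume_download("http://www.openslr.org/resources/12/train-other-500.tar.gz", "train-other-500.tar.gz"). Note that if the file on disk is already complete, the server will likely reject the Range request and urlopen will raise an HTTPError, which this sketch does not handle. On the command line, wget -c does the same kind of continuation.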

@pzelasko
Collaborator

pzelasko commented Aug 25, 2021

I think it’s a server-side timeout. For now, please use wget for these two files, as Dan suggested. I might not have enough time to improve the downloading in Lhotse right away, but I definitely want it to “just work” and will work on it.
