
Add streaming modified beam search #142

Merged


ezerhouni
Collaborator

This PR adds the following:

  • Modified beam search for the streaming transducer
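For readers unfamiliar with modified beam search, here is a minimal, framework-free sketch of the idea as used for transducers: at each frame every hypothesis may emit at most one non-blank symbol, the `beam` best (hypothesis, token) pairs are kept across the whole beam, and hypotheses that end up with the same token sequence are merged. This is not sherpa's implementation. The function names, the `log_prob_fn` callback (a hypothetical stand-in for the encoder, decoder and joiner), and blank ID 0 are all assumptions. Streaming is modeled by handing the surviving hypotheses from one chunk to the next.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

BLANK = 0  # assumption: token ID 0 is the blank symbol

Hyps = Dict[Tuple[int, ...], float]  # token sequence -> total log-prob
# Hypothetical scorer: (frame index, current tokens) -> log-probs over the vocabulary.
LogProbFn = Callable[[int, Tuple[int, ...]], Sequence[float]]


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def modified_beam_search_step(
    hyps: Hyps, t: int, log_prob_fn: LogProbFn, beam: int
) -> Hyps:
    """Advance every hypothesis by one frame, emitting at most one symbol."""
    candidates = []
    for ys, score in hyps.items():
        for token, lp in enumerate(log_prob_fn(t, ys)):
            # Blank keeps the sequence unchanged; any other token is appended.
            new_ys = ys if token == BLANK else ys + (token,)
            candidates.append((score + lp, new_ys))
    # Keep the `beam` best (hypothesis, token) pairs across the whole beam...
    candidates.sort(key=lambda c: c[0], reverse=True)
    new_hyps: Hyps = {}
    for score, ys in candidates[:beam]:
        # ...and merge candidates that share the same token sequence.
        new_hyps[ys] = log_add(new_hyps.get(ys, -math.inf), score)
    return new_hyps


def modified_beam_search(
    frames: Iterable[int],
    log_prob_fn: LogProbFn,
    beam: int,
    hyps: Optional[Hyps] = None,
) -> Hyps:
    """Decode one chunk of frames; pass the result back in for the next chunk."""
    if hyps is None:
        hyps = {(): 0.0}
    for t in frames:
        hyps = modified_beam_search_step(hyps, t, log_prob_fn, beam)
    return hyps


# Toy usage: a fixed per-frame distribution over {blank, 1, 2}.
PROBS = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]


def toy_log_probs(t: int, ys: Tuple[int, ...]) -> Sequence[float]:
    return [math.log(p) for p in PROBS[t]]


state = modified_beam_search(range(0, 1), toy_log_probs, beam=2)  # chunk 1
state = modified_beam_search(range(1, 2), toy_log_probs, beam=2, hyps=state)  # chunk 2
print(max(state, key=state.get))  # (1,)
```

Carrying `hyps` across calls is what makes the search streamable: decoding two chunks one after the other gives the same beam as decoding all frames at once.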

ezerhouni changed the title from "Add streaming modified beam search" to "[WIP] Add streaming modified beam search" on Sep 29, 2022
@ezerhouni
Collaborator Author

@csukuangfj I still need to test the branch (and update the CI job).

@csukuangfj
Collaborator

@ezerhouni
Thanks!

Would you mind also picking up the following PR?

#119

The core part is almost finished.

@csukuangfj
Collaborator

Could you also add modified_beam_search to
https://github.com/k2-fsa/sherpa/blob/master/sherpa/bin/conv_emformer_transducer_stateless2/beam_search.py

It can be added in a separate PR.

@ezerhouni
Collaborator Author

@csukuangfj Added streaming modified beam search to conv_emformer. I will look at the other PR tomorrow. If I haven't pushed anything by then, feel free to ping me.

@csukuangfj
Collaborator

Thanks a lot! Could you also update the CI by changing the following lines?

You only need to add "modified_beam_search" to the list.

decoding: ["greedy_search", "fast_beam_search", "fast_beam_search_nbest", "fast_beam_search_nbest_LG"]

(The same line appears in three places in the CI configuration.)

@ezerhouni
Collaborator Author

@csukuangfj Yup, first I will debug the code locally and then update the CI :)
I will let you know!

@ezerhouni
Collaborator Author

@csukuangfj When testing on CPU, I get:
OSError: libtorch_hip.so: cannot open shared object file: No such file or directory

Do you know whether modified beam search needs to run on GPU?

@csukuangfj
Collaborator

modified_beam_search is able to run on both CPU and GPU.


How did you install your PyTorch?

@ezerhouni
Collaborator Author

@csukuangfj Never mind, my mistake!

@ezerhouni
Collaborator Author

@csukuangfj I tested the code for streaming_transducer but not yet for conv_emformer. With conv_emformer, I get the following error when using the models from: https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05

  File "./sherpa/bin/conv_emformer_transducer_stateless2/streaming_server.py", line 239, in __init__
    self.model = RnntConvEmformerModel(nn_model_filename, device=device)
RuntimeError: Unrecognized data format
Exception raised from load at ../torch/csrc/jit/serialization/import.cpp:449 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7fa3e2f3abbe in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x60 (0x7fa3e2f15ef9 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x27a (0x7fa3326e071a in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)

ezerhouni changed the title from "[WIP] Add streaming modified beam search" to "Add streaming modified beam search" on Sep 29, 2022
@csukuangfj
Collaborator

What is the command you are using for testing?

@ezerhouni
Collaborator Author

  ./sherpa/bin/conv_emformer_transducer_stateless2/streaming_server.py \
  --port 6006 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --max-active-connections 500 \
  --nn-pool-size 1 \
  --decoding-method "fast_beam_search" \
  --nn-model-filename /path/to/cpu-jit-epoch-30-avg-10-torch-1.10.0.pt \
  --bpe-model-filename /path/to/bpe.model

@csukuangfj
Collaborator

What is the output of

ls -lh /path/to/cpu-jit-epoch-30-avg-10-torch-1.10.0.pt 

Just want to make sure that you have downloaded the pretrained model using git lfs.

Also, are you using PyTorch >= 1.10.0 ?

@ezerhouni
Collaborator Author

-rw-r--r-- 1 root root 134 Sep 29 14:27

Yes, I have torch 1.12.
I will dig into it more tomorrow; it is most likely a bug on my side, to be honest.

@csukuangfj
Collaborator

The filesize of the model is only 134 bytes, which is too small.

I think you did not run

git lfs install
git clone xxxx

You need to run git lfs install before cloning, because the pretrained model is managed by Git LFS.
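The 134-byte file is what a repository managed by Git LFS leaves behind when the LFS content was never fetched: a small text "pointer" file in place of the real model, which is why loading it fails with "Unrecognized data format". A quick way to catch this before loading is sketched below. It relies on pointer files starting with the Git LFS spec version line; the helper name and the 1 KiB size cutoff are assumptions for illustration, not part of sherpa.

```python
from pathlib import Path
from typing import Union

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path: Union[str, Path]) -> bool:
    """Return True if `path` looks like an un-fetched Git LFS pointer file.

    Hypothetical helper, not a sherpa API.
    """
    p = Path(path)
    # Heuristic cutoff: pointer files are tiny, real model files are far larger.
    if p.stat().st_size >= 1024:
        return False
    with p.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

Calling it on the `.pt` file before constructing the model would turn the opaque RuntimeError into an actionable message. If the repository was already cloned without LFS, running `git lfs install` and then `git lfs pull` inside it should replace the pointer files with the actual content.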

@ezerhouni
Collaborator Author

@csukuangfj Yup, it works now. I tested it and fixed the issue; everything should be good to go.

@csukuangfj
Collaborator

@ezerhouni
Thanks! Merging.

csukuangfj merged commit b163428 into k2-fsa:master on Sep 30, 2022