Add streaming ASR with Emformer RNN-T (#6)
* First working version.

* First C++ working version.

* Refactoring.

* Add streaming ASR with stateless Emformer RNN-T.

* Typo fixes.

* Fix comments.

* Add web interface.

* Add CI for streaming ASR.

* Minor fixes to README.

* Minor fixes.
csukuangfj authored Jun 1, 2022
1 parent 259d2b9 commit ba865c7
Showing 35 changed files with 2,282 additions and 58 deletions.
118 changes: 118 additions & 0 deletions .github/workflows/run-streaming-test.yaml
@@ -0,0 +1,118 @@
# Copyright 2022 Xiaomi Corp. (author: Fangjun Kuang)

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Run streaming ASR tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  run_streaming_asr_tests:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-18.04, macos-10.15]
        torch: ["1.10.0"]
        torchaudio: ["0.10.0"]
        python-version: [3.7, 3.8, 3.9]

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install GCC 7
        if: startsWith(matrix.os, 'ubuntu')
        run: |
          sudo apt-get install -y gcc-7 g++-7
          echo "CC=/usr/bin/gcc-7" >> $GITHUB_ENV
          echo "CXX=/usr/bin/g++-7" >> $GITHUB_ENV

      - name: Install PyTorch ${{ matrix.torch }}
        shell: bash
        if: startsWith(matrix.os, 'ubuntu')
        run: |
          python3 -m pip install -qq --upgrade pip
          # Quote the version constraint so bash does not treat ">=" as a redirect.
          python3 -m pip install -qq wheel twine typing_extensions websockets "sentencepiece>=0.1.96"
          python3 -m pip install -qq torch==${{ matrix.torch }}+cpu torchaudio==${{ matrix.torchaudio }}+cpu numpy -f https://download.pytorch.org/whl/cpu/torch_stable.html

      - name: Install PyTorch ${{ matrix.torch }}
        shell: bash
        if: startsWith(matrix.os, 'macos')
        run: |
          python3 -m pip install -qq --upgrade pip
          python3 -m pip install -qq wheel twine typing_extensions websockets "sentencepiece>=0.1.96"
          python3 -m pip install -qq torch==${{ matrix.torch }} torchaudio==${{ matrix.torchaudio }} numpy -f https://download.pytorch.org/whl/cpu/torch_stable.html

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}-${{ matrix.os }}

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/install-kaldifeat.sh

      - name: Install sherpa
        shell: bash
        run: |
          python3 setup.py install

      - name: Download pretrained model and test data
        shell: bash
        run: |
          git lfs install
          git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

      - name: Start server
        shell: bash
        run: |
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
            --port 6006 \
            --max-batch-size 50 \
            --max-wait-ms 5 \
            --nn-pool-size 1 \
            --nn-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/exp/cpu_jit-epoch-39-avg-6-use-averaged-model-1.pt \
            --bpe-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/data/lang_bpe_500/bpe.model &

          echo "Sleep 10 seconds to wait for the server to start"
          sleep 10

      - name: Start client
        shell: bash
        run: |
          ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
            --server-addr localhost \
            --server-port 6006 \
            ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/1221-135766-0001.wav
1 change: 0 additions & 1 deletion .github/workflows/run-test.yaml
@@ -1,4 +1,3 @@
-
# Copyright 2022 Xiaomi Corp. (author: Fangjun Kuang)

# See ../../LICENSE for clarification regarding multiple authors
48 changes: 48 additions & 0 deletions .github/workflows/style_check.yml
@@ -0,0 +1,48 @@
# Copyright (c) 2022 Xiaomi Corporation (authors: Fangjun Kuang)
#
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: style_check

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  style_check:
    runs-on: ubuntu-18.04
    strategy:
      matrix:
        python-version: [3.8]
      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}

      - name: Check style with cpplint
        shell: bash
        working-directory: ${{github.workspace}}
        run: ./scripts/check_style_cpplint.sh
124 changes: 110 additions & 14 deletions README.md
@@ -1,24 +1,25 @@
## Introduction

-An ASR server framework in **Python**, aiming to support both streaming
+An ASR server framework in **Python**, supporting both streaming
and non-streaming recognition.

-**Note**: Only non-streaming recognition is implemented at present. We
-will add streaming recognition later.
-
CPU-bound tasks, such as neural network computation, are implemented in
C++, while IO-bound tasks, such as socket communication, are implemented
in Python.

-**Caution**: We assume the model is trained using pruned stateless RNN-T
-from [icefall][icefall] and it is from a directory like
-`pruned_transducer_statelessX` where `X` >= 2.
+**Caution**: For offline ASR, we assume the model is trained using pruned
+stateless RNN-T from [icefall][icefall] and comes from a directory like
+`pruned_transducer_statelessX`, where `X` >= 2. For streaming ASR, we
+assume the model is trained with the `pruned_stateless_emformer_rnnt2` recipe.

-We provide a Colab notebook, containing how to start the server, how to
-start the client, and how to decode `test-clean` of LibriSpeech.
+For offline ASR, we provide a Colab notebook showing how to start the
+server, how to start the client, and how to decode `test-clean` of LibriSpeech.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1JX5Ph2onYm1ZjNP_94eGqZ-DIRMLlIca?usp=sharing)

+For streaming ASR, we provide a YouTube demo showing how to use it:
+<https://www.youtube.com/watch?v=z7HgaZv5W0U>
+

## Installation

First, you have to install `PyTorch` and `torchaudio`. PyTorch 1.10 is known
@@ -63,7 +64,6 @@ make -j
export PYTHONPATH=$PWD/../sherpa/python:$PWD/lib:$PYTHONPATH
```

-
## Usage

First, check that `sherpa` has been installed successfully:
@@ -74,7 +74,103 @@ python3 -c "import sherpa; print(sherpa.__version__)"

It should print the version of `sherpa`.

-### Start the server
+### Streaming ASR with pruned stateless Emformer RNN-T

#### Start the server

To start the server, you first need to generate two files:

- (1) The TorchScript model file. You can use `export.py --jit=1` in
  `pruned_stateless_emformer_rnnt2` from [icefall][icefall]; a sketch is
  given after this list.

- (2) The BPE model file. You can find it in `data/lang_bpe_XXX/bpe.model`
  in [icefall][icefall], where `XXX` is the number of BPE tokens used in
  training.
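
For reference, here is a minimal export sketch. It assumes an icefall
checkout with the `pruned_stateless_emformer_rnnt2` recipe; the `--epoch`
and `--avg` values below are illustrative and depend on your training run.

```bash
# Hypothetical invocation; run export.py --help in your icefall
# checkout to confirm the actual flags.
cd egs/librispeech/ASR
./pruned_stateless_emformer_rnnt2/export.py \
  --exp-dir ./pruned_stateless_emformer_rnnt2/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --epoch 39 \
  --avg 6 \
  --jit 1
```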

With the above two files ready, you can start the server with the
following command:

```bash
./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
--port 6006 \
--max-batch-size 50 \
--max-wait-ms 5 \
--nn-pool-size 1 \
--nn-model-filename ./path/to/exp/cpu_jit.pt \
--bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model
```

You can use `./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py --help`
to view the help message.

We provide a model pretrained on the LibriSpeech dataset at
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01>

The following shows how to use the above pretrained model to start the server.

```bash
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
--port 6006 \
--max-batch-size 50 \
--max-wait-ms 5 \
--nn-pool-size 1 \
--nn-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/exp/cpu_jit-epoch-39-avg-6-use-averaged-model-1.pt \
--bpe-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/data/lang_bpe_500/bpe.model
```
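
To verify that the server is up before connecting a client, you can probe
the port. This is a generic check using netcat, not something provided by
sherpa itself:

```bash
# Poll until the server accepts TCP connections on port 6006.
until nc -z localhost 6006; do
  sleep 1
done
echo "Server is listening on port 6006"
```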

#### Start the client

We provide two clients at present:

- (1) [./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py](./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py),
  which shows how to decode a single sound file.

- (2) [./sherpa/bin/pruned_stateless_emformer_rnnt2/web](./sherpa/bin/pruned_stateless_emformer_rnnt2/web),
  which lets you record your speech in real time within a browser and send
  it to the server for recognition.

##### streaming_client.py

```bash
./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py --help

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
--server-addr localhost \
--server-port 6006 \
./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/1221-135766-0001.wav
```
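
The client decodes one sound file per invocation, so a plain shell loop
handles a whole directory. The glob below assumes the pretrained-model
checkout from the previous section:

```bash
# Decode every test wav in sequence, one client invocation per file.
for w in ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/*.wav; do
  ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
    --server-addr localhost \
    --server-port 6006 \
    "$w"
done
```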

##### Web client

```bash
cd ./sherpa/bin/pruned_stateless_emformer_rnnt2/web
python3 -m http.server 6008
```

Then open your browser and go to `http://localhost:6008/record.html`. You will
see a UI like the following screenshot.

![web client screenshot](./pic/emformer-streaming-asr-web-client.png)

Click the `Record` button.

Now you can speak, and you will receive recognition results from the
server in real time.

**Caution**: For the web client, we hard-code the server port to `6006`.
To use a different port, edit [./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js](./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js)
and replace `6006` with whatever port the server is using, as shown below.
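
For example, a one-line substitution does the job, assuming the port
appears as the literal `6006` in `record.js`:

```bash
# Point the web client at a server running on port 6010 instead.
# On macOS, use `sed -i ''` in place of `sed -i`.
sed -i 's/6006/6010/g' ./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js
```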

**Caution**: `http://0.0.0.0:6008/record.html` or `http://127.0.0.1:6008/record.html`
won't work. You have to use `localhost`: we are not serving over `https`
(which would require a certificate), browsers restrict microphone access
to secure origins, and `localhost` is treated as a secure origin.
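
If the server runs on a remote machine, a common workaround is an SSH
tunnel, so that your browser still talks to `localhost`. This is a generic
technique, not something sherpa provides:

```bash
# Forward the web page (6008) and the WebSocket server (6006)
# from the remote machine to your local machine.
ssh -L 6008:localhost:6008 -L 6006:localhost:6006 user@remote-host
```

Then open `http://localhost:6008/record.html` locally as before.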

### Offline ASR

#### Start the server

To start the server, you first need to generate two files:

@@ -97,7 +193,7 @@ sherpa/bin/offline_server.py \
--feature-extractor-pool-size 5 \
--nn-pool-size 1 \
--nn-model-filename ./path/to/exp/cpu_jit.pt \
---bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model &
+--bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model
```

You can use `./sherpa/bin/offline_server.py --help` to view the help message.
@@ -122,7 +218,7 @@ sherpa/bin/offline_server.py \
--bpe-model-filename ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model
```

-### Start the client
+#### Start the client
After starting the server, you can use the following command to start the client:

```bash
@@ -147,7 +243,7 @@ sherpa/bin/offline_client.py \
icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
```

-### RTF test
+#### RTF test

We provide a demo, [./sherpa/bin/decode_manifest.py](./sherpa/bin/decode_manifest.py),
to decode the `test-clean` dataset from the LibriSpeech corpus. The real-time
factor (RTF) is the ratio of processing time to audio duration, so an RTF
below 1 means decoding runs faster than real time.
Binary file added pic/emformer-streaming-asr-web-client.png
