Add streaming ASR with Emformer RNN-T #6

Merged 10 commits on Jun 1, 2022

Changes from all commits
118 changes: 118 additions & 0 deletions .github/workflows/run-streaming-test.yaml
@@ -0,0 +1,118 @@
# Copyright 2022 Xiaomi Corp. (author: Fangjun Kuang)

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: Run streaming ASR tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  run_streaming_asr_tests:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-18.04, macos-10.15]
        torch: ["1.10.0"]
        torchaudio: ["0.10.0"]
        python-version: [3.7, 3.8, 3.9]
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install GCC 7
        if: startsWith(matrix.os, 'ubuntu')
        run: |
          sudo apt-get install -y gcc-7 g++-7
          echo "CC=/usr/bin/gcc-7" >> $GITHUB_ENV
          echo "CXX=/usr/bin/g++-7" >> $GITHUB_ENV

      - name: Install PyTorch ${{ matrix.torch }}
        shell: bash
        if: startsWith(matrix.os, 'ubuntu')
        run: |
          python3 -m pip install -qq --upgrade pip
          python3 -m pip install -qq wheel twine typing_extensions websockets 'sentencepiece>=0.1.96'
          python3 -m pip install -qq torch==${{ matrix.torch }}+cpu torchaudio==${{ matrix.torchaudio }}+cpu numpy -f https://download.pytorch.org/whl/cpu/torch_stable.html

      - name: Install PyTorch ${{ matrix.torch }}
        shell: bash
        if: startsWith(matrix.os, 'macos')
        run: |
          python3 -m pip install -qq --upgrade pip
          python3 -m pip install -qq wheel twine typing_extensions websockets 'sentencepiece>=0.1.96'
          python3 -m pip install -qq torch==${{ matrix.torch }} torchaudio==${{ matrix.torchaudio }} numpy -f https://download.pytorch.org/whl/cpu/torch_stable.html

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}-${{ matrix.os }}

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/install-kaldifeat.sh

      - name: Install sherpa
        shell: bash
        run: |
          python3 setup.py install

      - name: Download pretrained model and test data
        shell: bash
        run: |
          git lfs install
          git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

      - name: Start server
        shell: bash
        run: |
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
            --port 6006 \
            --max-batch-size 50 \
            --max-wait-ms 5 \
            --nn-pool-size 1 \
            --nn-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/exp/cpu_jit-epoch-39-avg-6-use-averaged-model-1.pt \
            --bpe-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/data/lang_bpe_500/bpe.model &

          echo "Sleep 10 seconds to wait for the server to start up"
          sleep 10

      - name: Start client
        shell: bash
        run: |
          ./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
            --server-addr localhost \
            --server-port 6006 \
            ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/1221-135766-0001.wav
1 change: 0 additions & 1 deletion .github/workflows/run-test.yaml
@@ -1,4 +1,3 @@

# Copyright 2022 Xiaomi Corp. (author: Fangjun Kuang)

# See ../../LICENSE for clarification regarding multiple authors
48 changes: 48 additions & 0 deletions .github/workflows/style_check.yml
@@ -0,0 +1,48 @@
# Copyright (c) 2022 Xiaomi Corporation (authors: Fangjun Kuang)
#
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: style_check

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  style_check:
    runs-on: ubuntu-18.04
    strategy:
      matrix:
        python-version: [3.8]
      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}

      - name: Check style with cpplint
        shell: bash
        working-directory: ${{github.workspace}}
        run: ./scripts/check_style_cpplint.sh
124 changes: 110 additions & 14 deletions README.md
@@ -1,24 +1,25 @@
## Introduction

An ASR server framework in **Python**, supporting both streaming
and non-streaming recognition.

CPU-bound tasks, such as neural network computation, are implemented in
C++, while IO-bound tasks, such as socket communication, are implemented
in Python.
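
A minimal sketch of that split, assuming nothing about sherpa's internals
(the names `run_nn` and `handle_connection` are invented for illustration):
the asyncio event loop keeps serving sockets in Python, while CPU-bound work
is pushed to a thread pool whose workers call into C++ code that releases
the GIL.

```python
# Illustrative only; sherpa's real server is more involved.
import asyncio
from concurrent.futures import ThreadPoolExecutor

import numpy as np

nn_pool = ThreadPoolExecutor(max_workers=1)  # CPU-bound work runs here


def run_nn(samples: np.ndarray) -> str:
    # Stand-in for the C++-backed neural-network computation.
    return f"decoded {samples.size} samples"


async def handle_connection(reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
    data = await reader.read(4096)  # IO-bound: handled by asyncio in Python
    samples = np.frombuffer(data, dtype=np.float32)
    loop = asyncio.get_running_loop()
    # Offload the CPU-bound part so the event loop stays responsive:
    result = await loop.run_in_executor(nn_pool, run_nn, samples)
    writer.write(result.encode())
    await writer.drain()
```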

**Caution**: For offline ASR, we assume the model is trained using pruned
stateless RNN-T from [icefall][icefall] and comes from a directory like
`pruned_transducer_statelessX` where `X` >= 2. For streaming ASR, we
assume the model uses `pruned_stateless_emformer_rnnt2`.

For offline ASR, we provide a Colab notebook showing how to start the
server, how to start the client, and how to decode `test-clean` of LibriSpeech.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1JX5Ph2onYm1ZjNP_94eGqZ-DIRMLlIca?usp=sharing)

For streaming ASR, we provide a YouTube demo showing how to use it.
See <https://www.youtube.com/watch?v=z7HgaZv5W0U>

## Installation

First, you have to install `PyTorch` and `torchaudio`. PyTorch 1.10 is known
@@ -63,7 +64,6 @@ make -j
export PYTHONPATH=$PWD/../sherpa/python:$PWD/lib:$PYTHONPATH
```


## Usage

First, check that `sherpa` has been installed successfully:
@@ -74,7 +74,103 @@ python3 -c "import sherpa; print(sherpa.__version__)"

It should print the version of `sherpa`.

### Streaming ASR with pruned stateless Emformer RNN-T

#### Start the server

To start the server, you first need to generate two files:

- (1) The TorchScript model file. You can generate it with `export.py --jit=1`
  in `pruned_stateless_emformer_rnnt2` from [icefall][icefall].

- (2) The BPE model file. You can find it in `data/lang_bpe_XXX/bpe.model`
  in [icefall][icefall], where `XXX` is the number of BPE tokens used in
  the training.
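
As a sanity check, you can load both files yourself before starting the
server. A minimal sketch using plain `torch` and `sentencepiece` (not
sherpa's API; the paths are placeholders to be replaced with your own):

```python
import torch
import sentencepiece as spm

# TorchScript model produced by export.py --jit=1:
model = torch.jit.load("exp/cpu_jit.pt")
model.eval()

# BPE model mapping token IDs to text:
sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")
print("vocab size:", sp.vocab_size())  # e.g. 500 for lang_bpe_500
```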

With the above two files ready, you can start the server with the
following command:

```bash
./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
--port 6006 \
--max-batch-size 50 \
--max-wait-ms 5 \
--nn-pool-size 1 \
--nn-model-filename ./path/to/exp/cpu_jit.pt \
--bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model
```

You can use `./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py --help`
to view the help message.
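
The `--max-batch-size` and `--max-wait-ms` flags suggest dynamic batching:
the server groups whatever requests arrive within a short window and runs
the neural network on them as one batch. Below is a sketch of that general
pattern; it is not sherpa's actual implementation, for which
`streaming_server.py` is the reference.

```python
# Batch up to max_batch_size requests, waiting at most max_wait_ms
# after the first request arrives.
import asyncio


def process_batch(batch: list) -> None:
    print(f"running the neural network on a batch of {len(batch)}")


async def batch_collector(queue: asyncio.Queue,
                          max_batch_size: int = 50,
                          max_wait_ms: float = 5) -> None:
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]  # block until the first request
        deadline = loop.time() + max_wait_ms / 1000
        while len(batch) < max_batch_size:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        process_batch(batch)
```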

We provide a pretrained model trained on the LibriSpeech dataset at
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01>

The following shows how to use the above pretrained model to start the server.

```bash
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_server.py \
--port 6006 \
--max-batch-size 50 \
--max-wait-ms 5 \
--nn-pool-size 1 \
--nn-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/exp/cpu_jit-epoch-39-avg-6-use-averaged-model-1.pt \
--bpe-model-filename ./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/data/lang_bpe_500/bpe.model
```

#### Start the client

We provide two clients at present:

- (1) [./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py](./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py)
It shows how to decode a single sound file.

- (2) [./sherpa/bin/pruned_stateless_emformer_rnnt2/web](./sherpa/bin/pruned_stateless_emformer_rnnt2/web)
You can record your speech in real time in your browser and send it to the server for recognition.

##### streaming_client.py

```bash
./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py --help

./sherpa/bin/pruned_stateless_emformer_rnnt2/streaming_client.py \
--server-addr localhost \
--server-port 6006 \
./icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01/test_wavs/1221-135766-0001.wav
```
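
If you want to write your own client, the general shape is a websocket
connection that streams audio samples and prints whatever the server sends
back. The sketch below is hypothetical: the chunk size, sample format, and
message framing are assumptions, and `streaming_client.py` is the
authoritative reference for sherpa's actual protocol.

```python
# Hypothetical minimal streaming client; consult streaming_client.py
# for the real message framing before relying on this.
import asyncio

import torchaudio
import websockets


async def main() -> None:
    samples, sample_rate = torchaudio.load("test.wav")  # shape: (channels, N)
    assert sample_rate == 16000, "assumed model sample rate"
    data = samples[0]  # use the first channel
    async with websockets.connect("ws://localhost:6006") as ws:
        chunk = 4096
        for start in range(0, data.numel(), chunk):
            await ws.send(data[start:start + chunk].numpy().tobytes())
            print(await ws.recv())  # partial recognition result


asyncio.run(main())
```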

##### Web client

```bash
cd ./sherpa/bin/pruned_stateless_emformer_rnnt2/web
python3 -m http.server 6008
```

Then open your browser and go to `http://localhost:6008/record.html`. You will
see a UI like the following screenshot.

![web client screenshot](./pic/emformer-streaming-asr-web-client.png)

Click the `Record` button.

Now you can speak, and you will get recognition results from the
server in real time.

**Caution**: For the web client, we hard-code the server port to `6006`.
You can edit [./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js](./sherpa/bin/pruned_stateless_emformer_rnnt2/web/record.js)
and replace `6006` with whatever port the server is using.

**Caution**: `http://0.0.0.0:6008/record.html` and `http://127.0.0.1:6008/record.html`
won't work; you have to use `localhost`. Browsers allow microphone access only
in a secure context, and since we are not using `https` (which would require a
certificate), `localhost` is the only host name the browser treats as secure.

### Offline ASR

#### Start the server

To start the server, you first need to generate two files:

@@ -97,7 +193,7 @@ sherpa/bin/offline_server.py \
--feature-extractor-pool-size 5 \
--nn-pool-size 1 \
--nn-model-filename ./path/to/exp/cpu_jit.pt \
--bpe-model-filename ./path/to/data/lang_bpe_500/bpe.model
```

You can use `./sherpa/bin/offline_server.py --help` to view the help message.
@@ -122,7 +218,7 @@ sherpa/bin/offline_server.py \
--bpe-model-filename ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model
```

#### Start the client
After starting the server, you can use the following command to start the client:

```bash
@@ -147,7 +243,7 @@ sherpa/bin/offline_client.py \
icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
```

#### RTF test

We provide a demo [./sherpa/bin/decode_manifest.py](./sherpa/bin/decode_manifest.py)
to decode the `test-clean` dataset from the LibriSpeech corpus. The RTF
(real-time factor) is the total decoding time divided by the total audio
duration, so an RTF below 1 means decoding runs faster than real time.
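
As a concrete example of the arithmetic, with made-up timing numbers:

```python
# Hypothetical numbers, just to show how RTF is computed:
processing_seconds = 300.0        # wall-clock time spent decoding
audio_seconds = 5.4 * 3600        # approximate duration of test-clean
rtf = processing_seconds / audio_seconds
print(f"RTF = {rtf:.4f}")         # < 1 means faster than real time
```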
Binary file added pic/emformer-streaming-asr-web-client.png