joeys2t documentation
may- committed Jan 22, 2024
1 parent eb60b5f commit e808353
Showing 7 changed files with 251 additions and 70 deletions.
27 changes: 15 additions & 12 deletions README.md
@@ -1,9 +1,9 @@
#   ![Joey-S2T](joey2-small.png) Joey S2T
[![build](https://github.com/may-/joeys2t/actions/workflows/main.yml/badge.svg)](https://github.com/may-/joeys2t/actions/workflows/main.yml)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![arXiv](https://img.shields.io/badge/arXiv-2210.02545-b31b1b.svg)](https://arxiv.org/abs/2210.02545)


JoeyS2T is an extension of [JoeyNMT](https://github.com/joeynmt/joeynmt) for Speech-to-Text tasks.
JoeyS2T is a [JoeyNMT](https://github.com/joeynmt/joeynmt) extension for Speech-to-Text tasks such as Automatic Speech Recognition (ASR) and end-to-end Speech Translation (ST). It inherits the core philosophy of JoeyNMT, a minimalist, novice-friendly toolkit built on PyTorch, seeking **simplicity** and **accessibility**.


## What's new
@@ -12,7 +12,7 @@ JoeyS2T is an extension of [JoeyNMT](https://github.com/joeynmt/joeynmt) for Spe


## Features
Joey S2T implements the following features:
JoeyS2T implements the following features:
- Transformer Encoder-Decoder
- 1d-Conv Subsampling
- Cross-entropy and CTC joint objective
@@ -34,7 +34,7 @@ Furthermore, all the functionalities in JoeyNMT v2 are also available from JoeyS
## Installation

JoeyS2T is built on [PyTorch](https://pytorch.org/). Please make sure you have a compatible environment.
We tested JoeyS2T with
We tested JoeyS2T v2.3 with
- python 3.11
- torch 2.1.2
- torchaudio 2.1.2
@@ -45,26 +45,29 @@ Clone this repository and install via pip:
```bash
$ git clone https://github.com/may-/joeys2t.git
$ cd joeys2t
$ pip install -e .
$ python -m pip install -e .
$ python -m unittest
```

> :memo: **Note**
> You may need to install extra dependencies (torchaudio backends): [ffmpeg](https://ffmpeg.org/), [sox](https://sox.sourceforge.net/), [soundfile](https://pysoundfile.readthedocs.io/), etc.
> See [torchaudio installation instructions](https://pytorch.org/audio/stable/installation.html).
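
If you are unsure whether the optional backends are set up, a quick sanity check (illustrative only, not part of the installation itself) is to ask torchaudio which backends it can see:

```python
import torchaudio

# torchaudio dispatches audio I/O to whichever backends are installed,
# e.g. ['ffmpeg', 'sox', 'soundfile']; an empty list means none was found.
print(torchaudio.__version__)
print(torchaudio.list_audio_backends())
```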

## Documentation & Tutorials

Please check JoeyNMT's [documentation](https://joeynmt.readthedocs.io) first if you are not familiar with JoeyNMT yet.
Please check JoeyNMT's [documentation](https://joeys2t.readthedocs.io) first if you are not familiar with JoeyNMT yet.

For details, follow the tutorials in the [notebooks](notebooks) directory.

- [quick-start-with-joeynmt2](notebooks/quick-start-with-joeynmt2.ipynb)
- [speech-to-text-with-joeys2t](notebooks/joeyS2T_ASR_tutorial.ipynb)

## Benchmarks & pretrained models

We provide [benchmarks](benchmarks_s2t.md) and pretrained models for Speech Recognition (ASR) and Speech Translation (ST) with JoeyS2T.
## Benchmarks & Pretrained models

- [ASR on LibriSpeech](benchmarks_s2t.md#librispeech)
- [ST on MuST-C en-de](benchmarks_s2t.md#must-c-v2-en-de)
We provide [benchmarks](https://joeys2t.readthedocs.io/en/latest/benchmarks.html) and pretrained models for Speech Recognition (ASR) and Speech Translation (ST) with JoeyS2T.

Models are also available via Torch Hub!
The models are also available via Torch Hub!
```python
import torch
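
# The rest of the original example is omitted here. A minimal sketch of
# Torch Hub usage; the entry-point name 'mustc_v2_ende_st' is illustrative,
# not confirmed. torch.hub.list() shows the names actually provided.
print(torch.hub.list('may-/joeys2t'))                        # available entry points
model = torch.hub.load('may-/joeys2t', 'mustc_v2_ende_st')   # hypothetical name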

```
File renamed without changes.
65 changes: 50 additions & 15 deletions docs/source/api.rst
@@ -52,6 +52,14 @@ joeynmt.config module
:show-inheritance:


joeynmt.data_augmentation module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. automodule:: joeynmt.data_augmentation
:members:
:undoc-members:
:show-inheritance:


joeynmt.data module
^^^^^^^^^^^^^^^^^^^
.. automodule:: joeynmt.data
@@ -96,6 +104,24 @@ joeynmt.encoders module
:show-inheritance:


joeynmt.helpers_for_audio module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.helpers_for_audio
:members:
:undoc-members:
:show-inheritance:


joeynmt.helpers_for_ddp module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.helpers_for_ddp
:members:
:undoc-members:
:show-inheritance:


joeynmt.helpers module
^^^^^^^^^^^^^^^^^^^^^^

@@ -105,6 +131,15 @@ joeynmt.helpers module
:show-inheritance:


joeynmt.hub_interface module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.hub_interface
:members:
:undoc-members:
:show-inheritance:


joeynmt.initialization module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -114,6 +149,15 @@ joeynmt.initialization module
:show-inheritance:


joeynmt.loss module
^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.loss
:members:
:undoc-members:
:show-inheritance:


joeynmt.metrics module
^^^^^^^^^^^^^^^^^^^^^^

@@ -176,28 +220,19 @@ joeynmt.training module
:show-inheritance:


joeynmt.vocabulary module
^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.vocabulary
:members:
:undoc-members:
:show-inheritance:


joeynmt.loss module
^^^^^^^^^^^^^^^^^^^
joeynmt.transformer_layers module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.loss
.. automodule:: joeynmt.transformer_layers
:members:
:undoc-members:
:show-inheritance:


joeynmt.transformer_layers module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
joeynmt.vocabulary module
^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: joeynmt.transformer_layers
.. automodule:: joeynmt.vocabulary
:members:
:undoc-members:
:show-inheritance:
110 changes: 110 additions & 0 deletions docs/source/benchmarks.rst
@@ -8,6 +8,116 @@ Benchmarks
We provide several pretrained models with their benchmark results.


JoeyS2T
-------


* For the ASR task, we compute WER (lower is better)
* For the MT and ST tasks, we compute BLEU (higher is better); a minimal sketch of both metrics is given below
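
The following sketch is illustrative only, not JoeyS2T's own implementation: WER as word-level edit distance divided by the reference length, and BLEU via the ``sacrebleu`` package.

.. code-block:: python

    import sacrebleu

    def wer(hyp: str, ref: str) -> float:
        """Word error rate: word-level edit distance / reference length."""
        h, r = hyp.split(), ref.split()
        # (len(r)+1) x (len(h)+1) edit-distance table
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                              d[i][j - 1] + 1,                           # insertion
                              d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))  # substitution
        return d[-1][-1] / max(len(r), 1)

    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(wer(hyps[0], refs[0]))                      # 1 substitution / 6 words = 0.1667
    print(sacrebleu.corpus_bleu(hyps, [refs]).score)  # corpus-level BLEU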


LibriSpeech 100h
^^^^^^^^^^^^^^^^

+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
+========================================================================+==============+===========+===========+============+============+=========+===========================================+
| `Kahn etal <https://arxiv.org/abs/1909.09116>`_ | BiLSTM | 14.00 | 37.02 | 14.85 | 39.95 | \- | |
+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `Laptev etal <https://arxiv.org/abs/2005.07157>`_ | Transformer | 10.3 | 24.0 | 11.2 | 24.9 | \- | |
+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `ESPnet <https://huggingface.co/pyf98/librispeech_100h_transformer>`__ | Transformer | 8.1 | 20.2 | 8.4 | 20.5 | \- | |
+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `ESPnet <https://huggingface.co/pyf98/librispeech_100h_conformer>`__ | Conformer | 6.3 | 17.4 | 6.5 | 17.3 | \- | |
+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 93M | :joeynmt2:`librispeech100h.tar.gz` (948M) |
+------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+


LibriSpeech 960h
^^^^^^^^^^^^^^^^

+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
+===============================================================================================+==============+===========+===========+============+============+=========+===========================================+
| `Gulati etal <https://arxiv.org/abs/2005.08100>`_ | BiLSTM | 1.9 | 4.4 | 2.1 | 4.9 | \- | \- |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `ESPnet <https://github.com/espnet/espnet/tree/v.202207/egs2/librispeech/asr1#without-lm>`__ | Conformer | 2.3 | 6.1 | 2.6 | 6.0 | \- | \- |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `SpeechBrain <https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech>`_ | Conformer | 2.13 | 5.51 | 2.31 | 5.61 | 165M | \- |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `fairseq S2T <https://huggingface.co/facebook/s2t-small-librispeech-asr>`_ | Transformer | 3.23 | 8.01 | 3.52 | 7.83 | 71M | \- |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| `fairseq wav2vec2 <https://huggingface.co/facebook/wav2vec2-base-960h>`_ | Conformer | 3.17 | 8.87 | 3.39 | 8.57 | 94M | \- |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 102M | :joeynmt2:`librispeech960h.tar.gz` (1.1G) |
+-----------------------------------------------------------------------------------------------+--------------+-----------+-----------+------------+------------+---------+-------------------------------------------+


MuST-C ASR pretraining
^^^^^^^^^^^^^^^^^^^^^^

+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
+========================================================================+===============+=======+=======+=======+============+========+=========+=====================================+
| `Gangi etal <https://cris.fbk.eu/retrieve/handle/11582/319654/29817/3045.pdf>`_ | v1 | v1 | \- | 27.0 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| `ESPnet <https://github.com/espnet/espnet/tree/v.202207/egs/must_c/asr1/RESULTS.md>`__ | v1 | v1 | \- | 12.70 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| :fairseq:`fairseq S2T <speech_to_text/docs/mustc_example.md>` | v1 | v1 | 13.07 | 12.72 | 10.93 | 29.5M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| :fairseq:`fairseq S2T <speech_to_text/docs/mustc_example.md>` | v1 | v2 | 9.11 | 11.88 | 10.43 | 29.5M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v1 | 18.09 | 18.66 | 14.97 | 96M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v2 | 9.77 | 12.51 | 10.73 | 96M | :joeynmt2:`mustc_asr.tar.gz` (940M) |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+


MuST-C MT pretraining
^^^^^^^^^^^^^^^^^^^^^

+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
+========================================================================+===============+=======+=======+=======+============+========+=========+=====================================+
| `Gangi etal <https://cris.fbk.eu/retrieve/handle/11582/319654/29817/3045.pdf>`_ | v1 | v1 | \- | 25.3 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| `Zhang etal <https://aclanthology.org/2020.findings-emnlp.230/>`_ | v1 | v1 | \- | 29.69 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| `ESPnet <https://github.com/espnet/espnet/tree/v.202207/egs/must_c/asr1/RESULTS.md>`__ | v1 | v1 | \- | 27.63 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v1 | 21.85 | 23.15 | 20.37 | 66.5M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v2 | 26.99 | 27.61 | 25.26 | 66.5M | :joeynmt2:`mustc_mt.tar.gz` (729M) |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+


MuST-C end-to-end ST
^^^^^^^^^^^^^^^^^^^^

+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
+========================================================================+===============+=======+=======+=======+============+========+=========+=====================================+
| `Gangi etal <https://cris.fbk.eu/retrieve/handle/11582/319654/29817/3045.pdf>`_ | v1 | v1 | \- | 17.3 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| `Zhang etal <https://aclanthology.org/2020.findings-emnlp.230/>`_ | v1 | v1 | \- | 20.67 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| `ESPnet <https://github.com/espnet/espnet/tree/v.202207/egs/must_c/st1/RESULTS.md>`__ | v1 | v1 | \- | 22.91 | \- | \- | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| :fairseq:`fairseq S2T <speech_to_text/docs/mustc_example.md>` | v1 | v2 | 22.05 | 22.70 | 21.70 | 31M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v1 | 21.06 | 20.92 | 21.78 | 96M | |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+
| JoeyS2T | v2 | v2 | 24.26 | 23.86 | 23.86 | 96M | :joeynmt2:`mustc_st.tar.gz` (952M) |
+----------------------------------------------------------------------------------------+-------+-------+-------+------------+--------+---------+-------------------------------------+

sacrebleu signature: ``nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.1.0``
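
The signature pins down the exact scoring configuration. A minimal sketch of how such a signature is obtained (assuming sacrebleu 2.x):

.. code-block:: python

    from sacrebleu.metrics import BLEU

    bleu = BLEU()
    print(bleu.corpus_score(["dies ist ein kleiner Test"], [["dies ist ein kleiner Test"]]))
    print(bleu.get_signature())  # e.g. nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.x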

.. note::

   For MuST-C, we trained our model on the English-German subset of version 2, and evaluated the model on both the version 1 and version 2 ``tst-COMMON`` and ``tst-HE`` splits. See :notebooks:`benchmarks.ipynb` to replicate these results.


JoeyNMT v2.x
------------

