Releases: OpenNMT/CTranslate2
CTranslate2 3.1.0
Changes
- The input prompt is no longer included in the result of `Whisper.generate` as it is usually not useful in a transcription loop
- The default beam size in `Whisper.generate` is updated from 1 to 5 to match the default value in openai/whisper
- Generation options `min_length` and `no_repeat_ngram_size` now penalize the logits instead of the log probs, which may change some scores
- Raise a deprecation warning when reading the `TranslationResult` object as a list of dictionaries
New features
- Allow configuring the C++ logs from Python with the function `ctranslate2.set_log_level`
- Implement the timestamp decoding rules when the Whisper prompt does not include the token `<|notimestamps|>`
- Add option `return_no_speech_prob` to the method `Whisper.generate` for the result to include the probability of the no-speech token
Fixes and improvements
- Improve performance of the Whisper model when generating with a context
- Fix timestamp tokens in the Whisper vocabulary to use the correct format (`<|X.XX|>`)
- Fix AVX and NEON log functions to return -inf on log(0) instead of NaN
- When info logs are enabled, log the system configuration only when the first model is loaded and not immediately when the library is loaded
- Define a `LogitsProcessor` abstract class to apply arbitrary updates to the logits during decoding
- Update oneDNN to 2.7.2
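The `LogitsProcessor` hook is a C++ interface, but the idea, a per-step callback that mutates the logits during decoding, can be sketched in Python. All names below are hypothetical illustrations, not the actual CTranslate2 interface:

```python
import abc
import math

class LogitsProcessor(abc.ABC):
    """Hook applied to the raw logits at each decoding step (sketch)."""

    @abc.abstractmethod
    def apply(self, step, logits):
        """Return the (possibly modified) logits for this step."""

class MinLengthProcessor(LogitsProcessor):
    """Disallow the end token until a minimum number of steps have run."""

    def __init__(self, min_length, end_token_id):
        self.min_length = min_length
        self.end_token_id = end_token_id

    def apply(self, step, logits):
        if step < self.min_length:
            logits = list(logits)
            logits[self.end_token_id] = -math.inf  # end token cannot be selected yet
        return logits

processor = MinLengthProcessor(min_length=4, end_token_id=0)
out = processor.apply(step=1, logits=[2.0, 0.1, 0.3])
```

Options such as `min_length` and `no_repeat_ngram_size` fit naturally into this pattern: each is one processor applied in sequence to the logits before sampling or beam expansion.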
CTranslate2 3.0.2
Fixes and improvements
- Whisper: fix `generate` arguments that were not correctly passed to the model
CTranslate2 3.0.1
Fixes and improvements
- Whisper: do not implicitly add `<|startoftranscript|>` in `generate` since it is not always the first token
CTranslate2 3.0.0
This major version integrates the Whisper speech recognition model published by OpenAI. It also introduces some breaking changes to remove deprecated usages and simplify some modules.
Breaking changes
General
- Remove option `normalize_scores`: the scores are now always divided by `pow(length, length_penalty)` with `length_penalty` defaulting to 1
- Remove option `allow_early_exit`: the beam search now exits early only when no penalties are used
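The new score normalization is simple enough to write out; a minimal sketch (the function name is illustrative):

```python
def normalize_score(cumulative_log_prob, length, length_penalty=1.0):
    """Length-normalize a hypothesis score: score / pow(length, length_penalty)."""
    return cumulative_log_prob / (length ** length_penalty)

# With the default length_penalty of 1, this is the mean log probability
# per token; a length_penalty of 0 leaves the raw score unchanged.
score = normalize_score(-6.0, length=3)  # -2.0
```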
Python
- Rename some classes:
  - `OpenNMTTFConverterV2` -> `OpenNMTTFConverter`
  - `TranslationStats` -> `ExecutionStats`
- Remove compatibility for reading `ScoringResult` as a list of scores: the scores can be accessed with the attribute `log_probs`
- Remove compatibility for reading `ExecutionStats` as a tuple
- Remove support for deprecated Python version 3.6
CLI
- Rename the client executable `translate` to a more specific name `ct2-translator`
C++
- Rename or remove some classes and methods:
  - `TranslationStats` -> `ExecutionStats`
  - `GeneratorPool` -> `Generator`
  - `TranslatorPool` -> `Translator`
  - `TranslatorPool::consume_*` -> `Translator::translate_*`
  - `TranslatorPool::consume_stream` -> removed
  - `TranslatorPool::score_stream` -> removed
- Remove support for building with CUDA 10
New features
- Integrate the Whisper speech recognition model published by OpenAI
- Support conversion of models trained with OpenNMT-py V3
- Add method `Generator.forward_batch` to get the full model output for a batch of sequences
- Add Python class `StorageView` to expose C++ methods taking or returning N-dimensional arrays: the class implements the array interface for interoperability with NumPy and PyTorch
- Add a new configuration file `config.json` in the model directory that contains non-structural model parameters (e.g. related to the input, the vocabulary, etc.)
- Implement the Conv1D layer and operator on CPU and GPU (using oneDNN and cuDNN respectively)
- [C++] Allow registration of external models with `models::ModelFactory`
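The NumPy/PyTorch interoperability mentioned for `StorageView` relies on the NumPy array interface protocol (`__array_interface__`). The sketch below shows the mechanism with a hypothetical wrapper class, not the actual `StorageView` implementation:

```python
import numpy as np

class ArrayLike:
    """Expose an underlying buffer through the NumPy array interface,
    so np.asarray() can create a zero-copy view (illustrative sketch)."""

    def __init__(self, array):
        self._array = np.ascontiguousarray(array)

    @property
    def __array_interface__(self):
        # Delegates shape, dtype, and the raw data pointer to the buffer.
        return self._array.__array_interface__

wrapped = ArrayLike(np.arange(6, dtype=np.float32).reshape(2, 3))
view = np.asarray(wrapped)  # shares memory with the wrapped buffer
view[0, 0] = 42.0           # the write is visible through the original array
```

Because the view shares memory rather than copying, round-tripping large tensors between CTranslate2 and NumPy or PyTorch avoids extra allocations.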
Fixes and improvements
- Fix conversion of models that use biases only for some QKV projections but not for all
- Fuse masking of the output log probs by aggregating disabled tokens from all related options: `disable_unk`, `min_length`, `no_repeat_ngram_size`, etc.
- Reduce the layer norm epsilon value on GPU to 1e-5 to match the default value in PyTorch
- Move some Transformer model attributes under the encoder/decoder scopes to simplify loading
- Redesign the `ReplicaPool` base class to simplify adding new classes with multiple model workers
- Compile the library with C++17
- Update oneDNN to 2.7.1
- Update oneMKL to 2022.2
- Update pybind11 to 2.10.1
- Update cibuildwheel to 2.11.2
CTranslate2 2.24.0
Changes
- The Linux binaries now use the GNU OpenMP runtime instead of Intel OpenMP to work around an initialization error on systems without `/dev/shm`
Fixes and improvements
- Fix a memory error when running random sampling on GPU
- Optimize the model loading on multiple GPUs by copying the finalized model weights instead of reading the model from disk multiple times
- In the methods `Translator.translate_iterable` and `Translator.score_iterable`, raise an error if the input iterables don't have the same length
- Fix some compilation warnings
CTranslate2 2.23.0
New features
- Build wheels for Python 3.11
Fixes and improvements
- In beam search, get more candidates from the model output and replace finished hypotheses by these additional candidates
- Fix possibly incorrect attention vectors returned from the beam search
- Fix coverage penalty that was actually not applied
- Fix crash when the beam size is larger than the vocabulary size
- Add missing compilation flag `-fvisibility=hidden` when building the Python module
- Update oneDNN to 2.6.2
- Update OpenBLAS to 0.3.21
CTranslate2 2.22.0
Changes
- `score_batch` methods now return a list of `ScoringResult` instances instead of plain lists of probabilities. In most cases you should not need to update your code: the result object implements the methods `__len__`, `__iter__`, and `__getitem__` so that it can still be used as a list.
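This backward-compatible behavior can be sketched with a small Python class. `tokens` and `log_probs` mirror the documented attributes; the class itself is a hypothetical simplification of `ScoringResult`:

```python
class ScoringResultSketch:
    """List-like wrapper: iterating or indexing yields the log probabilities,
    while the values also remain available as attributes (sketch)."""

    def __init__(self, tokens, log_probs):
        self.tokens = tokens
        self.log_probs = log_probs

    def __len__(self):
        return len(self.log_probs)

    def __iter__(self):
        return iter(self.log_probs)

    def __getitem__(self, index):
        return self.log_probs[index]

result = ScoringResultSketch(["Hello", "</s>"], [-0.25, -1.5])
total = sum(result)  # old list-style usage still works
first = result[0]
```

Code written against the old plain-list return value keeps working, while new code can read `result.tokens` and `result.log_probs` directly.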
New features
- Add methods to efficiently process long iterables:
  - `Translator.translate_iterable`
  - `Translator.score_iterable`
  - `Generator.generate_iterable`
  - `Generator.score_iterable`
- Add decoding option `min_alternative_expansion_prob` to filter out unlikely alternatives in `return_alternatives` mode
- Return `ScoringResult` instances from `score_batch` to include additional outputs. The current attributes are:
  - `tokens`: the list of tokens that were actually scored (including special tokens)
  - `log_probs`: the log probability of each scored token
- Support running `score_batch` asynchronously by setting the `asynchronous` flag
Fixes and improvements
- Fix possibly incorrect results when using `disable_unk` or `use_vmap` with one of the following options:
  - `min_decoding_length`
  - `no_repeat_ngram_size`
  - `prefix_bias_beta`
  - `repetition_penalty`
- Also pad the output layer during scoring to enable Tensor Cores
- Improve the correctness of the model output probabilities when the output layer is padded
- Skip translation when the NLLB input is empty (i.e. when the input only contains EOS and the language token)
CTranslate2 2.21.1
Fixes and improvements
- Fix conversion of NLLB models when `tokenizer_class` is missing from the configuration
CTranslate2 2.21.0
New features
- Support NLLB multilingual models via the Transformers converter
- Support Pegasus summarization models via the Transformers converter
Fixes and improvements
- Do not stop decoding when the EOS token is coming from the user input: this is required by some text generation models like microsoft/DialoGPT where EOS is used as a separator
- Fix conversion error for language models trained with OpenNMT-py
- Fix conversion of models that are not using bias terms in the multi-head attention
- Fix data type error when enabling the translation options `return_alternatives` and `return_attention` with a `float16` model
- Improve CPU performance of language models quantized to `int8`
- Implement a new vectorized GELU operator on CPU
- Raise a more explicit error when trying to convert an unsupported Fairseq model
- Update pybind11 to 2.10.0
CTranslate2 2.20.0
New features
- Add generation option `no_repeat_ngram_size` to prevent repetitions of N-grams of a given minimum size
Fixes and improvements
- Fix conversion of OpenNMT-tf models that use static position embeddings
- Fix a segmentation fault in `return_alternatives` mode when the target prefix is longer than `max_decoding_length`
- Fix inconsistent state of asynchronous results in Python when a runtime exception is raised
- Remove `<pad>` token when converting MarianMT models from Transformers: this token is only used to start the decoder from a zero embedding, but it is not included in the original Marian model
- Optimize CPU kernels with vectorized reduction of accumulated values
- Do not modify the configuration passed to `OpenNMTTFConverterV2.from_config`
- Improve Python classes documentation by listing members at the top