Releases: OpenNMT/CTranslate2
CTranslate2 3.1.0
Changes
- The input prompt is no longer included in the result of `Whisper.generate` as it is usually not useful in a transcription loop
- The default beam size in `Whisper.generate` is updated from 1 to 5 to match the default value in openai/whisper
- Generation options `min_length` and `no_repeat_ngram_size` now penalize the logits instead of the log probs, which may change some scores
- Raise a deprecation warning when reading the `TranslationResult` object as a list of dictionaries
New features
- Allow configuring the C++ logs from Python with the function `ctranslate2.set_log_level`
- Implement the timestamp decoding rules when the Whisper prompt does not include the token `<|notimestamps|>`
- Add option `return_no_speech_prob` to the method `Whisper.generate` for the result to include the probability of the no-speech token
Fixes and improvements
- Improve performance of the Whisper model when generating with a context
- Fix timestamp tokens in the Whisper vocabulary to use the correct format (`<|X.XX|>`)
- Fix AVX and NEON log functions to return -inf on log(0) instead of NaN
- When info logs are enabled, log the system configuration only when the first model is loaded and not immediately when the library is loaded
- Define a `LogitsProcessor` abstract class to apply arbitrary updates to the logits during decoding
- Update oneDNN to 2.7.2
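The `LogitsProcessor` hook is a C++ interface, but the idea, a per-step callback that mutates the logits during decoding, can be sketched in Python. All names below are hypothetical illustrations, not the actual CTranslate2 interface:

```python
import abc
import math

class LogitsProcessor(abc.ABC):
    """Hook applied to the raw logits at each decoding step (sketch)."""

    @abc.abstractmethod
    def apply(self, step, logits):
        """Return the (possibly modified) logits for this step."""

class MinLengthProcessor(LogitsProcessor):
    """Disallow the end token until a minimum number of steps have run."""

    def __init__(self, min_length, end_token_id):
        self.min_length = min_length
        self.end_token_id = end_token_id

    def apply(self, step, logits):
        if step < self.min_length:
            logits = list(logits)
            logits[self.end_token_id] = -math.inf  # end token cannot be selected yet
        return logits

processor = MinLengthProcessor(min_length=4, end_token_id=0)
out = processor.apply(step=1, logits=[2.0, 0.1, 0.3])
```

Options such as `min_length` and `no_repeat_ngram_size` fit naturally into this pattern: each is one processor applied in sequence to the logits before sampling or beam expansion.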
CTranslate2 3.0.2
Fixes and improvements
- Whisper: fix `generate` arguments that were not correctly passed to the model
CTranslate2 3.0.1
Fixes and improvements
- Whisper: do not implicitly add `<|startoftranscript|>` in `generate` since it is not always the first token
CTranslate2 3.0.0
This major version integrates the Whisper speech recognition model published by OpenAI. It also introduces some breaking changes to remove deprecated usages and simplify some modules.
Breaking changes
General
- Remove option `normalize_scores`: the scores are now always divided by `pow(length, length_penalty)` with `length_penalty` defaulting to 1
- Remove option `allow_early_exit`: the beam search now exits early only when no penalties are used
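The new score normalization is simple enough to write out; a minimal sketch (the function name is illustrative):

```python
def normalize_score(cumulative_log_prob, length, length_penalty=1.0):
    """Length-normalize a hypothesis score: score / pow(length, length_penalty)."""
    return cumulative_log_prob / (length ** length_penalty)

# With the default length_penalty of 1, this is the mean log probability
# per token; a length_penalty of 0 leaves the raw score unchanged.
score = normalize_score(-6.0, length=3)  # -2.0
```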
Python
- Rename some classes:
  - `OpenNMTTFConverterV2` -> `OpenNMTTFConverter`
  - `TranslationStats` -> `ExecutionStats`
- Remove compatibility for reading `ScoringResult` as a list of scores: the scores can be accessed with the attribute `log_probs`
- Remove compatibility for reading `ExecutionStats` as a tuple
- Remove support for deprecated Python version 3.6
CLI
- Rename the client executable `translate` to a more specific name `ct2-translator`
C++
- Rename or remove some classes and methods:
  - `TranslationStats` -> `ExecutionStats`
  - `GeneratorPool` -> `Generator`
  - `TranslatorPool` -> `Translator`
  - `TranslatorPool::consume_*` -> `Translator::translate_*`
  - `TranslatorPool::consume_stream` -> removed
  - `TranslatorPool::score_stream` -> removed
- Remove support for building with CUDA 10
New features
- Integrate the Whisper speech recognition model published by OpenAI
- Support conversion of models trained with OpenNMT-py V3
- Add method `Generator.forward_batch` to get the full model output for a batch of sequences
- Add Python class `StorageView` to expose C++ methods taking or returning N-dimensional arrays: the class implements the array interface for interoperability with NumPy and PyTorch
- Add a new configuration file `config.json` in the model directory that contains non-structural model parameters (e.g. related to the input, the vocabulary, etc.)
- Implement the Conv1D layer and operator on CPU and GPU (using oneDNN and cuDNN respectively)
- [C++] Allow registration of external models with `models::ModelFactory`
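The NumPy/PyTorch interoperability mentioned for `StorageView` relies on the NumPy array interface protocol (`__array_interface__`). The sketch below shows the mechanism with a hypothetical wrapper class, not the actual `StorageView` implementation:

```python
import numpy as np

class ArrayLike:
    """Expose an underlying buffer through the NumPy array interface,
    so np.asarray() can create a zero-copy view (illustrative sketch)."""

    def __init__(self, array):
        self._array = np.ascontiguousarray(array)

    @property
    def __array_interface__(self):
        # Delegates shape, dtype, and the raw data pointer to the buffer.
        return self._array.__array_interface__

wrapped = ArrayLike(np.arange(6, dtype=np.float32).reshape(2, 3))
view = np.asarray(wrapped)  # shares memory with the wrapped buffer
view[0, 0] = 42.0           # the write is visible through the original array
```

Because the view shares memory rather than copying, round-tripping large tensors between CTranslate2 and NumPy or PyTorch avoids extra allocations.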
Fixes and improvements
- Fix conversion of models that use biases only for some QKV projections but not for all
- Fuse masking of the output log probs by aggregating disabled tokens from all related options: `disable_unk`, `min_length`, `no_repeat_ngram_size`, etc.
- Reduce the layer norm epsilon value on GPU to 1e-5 to match the default value in PyTorch
- Move some Transformer model attributes under the encoder/decoder scopes to simplify loading
- Redesign the `ReplicaPool` base class to simplify adding new classes with multiple model workers
- Compile the library with C++17
- Update oneDNN to 2.7.1
- Update oneMKL to 2022.2
- Update pybind11 to 2.10.1
- Update cibuildwheel to 2.11.2
CTranslate2 2.24.0
Changes
- The Linux binaries now use the GNU OpenMP runtime instead of Intel OpenMP to work around an initialization error on systems without `/dev/shm`
Fixes and improvements
- Fix a memory error when running random sampling on GPU
- Optimize the model loading on multiple GPUs by copying the finalized model weights instead of reading the model from disk multiple times
- In the methods `Translator.translate_iterable` and `Translator.score_iterable`, raise an error if the input iterables don't have the same length
- Fix some compilation warnings
CTranslate2 2.23.0
New features
- Build wheels for Python 3.11
Fixes and improvements
- In beam search, get more candidates from the model output and replace finished hypotheses by these additional candidates
- Fix possibly incorrect attention vectors returned from the beam search
- Fix coverage penalty that was actually not applied
- Fix crash when the beam size is larger than the vocabulary size
- Add missing compilation flag `-fvisibility=hidden` when building the Python module
- Update oneDNN to 2.6.2
- Update OpenBLAS to 0.3.21
CTranslate2 2.22.0
Changes
- `score_batch` methods now return a list of `ScoringResult` instances instead of plain lists of probabilities. In most cases you should not need to update your code: the result object implements the methods `__len__`, `__iter__`, and `__getitem__` so that it can still be used as a list.
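This backward-compatible behavior can be sketched with a small Python class. `tokens` and `log_probs` mirror the documented attributes; the class itself is a hypothetical simplification of `ScoringResult`:

```python
class ScoringResultSketch:
    """List-like wrapper: iterating or indexing yields the log probabilities,
    while the values also remain available as attributes (sketch)."""

    def __init__(self, tokens, log_probs):
        self.tokens = tokens
        self.log_probs = log_probs

    def __len__(self):
        return len(self.log_probs)

    def __iter__(self):
        return iter(self.log_probs)

    def __getitem__(self, index):
        return self.log_probs[index]

result = ScoringResultSketch(["Hello", "</s>"], [-0.25, -1.5])
total = sum(result)  # old list-style usage still works
first = result[0]
```

Code written against the old plain-list return value keeps working, while new code can read `result.tokens` and `result.log_probs` directly.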
New features
- Add methods to efficiently process long iterables:
  - `Translator.translate_iterable`
  - `Translator.score_iterable`
  - `Generator.generate_iterable`
  - `Generator.score_iterable`
- Add decoding option `min_alternative_expansion_prob` to filter out unlikely alternatives in `return_alternatives` mode
- Return `ScoringResult` instances from `score_batch` to include additional outputs. The current attributes are:
  - `tokens`: the list of tokens that were actually scored (including special tokens)
  - `log_probs`: the log probability of each scored token
- Support running `score_batch` asynchronously by setting the `asynchronous` flag
Fixes and improvements
- Fix possibly incorrect results when using `disable_unk` or `use_vmap` with one of the following options:
  - `min_decoding_length`
  - `no_repeat_ngram_size`
  - `prefix_bias_beta`
  - `repetition_penalty`
- Also pad the output layer during scoring to enable Tensor Cores
- Improve the correctness of the model output probabilities when the output layer is padded
- Skip translation when the NLLB input is empty (i.e. when the input only contains EOS and the language token)
CTranslate2 2.21.1
Fixes and improvements
- Fix conversion of NLLB models when `tokenizer_class` is missing from the configuration
CTranslate2 2.21.0
New features
- Support NLLB multilingual models via the Transformers converter
- Support Pegasus summarization models via the Transformers converter
Fixes and improvements
- Do not stop decoding when the EOS token is coming from the user input: this is required by some text generation models like microsoft/DialoGPT where EOS is used as a separator
- Fix conversion error for language models trained with OpenNMT-py
- Fix conversion of models that are not using bias terms in the multi-head attention
- Fix data type error when enabling the translation options `return_alternatives` and `return_attention` with a `float16` model
- Improve CPU performance of language models quantized to `int8`
- Implement a new vectorized GELU operator on CPU
- Raise a more explicit error when trying to convert an unsupported Fairseq model
- Update pybind11 to 2.10.0
CTranslate2 2.20.0
New features
- Add generation option `no_repeat_ngram_size` to prevent repetitions of N-grams of a given minimum size
Fixes and improvements
- Fix conversion of OpenNMT-tf models that use static position embeddings
- Fix a segmentation fault in `return_alternatives` mode when the target prefix is longer than `max_decoding_length`
- Fix inconsistent state of asynchronous results in Python when a runtime exception is raised
- Remove `<pad>` token when converting MarianMT models from Transformers: this token is only used to start the decoder from a zero embedding, but it is not included in the original Marian model
- Optimize CPU kernels with vectorized reduction of accumulated values
- Do not modify the configuration passed to `OpenNMTTFConverterV2.from_config`
- Improve Python classes documentation by listing members at the top