Hello,

As described on PyTorch's blog, since version 1.12 it is possible to have significantly faster transformers.
To benefit from it in Python, one has to use pre-built modules such as TransformerEncoder. Looking at the source code, it seems to boil down to using _transformer_encoder_layer_fwd, which is also available in tch-rs.
Do you think it would be possible to make use of it in rust-bert?
I can have a look at it if you think it is worth investigating.
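For reference, a minimal sketch of what the Python-side fast path looks like (module sizes and tensor shapes below are illustrative, not taken from rust-bert):

```python
# Minimal sketch of the fast path in Python (PyTorch >= 1.12); dimensions are illustrative.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6).eval()

src = torch.rand(32, 128, 512)  # (batch, sequence, d_model)
with torch.inference_mode():
    out = encoder(src)  # dispatches to the fused encoder-layer kernel when eligible
```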
Yes, the availability of BetterTransformer is an interesting development.
The challenge of integrating it into the library is twofold:
1. A lot of the language models in the library implement the attention mechanism from scratch, often with subtle variations that may not match the BetterTransformer module.
2. Even if the logic of the transformer block were identical between the base implementation and BetterTransformer, the submodules and parameters may have different names that would not load correctly with torch.load_state_dict in Python (or varstore.load in the Rust version). The weights may have to be re-exported with updated variable names, breaking backward compatibility if the old export is removed (see the sketch below).
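To illustrate the second point, here is a minimal sketch of the kind of parameter-name remapping a re-export would involve; all parameter names and file paths below are hypothetical, not the actual rust-bert layout:

```python
# Hypothetical sketch: remap parameter names from a from-scratch attention
# implementation to the names expected by a fused/BetterTransformer-style layer.
# All names and file paths are illustrative.
import torch

state = torch.load("old_model_weights.pt", map_location="cpu")
rename = {
    "encoder.layer.0.attention.self.query.weight": "encoder.layers.0.self_attn.q_proj.weight",
    # ... one entry per renamed tensor; a fused qkv projection would also
    # require concatenating the q/k/v weights, not just renaming them.
}
remapped = {rename.get(name, name): tensor for name, tensor in state.items()}
torch.save(remapped, "new_model_weights.pt")  # the old file would no longer load as-is
```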