
Improve Docs for LanguageModelFeaturizer #10385

Closed
koaning opened this issue Nov 25, 2021 · 16 comments
Labels
area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components
type:docs 📖 Improvements to the documentation. Adding missing pieces or improving existing ones.

Comments

@koaning
Contributor

koaning commented Nov 25, 2021

There's a lot of confusion on the forum this week about how to configure the LanguageModelFeaturizer. This suggests that we may want to add documentation that explains how to link models. I think part of the issue here is that it's hard for a user to know what the model_name of any given model on Hugging Face should be.
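For reference, configuring the featurizer means choosing both an architecture (`model_name`) and a checkpoint (`model_weights`); a minimal pipeline sketch, assuming the Rasa 3.0 config format (the LaBSE checkpoint here is just an example, any BERT-architecture weights from the hub should work the same way):

```yaml
pipeline:
  - name: LanguageModelFeaturizer
    # Architecture key; must be one of the hard-coded supported ones.
    model_name: "bert"
    # Hugging Face hub identifier for the actual weights.
    model_weights: "rasa/LaBSE"
  - name: DIETClassifier
    epochs: 100
```

If `model_weights` is omitted, a per-architecture default checkpoint is used.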

@koaning koaning added area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components type:docs 📖 Improvements to the documentation. Adding missing pieces or improving existing ones. labels Nov 25, 2021
@mleimeister
Contributor

mleimeister commented Nov 26, 2021

I ran into the same issue and tried finding the reason in the LanguageModelFeaturizer code. If I understand correctly, when the model is loaded in _load_model_metadata, only the architectures listed in the model_class_dict are possible choices. This is necessary since we use some hard-coded constants per architecture (e.g. here).
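To illustrate the restriction, here is a stand-in for that mapping; a sketch, not Rasa's actual code. The list of architecture keys matches what the Rasa 3.0 docs advertise as supported, but check your installed version:

```python
# Illustrative stand-in for the hard-coded mapping in Rasa's registry.py:
# only these architecture keys are valid values for `model_name`, because
# each one has matching per-architecture constants (special tokens,
# embedding post-processing) elsewhere in the module.
SUPPORTED_ARCHITECTURES = ["bert", "gpt", "gpt2", "xlnet", "distilbert", "roberta"]

def check_model_name(model_name: str) -> None:
    """Raise if `model_name` is not one of the hard-coded architectures."""
    if model_name not in SUPPORTED_ARCHITECTURES:
        raise KeyError(
            f"'{model_name}' is not a supported architecture; "
            f"choose one of {SUPPORTED_ARCHITECTURES}"
        )

check_model_name("bert")          # fine
# check_model_name("xlm-roberta") # would raise KeyError
```

This is why xlm-roberta fails even though it exists on the hub: the architecture key simply isn't in the dict.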

Therefore, any other architecture such as xlm-roberta (which is different from plain roberta in HF) will produce an error. The docs do list the currently possible model names, but the sentence "The full list of supported architectures can be found in the HuggingFace documentation." is misleading. Another option would be to maintain the supported model architectures more actively in line with HF releases, I guess.

@mleimeister
Contributor

Another potential issue comes from the fact that not all models published on https://huggingface.co/models have a TensorFlow implementation. For example, xlm-roberta-base only seems to come with PyTorch weights: https://huggingface.co/xlm-roberta-base/tree/main and running

from transformers import TFXLMRobertaModel
model = TFXLMRobertaModel.from_pretrained("xlm-roberta-base")

results in an error

OSError: Can't load weights for 'xlm-roberta-base'. Make sure that:
- 'xlm-roberta-base' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'xlm-roberta-base' is the correct path to a directory containing a file named one of tf_model.h5, pytorch_model.bin.

@koaning
Contributor Author

koaning commented Nov 29, 2021

@mleimeister wouldn't this be fixed by installing pytorch?

@datonefaridze

@koaning I don't think so, but I'm not sure whether TensorFlow will compute gradients through a PyTorch architecture, so it could mess up training.

@koaning
Contributor Author

koaning commented Nov 30, 2021

The tensors that we pass to DIET are static. Rasa uses the Hugging Face models as a featurizer, so the gradients will never update the weights in the LanguageModelFeaturizer. So that should not be a concern here.

Having a pytorch dependency will make for very heavy Docker containers though, especially if the image already contains tensorflow ... I guess 1GB+ volumes are to be expected.
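For users who do accept the extra dependency, transformers can convert a PyTorch-only checkpoint on the fly via `from_pt=True`; a sketch, assuming both `transformers` and `torch` are installed:

```python
# Sketch: loading PyTorch-only weights into a TF model class.
# `from_pt=True` converts the PyTorch checkpoint to TF weights at load
# time, which requires torch to be importable.
def load_tf_xlm_roberta(weights: str = "xlm-roberta-base"):
    # Imported inside the function so the sketch is readable without
    # the (heavy) dependencies installed.
    from transformers import TFXLMRobertaModel

    # Downloads the checkpoint from the hub on first call.
    return TFXLMRobertaModel.from_pretrained(weights, from_pt=True)
```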

@mleimeister
Contributor

mleimeister commented Dec 8, 2021

@koaning Hm, yes, not sure tbh about adding pytorch as a dependency and the resulting Docker image sizes. Would this then be more of an ops/infrastructure question regarding the Docker images?

For most models, there seems to be a version of TF weights, sometimes just not under the default specifier. In that case the more limiting factor is that one can currently only use architectures from the hard-coded list in registry.py. I wonder if this could be made more flexible, e.g. by using the HF Auto classes instead to automatically infer the model and tokenizer classes.

Edit: the TFAutoModel usage has already been discussed in #6307 and might be blocked due to how transformers are used in our pipeline.
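In plain transformers the Auto-class route looks roughly like this; a sketch only, not Rasa code, and it leaves open how Rasa's per-architecture token pre-/post-processing would be inferred for an arbitrary checkpoint:

```python
# Sketch: letting transformers infer the model and tokenizer classes from
# the checkpoint's config.json instead of a hard-coded mapping.
def load_auto(weights: str):
    from transformers import AutoTokenizer, TFAutoModel

    tokenizer = AutoTokenizer.from_pretrained(weights)
    # Adding from_pt=True here would additionally cover PyTorch-only
    # checkpoints, at the cost of a torch dependency.
    model = TFAutoModel.from_pretrained(weights)
    return tokenizer, model
```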

@koaning
Contributor Author

koaning commented Dec 8, 2021

@mleimeister oh just to be clear, I don't think that Rasa should add it as a dependency, but I think there's nothing stopping any of our users from doing so if they wanted to.

@mleimeister
Contributor

This was another user forum question that might be relevant for improving the docs. Seemingly the error came from a HF model using a different tokenizer than what we expect (e.g. BertModel with AlbertTokenizer), which breaks due to the hard-coded mapping in registry.py.
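In plain transformers that mismatch is easy to work around by naming both classes explicitly; a sketch (the checkpoint name is hypothetical) of the combination Rasa's fixed per-architecture mapping cannot currently express:

```python
# Sketch: a checkpoint whose config declares a BERT architecture but whose
# vocabulary files belong to an Albert (sentencepiece) tokenizer.
def load_mismatched_checkpoint(weights: str = "some-org/bert-with-albert-tokenizer"):
    # `weights` is a hypothetical checkpoint name, for illustration only.
    from transformers import AlbertTokenizer, TFBertModel

    tokenizer = AlbertTokenizer.from_pretrained(weights)
    model = TFBertModel.from_pretrained(weights)
    return tokenizer, model
```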

@ganbaaelmer

@mleimeister is right, I have the same issue. Please fix it as soon as possible...

@ganbaaelmer

We need to run a Bert model with the AlbertTokenizer.

@koernerfelicia
Contributor

koernerfelicia commented Jan 3, 2022

Maybe it's worth producing a tutorial on how to extend the registry and the pre-/post-processing to accommodate models that Rasa doesn't officially support? I've sketched this out for users in the forum before (see here), but not with a whole lot of detail.
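A hedged sketch of what "extending the registry" involves. Rasa's rasa.nlu.utils.hugging_face.registry module (names as of Rasa 3.0; they may differ across versions) keeps per-architecture dicts such as model_class_dict, model_tokenizer_dict, and model_weights_defaults, plus token pre-/post-processing hooks that a new entry would also need:

```python
import types

def register_architecture(registry, name, model_cls, tokenizer_cls, default_weights):
    # Add a new architecture key to the registry's lookup dicts. The real
    # registry also needs matching special-token pre-processing and
    # embedding post-processing entries for the new architecture.
    registry.model_class_dict[name] = model_cls
    registry.model_tokenizer_dict[name] = tokenizer_cls
    registry.model_weights_defaults[name] = default_weights

# Illustration against a stand-in registry; class names are given as
# strings here so the sketch runs without transformers installed.
fake_registry = types.SimpleNamespace(
    model_class_dict={}, model_tokenizer_dict={}, model_weights_defaults={}
)
register_architecture(
    fake_registry, "albert", "TFAlbertModel", "AlbertTokenizer", "albert-base-v2"
)
```

With the real module, you would pass the imported registry module itself and the actual transformers classes instead of the stand-in.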

@koaning
Contributor Author

koaning commented Jan 3, 2022

@koernerfelicia you think the main component guide on our docs does not suffice? Part of me is a bit anxious about writing a guide for this, because I fear it may suggest that writing your own components for Hugging Face-style featurizers is a best practice. Certainly for English, I'd prefer to prevent users from trying out all sorts of models if they haven't shown their assistant to an end user yet.

@koernerfelicia
Contributor

I think that's already very useful! I was thinking more of LMFeaturizer-specific stuff, like how to figure out which tokens to add, which tokens need to be cleaned up, which tokenizer to use, etc.

I understand not wanting to encourage the use of, or endless fiddling with, these models, and ultimately defer to you there! I don't really know the justification for why we support the models we do, and I think that, for example, the user I linked wanting to set up Albert seems fair enough.

I don't think we want to support more models because we don't have a generic way to do this, so maybe it is easiest to show people how they can extend the component if they want to.

@mleimeister
Contributor

As a side project, I started an attempt to use the HuggingFace Auto classes to enable loading arbitrary models/weights, so we might be able to get rid of the model-specific token manipulations. The current state is here.

Since it'll take some more time to get this validated, would the docs update be good to clarify the current situation for now? Regarding a tutorial on how to extend the current version with other models, would the docs be the right place, or rather some external resource we can point to?

@koernerfelicia
Contributor

@mleimeister the docs update is definitely good as is to close this imo!

I'm not as sure about the tutorial; I think that is ultimately up to @koaning (whether to do a tutorial, given his reservations, and how we would go about it).

I'm interested to see how you get on with the Auto classes. As you already saw in that other TFAutoModel issue, there is a pending transformers update that might make this easier; this is tracked here/here, and I believe Kathrin and Matt are responsible (out for now, but back in the next few days).

@mleimeister
Contributor

Closing this ticket since #10616 has been merged into 3.0.x.
