
Use HF Auto classes in LanguageModelFeaturizer #10624

Closed
wants to merge 43 commits

Conversation

@mleimeister commented Jan 5, 2022

To allow users to use arbitrary HuggingFace models in LanguageModelFeaturizer, this PR removes the hard-coded mapping between model architectures and model/tokenizer classes. Instead, the model and tokenizer classes are inferred from the specified weights using AutoTokenizer and TFAutoModel (see the sketch after the list below). The implementation aims to provide the following:

  • Enable all HF models that are compatible with our current version of transformers
  • Keep default weights for the currently available models (for backward compatibility)
  • Reproduce output for existing model/weights combinations in the unit tests
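For illustration, the Auto-class loading looks roughly like this (the weight name below is only an example, and the sketch omits the surrounding LanguageModelFeaturizer plumbing):

```python
from transformers import AutoTokenizer, TFAutoModel

# Any model identifier from the HF model hub can be passed as `model_weights`;
# the concrete tokenizer and model classes are resolved automatically.
model_weights = "bert-base-uncased"  # example value only

tokenizer = AutoTokenizer.from_pretrained(model_weights)
model = TFAutoModel.from_pretrained(model_weights)

encoded = tokenizer("hello world", return_tensors="tf")
outputs = model(encoded)
# Token-level embeddings, shape (batch_size, num_tokens, hidden_size).
sequence_embeddings = outputs.last_hidden_state
```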

TODO:

  • One part that has so far been tricky to move to the Auto classes is delimiter token cleaning (e.g. ## in BERT). The current version uses a fixed list of known delimiter tokens (from BERT, GPT2, XLNet). Investigate how Tokenizer.convert_tokens_to_string, which implements this cleaning step in the child classes, can be used instead (see the tokenizer sketch after this list). -> The three existing delimiter tokens are the ones currently listed in the HF documentation/tokenizer course.
  • Validate the role of different special tokens in HF tokenizers (in particular, should only CLS, BOS, and EOS be filtered?) -> After testing various tokenizers, removing the UNK token seems to be the correct approach, although nothing prevents someone from writing a custom tokenizer that breaks this. The latest HF version contains a base class method, SpecialTokensMixin.get_special_tokens_mask, but it includes the UNK token in the mask and is therefore not useful for our purpose.
  • Remove nlu.utils.huggingface subdirectory
  • Add unit tests for "unknown" model architectures and missing default weights
  • Run model regression tests
  • Update docs
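For reference, a minimal sketch (using bert-base-uncased purely as an example) of how the tokenizer exposes the delimiter cleanup and the special tokens mentioned above:

```python
from transformers import AutoTokenizer

# Example only: bert-base-uncased marks subword continuations with the "##" prefix.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

tokens = tokenizer.tokenize("unaffable")
print(tokens)  # e.g. ['un', '##aff', '##able']; the exact split depends on the vocab

# convert_tokens_to_string strips the model-specific delimiter prefixes, i.e. the
# cleaning step that is currently hard-coded per architecture.
print(tokenizer.convert_tokens_to_string(tokens))  # "unaffable"

# Special tokens such as CLS/SEP/UNK are exposed as attributes on the tokenizer.
print(tokenizer.cls_token, tokenizer.sep_token, tokenizer.unk_token)
```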

Proposed changes:

  • Use the HuggingFace Auto classes to load arbitrary weights from the HF model hub

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)
  • Check on HF issue board if exposing the delimiter prefixes could be implemented without too much work
  • Go through existing tokenizers and check if convert_tokens_to_string behaves as expected with our cleanup function
  • Unit tests for tokenizers that differ considerably from the currently supported ones

github-actions bot commented Jan 7, 2022

Commit: a20e992. The full report is available as an artifact.

Dataset: Sara, Dataset repository branch: main, commit: 819cb7b3cc077753e67178ad022d577f164e99cf

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| BERT + DIET(seq) + ResponseSelector(t2t) (test: 8m32s, train: 6m22s, total: 14m54s) | 0.7136 (0.00) | 0.7925 (0.00) | 0.7783 (0.00) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) (test: 7m22s, train: 12m59s, total: 20m21s) | 0.6957 (0.01) | 0.7949 (0.00) | 0.7860 (-0.01) |

mleimeister commented Feb 14, 2022

@TyDunn Since this PR is close to being finished and contains a feature update (specifically, expanding the functionality of the current LanguageModelFeaturizer), I wanted to check with you where this could best go. Would this be appropriate for the upcoming 3.1 minor release? The changes would be as follows:

  • More HuggingFace models are supported by using the model identifier from the HF model hub as the model_weights parameter. This covers most of the models that people in the forum wanted to use but couldn't so far.
  • The models that are not supported as of transformers version 4.13.0 are listed in the documentation. There is also an explicit check when the component loads; an error is raised for an incompatible model, pointing to the documentation for further information (see the sketch after this list).
  • I will discuss with QA how best to run a regular check to ensure that the models we do support stay up to date.
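To make the second point concrete, the load-time check works roughly along these lines (the architecture list, function name, and error message below are illustrative, not the actual implementation in this PR):

```python
from transformers import AutoConfig

# Hypothetical list of architectures that the featurizer cannot handle.
INCOMPATIBLE_MODEL_TYPES = {"some-unsupported-architecture"}

def validate_model_weights(model_weights: str) -> None:
    """Raise an informative error if the configured weights are incompatible."""
    config = AutoConfig.from_pretrained(model_weights)
    if config.model_type in INCOMPATIBLE_MODEL_TYPES:
        raise ValueError(
            f"Model '{model_weights}' (architecture '{config.model_type}') is not "
            f"supported by LanguageModelFeaturizer. Please refer to the documentation "
            f"for the list of incompatible models."
        )
```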

Let me know if you have any questions or we should discuss details in a short meeting.

TyDunn commented Feb 15, 2022

@mleimeister Since this contains enhancements, it should go into the next minor release (3.1).

@mleimeister

Hi @koernerfelicia, the latest commits now contain the changes we discussed. Particularly:

  • Incompatible models are listed in the docs and are checked in the component
  • A unit test checks the token cleanup for the currently supported model architectures and that incompatible models raise an error (a rough sketch of such a check is shown after this list)
  • A script is in place to test the token cleanup for all models supported by the current transformers version. I reached out to QA to discuss how to integrate this, e.g. as a cronjob or as part of the pre-release QA process. The concrete implementation would be covered in this follow-up ticket.
  • Product is informed and confirmed it's viable for the upcoming minor release
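For context, the cleanup check is roughly of this shape (the weights and example text are illustrative, not the actual test data):

```python
import pytest
from transformers import AutoTokenizer

@pytest.mark.parametrize(
    "model_weights, text",
    [
        ("bert-base-uncased", "sentence embeddings"),  # example entry only
    ],
)
def test_token_cleanup_roundtrip(model_weights: str, text: str) -> None:
    tokenizer = AutoTokenizer.from_pretrained(model_weights)
    tokens = tokenizer.tokenize(text)
    # Cleaning the sub-tokens should reconstruct the original text.
    assert tokenizer.convert_tokens_to_string(tokens) == text
```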

Let me know if this looks ok to you 🙂

@koernerfelicia left a comment

Some small comments. It also occurred to me that we should have someone else review the docs part (not sure if CSE still does this). I can no longer judge whether the concept of weights and models, and how they relate to each other, comes across clearly, since we were not always consistent with the terminology here. Maybe we can have fresh eyes on just that docs section?

@koernerfelicia

> I think we should have someone else review the docs part

I think we can request this in #product-docs, unless we think of someone specific (they should not have a deep knowledge of how HF transformers models are organised).

Also, have you let DevRel know about this? I think it is worth a forum post to drum up excitement 💃

koaning commented Feb 16, 2022

Can we check some non-Latin alphabets here? It’d be a shame if we work for English, but break for Hebrew, Korean, Chinese, or Arabic. This is probably best served as a “slow test” that we run once a week or so. Running that on every PR would be horrendously slow.
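For illustration, such a slow test could iterate over a few non-Latin examples along these lines (the model identifiers and sample sentences are purely illustrative, not an agreed test set):

```python
from transformers import AutoTokenizer

# Illustrative examples only; the real test set would be agreed on with QA.
SAMPLES = {
    "bert-base-chinese": "今天天气很好",
    "bert-base-multilingual-cased": "שלום עולם",
}

def check_non_latin_roundtrip() -> None:
    for model_weights, text in SAMPLES.items():
        tokenizer = AutoTokenizer.from_pretrained(model_weights)
        tokens = tokenizer.tokenize(text)
        cleaned = tokenizer.convert_tokens_to_string(tokens)
        # Whitespace handling differs between tokenizers, so compare without spaces.
        assert cleaned.replace(" ", "") == text.replace(" ", "")
```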

@mleimeister

> This is probably best served as a “slow test” that we run once a week or so. Running that on every PR would be horrendously slow.

@koaning That would be great. I'm actually looking for similar functionality to regularly check that the tokenizer cleanup works as expected. Do we already have a process in place for running such "slow tests"? I guess it could be a cron job; I created this follow-up ticket to discuss with QA how best to do this: #10893

koaning commented Feb 16, 2022

It's not super formal, but we have a cronjob for the base Rasa install that might serve as a source of inspiration. That one runs daily, though; this might be better served as an optional/weekly job.

@mleimeister

> we have a cronjob for the base Rasa install that might serve as a source of inspiration

@koaning Ah yes, I saw that. I expanded the follow-up ticket to also cover testing of non-Latin script models. I'll set up a meeting with QA next week to discuss how best to implement this, possibly as part of a general strategy for running such "slow tests". Would you be happy to have this handled there, so we don't expand this PR further?

Do you have any concerns regarding the docs in terms of being understandable for bot developers?

koaning commented Feb 17, 2022

@mleimeister Yeah, for sure. "What are good slow tests to run from cron" is a larger topic that's a bit out of scope for us here.

koaning commented Feb 17, 2022

I guess the main thing in my mind when I read the docs is "this reads fine, but maybe it's time that we have a benchmarking guide". My main fear is that folks try out the Huggingface models but forget to check the computational overhead.

That's a whole 'nother effort though. Also related to "can we make the benchmarking dev experience better".

@koernerfelicia

@koaning maybe we could have a forum post announcing this change and link to your video where you go through why it's important to focus on other things like data quality over fancy, heavy embeddings?

koaning commented Feb 17, 2022

A blog post wouldn't hurt, but the docs might be a more appropriate place. If you're interested in doing a benchmark, are you more likely to look for info on the blog or in the docs?

This algorithm whiteboard video might be appropriate to link.

@github-actions

🚀 A preview of the docs has been deployed at the following URL: https://10624--rasahq-docs-rasa-v2.netlify.app/docs/rasa

@losterloh

@dakshvar22 Do you think we should still follow up on this PR?

@dakshvar22 closed this Jun 2, 2022
Development

Successfully merging this pull request may close these issues.

Load Huggingface Transformers model using TFAutoModel