
Add missing type hints #16059

Closed
Rocketknight1 opened this issue Mar 10, 2022 · 146 comments · Fixed by #17064, #21548, #25364 or #25399

Comments

@Rocketknight1
Member

Rocketknight1 commented Mar 10, 2022

This issue is part of our Great Code Cleanup 2022. If you're interested in helping out, take a look at this thread, or come join us on Discord and talk with other contributors!

🚀 Add missing type hints

Type hints are used inconsistently in the transformers repo across both TF and PT models, and it'd be nice to make them a complete, consistent thing for the core models, especially because we want to develop features that depend on them!

Guide to contributing:

  1. Ensure you've read our contributing guidelines 📜
  2. Claim your architecture(s) in this thread (ensure no one is working on it). It's 100% okay to only take the TensorFlow or PyTorch version of a model, if you're not familiar with both frameworks! It's also okay to claim multiple models and group those changes into a single PR! 🎯
  3. Implement the changes as in Adding type hints for TFRoBERTa #16057 or Add type annotations for BERT and copies #16074 (see the diff on the model architectures for a few examples) 💪
  4. Open the PR and tag me in it. You should run make fixup at the end to do a code quality check before your final commit!

Tips for making your PR

  1. The files you need to edit will be in src/transformers/models/[model_name]/
  2. For TensorFlow, you want the modeling_tf_[model_name].py file. For PyTorch, you want the modeling_[model_name].py file.
  3. Remember, you do not have to cover every class in that file! The main thing we want to cover is the call (for TF) or forward (for PT) method for user-facing classes like TFRobertaForMaskedLM or RobertaForSequenceClassification. It's not necessary to add type hints to layers or base classes like RobertaModel or TFRobertaPreTrainedModel - these are trickier to write, and generally people do not use those classes as standalone models.
  4. If you're unfamiliar with how type hints work, you can read the Python library documentation on them, but it's probably even easier to just look at another PR that added them. Take a look at the list of changes in the pull requests linked above!
  5. The types will usually be obvious - most inputs are Optional[Union[np.ndarray, tf.Tensor]] for TF models and Optional[torch.Tensor] for PyTorch models, and boolean inputs are Optional[bool]. Pay attention to the first input of TF models, though, which is usually TFModelInputType - this is because Keras handles that first input in a special way! Other inputs to pay attention to are past_key_values, which can vary between models, and also the model output type. For the base model classes like RobertaModel, you may have to look at the corresponding MainLayer to figure out the right output type! Also, note that the output type may be a tuple if return_dict is False, in which case you should specify Union[Tuple, ...]. Finally, note that in TF models, training is never None, so it should be training: bool and not training: Optional[bool].
  6. Note that some code is copied across our codebase. If you see a line like # Copied from transformers.models.bert..., this means that the code is copied from that source, and our scripts will automatically keep that in sync. If you see that, you should not edit the copied method! Instead, edit the original method it's copied from, and run make fixup to synchronize that across all the copies. Be sure you've installed the development dependencies with pip install -e ".[dev]", as described in the contributor guidelines above, to ensure that the code quality tools in make fixup can run.
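To make tip 5 concrete, here is a minimal sketch of the shape of an annotated PyTorch forward method. The class name, arguments, and the Tensor/output stand-ins below are illustrative only (so the sketch runs without torch), not a real transformers signature:

```python
from typing import Optional, Tuple, Union

class Tensor:  # stand-in for torch.Tensor, so this sketch runs without torch
    pass

class SequenceClassifierOutput:  # stand-in for the real ModelOutput dataclass
    pass

class MyModelForSequenceClassification:  # hypothetical user-facing class
    def forward(
        self,
        input_ids: Optional[Tensor] = None,
        attention_mask: Optional[Tensor] = None,
        labels: Optional[Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple, SequenceClassifierOutput]:
        # a tuple is returned when return_dict=False, the output class otherwise
        ...
```

Note the return annotation at the end - that's the part people most often forget.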

How can I find models that need type hints?

I used to maintain a list here, but it got out of date, I'm sorry. Instead, you can use this Colab notebook. If you run this, it will show you models in PyTorch or TF that are still missing type hints. Unlike my manually curated lists, it's guaranteed to be up to date - but do double-check that someone else in the thread hasn't claimed a model before you start, because the Colab code will only register type hints after the PR containing them is merged!

@divyanshugit
Contributor

I would love to work on PyTorch Albert🚀

@johnnv1
Contributor

johnnv1 commented Mar 11, 2022

Hi, I would like to work on PyTorch ImageGPT

@chainyo
Contributor

chainyo commented Mar 11, 2022

Hi, I would like to work on CamemBERT for PT & TF.

I will take a look at LayoutLMv2 after the first one 😃

Edit: Because CamemBert depends on Roberta I will take PyTorch Roberta 👍

@Vaibhavs10
Member

Vaibhavs10 commented Mar 11, 2022

Hello!

I'd like to take Hubert & Wav2Vec2 for PyTorch.

Cheers!

@johnryan465
Contributor

I'll try PyTorch BERT to start!

@Rocketknight1
Member Author

@johnryan465 I just did it as an example, I'm sorry! I'm marking off the completed models now.

@johnryan465
Contributor

@Rocketknight1 no worries, will try and do DistilBERT instead

@cakiki
Contributor

cakiki commented Mar 11, 2022

I'd like to work on GPT2 (TF).

@chainyo
Contributor

chainyo commented Mar 11, 2022

@Rocketknight1 I'm switching to Roberta PyTorch because CamemBERT depends on the Roberta modeling code

@johnnygreco
Contributor

Awesome! Hey @Rocketknight1 – I'd like to work on Longformer for both PyTorch & TF!

@tanmoyio

I'd like to work on BigBird

@jacobdineen
Contributor

jacobdineen commented Mar 11, 2022

I would like to work on CLIP for PyTorch.

@johnnv1
Contributor

johnnv1 commented Mar 11, 2022

Also, will work on BEiT, DeiT and ViT (PyTorch)

@bhavika
Contributor

bhavika commented Mar 11, 2022

I can work on ImageGPT.

@omer-dor

I can work on Swin (PyTorch)

@elusenji
Contributor

I'd like to work on XLM (Tensorflow)

@Dahlbomii
Contributor

I'll take T5 (Tensorflow)!

@KristijanArmeni

I'd like to claim GPT-2 (PyTorch).

@robotjellyzone
Contributor

robotjellyzone commented Mar 12, 2022

Hi @Rocketknight1,

I would like to work on BART for both TF and PyTorch

@kamalkraj
Contributor

kamalkraj commented Mar 12, 2022

ELECTRA TF - #16104
ELECTRA PT - #16103
DeBERTA PT - #16105

@manandey
Contributor

XLMRobertaXL (PyTorch)

@Rocketknight1
Member Author

@nablabits If the Colab doc says type hints are missing, then at least some are still missing, so you can totally take it!

(People often forget the return type, and the Colab will still mark a model as missing type hints if it isn't there)
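To illustrate why a forgotten return annotation still counts as "missing" - typing.get_type_hints only includes a "return" key when the return type is annotated (a toy example, not transformers code):

```python
from typing import Optional, get_type_hints

def call_without_return(x: Optional[int] = None):
    return x

def call_with_return(x: Optional[int] = None) -> Optional[int]:
    return x

print("return" in get_type_hints(call_without_return))  # False
print("return" in get_type_hints(call_with_return))     # True
```

Since the notebook compares the hinted names against the full argument list plus "return", the first function above would be flagged.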

@nablabits
Contributor

@Rocketknight1, ahh yeah, thanks for the reminder, I've seen that in the comments before, I will pay extra attention to that when opening the PR

@nablabits
Contributor

Hi @Rocketknight1 I've just pushed above PR ☝️, hope I'm not missing anything 🤞

Chances are this has been suggested or tried in the past and there's a good reason not to do it, but just in case: we could improve the notebook with this snippet, which checks not only the return annotation but also all the arguments of the forward method:

import transformers
from typing import get_type_hints
from inspect import getfullargspec, isclass

for obj in dir(transformers):
    model = getattr(transformers, obj)
    if not (isclass(model) and issubclass(model, transformers.PreTrainedModel)):
        continue
    try:
        actual_hints = set(get_type_hints(model.forward))
    except Exception:  # some annotations reference names that cannot be resolved
        continue
    expected_hints = set(getfullargspec(model.forward).args)
    expected_hints.discard("self")  # self does not carry a type hint
    expected_hints.add("return")  # we also want a type hint for the output

    missing_hints = expected_hints - actual_hints
    if missing_hints:
        print(f"{obj}: {missing_hints}")

Running the above shows, for instance, that AltCLIPModel is missing token_type_ids.

@nablabits
Contributor

Hey @Rocketknight1 I will continue with AltCLIP now (a quick search on this page does not return any result for that)

@nablabits
Contributor

@sgugger I'd say this issue was closed automatically, but it shouldn't have been. There's probably some automation I don't know about that closes issues whenever certain conditions are met; let me know if I need to change anything the next time I open a PR.
Thanks for your patience 🙏

@Rocketknight1
Member Author

@nablabits That happens when you link an issue in the PR - Github assumes the PR resolves the issue!

Also, your suggestion for the notebook is good - I'll update it today!

@Rocketknight1
Member Author

@nablabits notebook updated based on your suggestion, thank you!

@nablabits
Contributor

That happens when you link an issue in the PR - Github assumes the PR resolves the issue!

Hey Matt, that makes sense - it was probably picking up the 16059 in the title of the PR, or in the commit, or even the whole "fixes 16059" bit. I can see some other PRs that didn't close the issue and still keep a reference in the first comment, e.g. #23071

notebook updated based on your suggestion, thank you!

Brilliant, always happy to be useful 🤗


I'm keen to further reduce that list so I will pick some of the first entries:

These ones have not been picked yet:

  • Blip2QFormerModel
  • ConditionalDetrForObjectDetection
  • ConditionalDetrForSegmentation
  • ConditionalDetrModel

Is it ok to open a PR with all of them?

@Rocketknight1

@nablabits
Contributor

Hey Matt, I will pick now these guys, let's crush that list 📋:

  • CpmAntModel
  • DecisionTransformerModel
  • DPR family: DPRContextEncoder, DPRQuestionEncoder, DPRReader
  • Deformable Detr family: DeformableDetrForObjectDetection, DeformableDetrModel
  • Deta family: DetaForObjectDetection, DetaModel
  • Detr family: DetrForObjectDetection, DetrForSegmentation, DetrModel

@Rocketknight1

@nablabits
Contributor

Hey Matt, I'm working now on the next 6 items:

  • ErnieM Family: ErnieMForInformationExtraction, ErnieMForMultipleChoice, ErnieMForQuestionAnswering, ErnieMForSequenceClassification, ErnieMForTokenClassification & ErnieMModel
  • EsmForProteinFolding
  • GraphormerModel
  • InstructBlipQFormerModel
  • LayoutLMForMaskedLM
  • LukeForEntitySpanClassification

@Rocketknight1

@nablabits
Contributor

Hi Matt, at this point I'm keen to finish all the remaining type hints for the PyTorch models (17). I will open a couple of PRs so the review won't be a pain (unless you tell me otherwise)

@Rocketknight1

@nablabits
Contributor

So these ☝️ are the last ones for the PyTorch models.

By looking at the git history I can see that you are pretty busy these days, so no rush on this. Here's a quick recap of what we have so far:

In the meantime I will start getting an intuition of how the TF ones work, I can't say that I'm going to address all of them (as there are a lot) but let's see how far we can get 🐌

@nablabits
Contributor

In the meantime I will start getting an intuition of how the TF ones work, I can't say that I'm going to address all of them (as there are a lot) but let's see how far we can get 🐌

It turns out that there are not that many after all: most of them ultimately subclass tf.keras.Model, whose call method is not annotated itself, hence the error. I'd say we are only interested in the models that actually override the call method, so I updated the script a bit to retrieve missing type hints:

from inspect import getfullargspec, isclass, isfunction
import transformers
from typing import get_type_hints

def is_call_overridden(cls):
    # only count classes that define call themselves, not ones inheriting it from tf.keras.Model
    return "call" in cls.__dict__ and isfunction(cls.__dict__["call"])

def compute_missing_hints_tf(model):
    if isclass(model) and issubclass(model, transformers.TFPreTrainedModel) and is_call_overridden(model):
        actual_hints = set(get_type_hints(model.call))
        expected_hints = set(getfullargspec(model.call).args)
        expected_hints.discard("self")  # self does not carry a type hint
        expected_hints.add("return")  # we also want a type hint for the output

        missing_hints = expected_hints - actual_hints
        if missing_hints:
            print(f"{model.__name__}: {missing_hints}")

This yields a much narrower list, so if these assumptions are right, with another PR we might be done with the issue. Matt, let me know what you think.
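As a sanity check of the overriding logic, here's a toy hierarchy standing in for tf.keras.Model and its subclasses - checking cls.__dict__ directly means only classes that define their own call are flagged, while ones merely inheriting it are skipped:

```python
from inspect import isfunction

def is_call_overridden(cls):
    # look only at the class's own namespace, not inherited attributes
    return "call" in cls.__dict__ and isfunction(cls.__dict__["call"])

class Base:  # stands in for tf.keras.Model
    def call(self, x):
        return x

class Inherits(Base):  # does not override call
    pass

class Overrides(Base):  # defines its own call
    def call(self, x):
        return x + 1

print(is_call_overridden(Inherits))   # False: call comes from Base
print(is_call_overridden(Overrides))  # True
```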

@Rocketknight1

@Rocketknight1
Member Author

@nablabits Ah, of course! I should have realized that the base PreTrainedModel classes do not have an overridden call(), because they're always subclassed before being used. Good catch.

@Rocketknight1
Member Author

This project is now officially complete! Thank you to everyone in this thread, and to other people who filed PRs, and congratulations to @nablabits who filed the final PR to finish it all!

@cebtenzzre
Contributor

Are we any closer to reverting #18485?
