
[examples/run_s2s] remove task_specific_params and update rouge computation #10133

Merged: 5 commits merged into huggingface:master on Feb 12, 2021

Conversation

@patil-suraj (Contributor) commented Feb 11, 2021

What does this PR do?

  • correctly handle task_specific_params and prefix
    The current script tries to access the prefix from config.task_specific_params.prefix, which will always be None because task_specific_params is a nested dict whose keys are task names. This PR retrieves the task_specific_params entry from the config using the task name (data_args.task), updates the config with the retrieved params (this is needed for T5), and then accesses the prefix via config.prefix.

    @stas00, as you reported offline, the BLEU score from the new script differed from the old script for T5 on the en-ro task. This was because the old script used the task_specific_params and the new script didn't. This update should resolve that issue.

  • Update rouge score computation.
    The rougeLsum metric expects newlines between each sentence, this is usually the score reported in papers. This PR

    1. adds newlines between the sentences in preds and labels using nltk, so rougeLsum is computed correctly
    2. passes use_stemmer=True to metric.compute so the metrics match the old script.
  • Add test_file argument to DataTrainingArguments to load custom test dataset.
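The rougeLsum newline fix described in the second bullet can be sketched roughly as follows. This is a minimal, dependency-free illustration: a simple regex splitter stands in for the nltk sentence tokenizer the script actually uses, and the helper name is hypothetical.

```python
import re

def add_sentence_newlines(texts):
    # rougeLsum scores summaries sentence-by-sentence and expects a
    # newline between sentences, so split each text into sentences and
    # rejoin with "\n" before calling metric.compute.
    # (The actual script uses nltk.sent_tokenize; this regex is a
    # stand-in for illustration only.)
    return ["\n".join(re.split(r"(?<=[.!?])\s+", t.strip())) for t in texts]

preds = add_sentence_newlines(["The cat sat. It purred."])
# preds is now ["The cat sat.\nIt purred."], ready for
# metric.compute(predictions=preds, references=labels, use_stemmer=True)
```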

@sgugger (Collaborator) left a comment

Thanks a lot for fixing the script!

examples/seq2seq/run_seq2seq.py
Comment on lines 359 to 365
# update config with task specific params
task_specific_params = model.config.task_specific_params
if task_specific_params is not None:
    params = task_specific_params.get(data_args.task, {})
    logger.info(f"Updating model.config with task specific params for {data_args.task}:\n {params}")
    logger.info("Note: command line args may override some of these.")
    model.config.update(params)
Collaborator:

This was the thing @patrickvonplaten told me to remove, so just pinging him here so you two can fight :-)

@patil-suraj (Author):

This is mostly for T5 and for reproducing the metrics. I don't have any strong opinion here. If we decide to remove this, then we should remove all mentions of task_specific_params from the script and use the prefix only if the user has specified it.

Contributor:

I don't care if you want to change this, as long as we can accomplish the same in a new way.

I have a bit of a hard time understanding the intention behind removing functionality. Is it bad functionality? Is it not useful?

As I mentioned several times in this let's-rewrite-things context: as long as I have a reliable, sensitive tool that can help me detect quality regressions over small dataset samples, I'm not attached to any specific approach.

@stas00 (Contributor) left a comment

Thank you, @patil-suraj for this functionality sync with the old script. That's wonderful!

I see there is a potential conflict with restoring functionality that was purposefully removed, so let's see what @patrickvonplaten says.

@patrickvonplaten (Contributor) commented Feb 12, 2021

Context:

Here is some context on the task_specific_params config param. In the beginning, T5 was the only model used as the default for both the translation and the summarization pipeline. @thomwolf and I wanted a nice general design that, depending on the specific task (e.g. summarization, translation), automatically sets the correct parameter set, so we added a task_specific_params parameter to the config that holds the correct parameters per task. This is why the config of T5 is so long and looks like this:

{
...
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to German: "
    },
    "translation_en_to_fr": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to French: "
    },
    "translation_en_to_ro": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to Romanian: "
    }
  },
  ...
}

=> So this design was chosen only for the pipelines, and essentially only for T5 version 1, since it is the only model we have that needs task-specific params (especially due to the different required prefixes per task). Up until now there have been too many problems with this mechanism, so IMO its benefit is outweighed by its disadvantages, which are:

1) It blows up the config a lot and is not scalable (what do you do with many-to-many translation models? you would need an entry for every translation_..._to_... combination).

2) No one understood anymore what was happening under the hood. IMO, having such a mechanism is a bit too "magical" because it creates a whole other logical layer to the already complicated mechanism that we have for the config params. In short, we currently have the following logic in pipelines:

i) The function argument is used (such as max_length); if not given, then
ii) the config's task_specific_params entry (such as config.task_specific_params["summarization"]["max_length"]) is used; if not set, then
iii) the normal config param is used, such as config.max_length; if not set, then
iv) the default PretrainedConfig param is used.
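The four-step precedence above can be made concrete with a small sketch. Plain dicts stand in for the real config object, and the function name and signature are hypothetical; the fallback chain itself is the one described in steps i) through iv).

```python
def resolve_param(name, task, call_kwargs, config, library_default=None):
    # i) an explicit function argument wins
    if name in call_kwargs:
        return call_kwargs[name]
    # ii) the config's task_specific_params entry for the active task
    task_params = (config.get("task_specific_params") or {}).get(task, {})
    if name in task_params:
        return task_params[name]
    # iii) the top-level config param, else iv) the library default
    return config.get(name, library_default)

config = {
    "max_length": 20,  # top-level value, shadowed by the task entry below
    "task_specific_params": {"summarization": {"max_length": 200}},
}
resolve_param("max_length", "summarization", {}, config, library_default=20)
# -> 200 (step ii wins over the top-level 20)
```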

=> It is obvious that this is a very complicated and somewhat "magical" logic, and a lot of people internally didn't even really understand it. This is why I really would like to remove the second step. It's confusing to see multiple max_length parameters in the config, and IMO it's just not worth it.

3) So far T5 is the only model that really requires this "magical" mechanism and that's mostly because it has a very special constraint in the sense that it was primed during training on cues such as translation from X to Y: ... which is definitely not something general that we would expect future models to have as well. We might very well have models in the future that have task-specific params like max_length and beam_search (It can very well be that a GPT3-like model that can do everything wants to adapt those params depending on the task), but those params are usually things that people are aware of and adjust themselves during evaluation IMO. E.g. if one is evaluating a model on summarization, setting the correct max_length, num_beams and maybe repetition_penalty is IMO something people should do themselves and not expect to be set correctly automatically.

4) It makes the pipelines in general very inflexible. E.g. when importing the pipeline classes directly, say the TranslationPipeline (which is what we did for a long time for the inference API - and maybe still do - not so sure anymore @julien-c @Narsil), there is no way of knowing that we should pass a task=... arg to the init to correctly load the task_specific_params. To be more precise, imagine you want to directly import the TranslationPipeline here:

class TranslationPipeline(Text2TextGenerationPipeline):
where you don't see any task param. But in order to correctly load T5's translation params for TranslationPipeline, you actually have to manually pass task="translation_en_to_de" to the init. (Also note that it's not as easy as adding a class attribute self.task = "translation_en_to_de", because the same pipeline is also used for EN->RO translation, in which case the class attribute would be wrong.) This created a lot of problems, eventually leading to @julien-c (I think) hard-coding the correct task name for T5 into the inference API code, which then kind of defeated the purpose of having this mechanism.

Conclusion

That being said, I see two solutions in general:

  1. Eventually completely remove this mechanism (which I prefer)
  2. Keep this mechanism for the pipelines only. Since things like the pipelines or AutoNLP are not meant to be built for researchers I'm ok with having some "under-the-hood" magic / very abstracted logic there, but I definitely don't want to have it anywhere else.

=> This means that I really don't think we should use this param in run_seq2seq.py. It creates more confusion than it helps and is not in line with our motivation to have examples that are "easy to tweak and to understand" by the user. As @sgugger has already said multiple times, the example scripts should not follow a "one-command-fits-all-cases" approach, but rather should be easy to understand and to tweak for the specific task. This is why I'm quite strongly against using task_specific_params here. However, @patil-suraj @stas00, I think you are completely correct that we should try not to have a regression in performance here. So I would then actually prefer to hard-code T5's prefixes in the script. Something like:

T5_PREFIX = {
    "summary": ...,
    "translation_en_to_de": ...,
}

Sorry for the long text, but I think this is actually an important mechanism not too many people are aware of and we should think about a more general solution for how to continue with task_specific_params. Actually also pinging @LysandreJik on this one to hear his opinion.

Happy to hear your opinions on what I wrote above :-)
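The hard-coded-prefix alternative sketched above could look roughly like this. The prefix values are taken from the T5 config shown earlier in this thread; the dict name, function name, and model_type check are hypothetical illustration, not the actual implementation.

```python
# Hypothetical sketch: hard-coding T5 prefixes in the example script
# instead of reading them from config.task_specific_params.
T5_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_ro": "translate English to Romanian: ",
}

def preprocess_inputs(texts, task, model_type):
    # Only T5 was primed on task prefixes; other models get raw inputs.
    prefix = T5_PREFIXES.get(task, "") if model_type == "t5" else ""
    return [prefix + t for t in texts]

preprocess_inputs(["Hello"], "translation_en_to_ro", "t5")
# -> ["translate English to Romanian: Hello"]
```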

@patil-suraj (Author):
Thanks a lot for the context @patrickvonplaten

Regarding the script: to follow the examples philosophy, let's just remove it completely. If a model requires a prefix, it should be passed explicitly, and related params should be copied to the config manually if one wants to reproduce certain metrics.

@stas00 (Contributor) commented Feb 12, 2021

Thank you for the detailed explanation, @patrickvonplaten - that was very awesome of you to write it all out in such clarity.

I'm totally fine with your proposal, yet I think it'd be important to document how one reproduces the same behavior with the new script and the new T5 config.

I already started an issue documenting the nuances of porting from ./finetune_trainer.py (#10036), so perhaps this can belong there. Once the notes have been compiled we can put them into seq2seq/README.md to help users transition before ./finetune_trainer.py moves into unmaintained territory.

Should you decide to remove this mechanism completely, the T5 models on the hub should probably be updated at some future point to reflect that, so that there is no baggage to carry forward. Perhaps a few release cycles after the cut is made? Surely, users on older transformers versions should still be able to run their scripts normally for quite some time; I'd imagine that's where model-file versioning could come in.

@patil-suraj (Author):
@stas00

To reproduce the same behavior with the new script

  1. Use the same dataset.
  2. If using T5, manually pass the prefix argument.
  3. Manually copy the task_specific_params to the config.

Again, this is just for T5; the rest of the models should give similar results. So I'm going to merge this PR, and let's update the README in the clean-up PR #10136.
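The reproduction steps above can be sketched with plain dicts standing in for the transformers config object. The values come from the T5 config shown earlier in the thread; everything else is illustrative.

```python
task = "translation_en_to_ro"
config = {
    "num_beams": 1,
    "task_specific_params": {
        task: {
            "early_stopping": True,
            "max_length": 300,
            "num_beams": 4,
            "prefix": "translate English to Romanian: ",
        },
    },
}

# Step 3: copy the task params onto the config manually...
params = dict((config.get("task_specific_params") or {}).get(task, {}))
# Step 2: ...except the prefix, which is now passed explicitly as an argument
prefix = params.pop("prefix", "")
config.update(params)
# config["num_beams"] is now 4, and prefix holds the string to pass
# to the script's prefix argument.
```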

@patil-suraj patil-suraj changed the title [examples/run_s2s] fix task_specific_params and update rouge computation [examples/run_s2s] remove task_specific_params and update rouge computation Feb 12, 2021
@patil-suraj patil-suraj merged commit f51188c into huggingface:master Feb 12, 2021
@patil-suraj patil-suraj deleted the fix-run-s2s branch February 12, 2021 11:48