Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters #30362

yting27 · 2024-04-20T14:38:59Z

What does this PR do?

Below is the code example I used. The max_length and truncation (as well as padding and add_special_tokens) parameters passed to the pipeline method are overridden when generating output.

from transformers import pipeline


template_inst = "Instruct: {prompt}\nOutput:"
max_length = 25

text_generator = pipeline("text-generation", model="openai-gpt", max_length=max_length, max_new_tokens=30, truncation=True)
results = text_generator(template_inst.format(prompt="The personal life of William Shakespeare is somewhat of a mystery. There are two primary sources that provide historians with an outline of his life. One is his work, and the other is official documentation such as church and court records. However, these provide only brief sketches of specific events in his life and yield little insight into the man himself. When Was Shakespeare Born?"))

print(results)

Print out preprocess_params variables in Pipeline.__call__() method:

Kwargs: {}
self._preprocess_params: {'add_special_tokens': False, 'truncation': True, 'padding': False, 'max_length': 25, 'max_new_tokens': 30}
preprocess_params: {'add_special_tokens': False, 'truncation': None, 'padding': False, 'max_length': None}
preprocess_params (after): {'add_special_tokens': False, 'truncation': None, 'padding': False, 'max_length': None, 'max_new_tokens': 30}

As shown in the output, after self._sanitize_parameters() is called, both truncation and max_length are set to None.

I know we can pass the preprocess parameters when calling text_generator method (see code above). However, it just doesn't make sense to me that the previous parameters provided should be overridden. Hence I made the change. Feel free to improve if needed 😃

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

pipelines: @Narsil

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

amyeroberts · 2024-04-22T11:21:14Z

cc @gante @zucchini-nlp

zucchini-nlp · 2024-04-22T12:36:55Z

I think it makes sense and the changes are consistent with the text2text pipeline, where "truncation" is handled similarly

gante

LGTM, thank you for making transformers better @yting27 💛

Context for @amyeroberts [not sure what's your pipeline expertise, so I'm going with None]: when we init a pipeline, we store kwargs such as tokenizer options to act as default values. Before this PR, preprocess_params (which would hold some of the __call__-time kwargs) was being initialized with a set of default values, and those default values would override the kwargs set at __init__ time. By initializing preprocess_params as an empty dict we avoid that, fixing the issue.

🔍 I've checked the other pipelines: this is the only pipeline that is NOT following the correct pattern on main.

LysandreJik

This looks fair to me as well! Thanks for your contribution @yting27

… previously provided parameters (#30362) * Fixed TextGenerationPipeline._sanitize_parameters default params * removed empty spaces --------- Co-authored-by: Ng, Yen Ting <[email protected]>

Fixed TextGenerationPipeline._sanitize_parameters default params

aaf1e55

yting27 added 2 commits April 23, 2024 22:10

Merge branch 'main' into fix-sanitize_params

00c98f9

removed empty spaces

4560029

gante approved these changes May 3, 2024

View reviewed changes

gante requested a review from amyeroberts May 3, 2024 13:53

LysandreJik approved these changes May 3, 2024

View reviewed changes

LysandreJik merged commit deb7605 into huggingface:main May 3, 2024
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters #30362

Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters #30362

yting27 commented Apr 20, 2024

amyeroberts commented Apr 22, 2024

zucchini-nlp commented Apr 22, 2024

gante left a comment •

edited

Loading

LysandreJik left a comment

Prevent TextGenerationPipeline._sanitize_parameters from overriding previously provided parameters #30362

Prevent TextGenerationPipeline._sanitize_parameters from overriding previously provided parameters #30362

Conversation

yting27 commented Apr 20, 2024

What does this PR do?

Before submitting

Who can review?

amyeroberts commented Apr 22, 2024

zucchini-nlp commented Apr 22, 2024

gante left a comment • edited Loading

Choose a reason for hiding this comment

LysandreJik left a comment

Choose a reason for hiding this comment

Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters #30362

Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters #30362

gante left a comment •

edited

Loading