TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed #17593
if decoder_input_ids is None and decoder_inputs_embeds is None:
    use_cache = False

output_hidden_states = (
TFBartMainLayer (i.e. this class) is not a stand-alone class; output_hidden_states is always passed from the stand-alone classes. This line was redundant and thus removed.
LGTM! I love the idea of this tool shaking out a load of bugs, too.
Nice clean-up - thanks!
…assed (huggingface#17593) * Merge PT and TF behavior
What does this PR do?
The main TF and PT models behave differently when no decoder_input_ids are passed. Because of that, the same model with the same inputs produces different outputs in the two frameworks, in this specific input case.

Looking at the git blame, it seems like this if branch was added to the PT model after the TF model was added, and we possibly forgot to port it back.

Notes:
pt-to-tf CLI, which added much stricter tests.
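For readers who want to see the kind of fallback being discussed, here is a minimal framework-free sketch. It assumes the PT-side if branch derives decoder_input_ids by shifting input_ids one position to the right and prepending the decoder start token, which is not spelled out in this PR description. The helper name resolve_decoder_input_ids and the default token ids are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def shift_tokens_right(
    input_ids: List[List[int]], pad_token_id: int, decoder_start_token_id: int
) -> List[List[int]]:
    """Shift each sequence one position right and prepend the decoder start token."""
    shifted = []
    for row in input_ids:
        # Drop the last token and put the start token in front.
        new_row = [decoder_start_token_id] + list(row[:-1])
        # -100 is commonly used to mask labels; map it back to the pad token.
        new_row = [pad_token_id if tok == -100 else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


def resolve_decoder_input_ids(
    input_ids: Optional[List[List[int]]],
    decoder_input_ids: Optional[List[List[int]]] = None,
    decoder_inputs_embeds: Optional[object] = None,
    pad_token_id: int = 1,  # assumed value, for illustration only
    decoder_start_token_id: int = 2,  # assumed value, for illustration only
) -> Optional[List[List[int]]]:
    """Hypothetical helper: fall back to shifted input_ids when no decoder inputs are given."""
    if decoder_input_ids is None and decoder_inputs_embeds is None:
        if input_ids is None:
            raise ValueError("Either input_ids or decoder inputs must be provided.")
        decoder_input_ids = shift_tokens_right(
            input_ids, pad_token_id, decoder_start_token_id
        )
    return decoder_input_ids


example = resolve_decoder_input_ids([[0, 5, 6, 2]])
passthrough = resolve_decoder_input_ids([[0, 5, 6, 2]], decoder_input_ids=[[2, 9]])
```

If both frameworks apply the same fallback, identical inputs lead to identical decoder inputs, which is the precondition for the outputs to match across TF and PT.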