Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to output "test predictions" text file with each checkpoint in run_seq2seq.py #10381

Closed
kingpalethe opened this issue Feb 24, 2021 · 10 comments · Fixed by #10428
Closed

Comments

@kingpalethe
Copy link

Further to this discussion:

https://discuss.huggingface.co/t/how-to-output-test-generations-txt-with-run-seq2seq-py/3825

The prior incarnation of this script would output test generations at each checkpoint, which was very useful for understanding the progress of model training.

The current script...

https://github.com/huggingface/transformers/blob/master/examples/seq2seq/run_seq2seq.py

Seems to only output this text file once, at the end of the last epoch.

If there was a way to enable the previous behavior, I am guessing that would be widely useful.

thanks

@LysandreJik
Copy link
Member

May be of interest to @patil-suraj @stas00 @sgugger

@stas00
Copy link
Contributor

stas00 commented Feb 24, 2021

Yes, as I replied in the forums, this functionality was dropped - not sure why it was done, as I wasn't part part of the planning discussion.

I think it was not intentional, the devs were probably unaware it was used and given that the example tests were dropped too it's not surprising it was missed. I propose the dropped examples tests are restored (which will require porting to the new script) which will expose some of the functionality that was removed with it.

Practically, let's identify what else might have been removed and create separate issues besides this one and may be ask the community to help restore/backport the previously working things to the new script(s)?

e.g. one such important thing is the tests that were moved to legacy, so this script is no longer being tested.

p.s. this should be of help restoring/porting the example tests #10036

@stas00
Copy link
Contributor

stas00 commented Feb 25, 2021

@bhadreshpsavani, please let us know if you're inspired to take care of this in:
#10337 (comment)
Thank you.

@bhadreshpsavani
Copy link
Contributor

Sure @stas00,
I can take care of this with a separate PR or if possible in the same PR,
Thanks

@stas00
Copy link
Contributor

stas00 commented Feb 26, 2021

Correction, as I was refactoring run_seq2seq.py I can see now that the code wasn't removed - it's exactly the same. Someone decided to rename the resulting file instead. So the feature hasn't been removed, just renamed.

I'm not attached to either,

  1. the original was saving it as "test_generations.txt"
  2. the new one as "test_preds_seq2seq.txt"

I think the original name is the most intuitive one.

@sgugger, do you have an opinion here?

@stas00
Copy link
Contributor

stas00 commented Feb 26, 2021

@bhadreshpsavani, so please hold a moment while we are re-modelling run_seq2seq.py and then I will update you when the model example is ready to be synced. Thank you!

stas00 added a commit that referenced this issue Feb 26, 2021
This PR restores the original functionality that for some reason was modified.

Fixes: #10381

@sgugger
@stas00
Copy link
Contributor

stas00 commented Feb 26, 2021

PR to restore the original functionality: #10428

stas00 added a commit that referenced this issue Feb 27, 2021
#10428)

This PR restores the original functionality that for some reason was modified.

Fixes: #10381

@sgugger
@stas00
Copy link
Contributor

stas00 commented Feb 27, 2021

OK, the original name has been restored as it used to be, @kingpalethe

As I mentioned in #10428 if you'd like to request a new feature to do this on each check point please don't hesitate to make such request.

@kingpalethe
Copy link
Author

@stas00 thanks -- apologies, you are correct. I had hallucinated this behavior. I made a new issue: #10439

@stas00
Copy link
Contributor

stas00 commented Feb 27, 2021

All is good.

and now I see that my PR made that script inconsistent with other scripts, but perhaps all scripts should use the same filename for test_generations.txt. I can't quite see the point of it having a different name in each script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants