When using the following deployment, specifically with the `Phi-3` model, the job sometimes freezes after a few requests (e.g. at 2% with `mmlu`).

Running `tail -f output/stderr.log` from inside the `main` container confirms this to be the case.

However, if the exact same `lm_eval` invocation (as created by the driver) is manually started from the pod, it does not appear to freeze.
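To make the freeze easier to confirm without watching `tail -f` by hand, a small check on the log file's modification time might help. This is only a diagnostic sketch, not part of the driver: the helper names `seconds_since_last_write` and `is_stalled` and the 300-second threshold are assumptions chosen for illustration, and a quiet log can also simply mean a slow request.

```python
import os
import time


def seconds_since_last_write(log_path, now=None):
    """Return how many seconds ago the log file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(log_path)


def is_stalled(log_path, idle_seconds=300.0, now=None):
    """Report a stall when the log has not been written for idle_seconds.

    idle_seconds is a hypothetical threshold; tune it to the slowest
    request you expect from the model being evaluated.
    """
    return seconds_since_last_write(log_path, now) >= idle_seconds
```

Calling `is_stalled("output/stderr.log")` periodically from inside the `main` container would flag the same condition that `tail -f` shows visually, which may make it easier to record how often the freeze occurs.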