Currently, the training metric is taken from training_params_and_metrics_global0.jsonl, by reading the "average_throughput" value from each sample in that file.
This value, however, does not represent the overall samples per second. It appears to be the samples per second for GPU 0 only (rank: 0 in the sample JSON), and it matches the metric we actually want only loosely. That metric is "CurrSamplesPerSec", which is not in this file but in the stdout of the training process.
The stdout of the training process is filtered by the ilab-client script and written to ilab-client-stderrout.txt. This file needs to be post-processed, and each occurrence of "CurrSamplesPerSec" logged.
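A minimal sketch of that post-processing step could look like the following. It assumes the metric appears in the log as `CurrSamplesPerSec` followed by `=` or `:` and a number; the exact line format in ilab-client-stderrout.txt is not confirmed here, and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: the metric is printed as "CurrSamplesPerSec=<number>" or
# "CurrSamplesPerSec: <number>" (optionally with a closing quote after the key).
# Adjust the pattern if the real log lines differ.
_METRIC_RE = re.compile(
    r'CurrSamplesPerSec"?\s*[=:]\s*([0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)'
)


def extract_curr_samples_per_sec(lines: Iterable[str]) -> List[float]:
    """Return every CurrSamplesPerSec value found, in order of appearance."""
    values: List[float] = []
    for line in lines:
        for match in _METRIC_RE.finditer(line):
            values.append(float(match.group(1)))
    return values


def extract_from_file(path: str = "ilab-client-stderrout.txt") -> List[float]:
    """Read the filtered stdout/stderr log and collect the metric values."""
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return extract_curr_samples_per_sec(fh)
```

Calling `extract_from_file()` in the run directory would return the list of values, which can then be logged alongside, or instead of, the "average_throughput" samples.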