
AutoML - Can't get training progress during image training #5553

Closed · LittleLittleCloud opened this issue Dec 15, 2020 · 1 comment · Fixed by #5554
Assignees: mstfbl
Labels: AutoML.NET (Automating various steps of the machine learning process), P1 (Priority of the issue for triage purpose: Needs to be fixed soon)

Comments

@LittleLittleCloud (Contributor) commented Dec 15, 2020

System information

  • OS version/distro: Windows 10
  • .NET version (e.g., dotnet --info): 3.1.4

Issue

  • What did you do?
  • I used the AutoML API to launch an image classification training and attached a logger to the current context in order to get training progress. However, no training progress is reported after I attach the logger and start training (see the sketch after this list).
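
A minimal repro sketch, assuming the Microsoft.ML.AutoML experiment API; `trainData` stands in for a real labeled image dataset:

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;

static void RunExperiment(MLContext mlContext, IDataView trainData)
{
    // Attach a logger to the current context to watch training progress.
    mlContext.Log += (sender, e) => Console.WriteLine(e.Message);

    // Kick off an AutoML experiment over the image data.
    var experiment = mlContext.Auto()
        .CreateMulticlassClassificationExperiment(maxExperimentTimeInSeconds: 3600);

    // Expected: progress messages print while the trial trains.
    // Actual: nothing shows, because the trial runs on a separate internal context.
    var result = experiment.Execute(trainData, labelColumnName: "Label");
}
```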

What might happen

After some investigation, I believe the error is caused by a recent change to how a trial is launched. PR #5445 creates a new context instead of reusing the current context when starting a trial. So when I subscribe to the log channel while calling the API, I am actually listening to the current context's channel, where no trial is running. And since the new context the trial runs on is not exposed externally, there is currently no way to peek at the training progress.
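
The mismatch in miniature (a sketch; the internal context's name is invented here):

```csharp
using System;
using Microsoft.ML;

// A handler attached to one MLContext never fires for messages raised on another.
var userContext = new MLContext();
userContext.Log += (s, e) => Console.WriteLine(e.Message); // caller's logger

// What #5445 effectively does inside the trial runner:
var trialContext = new MLContext();
// Training runs against trialContext, so its messages fire on
// trialContext.Log and the handler above never sees them.
```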

@mstfbl self-assigned this Dec 15, 2020
@mstfbl added the AutoML.NET and P1 labels Dec 15, 2020
@justinormont (Contributor) commented

Earlier discussion -- #5445 (review)

My initial thoughts from #5445 (comment):

We can always duplicate the logger. Or attach a logger to the new context, and when called, have it pass the message to the original context.
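
A sketch of that relay idea with hypothetical names (`CreateTrialContext` is invented for illustration; the real fix lives in #5554):

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Runtime;

// Duplicate the caller's logger onto the context the trial runner creates,
// so trial log messages remain visible to the caller.
static MLContext CreateTrialContext(EventHandler<LoggingEventArgs> userLogHandler)
{
    var trialContext = new MLContext();

    if (userLogHandler != null)
        trialContext.Log += userLogHandler; // same handler, new context

    return trialContext;
}
```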


@LittleLittleCloud: What type of message are you reading from the log? Log scraping is likely the only usable method currently.

Future

In the longer term, we may want to have each component pass along a structured status message: { rows processed, percent complete, processing duration, current step name, memory, other stats }. ML.NET conveys very little information on the status of a training job.
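
A sketch of what such a structured status message could look like; no type like this exists in ML.NET today, and the fields simply mirror the list above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical structured status message a component could emit.
public sealed record TrainingStatus(
    long RowsProcessed,
    double PercentComplete,
    TimeSpan ProcessingDuration,
    string CurrentStepName,
    long MemoryUsageBytes,
    IReadOnlyDictionary<string, double> OtherStats);
```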

The output from MAML was sometimes sufficient (examples: 1, 2, 3, 4). These give some notion of the progress of the training job.

There are related issues on having an output verbosity level besides zero & firehose.

To quote an earlier issue comment:

As mentioned in #3235, MLContext.Log() doesn't have a verbosity selection, so it's more of a firehose.

If a verbosity argument is added to MLContext.Log(), the log output from there should be human readable to see general progress.

I believe it's still hidden within the firehose of output, and once the verbosity is scaled down, you should see messages like:

LightGBM objective=multiclassova
[7] 'Loading data for LightGBM' finished in 00:00:15.6600468.
[8] 'Training with LightGBM' started.
..................................................(00:30.58)	0/200 iterations
..................................................(01:00.9)	1/200 iterations
..................................................(01:31.2)	2/200 iterations
..................................................(02:01.4)	2/200 iterations
..................................................(02:31.9)	3/200 iterations
..................................................(03:02.5)	4/200 iterations
..................................................(03:32.9)	4/200 iterations
..................................................(04:03.6)	5/200 iterations
..................................................(04:34.4)	5/200 iterations
..................................................(05:04.8)	6/200 iterations

And naively extrapolating, there's around 2.7 hours left in the LightGBM training (6 iterations in roughly 5 minutes is about 51 s/iteration, with 194 iterations remaining).
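
Until a real verbosity argument exists, one way to scale the firehose down is to filter the Log event by message kind. A sketch, assuming ML.NET 1.5+, where LoggingEventArgs exposes Kind and Source:

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Runtime;

var mlContext = new MLContext();
mlContext.Log += (sender, e) =>
{
    // Drop trace-level chatter; keep info, warnings, and errors.
    if (e.Kind != ChannelMessageKind.Trace)
        Console.WriteLine($"[{e.Source}] {e.Message}");
};
```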

@mstfbl linked a pull request Dec 16, 2020 that will close this issue
@ghost locked as resolved and limited conversation to collaborators Mar 17, 2022