Set default models to `training` mode in the `train_step`. #103

drewoldag · 2024-10-23T21:40:45Z

It turns out that we have to explicitly set the model to training mode to actually train it. Oddly, this isn't mentioned in the PyTorch example code that was used when adding the ExampleCNN model.

For now, I've added `model.train()` to the `create_trainer` function and `model.eval()` to the `create_evaluator` function.
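As a rough illustration of the pattern described here, the sketch below sets the mode at the start of each step using plain PyTorch. It is not the actual fibad code: `make_train_step`, `make_eval_step`, and the toy model are hypothetical names chosen for this example, and the real `create_trainer` and `create_evaluator` functions may be structured differently.

```python
import torch
from torch import nn


def make_train_step(model, optimizer, loss_fn):
    """Hypothetical factory: returns a function that runs one training step."""

    def train_step(batch):
        inputs, targets = batch
        model.train()  # enable training behavior for layers such as Dropout
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()

    return train_step


def make_eval_step(model):
    """Hypothetical factory: returns a function that runs one inference step."""

    def eval_step(batch):
        inputs, _ = batch
        model.eval()  # switch layers such as Dropout to inference behavior
        with torch.no_grad():
            return model(inputs)

    return eval_step


torch.manual_seed(0)
# Toy stand-in model, not ExampleCNN.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_step = make_train_step(model, optimizer, nn.CrossEntropyLoss())
eval_step = make_eval_step(model)

batch = (torch.randn(16, 4), torch.randint(0, 3, (16,)))
loss = train_step(batch)
mode_after_train = model.training  # flag set by model.train()
outputs = eval_step(batch)
mode_after_eval = model.training  # flag cleared by model.eval()
```

Setting the mode inside each step makes the step independent of whatever mode an earlier call left the model in, for example when evaluation runs between training epochs.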

Confusion matrix before adding `model.train()`

Confusion matrix after adding `model.train()` 10 epochs

Confusion matrix after adding `model.train()` 50 epochs

Overall accuracy after this change is about 64%, a little better than the roughly 54% reached by the PyTorch example code.


github-actions · 2024-10-23T21:44:00Z

| Before [`dad7062`] <v0.1.1> | After [`096ecfe`] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 2.50±0.3s | 2.36±0.6s | 0.94 | benchmarks.time_computation |
| 2.8k | 440 | 0.16 | benchmarks.mem_list |


codecov · 2024-10-23T21:44:38Z

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 34.75%. Comparing base (dad7062) to head (411de9d).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fibad/pytorch_ignite.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   34.83%   34.75%   -0.08%     
==========================================
  Files          18       18              
  Lines         887      889       +2     
==========================================
  Hits          309      309              
- Misses        578      580       +2


mtauraso · 2024-10-23T22:50:07Z

It really seems like we should add a test for this.

Maybe we do a few dozen epochs in CI on CIFAR10 and ExampleAutoencoder, and threshold either loss or confusion matrix values?

That being said, I have no objection to merging this as-is.
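The thresholding idea suggested in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: it trains a tiny linear classifier on synthetic data rather than CIFAR10 or ExampleAutoencoder, `train_and_report_losses` is a hypothetical helper, and it uses a relative check (loss went down) because an absolute threshold would need to be calibrated on the real model.

```python
import torch
from torch import nn


def train_and_report_losses(epochs=50, seed=0):
    """Hypothetical helper: train a tiny classifier, return (first, last) loss."""
    torch.manual_seed(seed)
    # Synthetic, linearly separable two-class data (stand-in for a real dataset).
    inputs = torch.randn(256, 2)
    targets = (inputs[:, 0] + inputs[:, 1] > 0).long()

    model = nn.Linear(2, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    losses = []
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]


def test_loss_decreases():
    first_loss, last_loss = train_and_report_losses()
    # Relative check only; a fixed threshold is an assumption left to calibrate.
    assert last_loss < first_loss
```

A check like this would catch a training loop that silently fails to update the model, at the cost of a longer CI run when applied to a real dataset.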

drewoldag · 2024-10-23T23:02:09Z

Yeah, I'm not sure what the best way to test this would be. There seems to be a boolean attribute, `model.training`, that we could use as a signal. I'm surprised by this; it feels like a gotcha that isn't obviously documented in the PyTorch getting started guides (and I would have assumed it would be).

I'll create a follow up issue to add some kind of testing around this.
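One possible shape for such a test, using `model.training` as the signal, is sketched below. The `train_step` function here is a hypothetical stand-in written for this example, not the project's actual training step. Because a freshly constructed `nn.Module` starts with `training=True`, the test first calls `model.eval()` so that the switch is observable.

```python
import torch
from torch import nn


def train_step(model, optimizer, loss_fn, batch):
    """Hypothetical stand-in for the project's training step."""
    model.train()
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


def test_train_step_sets_training_mode():
    model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))
    model.eval()  # force eval mode first so the test can observe the switch
    assert not model.training

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    batch = (torch.randn(8, 4), torch.randint(0, 2, (8,)))
    train_step(model, optimizer, nn.CrossEntropyLoss(), batch)

    # train() propagates to every submodule, including Dropout.
    assert all(module.training for module in model.modules())
```

This only verifies the mode flag, not that training improves the model, so it would complement rather than replace a loss or accuracy threshold check.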

drewoldag · 2024-10-23T23:03:34Z

new issue here: #104

It turns out that we have to set the model to training mode to actually train it.

c97284e

drewoldag requested review from mtauraso and aritraghsh09 October 23, 2024 21:40

drewoldag self-assigned this Oct 23, 2024

drewoldag linked an issue Oct 23, 2024 that may be closed by this pull request

Training isn't doing what we expect #101

Closed

Looks like moving the model.train() and model.eval() here should work.

411de9d

mtauraso approved these changes Oct 23, 2024


drewoldag merged commit cefda87 into main Oct 23, 2024
6 of 8 checks passed

drewoldag deleted the issue/101/set-model-to-train-mode branch October 23, 2024 23:03
