Set default models to training mode in the train_step. #103

Merged
drewoldag merged 2 commits into main from issue/101/set-model-to-train-mode
Oct 23, 2024


drewoldag
Collaborator

@drewoldag drewoldag commented Oct 23, 2024

It turns out that we have to explicitly set the model to training mode to actually train it. Oddly, this isn't mentioned in the PyTorch example code that was used when adding the ExampleCNN model.

For now, I've added model.train() to the create_trainer and model.eval() to the create_evaluator functions.
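For reference, here is a minimal sketch of the pattern described above. It is not the code in src/fibad/pytorch_ignite.py: `make_train_step` and `make_eval_step` are hypothetical helper names, the model is a tiny stand-in rather than ExampleCNN, and the step functions only assume the `(engine, batch)` signature that ignite passes to a process function. Placing the mode switch inside the step itself is one possible choice, made here so the mode is correct even if something else flipped it between calls.

```python
import torch
from torch import nn


def make_train_step(model, optimizer, loss_fn):
    """Return a step function with an (engine, batch) signature (hypothetical helper)."""

    def train_step(engine, batch):
        model.train()  # training behaviour: Dropout active, BatchNorm updates its statistics
        inputs, targets = batch
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()

    return train_step


def make_eval_step(model):
    """Return an evaluation step that puts the model in eval mode first (hypothetical helper)."""

    def eval_step(engine, batch):
        model.eval()  # inference behaviour: Dropout disabled, BatchNorm uses running statistics
        inputs, targets = batch
        with torch.no_grad():
            return model(inputs), targets

    return eval_step


# Tiny stand-in model, not the project's ExampleCNN.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_step = make_train_step(model, optimizer, nn.CrossEntropyLoss())
eval_step = make_eval_step(model)

batch = (torch.randn(16, 4), torch.randint(0, 3, (16,)))

model.eval()  # simulate something having left the model in eval mode
loss_value = train_step(None, batch)  # the engine argument is unused in this sketch
mode_after_train = model.training

outputs, _ = eval_step(None, batch)
mode_after_eval = model.training
```

Both step functions could be handed to an ignite `Engine`, but nothing in the sketch depends on ignite being installed.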

Confusion matrix before adding model.train()

[Screenshot: confusion matrix, 2024-10-23 2:55:11 PM]

Confusion matrix after adding model.train(), 10 epochs

[Screenshot: confusion matrix, 2024-10-23 2:42:03 PM]

Confusion matrix after adding model.train(), 50 epochs

[Screenshot: confusion matrix, 2024-10-23 3:25:17 PM]

Overall accuracy for this is about 64%, which is a little better than the PyTorch example code, which got to about 54%.

@drewoldag drewoldag self-assigned this Oct 23, 2024
@drewoldag drewoldag linked an issue Oct 23, 2024 that may be closed by this pull request

github-actions bot commented Oct 23, 2024

| Before [dad7062] <v0.1.1> | After [096ecfe] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 2.50±0.3s | 2.36±0.6s | 0.94 | benchmarks.time_computation |
| 2.8k | 440 | 0.16 | benchmarks.mem_list |



codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 34.75%. Comparing base (dad7062) to head (411de9d).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fibad/pytorch_ignite.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
- Coverage   34.83%   34.75%   -0.08%     
==========================================
  Files          18       18              
  Lines         887      889       +2     
==========================================
  Hits          309      309              
- Misses        578      580       +2     


@mtauraso
Collaborator

It really seems like we should add a test for this.

Maybe we do a few dozen epochs in CI on CIFAR10 and ExampleAutoencoder, and threshold either loss or confusion matrix values?

That being said, I have no objection to merging this as-is.
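A rough sketch of what a loss-threshold check could look like. To keep it fast and deterministic it uses synthetic, linearly separable data and a single linear layer instead of CIFAR10 and the project's models; the threshold value, step count, and every name below are illustrative assumptions, not anything taken from the repository.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the check is reproducible

# Synthetic two-class data that a linear model can separate (stand-in for a real dataset).
inputs = torch.randn(256, 2)
targets = (inputs.sum(dim=1) > 0).long()

model = nn.Linear(2, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)


def run_epoch():
    """One full-batch training step; returns the loss measured before the update."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [run_epoch() for _ in range(100)]

LOSS_THRESHOLD = 0.3  # illustrative value chosen for this toy problem
training_converged = losses[-1] < losses[0] and losses[-1] < LOSS_THRESHOLD
```

A real CI test would swap in the actual dataset and model, and the threshold would need to be tuned so that it fails when training is broken without being flaky.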

@drewoldag
Collaborator Author

drewoldag commented Oct 23, 2024

Yeah, I'm not sure what the best way to test this would be. It seems like there is a boolean attribute, model.training, that we could use as a signal. I'm surprised by this; it feels like a gotcha that isn't obviously documented in the PyTorch getting started guides (and I would have assumed it would be).

I'll create a follow-up issue to add some kind of testing around this.
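As a starting point for that follow-up, here is a small sketch of using the `training` flag as a test signal. `nn.Module.training` is a real boolean attribute, and `train()` / `eval()` set it recursively on all submodules; the `submodule_modes` helper and the stand-in model are assumptions made for illustration.

```python
from torch import nn


def submodule_modes(model):
    """Collect the `training` flag of the model and every submodule (hypothetical helper)."""
    return [module.training for module in model.modules()]


# Stand-in model; a real test would build the project's own model instead.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.eval()
flags_after_eval = submodule_modes(model)

model.train()
flags_after_train = submodule_modes(model)

# A test could assert on these flags after running a trainer or evaluator step.
all_in_eval_mode = not any(flags_after_eval)
all_in_train_mode = all(flags_after_train)
```

This only verifies which mode the model is in; it does not show that training actually improves the model, so it would complement a loss or accuracy threshold rather than replace one.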

@drewoldag
Collaborator Author

new issue here: #104

@drewoldag drewoldag merged commit cefda87 into main Oct 23, 2024
6 of 8 checks passed
@drewoldag drewoldag deleted the issue/101/set-model-to-train-mode branch October 23, 2024 23:03
Development

Successfully merging this pull request may close these issues.

Training isn't doing what we expect
2 participants