Trying out the CIFAR-10 example, it appears that some arguments don't take effect because they are being overwritten somewhere.
I tried changing the number of epochs to run with the --epochs flag, but it looks like the cifar10_deepspeed.py script hard-codes 2 epochs:
for epoch in range(2): # loop over the dataset multiple times
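For comparison, here is a minimal sketch of what wiring the flag into the loop could look like, assuming the script parses its arguments with argparse (the flag name matches what I passed on the command line; the default shown is illustrative):

```python
import argparse

parser = argparse.ArgumentParser(description='CIFAR-10 DeepSpeed example')
# Assumed flag; the real script may declare it differently or not at all.
parser.add_argument('--epochs', type=int, default=2,
                    help='number of passes over the training set')
args = parser.parse_args()

for epoch in range(args.epochs):  # loop over the dataset multiple times
    ...
```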
I also tried changing the learning rate to 0.0005 by editing the ds_config.json file, and it seems like that gets picked up in some parts but overwritten in others.
For example, I see:
worker-0: [2020-10-01 22:41:49,395] [INFO] [config.py:624:print] optimizer_params ............. {'lr': 0.0005, 'betas': [0.8, 0.999], 'eps': 1e-08, 'weight_decay': 3e-07}
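For context, the optimizer block in ds_config.json after my edit looks roughly like this (a sketch reconstructed from the optimizer_params printout above; the optimizer type is whatever the example ships with, assumed here to be Adam):

```json
{
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0005,
      "betas": [0.8, 0.999],
      "eps": 1e-8,
      "weight_decay": 3e-7
    }
  }
}
```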
Which seems to have picked it up, but when it actually runs the training it always says:
worker-0: [2020-10-01 22:42:57,190] [INFO] [logging.py:60:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.001], mom=[[0.8, 0.999]]
Which seems to have not picked up the lr change (it stays at 0.001 throughout, which also suggests it isn't doing any lr_warmup). I haven't tracked down where in the script the learning rate gets overwritten, but it does seem to be happening.
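As a quick sanity check, one way to see what learning rate the engine is actually using is to print the wrapped optimizer's param groups right after deepspeed.initialize (a sketch; the variable names and initialize arguments follow common usage of the API rather than the script's exact code):

```python
import deepspeed

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
# net, args, and trainset here stand in for whatever the script defines.
model_engine, optimizer, trainloader, lr_scheduler = deepspeed.initialize(
    args=args,
    model=net,
    model_parameters=net.parameters(),
    training_data=trainset)

# The effective learning rate lives in the optimizer's param groups; if the
# config change took effect, this should print 0.0005, not 0.001.
for group in optimizer.param_groups:
    print('effective lr:', group['lr'])
```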
You are right about the epochs being hard-coded. Please use this patch to resolve the issue: #759
About the learning rate: I see that it used to be hard-coded to 0.001, but that line of code is already commented out, so this issue should have been resolved.
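For illustration, the pattern being described looks something like the following (hypothetical; the actual lines in the repo may differ, and args/net stand in for the script's own objects):

```python
import deepspeed

# Old behavior: learning rate pinned in code, ignoring ds_config.json, e.g.
# optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

# Current behavior: no client optimizer is passed, so deepspeed.initialize
# builds one from the "optimizer" section of ds_config.json.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=net, model_parameters=net.parameters())
```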