
time_model.py gives different results to those in model_zoo #79

Closed
Ushk opened this issue Jun 2, 2020 · 9 comments


@Ushk

Ushk commented Jun 2, 2020

Hi - I appreciate there's already an open issue related to speed, but mine is slightly different.

When I run
python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml
having changed the number of GPUs in the config from 8 to 1, I get the following dump. I am running this with a batch size of 64 and input resolution 224x224 on a V100, as stated in the paper.

[screenshot: time_net.py output showing a forward pass time of ~62ms]
This implies a forward pass of ~62ms, not the 33ms stated in MODEL_ZOO. Have I done something wrong? Not sure why the times are so different. The other numbers (acts, params, flops) all seem fine. The latency differences are seen for other models as well - here is 800MF (39ms vs model zoo's 21ms):
[screenshot: timing output for RegNetX-800MF showing ~39ms]

I am using commit a492b56 rather than the latest version of the repo (MODEL_ZOO has not changed since before this commit) because it is useful to be able to time the models on dummy data, rather than having to construct a dataset. Would it be possible to have an option to do this? I can open a separate issue as a feature request for consideration if necessary.

@ir413
Contributor

ir413 commented Jun 2, 2020

Hi Dan, thanks for raising the issue. As a first step, I think it would be good to ensure that we are using the same version of the code and settings for the precise timing. Could you please try running the following command on the latest master and let us know what you observe?

Command:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 50

We double-checked that using this command we get times very close to the model zoo (35ms). So if this does not resolve the issue, we should probably check the software versions.

Re timing on dummy data: do you mean that in the latest master we also time the loader which requires constructing a dataset?

@Ushk
Author

Ushk commented Jun 2, 2020

Thanks for the response! I tried the command, and unfortunately there was no change - this was the dump:
[screenshot: timing output, still ~62ms per forward pass]

I should have mentioned this in the first place - I was using pytorch 1.3.1 with CUDA 10.0.
I didn't have time to upgrade the CUDA version today, which precludes me from using a later version of pytorch on the server with the V100.

I also did a quick test on a P100 with pytorch 1.3.1 and CUDA 10.1, and then again after upgrading to pytorch 1.5; both experiments were also in the ~60ms ballpark.
[screenshot: timing output on the P100, ~60ms]

Are there other dependencies you'd like to check?
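For reference, here's a quick snippet to report the exact versions in my environment (these are standard PyTorch attributes, nothing pycls-specific):

import torch
print(torch.__version__)               # PyTorch version, e.g. 1.3.1
print(torch.version.cuda)              # CUDA version PyTorch was built against, e.g. 10.0
print(torch.backends.cudnn.version())  # cuDNN version, e.g. 7603
print(torch.cuda.get_device_name(0))   # GPU model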

And yep - in the commit referenced above, the timing uses dummy data, whereas the current master requires constructing a dataset. Both are valid use cases, but - at least in my mind - the first makes timing inference in isolation easier.
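For concreteness, this is roughly the kind of dummy-data timing I have in mind - a minimal sketch in plain PyTorch, using a torchvision ResNet-50 as a stand-in rather than the actual pycls RegNet builder:

import time
import torch
import torchvision

def time_forward(model, batch_size=64, im_size=224, n_iter=50, warmup=5):
    # Time eval-mode forward passes on random dummy data.
    model = model.cuda().eval()
    inputs = torch.randn(batch_size, 3, im_size, im_size, device="cuda")
    times = []
    with torch.no_grad():
        for i in range(warmup + n_iter):
            torch.cuda.synchronize()
            start = time.time()
            model(inputs)
            torch.cuda.synchronize()
            if i >= warmup:  # discard warm-up iterations
                times.append(time.time() - start)
    return sum(times) / len(times)

# Stand-in model; the real measurement would build the RegNet via pycls.
print("mean fwd time: %.1f ms" % (1000 * time_forward(torchvision.models.resnet50())))

No loader or dataset is needed; the random inputs stay on the GPU for the whole run.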

@ir413
Contributor

ir413 commented Jun 2, 2020

Thanks for the update. From looking at the screenshots, it seems that you are still using an earlier commit? Could you please retry with latest master? Just to eliminate that as a potential cause of the issue. Here is a screenshot of what we observe:

[screenshot: timing output on latest master, close to the model zoo times (~35ms)]

Re dependencies: The versions you are using differ from ours but let's maybe double-check the code version first and then get back to the dependencies.

Re dummy data: I agree, constructing the loader can be annoying. We should probably add a flag for loader timing. Let's address this in a different issue.

@Ushk
Author

Ushk commented Jun 3, 2020

I checked out master, but commented out the lines referring to the train loader, because I didn't want to have to download ImageNet. Please let me know if this is an issue - I can download it if necessary.

I tried running the code with just compute_time_loader commented out, but got CUDA out of memory errors. This is really weird, since I assume you didn't see this - it's a V100, and it uses the command you posted above. I commented out compute_time_train as well - for clarity's sake, here is the code that I ran:
[screenshot: modified timing code with compute_time_loader and compute_time_train commented out]

The final dump was as seen below - still in the ~60ms range unfortunately! Very strange.
[screenshot: timing output, still in the ~60ms range]

@ir413
Contributor

ir413 commented Jun 3, 2020

Thanks for trying with master. I think that timing without the loading code should be fine.

The code for computing training timings uses the train batch size by default. Using the config and the command above, the train batch size ends up being 1024 on 1 GPU, which likely causes the OOM issue. I should have included TRAIN.BATCH_SIZE 64 in the command. Sorry about that.
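That is, the full command would be something like:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 64 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 50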

As a sanity check, could you check that the batch size used for eval timing is as expected? (e.g. print the size of inputs here). Also, could you print the timing per iteration? (e.g. print timer.diff at the end of the loop after this).
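Roughly like this (illustrative only; the exact placement is at the lines linked above, and the variable names are taken from the existing timing loop):

print(inputs.size())  # just before the forward pass; expect torch.Size([64, 3, 224, 224]) for these settings
print(timer.diff)     # at the end of the loop body; per-iteration time in seconds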

@Ushk
Author

Ushk commented Jun 3, 2020

Thanks again for the help! No worries re: the OOM, I should have clocked that.

I dropped the number of iterations to 10, so that I could get it all in a single screenshot; the results were the same at 50 iterations.
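i.e. the same command as above, but with PREC_TIME.NUM_ITER 10:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 64 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 10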

[screenshot: per-iteration timings and batch size printout over 10 iterations]

@ir413
Contributor

ir413 commented Jun 3, 2020

Thanks for the information. The batch size is as expected. The timings also seem stable and consistent across iterations. I don't have any other ideas on what else to try on this front.

Let's maybe check the dependencies next? The timings I posted above were computed using PyTorch 1.4.0, CUDA 10.1, and cuDNN 7.6.3. Would it be possible to compute the timings using the same dependency versions on your side?

As an additional data point, we computed the timings on a P100 GPU and observed 55ms (the command and the environment are the same as above; the only difference is P100 vs V100):

[screenshot: timing output on the P100, ~55ms]

Also, do you have anything else running on that GPU or machine? It may be good to ensure that this is the only thing running on the system in case there is some overhead / interference.
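For example, a quick check before and during the timing run will show whether any other processes are using the GPU:

nvidia-smi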

@Ushk
Author

Ushk commented Jun 4, 2020

So, as confirmation of the versions:

[screenshots: installed PyTorch 1.4.0, CUDA 10.1, and cuDNN 7.6.5 printouts]

and I'm assuming there's no significant difference between cuDNN 7.6.3 and 7.6.5 for now.

I used a fresh clone of the repo on a fresh server and:
[screenshot: timing output on the fresh server, matching the model zoo]

So that's obviously good news. Unfortunately, I deleted the original server before setting up the new one, which makes it hard to figure out what changed - I can't recover the old timings now (I have tried with other pytorch versions). Since the issue is gone, I'm happy to put it down to a dodgy environment for now and reopen the issue if I see the problem again. No one else seems to have run into this, and it has taken up enough time already.

@ir413
Contributor

ir413 commented Jun 4, 2020

Thanks for the update. Glad to see that the timings match on a fresh server. Sounds good, let's revisit if the issue reappears in the future.
