
time_model.py gives different results to those in model_zoo #79

Closed
Ushk opened this issue Jun 2, 2020 · 9 comments


@Ushk

Ushk commented Jun 2, 2020

Hi - I appreciate there's already an open issue related to speed, but mine is slightly different.

When I run
python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml
having changed the number of GPUs in the config from 8 to 1, I get the following dump. I am running this with a batch size of 64 and input resolution 224x224 on a V100, as stated in the paper.

[screenshot: time_net.py output showing a forward pass time of ~62ms]
This implies a forward pass of ~62ms, not the 33ms stated in MODEL_ZOO. Have I done something wrong? Not sure why the times are so different. The other numbers (acts, params, flops) all seem fine. The latency differences are seen for other models as well - here is 800MF (39ms vs model zoo's 21ms):
[screenshot: timing output for RegNetX-800MF showing ~39ms]

I am using commit a492b56 rather than the latest version of the repo (MODEL_ZOO has not changed since before this commit) because it is useful to be able to time the models on dummy data, rather than having to construct a dataset. Would it be possible to have an option to do this? I can open a separate issue as a feature request for consideration if necessary.

@ir413
Contributor

ir413 commented Jun 2, 2020

Hi Dan, thanks for raising the issue. As a first step, I think it would be good to ensure that we are using the same version of the code and settings for the precise timing. Could you please try running the following command on the latest master and let us know what you observe?

Command:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 50

We double-checked that using this command we get times very close to the model zoo (35ms). So if this does not resolve the issue, we should probably check the software versions.

Re timing on dummy data: do you mean that in the latest master we also time the loader which requires constructing a dataset?

@Ushk
Author

Ushk commented Jun 2, 2020

Thanks for the response! I tried the command, and unfortunately there was no change - this was the dump:
[screenshot: timing output, still ~62ms per forward pass]

I should have mentioned this in the first place - I was using pytorch 1.3.1 with CUDA 10.0.
I didn't have time to upgrade the CUDA version today, which precludes me from using a later version of pytorch on the server with the V100.

I also did a quick test on a P100 with pytorch 1.3.1 and CUDA 10.1, and then again after upgrading to pytorch 1.5; both experiments were also in the ~60ms ballpark.
[screenshot: timing output on the P100, ~60ms]

Are there other dependencies you'd like to check?
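For reference, here's a quick snippet to report the exact versions in my environment (these are standard PyTorch attributes, nothing pycls-specific):

import torch
print(torch.__version__)               # PyTorch version, e.g. 1.3.1
print(torch.version.cuda)              # CUDA version PyTorch was built against, e.g. 10.0
print(torch.backends.cudnn.version())  # cuDNN version, e.g. 7603
print(torch.cuda.get_device_name(0))   # GPU model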

And yep - in the commit referenced above, the timing uses dummy data, whereas the current master requires constructing a dataset. Both are valid use cases, but - at least in my mind - the first makes timing inference in isolation easier.
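For concreteness, this is roughly the kind of dummy-data timing I have in mind - a minimal sketch in plain PyTorch, using a torchvision ResNet-50 as a stand-in rather than the actual pycls RegNet builder:

import time
import torch
import torchvision

def time_forward(model, batch_size=64, im_size=224, n_iter=50, warmup=5):
    # Time eval-mode forward passes on random dummy data.
    model = model.cuda().eval()
    inputs = torch.randn(batch_size, 3, im_size, im_size, device="cuda")
    times = []
    with torch.no_grad():
        for i in range(warmup + n_iter):
            torch.cuda.synchronize()
            start = time.time()
            model(inputs)
            torch.cuda.synchronize()
            if i >= warmup:  # discard warm-up iterations
                times.append(time.time() - start)
    return sum(times) / len(times)

# Stand-in model; the real measurement would build the RegNet via pycls.
print("mean fwd time: %.1f ms" % (1000 * time_forward(torchvision.models.resnet50())))

No loader or dataset is needed; the random inputs stay on the GPU for the whole run.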

@ir413
Contributor

ir413 commented Jun 2, 2020

Thanks for the update. From looking at the screenshots, it seems that you are still using an earlier commit? Could you please retry with latest master? Just to eliminate that as a potential cause of the issue. Here is a screenshot of what we observe:

[screenshot: timing output on latest master, close to the model zoo times (~35ms)]

Re dependencies: The versions you are using differ from ours but let's maybe double-check the code version first and then get back to the dependencies.

Re dummy data: I agree, constructing the loader can be annoying. We should probably add a flag for loader timing. Let's address this in a different issue.

@Ushk
Author

Ushk commented Jun 3, 2020

I checked out master, but commented out the lines referring to the train loader, because I didn't want to have to download ImageNet. Please let me know if this is an issue - I can download it if necessary.

I tried running the code with just compute_time_loader commented out, but got CUDA out of memory errors. This is really weird, since I assume you didn't see this - it's a V100, and it uses the command you posted above. I commented out compute_time_train as well - for clarity's sake, here is the code that I ran:
[screenshot: modified timing code with compute_time_loader and compute_time_train commented out]

The final dump was as seen below - still in the ~60ms range unfortunately! Very strange.
[screenshot: timing output, still in the ~60ms range]

@ir413
Contributor

ir413 commented Jun 3, 2020

Thanks for trying with master. I think that timing without the loading code should be fine.

The code for computing training timings uses the train batch size by default. Using the config and the command above, the train batch size ends up being 1024 on 1 GPU, which likely causes the OOM issue. I should have included TRAIN.BATCH_SIZE 64 in the command. Sorry about that.
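That is, the full command would be something like:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 64 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 50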

As a sanity check, could you check that the batch size used for eval timing is as expected? (e.g. print the size of inputs here). Also, could you print the timing per iteration? (e.g. print timer.diff at the end of the loop after this).
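Roughly like this (illustrative only; the exact placement is at the lines linked above, and the variable names are taken from the existing timing loop):

print(inputs.size())  # just before the forward pass; expect torch.Size([64, 3, 224, 224]) for these settings
print(timer.diff)     # at the end of the loop body; per-iteration time in seconds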

@Ushk
Author

Ushk commented Jun 3, 2020

Thanks again for the help! No worries re: the OOM, I should have clocked that.

I dropped the number of iterations to 10, so that I could get it all in a single screenshot; the results were the same at 50 iterations.
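i.e. the same command as above, but with PREC_TIME.NUM_ITER 10:

python tools/time_net.py --cfg configs/dds_baselines/regnetx/RegNetX-1.6GF_dds_8gpu.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 64 TEST.BATCH_SIZE 64 PREC_TIME.WARMUP_ITER 5 PREC_TIME.NUM_ITER 10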

[screenshot: per-iteration timings and batch size printout over 10 iterations]

@ir413
Contributor

ir413 commented Jun 3, 2020

Thanks for the information. The batch size is as expected. The timings also seem stable and consistent across iterations. I don't have any other ideas on what else to try on this front.

Let's maybe check the dependencies next? The timings I posted above were computed using PyTorch 1.4.0, CUDA 10.1, and cuDNN 7.6.3. Would it be possible to compute the timings using the same dependency versions on your side?

As an additional data point, we computed the timings on a P100 GPU and observed 55ms (the command and the environment are the same as above; the only difference is P100 vs V100):

[screenshot: timing output on the P100, ~55ms]

Also, do you have anything else running on that GPU or machine? It may be good to ensure that this is the only thing running on the system in case there is some overhead / interference.
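For example, a quick check before and during the timing run will show whether any other processes are using the GPU:

nvidia-smi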

@Ushk
Author

Ushk commented Jun 4, 2020

So, as confirmation of the versions:

[screenshots: installed PyTorch 1.4.0, CUDA 10.1, and cuDNN 7.6.5 printouts]

and I'm assuming there's no significant difference between cuDNN 7.6.3 and 7.6.5 for now.

I used a fresh clone of the repo on a fresh server and:
[screenshot: timing output on the fresh server, matching the model zoo]

So that's obviously good news. Unfortunately, I deleted the original server before setting up the new one, which makes it hard to figure out what changed - I can't recover the old timings now (I have tried with other pytorch versions). Since the issue is gone, I'm happy to put it down to a dodgy environment for now and reopen the issue if I see the problem again. No one else seems to have run into this, and it has taken up enough time already.

@ir413
Contributor

ir413 commented Jun 4, 2020

Thanks for the update. Glad to see that the timings match on a fresh server. Sounds good, let's revisit if the issue reappears in the future.
