
Contact library authors for possible enhancements #10

Open
Randl opened this issue Feb 10, 2017 · 10 comments

Comments

@Randl

Randl commented Feb 10, 2017

I'm a TensorFlow user, so I've opened an issue regarding the performance of TensorFlow in several cases.

One of the things we found out is that the code used by dlbench is suboptimal: tensorflow/tensorflow#7187 (comment)

So I thought you might consider contacting the authors of the other libraries too, to get feedback from them.

@shyhuai
Collaborator

shyhuai commented Feb 10, 2017

Thanks for your suggestion. We have tried to contact the authors of the other tools to confirm the scripts and configuration files. Feel free to submit a pull request if you have more optimal implementations of our tested networks.

@shyhuai
Collaborator

shyhuai commented Feb 10, 2017

@Randl The ResNet-50 script in MXNet was found to have a misconfiguration, and we have revised it to the correct one, so we are now re-running the revised script to generate new results. Could you also provide TensorFlow code that avoids using feed_dict in the FCN benchmark, so that we can release the new results together? Please note that the TF version should be 0.11. Thank you!
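
(For reference, here is a minimal sketch of what avoiding feed_dict looks like with the TF 0.11-era queue API: batches are dequeued inside the graph by TF threads instead of being fed from Python on every step. The synthetic data and the tiny single-layer model below are stand-ins, not dlbench's actual FCN code.)

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data; the real benchmark would load its dataset here.
images = np.random.rand(1000, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=1000).astype(np.int32)

# Build the input queue once; TF threads produce batches, so session.run
# no longer needs a feed_dict on every training step.
image, label = tf.train.slice_input_producer(
    [tf.constant(images), tf.constant(labels)], shuffle=True)
image_batch, label_batch = tf.train.batch([image, label], batch_size=64)

# A deliberately tiny model standing in for the benchmarked FCN.
w = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(image_batch, w) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits, label_batch))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # pre-1.0 initializer name
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(100):
        sess.run(train_op)  # no feed_dict: batches come from the queue
    coord.request_stop()
    coord.join(threads)
```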

@Randl
Author

Randl commented Feb 10, 2017

@shyhuai You should ask @tfboyd for optimal code.

@Randl
Author

Randl commented Feb 12, 2017

@shelhamer @KeDengMS @piiswrong @soumith Sorry if you're the wrong people to tag. Do you have anything to add? Do you think the benchmark can be improved somehow, or that your framework isn't being used in the most efficient way?

@tfboyd
Contributor

tfboyd commented Feb 12, 2017

Hi @shyhuai ,

I know how hard it is to run a bunch of benchmarks using a wide range of tools. I do not know if I will have time to submit any PRs in the near future, but I will if I can find time. One idea I did have that would make it easier for us to help: we do not do a lot with the CIFAR data sets, because the images are really small and GPUs end up processing in some cases 6,000+ samples (images)/sec. I understand moving to ImageNet could be a big change given you have done multiple rounds with CIFAR.

Good luck on future iterations. I cannot say I will always have time, but please feel free to reach out to me for code or whatever. I do not want to influence your results, but I am happy to help as impartially as I can.

@tfboyd
Contributor

tfboyd commented Feb 12, 2017

Oh sorry, one more thing. We should soon have an MNIST example that does not use the Python feed; it was intended as a tutorial. I will try to submit a PR, or at a minimum link it to you when it is released.

edit: will have an MNIST example soon.

@shyhuai
Collaborator

shyhuai commented Feb 13, 2017

@tfboyd Thank you very much for your kind response and help. We are also trying to include the real ImageNet data set in the evaluation, but it could take more time to generate results, since it takes several days to train a network model to convergence. I will inform you if we have further progress.

@ke1337

ke1337 commented Feb 13, 2017

@shyhuai, I appreciate your effort on building benchmarks for the major DL platforms. Please let me know if you find any issues in testing CNTK.
As to CIFAR vs. ImageNet, I think having both would be beneficial, to measure the speed of computation and of I/O separately. CIFAR-10 is a small data set, but one can still build complex networks on it, like ResNet110 in CNTK. That would be a very good indicator of how a platform performs when computation is intensive. ImageNet would put more pressure on I/O compared to CIFAR-10.

@cepera-ang

Maybe it's worth looking at something in between ImageNet and CIFAR, like the Pascal VOC dataset?

@tfboyd
Contributor

tfboyd commented Jun 1, 2017

I rewrote all of the TensorFlow examples with the exception of the RNN. I think this can be closed once the PRs are accepted. I suspect our ResNet is still off, as there should not be as large a gap between any of the platforms on one or even multiple GPUs, especially a K80. They should all be within about 5-10%, maybe 20% in some weird cases, but in general, and as tested by NVIDIA, the top frameworks are nearly identical on CNNs (yes, some are faster and some slower, but not dramatically). RNNs might be a different story, but if everyone is using cuDNN, again it should be similar and not dramatically different.
