Intel-optimized TF 1.9, AVX support #5142
We only need one conda environment now that older hardware is supported.
@EdwardDixon Thanks for trying this - it would be great if we were able to have a single conda env, but a couple of questions:
@EdwardDixon What is the behavior if AVX is not available? Does it crash/refuse to run, or is there a graceful automatic fallback to non-vectorized code? Without such a fallback in place, I'm not sure we could switch to it exclusively.
No AVX = it'll crash... sounds bad, I admit. But remember we are talking about running a deep neural network over a large dataset: is someone really going to want to do that on hardware that is 8 years old (pre-AVX)? This version of TensorFlow is now the default for all Anaconda users, which in practice probably means a sizeable fraction of the machine learning community, so having minimum hardware requirements in line with theirs is perhaps not so unreasonable? Another option would be to change the default: have the gatk environment use the accelerated TensorFlow (since almost everyone has AVX, and they can get a 10X or so speedup), but make a second environment available for people who want to try to run a deep neural network on very old hardware - gatk-old?
@EdwardDixon Well, you'd be surprised at some of the hardware we have to deal with. Even some machines here at the Broad don't have AVX. In general, our policy with hardware-dependent optimizations in GATK has been to insist on having a transparent fallback mechanism when the required hardware isn't present -- I'd really prefer not to start making exceptions to that rule. Could the Intel-optimized TensorFlow be patched to fall back to vanilla TensorFlow when AVX is not present? Is that an option? Or could it at least be patched to not actually crash in that case?
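For illustration, the kind of transparent fallback being asked for could be approximated outside TensorFlow by checking for AVX before choosing an environment. This is only a minimal sketch, assuming Linux (where CPU feature flags are listed in /proc/cpuinfo); the function names are hypothetical, the environment names gatk and gatk-old are just the ones floated in this thread, and none of this is an existing GATK or TensorFlow mechanism.

```python
def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text (Linux only)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the list "flags"; some other architectures use "Features".
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True only if the exact 'avx' flag is present ('avx2' alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


def pick_conda_env(cpuinfo_text, fast_env="gatk", fallback_env="gatk-old"):
    """Hypothetical policy: accelerated env when AVX is present, otherwise the slow one."""
    return fast_env if has_avx(cpuinfo_text) else fallback_env


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # An unreadable or missing cpuinfo is treated as "no AVX", so we never
    # select the build that would crash.
    print(pick_conda_env(read_cpuinfo()))
```

The design choice here is to fail safe: anything that cannot be positively identified as AVX-capable gets the slower environment instead of a crash in native code.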
This sounds like a good rule, in general. In this case though, if users are going to run deep neural networks, there is going to be a substantial computational burden, such that running them is unlikely to appeal to users with older hardware (about 95% of users who train these models use accelerators, for example - one reason Deep Learning didn't really take off till 2014). If you could see your way to making AVX (i.e. 8-year-old hardware) the minimum requirement for your default docker image, you would be giving a 10X speedup to almost every user.
@EdwardDixon We have a fair number of GATK users who are stuck with older hardware (including university clusters that they have no power to upgrade), and we can't just cut these users off by imposing such a minimum hardware requirement. The best we can do is to use AVX when it's available, and fall back to slower codepaths when it's not. Also, actual crashes in native code impose a significant support burden on our comms team, as they are often hard to diagnose and deal with. Things like
Aside from the users with old hardware, very few of the GCS zones guarantee processors that support AVX, which would lead to sporadic failures except in central-1f, for example.
Closing this in favor of #5291
The Intel-optimized version of TensorFlow 1.9 is now the default for Anaconda users. It now supports all processors with AVX - so everything since Sandy Bridge, which was released in 2011. With that in mind, I was thinking we could dispense with two different conda environments and fold everything into the gatk environment. @samuelklee, I'm the new guy on the Intel team you've been dealing with.