Intel-optimized TF 1.9, AVX support #5142

EdwardDixon · 2018-08-29T12:23:48Z

The Intel-optimized version of TensorFlow 1.9 is now the default for Anaconda users. It now supports all processors with AVX, so everything since Sandy Bridge, which was released in 2011. With that in mind, I was thinking we could dispense with two different conda environments and fold everything into the gatk environment. @samuelklee, I'm the new guy on the Intel team you've been dealing with.

We only need 1 now that older hardware is supported.

cmnbroad · 2018-08-29T13:01:12Z

@EdwardDixon Thanks for trying this - it would be great if we were able to have a single conda env, but a couple of questions:

We'd need to understand the effect of this change on our build times. It looks like the Travis builds are failing because the dependency downloads are resulting in so many progress messages that we're exceeding the allowable log length, probably because the download is either large or slow. I'm not sure if that's transient or not.
We try to carefully control the size of our (already sizable) Docker image. We'll need to understand how this impacts that.
@lucidtronix Any thoughts on moving from TensorFlow 1.4 to 1.9?

droazen · 2018-08-29T15:14:57Z

@EdwardDixon What is the behavior if AVX is not available? Does it crash/refuse to run, or is there a graceful automatic fallback to non-vectorized code? Without such a fallback in place, I'm not sure we could switch to it exclusively.

EdwardDixon · 2018-08-29T17:40:03Z

No AVX = it'll crash... sounds bad, I admit. But remember we are talking about running a deep neural network over a large dataset: is someone really going to want to do that on hardware that is 8 years old (pre-AVX)? This version of TensorFlow is now the default for all Anaconda users, which in practice probably means a sizeable fraction of the machine learning community, and so having minimum hardware requirements in line with theirs is perhaps not so unreasonable?

Another option would be to change the default: have the gatk environment use the accelerated TensorFlow (since almost everyone has AVX, and they can get a 10X or so speedup), but make a second environment available for people that want to try to run a deep neural network on very old hardware - gatk-old?

droazen · 2018-08-29T19:18:58Z

@EdwardDixon Well, you'd be surprised at some of the hardware we have to deal with. Even some machines here at the Broad don't have AVX. In general, our policy with hardware-dependent optimizations in GATK has been to insist on having a transparent fallback mechanism when the required hardware isn't present -- I'd really prefer not to start making exceptions to that rule. Could the Intel-optimized TensorFlow be patched to fall back to vanilla TensorFlow when AVX is not present? Is that an option? Or could it at least be patched to not actually crash in that case?

EdwardDixon · 2018-08-30T15:42:51Z

This sounds like a good rule, in general. In this case though, if users are going to run deep neural networks, there is going to be a substantial computational burden, such that running them is unlikely to appeal to users with older hardware (about 95% of users who train these models use accelerators, for example - one reason Deep Learning didn't really take off till 2014). If you could see your way to making AVX (i.e. 8 year old hardware) the minimum requirement for your default docker image, you would be giving a 10X speedup to almost every user.

droazen · 2018-08-30T16:32:16Z

@EdwardDixon We have a fair number of GATK users who are stuck with older hardware (including university clusters that they have no power to upgrade), and we can't just cut these users off by imposing such a minimum hardware requirement. The best we can do is to use AVX when it's available, and fall back to slower codepaths when it's not.

Also, actual crashes in native code impose a significant support burden on our comms team, as they are often hard to diagnose and deal with. Things like SIGSEGV or SIGILL are a nightmare for our support staff. At a minimum we'd need a graceful failure with an easy-to-understand error message when AVX is not present rather than a crash, before we could make this the default in GATK.
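For illustration, the kind of pre-flight check requested here could look roughly like the sketch below. It assumes Linux, where CPU feature flags are listed on the `flags` line of `/proc/cpuinfo`. The function names and the error wording are hypothetical; none of this is taken from GATK or TensorFlow, and it does not describe how the check was eventually implemented.

```python
import sys


def parse_cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_error_message(flags):
    """Return None if AVX is reported, else a readable explanation."""
    if "avx" in flags:
        return None
    # Hypothetical wording, chosen to be actionable for support staff.
    return ("This CPU does not report AVX support, which the "
            "Intel-optimized TensorFlow build requires. "
            "Use a non-AVX TensorFlow build on this machine.")


def require_avx(cpuinfo_path="/proc/cpuinfo"):
    """Exit with a clear message instead of crashing later in native code."""
    try:
        with open(cpuinfo_path) as handle:
            flags = parse_cpu_flags(handle.read())
    except OSError:
        # Assumption: if the flags cannot be read (e.g. not Linux), do not block.
        return
    message = avx_error_message(flags)
    if message is not None:
        sys.exit(message)
```

Calling `require_avx()` before `import tensorflow` would turn a SIGILL into a one-line error. The same `avx_error_message` result could instead be used to select a slower non-AVX code path rather than exiting.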

ldgauthier · 2018-08-30T16:38:11Z

Aside from the users with old hardware, very few of the GCS zones guarantee processors that support AVX, which would lead to sporadic failures except in central-1f, for example.

droazen · 2018-10-15T18:13:43Z

Closing this in favor of #5291

EdwardDixon added 2 commits August 28, 2018 15:44

Making Intel-optimized TF 1.9 the default/only conda env option

b5e55d5

Now just 1 conda env

4a62588

We only need 1 now that older hardware is supported.

cmnbroad self-assigned this Aug 29, 2018

droazen requested a review from cmnbroad August 29, 2018 15:12

droazen self-requested a review August 30, 2018 18:12

droazen self-assigned this Aug 31, 2018

droazen mentioned this pull request Oct 9, 2018

AVX present? #5291

Merged

droazen closed this Oct 15, 2018
