[TOPI] FIFO buffer op, to accelerate sequence modeling with dilated convolutions #4039
Conversation
TODO.
cc @vinx13 @merrymercy It would be great if you could help comment and review.
Thanks for the contribution. I will have to look into the details to understand the compute, but overall looks good to me. Will do one more round by tomorrow.
Thanks for the contribution. I left some minor reviews. Otherwise, looks good to me.
LGTM
LGTM. @vinx13 Can you take another look?
…onvolutions (apache#4039)
* Add FIFO buffer op to enable explicit computation re-use in convolution
* Add a test
* Add end-to-end test with 1D convolution
* Add a stub in MXNet frontend
* Address reviewer comments
* Add back stub for MXNet frontend
Motivation. Dilated convolutions have emerged as an effective alternative to recurrent units for modeling sequences. For example, WaveNet [1] uses a stack of dilated convolutional layers to generate raw audio waveforms from text. Snips [2] modifies the WaveNet architecture to detect a keyword in an audio stream.
In order to capture temporal context, the WaveNet architecture feeds a sliding window over the input sequence into the first convolutional layer. As noted in [2] and [3], computing convolution over the sliding window results in redundant computation, since consecutive windows overlap in all but a few samples.
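To make the overlap concrete, here is a small illustration in plain NumPy (not TVM code; `conv1d_valid` is a hypothetical helper, not part of this PR): convolving each window of a stream independently recomputes almost all of the previous window's outputs.

```python
import numpy as np

def conv1d_valid(window, kernel):
    """Plain 'valid' 1D convolution (cross-correlation) over one window."""
    k = len(kernel)
    return np.array([window[i:i + k] @ kernel
                     for i in range(len(window) - k + 1)])

stream = np.arange(10, dtype=np.float64)
kernel = np.array([1.0, 2.0, 3.0])

out_a = conv1d_valid(stream[0:8], kernel)   # window at step t
out_b = conv1d_valid(stream[1:9], kernel)   # window at step t + 1

# All but one output of the new window was already computed for the old one.
assert np.allclose(out_a[1:], out_b[:-1])
```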
This pull request implements a FIFO buffer operator that caches intermediate outputs from each convolutional layer, eliminating the redundant computation. This is similar in spirit to [4], except that here the re-use is explicit and inherent in the model. Note that caching is applicable only at inference time, not during training.
Semantics. The FIFO buffer op should behave like concat(buffer, data, axis=axis) followed by a slice along axis that keeps only the trailing buffer.shape[axis] elements, so the output has the same shape as the buffer.
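A minimal NumPy reference sketch of these semantics (an illustration only, assuming the concat-then-slice behavior described above; `fifo_buffer_ref` is a hypothetical name, not the TOPI API):

```python
import numpy as np

def fifo_buffer_ref(data, buffer, axis=0):
    """Reference semantics: append `data` to `buffer` along `axis`,
    then keep only the trailing buffer.shape[axis] elements."""
    combined = np.concatenate((buffer, data), axis=axis)
    begin = data.shape[axis]
    end = begin + buffer.shape[axis]
    index = [slice(None)] * combined.ndim
    index[axis] = slice(begin, end)
    return combined[tuple(index)]

# Push three single-element chunks through a length-4 buffer.
buf = np.zeros(4)
for chunk in (np.array([1.0]), np.array([2.0]), np.array([3.0])):
    buf = fifo_buffer_ref(chunk, buf)
print(buf)  # [0. 1. 2. 3.]
```

The oldest elements fall off the front as new data is appended at the back, which is exactly the state a convolutional layer needs in order to produce one new output per incoming sample.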
Usage. See topi/tests/python/test_fifo_buffer.py.
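To sketch the end-to-end idea behind the 1D-convolution test (a plain-NumPy illustration under assumed semantics, not the actual test code): feeding one sample at a time through a FIFO buffer sized to the receptive field reproduces the batch result of a dilated convolution over the whole sequence, with one new output per step.

```python
import numpy as np

def fifo_push(buffer, sample):
    """Shift the FIFO buffer left by one and append the new sample."""
    return np.concatenate((buffer[1:], [sample]))

def dilated_conv1d(x, w, dilation):
    """'Valid' dilated 1D convolution over a full sequence."""
    k = len(w)
    rf = (k - 1) * dilation + 1              # receptive field
    return np.array([sum(x[t + j * dilation] * w[j] for j in range(k))
                     for t in range(len(x) - rf + 1)])

w = np.array([0.5, -1.0, 2.0])
dilation = 2
rf = (len(w) - 1) * dilation + 1             # = 5

x = np.random.default_rng(0).standard_normal(20)

# Batch result over the whole sequence.
batch_out = dilated_conv1d(x, w, dilation)

# Streaming: one sample in, one output out, no recomputation of past samples.
buf = np.zeros(rf)
stream_out = []
for t, sample in enumerate(x):
    buf = fifo_push(buf, sample)
    if t >= rf - 1:                          # buffer is fully primed
        stream_out.append(sum(buf[j * dilation] * w[j] for j in range(len(w))))

assert np.allclose(batch_out, np.array(stream_out))
```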
Limitation. Currently, the buffer op exists only in TOPI. To make it useful in practice, we want to upstream it to MXNet and other frameworks. Alternatively, we could implement a custom Relay pass so that users can annotate a stack of convolutional layers.
References
[1] "WaveNet: A Generative Model for Raw Audio." https://arxiv.org/abs/1609.03499
[2] "Efficient keyword spotting using dilated convolutions and gating." https://arxiv.org/abs/1811.07684
[3] "Fast Wavenet Generation Algorithm." https://arxiv.org/abs/1611.09482
[4] "Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse." https://dl.acm.org/citation.cfm?id=3330384
Special thanks to Thibaud Senechal (Amazon) for initially suggesting the concept of FIFO buffer.
cc @yongwww @wweic @zhiics @kevinthesun @anijain2305