[Numpy] Add qr backward part 2 for wide matrices with m < n #18197
Conversation
Hey @D-Roberts , Thanks for submitting the PR
CI supported jobs: [miscellaneous, centos-cpu, sanity, windows-gpu, website, centos-gpu, unix-cpu, edge, clang, windows-cpu, unix-gpu] |
@haojin2 PR ready for review, tnx. |
@D-Roberts Thanks for your contribution! From now on, please ping @yzhliu for reviews of NumPy-related contributions, since I'm shifting my focus away from this project. |
Hi @yzhliu, can you take a look? thanks |
Could you elaborate a bit on how the backward works when m < n? It seems https://arxiv.org/pdf/1710.08717.pdf does not cover this case.
The code follows the idea in the reference Differential Programming Tensor Networks. At a high level: partition the input A into two matrices X and Y, and partition R (from the decomposition A = QR) into two matrices U and V, so that X = QU. X_grad is then obtained by applying the gradient derivation from the square-input case (m = n) with an adjusted Q_grad, and Y_grad is computed separately. A_grad is the concatenation of X_grad and Y_grad. |
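The partition scheme described above can be sketched in NumPy. This is a minimal illustration of the approach, not MXNet's actual C++ implementation; the function names are hypothetical, and the square-case formula is the one given in the Differentiable Programming Tensor Networks reference:

```python
import numpy as np

def copyltu(m):
    # Copy the lower triangle onto the upper: keep the diagonal,
    # mirror the strictly-lower part.
    return np.tril(m) + np.tril(m, -1).T

def qr_backward_square(q, r, dq, dr):
    # Square/deep case (m >= n), per the reference:
    #   A_grad = (dQ + Q @ copyltu(M)) @ R^{-T},  M = R @ dR^T - dQ^T @ Q
    m_mat = r @ dr.T - dq.T @ q
    b = dq + q @ copyltu(m_mat)
    # b @ inv(r).T computed via a linear solve instead of an explicit inverse
    return np.linalg.solve(r, b.T).T

def qr_backward_wide(q, r, dq, dr):
    # Wide case (m < n): split A = [X | Y] and R = [U | V] so that
    # X = Q @ U and Y = Q @ V.
    m = q.shape[0]
    u, v = r[:, :m], r[:, m:]
    du, dv = dr[:, :m], dr[:, m:]
    y = q @ v
    dy = q @ dv                                        # Y_grad
    dx = qr_backward_square(q, u, dq + y @ dv.T, du)   # adjusted Q_grad
    return np.concatenate([dx, dy], axis=1)            # A_grad = [X_grad | Y_grad]
```

The adjusted Q_grad term `dq + y @ dv.T` accounts for the fact that Q also appears in the reconstruction Y = QV, so the gradient flowing through Y contributes to Q before the square-case formula is applied.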
@mxnet-bot run ci [centos-cpu, unix-cpu, windows-gpu] |
Jenkins CI successfully triggered : [unix-cpu, windows-gpu, centos-cpu] |
@mxnet-bot run ci [centos-cpu, unix-cpu] |
Jenkins CI successfully triggered : [centos-cpu, unix-cpu] |
Great work! I added one more comment, trying to be more memory-efficient.
LGTM
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
Hi @yzhliu - is there anything else you'd like me to do on this? tnx |
Any updates on this? |
Hello @yzhliu - are we planning to merge this soon? This particular case of differentiable QR can be useful on batches, in place of LQ or SVD, in recent computer vision research for solving least squares. |
Merged into master. Thanks @D-Roberts , @haojin2 |
This PR broke master CPU pipelines and blocks PRs (test_np_linalg_qr fails), see e.g. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-cpu/detail/master/2093/pipeline |
@D-Roberts @hzfan could you look into the issue that @ptrendx mentioned? If it can't be fixed in a couple of hours let's revert the change first. |
@hzfan Thank you for your prompt assistance, I appreciate it. @leezu @szha @DickJC123 I will resubmit a separate PR. For my future reference - what are your recommendations to avoid the "stale PR" situation? CI passed when first submitted about 3 months ago and I rebased and CI passed about 2 months ago when the PR was reviewed. All along I followed up on the PR every 2-3 weeks or so. |
@D-Roberts we will likely need to automate it so that stale CI checks are invalidated. In the meantime, if the PR sits for a long time, feel free to ping me or any other committer to get more attention on it. |
@D-Roberts the recommendation is to comment with `@mxnet-bot run ci [all]` to re-trigger all CI checks.
Description
This is the 2nd part of the QR backward implementation. The 1st part (merged) covered square and deep matrix shapes (nrows >= ncols) and part 2 now covers the remaining wide matrix shapes (ncols > nrows).
References:
Differential Programming Tensor Networks
The added test includes a numerical check (via central differences) of the analytical gradient, since this is a novel implementation. The tests were run offline 1K times to guard against flakiness.
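A central-difference check of this kind can be sketched as follows. This is an illustrative helper under assumed names, not the PR's actual test code: it approximates the gradient of a scalar-valued function entrywise and can be compared against an analytical gradient with `np.allclose`.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    # Approximate d f / d x entrywise with central differences:
    #   (f(x + eps*e_ij) - f(x - eps*e_ij)) / (2 * eps)
    g = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        fp = f(x)
        x[idx] = orig - eps
        fm = f(x)
        x[idx] = orig  # restore the entry before moving on
        g[idx] = (fp - fm) / (2 * eps)
        it.iternext()
    return g
```

Central differences have O(eps^2) truncation error, versus O(eps) for a one-sided difference, which makes the comparison tolerance less sensitive to the choice of eps.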
Checklist
Changes