[Numpy] Add qr backward for wide inputs with nrows < ncols #18757
Conversation
Hey @D-Roberts, thanks for submitting the PR.
CI supported jobs: [website, miscellaneous, centos-gpu, windows-gpu, unix-cpu, unix-gpu, windows-cpu, edge, sanity, clang, centos-cpu]
@mxnet-bot run ci [edge, clang]
Jenkins CI successfully triggered: [edge, clang]
@szha Here is the replacement PR. What is necessary to be done about the codecov/project failure?
@D-Roberts it's a bug and you can ignore it #18421 (comment). Could you elaborate on how this PR differs from the previous version? Do you mean the test was adapted? ("tests were re-verified for robustness") If not, do you know why the previous version failed the CI and your current version passes? Thank you
Hi @leezu I also made the following changes to the tests (as compared to the previous PR):
The error that led to the "stale PR" failing CI was due to a calculation in the numerical Jacobian used for the central-differences check. The code broke after updates to NumPy and MXNet NumPy when the source array is float32 but the dtype used in the view is float64.
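For illustration only (this is not the PR's test code, and `numeric_grad` is a hypothetical helper), a central-differences checker that avoids this class of bug by casting the input once up front, rather than taking a float64 view of a float32 buffer (a view reinterprets raw bytes instead of converting values):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # Hypothetical central-differences checker. Promote with an explicit
    # cast; x.view(np.float64) on a float32 array would reinterpret the
    # underlying bytes and yield garbage perturbations.
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp = x.copy(); xp[idx] += eps
        xm = x.copy(); xm[idx] -= eps
        grad[idx] = (f(xp) - f(xm)) / (2.0 * eps)
    return grad
```

For a float32 input and `f(v) = sum(v**2)`, this returns a float64 gradient close to `2*v` regardless of the source dtype.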
I'm comfortable merging this. @leezu?
Thank you @D-Roberts!
As titled. This is a resubmit of #18197 . In addition, tests were re-verified for robustness.
The obtained gradient has the same values for a given input as TensorFlow's. The implemented method is similar to the one used in TensorFlow.
Here are cross-checked examples:
At a high level the methodology is: partition the input A into two blocks X and Y, and split the matrix R (from the decomposition A = QR) into two blocks U and V, so that X = QU. Then obtain X_grad by applying the gradient derivation from the square-input case (m = n) with an adjusted Q_grad, and obtain Y_grad separately. A_grad is the concatenation of X_grad and Y_grad.
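The split-and-concatenate scheme above can be sketched in plain NumPy (a minimal single-matrix sketch following the TensorFlow-style formula; helper names are illustrative, not the PR's actual code):

```python
import numpy as np

def _tri_solve(x, r):
    # x @ inv(r).T for upper-triangular r, via a linear solve
    return np.linalg.solve(r, x.T).T

def qr_grad_square(q, r, dq, dr):
    # Gradient of A = QR for the square case (here applied to X = QU)
    qdq = q.T @ dq
    rdr = r @ dr.T
    # lower-triangular part of the antisymmetrized products
    tril = np.tril((qdq - qdq.T) + (rdr - rdr.T))
    grad_a = q @ (dr + _tri_solve(tril, r))
    grad_b = _tri_solve(dq - q @ qdq, r)
    return grad_a + grad_b

def qr_grad_wide(a, q, r, dq, dr):
    # Wide case m < n: split A = [X, Y] and R = [U, V], with X = Q U
    m = q.shape[0]
    y, u = a[:, m:], r[:, :m]
    du, dv = dr[:, :m], dr[:, m:]
    dy = q @ dv                                   # Y_grad, since Y = Q V
    dx = qr_grad_square(q, u, dq + y @ dv.T, du)  # square case, adjusted Q_grad
    return np.concatenate([dx, dy], axis=1)       # A_grad = [X_grad, Y_grad]
```

This sketch can be cross-checked the same way the PR's tests do: pick a scalar loss of (Q, R), feed its Q_grad and R_grad into `qr_grad_wide`, and compare against a central-differences estimate of the loss gradient with respect to A.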
Changes