[v1.7.x] backport mixed type binary ops to v1.7.x #18649
Conversation
Hey @BenjaminCHEN2016, thanks for submitting the PR
CI supported jobs: [centos-cpu, website, windows-cpu, unix-gpu, windows-gpu, edge, miscellaneous, unix-cpu, centos-gpu, clang, sanity] Note:
306fcad to 454d401
Thank you @BenjaminCHEN2016 for backporting these fixes. If I understand correctly, this is the only remaining PR that fixes the numpy operators and targets the 1.7 release, am I right? @sxjscience
6484b2a to 02d4fbf
@BenjaminCHEN2016 @sxjscience there's a build error on the Windows platform, as below. Please help take a look. It would be great if it could be solved within 24h. Thanks a lot!
@ciyongch I will try to resolve this issue today.
I think that might be a compiler issue? I tried on my Windows machine and it compiles there. @ciyongch @sxjscience
Hi @BenjaminCHEN2016, it looks more like a compilation error, probably caused by the current complex expressions or something similar running out of memory, rather than a problem with the compiler itself.
@leezu Is the CI different between the master and v1.7.x branches? It seems that the Windows CI on v1.7.x is an older version?
Hi @ciyongch, the issue is related to the compiler version. If you are using Visual Studio, could you try adding
Thanks for the information @wkcn. According to @BenjaminCHEN2016, the failure only happens in the MXNet Windows CI pipeline, not in his local environment. My concern is: if this is the case and the PR still cannot pass the CI, do we still need to include it in v1.7.0? @sandeep-krishnamurthy @szha @sxjscience
Yes, we still need the change. It sounds like we may have missed some Windows CI changes to backport. cc @ChaiBapchya to help clarify the issues with the Windows CI
Windows CI issues in the master branch in late March were fixed by PRs:
Infrastructure-wise, all branches run on the same infra (same AMI, same instance types, etc.). @leezu any thoughts on backporting your Windows CI specific fixes (the ones targeted at master in the above-mentioned PRs)?
84cd127 to 9060226
9060226 to 398b4c5
Update Windows CI to use VS 2019 and enable the x64 toolchain. Previously we were using an older 32-bit toolchain, causing OOM errors during linking. Switching to the x64 toolchain on the older VS version previously used by the CI was attempted in apache#17912 and did not work.
Update to CUDA 10.2, as it is required by VS 2019.
Switch to ninja-build on Windows to speed up the build, as ninja-build is now preinstalled.
Remove the logic to install cmake 3.16 on every PR, as cmake 3.17 is now preinstalled.
Add build retries due to CUDA thrust + VS 2019 flakiness.
Co-authored-by: vexilligera <[email protected]>
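The commit message above mentions retrying a flaky build. A minimal sketch of that idea follows. It is not the code from build_windows.py: the function name `run_with_retries`, its parameters, and the stand-in build command are all assumptions made for illustration. The sketch reruns a command until it exits with status 0 and raises an error once every attempt has failed, so a calling job would be marked as failed rather than silently passing.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_seconds=0.0):
    """Run `cmd` until it exits with status 0 or the attempts run out.

    Hypothetical helper, not taken from the MXNet build scripts.
    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{max_attempts} failed "
            f"(exit code {result.returncode})",
            file=sys.stderr,
        )
        # Only wait if another attempt is coming.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")


if __name__ == "__main__":
    # Stand-in for a real build command; it always succeeds on the first try.
    fake_build = [sys.executable, "-c", "print('build ok')"]
    print("succeeded on attempt", run_with_retries(fake_build))
```

Raising after the last attempt, instead of returning a status flag, keeps a caller from ignoring a build that never succeeded.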
398b4c5 to 22de015
@ChaiBapchya Thanks!
@mxnet-bot run ci [unix-cpu]
Jenkins CI successfully triggered : [unix-cpu]
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered : [unix-gpu]
Thanks a lot @BenjaminCHEN2016 for your effort to make this patch pass all the CI tests. We now have everything we need for the 1.7 release.
I think it was just missed.
A person might reasonably assume that v1.8 == v1.7 plus select commits from the 1.x branch. Will this PR be added to v1.x and v1.8, or will v1.8 and other future 1.x releases be missing this functionality? I'm basically trying to cherry-pick commits on top of my "v1.7-ish" repo to get to v1.8. Do you think I should revert this PR commit on my repo to minimize future integration conflicts?
If I remember correctly, this PR contains only the minimal bug fixes required for v1.7, rather than the full set of features and bug fixes from the master branch. If this is the case, it might be better to pull the full version of the fix into v1.x as well as v1.8.x. Ping @BenjaminCHEN2016 to help confirm.
* Fix Windows GPU CI (apache#17962)
  Update Windows CI to use VS 2019 and enable the x64 toolchain. Previously we were using an older 32-bit toolchain, causing OOM errors during linking. Switching to the x64 toolchain on the older VS version previously used by the CI was attempted in apache#17912 and did not work.
  Update to CUDA 10.2, as it is required by VS 2019.
  Switch to ninja-build on Windows to speed up the build, as ninja-build is now preinstalled.
  Remove the logic to install cmake 3.16 on every PR, as cmake 3.17 is now preinstalled.
  Add build retries due to CUDA thrust + VS 2019 flakiness.
  Co-authored-by: vexilligera <[email protected]>
* backport mixed type
Co-authored-by: Leonard Lausen <[email protected]>
Co-authored-by: vexilligera <[email protected]>
* Fix einsum gradient (#18482)
* [v1.7.x] Backport PRs of numpy features (#18653)
  * add zero grad for npi_unique (#18080)
  * fix np.clip scalar input case (#17788)
  * fix true_divide (#18393)
  Co-authored-by: Hao Jin <[email protected]>
  Co-authored-by: Xi Wang <[email protected]>
* [v1.7.x] backport mixed type binary ops to v1.7.x (#18649)
  * Fix Windows GPU CI (#17962)
    Update Windows CI to use VS 2019 and enable the x64 toolchain. Previously we were using an older 32-bit toolchain, causing OOM errors during linking. Switching to the x64 toolchain on the older VS version previously used by the CI was attempted in #17912 and did not work.
    Update to CUDA 10.2, as it is required by VS 2019.
    Switch to ninja-build on Windows to speed up the build, as ninja-build is now preinstalled.
    Remove the logic to install cmake 3.16 on every PR, as cmake 3.17 is now preinstalled.
    Add build retries due to CUDA thrust + VS 2019 flakiness.
    Co-authored-by: vexilligera <[email protected]>
  * backport mixed type
  Co-authored-by: Leonard Lausen <[email protected]>
  Co-authored-by: vexilligera <[email protected]>
* revise activations (#18700)
* [v1.6] Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703)
  * Fix the monitor_callback invalid issue during calibration with variable input shapes
  * retrigger CI
  * Add UT for monitor check and disable codecov
  Co-authored-by: Tao Lv <[email protected]>
* Fail build_windows.py if all retries failed (#18177)
* Update to thrust 1.9.8 on Windows (#18218)
  * Update to thrust 1.9.8 on Windows
  * Remove debug logic
* Re-enable build retries on MSVC (#18230)
  Updating thrust alone did not help. Similar issues (though less often) still occur with the updated thrust, and also with NVIDIA cub.
  Tracked upstream at NVIDIA/thrust#1090
Co-authored-by: Ke Han <[email protected]>
Co-authored-by: Xingjian Shi <[email protected]>
Co-authored-by: Hao Jin <[email protected]>
Co-authored-by: Xi Wang <[email protected]>
Co-authored-by: Yijun Chen <[email protected]>
Co-authored-by: vexilligera <[email protected]>
Co-authored-by: ciyong <[email protected]>
Co-authored-by: Tao Lv <[email protected]>
Description
Backport mixed type binary ops to the v1.7.x branch (mentioned in #18641, #18648, #18653)
Mainly the code for #18250 and #18523
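For readers unfamiliar with the term, a mixed type binary op is a binary operation whose two operands have different data types, so the result type has to be chosen from both. The sketch below is only a loose analogy using Python's built-in scalar types. It does not use MXNet, and the dtype rules of the backported operators are not described on this page and are not modeled here. The helper name `promoted_type` is hypothetical.

```python
def promoted_type(lhs, rhs):
    """Return the type Python picks for `lhs + rhs`.

    Hypothetical helper for illustration; works on built-in scalars only.
    """
    return type(lhs + rhs)


if __name__ == "__main__":
    # Each pair mixes two different built-in numeric types.
    pairs = [(True, 1), (1, 2.0), (2.0, 1j)]
    for lhs, rhs in pairs:
        print(
            type(lhs).__name__,
            "+",
            type(rhs).__name__,
            "->",
            promoted_type(lhs, rhs).__name__,
        )
```

In each pair the result takes the wider of the two operand types, which is the general idea behind type promotion in array libraries as well.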