-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Fix amalgamation failure. #15303
Fix amalgamation failure. #15303
Conversation
Solution to the problem: |
@Roshrini @pengzhao-intel Please review this change as soon as possible. This is an important fix. |
@ZhennanQin @TaoLv to help review this change :) |
Can you elaborate more? What's the mxnet customized nnvm version?
Also, do you think we need unit test to cover this issue? |
@TaoLv I think amalgamation does need a lot of unit testing. Because it is basically broken right now. We did a lot of bash script to modify the codes for building on each platform. There is a So I have to include In the future, I think there should be only one nnvm dependency in 3rd party directory. This is the ultimate solution. My solution is temporary that works. |
To reproduce the error. Simply load a model and perform forward prediction on Mac with amalgamation static/shared build. Basically, it doesn't work without my fixes and I don't think there is any test cases to cover amalgamation predictions on mobile and desktops. @TaoLv |
@TaoLv @pengzhao-intel Check out our work at BrainCoTech/mxnet-prebuild. It's still at early stage but this allows us to run a mxnet model in the same code on multiple platforms. We are working on automated Windows desktop builds now. |
Just a side question, have you ever tried MXNet MKL-DNN build in your project? @adis300 |
@TaoLv We just tried a general mxnet build with make and most works are on amalgamation without MKL-DNN. Because we want to run our python generated models on all desktops/mobile platforms with the same C code. We are working on Windows amalgamation builds now and will try MKL-DNN very soon. |
@eric-haibin-lin Can you please take a look at this PR. It is an important bug fix and has been there for half a month. The current master branch basically doesn't run c_predict_api.h forward pass. |
@eric-haibin-lin @TaoLv Could you please review this PR? Thanks! |
@TaoLv Any updates? |
As I said here: #15303 (comment), the changes look good me. But I still think you need get approval from more experienced committer. @szha @eric-haibin-lin @wkcn Do you have any idea? PS: Could you please rebase the code? I can see the last run of CI is more than one month ago. I can approve if it passes. |
@TaoLv @eric-haibin-lin @szha All checks passed after rebase onto latest master. |
@TaoLv It's been a month since this branch is rebased. I am pretty sure this feature branch fixes amalgamation failure and there is currently not test case that covers this part. Please merge it so that I don't have to constantly rebasing this feature to use in our product. We have to do custom builds for for multiple platforms whenever we want to update mxnet, which is very painful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If we intend to keep this feature (matters not to me either way), we should at least consider having a test that runs a simple mnist train or inference (if it doesn't already exist).
I think we're still running Amalagamation tests: https://github.com/apache/incubator-mxnet/blob/master/tests/nightly/Jenkinsfile#L76 |
Hey @TaoLv since we have an approval, could you assist the author to bring this PR to completion and assist them with CI and other questions? |
Thanks @marcoabreu . @TaoLv I will try to rebase this branch and test in the next few days. Since it is very behind now. Hope the merge request can be handled responsively after rebasing because rebasing. I have rebased it a few times already and it takes some decent amount of effort. |
… the original tvm/nnvm.
@marcoabreu @TaoLv I have just rebased the feature onto the latest master branch and resolved related conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your contribution!
Referring to issue #14808
amalgamation predict api fails when loading a network. The error messages are attached.
However, changing MXPlanMemory to PlanMemory would not resolve the issue. Because amalgamation uses tvm/nnvm. and mxnet core engine uses its own customized nnvm version.
Simply using tvm/nnvm/plan_memory would cause Check failure.
Therefore I added mxnet/nnvm to the C predict API amalgamation list.
And refactored some names to avoid redeclaration.