-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CodeGen] Generate blob use LLVM directly #4657
Conversation
92b9d98
to
4f1772c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Halfway done. Will spend some time to review the llvm part. The major concern I have so far is that I think we probably don't need to have the setter and getter in Target
as we only use each of them once and we know where we need it.
@tqchen @zhiics About the LLVM target part, I also think current way maybe is not good. I also think of it and cost my some time. So I want to discuss it too. Firstly Let me describe why we need llvm target host. When we use PackImportToLLVM, we will need to use LLVM to create LLVM module. However, we must need one target machine, this is answered to @tqchen 's question. We can not create one generic target. Because we have platform's specific feature to handle. See:https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L107 and https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L167. Another thing is if we leave the target is empty, i.e. no module->setTargetTriple(target_triple.str()); On llvm 6.0 of Mac will report problem : assert error Target-incompatible DataLayout. More import thing is we need to consider the target host is not x86 cpu. For example, target host is llvm -target aarch64, if we leave it empty and build it on x86 server, we will generate devc.o into x86 format, but in fact we want aarch64 of devc.o. So in the codegen_blob part, we should know the correct llvm target host to generate correct file. Current way is a little ugly I think too. Current way will create target host based on LLVM when we have detected target host finally (because tvm.build / relay.build's api could leave |
If we indeed needs to pass target triple, perhaps the best way to do so is to pass that as an argument of PackImportToLLVM |
@tqchen So, your opinion is let the users specify. The API will become this:
If users doesn't set if users want to compile into another os / another cpu, they have to do this: Does it make sense to you? |
Hmm, i meant to have someway to automatically discover the triple and pass around, let us think a bit more about it. For example, if there is already an LLVM module in the modules, then we can get that from that module. |
Some ideas about automatic detection, in the order of things that can be tried
|
Good idea. One quick question, how about Windows's compiler |
if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c |
Yes. We could assume the target is Windows for sure. However, i think besides we need to know win32 / win64, one more thing I am a little worried. Windows could run ARM CPU ( |
Ah...One tricky way I just see on the YoutuBe...
could solve most of our cases. How about your opinion? @tqchen |
sounds good @FrozenGene i agree with all your points |
8fed25b
to
c24fddf
Compare
dc08bfa
to
0da6116
Compare
0da6116
to
f354a00
Compare
Thanks @FrozenGene @tqchen |
This is one prior work of Module based Model Runtime Interface RFC. In this RFC, we want to serialize params (weight) into shared library directly.
However, previous way of serializing the blob is to use
PackImportToC
, i.e. we will serialize the blob into char array__tvm_dev_blob
, which stored the serialized result (0x...0x...), then we write this intodev.cc
and pass to the compiler compiling. If we don't serialize params, it is ok. However, when we serialize params, the__tvm_dev_blob
anddev.cc
will become very large (resnet18 workload ofdev.cc
will be 380M, without serializing weight is 2.6M). The compiling time will increase very much (Even 10x slower).According to the investigation, the bottleneck is the parsing time (see GCC time report).
So we decide to avoid the C++ compiler (i.e. avoid the parsing time) and generate LLVM module directly. After testing, even we we serialize the params into shared library, the compiling time of generating blob is very fast. The compiling time of resnet18 on one cloud machine I tested before could boost from 128.46s to 13.28s. see previous test
After this work, I find it also have some a little improvement of compiling time compared previous way testing on my local machine (Intel i7-7700 CPU @ 3.60GHz).
Workload is restnet18 on CUDA. Run 10 times and get the average
@tqchen @zhiics @yzhliu @icemelon9