[REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra #5484
Conversation
Force-pushed 50921a4 to c4f8342 (compare)
Update: added a minrpc implementation and an example of using popen to create an RPC.
Force-pushed fbd5b2b to 3bc9db5 (compare)
This PR refactors the RPC protocol to make it more modularized.

- RPCSession: represents the set of features that need to be implemented.
- RPCEndPoint: endpoint that forwards RPCSession requests over a communication channel.
- RPCModule: exposes an RPCSession as an RPC device in the TVM Runtime API.

In the new design, the local machine is presented as a special case of RPCSession. The remote is just another client session that calls into RPCEndPoint. The RPC communication path is as follows.

```
client -> ClientSession -> EndPoint[client@n0] -> networking[between n0 <=> n1] -> EndPoint[server@n1] -> LocalSession[@n1]
```

Because of the new modular design, we can now chain more sessions together. For example, we can now run the following proxy setup (testcase in test_runtime_rpc.test_session_constructor).

```
client -> ClientSession -> Endpoint[client@n0] -> networking[between n0 <=> n1] -> Endpoint[server@n1] -> ClientSession -> Endpoint[client@n1] -> networking[between n1 <=> n2] -> Endpoint[server@n2] -> LocalSession[@n2]
```

We can also implement other types of sessions. As an example, we introduce a PopenSession that communicates with another process via a pipe. We could also make a uTVM session a special case of RPCSession and use the same mechanism for session management.

We also add more comments about the internals of the RPC. The communication protocol is simplified using a convention similar to PackedFunc, which allows us to further reduce the number of special remote syscalls.

Due to the major improvement and simplification, we are making a non-compatible update to the RPC protocol: the client and server need to be upgraded together for it to function correctly. This PR also introduces a versioning mechanism into the RPC protocol, so that future upgrades will produce more user-friendly error messages.
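The chained-session flow above can be sketched with a toy model. The classes below are illustrative stand-ins for the real RPCSession/EndPoint/LocalSession types, not TVM's actual API; the "channel" is a direct in-process reference rather than a socket.

```python
# Toy model of the modular RPC design: a ClientSession talks to an
# EndPoint, which forwards to whatever session sits behind it. Chaining
# a proxy is just nesting another ClientSession/EndPoint pair.

class LocalSession:
    """Terminal session that actually executes the call."""
    def call(self, fname, *args):
        table = {"add": lambda a, b: a + b}
        return table[fname](*args)

class EndPoint:
    """Forwards requests over a 'channel' (here: a direct reference)."""
    def __init__(self, inner):
        self.inner = inner
    def call(self, fname, *args):
        # The real design serializes the request and sends it over a
        # socket/pipe; the toy just forwards in-process.
        return self.inner.call(fname, *args)

class ClientSession:
    """Client-side session that issues requests to an EndPoint."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
    def call(self, fname, *args):
        return self.endpoint.call(fname, *args)

# Single hop: client -> ClientSession -> EndPoint -> LocalSession
session = ClientSession(EndPoint(LocalSession()))
print(session.call("add", 1, 2))  # 3

# Proxy setup: chain another ClientSession/EndPoint pair in the middle.
proxied = ClientSession(EndPoint(ClientSession(EndPoint(LocalSession()))))
print(proxied.call("add", 40, 2))  # 42
```

The point of the sketch is that a proxy hop needs no special casing: the server-side endpoint simply hands requests to another ClientSession instead of a LocalSession.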
python/tvm/_ffi/base.py
Outdated
```python
lib.TVMGetLastError.restype = ctypes.c_char_p
# Put the libpath to LD_LIBRARY_PATH
# will be useful for pipe session to find libtvm
os.environ["LD_LIBRARY_PATH"] = "%s:%s" % (
```
is this just to make minrpcserver work? if so, could we just add in the search path in that binary rather than changing LD_LIBRARY_PATH for all library loads here? or at least, can you revert the change when you're done?
i think this means that all hooks that invoke a subprocess inherit the modified LD_LIBRARY_PATH
Given that it makes sense to make libtvm available to subprocesses, I think we can leave it as it is. While setting rpath for the binary would work if the binary is on the same machine, it may not work after we send it through an RPC, so it might be helpful to keep the env variable.
i'm worried that something else might be in the same directory as libtvm.so, which is where the subtle bug may come from. I don't think it's a good idea to muck with LD_LIBRARY_PATH unless you're being really explicit about this.
Note that the change to LD_LIBRARY_PATH won't affect execution after the python script exits. I still think it makes sense to make the library path of the installation available to sub-processes
correct me if i'm wrong, doesn't this mean that any python script that imports tvm and then launches a subprocess will launch a subprocess with LD_LIBRARY_PATH set? we really don't want that. if the subprocess depends on any library that happens to be in the same dir as libtvm.so, it will wind up loading that instead of the system-installed copy. this is a really subtle bug that is very difficult to track down.
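The behavior under discussion can be made concrete with a small sketch. The helper below is illustrative (not TVM code) and shows the safer variant: prepending to any existing LD_LIBRARY_PATH rather than replacing it, while still exhibiting the inheritance the reviewer worries about, since any subprocess spawned afterwards sees the modified variable.

```python
import os

def prepend_ld_library_path(lib_dir):
    """Prepend lib_dir to LD_LIBRARY_PATH without dropping existing entries.

    Illustrative helper, not TVM's implementation. Subprocesses spawned
    after this call inherit the modified variable, which is exactly the
    side effect debated above.
    """
    old = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [lib_dir] + ([old] if old else [])
    os.environ["LD_LIBRARY_PATH"] = ":".join(parts)
    return os.environ["LD_LIBRARY_PATH"]
```

Because the dynamic loader searches LD_LIBRARY_PATH entries before the default system paths, any library that happens to sit next to libtvm.so will shadow a system-installed copy in every child process, which is the subtle-bug scenario described above.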
```diff
@@ -90,7 +90,8 @@ def get_target_triple():
 def cross_compiler(compile_func,
                    options=None,
                    output_format=None,
-                   get_target_triple=None):
+                   get_target_triple=None,
+                   add_files=None):
```
this is a little bit of a weird place (compiler-configuration time) to add extra files (an invocation-configuration concern). why can't this be pushed to the invocation of fcompile?
Indeed it is an API tradeoff. The main advantage of the functor-decorator style is that it is composable.
For example, we could use the following to attach minserver to the ndk compiler.
mod.export_library("xyz", tvm.rpc.with_minrpc(cc.ndk))
Another reason is that in the case of minrpc, we need to attach a special property (need_system_lib) to tell export_library to check for the --system-lib option.
We might need a bit more thought to make the alternative API (as follows) work.
mod.export_library("xyz", cc.ndk, libs=[tvm.rpc.minrpc()])
Finally, such a functor decorator is easier to work with in AutoTVM (we just have to pass the decorated version as build_func, instead of passing an additional argument).
I am not very attached to the argument name though. Perhaps changing it to something like default_libs or add_libs would alleviate the concern?
it seems like based on how this is used now, the problem is that we are trying to make export_library build a binary in a few small cases (but, which will become increasingly common as µTVM is more integrated). i think it's reasonable for export_library to control the compiler invocation up to the point of generating a .o. beyond that ldflags and additional statically-linked dependencies (which will have their own cflags and compiler toolchains) seem like a bit much of an API abuse. maybe this will be better addressed by the compilation and linking work i'm doing? if so, is there an intermediate hack we can do that avoids adding to this API (if you agree that's the right direction to go)?
I think it is mainly a difference in API choices.
- A0: pushing more flags to export_library
- A1: using a functional-programming-style decorator to decorate an existing compiler by adding ldflags implicitly; pass in fcompile variants, or use fcompile->fcompile functors to decorate new compilers.
My take is that A1 is as powerful as A0, and the complexity of compiler customization goes into the construction of fcompile. It also has the advantage of moving customization to a single place (fcompile), so that we don't have to worry about customizing the export_library step in AutoTVM and downstream toolchains that use export_library.
We can certainly de-couple the APIs in A1. Right now compilation and linking are coupled; a more powerful way to construct A1 might be as follows, and we can work toward that direction.
fcompile = attach_linker_opts(get_compiler())
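The fcompile->fcompile functor style under discussion can be sketched as follows. Everything here is a toy: attach_linker_opts, fake_cc, and the fcompile signature are illustrative stand-ins, not TVM's actual API; only the pattern (wrap a compile function, add flags implicitly, attach a marker property the export step can inspect) mirrors the thread.

```python
def attach_linker_opts(fcompile, extra_ldflags=("--system-lib",)):
    """Toy fcompile->fcompile decorator (illustrative, not TVM's API).

    Wraps an existing compile function so extra linker flags are added
    implicitly, and attaches a marker property that a hypothetical
    export step could check for.
    """
    def wrapped(output, objects, options=None):
        options = list(options or []) + list(extra_ldflags)
        return fcompile(output, objects, options)
    # Marker the export step can inspect (mirrors need_system_lib above).
    wrapped.need_system_lib = True
    return wrapped

def fake_cc(output, objects, options=None):
    """Stand-in compiler: records what it was asked to do."""
    return {"output": output, "objects": objects, "options": options}

fcompile = attach_linker_opts(fake_cc)
result = fcompile("xyz.so", ["a.o"], ["-shared"])
print(result["options"])       # ['-shared', '--system-lib']
print(fcompile.need_system_lib)  # True
```

The composability argument is that the decorated function has the same shape as the original, so any caller that accepts an fcompile (e.g. a build_func parameter) accepts the decorated one unchanged.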
okay, I think de-coupling the APIs in A1 makes sense. if we do that though, then only autotvm (and other user scripts and any µTVM veneers) need to care about linker flags and linker scripts, correct? we can remove that complexity from export_library.
I think your understanding is correct
```python
        os.chmod(path_exec, stat.S_IXUSR)
        path_exec = os.path.abspath(path_exec)
    else:
        path_exec = os.path.abspath(binary)
```
maybe we could at least do an existence check here, and check that it is an executable file, since otherwise i'm not sure what error the user might see from C? also, is abspath necessary?
good point about the existence check. abspath makes the execution more consistent, e.g. if we need to chdir in subsequent functions
I guess i'm wondering why we would need to do that, and wondering if it's more motivated by your rpc server test?
You are right that there is none at the moment; we could certainly skip this part.
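The existence/executable check suggested above could look like the sketch below. The helper name and error messages are illustrative, not TVM code; the point is to fail with a clear Python error before the path is handed to the C side.

```python
import os

def check_executable(path):
    """Validate a server binary before handing it to Popen.

    Illustrative sketch of the check discussed above: raising here gives
    a clearer error than whatever the C side would report for a missing
    or non-executable binary. Returns the absolute path, matching the
    abspath normalization in the snippet above.
    """
    path = os.path.abspath(path)
    if not os.path.isfile(path):
        raise RuntimeError("binary %s does not exist" % path)
    if not os.access(path, os.X_OK):
        raise RuntimeError("binary %s is not executable" % path)
    return path
```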
```python
        The path to the min server library.
    """
    curr_dir = os.path.dirname(os.path.realpath(os.path.expanduser(__file__)))
    source_dir = os.path.abspath(os.path.join(curr_dir, "..", "..", ".."))
```
is there a util function we can use for this?
This is a temporary hack. Ideally we should ship min_server to a location (e.g. build) that gets installed when we do setup.py install, instead of relying on the availability of the source. Or we just ship the source with the installation.
When we figure that out (possibly as part of the µTVM linking work), we can have a consolidated library as in libinfo.find_library()
```cpp
allow_clean_shutdown_ = true;

this->Read(&packet_len);
if (packet_len == 0) continue;
```
is there a max we should also test against?
right now there is no maximum length limit
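The bound check the reviewer asks about can be sketched in Python against a byte stream. This is a hedged illustration, not the TVM endpoint code: the 8-byte little-endian length prefix and the 1 MiB cap are assumptions made for the example, not protocol constants.

```python
import struct

MAX_PACKET_LEN = 1 << 20  # illustrative 1 MiB cap, not a TVM constant

def read_packet(stream):
    """Read one length-prefixed packet, rejecting oversized lengths.

    Sketch of the maximum-length check discussed above; the real
    endpoint loop currently accepts any length. Assumes a uint64
    little-endian length prefix for illustration.
    """
    header = stream.read(8)
    if len(header) < 8:
        return None  # stream closed: clean shutdown
    (packet_len,) = struct.unpack("<Q", header)
    if packet_len == 0:
        return b""  # empty packet; caller may continue the loop
    if packet_len > MAX_PACKET_LEN:
        raise ValueError("packet length %d exceeds limit" % packet_len)
    return stream.read(packet_len)
```

Without such a cap, a corrupted or malicious length prefix can make the server attempt an arbitrarily large allocation, which is the usual motivation for bounding length-prefixed reads.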
```cpp
char* data_ptr;
int call_ecode = 0;
if (ctx.device_type == kDLCPU) {
```
I think here we don't need this check anymore, right? we always want the else case? should we add a CHECK?
The same code can be used to host a TVM runtime with GPU support
```cpp
  this->ReadArray(dptr, num_bytes);
} else {
  char* temp_data = this->ArenaAlloc<char>(num_bytes);
  this->ReadArray(temp_data, num_bytes);
```
for the micro runtime, we will need to avoid this copy step, here and in CopyFromRemote. is there a corresponding change we need to make to TVMDevice API to facilitate that?
It only happens in the non-CPU case (which needs the copy).
so for micro would we similarly override kDLMicroDev?
hmm, for an on-device microTVM rpc server, I believe it is better to set the device code directly to kDLCPU (as it is CPU from the point of view of the microcontroller).
If the code is kDLMicroDev, that means the current machine is a host machine that drives the micro tvm device, and the temp memory is on the host
```cpp
/*!
 * \brief IOHandler based on posix API.
 */
class PosixIOHandler {
```
will there also be a flush? it would be helpful in implementing simpler e.g. ethernet, usb, etc transports.
Good point, perhaps we can add that as a followup.
```diff
@@ -2,9 +2,10 @@
 __pycache__/
 *.py[cod]
 *$py.class

+*.S
```
why this change?
this is to ignore assembly files generated by commands.
ah ok. I guess this means we can't check in assembly without removing it, but we can do that later too.
I think this looks good to me now!
actually marking LGTM
LGTM
* Update dmlc-core which was mistakenly overridden
* [REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra (full description above)
* Address review comments
* Remove ld library path