
[REFACTOR][RPC][PROCOTOL-CHANGE] Modularize the RPC infra #5484

Merged: 4 commits merged into apache:master on May 5, 2020

Conversation

tqchen (Member) commented Apr 30, 2020

This PR refactors the RPC protocol to make it more modularized.

  • RPCSession: represents a set of features that need to be implemented.
  • RPCEndPoint: an endpoint that forwards RPCSession requests over a communication channel.
  • RPCModule: exposes an RPCSession as an RPC device in the TVM Runtime API.

In the new design, the local machine is presented as a special case of RPCSession.
The remote is just another client session that calls into RPCEndPoint.
The RPC communication path is as follows.

```
client -> ClientSession -> EndPoint[client@n0]
-> networking[between n0 <=> n1]
-> EndPoint[server@n1] -> LocalSession[@n1]
```

Because of the new modular design, we can now chain more sessions together.
For example, we can now run the following proxy setup (testcase in test_runtime_rpc.test_session_constructor).

```
client -> ClientSession -> Endpoint[client@n0]
-> networking[between n0 <=> n1]
-> Endpoint[server@n1] -> ClientSession -> Endpoint[client@n1]
-> networking[between n1 <=> n2]
-> Endpoint[server@n2] -> LocalSession[@n2]
```
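The layering above can be sketched with toy classes (illustrative only; these are not TVM's actual RPCSession/RPCEndPoint implementations). The point is that a proxy is just a server whose local session is itself another client session:

```python
class LocalSession:
    """Terminal session: actually executes the request."""
    def call(self, name, *args):
        funcs = {"add": lambda a, b: a + b}  # toy function table
        return funcs[name](*args)

class EndPoint:
    """Stands in for one communication hop: forwards to the peer's session."""
    def __init__(self, peer_session):
        self.peer = peer_session
    def call(self, name, *args):
        return self.peer.call(name, *args)

class ClientSession:
    """Client-facing session that issues requests through an endpoint."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
    def call(self, name, *args):
        return self.endpoint.call(name, *args)

# client -> ClientSession -> EndPoint -> LocalSession
single_hop = ClientSession(EndPoint(LocalSession()))

# Proxy: the server-side session at n1 is itself a ClientSession to n2.
proxy = ClientSession(EndPoint(ClientSession(EndPoint(LocalSession()))))

print(single_hop.call("add", 1, 2))  # 3
print(proxy.call("add", 1, 2))       # 3
```

Because every layer exposes the same session interface, chains of arbitrary depth compose without special cases.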

We can also implement other types of sessions. For example, we introduced a PopenSession that communicates with another process via a pipe, and we could make the uTVM session a special case of RPCSession and use the same mechanism for session management.

We also add more comments about the internals of the RPC.
The communication protocol is simplified using a convention similar to PackedFunc.
This allows us to further reduce the number of special remote syscalls.
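As a loose illustration of a PackedFunc-style convention, the toy encoder below packs a call's arguments as typed (code, value) records behind a length prefix. The type codes and layout here are invented for this sketch and are not TVM's actual wire format:

```python
import struct

INT, FLOAT = 0, 1  # hypothetical type codes, not TVM's

def encode_call(args):
    """Encode args as [packet_len][type code + value]*, little-endian."""
    body = b""
    for a in args:
        if isinstance(a, int):
            body += struct.pack("<iq", INT, a)    # int32 code + int64 value
        else:
            body += struct.pack("<id", FLOAT, a)  # int32 code + float64 value
    return struct.pack("<q", len(body)) + body

def decode_call(packet):
    """Inverse of encode_call: walk the typed records after the length prefix."""
    (length,) = struct.unpack_from("<q", packet, 0)
    offset, args = 8, []
    while offset < 8 + length:
        (code,) = struct.unpack_from("<i", packet, offset)
        offset += 4
        fmt = "<q" if code == INT else "<d"
        (value,) = struct.unpack_from(fmt, packet, offset)
        offset += 8
        args.append(value)
    return args

print(decode_call(encode_call([1, 2.5])))  # [1, 2.5]
```

A single uniform argument encoding like this is what lets many distinct remote operations share one code path instead of each needing its own syscall-style handler.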

Due to the major improvement and simplification, we are making a non-compatible update to the RPC protocol. This means that the client and server need to be upgraded together in order for it to function correctly.

This PR also introduces a versioning mechanism into the current RPC protocol,
so that future upgrades will produce more user-friendly error messages.


tqchen force-pushed the rpc branch 5 times, most recently from 50921a4 to c4f8342 (April 30, 2020 21:44)
tqchen (Member Author) commented May 2, 2020

Update: added a minrpc implementation and an example of using popen to create an RPC server.

tqchen force-pushed the rpc branch 3 times, most recently from fbd5b2b to 3bc9db5 (May 2, 2020 18:27)
```
lib.TVMGetLastError.restype = ctypes.c_char_p
# Put the libpath to LD_LIBRARY_PATH
# will be useful for pipe session to find libtvm
os.environ["LD_LIBRARY_PATH"] = "%s:%s" % (
```
Contributor:
Is this just to make minrpcserver work? If so, could we just add the search path in that binary rather than changing LD_LIBRARY_PATH for all library loads here? Or at least, can you revert the change when you're done?

I think this means that all hooks that invoke a subprocess inherit the modified LD_LIBRARY_PATH.

Member Author:

Given that it makes sense to make libtvm available to subprocesses, I think we can leave it as it is. While setting rpath for the binary would work if the binary is on the same machine, it may not work after we send it through an RPC, so it might be helpful to keep the env variable.

Contributor:

I'm worried that something else might be in the same directory as libtvm.so, which is where a subtle bug may come from. I don't think it's a good idea to muck with LD_LIBRARY_PATH unless you're being really explicit about it.

Member Author:

Note that the change to LD_LIBRARY_PATH won't affect execution after the Python script exits. I still think it makes sense to make the library path in the installation available to sub-processes.

Contributor:

Correct me if I'm wrong, but doesn't this mean that any Python script that imports tvm and then launches a subprocess will launch that subprocess with LD_LIBRARY_PATH set? We really don't want that. If the subprocess depends on any library that happens to be in the same directory as libtvm.so, it will wind up loading that instead of the system-installed copy. This is a really subtle bug that is very difficult to track down.
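One scoped alternative along the lines of this concern (a sketch, not what the PR does): copy the environment and pass it only to the specific child process, leaving the parent's os.environ untouched. The helper names here are hypothetical:

```python
import os
import subprocess

def env_with_libpath(libpath):
    """Return a copy of the current environment with libpath prepended
    to LD_LIBRARY_PATH. The parent process environment is not modified."""
    env = dict(os.environ)
    prev = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = libpath if not prev else libpath + ":" + prev
    return env

def popen_with_libpath(cmd, libpath):
    """Launch cmd with the modified path visible only to this child,
    so unrelated subprocesses never inherit it."""
    return subprocess.Popen(cmd, env=env_with_libpath(libpath))
```

This keeps libtvm discoverable for the pipe session's child while avoiding the global mutation that the review thread worries about.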

```
@@ -90,7 +90,8 @@ def get_target_triple():
 def cross_compiler(compile_func,
                    options=None,
                    output_format=None,
-                   get_target_triple=None):
+                   get_target_triple=None,
+                   add_files=None):
```
Contributor:

This is a little bit of a weird place (compiler configuration time) to add extra files (i.e. invocation configuration time). Why can't this be pushed to the invocation of fcompile?

Member Author:

Indeed it is an API tradeoff. The main advantage of the functor-decorator style is that it is composable.

For example, we could use the following to attach the minrpc server to the ndk compiler:

```
mod.export_library("xyz", tvm.rpc.with_minrpc(cc.ndk))
```

Another reason is that in the case of minrpc, we need to attach a special property (need_system_lib) to tell export_library to check for the --system-lib option.

We might need a bit more thought to make the alternative API (as follows) work:

```
mod.export_library("xyz", cc.ndk, libs=[tvm.rpc.minrpc()])
```

Finally, such a functor decorator is easier to work with in AutoTVM (we just pass the decorated version as build_func, instead of passing an additional argument).
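To make the pattern concrete, here is a hedged sketch of a functor decorator wrapping an fcompile-style callable; the names with_extra_ldflags and fake_cc are illustrative, not TVM's actual API (the real helper referenced above is tvm.rpc.with_minrpc):

```python
def with_extra_ldflags(fcompile, extra_ldflags):
    """Wrap an fcompile-style callable, implicitly appending linker flags."""
    def decorated(output, objects, options=None):
        options = list(options or []) + list(extra_ldflags)
        return fcompile(output, objects, options)
    # Attach a property the export step could inspect,
    # mirroring the need_system_lib idea from the discussion.
    decorated.need_system_lib = True
    return decorated

# Demo with a stand-in compiler that just records its invocation.
calls = []
def fake_cc(output, objects, options):
    calls.append((output, objects, options))

build = with_extra_ldflags(fake_cc, ["-lminrpc"])
build("out.so", ["a.o"], ["-O2"])
print(calls[0][2])          # ['-O2', '-lminrpc']
print(build.need_system_lib)  # True
```

Because the decorated object has the same call signature as the original fcompile, it can be passed anywhere an fcompile is expected (e.g. as AutoTVM's build_func) with no extra arguments.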

Member Author:

I am not very attached to the argument name, though. Perhaps changing it to something like default_libs or add_libs would alleviate the concern?

Contributor:

It seems like, based on how this is used now, the problem is that we are trying to make export_library build a binary in a few small cases (which will become increasingly common as µTVM is more integrated). I think it's reasonable for export_library to control the compiler invocation up to the point of generating a .o. Beyond that, ldflags and additional statically-linked dependencies (which will have their own cflags and compiler toolchains) seem like a bit much of an API abuse. Maybe this will be better addressed by the compilation and linking work I'm doing? If so, is there an intermediate hack we can do that avoids adding to this API (if you agree that's the right direction to go)?

tqchen (Member Author) commented May 4, 2020:

I think it is mainly a difference in API choices.

  • A0: pushing more flags to export_library.
  • A1: using a functional-programming-style decorator to decorate an existing compiler by adding ldflags implicitly; pass in fcompile variants, or use fcompile->fcompile functors to decorate new compilers.

My take is that A1 is as powerful as A0, and the complexity of compiler customization goes into the construction of fcompile. It also has the advantage of moving customization to a single place (fcompile), so that we don't have to worry about customizing the export_library step in AutoTVM and downstream toolchains that use export_library.

We can certainly de-couple the APIs in A1. Right now compilation and linking are coupled; a more powerful way to construct A1 might be as follows, and we can work toward that direction:

```
fcompile = attach_linker_opts(get_compiler())
```

Contributor:

Okay, I think de-coupling the APIs in A1 makes sense. If we do that, though, then only AutoTVM (and other user scripts and any µTVM veneers) need to care about linker flags and linker scripts, correct? We can remove that complexity from export_library.

Member Author:

I think your understanding is correct.

(Resolved review threads on python/tvm/error.py and python/tvm/rpc/client.py.)
```
        os.chmod(path_exec, stat.S_IXUSR)
        path_exec = os.path.abspath(path_exec)
    else:
        path_exec = os.path.abspath(binary)
```
Contributor:

Maybe we could at least do an existence check here, and check that it is an executable file, since otherwise I'm not sure what error the user might see from C? Also, is abspath necessary?

Member Author:

Good point about the existence check. abspath makes the execution more consistent, e.g. if we need to change the working directory in subsequent functions.
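A minimal sketch of the suggested check; check_server_binary is a hypothetical helper name, not the code in this PR:

```python
import os

def check_server_binary(path):
    """Fail early in Python with a clear error, rather than letting the
    user hit a cryptic C-level failure when the binary is exec'd."""
    path = os.path.abspath(path)  # stays valid even if we later chdir
    if not os.path.isfile(path):
        raise FileNotFoundError("server binary not found: " + path)
    if not os.access(path, os.X_OK):
        raise PermissionError("server binary is not executable: " + path)
    return path
```

Validating before handing the path to the C side turns a confusing runtime error into an immediate, actionable Python exception.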

Contributor:

I guess I'm wondering why we would need to do that, and whether it's more motivated by your RPC server test?

Member Author:

You are right that there is none at the moment; we could certainly skip this part.

(Resolved review thread on python/tvm/rpc/client.py.)
```
        The path to the min server library.
    """
    curr_dir = os.path.dirname(os.path.realpath(os.path.expanduser(__file__)))
    source_dir = os.path.abspath(os.path.join(curr_dir, "..", "..", ".."))
```
Contributor:

Is there a util function we can use for this?

Member Author:

This is a temporary hack. Ideally we should ship min_server to a location (e.g. build) that gets installed when we do setup.py install, instead of relying on the availability of the source. Or we just ship the source with the installation.

When we figure that out (possibly in the micro TVM work), we can have a consolidated library lookup as in libinfo.find_library().

(Resolved review thread on python/tvm/rpc/minrpc.py.)
```
    allow_clean_shutdown_ = true;

    this->Read(&packet_len);
    if (packet_len == 0) continue;
```
Contributor:

Is there a max we should also test against?

Member Author:

Right now there is no maximum length limit.
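A sketch of what such a bound could look like if one were added; the cap value and function name are purely illustrative, not part of this PR:

```python
# Hypothetical 64 MB cap on a single RPC packet; the actual limit, if
# ever added, would depend on the deployment (a microcontroller target
# would want something far smaller).
MAX_PACKET_LEN = 64 << 20

def validate_packet_len(packet_len):
    """Reject absurd lengths before allocating a receive buffer."""
    if packet_len < 0 or packet_len > MAX_PACKET_LEN:
        raise ValueError("invalid RPC packet length: %d" % packet_len)
    return packet_len
```

Checking the length before allocation guards against both corrupted streams and trivially cheap memory-exhaustion from a misbehaving peer.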


```
    char* data_ptr;
    int call_ecode = 0;
    if (ctx.device_type == kDLCPU) {
```
Contributor:

I think we don't need this check here anymore, right? We always want the else case? Should we add a CHECK?

Member Author:

The same code can be used to host a TVM runtime with GPU support.

```
      this->ReadArray(dptr, num_bytes);
    } else {
      char* temp_data = this->ArenaAlloc<char>(num_bytes);
      this->ReadArray(temp_data, num_bytes);
```
Contributor:

For the micro runtime, we will need to avoid this copy step, here and in CopyFromRemote. Is there a corresponding change we need to make to the TVMDevice API to facilitate that?

Member Author:

It only happens in the non-CPU case (which needs the copy).

areusch (Contributor) commented May 4, 2020:

So for micro, would we similarly override kDLMicroDev?

Member Author:

Hmm, for an on-device microTVM RPC server, I believe it is better to set the device code directly to kDLCPU (as it is a CPU from the point of view of the microcontroller).

If the code is kDLMicroDev, that means the current machine is a host machine that drives micro TVM, and the temp memory is on the host.

```
/*!
 * \brief IOHandler based on posix API.
 */
class PosixIOHandler {
```
Contributor:

Will there also be a flush? It would be helpful in implementing simpler transports, e.g. ethernet, USB, etc.

Member Author:

Good point; perhaps we can add that as a follow-up.
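A sketch of the flush idea being suggested: a handler that buffers writes and, on flush, hands the accumulated bytes to a frame-oriented transport callback. The class and names are illustrative, not the PosixIOHandler API:

```python
class BufferedIOHandler:
    """Buffer writes; on flush, emit them as one frame via a transport
    callback. Packet-oriented transports (ethernet, USB) can map flush
    to 'send one frame'."""
    def __init__(self, send_frame):
        self._buf = bytearray()
        self._send_frame = send_frame  # callback that transmits one frame
    def write(self, data):
        self._buf += data
        return len(data)
    def flush(self):
        if self._buf:
            self._send_frame(bytes(self._buf))
            self._buf.clear()

# Demo: collect emitted frames in a list instead of a real transport.
frames = []
handler = BufferedIOHandler(frames.append)
handler.write(b"hello ")
handler.write(b"world")
handler.flush()
print(frames)  # [b'hello world']
```

With an explicit flush, the RPC layer controls frame boundaries, so a transport never has to guess where a logical message ends.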

```
@@ -2,9 +2,10 @@
 __pycache__/
 *.py[cod]
 *$py.class

+*.S
```
Contributor:

Why this change?

Member Author:

This ignores assembly files generated by commands.

Contributor:

Ah, OK. I guess this means we can't check in assembly without removing it, but we can do that later too.

areusch (Contributor) commented May 4, 2020

I think this looks good to me now!

areusch (Contributor) left a comment:
actually marking LGTM


tmoreau89 (Contributor) left a comment:
LGTM

tqchen merged commit 95e06b3 into apache:master on May 5, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 9, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 18, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 18, 2020