[Runtime] EdgeTPU runtime for Coral Boards #4698
Conversation
python/tvm/contrib/tflite_runtime.py
Outdated
```diff
@@ -18,7 +18,7 @@
 from .._ffi.function import get_global_func
 from ..rpc import base as rpc_base


-def create(tflite_model_bytes, ctx):
+def create(tflite_model_bytes, ctx, target_edgetpu=False):
```
Instead of a boolean argument, try to use a target string for future expansion: `target='edge_tpu'` / `'cpu'`.
Thanks for the suggestion, I've made the changes.
python/tvm/contrib/tflite_runtime.py
Outdated
return TFLiteModule(fcreate(bytearray(tflite_model_bytes), ctx)) | ||
fcreate = get_global_func("tvm.tflite_runtime.create") | ||
if target_edgetpu: | ||
fcreate = get_global_func("tvm.edgetpu_runtime.create") |
If these two create functions share the same arguments, we can unify them into one create function that returns a different runtime depending on the target.
Unification here is less desirable because we won't always want to build the edgeTPU runtime when building the TFLite runtime. The limitation is that we'd need to build TVM with the edgeTPU library, which comes in a separate repo; it's an extra software dependency that users of vanilla TFLite don't always want.
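For context, a minimal sketch of how the string-valued target can dispatch between the two separately registered create functions after this change (this sketches the body of tflite_runtime.py, so it reuses that module's relative import and its `TFLiteModule` wrapper class; the exact merged parameter name may differ, e.g. `runtime_target`):

```python
from .._ffi.function import get_global_func


def create(tflite_model_bytes, ctx, target="cpu"):
    """Create a TFLite runtime module for the given context.

    The edgeTPU create function is only registered when TVM was built
    against the edgeTPU library, so users of vanilla TFLite never need
    that extra dependency.
    """
    if target == "edge_tpu":
        fcreate = get_global_func("tvm.edgetpu_runtime.create")
    else:
        fcreate = get_global_func("tvm.tflite_runtime.create")
    return TFLiteModule(fcreate(bytearray(tflite_model_bytes), ctx))
```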
```cpp
    ctx_ = ctx;
  }
  // Build interpreter
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter_) != kTfLiteOk) {
```
We can define a macro for TFLite error checking: `CHECK_STATUS(cond, msg)`.
Thanks for the suggestion, this should be fixed now.
For allocate, has the tflite runtime removed the allocate() method?
@ZihengJiang thanks for the feedback! The TFLite Interpreter still has AllocateTensors(), but the TVM runtime module no longer exposes a separate allocate() method; allocation is now performed during initialization.
Force-pushed from 29bb50b to bbcf752
@ZihengJiang I should have addressed all of your comments by now; let me know if you're happy with the changes.
Looks good! Thanks! @tmoreau89
@tmoreau89 And on the host machine, it says … The inference directly on the edge TPU works fine.
This PR extends the TFLite runtime to support edgeTPU-equipped Coral boards in order to measure inference time of models on edgeTPU with TVM RPC.
Instructions to run the EdgeTPU runtime experiments
Coral Board setup
You'll need to follow these instructions: https://coral.ai/docs/dev-board/get-started/
Cross-compile the TFLite static library on an x86 machine
Build the TVM runtime on the Coral Board
Execute the RPC server on the Coral
First, follow this guide to set up a tracker for your remote devices: https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#start-rpc-tracker.
On the Coral, once the TVM runtime has been built, start the RPC server and register it with the tracker, e.g. `python -m tvm.exec.rpc_server --tracker=<tracker_host>:<tracker_port> --key=<device_key>` (the host, port, and key are placeholders; use whatever key your scripts will request).
Evaluate MobileNet on the Coral board
Execute the following Python script:
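(The script itself is not reproduced in this excerpt; below is a minimal reconstruction of what such a script looks like. The tracker host/port, the device key "coral", the model path, the uint8 1x224x224x3 input, and the name of create's third argument are all assumptions, not the PR's exact values.)

```python
import time

import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import tflite_runtime

TRACKER_HOST = "127.0.0.1"       # placeholder: your RPC tracker host
TRACKER_PORT = 9190              # placeholder: your RPC tracker port
DEVICE_KEY = "coral"             # placeholder: key the Coral registered with
MODEL_PATH = "mobilenet.tflite"  # quantized (edgetpu-compiled for the TPU) model
target = "cpu"                   # set to "edge_tpu" to offload to the edgeTPU

with open(MODEL_PATH, "rb") as f:
    tflite_model_buf = f.read()

# Request the Coral board from the tracker and create the remote runtime.
tracker = rpc.connect_tracker(TRACKER_HOST, TRACKER_PORT)
remote = tracker.request(DEVICE_KEY)
ctx = remote.cpu(0)
runtime = tflite_runtime.create(tflite_model_buf, ctx, target)

# Feed a random MobileNet-sized input (1x224x224x3 uint8 for the quantized model).
data = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("uint8")
runtime.set_input(0, tvm.nd.array(data, ctx))

# Warm up once, then average over repeated invocations. Note this simple
# loop also counts the per-call RPC round-trip in the measured time.
runtime.invoke()
n = 100
start = time.time()
for _ in range(n):
    runtime.invoke()
print("It took %.2fms to run mobilenet" % ((time.time() - start) / n * 1e3))
```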
Upon running it, you'll get:
It took 143.74ms to run mobilenet
Now, set `target = "edge_tpu"` and you'll get:
It took 3.22ms to run mobilenet
Notable interface changes
The runtime module does not expose an allocate() method anymore; tensor allocation is done as part of the initialization process.
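In practice this means callers no longer need an explicit allocation step before setting inputs. A before/after sketch, reusing `tflite_model_buf`, `ctx`, and `data` from the script above (the allocate() name follows this description; treat the "before" lines as illustrative):

```python
# Before this change (illustrative), allocation was a separate step:
#   runtime = tflite_runtime.create(tflite_model_buf, ctx)
#   runtime.allocate()          # explicit tensor allocation
#   runtime.set_input(0, ...)

# After this change, create() allocates tensors during initialization,
# so callers go straight to setting inputs and invoking.
runtime = tflite_runtime.create(tflite_model_buf, ctx)
runtime.set_input(0, tvm.nd.array(data, ctx))
runtime.invoke()
out = runtime.get_output(0)
```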