[BUG] rpc.connect error #2475

Closed
tpoisonooo opened this issue Jan 21, 2019 · 8 comments
Comments

@tpoisonooo
Contributor

[Description]
Following the tutorial "Deploy the Pretrained Model on Raspberry Pi", I compiled the project with llvm-4.0, but deploy_model_on_rasp.py crashed at tvm/src/runtime/rpc/rpc_socket_impl.cc:47:

 47   CHECK(sock.Connect(addr))
 48       << "Connect to " << addr.AsString() << " failed";

It looked like a network error had triggered the assertion, so I checked the commit log, reverted to e4b9f986dab8c48ba109a52106565fc4be6b67c4, and recompiled the project. After that it works well!

Please check commit 0806b69e3fb136226fa1dafad00bd2c606cc998d carefully; here be dragons.

[Environment]
I compiled tvm on macOS Mojave 10.14.2 and deployed net.tar to an Nvidia tx1.

[Document]
There is a spelling error in "Deploy the Pretrained Model on Raspberry Pi":

    # target = tvm.target.create('llvm -devcie=arm_cpu -model=bcm2837 -target=armv7l-linux-gnueabihf -mattr=+neon')

It should be "device", not "devcie".

@tpoisonooo tpoisonooo changed the title bug: remote.connect error bug: rpc.connect error Jan 21, 2019
@tpoisonooo tpoisonooo changed the title bug: rpc.connect error [BUG] rpc.connect error Jan 21, 2019
@eqy
Contributor

eqy commented Jan 22, 2019

Can you share some more details of your network setup? Is it IPv6 or IPv4? We will need to add a regression test for this.

@tpoisonooo
Contributor Author

tpoisonooo commented Jan 22, 2019

[Environment]
IPv4, ordinary local area network:

tx1 <------- WI-FI ------> Router <------- WI-FI ------> MacOS

Nvidia tx1 IP: 10.100.31.193
MacOS IP: 10.100.31.156
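
As a quick way to answer the IPv4-or-IPv6 question for the two addresses above, here is a minimal sketch that uses only Python's standard `ipaddress` module. The helper name `ip_family` is made up for illustration and is not part of TVM.

```python
import ipaddress


def ip_family(address: str) -> str:
    """Return "IPv4" or "IPv6" for a literal IP address string."""
    version = ipaddress.ip_address(address).version
    return f"IPv{version}"


# The two addresses reported in this thread.
for name, addr in [("tx1", "10.100.31.193"), ("macOS", "10.100.31.156")]:
    print(name, addr, ip_family(addr))
```

Both addresses parse as IPv4, which matches the setup described here.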

[Operation]

  1. Open a macOS terminal, log in to tx1, and start the tvm server:
➜  awesome-department-document git:(master) ✗ ssh tx1
[email protected]'s password:
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.38-tegra aarch64)

 * Documentation:  https://help.ubuntu.com/

322 packages can be updated.
0 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Mon Jan 21 08:50:29 2019 from 10.100.31.156
ubuntu@tegra-ubuntu:~$ python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090
INFO:root:If you are running ROCM/Metal, fork will cause compiler internal error. Try to launch with arg ```--no-fork```
INFO:RPCServer:bind to 0.0.0.0:9090

tx1 is reachable and is listening on port 9090:

ubuntu@tegra-ubuntu:~$ netstat -anpl | grep 9090
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      23861/python
  2. Run deploy_model_on_rasp.py. I modified this script to match aarch64 and my network setup:
    https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/deploy_model_on_rasp.py

If you cannot reproduce this bug reliably, please tell me which logs you need.
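
The netstat output above only shows the server side. To rule out a plain network problem from the client side, independent of TVM, a small TCP check like the following can help. This is a sketch that uses only the standard `socket` module; the helper name `can_connect` and the 3-second timeout are assumptions, not anything from the TVM code base.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `can_connect("10.100.31.193", 9090)` on the macOS machine while the RPC server is running on tx1 should return True if the network path is fine. If it does, the failure is more likely inside the RPC client's own connect call than in the network.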

@eqy
Contributor

eqy commented Jan 22, 2019

Great, thanks. I have a macOS machine and will try to reproduce this tomorrow.

@eqy
Contributor

eqy commented Jan 22, 2019

I have reproduced the bug. It seems to be related to the wrong size being passed into Connect. I will submit a PR later today.
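
The thread does not show the patch, so the following only illustrates one common way such a size mismatch can arise. It is an assumption, not the confirmed cause. If an address is kept in a generic `sockaddr_storage` buffer and the size of that whole buffer is passed as the address length, some platforms (BSD-derived stacks such as macOS are often reported to behave this way) reject the call, while others tolerate it. Choosing the length from the address family avoids the mismatch. The byte sizes below are the typical values on Linux and macOS, and `sockaddr_len` is a hypothetical helper, not a TVM function.

```python
import socket

# Typical C struct sizes in bytes on Linux and macOS (assumed, not queried from the OS).
SIZEOF_SOCKADDR_IN = 16         # struct sockaddr_in, used for IPv4
SIZEOF_SOCKADDR_IN6 = 28        # struct sockaddr_in6, used for IPv6
SIZEOF_SOCKADDR_STORAGE = 128   # generic buffer large enough for any family


def sockaddr_len(family: int) -> int:
    """Return the address length that matches the given address family."""
    if family == socket.AF_INET:
        return SIZEOF_SOCKADDR_IN
    if family == socket.AF_INET6:
        return SIZEOF_SOCKADDR_IN6
    raise ValueError(f"unsupported address family: {family}")


# For the IPv4 setup in this issue, the matching length is smaller than the
# size of the generic storage buffer.
print(sockaddr_len(socket.AF_INET), "vs", SIZEOF_SOCKADDR_STORAGE)
```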

@eqy
Contributor

eqy commented Jan 22, 2019

A fix is in #2484 that you can try. Unfortunately, because of another issue in CI (#2480, #2482), CI and the merge of the fix into mainline might be slightly delayed.

@eqy
Contributor

eqy commented Jan 23, 2019

Also, feel free to submit a PR for the spelling fix.

@tpoisonooo
Contributor Author

By the way, what does "PR" mean?

@eqy
Contributor

eqy commented Jan 23, 2019

(pull request)
