
Integrate m3 dmsgpty with hypervisor and various fixes. #166

Merged: 33 commits into skycoin:milestone2 on Mar 4, 2020

Conversation

@evanlinjin (Contributor) commented Feb 21, 2020

Did you run make format && make check?

Yes

Changes (closes #155):

  • Fixed RPCClientDialer logic so it no longer returns on hypervisor disconnect (a minimal redial sketch follows after this list).
  • Updated hypervisor config format.
  • Added hypervisor /pty/{pk} endpoint.
  • Added logic to prepare unix file.
  • Improved logging.
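
To make the first bullet concrete, here is a minimal redial-loop sketch in Go. All names are illustrative stand-ins rather than the actual RPCClientDialer API, and it assumes a serve callback that blocks until the connection drops:

```go
// Package dialer sketches the reconnect behavior; illustrative only.
package dialer

import (
	"log"
	"net"
	"time"
)

// DialLoop keeps a client connection to the hypervisor alive. Instead of
// returning when the hypervisor disconnects, it logs the error, waits,
// and redials, so a restarted hypervisor picks the visor back up.
func DialLoop(addr string, serve func(net.Conn) error) {
	for {
		conn, err := net.Dial("tcp", addr)
		if err != nil {
			log.Printf("dial %s failed: %v; retrying", addr, err)
			time.Sleep(time.Second)
			continue
		}
		// serve blocks until the connection drops or errors out.
		if err := serve(conn); err != nil {
			log.Printf("hypervisor connection lost: %v; redialing", err)
		}
		_ = conn.Close() // release the dead conn before redialing
	}
}
```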

Changes (fixes #160, fixes #161, fixes #171):

  • managedTransport.close() now optionally calls mt.wg.Wait(), avoiding a hang when close is invoked from within mt.Serve() (see the sketch after this list).
  • transport.Manager deletes the mTp entry on SaveTransport if the mTp has already stopped serving.
  • Improved various logs.
  • Added some comments.
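
The first bullet is about a classic WaitGroup self-deadlock. A minimal sketch, assuming a stripped-down stand-in for managedTransport (not the real type):

```go
// Sketch of the close/Serve interaction; a stand-in, not the real type.
package transport

import "sync"

type managedTransport struct {
	wg   sync.WaitGroup
	done chan struct{}
	once sync.Once
}

// close signals shutdown. waitDone controls whether we block until Serve
// exits: external callers pass true; Serve passes false, since Serve is
// itself tracked by wg and waiting from inside it would never return.
func (mt *managedTransport) close(waitDone bool) {
	mt.once.Do(func() { close(mt.done) })
	if waitDone {
		mt.wg.Wait()
	}
}

// Serve runs the transport loop until shutdown or a fatal error.
func (mt *managedTransport) Serve() {
	mt.wg.Add(1)
	defer mt.wg.Done()
	for {
		select {
		case <-mt.done:
			return
		default:
			// ... serve one packet; on a fatal error, shut down:
			mt.close(false) // close(true) here would wait on our own
			return          // wg entry and hang forever
		}
	}
}
```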

How to test this:

  • Check out skywire-services@feature/m3-dmsgpty-hypervisor.
  • Uncomment the replace statements in go.mod for the skywire-services and skywire-mainnet repos (an illustrative excerpt follows this list).
  • Run the generic integration env.
  • Read further instructions in the respective tickets.
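
The go.mod step refers to replace directives along these lines. A purely illustrative excerpt: the module path and the sibling-directory layout are assumptions, not taken from the actual go.mod files:

```
// Hypothetical go.mod excerpt; uncommenting points the build at a
// local checkout instead of the published module.
replace github.com/SkycoinProject/skywire-mainnet => ../skywire-mainnet
```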

Changes (closes #68):

  • mTp.redial now uses exponential backoff (sketched below).
  • mTp.updateStatuses inspects discovery response codes to determine whether the tp is still registered.
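
A sketch of the two mechanisms above, with hypothetical helpers and made-up backoff constants; the real mTp methods carry more state:

```go
// Sketch of redial backoff and discovery status interpretation.
package transport

import (
	"context"
	"errors"
	"net/http"
	"time"
)

// redial re-attempts the underlying connection with exponential backoff,
// capped at a maximum delay, until it succeeds or ctx is cancelled.
func redial(ctx context.Context, dial func() error) error {
	const maxDelay = 10 * time.Second
	delay := 500 * time.Millisecond
	for {
		if err := dial(); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(delay):
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay // stop growing once the cap is reached
		}
	}
}

// isStillRegistered interprets a discovery response code as the second
// bullet describes: 404 means the transport entry is gone.
func isStillRegistered(statusCode int) (bool, error) {
	switch statusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusNotFound:
		return false, nil
	default:
		return false, errors.New("indeterminate discovery response")
	}
}
```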

@jdknives (Member)

Please add some documentation for @Senyoret1 before merging so that he can integrate these changes with the frontend.

志宇 added 8 commits February 24, 2020 22:27
The following has been commented out:

* WIP: Visor endpoints exposed for hypervisor switched to RESTful.
* WIP: Various new helper functions for having http over dmsg.
* Updated hypervisor config format.
* Added hypervisor /pty/{pk} endpoint.
* Added logic to prepare unix file.
* Improved logging.
@evanlinjin marked this pull request as ready for review February 27, 2020 08:57
@evanlinjin changed the title [WIP] Integrate m3 dmsgpty with hypervisor pty web ui. → Integrate m3 dmsgpty with hypervisor pty web ui. Feb 27, 2020
@evanlinjin changed the title Integrate m3 dmsgpty with hypervisor pty web ui. → Integrate m3 dmsgpty with hypervisor. Feb 27, 2020
@nkryuchkov (Contributor) left a comment

Good job!

Regarding the services repo, we also need to adjust integration/hypervisor.json for the hypervisor config changes, and the script for testing the /exec endpoint needs to be updated.

I'm getting the following error when attempting to test the PR:

[2020-02-27T21:15:59+01:00] INFO [snet.dmsgC]: ClientSession.acceptStream() temporary error, continuing... error="dmsg error 306 - request has no associated listener" session=035915c609f71d0c7df27df85ec698ceca0cb262590a54f732e3bbd0cc68d89282

Did I set up anything wrong?

Review threads (resolved) were opened on: pkg/hypervisor/config.go, pkg/hypervisor/hypervisor.go, pkg/visor/rpc.go, pkg/visor/visor.go.
@evanlinjin (Contributor, Author) commented Feb 28, 2020

> Good job!
>
> Regarding the services repo, we also need to adjust integration/hypervisor.json for the hypervisor config changes, and the script for testing the /exec endpoint needs to be updated.
>
> I'm getting the following error when attempting to test the PR:
>
>     [2020-02-27T21:15:59+01:00] INFO [snet.dmsgC]: ClientSession.acceptStream() temporary error, continuing... error="dmsg error 306 - request has no associated listener" session=035915c609f71d0c7df27df85ec698ceca0cb262590a54f732e3bbd0cc68d89282
>
> Did I set up anything wrong?

@nkryuchkov Did you test this with skywire-services@feature/m3-dmsgpty-hypervisor? I've slightly updated the integration env to accommodate these changes.

志宇 added 2 commits February 28, 2020 15:08
* managedTransport.close() now optionally calls mt.wg.Wait() to avoid hangs if called in mt.Serve().
* transport.Manager deletes mTp entry on SaveTransport if mTp has already stopped serving.
* Improved various logs.
* Added some comments.
@evanlinjin changed the title Integrate m3 dmsgpty with hypervisor. → Integrate m3 dmsgpty with hypervisor and various fixes. Feb 28, 2020
@nkryuchkov (Contributor)

@evanlinjin Yes. I'll have a look again, thanks.

@nkryuchkov (Contributor)

@evanlinjin

I've tested again. The hypervisor.json file in skywire-services@feature/m3-dmsgpty-hypervisor still needs fixing. Instead of

"interfaces": {
  "http_address": ":8080",
  "rpc_addr": ":7080"
}

we need

"http_addr": ":8080"

An error related to accessing the removed /exec endpoint is still present.

Testing against #160, I got the following:

> Make the calls of step 3 again. You will see the transport is removed from the 0348c... visor, due to the call made on step 4. However, the transport will still be present in the 024ec... visor, which appears to be an error.

Reproduced. Is it intended behavior now?

> Repeat the call from step 2, to create the transport again. This time the call will take much more time. In my case the first call is completed in just some ms, but the second one takes about 20 secs.

Could not reproduce. It seems to be fixed.

> Make the calls of step 3 again. You will see the transport on the 0348c... visor, but the request for the 024ec... will never be completed.

Could not reproduce.

> Call GET /api/visors to get the info about all the visors. The call will never return a response.

Could not reproduce.

@evanlinjin (Contributor, Author)

Note that this PR should pass the manual integration tests specified in #152 as well.

志宇 added 4 commits March 2, 2020 20:18
* transport.Manager now deletes from 'tps' when the associated mTp stops serving.
* The initiating settlement handshake no longer updates transport discovery status; this logic has moved to ManagedTransport.
* ManagedTransport now has a better mechanism for updating statuses.
* Only the least-significant edge of a transport can redial the underlying connection (see the sketch below).
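
The least-significant-edge rule in the last commit amounts to a deterministic tie-break on the two edge public keys. A sketch with key types simplified to raw bytes and a hypothetical function name:

```go
// Sketch of redial edge selection; keys simplified to raw bytes.
package transport

import "bytes"

// isLeastSignificantEdge reports whether the local public key compares
// lower than the remote's. If only that edge redials, both sides agree
// on who reconnects and never race to re-establish the same transport.
func isLeastSignificantEdge(localPK, remotePK []byte) bool {
	return bytes.Compare(localPK, remotePK) < 0
}
```

Because both edges evaluate the same comparison, exactly one of them takes responsibility for redialing after a disconnect.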
@evanlinjin (Contributor, Author) commented Mar 2, 2020

> @evanlinjin
>
> I've tested again. The hypervisor.json file in skywire-services@feature/m3-dmsgpty-hypervisor still needs fixing. Instead of
>
>     "interfaces": {
>       "http_address": ":8080",
>       "rpc_addr": ":7080"
>     }
>
> we need
>
>     "http_addr": ":8080"
>
> An error related to accessing the removed /exec endpoint is still present.
>
> Testing against #160, I got the following:
>
> > Make the calls of step 3 again. You will see the transport is removed from the 0348c... visor, due to the call made on step 4. However, the transport will still be present in the 024ec... visor, which appears to be an error.
>
> Reproduced. Is it intended behavior now?
>
> > Repeat the call from step 2, to create the transport again. This time the call will take much more time. In my case the first call is completed in just some ms, but the second one takes about 20 secs.
>
> Could not reproduce. It seems to be fixed.
>
> > Make the calls of step 3 again. You will see the transport on the 0348c... visor, but the request for the 024ec... will never be completed.
>
> Could not reproduce.
>
> > Call GET /api/visors to get the info about all the visors. The call will never return a response.
>
> Could not reproduce.

I think I fixed these. Please retest!

@jdknives (Member) left a comment

  1. Are these logs supposed to remain on [Info] level? They pop up every 2 seconds, it seems.

     [2020-03-02T22:53:37+01:00] INFO [router]: Removed 0 rules

  2. Fixes "[M2] DMSG transports fail to deliver packets after visor restart" (#171) in the integration env.

  3. Does not fix "Routes are not being created again" (#161) for me. But I saw that there is another PR from @Darkren trying to fix that.

@Darkren (Contributor) commented Mar 3, 2020

Confirming the fix of #171. Both STCP and DMSG transports work in all of my test scenarios.


* mTp.redial exponential backoff.
* mTp.updateStatuses looks at discovery response codes to see if tp is still registered.
* Various logging improvements.
* tpDisc.Client: Made errors analyzable.
@nkryuchkov (Contributor) left a comment

Great job overall!

Review threads (resolved) were opened on: pkg/hypervisor/hypervisor.go, pkg/transport/manager.go, pkg/visor/rpc.go, Makefile, cmd/hypervisor/commands/root.go, pkg/transport/managed_transport.go.
@Darkren (Contributor) left a comment

Great job!

Review threads (resolved) were opened on: Makefile, pkg/router/router.go.
@evanlinjin (Contributor, Author)

This is meant to be proof that this PR does in fact fix #160, #161 and #171.
However, YouTube is not showing it in 4k yet.
https://www.youtube.com/watch?v=IUt6ZMzkaBo

@nkryuchkov (Contributor) left a comment

Looks good to me, really great job!

(inline review comment on this snippet)

        httpAddr = config.Interfaces.HTTPAddr
        rpcAddr  = config.Interfaces.RPCAddr
    )
    conf := prepareConfig(args)

The function looks nice! 🙂
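
For readers without the inline diff, a purely hypothetical reconstruction of what a prepareConfig helper like this might look like; everything below, including the default addresses, is inferred from the four context lines above rather than taken from the PR:

```go
// Hypothetical reconstruction; the real function may differ substantially.
package commands

type interfaces struct {
	HTTPAddr string
	RPCAddr  string
}

type config struct {
	Interfaces interfaces
}

// prepareConfig gathers the flag and file handling in one place so the
// caller can simply read conf.Interfaces.HTTPAddr and .RPCAddr.
func prepareConfig(args []string) config {
	conf := config{Interfaces: interfaces{
		HTTPAddr: ":8080", // assumed defaults, for illustration only
		RPCAddr:  ":7080",
	}}
	if len(args) > 0 {
		// The real implementation presumably parses the config file
		// passed on the command line here.
		_ = args[0]
	}
	return conf
}
```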

@jdknives (Member) left a comment

Overall a really great job! All of the issues seem fixed in the integration env. Will test in deployment after merge. Only small issues remain that we may fix:

  1. The terminal art or previous bash history disappears bit by bit when using the arrow keys to scroll through history. (Note where the cursor is.)

     (screenshot: Bildschirmfoto 2020-03-04 um 10 51 43)

  2. There does not seem to be an attempt to actually redial the transport upon remote shutdown. When starting the remote visor, though, the transport is initialized properly by the remote, which means there is no problem with overall functionality right now. Also, we still get this error log upon redial from the remote after restart:

     [2020-03-04T10:52:57+01:00] WARN [tp:031b80]: Failed to read packet. error="EOF" src="read_loop"
     [2020-03-04T10:53:09+01:00] INFO [tp_manager]: recv transport connection request: type(dmsg) remote(031b80cd5773143a39d940dc0710b93dcccc262a85108018a7a95ab9af734f8055)
     [2020-03-04T10:53:09+01:00] DEBUG [tp_manager]: TP found, accepting...
     [2020-03-04T10:53:09+01:00] DEBUG [tp:031b80]: Performing settlement handshake...
     [2020-03-04T10:53:09+01:00] ERROR [transport]: Failed to register transports error="(500) Internal Server Error: {"error":"ID already registered"}"

Overall however, great work.

@jdknives jdknives merged commit 52fe903 into skycoin:milestone2 Mar 4, 2020
@evanlinjin (Contributor, Author)

> 1. The terminal art or previous bash history disappears bit by bit when using the arrow keys to scroll through history. (Note where the cursor is.)

  1. I'm not sure what can be done.

> 2. There does not seem to be an attempt to actually redial the transport upon remote shutdown. [...] Also, we still get this error log upon redial from the remote after restart.

  2. Only the least-significant edge is responsible for redialing now.
