Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Staging LC picks up PFS's from the disjoint RSBs at random #2521

Closed
agatsoh opened this issue Jan 21, 2021 · 20 comments
Closed

Staging LC picks up PFS's from the disjoint RSBs at random #2521

agatsoh opened this issue Jan 21, 2021 · 20 comments
Labels
bug 🕷️ Something isn't working

Comments

@agatsoh
Copy link
Contributor

agatsoh commented Jan 21, 2021

Thanks for filing a bug report :-)

Steps to Reproduce

Environment setup:
node1(LC) <----> hub(python client) <------> node2 (LC)

  1. establish bi-directonal channels between node1 and hub, node2 and hub
  2. Hub has been specifically configured to run on the following
    2.1 PFS on pfs.demo001.env.raiden.network
    2.2 Transport on transport.demo001.env.raiden.network
  3. Node1 and Node2 select the
    3.1 Transport as transport.demo001.env.raiden.network but
    3.2 PFS they sometimes select
    3.2 a pfs.demo001.env.raiden.network and this is time when the tranfer has correct route and the transfer goes through. At the time the nodes dont see the other PFSs and act as if pfs.demo001.env.raiden.network is the only PFS available. Unfortunately i did not take a screenshot for this.
    3.2 b Sometimes the nodes use all the other PFS's like pfs.transport0x.raiden.network (where x is a number) but do not see the pfs.demo001.env.raiden.network as if it never exists.

node1_disjoint_pfs
The next step they get No route between you and target
no_route_disjoint_screen

This causes one way transfers even though capacity exists either way.
Lots of users on gitter are facing the same issue and are utterly confused why they cannot make a simple transfer and why the hub is always offline or something which can happen when hub is on transport.transport0x.raiden.network and pfs.transport0x.raiden.network

Expected Result

  • Each time depending on the transport server that has been picked up the correct PFS should be picked up.

Actual Result

  • Has been described above
@taleldayekh taleldayekh added the bug 🕷️ Something isn't working label Jan 21, 2021
@taleldayekh taleldayekh added this to the Product Backlog milestone Jan 21, 2021
@taleldayekh taleldayekh removed this from the Product Backlog milestone Jan 21, 2021
@agatsoh
Copy link
Contributor Author

agatsoh commented Jan 26, 2021

Observations using local LC nodes as opposed to the staging ones
using the staging environment file .env.staging for locally deployed nodes
yarn workspace raiden-dapp run serve --mode staging

  1. the nodes pickup transport02 server for transport and query all the PFS's at the time of route selection.
  2. The hub needs to be on the transport0x servers otherwise the again we would get No route between you and the target above.

Finally if we use a environment file for example like this

VUE_APP_PFS=https://pfs.transport02.raiden.network
VUE_APP_MATRIX_SERVER=https://transport.transport02.raiden.network
VUE_APP_HUB=hub.raiden.eth
VUE_APP_ALLOW_MAINNET=false
VUE_APP_IMPRINT=https://raiden.network/privacy.html

and start the servers
yarn workspace raiden-dapp run serve --mode testnet

in this way the PFS selection is forced in the dapp to use only that one PFS which we have specified in the environment file.

LC_testnet_env_pfs_forced

I reach the conclusion that till we have this disjoint transport servers with staging environment not explicitly specifying a PFS this confusion is there to remain
@Talel

@taleldayekh
Copy link
Contributor

@weilbith Describe issue and our solution to the problem.

@taleldayekh taleldayekh assigned weilbith and unassigned agatsoh Feb 9, 2021
@weilbith
Copy link
Contributor

Alight, let's give this a try.

The root of all the described symptoms is that while the development and staging environment share the same contract deployment, their associated RSB deployments are different. Thereby we get an awkward 50% connection of both environments.

In our discussions we pointed out two possible solutions. Either make both fully connected (100%) or completely separate them as two (0% connection). Merging both environments completely seems to be a bad idea as they have different purposes. The staging environment is meant to be used for test network demonstrations of the Raiden clients. As a environment for external developers to try the system and get familiar with the libraries to build their own products on top. Therefore it is meant to be stable and reliably working. Though it has not as strict reliability constraints as the mainnet.
The development environment is meant to be used for internal engineering. Basically the step after the end-to-end tests, like for the Scenario Player. An environment where the RSB is always on latest version. Where unstable/unreleased features of the services can be tested. Thereby it is less stable and reliable than the staging environment. But all eventual issues stay internal and only hinder our own daily development flow. Deployments can be way less formal and hacky if necessary.

Therefore it is the currently favored approach to separate both networks. As the stable version the staging environment should stay as is and the development environments pops out. This requires us to

  • make a new contract deployment on Goerli
  • the current active services for the development need to change their contract pointers and probably a fresh state
  • all client configurations related to the different environments need to adopted

The following steps on this topic are:

  • align on the decision and solution approach
  • complete the task list
  • decide if each client implementation needs its own development environment
  • create actual issues for the tasks in the list
  • assign issues and prioritize them in the imperative order

@palango @andrevmatos could you eventually provide some feedback, especially regarding the task list, as I'm not familiar with the Raiden service bundle.

@palango
Copy link
Contributor

palango commented Feb 10, 2021

Looks good to me, I think it's important to first align on the approach, we can work out the details afterwards.

@andrevmatos
Copy link
Contributor

andrevmatos commented Feb 10, 2021

I think that should be about it. Some observations:

  • I'd like the both clients to use the same network, it makes it easier to cross-test.
  • at least on LC, the state key/db/name uses contains the TokenNetworkRegistry address, so a new deployment automatically generates a new state
  • Please, let's try to re-use as much contracts as possible; e.g. at least the tokens and service token contracts can be reused without issues, and save us calls to mint them again.
    • Maybe SecretRegistry is safe to be reused as well.
  • the clients (at least LC) picks automatically the goerli addresses from the JSON file from the raiden-contracts repo. I think the file there should remain pointing to the stable deployment, but then we need to ensure that the right contracts are used on development/staging servings of the LC and PC. The LC supports auto-detecting the addresses from a single entry-point contract (UDC address), we just need to expose this on dApp to pick from the env vars, and have them set in the development/staging .envs.

@weilbith
Copy link
Contributor

weilbith commented Feb 10, 2021

Looks good to me, I think it's important to first align on the approach, we can work out the details afterwards.

@palango who would you suggest to take into account? So we could just mention them here.

Please, let's try to re-use as much contracts as possible; e.g. at least the tokens and service token contracts can be reused without issues, and save us calls to mint them again.

Good idea. Should be safe to re-use the tokens from my perspective. In my opinion they are actually not part of the Raiden contracts themselves.

The LC supports auto-detecting the addresses from a single entry-point contract (UDC address), we just need to expose this on dApp to pick from the env vars, and have them set in the development/staging .envs.

Yes. I remember. We just have to do it. 😬

@weilbith
Copy link
Contributor

@eorituz @ezdac @karlb @ulope could you have a look from here and provide some feedback? We could also schedule a meeting if the opinions diverge too much or there are many questions.

@karlb
Copy link
Contributor

karlb commented Feb 11, 2021

I'd like the both clients to use the same network, it makes it easier to cross-test.

I agree.

  • Maybe SecretRegistry is safe to be reused as well.

It is and there are plans to reuse it even on mainnet for a long time: raiden-network/raiden-contracts#1182

e.g. at least the tokens and service token contracts can be reused without issues

Reusing the tokens is definitively a good idea and has been advocated multiple times already. It would cause less confusion and make the testnets more similar to the mainnet, since the mainnet also uses the same tokens on each deployment. The necessary changes to the deployment have just never been done.

@karlb
Copy link
Contributor

karlb commented Feb 11, 2021

Using different contract deployments will provide a nice clean split. Coordinating the contract changes across all the different repos could be error-prone, though. Does anyone have a clear view on how that would work in detail, yet?

@taleldayekh
Copy link
Contributor

@taleldayekh organise meeting about this.

@weilbith weilbith removed their assignment Feb 12, 2021
@karlb
Copy link
Contributor

karlb commented Feb 16, 2021

Another approach would be to reject PFSs that are not operating in the same federation (we could add a federation id to the info endpoint if the existing info is not sufficient) and look for another PFS. That behavior is useful anyway and would avoid the need to add any new configuration. This would be less prone to strange errors due to misconfiguration.

@taleldayekh taleldayekh added this to the Product Backlog milestone Feb 17, 2021
@konradkonrad
Copy link

konradkonrad commented Feb 22, 2021

I would suggest to move the demo environment to rinkeby.

Pros:

  • completely separate deployment
  • no extra code necessary

Cons:

  • don't know?

karlb added a commit to raiden-network/raiden that referenced this issue Mar 25, 2021
Allow switching between the two contract deployments on goerli and
default to the old/demo contracts. See
raiden-network/light-client#2521 on why we
want to do this.

Closes #6862
@karlb karlb self-assigned this Mar 26, 2021
karlb added a commit to karlb/raiden-services that referenced this issue Mar 26, 2021
karlb added a commit to karlb/scenario-player that referenced this issue Mar 26, 2021
karlb added a commit to raiden-network/scenario-player that referenced this issue Apr 1, 2021
karlb added a commit to raiden-network/raiden-services that referenced this issue Apr 8, 2021
karlb added a commit to karlb/raiden-services that referenced this issue Apr 8, 2021
Since raiden-network#951, we can
use the unstable contracts environment. We want to do that for the
development machines that are run using the docker-compose. Since that
environment only exists for goerli, we only enable it for the goerli
services.

Relates to raiden-network/light-client#2521
karlb added a commit to karlb/raiden-services that referenced this issue Apr 9, 2021
Since raiden-network#951, we can
use the unstable contracts environment. We want to do that for the
development machines that are run using the docker-compose. Since that
environment only exists for goerli, we only enable it for the goerli
services.

Relates to raiden-network/light-client#2521
ulope pushed a commit to raiden-network/raiden-services that referenced this issue Apr 9, 2021
Since #951, we can
use the unstable contracts environment. We want to do that for the
development machines that are run using the docker-compose. Since that
environment only exists for goerli, we only enable it for the goerli
services.

Relates to raiden-network/light-client#2521
@karlb
Copy link
Contributor

karlb commented Apr 12, 2021

The new contracts are set up and used in the services-dev services and the services on transport05 as well as by the SP CI development.json environment.

Remaining TODOs:

  • Update services on transport01-04 to use these contracts
  • Allow LC to use these contracts
  • Configure LC to use these contracts when run in SP CI
  • Demo env and development_environment: unstable work. Service discovery does not work in demo env (until the next contract deployment), but is also not used, usually. Decide if that is enough.
  • Should we try a different service when there is a contract mismatch between PFS and raiden node? Right now we fail with an error.
  • Update docs if neccessary

@karlb karlb removed their assignment Apr 12, 2021
karlb added a commit to raiden-network/raiden that referenced this issue Apr 23, 2021
Allow switching between the two contract deployments on goerli and
default to the old/demo contracts. See
raiden-network/light-client#2521 on why we
want to do this.

Closes #6862
@taleldayekh
Copy link
Contributor

Can we close this ones our next branch has been merged to master? @taleldayekh

@taleldayekh
Copy link
Contributor

Closing, next is noe merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🕷️ Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants