
Think of more sophisticated algorithm than blocking XCM channel between sending chain and bridge hub #2315

Closed
svyatonik opened this issue Aug 2, 2023 · 14 comments

@svyatonik
Contributor

#2312 has backported the pallet-xcm-bridge-hub router 1:1 from the v1 version (#2294). But @bkontur has raised a valid concern here: #2313 (comment). E.g. if the same chain is bridging with two different consensus systems, is it fine to block both bridges when at least one lane in one of the bridges is blocked? With a single remote consensus it is maybe fine, because logically they're using a single channel across the whole pipeline - it is a bridge-specific detail that messages are delivered over multiple lanes. But if we are bridging with another consensus system, then both bridges will stall until we resolve a single-lane issue. That doesn't look good to me.

OTOH for the sending chain, the channel for all outbound messages is the same - it is the channel to the sibling/child bridge hub. So maybe it is fine?

This needs some more thinking, so I'm filing a separate issue.

@svyatonik
Contributor Author

Just thoughts:

  1. so we need to implement logical XCM channels over a single physical one. Does it violate the "XCM messages are guaranteed to be delivered and interpreted accurately, in order" guarantee? I mean - if we pause Kusama -> Polkadot messages, but keep delivering Kusama -> Ethereum messages, is it fine? In other words, does this "in order" guarantee apply to a logical channel, or to a physical channel;
  2. ideally, we'd want to support those logical channels directly in HRMP/DMP/UMP. I.e. in addition to the Resume and Suspend signals, we'd add Resume(<some-id>) and Suspend(<some-id>) and use them when required;
  3. another option is to send XCM messages back to the sending chain. But it has its own drawbacks, so I'd prefer to avoid that.
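Option 2 above can be sketched as follows. This is a toy, self-contained model only - `BridgeId`, `ChannelSignal`, and `Router` are illustrative names, not existing HRMP/DMP/UMP types; the point is just that a per-bridge id on the Suspend/Resume signals lets one logical channel pause while others over the same physical channel keep flowing:

```rust
use std::collections::HashSet;

type BridgeId = u32;

// Today's signals are channel-wide; the idea is to optionally scope them
// to a single logical channel (bridge) over the same physical channel.
enum ChannelSignal {
    Suspend(Option<BridgeId>), // None = suspend the whole physical channel
    Resume(Option<BridgeId>),
}

#[derive(Default)]
struct Router {
    channel_suspended: bool,
    suspended_bridges: HashSet<BridgeId>,
}

impl Router {
    fn on_signal(&mut self, signal: ChannelSignal) {
        match signal {
            ChannelSignal::Suspend(None) => self.channel_suspended = true,
            ChannelSignal::Resume(None) => self.channel_suspended = false,
            ChannelSignal::Suspend(Some(id)) => {
                self.suspended_bridges.insert(id);
            }
            ChannelSignal::Resume(Some(id)) => {
                self.suspended_bridges.remove(&id);
            }
        }
    }

    // Messages for an unaffected bridge keep flowing even while another
    // bridge over the same physical channel is suspended.
    fn can_send(&self, bridge: BridgeId) -> bool {
        !self.channel_suspended && !self.suspended_bridges.contains(&bridge)
    }
}

fn main() {
    const WITH_POLKADOT: BridgeId = 1;
    const WITH_ETHEREUM: BridgeId = 2;

    let mut router = Router::default();
    router.on_signal(ChannelSignal::Suspend(Some(WITH_POLKADOT)));

    assert!(!router.can_send(WITH_POLKADOT));
    assert!(router.can_send(WITH_ETHEREUM)); // Kusama -> Ethereum unaffected

    router.on_signal(ChannelSignal::Resume(Some(WITH_POLKADOT)));
    assert!(router.can_send(WITH_POLKADOT));
    println!("ok");
}
```

Whether the "in order" guarantee survives this depends on point 1: ordering would only hold within each logical channel, not across them.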

@bkontur
Contributor

bkontur commented Aug 2, 2023

well, yes, actually with HRMP, which uses recipient/sender as ParaId, we have no way to differentiate,
and I don't know about the coming pallet_message_queue,

e.g. for handling MessageDispatch we added BridgeInstance (which is pallet_bridge_message::Pallet instance id) here

our actual Bridges: ExporterFor configuration uses the bridge location as a parachain:

MultiLocation { parents: 1, X1(Parachain(1002)) }

so it seems that we should also add the target pallet instance for sending, like:

MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(53)) }

e.g. 53 is pallet_bridge_messages instance for Kusama->Polkadot bridging here

so e.g. for Ethereum we will have a different pallet:

MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(123)) }

but I don't know yet how pallet_message_queue works and whether it works with MultiLocation instead of HRMP recipient/sender as ParaId?
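The per-pallet bridge location above could look roughly like this. `Junction` and `Location` are simplified stand-ins for the real XCM `MultiLocation` types, and the instance indices (53, 123) are the examples from this comment, not verified values:

```rust
// Toy model: extend the ExporterFor bridge location with the
// pallet_bridge_messages instance, so different bridges resolve to
// different locations on the same bridge hub.

#[derive(Debug, PartialEq)]
enum Junction {
    Parachain(u32),
    PalletInstance(u8),
}

#[derive(Debug, PartialEq)]
struct Location {
    parents: u8,
    interior: Vec<Junction>,
}

fn bridge_location(bridge_hub_para_id: u32, messages_pallet_instance: Option<u8>) -> Location {
    let mut interior = vec![Junction::Parachain(bridge_hub_para_id)];
    if let Some(instance) = messages_pallet_instance {
        interior.push(Junction::PalletInstance(instance));
    }
    Location { parents: 1, interior }
}

fn main() {
    // Current config: bridge location is just the bridge hub parachain.
    let current = bridge_location(1002, None);
    assert_eq!(current.interior.len(), 1);

    // Proposed: disambiguate per-bridge by target pallet instance.
    let with_polkadot = bridge_location(1002, Some(53));
    let with_ethereum = bridge_location(1002, Some(123));
    assert_ne!(with_polkadot, with_ethereum);
    println!("ok");
}
```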

@svyatonik
Contributor Author

@bkontur The problem is not in splitting the message stream into several substreams once they reach the Source.BH. It is that in the current dynamic fees implementation, any bridge (aka lane, if we omit the Eth bridge) may suspend the channel with the Source.AH (the sending chain), effectively halting all other bridges that are using the same XCMP channel:

                                            /---> (1) Pallet for EthereumBridge
                                           /
Source.AH <--- XCMP channel ---> Source.BH -----> (2) Lane with Polkadot.AssetHub
                                           \
                                            \---> (3) Lane with Polkadot.Collectives

So in the picture above, e.g. if the relayer with Polkadot.Collectives is not working, or if Polkadot.Collectives has some other troubles, eventually the "lane with Polkadot.Collectives" will suspend the "XCMP channel" with Source.AH and messages to other bridges won't even be delivered to the Source.BH until the "lane with Polkadot.Collectives" returns to operational mode. It doesn't matter who the recipient of those messages is - they are simply not physically delivered to the Source.BH.

I've been thinking we may live with that, because logically we may look at the channel between Source.AH and Target.AH as a single channel (the with-Polkadot bridge), which splits at the end. So we can't actually do anything here. I even left a big comment here: https://github.com/paritytech/parity-bridges-common/blob/master/modules/xcm-bridge-hub-router/src/lib.rs#L134-L163

But if we talk about multiple remote consensus systems, this analogy doesn't work anymore. Why should the with-Polkadot bridge affect the with-Ethereum bridge? So we probably need to introduce some multiplexing protocol in v2 instead of physically suspending the channel between the sending chain and its sibling BH.

@bkontur
Contributor

bkontur commented Aug 2, 2023

But if we talk about multiple remote consensus systems, this analogy doesn't work anymore. Why should the with-Polkadot bridge affect the with-Ethereum bridge? So we probably need to introduce some multiplexing protocol in v2 instead of physically suspending the channel between the sending chain and its sibling BH.

I guess that's what I was talking about, but probably in the wrong way :D

I was thinking about something like this:

if the with-Polkadot bridge has a congestion problem (whatever lane to Polkadot), then suspend all pallet_message_queue processing related to the Polkadot bridging pallet specified by MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(53)) }

if the with-Ethereum bridge has a congestion problem, then suspend all pallet_message_queue processing related to the Ethereum bridging pallet specified by MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(123)) }

but this assumes that between AssetHub and BridgeHub there exist several independent queues:

  1. pallet_message_queue for BridgeHub+with-Polkadot bridge
  2. pallet_message_queue for BridgeHub+with-Ethereum bridge

instead of one XCMP channel like now


maybe I have the wrong expectation that the new pallet_message_queue will replace the existing XCMP channel (the HRMP channel between two paraIds)

like opening dedicated channels between:

  • AssetHub and MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(53)) }
  • AssetHub and MultiLocation { parents: 1, X2(Parachain(1002), PalletInstance(123)) }

I really need to check pallet_message_queue PR

@svyatonik
Contributor Author

Yes, the single HRMP channel is a problem

@svyatonik
Contributor Author

Ok, I'm proposing the following solutions.

For v2:

  1. keep the backpressure mechanism for the Target.BH -> Target.AH queue. So we still reject the delivery transaction if Target.BH -> Target.AH is overloaded;
  2. when we see too many messages in the Source.BH -> Target.BH queue, we send an XCM message to the Source.AH instead of physically suspending the Source.AH -> Source.BH channel;
  3. the XCM would be Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) }. This call will be part of the pallet-xcm-bridge-hub-router at Source.AH and it will only be callable by the bridge hub;
  4. we change the pallet-xcm-bridge-hub-router to maintain a separate fee factor for every route. So e.g. Source.AH -> Target.AH and Source.AH -> Target.Collectives will have different fees for the same message;
  5. when the Source.BH -> Target.BH queue becomes uncongested, we send a similar message to the Source.AH: Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: false) };
  6. if we have a congestion report (via report_congestion), then instead of sending messages over the HRMP channel immediately, we keep them in the local storage of the pallet-xcm-bridge-hub-router. Once we receive the report_congestion(is_congested: false) call, we can send all messages from the local storage to the Source.BH. I.e. - return to normal operational mode;
  7. if we keep receiving messages from the Source.AH at the Source.BH after we have sent report_congestion, we close the bridge.

For v1 - let's do 1-5. We can trust that the asset hub respects our rules.
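Steps 3-6 of the proposal can be sketched roughly as below. This is a toy model only - `Router`, `report_congestion`, and `Destination` mirror the names in the proposal but are not the actual pallet-xcm-bridge-hub-router API; origin checks, weights, and storage bounds are all omitted:

```rust
use std::collections::HashMap;

type Destination = &'static str;
type Message = String;

#[derive(Default)]
struct Router {
    // Step 4: congestion (and in the real pallet, the fee factor) is tracked
    // per route, not for the whole channel.
    congested: HashMap<Destination, bool>,
    buffered: HashMap<Destination, Vec<Message>>,
    sent: Vec<(Destination, Message)>, // stands in for the HRMP channel
}

impl Router {
    // Steps 3/5: in the real pallet, only the sibling bridge hub may call this.
    fn report_congestion(&mut self, dest: Destination, is_congested: bool) {
        self.congested.insert(dest, is_congested);
        if !is_congested {
            // Step 6: back to normal - flush everything buffered for `dest`.
            for msg in self.buffered.remove(dest).unwrap_or_default() {
                self.sent.push((dest, msg));
            }
        }
    }

    fn send(&mut self, dest: Destination, msg: Message) {
        if *self.congested.get(dest).unwrap_or(&false) {
            // Step 6: hold messages locally instead of suspending the channel.
            self.buffered.entry(dest).or_default().push(msg);
        } else {
            self.sent.push((dest, msg));
        }
    }
}

fn main() {
    let mut router = Router::default();
    router.report_congestion("Target.AH", true);

    router.send("Target.AH", "transfer".into());
    router.send("Target.Collectives", "remark".into());

    // Only the congested route is held back; the other route keeps working.
    assert_eq!(router.sent.len(), 1);
    assert_eq!(router.sent[0].0, "Target.Collectives");

    router.report_congestion("Target.AH", false);
    assert_eq!(router.sent.len(), 2);
    println!("ok");
}
```

The key difference from the current design: congestion on one route no longer touches the physical Source.AH -> Source.BH channel at all.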

@svyatonik
Contributor Author

Also - can we (easily) enable free execution of just a single call (report_congestion) at the asset hub? I don't want to add more accounts that need to be topped up (in this case: the sovereign account of the bridge hub at the asset hub). For v1 at least it should be fine, because they are system parachains. For v2 we'll probably charge the SA of the sending chain at the BH when sending this message (or else: close the bridge).

@serban300
Collaborator

serban300 commented Aug 2, 2023

Overall the strategy sounds very good. Just one question:

7. if we keep receiving messages from Source.AH at Source.BH after we have called report_congestion, we are closing the bridge.

I don't think the Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) } will reach the Source.AH immediately. It takes at least 1 more block for the HRMP message to be executed by the Source. If there is some congestion or other problems, it may take more. Can we know when the Source has received the Transact?

@bkontur
Contributor

bkontur commented Aug 2, 2023

yes, exactly what I wanted to say: we just need the SA of the BH on the AH for the Transact { call: XcmBridgeHubRouter::report_congestion( call, but yes, we could maybe also allow UnpaidExecution for v1

I think 1-5 looks ok,

and yes, I had the same question about 7.

@bkontur
Contributor

bkontur commented Aug 2, 2023

Can we know when the Source has received the Transact ?

XCM is fire-and-forget, but we could use ReportStatus or some XCM query feature

@svyatonik
Contributor Author

yes, exactly what I wanted to say: we just need the SA of the BH on the AH for the Transact { call: XcmBridgeHubRouter::report_congestion( call, but yes, we could maybe also allow UnpaidExecution for v1

@bkontur Ok, cool, thank you. Do you know if it is possible/easy to configure XCM in a way that only such an XCM program will be executed for free? Or do we need some new barrier for that?

I don't think the Transact { call: XcmBridgeHubRouter::report_congestion(destination_location, is_congested: true) } will reach the Source.AH immediately. It takes at least 1 more block for the hrmp message to be executed by the Source. If there is some congestion or other problems, maybe it takes more. Can we know when the Source has received the Transact ?

and yes, I had the same question about 7.

@bkontur @serban300 I was thinking of some window before closing the bridge. We'll work out the details later

@bkontur
Contributor

bkontur commented Aug 2, 2023

@bkontur Ok, cool, thank you. Do you know if it is possible/easy to configure XCM in a way that only such an XCM program will be executed for free? Or do we need some new barrier for that?

afaik, we need a new barrier. I can do that - on my way to asset transfer I created at least two hacky Barriers :D, so no problem :D :D
or we can allow UnpaidExecution for all BridgeHub XCM calls, but a temporary hacky barrier seems ok for v1
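The shape of such a barrier, in a very rough self-contained sketch: allow unpaid execution only when the origin is the sibling bridge hub and the program is a single Transact carrying the report_congestion call. All types here are simplified stand-ins for the real xcm-executor ShouldExecute machinery, which has a different signature:

```rust
#[derive(PartialEq)]
enum Origin {
    SiblingBridgeHub,
    Other,
}

enum Instruction {
    UnpaidExecution,
    Transact { call: Call },
    Other,
}

enum Call {
    ReportCongestion { is_congested: bool },
    Other,
}

// Returns true iff the program may execute without payment: it must come
// from the sibling bridge hub and consist of exactly UnpaidExecution
// followed by a single Transact with the report_congestion call.
fn allow_free_congestion_reports(origin: &Origin, program: &[Instruction]) -> bool {
    if *origin != Origin::SiblingBridgeHub {
        return false;
    }
    matches!(
        program,
        [
            Instruction::UnpaidExecution,
            Instruction::Transact { call: Call::ReportCongestion { .. } }
        ]
    )
}

fn main() {
    let report = vec![
        Instruction::UnpaidExecution,
        Instruction::Transact { call: Call::ReportCongestion { is_congested: true } },
    ];
    assert!(allow_free_congestion_reports(&Origin::SiblingBridgeHub, &report));
    assert!(!allow_free_congestion_reports(&Origin::Other, &report));

    // Arbitrary programs from the bridge hub are still rejected.
    let arbitrary = vec![Instruction::Other];
    assert!(!allow_free_congestion_reports(&Origin::SiblingBridgeHub, &arbitrary));
    println!("ok");
}
```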

@svyatonik
Contributor Author

afaik, we need a new barrier. I can do that - on my way to asset transfer I created at least two hacky Barriers :D, so no problem :D :D
or we can allow UnpaidExecution for all BridgeHub XCM calls, but a temporary hacky barrier seems ok for v1

No, it's fine - I'll make it myself. I thought that maybe there was already one, and that you, our XCM magician, would know about it

@svyatonik
Contributor Author

For v1 it is resolved by #2318
For v2 we'll have the paritytech/polkadot-sdk#5062
So I'm closing the issue
