shutdown upon parentchain code update #1634

Merged 5 commits on Nov 6, 2024

brenzi (Collaborator) commented Nov 5, 2024

closes #1633

Code updates can cause extrinsics to fail unless we update the metadata and the extrinsics' AdditionalParams.
Our architecture currently doesn't allow updating these dynamically.

Shutting down the service gracefully (assuming automatic restart) is safe and clean. It probably doesn't even cause additional downtime, as we would need to pause TOPs anyway to avoid race conditions.
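
The approach described above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the code from this pull request: `ParentchainEvent`, `watch_events`, `spawn_worker` and `run_until_code_update` are hypothetical names, and the worker loops are placeholders for the real block production and syncing threads. It assumes an external supervisor restarts the process after it exits.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical, simplified stand-in for events observed on a parentchain.
#[derive(Debug, Clone, PartialEq)]
pub enum ParentchainEvent {
    /// Stands for the `System.CodeUpdated` event (a runtime upgrade).
    CodeUpdated,
    /// Any other event; ignored in this sketch.
    Other,
}

/// Consumes events until `CodeUpdated` is seen, then raises the shutdown flag.
/// Returns `true` if shutdown was requested, `false` if the stream ended first.
pub fn watch_events(events: Receiver<ParentchainEvent>, shutdown: &AtomicBool) -> bool {
    for event in events {
        if event == ParentchainEvent::CodeUpdated {
            shutdown.store(true, Ordering::SeqCst);
            return true;
        }
    }
    false
}

/// Spawns a placeholder worker that checks the shutdown flag between units of work.
pub fn spawn_worker(name: &'static str, shutdown: Arc<AtomicBool>) -> JoinHandle<&'static str> {
    thread::spawn(move || {
        while !shutdown.load(Ordering::SeqCst) {
            // Placeholder for real work (block production, syncing, ...).
            thread::sleep(Duration::from_millis(5));
        }
        name
    })
}

/// Feeds the given events to the watcher, then waits for all workers to stop.
/// Returns the names of the workers that terminated, in spawn order.
pub fn run_until_code_update(events: Vec<ParentchainEvent>) -> Vec<&'static str> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let workers: Vec<JoinHandle<&'static str>> = vec!["block production", "block syncing"]
        .into_iter()
        .map(|name| spawn_worker(name, Arc::clone(&shutdown)))
        .collect();

    let (tx, rx) = mpsc::channel();
    for event in events {
        tx.send(event).expect("receiver is still alive");
    }
    drop(tx); // Close the channel so the watcher cannot block forever.

    if !watch_events(rx, &shutdown) {
        // No code update was observed; stop the workers anyway so the sketch terminates.
        shutdown.store(true, Ordering::SeqCst);
    }

    // Join every worker before returning, so nothing is interrupted mid-operation.
    workers
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

Calling `run_until_code_update(vec![ParentchainEvent::Other, ParentchainEvent::CodeUpdated])` returns once both placeholder workers have observed the flag and exited. A shared flag that each loop polls lets every thread finish its current unit of work before stopping, which is the property that makes an immediate restart safe.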

caveat:

@brenzi brenzi requested a review from clangenb November 5, 2024 09:59
@brenzi brenzi added A0-core Affects a core part B1-releasenotes C1-low 📌 Does not elevate a release containing this beyond "low priority" E0-breaksnothing labels Nov 5, 2024
@brenzi brenzi changed the title shutdown on parentchain code update shutdown upon parentchain code update Nov 5, 2024
clangenb (Contributor) left a comment

Seems like a legit approach to me, given the caveats we are aware of that you mentioned.

app-libs/parentchain-interface/src/lib.rs (outdated, resolved)
sidechain/consensus/slots/src/lib.rs (outdated, resolved)
service/src/enclave/tls_ra.rs (outdated, resolved)
@brenzi brenzi merged commit 98accb3 into master Nov 6, 2024
13 of 27 checks passed
brenzi (Collaborator, Author) commented Nov 12, 2024

field test almost successful. Paseo had a runtime upgrade today:
https://paseo.subscan.io/block/3796856

the worker did shut down:

[L1Event:TargetA] CodeUpdated. Initiating service shutdown to allow clean restart
[L1Event:TargetA] Subscription terminated
....
[!] Sidechain block pruning loop has terminated
....
[!] Sidechain block production loop has terminated
....
[!] [Integritee] parentchain block syncing has terminated
......
[!] [TargetA] parentchain block syncing has terminated
....
[L1Event:Integritee] Subscription terminated
[!] waiting for 4 sensitive threads to shut down gracefully
[!] All threads stopped gracefully.

and restart:

aesm_service: warning: Turn to daemon. Use "--no-daemon" option to execute in foreground.
[2024-11-12T12:43:27.150Z INFO  integritee_service::config] Starting service in existing directory /opt/sidechain.
....
[Integritee] last synced parentchain block: 1520817
....
[>] DCAP setup: register QE collateral
...
[2024-11-12T12:43:33.189Z WARN  substrate_api_client::rpc::tungstenite_client::client] Expected subscription, but received an id response instead: Object {"error": Object {"code": Number(1014), "data": String("The transaction has too low priority to replace another transaction already in the pool."), "message": String("Priority is too low: (1353 vs 333)")}, "id": String("1"), "jsonrpc": String("2.0")}

From then on, the validateer was stuck (RPC still responded!) and needed a manual restart. See #1647

Linked issue: shut down worker service gracefully whenever any parentchain dispatches System.CodeUpdated event