FM-150: Subnet deployment scripts #222

adlrocha · 2023-08-28T08:58:49Z

This PR adds a makefile to run two local test networks: one with a single node and one with four nodes.

WIP

The idea is to leverage cargo-make to configure the deployment and trigger the end-to-end.

aakoshh · 2023-09-03T07:54:29Z

I think these can live in just a top level infra directory, next to docker and scripts, and leave the fendermint directory for Rust code.

docs/localnet.md

fendermint/app/src/cmd/key.rs

fendermint/app/src/options/key.rs

dnkolegov · 2023-09-18T19:58:54Z

As discussed in sync, I haven't been able to run it on Linux. I keep seeing permission errors when starting cometbft, and restarting the infra means manually removing certain directories with sudo. I will keep trying to debug it myself but would love your input here.

Fixed.

dnkolegov · 2023-09-18T20:16:15Z

@adlrocha, the new commits fix the UID/GID issues on Linux. Please test whether it works for you.

I was also thinking of cargo-make as the orchestration front-end for all the infra. We can keep the high-level recipes in the Makefile and delegate the core logic to independent script files, if that is possible.

Then I do not understand why we need cargo-make at all. Should we not switch to classic make? Embedded scripts are a core mechanism of cargo-make, so it seems strange to me to write external scripts and call them from cargo-make. Plain make can do that.

If I understood @aakoshh's comment right, he meant the following:

there are a lot of legacy scripts I have not removed
docker-compose items can be written externally
cargo-make is the best tool to use here: I do not know, maybe. For me, it works fine. But I was not the one who chose cargo-make for the implementation.

The PR says

The idea is to leverage cargo-make to configure the deployment and trigger the end-to-end

That is what the PR does right now: it configures the deployment using cargo-make.

I do not mind cleaning up all the old stuff and improving the cargo-make setup, but I do not see how creating new "core" scripts and then calling them would help us, especially when we do not have much time. Moreover, it would not change the functionality or the final result.

The main question is whether the testnet is useful and usable.

If you or @aakoshh believe it is incorrect and unacceptable to have a cargo-make file like that, then I would like to see concrete technical requirements for how it should be implemented instead; otherwise, we could keep improving this forever.

UPDATE:

I split all the functionality into separate files using cargo-make's extend mechanism. I hope the new version looks better to you.

aakoshh · 2023-09-21T13:28:53Z

I agree that it's difficult to say when make is better than cargo-make, and how much to put into the Makefile or Makefile.toml versus scripts.

I can only speak from experience: I put too much into the Makefile, relying on variables declared at the top to build clever commands as dependencies, and these were much harder to understand than a chain of script invocations with explicit parameter passing would have been. In retrospect, I would have kept the Makefile as a set of memorable commands that do very high-level things and delegated to scripts as soon as possible.

The benefits of cargo-make for me were things like:

profile-based overrides for env vars
unconditional cleanup task
Rust scripts would have been cool, but I did not need them

But I would not have attempted to move the steps I put into init.sh (setting up accounts, creating keys, etc.) into the Makefile. That's where the Makefile in this PR started to break down for me: too many small steps chained together as dependencies, which reminded me of how I did the e2e tests for the agent.

I agree with you that you want to start testing ASAP, but it's also about maintainability; hopefully we don't have to go in circles forever.
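This relates to the "unconditional cleanup task" point above. In plain bash, an `EXIT` trap usually provides a comparable guarantee. The following is a minimal sketch that assumes a bash script rather than cargo-make. `run_with_cleanup` and the cleanup body are hypothetical placeholders and are not part of this PR.

```shell
#!/usr/bin/env bash
# Sketch: run a command and guarantee a cleanup step afterwards,
# whether the command succeeds or fails. The subshell keeps the
# EXIT trap local to each invocation.
run_with_cleanup() {
  (
    cleanup() {
      # Hypothetical placeholder: a real script might stop containers
      # or remove temporary directories here.
      echo "cleanup: tearing down"
    }
    trap cleanup EXIT
    echo "running: $*"
    "$@"   # the subshell's exit status is this command's status
  )
}

run_with_cleanup true
run_with_cleanup false || echo "command failed, cleanup still ran"
```

The trap fires when the subshell exits, so the teardown runs on both paths while the original exit status is still passed back to the caller.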

aakoshh · 2023-09-21T13:44:12Z

infra/scripts/examples.toml

+  --secret-key-from fendermint/testing/smoke-test/test-data/fendermint/keys/emily.sk \
+  --secret-key-to   fendermint/testing/smoke-test/test-data/fendermint/keys/eric.sk


Does this setup create these keys at all?

No. I was waiting for Alfonso's examples. I removed them so as not to raise more questions.

aakoshh · 2023-09-21T13:48:36Z

infra/scripts/testnet.toml

+[tasks.testnet-config]
+dependencies = [
+    "testnet-script-new-genesis",
+    "testnet-script-add-peers",
+    "testnet-script-new-key",
+    "testnet-script-new-account",
+    "testnet-script-new-gateway",
+    "testnet-script-share-genesis"
+]


I reckon it's these, and the equivalent node steps, that benefit least from living in the Makefile.toml as individual tasks chained with dependencies, as opposed to a single script that does it all.

But it's much better now that you have split it out into multiple .toml files, thanks for that!
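The single-script alternative mentioned above might look like the following bash sketch. One entry point runs the same steps as the `testnet-config` task, in the same order, and passes its parameters explicitly. The step functions are hypothetical stubs that only echo; a real script would call the fendermint CLI at those points. The directory, node count, and key name arguments are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stubs mirroring the tasks chained in testnet.toml.
new_genesis()   { echo "new-genesis   dir=$1"; }
add_peers()     { echo "add-peers     dir=$1 nodes=$2"; }
new_key()       { echo "new-key       dir=$1 name=$2"; }
new_account()   { echo "new-account   dir=$1 name=$2"; }
new_gateway()   { echo "new-gateway   dir=$1"; }
share_genesis() { echo "share-genesis dir=$1 nodes=$2"; }

# One entry point with explicit parameters instead of a dependency chain.
testnet_config() {
  local dir=$1 nodes=$2 key_name=$3
  new_genesis   "$dir"
  add_peers     "$dir" "$nodes"
  new_key       "$dir" "$key_name"
  new_account   "$dir" "$key_name"
  new_gateway   "$dir"
  share_genesis "$dir" "$nodes"
}

testnet_config ./testnet 4 validator
```

A single high-level cargo-make task could then invoke this script, which keeps the Makefile.toml down to memorable commands.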

aakoshh · 2023-09-21T13:49:14Z

infra/scripts/testnet.toml

+    "cometbft-pull",
+    "testnet-mkdir",
+    "testnet-cometbft-init",
+    "testnet-mkdir",


Seems like a duplicate.

aakoshh · 2023-09-21T13:50:48Z

infra/scripts/node.toml

+    --top-down-check-period 10 \
+    --bottom-up-check-period 10 \
+    --msg-fee 10 \
+    --majority-percentage 66


Minor, but strictly speaking 66% is less than 2/3.
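Since 2/3 is 66.67% (repeating), the smallest whole percentage that reaches 2/3 is 67. The bash check below compares the fractions using integer cross-multiplication, because p/100 >= 2/3 exactly when 3p >= 200. This thread does not say whether `--majority-percentage` is compared with `>` or `>=` downstream, so the sketch only compares the raw fractions.

```shell
#!/usr/bin/env bash
# Does an integer percentage p reach 2/3? Bash has no floating point,
# so compare p/100 >= 2/3 via cross-multiplication: 3*p >= 200.
reaches_two_thirds() {
  local p=$1
  (( 3 * p >= 200 ))
}

for p in 66 67; do
  if reaches_two_thirds "$p"; then
    echo "${p}% >= 2/3"
  else
    echo "${p}% < 2/3"
  fi
done
```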

aakoshh · 2023-09-21T13:53:08Z

I don't see any blockers. Have you given any thought to this suggestion to remove the duplicated service definitions? It would make it easier to set up networks with an arbitrary number of nodes.

dnkolegov · 2023-09-21T20:04:55Z

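#!/bin/bash

# Create the shared network; if creation fails (e.g. it already exists), report and continue.
docker network create localnet || echo "network already exists"

# Start one compose project per node, each with its own env file.
for i in {0..1}; do
  docker-compose --env-file "node${i}.env" --project-name "node${i}" up -d
done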

I tried your suggestion. It did not work well for me when I tried it for 12 containers. The main reason is that instances are started sequentially, it took 40 seconds for me to start the testnet since we use a health check for comet-bft nodes.

I also tried include; it does not work either, since it requires different names for all services.

In the future, we can write a simple generator and use it - https://github.com/cometbft/cometbft/blob/main/cmd/cometbft/commands/testnet.go

aakoshh · 2023-09-25T08:36:16Z

Thanks for trying. Maybe a simple poor-man's templating solution with sed could help.
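Here is a bash sketch of the sed-based templating idea. A single service template with placeholder tokens is rendered once per node. The template, the `__NODE__`/`__PORT__` tokens, and the port layout are hypothetical. A real service definition would also need the env files, volumes, networks, and health checks used in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical template for one node's compose service; __NODE__ and
# __PORT__ are placeholder tokens chosen for this sketch.
template='  node-__NODE__:
    image: cometbft/cometbft
    ports:
      - "__PORT__:26657"'

# Render one service block per node by substituting the placeholders.
render_services() {
  local count=$1 base_port=$2 i
  echo "services:"
  for ((i = 0; i < count; i++)); do
    printf '%s\n' "$template" \
      | sed -e "s/__NODE__/${i}/g" -e "s/__PORT__/$((base_port + i))/g"
  done
}

render_services 3 26657
```

Redirecting the output to a file (for example `render_services 12 26657 > docker-compose.generated.yml`) would produce one service per node without hand-duplicating definitions.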

adlrocha · 2023-09-25T10:54:19Z

Hey, @dnkolegov, thank you for all the changes, amazing job. Dividing the jobs into different toml files makes it way more readable.

I tried your suggestion. It did not work well for me when I tried it for 12 containers. The main reason is that instances are started sequentially, it took 40 seconds for me to start the testnet since we use a health check for comet-bft nodes.

Regarding this outstanding issue, my take would be to use this approach even if it takes around a minute to bootstrap the network. We can address this in a follow-up PR once we have the full end-to-end deployment of subnets fleshed out, either through simple code generation or some kind of simple templating scheme, as @aakoshh suggests.

At this early stage, I would personally rather have non-performant but maintainable tests, so we don't introduce a lot of debt too soon (as was the case in the agent). So I would say, let's introduce the approach that uses loops and get this one merged. Once I've taken it for a spin, I will open tickets with potential follow-ups. WDYT?

dnkolegov · 2023-09-25T19:16:41Z

Fixed

aakoshh · 2023-09-26T11:21:26Z

infra/up.sh

+	PORT3=$((PORT3+1))
+done
+
+wait $(jobs -p)


Never seen this before, very clever 👍
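The idiom works because `jobs -p` prints the process IDs of the shell's background jobs, and `wait` blocks until each listed process exits. A plain `wait` with no arguments would also wait for all background children. The following minimal bash sketch shows the same pattern. `start_node` is a hypothetical stand-in that sleeps instead of starting a container, and the port numbers are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for starting one node; a real script would run
# docker compose here.
start_node() {
  local id=$1 port=$2
  sleep 1
  echo "node${id} up on port ${port}"
}

start_all() {
  local count=$1 port=$2 i
  for ((i = 0; i < count; i++)); do
    start_node "$i" "$port" &   # start in the background
    port=$((port + 1))          # next node gets the next port
  done
  # Unquoted on purpose: each PID from `jobs -p` becomes a separate argument.
  wait $(jobs -p)
}

# The nodes start concurrently, so this takes about one second, not four.
time start_all 4 26657
```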

adlrocha

@dnkolegov, one last fix before merging: when running it on Linux from scratch, I get "network testnet declared as external, but could not be found".
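One possible fix is to create the external network before bringing the compose projects up; this thread does not show how the PR resolves it. The bash sketch below uses `docker network inspect` and `docker network create`. `ensure_network` is a hypothetical helper, and `testnet` is the network name from the error. The `docker network create localnet || echo ...` line in the earlier loop suggestion masks every failure. This version skips creation only when the network really exists.

```shell
#!/usr/bin/env bash
# Create the Docker network that the compose files declare as external,
# but only if `docker network inspect` cannot find it.
ensure_network() {
  local name=${1:-testnet}
  if ! docker network inspect "$name" >/dev/null 2>&1; then
    docker network create "$name"   # a failure here propagates to the caller
  fi
}

# Usage (requires Docker), before any `up`: ensure_network testnet
```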

adlrocha and others added 2 commits August 28, 2023 10:55

wip: first few strokes

06040d1

Merge branch 'main' into fm-150-deployment-scripts

198f0ef

dnkolegov added 7 commits September 4, 2023 16:18

Update makefile

44d0db1

update

67f74ea

fix misprints

a0cad1c

start working on testnet config

d81a940

add persistent peers

d594806

fmt

14dbaba

Add docs and polish

2f56b51

dnkolegov marked this pull request as ready for review September 7, 2023 12:49

dnkolegov requested a review from aakoshh September 7, 2023 12:49

dnkolegov added 4 commits September 7, 2023 15:09

Merge branch 'main' into fm-150-deployment-scripts

842981e

Fix misprint in key.rs

88d1e51

Remove unused imports key.rs

2b9d626

move infra

60ac4f4

aakoshh reviewed Sep 7, 2023

docs/localnet.md (outdated)

aakoshh reviewed Sep 7, 2023

docs/localnet.md (outdated)