Skip to content
This repository has been archived by the owner on Jan 11, 2024. It is now read-only.

FM-150: Subnet deployment scripts #222

Merged
merged 44 commits into from
Sep 26, 2023
Merged

FM-150: Subnet deployment scripts #222

merged 44 commits into from
Sep 26, 2023

Conversation

adlrocha
Copy link
Contributor

@adlrocha adlrocha commented Aug 28, 2023

This PR adds a makefile to run two local test networks: one for one node and the second for four nodes.

Closes consensus-shipyard/ipc#240

WIP

The idea is to leverage cargo-make to configure the deployment and trigger the end-to-end.

@aakoshh
Copy link
Contributor

aakoshh commented Sep 3, 2023

I think these can live in just a top level infra directory, next to docker and scripts, and leave the fendermint directory for Rust code.

@dnkolegov dnkolegov marked this pull request as ready for review September 7, 2023 12:49
@dnkolegov dnkolegov requested a review from aakoshh September 7, 2023 12:49
docs/localnet.md Outdated Show resolved Hide resolved
docs/localnet.md Outdated Show resolved Hide resolved
docs/localnet.md Outdated Show resolved Hide resolved
docs/localnet.md Outdated Show resolved Hide resolved
docs/localnet.md Outdated Show resolved Hide resolved
@dnkolegov
Copy link
Contributor

As discussed sync, I haven't been able to run it in Linux. I keep seeing errors when starting cometbft due to some permission issue, and restarting the infra leads to having to remove certain directories manually as sudo. I will keep trying to debug it myself but would love your input here.

Fixed.

@dnkolegov
Copy link
Contributor

dnkolegov commented Sep 18, 2023

@adlrocha New commits fix UID/GID issues on Linux. Please test if it works for you.

I was also thinking of cargo-make as the orchestration front-end for all the infra. We can have the high-level recipes in the Makefile and delegate the core logic of the scripts on independent files if that is possible.

Then I do not understand why we need cargo-make at all. Should we not switch to classic make? Scripts are embedded mechanisms of cargo-make and it is strange for me to use external scripts and call them from cargo-make. That is what make can do.

If I understood @aakoshh's comment right, he meant the following:

  • there are a lot of legacy scripts I have not removed
  • docker-compose items can be written externally
  • cargo-make is the best tool to use here: I do not know, maybe. For me, it works fine. But cargo-make was chosen as a tool for implementation, not by me.

The PR says

The idea is to leverage cargo-make to configure the deployment and trigger the end-to-end

That is what PR does right now: it configures deployment using cargo-make.

I do not mind clearing all the old stuff and improving cargo-make, but I do not see how creating new "core" scripts and then calling them can help us, especially when we do not have much time. Moreover, this will not affect the functionality or final result.

The main question is whether testnet is useful and can be used.

If you or @aakoshh believe it is incorrect and unacceptable to have cargo-make file like that, then I would like to see concrete technical requirements on how this should be implemented otherwise, we can improve this forever.

UPDATE:

I separated all functionality using extend mechanism. Hope the new version looks better for you.

@aakoshh
Copy link
Contributor

aakoshh commented Sep 21, 2023

I agree that it's difficult to say when make is better than cargo-make, and how much to put into the Makefile or Makefile.toml versus scripts.

I can only speak from experience that I put too much in the Makefile, trying to rely on variables declared at the top to do cute commands as dependencies but they were much harder to understand than a chain of script invocations with explicit parameter passing would have been. In retrospect I would have tried to keep the Makefile as a set of memorable commands do very high level stuff and delegate to scripts ASAP.

The benefit of cargo-make to me were things like:

But I would not have attempted to do the steps I put into the init.sh to set up accounts, create keys, etc. That's where the Makefile in this PR started to break down for me: too many little steps that were chained together as dependencies, which reminded me of how I did the e2e tests for the agent.

I agree with your that you want to get testing ASAP but it's also about maintainability, hopefully we don't have to go in circles forever.

Comment on lines 14 to 15
--secret-key-from fendermint/testing/smoke-test/test-data/fendermint/keys/emily.sk \
--secret-key-to fendermint/testing/smoke-test/test-data/fendermint/keys/eric.sk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this setup crate these keys at all?

Copy link
Contributor

@dnkolegov dnkolegov Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I was waiting for Alfonso's examples. Removed not to add more questions.

Comment on lines +92 to +100
[tasks.testnet-config]
dependencies = [
"testnet-script-new-genesis",
"testnet-script-add-peers",
"testnet-script-new-key",
"testnet-script-new-account",
"testnet-script-new-gateway",
"testnet-script-share-genesis"
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon it's these and the node equivalent steps that benefit least from being in the Makefile.toml and chained with dependencies as individual tasks, as opposed to just one script that does it all.

But it's much better now that you have split it out into multiple .toml files, thanks for that!

"cometbft-pull",
"testnet-mkdir",
"testnet-cometbft-init",
"testnet-mkdir",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a duplicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

--top-down-check-period 10 \
--bottom-up-check-period 10 \
--msg-fee 10 \
--majority-percentage 66
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but strictly speaking 66% is less than 2/3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@aakoshh
Copy link
Contributor

aakoshh commented Sep 21, 2023

I don't see any blockers. Have you given any thought to this suggestion to remove the duplicated service definitions? It would make it easier to set up networks with arbitrary number of nodes.

@dnkolegov
Copy link
Contributor

dnkolegov commented Sep 21, 2023

#!/bin/bash

docker network create localnet || echo "network already exists"

for i in {0..1}; do
  docker-compose --env-file node${i}.env --project-name node${i} up -d
done

I tried your suggestion. It did not work well for me when I tried it for 12 containers. The main reason is that instances are started sequentially, it took 40 seconds for me to start the testnet since we use a health check for comet-bft nodes.

I also tried include - It does not work either since it requires different names for all services.

In the future, we can write a simple generator and use it - https://github.com/cometbft/cometbft/blob/main/cmd/cometbft/commands/testnet.go

@aakoshh
Copy link
Contributor

aakoshh commented Sep 25, 2023

Thanks for trying. Maybe some simple poor-man's templating solution with sed could help.

@adlrocha
Copy link
Contributor Author

Hey, @dnkolegov, thank you for all the changes, amazing job. Dividing the jobs into different toml files makes it way more readable.

I tried your suggestion. It did not work well for me when I tried it for 12 containers. The main reason is that instances are started sequentially, it took 40 seconds for me to start the testnet since we use a health check for comet-bft nodes.

Regarding this outstanding issue, my take would be to use this approach even if it takes around a minute to bootstrap the network. We can address this in a follow up PR once we have the full end-to-end deployment of subnets fleshed out.
either through simple code generation or using some kind of simple templating scheme as @aakoshh suggests.

At this early stage, I personally rather have non-performant but maintainable test so we don't introduce a lot of debt too soon (as was the case in the agent). So I would say, let's introduce the approach that used loops and let's get this one merged. Once I've taken it for a spin I will open tickets with potential follow-ups. WDYT?

@dnkolegov
Copy link
Contributor

Hey, @dnkolegov, thank you for all the changes, amazing job. Dividing the jobs into different toml files makes it way more readable.

I tried your suggestion. It did not work well for me when I tried it for 12 containers. The main reason is that instances are started sequentially, it took 40 seconds for me to start the testnet since we use a health check for comet-bft nodes.

Regarding this outstanding issue, my take would be to use this approach even if it takes around a minute to bootstrap the network. We can address this in a follow up PR once we have the full end-to-end deployment of subnets fleshed out. either through simple code generation or using some kind of simple templating scheme as @aakoshh suggests.

At this early stage, I personally rather have non-performant but maintainable test so we don't introduce a lot of debt too soon (as was the case in the agent). So I would say, let's introduce the approach that used loops and let's get this one merged. Once I've taken it for a spin I will open tickets with potential follow-ups. WDYT?

Fixed

infra/up.sh Outdated
PORT3=$((PORT3+1))
done

wait $(jobs -p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never seen this before, very clever 👍

Copy link
Contributor Author

@adlrocha adlrocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnkolegov , one last fix before merging, when running it in Linux from scratch I get: network testnet declared as external, but could not be found

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IPC: Child subnet deployment scripts
3 participants