triage: prover node quotes aren't getting gossiped #9660

Closed
Tracked by #9712
just-mitch opened this issue Nov 1, 2024 · 3 comments

just-mitch commented Nov 1, 2024

A fresh bot sends a transaction that the boot node adds to its mempool, but the transaction is then dropped with no explanation.

This was partly because a reorg occurred after the TX was submitted.

The reorg, in turn, occurred partly because the prover node's quote wasn't gossiped beyond the boot node.

Steps to repro

./scripts/earthly-local ./yarn-project/+export-e2e-test-images

cd yarn-project/end-to-end

AZTEC_DOCKER_TAG=yournewtag NAMESPACE=smoke FRESH_INSTALL=true VALUES_FILE="48-validators.yaml" ./scripts/network_test.sh ./src/spartan/smoke.test.ts

Assuming the network comes up, restart the bot and watch its logs.

Explore-logs-2024-11-01 14_27_30.json

just-mitch added labels Nov 1, 2024: P-high 🔥 (Priority: high. Do this task next.); S-needs-triage (Status: This new issue/PR needs to be triaged.); T-bug (Type: Bug. Something is broken.)
just-mitch added this to the Sequencer & Prover Testnet milestone Nov 1, 2024
just-mitch self-assigned this Nov 1, 2024

just-mitch commented Nov 1, 2024

Re-ran the repro with aztec:libp2p_service logs enabled on the boot node. I'm seeing:

Already connected to peer 16Uiu2HAkxo...
Handling discovered peer 16Uiu2HAkxo...

These two messages repeat multiple times per second. I filed #9674.
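To put a number on how often these messages repeat, the boot node's log output could be tallied. This is only a minimal sketch: it assumes the logs are available as plain text on stdin, one message per line, which may not match the actual log format. The function name `count_peer_messages` is hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): tally the two repeated
# libp2p_service messages read from stdin.
# Assumption: plain-text logs, one message per line.
count_peer_messages() {
  awk '
    /Already connected to peer/ { connected++ }
    /Handling discovered peer/  { discovered++ }
    END {
      # Uninitialized awk counters print as 0 with %d.
      printf "already_connected=%d handling_discovered=%d\n", connected, discovered
    }
  '
}
```

Piping the boot node's logs into `count_peer_messages` for a fixed time window gives a rough rate for each message.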


just-mitch commented Nov 2, 2024

It looks like the transaction was dropped after a reorg. At least two issues come out of this:

  1. I saw a node try to build the block when it had only about 5 seconds left. I filed "node needs a minimum time for block building" (#9675).
  2. The prover node's quote isn't getting gossiped. I can see it land on the boot node, but it goes no further.

I will repurpose this issue to address point 2. The next triage step is to re-run with libp2p debug logs enabled on all the nodes to confirm that they in fact receive nothing. (I don't see any node even attempting to claim the quote, though that could be due to other issues.)

I also need to enable discv5 logs on the prover node: it should be broadcasting to more than just the boot node.

There is, however, still a serious, separate issue: I started the 48 validator test in AWS and got TX timeouts from the bot before a reorg was possible. I am tracking that in #9676.

just-mitch changed the title from "TX getting dropped for no apparent reason" to "Prover node quotes aren't getting gossiped" Nov 2, 2024
just-mitch changed the title from "Prover node quotes aren't getting gossiped" to "triage: prover node quotes aren't getting gossiped" Nov 7, 2024
just-mitch removed labels Nov 7, 2024: P-high 🔥 (Priority: high. Do this task next.); S-needs-triage (Status: This new issue/PR needs to be triaged.); T-bug (Type: Bug. Something is broken.)
just-mitch commented

I ran the 48 validator setup, and we had over 75 consecutively proven epochs. Closing this until we have a new concrete occurrence.
