Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist and load peers from separate database #1935

Merged
merged 6 commits into from
Jul 16, 2024
Merged

Conversation

weiihann
Copy link
Contributor

@weiihann weiihann commented Jul 9, 2024

Description

This PR adds a P2P feature such that peers discovered and stored in the DHT table are persisted in the database upon node stop. When node is started, the previously stored peers are loaded from the database.

Rationale

See #1908.

Nodes may reconnect to each other once they are back online. However, this is not immediate. For a better user experience, nodes should immediately reconnect to the peers that they have found previously.

Implementation Details

Peers info is stored in the mainbase. Peers info is stored as a key-value pair, where the key is the prefix + peer ID and the value is the encoded list of multi addresses. The encoding is done through the gob data format with the encoding/gob module.

Future Improvement

The current implementation persists peers only when the node is stopped. If a peer is disconnected, it will not be persisted. We should persist the peers as we discover them.

This is technically possible, but slightly troublesome. We can subscribe to libP2P events when a peer is connected, but the event only surfaces the peer ID but not the multi-address. Therefore, we have to lookup the table using the peer ID to obtain the relevant multi-addresses. This adds slight complexity.

Or, another simpler idea is just to persist the peer store every certain interval.

Copy link

codecov bot commented Jul 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.48%. Comparing base (5698ff5) to head (47e41eb).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1935   +/-   ##
=======================================
  Coverage   75.48%   75.48%           
=======================================
  Files          97       97           
  Lines        8675     8676    +1     
=======================================
+ Hits         6548     6549    +1     
+ Misses       1546     1545    -1     
- Partials      581      582    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

node/node.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
@weiihann weiihann force-pushed the weiihann/persist-peers branch from 3ceeb4d to e425c14 Compare July 16, 2024 08:57
@weiihann weiihann force-pushed the weiihann/persist-peers branch from e425c14 to d4ec95a Compare July 16, 2024 09:10
node/node.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
p2p/p2p.go Outdated Show resolved Hide resolved
Copy link
Contributor

@IronGauntlets IronGauntlets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kirugan kirugan merged commit 50f6c92 into main Jul 16, 2024
10 checks passed
@kirugan kirugan deleted the weiihann/persist-peers branch July 16, 2024 14:23
kirugan added a commit that referenced this pull request Jul 23, 2024
Remove unused imports, added versioned constants for 0.13.1.1 to rust constants

Add concurrency_mode parameter to VM methods

Fix calls in tests, add StateMaps -> StateDiff conversion

debugging issue

Replace dispatch token witn app one in pipelines (#1916)

Remove ForceFetchingTracesForBlocks feature (#1910)

l1: avoid spam calling L1 RPC subscription (#1918)

* Add retry logic for retrieving L1 finalised height

* Sanitize eth_getBlockByNumber unmarshal method

* l1: add EthSubscriber test

Start using Artifactory for CI/CD in favour of Docker Registry (#1917)

Due security reasons, we had to stop using the dispatch token and
start using the GitHub App in order to trigger the deployment in
argo.  Because argo is a private repository, we can't trigger from
a public one (juno), so then we start to change the approach to first
push the docker images to jFrog Artifactory, then argo will be notified
that a new image was pushed, then it will trigger the deployment

Extra Tasks:
- Run YAML formatter on build-and-deploy workflow:
Having a well formated file makes it easier to read and for people
to contribute

- Remove unnecessary IMAGE_TAG from build-and-deploy.yml:
Instead of using both env.DOCKER_IMAGE_TAG and output.IMAGE_TAG, only
use one of them.

- Improve readability of stages in build-and-deploy.yml:
Rename stages to make it easier to understand what's going on.
For example from 'deploy_to_dev' to 'validate_dev' in order to
include that some tests will be run on the environment

- Set common env var in the root of the file:
Some of the env vars are being used in multiple stages,
so instead of having to hard-code some small differences
in multiple places, bring it all back to a root level
where it's easier to see what changes for what environment.

updating verify proof range to handle empty proof keys (#1901)

* implement VerifyRangeProof

* Start using Artifactory for CI/CD in favour of Docker Registry

Due security reasons, we had to stop using the dispatch token and
start using the GitHub App in order to trigger the deployment in
argo.  Because argo is a private repository, we can't trigger from
a public one (juno), so then we start to change the approach to first
push the docker images to jFrog Artifactory, then argo will be notified
that a new image was pushed, then it will trigger the deployment

Extra Tasks:
- Run YAML formatter on build-and-deploy workflow:
Having a well formated file makes it easier to read and for people
to contribute

- Remove unnecessary IMAGE_TAG from build-and-deploy.yml:
Instead of using both env.DOCKER_IMAGE_TAG and output.IMAGE_TAG, only
use one of them.

- Improve readability of stages in build-and-deploy.yml:
Rename stages to make it easier to understand what's going on.
For example from 'deploy_to_dev' to 'validate_dev' in order to
include that some tests will be run on the environment

- Set common env var in the root of the file:
Some of the env vars are being used in multiple stages,
so instead of having to hard-code some small differences
in multiple places, bring it all back to a root level
where it's easier to see what changes for what environment.

---------

Co-authored-by: rian <[email protected]>
Co-authored-by: Mario Apra <[email protected]>

Update CI/CD pipeline naming, image repository, and secrets var (#1919)

Fix pipeline deployment with correct access token (#1921)

Use Nubia user instead of Angkor for artifactory

Add --versioned-constants-file flag (#1920)

Misc Fixes - ignore juno artifacts in .gitignore and add requirement for implicit dependency on `pkg-config` (#1923)

* chore: Ignore Juno artifacts in .gitignore

* chore: Update system dependencies in README.md

Turns out pkg-config is required:

```
$ make juno
...
jemalloc: exec: "pkg-config": executable file not found in $PATH
```

Update db snapshot for mainnet (#1924)

Use ctx in BucketMigrator (#1521)

Fix corrupted db snapshot (#1927)

Fix how to staging promotion (#1928)

Fix how to promote to staging and prod environments

Fix how to get DOCKER_IMAGE_TAG:

  By using GITHUB_ENV, only the steps on the same job will be
able to get the value.  When trying to get it from steps from
different jobs the value is empty.

  This was causing issues to promote to staging because it was
trying to promote EVERYTHING in dev to staging.  Luckly there
was a check to not allow to overwrite the same tag, so things
that were already promoted couldn't be overwritten

  By using GITHUB_OUTPUT instead, it is now possible to get the
variable on any step from any subsequent job

Add deployment version verification for staging and production:

  The CI/CD pipeline has been updated to include a new step for
verifying the deployment version in the staging and production
environments.
  This ensures that the correct version of the
Docker image is deployed to each environment before running the
tests

Add set -ux to deployment verification script:

- -u will treat any unitialized variable as an error
- -x will print the command that is being executed

  This will make sure that cases where the EXPECTED_VERSION
not being passed won't cause any problem.
  One example where the variable was not set and the job succeeded:
https://github.com/NethermindEth/juno/actions/runs/9800486115/job/27062611554#step:3:15

  By using -x, it will be useful to understand exactly what's
the expected version, so no confusion will be made

Add Manual gate to run tests in production

  Since the deployment to production is a manual process, creating
a artificial environment makes it possible for the tests to run
only after a deployment is actually done.
  This avoids the issue where the tests will start to run without
the environment be ready.

Fix indentation in ci-cd-pipeline.yml (#1931)

The test_in_production job had two extra spaces, making
the YAML file invalid, since it couldn't detect that this
was a job but as a step, and steps can't have the same
attributes as jobs

Update p2p implementation according to recent spec (#1858)

Update docs.starknet.io links (#1929)

Some of the links were broken and some of them were redirecting.
There were also cases where I couldn't find where it was refering
to, so I decided to just delete the link

This commit is co-authored.  I got the inspiration from
#1564 and I improved a
little bit

Co-authored-by: Krauspt <[email protected]>

Fix how to promote to staging/prod (#1932)

- Add checkout step to promote_to_staging in ci-cd-pipeline.yml

This is necessary because at the Verify Deployment Version step,
there is a call for a file that is in the repository

- Stop using jf cli to do the promotion

For some reason the jf rt dpr moves/copies only ‘image’ without manifests.
This means that instead of relying on jFrog to do the promotion, even
though it will be sub-optimal, relying on Docker seems to be safer

Safely enable uploading report to Codecov (#1939)

This is done by saving the coverage output as an workflow artifact
then loading it on a separate job and uploading it to codecov.

The reason why this is necessary is because when running the juno-test
workflow from a fork / untrusted dev, the secrets is not available.
This is done in order to secure the secrets to being exposed from
an atacker that might want to create a PR and get them.
More info: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

The second workflow will be trigger once the juno-test is completed
so it only needs to download the artifact and upload to codecov.
Since this workflow source code is only from main, in order for it
to contain malicious code, it would've been required to go through
a PR.

I've also added a small script to comment to the PR, so it's clear
that the codecov will be uploaded shortly.  This can be reverted
later on if it starts to become too spammy.

Fix how to upload coverage to codecov (#1943)

When I copied the Upload coverage to Codecov step from the juno-test
I forgot to remove the condition that it would only do it when running
on ubuntu-latest machine.

Also, the Comment on PR didn't work, so I'm just removing it since
it doesn't provide much benefit

Bump version of github actions (#1937)

Bump version of github actions

- actions/setup-go: Bumping it to v5.
With the v5, cache is done by default, so there is no need to
specify cache: true

- actions/checkout: bumping it to v4
A lot of different improvements, but the one that made me decide
to upgrade was to use node20 instead of node16.
Node16 has en end of life on 11 Sep 2023

- codecov/codecov-action: bumping it to v4 (v4.5.0)
It was hard-coded v4.2.0, meaning that the fixes around detecting
if running from a fork (v4.3, v4.4 and v.4.5) were not backported,
therefore making the job not being able to detect when it was
running from a fork

Remove duplicate libraries linked on MacOS (#1944)

Explanation: -ldl is dynamic linking library and -lm is math library. Both are not needed for Mac OS because the OS use different linking mechanism and math is part of C stdlib anyway.

Bump github.com/rs/cors from 1.10.1 to 1.11.0 (#1930)

Bumps [github.com/rs/cors](https://github.com/rs/cors) from 1.10.1 to 1.11.0.
- [Commits](rs/cors@v1.10.1...v1.11.0)

---
updated-dependencies:
- dependency-name: github.com/rs/cors
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bump github.com/ethereum/go-ethereum from 1.13.10 to 1.13.15 (#1859)

Bumps [github.com/ethereum/go-ethereum](https://github.com/ethereum/go-ethereum) from 1.13.10 to 1.13.15.
- [Release notes](https://github.com/ethereum/go-ethereum/releases)
- [Commits](ethereum/go-ethereum@v1.13.10...v1.13.15)

---
updated-dependencies:
- dependency-name: github.com/ethereum/go-ethereum
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Add new linters and fix their errors (#1936)

Update blockifier, fix todos, update fgw response objects

Draft impl of new block hashing scheme

Update blockifier to 0.8.0-rc.0

Persist and load peers from separate database (#1935)

Rename trie/TransactionStorage to Storage (#1788)

Separate tests with coverage and without coverage (#1953)

Running code coverage slows down the run by ~20%.
The commit separates the tests into two different jobs
based on the operating system. The "Tests (Coverage)"
job runs the tests with coverage and is only triggered
on the "ubuntu-latest" OS, while the "Tests (No Coverage)"
job runs the tests without coverage on other OSes.
This change improves the test workflow and ensures that
coverage only runs on the os that does uploads the result
to Codecov.

Daniil/linter-update (#1957)

* golangci-lint updated to v1.59.1
* The configuration option `run.skip-dirs` is deprecated, use `issues.exclude-dirs`
* The configuration option `linters.govet.check-shadowing` is deprecated. Enabled `shadow` instead
* The linter 'gomnd' is deprecated (since v1.58.0) due to: The linter has been renamed. Replaced by mnd.
* directive `//nolint:goconst` is unused for linter "goconst". Deleted comments.

Add enumer tool. Generate using it helper functions for Bucket type (#1958)

Add test race detection to pipeline. Fix race conditions in tests (#1956)

Fix linter issues

Update cargo dependencies

Implement receipt Hash()

Fix receipt commitment

Remove vm

Update vm

Disable make test-race in github workflow (#1960)

Update juno command's Use field

Update TestConfigPrecedence comment

Add db-size subcommand to juno command

Use cmd.OutorStdout() for printing in cmd/juno

Add CalculatePrefixSize()

Add db/pebble.NewWithOptions

Calculate Juno's DB size

Move GenP2PKeyPari to genp2pkeypair.go

Move DBSize to dbsize.go

Include semantic version in agent name (#1906)

* Include client version in p2p agent name

* Construct agent name in p2p package

* Use fmt.Sprintf to construct agent name

Use previously Unused bucket for Peers' storage (#1962)

Fix linter issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants