Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crates/sui): add --with-data-ingestion to 'sui start' #20472

Merged
merged 6 commits into from
Nov 29, 2024

Conversation

unmaykr-aftermath
Copy link
Contributor

@unmaykr-aftermath unmaykr-aftermath commented Nov 29, 2024

Description

Adds the --data-ingestion-dir option to sui start that allows users to set a custom data ingestion directory where checkpoint files will be written to.

This is useful, among other things, for testing a custom local reader indexing setup with low overhead.

This is not a breaking change, as sui start --with-indexer works as usual by creating a temporary directory for data ingestion, but now one can:

  • Use sui start --data-ingestion-dir <path> --with-indexer to also have checkpoints written to <path>. May be useful for inspecting the data
  • Use sui start --data-ingestion-dir <path> to only have checkpoints written to <path>, but not start an indexer service

P.s.: every time I ran sui start --with-data-ingestion=<path> (no other arguments), the network started producing checkpoints from 0. Is this intended or was the network supposed to pick up from where it left off? Perhaps it's the full node syncing.

Test plan

I tested it locally to verify that the checkpoints are indeed written to the configured directory. See the screenshot below. sui start --with-indexer still runs without problems and I verified that checkpoints were committed to the DB.

image

I set export SUI_CONFIG_DIR=~/.sui/test for the above.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI: Added the --data-ingestion-dir option to sui start to set a custom directory where to write the checkpoint files.
  • Rust SDK:
  • REST API:

Copy link

vercel bot commented Nov 29, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 29, 2024 8:57pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Nov 29, 2024 8:57pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Nov 29, 2024 8:57pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Nov 29, 2024 8:57pm

Copy link
Member

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @unmaykr-aftermath ! The change seems reasonable. I've left some suggestions for how to preserve the backwards compatibility.

I'm not very familiar with how the local file based data ingestion is supposed to work, so cc @bmwill and @phoenix-o to judge whether the behaviour of always starting from the genesis checkpoint is expected.

crates/sui/src/sui_commands.rs Outdated Show resolved Hide resolved
Comment on lines 644 to 650
// the indexer requires to set the fullnode's data ingestion directory
// note that this overrides the default configuration that is set when running the genesis
// command, which sets data_ingestion_dir to None.
ensure!(
with_data_ingestion.is_some(),
"Cannot start the indexer without --with-data-ingestion."
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do something like this instead to retain the backwards compatibility:

Suggested change
// the indexer requires to set the fullnode's data ingestion directory
// note that this overrides the default configuration that is set when running the genesis
// command, which sets data_ingestion_dir to None.
ensure!(
with_data_ingestion.is_some(),
"Cannot start the indexer without --with-data-ingestion."
);
}
if with_indexer.is_some() && with_data_ingestion.is_none() {
with_data_ingestion = Some(tempdir()?.into_path())
}

@@ -734,7 +749,8 @@ async fn start(
pg_address.clone(),
None,
None,
Some(data_ingestion_path.clone()),
// We checked above that --with-data-ingestion is set if --with-indexer is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would need to change this comment along with the suggestion above.

Cargo.toml Outdated
@@ -604,7 +604,11 @@ anemo-tower = { git = "https://github.com/mystenlabs/anemo.git", rev = "e609f769

# core-types with json format for REST api
# sui-sdk-types = { version = "0.0.1", features = ["hash", "serde", "schemars"] }
sui-sdk-types = { git = "https://github.com/MystenLabs/sui-rust-sdk.git", rev = "31bd9da32a8057edc45da8f5e6a8f25b83919b93", features = ["hash", "serde", "schemars"] }
sui-sdk-types = { git = "https://github.com/MystenLabs/sui-rust-sdk.git", rev = "31bd9da32a8057edc45da8f5e6a8f25b83919b93", features = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't notice it. I had the taplo LSP running

Cargo.toml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the suggested change to how we handle the default/None case, we can get rid of all changes here.

@stefan-mysten
Copy link
Contributor

Thanks @unmaykr-aftermath. Just a couple of nits as I was thinking about this, mostly food for thought.
The way I thought of with- flags is that they correspond to specific services. In that sense, I'd think more of the with-data-ingestion to be more like set-ingestion-dir (or data-ingestion-dir or some variation), as this does not start a specific service but instead sets up a service's config in some specific way similarly to what we do with --pg-port` and other args. What do you think?

Finally, I second the suggestion to not break existing way of starting the local network with sui start --with-indexer as it will break a lot of people's configs (and our own CI - which can be fixed, but it's always annoying as people will be forced to learn what's the new correct way).

@stefan-mysten
Copy link
Contributor

Ah, and we should update the release notes with a description of what changed in the CLI section. I'll leave it to you and will double check before merging. Thanks again! @unmaykr-aftermath

@unmaykr-aftermath
Copy link
Contributor Author

Thanks @unmaykr-aftermath. Just a couple of nits as I was thinking about this, mostly food for thought. The way I thought of with- flags is that they correspond to specific services. In that sense, I'd think more of the with-data-ingestion to be more like set-ingestion-dir (or data-ingestion-dir or some variation), as this does not start a specific service but instead sets up a service's config in some specific way similarly to what we do with --pg-port` and other args. What do you think?

I agree. I'll go with --data-ingestion-dir. And I'll also see where to write the release notes. Thanks for the feedback

@stefan-mysten
Copy link
Contributor

And I'll also see where to write the release notes

This is right in the PR after Test Plan - the CLI checkbox is already selected, so we just need to add some description with the change.

The name is more semantically aligned and the mechanics are now backwards compatible
@unmaykr-aftermath
Copy link
Contributor Author

@amnn @stefan-mysten I updated the PR's desc. and made the suggested code changes

@unmaykr-aftermath unmaykr-aftermath temporarily deployed to sui-typescript-aws-kms-test-env November 29, 2024 20:19 — with GitHub Actions Inactive
@stefan-mysten
Copy link
Contributor

Thanks for the quick changes @unmaykr-aftermath. I'll give it a spin locally and then we can merge this!

@stefan-mysten
Copy link
Contributor

@unmaykr-aftermath looks good. I will fix the failing CI issues (tests + fmt) and then merge. Thanks for this.

@stefan-mysten stefan-mysten temporarily deployed to sui-typescript-aws-kms-test-env November 29, 2024 21:00 — with GitHub Actions Inactive
@stefan-mysten stefan-mysten enabled auto-merge (squash) November 29, 2024 21:01
@stefan-mysten stefan-mysten merged commit 4355b42 into MystenLabs:main Nov 29, 2024
46 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants