Das batch store chunking #2341

Merged
merged 20 commits into from
Jun 6, 2024

@Tristan-Wilson Tristan-Wilson commented May 25, 2024

Add RPC for chunked send of DAS batches

This adds support for the batch poster in AnyTrust configuration to send batches to the DA committee in chunks of a configurable maximum HTTP POST body size. It adds three new RPC methods to the daserver executable's RPC server: das_startChunkedStore, das_sendChunk, and das_commitChunkedStore. Clients created by the batch poster use these methods automatically when they detect that the committee server supports them, and otherwise fall back to the legacy das_store method. This allows either the client or the server to be rolled out first.
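The fallback behaviour can be illustrated with a minimal sketch. It is not the Nitro client code: the description does not say how availability is detected, so the sketch assumes the server answers an unknown method with the standard JSON-RPC 2.0 "Method not found" code (-32601). `RpcError`, `call`, `store_batch`, and `chunked_store` are hypothetical names introduced for illustration.

```python
METHOD_NOT_FOUND = -32601  # standard JSON-RPC 2.0 "Method not found" code


class RpcError(Exception):
    """Hypothetical error type carrying a JSON-RPC error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def store_batch(call, batch, chunked_store):
    """Prefer the chunked protocol; fall back to das_store if it is missing.

    `call(method, *params)` is a hypothetical transport function and
    `chunked_store(call, batch)` a hypothetical helper that runs the
    start/send/commit sequence. Neither is a real Nitro API.
    """
    try:
        return chunked_store(call, batch)
    except RpcError as err:
        if err.code != METHOD_NOT_FOUND:
            raise  # a genuine failure, not a missing method
        # Assumption: an older server rejects the new method names,
        # so the legacy single-request store is used instead.
        return call("das_store", batch)
```

With this shape, a new client talking to an old server degrades to das_store, and an old client talking to a new server keeps using das_store, which is why either side can be deployed first.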

The payloads of the new RPC methods are all signed by the batch poster. As basic DoS prevention, at most 10 uncommitted stores can be outstanding, uncommitted stores expire after a minute, and a das_startChunkedStore with the same arguments is not replayable after a minute. The batch poster should only be trying to store one batch at a time, so this should be sufficient.
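As a rough illustration of the first two limits (at most 10 uncommitted stores, each expiring after a minute), the following sketch tracks pending stores with an injectable clock. The class and method names are hypothetical and the real server's bookkeeping may differ; signature checks and replay protection are left out.

```python
import time

MAX_PENDING = 10       # from the description: at most 10 uncommitted stores
EXPIRY_SECONDS = 60.0  # from the description: uncommitted stores expire after a minute


class PendingStores:
    """Tracks uncommitted chunked stores (hypothetical helper, not Nitro code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}  # batch id -> start time in seconds

    def _expire(self):
        # Drop every store that was started a minute or more ago.
        cutoff = self._clock() - EXPIRY_SECONDS
        self._started = {k: t for k, t in self._started.items() if t > cutoff}

    def start(self, batch_id):
        self._expire()
        if len(self._started) >= MAX_PENDING:
            raise RuntimeError("too many uncommitted stores")
        self._started[batch_id] = self._clock()

    def commit(self, batch_id):
        self._expire()
        if self._started.pop(batch_id, None) is None:
            raise KeyError("unknown or expired store")
```

Because expired entries are purged before each check, an abandoned transfer frees its slot after a minute without any background task.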

The new option --node.data-availability.rpc-aggregator.max-store-chunk-body-size is expressed in terms of the HTTP POST body size that the operator wants the chunk requests to stay under. 512B of padding is also added to whatever the operator specifies here, since some proxies or endpoints may additionally count headers. This is orthogonal to settings like --node.batch-poster.max-size, which control the maximum uncompressed batch size assembled by the batch poster. This should allow the batch poster to create very large batches that are broken up into small chunks to be sent to the committee servers.
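To make the relationship between the body-size limit and the chunk payload concrete, here is a small sketch. Exactly how the 512 B of padding enters the calculation is an assumption: the sketch reserves it out of the operator's budget, and it also assumes payload bytes are hex-encoded in the JSON body (two characters per raw byte). `chunk_payload_size`, `split_batch`, and `envelope_overhead` are hypothetical names.

```python
def chunk_payload_size(max_body_size, envelope_overhead, header_padding=512):
    """Raw bytes per chunk so that the POST body stays under max_body_size.

    Assumptions (not confirmed by the description): the padding is reserved
    out of the operator's budget, and payload bytes are hex-encoded in the
    JSON body, so each raw byte costs two characters.
    """
    budget = max_body_size - envelope_overhead - header_padding
    if budget < 2:
        raise ValueError("max_body_size too small for any payload")
    return budget // 2


def split_batch(batch, chunk_size):
    """Split a batch into consecutive chunks of at most chunk_size bytes."""
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]
```

For example, under these assumptions a 5 KiB body limit with a 200-byte JSON envelope leaves 2204 raw bytes per chunk, so a 5000-byte batch is sent as three chunks. The batch size itself is bounded separately by --node.batch-poster.max-size.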

Once the client has received confirmation of its das_startChunkedStore request, it sends chunks in parallel using das_sendChunk. Once all chunks are sent, it calls das_commitChunkedStore, which causes the data to be stored on the server and returns the signed response to aggregate into the Data Availability Certificate.
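The start, parallel send, and commit sequence can be sketched as follows. This is an illustration only: `call(method, *params)` is a hypothetical transport, the parameter lists do not mirror the real RPC signatures, and payload signing is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_store(call, batch, chunk_size, max_workers=4):
    """Run start -> parallel sendChunk -> commit against `call`.

    `call(method, *params)` is a hypothetical transport; the parameters
    passed to each method are illustrative, not the real RPC signatures.
    """
    chunks = [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]

    # The server confirms the store and hands back an identifier for it.
    batch_id = call("das_startChunkedStore", len(chunks), chunk_size, len(batch))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(call, "das_sendChunk", batch_id, index, chunk)
            for index, chunk in enumerate(chunks)
        ]
        for future in futures:
            future.result()  # re-raises a chunk failure, if any

    # Only after every chunk is acknowledged is the server asked to store
    # the data and return its signed response.
    return call("das_commitChunkedStore", batch_id)
```

Sending each chunk with its index lets the server reassemble the batch regardless of arrival order, which is what makes the parallel send safe in this sketch.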

Server-side metrics are kept largely the same between chunked and non-chunked stores to minimize dashboard/alerting changes. For chunked transfers, the metrics have the following meanings:
arb_das_rpc_store_requests - Count of initiated chunked transfers
arb_das_rpc_store_success  - Successful commits of chunked transfers
arb_das_rpc_store_failure  - Failure at any stage of the chunked transfer
arb_das_rpc_store_bytes    - Bytes committed
arb_das_rpc_store_duration - Total duration of chunked transfer (ns)

Additionally, two new metrics have been added to count individual das_sendChunk requests:
arb_das_rpc_sendchunk_success
arb_das_rpc_sendchunk_failure
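One way to read the metric definitions is as the following sketch, in which a plain `Counter` stands in for the real metrics registry. `run_with_metrics`, `send_chunk`, and `commit` are hypothetical, chunks are sent sequentially for simplicity, and the duration is accumulated as a sum even though the real metric may be a timer or histogram.

```python
import time
from collections import Counter


def run_with_metrics(metrics, chunks, send_chunk, commit, clock=time.monotonic_ns):
    """Update counters the way the description assigns them to chunked stores.

    `metrics` is a plain Counter standing in for the real registry;
    `send_chunk` and `commit` are hypothetical callables that raise on failure.
    """
    start = clock()
    metrics["arb_das_rpc_store_requests"] += 1  # transfer initiated
    try:
        for chunk in chunks:
            try:
                send_chunk(chunk)
            except Exception:
                metrics["arb_das_rpc_sendchunk_failure"] += 1
                raise
            metrics["arb_das_rpc_sendchunk_success"] += 1
        commit()
    except Exception:
        metrics["arb_das_rpc_store_failure"] += 1  # failure at any stage
        raise
    metrics["arb_das_rpc_store_success"] += 1
    metrics["arb_das_rpc_store_bytes"] += sum(len(c) for c in chunks)
    metrics["arb_das_rpc_store_duration"] += clock() - start  # nanoseconds
```

In this reading a failed das_sendChunk increments both the per-chunk failure counter and the overall store failure counter, since the transfer as a whole cannot be committed.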

These are separate concerns, but were mixed together in one class.
@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label May 25, 2024
amsanghi previously approved these changes Jun 3, 2024

@amsanghi amsanghi left a comment

LGTM

anodar previously approved these changes Jun 3, 2024
ganeshvanahalli previously approved these changes Jun 3, 2024

@ganeshvanahalli ganeshvanahalli left a comment

LGTM

@joshuacolvin0 joshuacolvin0 dismissed stale reviews from amsanghi, ganeshvanahalli, and anodar via b1edb9b June 4, 2024 01:13
@Tristan-Wilson Tristan-Wilson enabled auto-merge June 6, 2024 16:24
@Tristan-Wilson Tristan-Wilson merged commit 2363b04 into master Jun 6, 2024
11 checks passed
@Tristan-Wilson Tristan-Wilson deleted the das-batch-store-chunking branch June 6, 2024 18:40
Labels
design-approved s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA.