refactor: make disperser client reuse same grpc connection #826

Conversation

@samlaf (Contributor) commented Oct 22, 2024

Why are these changes needed?

gRPC "connections" are actually virtual connections: the docs/specs call them channels.
A channel is not a single physical TCP/HTTP connection; it manages the underlying connections itself. Hence, channels are not meant to be redialed on every request: they should be created once and reused across many requests, which is what this PR does.
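
As an illustration, here is a minimal sketch of the pattern this PR moves to, assuming grpc-go. The struct, field names, and address are illustrative, not the PR's exact implementation: dial once (lazily), then reuse the same channel for every request.

package main

import (
	"log"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

type disperserClient struct {
	addr string
	once sync.Once
	conn *grpc.ClientConn
	err  error
}

// getConn dials lazily on first use and reuses the same channel afterwards.
func (c *disperserClient) getConn() (*grpc.ClientConn, error) {
	c.once.Do(func() {
		c.conn, c.err = grpc.Dial(c.addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	})
	return c.conn, c.err
}

func main() {
	c := &disperserClient{addr: "localhost:32003"} // illustrative address
	conn1, err := c.getConn()
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	conn2, _ := c.getConn()
	log.Println("same channel reused:", conn1 == conn2) // true
}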

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

Review thread on api/clients/disperser_client.go (diff excerpt):

}

var _ DisperserClient = &disperserClient{}

// DisperserClient maintains a single underlying grpc connection to the disperser server,
// through which it sends requests to disperse blobs, get blob status, and retrieve blobs.
func NewDisperserClient(config *Config, signer core.BlobRequestSigner) DisperserClient {
Reviewer (Contributor):
We should probably take care of the maximum number of concurrent requests; see https://learn.microsoft.com/en-us/aspnet/core/grpc/performance?view=aspnetcore-8.0#connection-concurrency.

If the limit is reached, new calls get blocked. That can happen if OP disperses blobs concurrently, spinning up 100 concurrent requests on one connection.
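
For context, a minimal sketch of the client-side mitigation the linked article suggests: spreading load across a small pool of channels so no single HTTP/2 connection hits its concurrent-stream limit. This is illustrative (grpc-go, hypothetical names) and not part of this PR.

package connpool

import (
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// connPool round-robins RPCs over a few channels.
type connPool struct {
	conns []*grpc.ClientConn
	next  atomic.Uint64
}

func newConnPool(target string, size int) (*connPool, error) {
	p := &connPool{}
	for i := 0; i < size; i++ {
		conn, err := grpc.Dial(target, grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			return nil, err
		}
		p.conns = append(p.conns, conn)
	}
	return p, nil
}

// Get returns the next channel in round-robin order.
func (p *connPool) Get() *grpc.ClientConn {
	n := p.next.Add(1)
	return p.conns[int(n%uint64(len(p.conns)))]
}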

@samlaf (Contributor, Author):

Ah, good catch. We could bump that limit or make it configurable.
But I really don't think it's a big worry: OP would have to make 100 requests at literally the same time, which seems unlikely. How long does a typical DisperseBlobAuthenticated take? Hopefully < 5 seconds?
Also, what happens if that limit is reached? Does it return a 500 or something? Then the batcher would just resubmit the blob a little later, once the other in-flight requests are done processing.

Reviewer (Contributor):

DisperseBlobAuthenticated takes about 300 ms for a 1 MB blob.

@samlaf (Contributor, Author):

Updated the comment in ef1c7a5.

@samlaf requested a review from @ian-shim on October 22, 2024 21:35.
@ian-shim (Contributor) left a comment:

Do we have tests for these clients?

Review thread on api/clients/disperser_client.go (diff excerpt):
//
// // Subsequent calls will use the existing connection
// status2, requestId2, err := client.DisperseBlob(ctx, otherData, otherQuorums)
func NewDisperserClient(config *Config, signer core.BlobRequestSigner) *disperserClient {
Reviewer (Contributor):

Why did we switch the return type to a struct pointer instead of the interface?

@samlaf (Contributor, Author):

The Robustness principle (https://en.m.wikipedia.org/wiki/Robustness_principle), instantiated in Go, is "accept interfaces as input, return structs" (see https://bryanftan.medium.com/accept-interfaces-return-structs-in-go-d4cab29a301b for an example).

This is not a breaking change, because the pointer implements the interface.
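
A minimal sketch of that pattern, using illustrative type bodies rather than the actual eigenda types: the constructor returns the concrete pointer, callers keep accepting the interface, and a compile-time assertion guarantees the pointer satisfies it.

package main

// DisperserClient is the interface callers depend on (body is illustrative).
type DisperserClient interface {
	Close() error
}

// disperserClient is the concrete implementation.
type disperserClient struct{}

func (c *disperserClient) Close() error { return nil }

// The constructor returns the concrete struct pointer ("return structs")...
func NewDisperserClient() *disperserClient { return &disperserClient{} }

// ...and this compile-time assertion proves *disperserClient still satisfies
// the interface, so existing callers that accept DisperserClient keep working.
var _ DisperserClient = (*disperserClient)(nil)

// Callers accept the interface ("accept interfaces").
func useClient(c DisperserClient) error { return c.Close() }

func main() {
	_ = useClient(NewDisperserClient())
}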

Reviewer (Contributor):

This is a good read. Thanks!

@samlaf (Contributor, Author) commented Oct 23, 2024

Do we have tests for these clients?

No, we don't! :O
I hadn't noticed this; we only have unit tests for the eigenda_client, and those use a mock disperser_client.
We do have an eigenda_client e2e test, which I'm trying to add to CI in #820.

What do you think is the best way to test the disperser client? Writing a fake grpc server is probably the easiest way to go, right?

EDIT: I spoke with @ian-shim and we just merged #820. I've rebased, so the e2e test should run on this branch and at least exercise the disperser_client.
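
For reference, a minimal sketch of one way to fake the grpc server for unit tests, using grpc-go's in-process bufconn listener. The service registration is left as a comment because the generated disperser stubs aren't shown here, so treat those names as assumptions.

package fakedisperser

import (
	"context"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/test/bufconn"
)

// startFakeServer runs an in-process gRPC server and returns a client
// connection wired to it over an in-memory pipe (no real network).
func startFakeServer(ctx context.Context) (*grpc.ClientConn, func(), error) {
	lis := bufconn.Listen(1024 * 1024)
	srv := grpc.NewServer()
	// pb.RegisterDisperserServer(srv, &fakeDisperser{}) // hypothetical registration
	go func() { _ = srv.Serve(lis) }()

	dialer := func(ctx context.Context, _ string) (net.Conn, error) {
		return lis.DialContext(ctx)
	}
	conn, err := grpc.DialContext(ctx, "bufnet",
		grpc.WithContextDialer(dialer),
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		srv.Stop()
		return nil, nil, err
	}
	cleanup := func() {
		_ = conn.Close()
		srv.Stop()
	}
	return conn, cleanup, nil
}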

@samlaf force-pushed the samlaf/refactor--make-disperser-client-reuse-same-grpc-connection branch from 79e97f1 to fb3d28d on October 23, 2024 16:55.
@samlaf merged commit 6894828 into master on Oct 24, 2024; 9 checks passed.
@samlaf deleted the samlaf/refactor--make-disperser-client-reuse-same-grpc-connection branch on October 24, 2024 13:46.
ian-shim pushed a commit to ian-shim/eigenda that referenced this pull request Oct 25, 2024
samlaf added a commit that referenced this pull request Oct 30, 2024
A previous commit, 6894828 (#826), caused a regression by forgetting to set the grpc max message size in the retrieveBlob call.

Caught by an integration test from eigenda-proxy.

Added a unit test here to catch any future regression of this feature.
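
For context, a hedged sketch of the kind of fix described: grpc-go rejects replies above a default receive limit (4 MiB), so large blob retrievals need a per-call option. The import path, the pb alias, and the size constant below are assumptions, not the repo's actual values.

package clients

import (
	"context"

	"google.golang.org/grpc"

	pb "github.com/Layr-Labs/eigenda/api/grpc/disperser" // assumed import path for the generated stubs
)

// retrieveBlob passes an explicit receive limit so large blob replies are
// not rejected by grpc-go's default cap.
func retrieveBlob(ctx context.Context, client pb.DisperserClient, req *pb.RetrieveBlobRequest) (*pb.RetrieveBlobReply, error) {
	const maxRetrieveBlobSizeBytes = 100 * 1024 * 1024 // illustrative limit
	return client.RetrieveBlob(ctx, req, grpc.MaxCallRecvMsgSize(maxRetrieveBlobSizeBytes))
}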