-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Allow custom configuration of S3Client
#102
Conversation
3e4f24a
to
acc3610
Compare
S3Client
acc3610
to
a382843
Compare
@@ -406,7 +410,7 @@ async fn start( | |||
// We require to stream blocks consistently, so we need to try to load the block again. | |||
|
|||
let pending_block_heights = stream_block_heights( | |||
&lake_s3_client, | |||
&*lake_s3_client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dereference the Box
, and pass a reference to the underlying trait object.
fn deref(&self) -> &Self::Target { | ||
&self.0 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially some helpers to make these errors easier to work with
Ok(()) | ||
} | ||
|
||
pub fn s3_client<T: S3Client + 'static>(self, s3_client: T) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom setter which allows passing S3Client
directly, rather than requiring user to wrap in Box
.
}, | ||
#[error("Failed to convert integer: {error:?}")] | ||
IntConversionError { | ||
#[from] | ||
error: std::num::TryFromIntError, | ||
}, | ||
#[error("AWS Smithy byte_stream error: {error:?}")] | ||
AwsSmithyError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3GetError
/S3ListError
now contain both AwsError
and AwsSmithyError
as discussed in "Type Erasure".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great! Thank you for improving the code base ❤️
I am going to try out this changes with some of the existing indexers to see the amount of changes required for the users to adopt it (mostly to figure out how painful it is around the error handling part). I will update you after that (ETA today-tomorrow)
Thanks @khorolets! I already have a PR which integrates this in to QueryAPI, you can take a look here. Edit: That probably isn't that helpful actually, it's essentially the same as the implementation here. I'm working on adding caching, and can ping you once that's done if you're interested. |
Co-authored-by: Bohdan Khorolets <[email protected]>
Co-authored-by: Bohdan Khorolets <[email protected]>
@khorolets Implementation with cache here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked the code you've shared and tried this code out and have no objections to release it.
One thing that bothers me a bit is that it will break error handling for those who uses exposed methods from the library since the error have changed a bit.
@morgsmccauley may I ask you to bump the version and update the CHANGELOG.md? Thus I will be able to release the new version with your changes after merging this PR. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bumped the version to 0.7.8
Depends on near/near-lake-framework-rs#102 This PR exposes a new metrics which counts the number of Get requests made to S3 by `near-lake-framework`. I wanted to start tracking this metric _before_ I merge the change which reduces them, so I can measure the impact of that change. The easiest way to track these requests was to pass a custom `S3Client` to `near-lake-framework`, so we can hook in to the actual requests made. The custom `S3Client` (`LakeS3Client`) is exactly the same as the default implementation in `near-lake-framework` itself, but with the added metric. This is essentially part 1 for #419, as the "reduction" in requests will build on this custom client, adding caching/de-duplication.
Previously, the
S3Client
trait is used to mock requests made to S3 so that we can unit test our S3 related logic. This PR allows custom implementations of this trait to be configured and used within Lake Framework. This grants users greater control over the S3 requests made, which would otherwise not be possible via configuration ofs3_config
alone. The main motive for this change is to add caching of requests, but the solution is generic enough that it could be extended to many different use cases.S3Client
trait updatesS3Client
has been updated to make it easier to work with from external crates. Previously, it was a thin wrapper overGetObject
andListObjects
, but it has been abstracted further so the user does not have to deal with the complex return types of these operations.On success, the
S3Client
methods now return primitive types, as opposed to high level request wrappers such asGetObjectOutput
. This makes both caching and mocking much easier.Type Erasure
On failure, we return
dyn std::error::Error
. This allows any error to be returned, rather than the complexSdkError<GetObjectError>
. Without this, every error returned from the method would be re-constructed as an S3 error, which doesn't make sense if the error it self is related to, for example,Mutex
locking.This comes with the downside of errors being "opaque", so responding to specific errors is more difficult. If concrete error types are needed, this opaque error will need to be
downcast
, as you'll see ins3_fetchers.rs
.Each method returns a "newtype" wrapper around
dyn std::error::Error
to provide some separation between the two methods, and make them a little easier to work with.Arc<Error>
instead ofBox<Error>
While
Box
is cheaper and easier to work with, it isn't thread safe. This makes it harder to work with in an async context. Therefore, I've opted to useArc
instead, even though it comes with a slight runtime penalty.Dynamic Dispatch
Previously, the code used static dispatch (
impl S3Client
) to determineS3Client
, meaning the underlying type was known at compile time.impl
is really just short-hand for generics, so makingS3Client
configurable meant that this generic needed to be propagated through all public facing methods - essentially creating a breaking change.To mitigate the above, I have replaced the implementation to use dynamic dispatch
dyn S3Client
. This removes the need for generics, meaning this change can be released without a major version bump.Dynamic dispatch comes with a runtime penalty as the concrete type is determined at runtime. But as most s3 requests are made ahead of time (prefetch), this additional overhead should be negligible.