Datalake inputstream #21322
The fileClient delegates reads to the blockblob client internally.
Maybe we should delegate opening a stream to blockblob client as well and just make this class an adapter, i.e. keep reference to stream from block blob and just proxy calls there? If we were returning plain InputStream from API we could just return stream from blobclient, but since we expose some extra properties adapter would be needed.
I'd consider this - writing an adapter is easier than maintaining two versions of logic that works on bytebuffers and offsets.
This is what dotnet does https://github.com/Azure/azure-sdk-for-net/blob/3f38e290bfc1b1579baa4abf329a3861355796f1/sdk/storage/Azure.Storage.Files.DataLake/src/DataLakeFileClient.cs#L3869-L3873
Or maybe we don't look at blobs and just return InputStream?
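The adapter idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name PropertiesAwareInputStream, the type parameter P, and the constructor are hypothetical and are not part of the Azure SDK. The delegate stands in for whatever stream the block blob client would open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Hypothetical adapter: every name here is illustrative, not the SDK's API.
final class PropertiesAwareInputStream<P> extends InputStream {
    private final InputStream delegate; // assumed: the stream opened by the block blob client
    private final P properties;         // assumed: the extra properties exposed with the stream

    PropertiesAwareInputStream(InputStream delegate, P properties) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
        this.properties = properties;
    }

    // The only behavior the adapter adds on top of the wrapped stream.
    P getProperties() {
        return properties;
    }

    // Everything below simply proxies to the wrapped stream, so no
    // buffer or offset logic is duplicated in this class.
    @Override
    public int read() throws IOException {
        return delegate.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        return delegate.skip(n);
    }

    @Override
    public int available() throws IOException {
        return delegate.available();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    @Override
    public void mark(int readlimit) {
        delegate.mark(readlimit);
    }

    @Override
    public void reset() throws IOException {
        delegate.reset();
    }

    @Override
    public boolean markSupported() {
        return delegate.markSupported();
    }
}
```

The trade-off is that the adapter inherits the wrapped stream's behavior wholesale, which is the point of the suggestion: one implementation of the read logic, with the second class reduced to forwarding calls.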
So most of the decisions were made with the intent to mirror the blob API shape. It's been expressed multiple times throughout this PR that the blob API for this is undesirable. If we make the decision to break from that design, we can return a class that holds a plain InputStream and the datalake properties separately, then use the block blob inputstream as the base implementation to wrap. Is this acceptable to people? @kasobol-msft @alzimmermsft @gapra-msft @rickle-msft
Example of what would be returned:
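A minimal sketch of what such a return type might look like, assuming a plain holder class. FileInputStreamResult, the type parameter P, and both accessors are hypothetical names chosen for illustration and are not the code posted in the PR.

```java
import java.io.InputStream;
import java.util.Objects;

// Hypothetical result type: holds a plain InputStream and the
// datalake properties side by side instead of subclassing InputStream.
final class FileInputStreamResult<P> {
    private final InputStream inputStream; // assumed: the block blob stream, returned as-is
    private final P properties;            // assumed: datalake-level properties of the file

    FileInputStreamResult(InputStream inputStream, P properties) {
        this.inputStream = Objects.requireNonNull(inputStream, "inputStream");
        this.properties = properties;
    }

    InputStream getInputStream() {
        return inputStream;
    }

    P getProperties() {
        return properties;
    }
}
```

With this shape the caller reads from getInputStream() like any other stream, and the properties never leak blob types into the stream's own signature.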
I like that idea. This should be called DataLakeFileInputStreamResult (or whatever matches the Result pattern).
Does this need to be part of the public interface?
Yes, we're kinda locked into this based on the equivalent API in blobs.
Should this consistency control graduate into azure-storage-common, as it should be reusable in Blobs and maybe Files if we choose to add open read/write functionality there as well?
The issue is that the blob package already has its own copy of this. So if we put one in common, there will be confusion in the blobs package. It's something we'd have needed to catch before that API GA'd.
Datalake takes a dependency on blobs directly? Don't we end up having both types on the classpath anyway?
I may be wrong but I think we go out of our way to avoid using blob types in the datalake public API.