AWS S3 added listObjects endpoint including common prefixes for a delimiter #2023

an-tex · 2019-11-20T09:00:25Z

Purpose

The listObjects endpoint accepts a delimiter, which restricts the result to a single hierarchy level of objects. This allows directory-style browsing (as in the AWS S3 console) without having to retrieve all objects.
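To make the delimiter semantics concrete, here is a minimal sketch in plain Scala of how a flat key listing splits into one hierarchy level of objects plus common prefixes. It only models the behaviour on an in-memory list; `DelimiterSketch` and `listOneLevel` are hypothetical names for illustration, not part of Alpakka or the S3 API, and a non-empty delimiter is assumed.

```scala
object DelimiterSketch {

  /** Splits `keys` into (commonPrefixes, contents) for one hierarchy level.
    * Assumes `delimiter` is non-empty.
    */
  def listOneLevel(keys: Seq[String], prefix: String, delimiter: String): (Seq[String], Seq[String]) = {
    // Only keys below the requested prefix are considered at all.
    val matching = keys.filter(_.startsWith(prefix))

    // A key that still contains the delimiter after the prefix lives in a deeper level.
    val (nested, direct) = matching.partition(_.indexOf(delimiter, prefix.length) >= 0)

    // Deeper keys are rolled up into the part of the key up to and including the first delimiter.
    val commonPrefixes = nested.map { key =>
      val end = key.indexOf(delimiter, prefix.length) + delimiter.length
      key.substring(0, end)
    }.distinct

    (commonPrefixes, direct)
  }
}
```

For the keys `a.txt`, `docs/readme.md`, `docs/img/logo.png` and `src/Main.scala` with an empty prefix and `/` as delimiter, this yields the common prefixes `docs/` and `src/` and the single object `a.txt`.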

References

References #2021

Changes

added listObjects function taking an optional delimiter parameter

Background Context

I've chosen to create a new function instead of changing the existing listBucket one. The listBucket function returns ListBucketResultContents, which can't represent a common prefix, whereas the actual listObjects endpoint can return both ListBucketResultContents and ListBucketResultCommonPrefixes (API, Guide).

To reflect the two kinds of stream events, the new listObjects Source emits elements of a ListBucketResultBase trait, each of which is either a ListBucketResultContents or a ListBucketResultCommonPrefixes.

Changing listBucket itself to emit this trait would have broken existing implementations, which is why I opted to create a new function. Does this make sense?

I postponed updating the documentation until this approach is validated.
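For readers following along, the base-trait idea described above could look roughly like the sketch below. The case classes are deliberately reduced to two fields each, so they are simplified stand-ins rather than the actual Alpakka model classes, and `describe` is a hypothetical helper.

```scala
object BaseTraitSketch {

  // Common supertype of everything the stream can emit; sealed so matches are checked for exhaustiveness.
  sealed trait ListBucketResultBase

  // Simplified stand-in: the real class carries more metadata (assumption).
  final case class ListBucketResultContents(bucketName: String, key: String) extends ListBucketResultBase

  // Simplified stand-in for a common prefix ("directory") entry.
  final case class ListBucketResultCommonPrefixes(bucketName: String, prefix: String) extends ListBucketResultBase

  // A consumer has to match on the concrete type of each element.
  def describe(element: ListBucketResultBase): String = element match {
    case ListBucketResultContents(_, key)          => s"object $key"
    case ListBucketResultCommonPrefixes(_, prefix) => s"directory $prefix"
  }
}
```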

lightbend-cla-validator · 2019-11-20T09:00:27Z

At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

an-tex · 2019-11-20T09:04:31Z

At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

I've added the missing email address and signed the CLA

an-tex · 2019-11-21T08:02:49Z

I'm not sure what the story with the failing build is. The same test is failing for me in IntelliJ, but even before my commits. When I run it in SBT directly it's fine for both.

seglo · 2019-11-21T12:58:20Z

I'm not sure what the story with the failing build is. The same test is failing for me in IntelliJ, but even before my commits. When I run it in SBT directly it's fine for both.

That's alright. We've previously identified this as a flaky test. It needs further investigation, but it may just need a longer timeout since Travis VMs can be rather underpowered.

#1802

Thanks for the PR! I'll do a review later today.

seglo

It seems that there is a lot of duplicate code between the listBucket and listObjects DSL and implementation. It looks like it would be natural to merge ListBucketResultCommonPrefixes into ListBucketResultContents and make the prefix property optional and only set when a prefix is used in the original query. The only downside to this is that it would require modifying ListBucketResultContents which would break backwards compatibility for users who use this type in tests, but we could add constructor overloads to support the additional field. Also, since it's optional, it has an obvious base case in the existing constructor of None.

Before I review any further I want to understand your rationale and general thoughts about consolidating this return type in case I misunderstood something.

EDIT: I thought about it a little more and I see how it makes sense to treat them differently. It would be nice to DRY up the code in S3Stream though. Maybe you could pass a function from listObjects and listBucket to a consolidated implementation method in S3Stream that will construct the return type you want. For the object with prefix results, you could return a tuple of (ListBucketResultContents, ListBucketResultCommonPrefixes) instead of trying to combine them together with the new base trait you added.

seglo · 2019-11-21T19:34:10Z

s3/src/main/scala/akka/stream/alpakka/s3/javadsl/S3.scala

+   *
+   * The `alpakka.s3.list-bucket-api-version` can be set to 1 to use the older API version 1
+   *
+   * @see https://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html  (version 1 API)


I assume this is actually the v2 API. Can you update this reference and the one in listBucket javadoc?

Those links seem to be outdated and redirect to https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html (and https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html accordingly). I'll update the links and fix the version numbers

Please do, see e.g. #2024

Done. I've updated the other S3 API links too.

seglo · 2019-11-21T19:36:59Z

s3/src/main/scala/akka/stream/alpakka/s3/javadsl/S3.scala

+   * @param s3Headers any headers you want to add
+   * @return [[akka.stream.scaladsl.Source Source]] of [[ListBucketResultBase]]
+   */
+  def listObjects(bucket: String,


It seems that this should be another listBucket overload. Is there a reason you gave it a different name?

Or better yet, listBucketWithPrefix to accommodate the different return type.

listBucketWithPrefixes could work. Then we could make the delimiter parameter even non-optional.

I took the listObjects name from the actual S3 API naming because I was briefly confused myself about which Alpakka S3 function maps to which API call. But I guess this only makes sense if, in the long run, the listBucket function gets deprecated and completely replaced by listObjects*. Until then it would probably just cause more confusion. So yes, let's make it listBucketWithPrefixes.

Ah, it seems this "list contents/prefixes" endpoint used to be called "GET Bucket (List Objects)" in the documentation but has been renamed to "ListObjectsV2". Now I get where the original name comes from.

Do we still stick with listBucket and listBucketWithPrefixes? (Or better, listBucketWithCommonPrefixes, to be accurate?)

an-tex · 2019-11-22T08:34:49Z

Thanks for the feedback. Sorry about the DRY, it seemed some discussion would follow so I wanted to keep it WET and wait with premature optimisations until we've decided the final interface. Probably should have mentioned that ;) Once it's in place I'll follow your guide.

For the object with prefix results, you could return a tuple of (ListBucketResultContents, ListBucketResultCommonPrefixes) instead of trying to combine them together with the new base trait you added.

One streaming event can be EITHER a Contents OR a CommonPrefixes item. So I'm not sure how returning a tuple would work here. The only way to keep it separate would be to return a tuple of Sources with those types.

seglo · 2019-11-22T15:26:45Z

Thanks for the feedback. Sorry about the DRY, it seemed some discussion would follow so I wanted to keep it WET and wait with premature optimisations until we've decided the final interface. Probably should have mentioned that ;) Once it's in place I'll follow your guide.

Sounds good!

One streaming event can be EITHER a Contents OR a CommonPrefixes item. So I'm not sure how returning a tuple would work here. The only way to keep it separate would be to return a tuple of Sources with those types.

I see. Thanks for the clarification. I understand now why you took this approach. Since we're exploding the contents results there's no real appropriate place to include the prefixes result too, so you've added them to the stream as elements. This leaves it to the end user to match on the correct concrete type. I suppose one alternative would be to add a reference to the prefixes in every ListBucketResultContents emitted for that response. That would keep the return type the same. WDYT?

an-tex · 2019-11-22T15:31:24Z

I suppose one alternative would be to add a reference to the prefixes in every ListBucketResultContents emitted for that response. That would keep the return type the same. WDYT?

You mean every ListBucketResultContents element contains an additional field of type Source[ListBucketResultCommonPrefixes] as the reference?

seglo · 2019-11-22T15:44:21Z

You mean every ListBucketResultContents element contains an additional field of type Source[ListBucketResultCommonPrefixes] as the reference?

I mean iterate over ListBucketResult.contents and emit downstream a new type like ListBucketResultContents, but that includes a Seq[ListBucketResultCommonPrefixes].

Though your reply made me think of another approach. Instead of merging results and common prefixes into a single stream you could return a Source for each, which the user could then handle separately.
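The first idea above, attaching the page's common prefixes to every emitted contents element, can be sketched with plain collections as follows. All names here (`Page`, `ContentsWithPrefixes`, `explode`) are hypothetical and the case classes are simplified stand-ins for the Alpakka model; this is an illustration of the shape, not the implementation in this PR.

```scala
object ContentsWithPrefixesSketch {

  // Simplified stand-ins for the model classes (assumption: the real ones carry more fields).
  final case class Contents(key: String)
  final case class CommonPrefix(prefix: String)

  // One deserialized page of a list response.
  final case class Page(contents: Seq[Contents], commonPrefixes: Seq[CommonPrefix])

  // Hypothetical element type: a contents entry plus the prefixes of the page it came from.
  final case class ContentsWithPrefixes(contents: Contents, commonPrefixes: Seq[CommonPrefix])

  // "Explodes" each page into one element per contents entry, sharing the page's prefix Seq.
  def explode(pages: Seq[Page]): Seq[ContentsWithPrefixes] =
    pages.flatMap(page => page.contents.map(c => ContentsWithPrefixes(c, page.commonPrefixes)))
}
```

One consequence of this shape: a page that contains only common prefixes and no contents produces no elements at all, so its prefixes would never reach the consumer.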

an-tex · 2019-11-22T16:16:45Z

I mean iterate over ListBucketResult.contents and emit downstream a new type like ListBucketResultContents, but that includes a Seq[ListBucketResultCommonPrefixes].

I think the CommonPrefixes should be Elements of a Source as well as there might still be a lot of them.

Though your reply made me think of another approach. Instead of merging results and common prefixes into a single stream you could return a Source for each, which the user could then handle separately.

That's what I meant earlier by a tuple of sources. I probably should use more code as examples instead of confusing descriptions ;)

How about this interface:

  def listBucketWithCommonPrefixes(
    bucket: String,
    delimiter: String,
    prefix: Option[String] = None,
    s3Headers: S3Headers = S3Headers.empty
  ): Source[(ListBucketResultCommonPrefixes, ListBucketResultContents), NotUsed]

This way the user could consume both independently. I still wonder whether we should keep the common base trait internally, though. Then the user could still opt in to concatenating both sources and pattern matching on a single stream (similar to the original approach).

seglo · 2019-11-22T17:48:42Z

I think the CommonPrefixes should be Elements of a Source as well as there might still be a lot of them.

True, but they've already been deserialized to a Seq from that page/response, so referencing that Seq multiple times shouldn't be a concern, though it could be confusing if the common prefixes are paged as well.

Source[(ListBucketResultCommonPrefixes, ListBucketResultContents), NotUsed]

Do you mean this:

Source[(Source[ListBucketResultCommonPrefixes, NotUsed], Source[ListBucketResultContents, NotUsed]), NotUsed]

an-tex · 2019-11-22T19:09:17Z

True, but they've already been deserialized to a Seq from that page/response, so referencing that Seq multiple times shouldn't be a concern, though it could be confusing if the common prefixes are paged as well.

Exactly, the CommonPrefixes should be treated just like the Contents especially due to the paging.

Source[(ListBucketResultCommonPrefixes, ListBucketResultContents), NotUsed]

Do you mean this:
Source[(Source[ListBucketResultCommonPrefixes, NotUsed], Source[ListBucketResultContents, NotUsed]), NotUsed]

Ah sorry, I actually meant:

  def listBucketWithCommonPrefixes(
                                    bucket: String,
                                    delimiter: String,
                                    prefix: Option[String] = None,
                                    s3Headers: S3Headers = S3Headers.empty
                                  ):
  (
    Source[ListBucketResultCommonPrefixes, NotUsed],
    Source[ListBucketResultContents, NotUsed]
  ) = ???

Here are a few use cases I see for this interface:

// I don't care about the contents, only commonPrefixes
val (commonPrefixes, _) = listBucketWithCommonPrefixes("bucket","/")
commonPrefixes.runForeach(???)

// I care about both separately
val (commonPrefixes, contents) = listBucketWithCommonPrefixes("bucket","/")
commonPrefixes.runForeach(???)
contents.runForeach(???)

// That's not really a use case as you can just use the existing listBucket function if you're only interested in the contents
val (_, contents) = listBucketWithCommonPrefixes("bucket","/")

// if we keep the ListBucketResultBase trait one could even merge them to one single stream and use pattern matching later
val (commonPrefixes, contents) = listBucketWithCommonPrefixes("bucket","/")
val merged: Source[ListBucketResultBase, NotUsed] = commonPrefixes.merge(contents)
merged.runForeach {
  case contents: ListBucketResultContents => ???
  case prefixes: ListBucketResultCommonPrefixes => ???
}

EDIT: For javadsl we could use akka.japi.Pair

seglo · 2019-11-22T20:33:10Z

I see. That could work. I would have thought this would be the most common use case:

// That's not really a use case as you can just use the existing listBucket function if you're only interested in the contents
val (_, contents) = listBucketWithCommonPrefixes("bucket","/")

Because you're using the delimiter to essentially filter one level of results, but you don't care about the common prefixes.

Do you want to experiment with this API?

seglo · 2019-11-22T20:35:12Z

// if we keep the ListBucketResultBase trait one could even merge them to one single stream and use pattern matching later
val (commonPrefixes, contents) = listBucketWithCommonPrefixes("bucket","/")
val merged: Source[ListBucketResultBase, NotUsed] = commonPrefixes.merge(contents)
merged.runForeach {
  case contents: ListBucketResultContents => ???
  case prefixes: ListBucketResultCommonPrefixes => ???
}

I don't think keeping ListBucketResultBase is that useful. It seems odd to me to have these two different things in one stream. If the user really wants to do that they can do it themselves.

an-tex · 2019-11-25T08:53:16Z

Because you're using the delimiter to essentially filter one level of results, but you don't care about the common prefixes.

True!

Do you want to experiment with this API?

Yep I'll update the PR

an-tex · 2019-11-25T09:30:23Z

I don't think keeping ListBucketResultBase is that useful. It seems odd to me to have these two different things in one stream. If the user really wants to do that they can do it themselves.

My current use case is similar to the S3 console inside AWS: A "directory" based browser, listing both, files and commonPrefixes. Pretty much like an ls in the shell. So I need exactly that one stream ;)

Doing it yourself without a common base trait would involve mapping to an Either and then pattern match later again. It's not too bad but seems rather unnecessarily clunky:

  val (commonPrefixes, contents) = listBucketWithCommonPrefixes("bucket","/")
  
  val commonPrefixesEither = commonPrefixes.map(Right(_))
  val contentsEither = contents.map(Left(_))
  
  commonPrefixesEither.merge(contentsEither).runForeach {
    case Left(content) => ???
    case Right(commonPrefix) => ???
  }

Or am I missing some easier way?

I'd have thought keeping a common base trait wouldn't affect the API directly anyway, so it does no harm and makes things easier for end users who need it. In any case, that's not a showstopper for me; I'll continue with the other changes.

an-tex · 2019-11-25T12:54:24Z

Well, the implementation of the Tuple Source seems to be tricky. I'm now down to having a

Source[(Seq[ListBucketResultCommonPrefixes], Seq[ListBucketResultContents]), NotUsed]

(every element is a page from the result)
but I don't know how to turn that into the desired

(
    Source[ListBucketResultCommonPrefixes, NotUsed],
    Source[ListBucketResultContents, NotUsed]
  )

without forcing the client to use custom graph logic like e.g. described in https://stackoverflow.com/questions/38438983/split-akka-stream-source-into-two

Do you have any suggestions @seglo ?

Otherwise, returning to the original single source using a base trait might be the better solution after all? You can still fairly easily consume e.g. only the contents:

 def listBucketWithCommonPrefixes(
                                    bucket: String,
                                    delimiter: String,
                                    prefix: Option[String] = None,
                                    s3Headers: S3Headers = S3Headers.empty
                                  ): Source[ListBucketResultBase, NotUsed] = ???

  val contentsOnly: Source[ListBucketResultContents, NotUsed] =
    listBucketWithCommonPrefixes("bucket", "/").collect {
      case contents: ListBucketResultContents => contents
    }

EDIT: The only thing that comes to mind is using two Source.queue instances and feeding them, but that seems a pretty dodgy workaround.

seglo · 2019-11-25T19:56:00Z

My current use case is similar to the S3 console inside AWS: A "directory" based browser, listing both, files and commonPrefixes. Pretty much like an ls in the shell. So I need exactly that one stream ;)

I see. I'm unfamiliar with this AWS S3 utility. When you do a ls does it mix common prefixes with results? Could you paste a snippet?

Well, the implementation of the Tuple Source seems to be tricky. I'm now down to having a
Source[(Seq[ListBucketResultCommonPrefixes], Seq[ListBucketResultContents]), NotUsed]
(every element is a page from the result)
but I don't know how to turn that into the desired

I think the return type you mention here is fine. I tried playing around with this as well and discovered that it doesn't really make sense to return the nested sources since they would be emitted for every result anyway.

I realize this makes your use case a little more tricky, but I think it's still the best option over mixing the elements together with a common base trait. We could also forego returning a tuple/pair and just return the concrete upstream result object (or something like it).

I think you already have this code, but in case it's useful I pushed my experimental code as a demonstration.

https://github.com/an-tex/alpakka/pull/1/files
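As a rough illustration of how a consumer could work with the per-page tuple discussed above, the sketch below models the pages as a plain `Seq` so that it runs without dependencies; on an Akka Streams `Source` the same flattening would typically be done with `mapConcat`. The names `PagedResultSketch`, `contentsOnly` and `lsStyle` are hypothetical, and the case classes are simplified stand-ins for the Alpakka model.

```scala
object PagedResultSketch {

  // Simplified stand-ins for the model classes (assumption: the real ones carry more fields).
  final case class Contents(key: String)
  final case class CommonPrefix(prefix: String)

  // One element per result page: (common prefixes, contents).
  type Page = (Seq[CommonPrefix], Seq[Contents])

  // Only the objects of the current hierarchy level, ignoring common prefixes.
  def contentsOnly(pages: Seq[Page]): Seq[Contents] =
    pages.flatMap { case (_, contents) => contents }

  // "ls"-style view: prefixes and objects of each page combined into one sequence.
  def lsStyle(pages: Seq[Page]): Seq[Either[CommonPrefix, Contents]] =
    pages.flatMap { case (prefixes, contents) =>
      val dirs: Seq[Either[CommonPrefix, Contents]] = prefixes.map(Left(_))
      val files: Seq[Either[CommonPrefix, Contents]] = contents.map(Right(_))
      dirs ++ files
    }
}
```

With this shape, the directory-browser use case needs one extra mapping step, while the contents-only case stays a one-liner.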

an-tex · 2019-11-26T11:54:08Z

Alright, I've pushed it all. The only major change I made to your code was making the delimiter parameter mandatory, as otherwise the commonPrefixes would always be empty and the function would behave exactly like the original listBucket.

Otherwise:

seglo

Looking really good. I left a few comments.

Something else I was considering was adding a new listBucket overload that only returns ListBucketResultContents for the one hierarchy level since that's going to be a common use case. It could have the same return type as the other listBuckets because it's just flattening out the one sequence and skipping common prefixes. We can do this in another PR though.

Thanks for your patience!

docs/src/main/paradox/s3.md

s3/src/main/scala/akka/stream/alpakka/s3/scaladsl/S3.scala

s3/src/main/scala/akka/stream/alpakka/s3/model.scala

seglo · 2019-11-26T14:02:22Z

s3/src/test/scala/docs/scaladsl/S3SourceSpec.scala

+  it should "list keys and common prefixes for a given bucket with a prefix and delimiter using the version 1 api" in {
+    mockListBucketAndCommonPrefixesVersion1()
+
+    //#list-bucket-and-common-prefixes-attributes


I don't see a reference to this code snippet in docs. These types of comments anchor code snippets that get extracted into paradox.

true, that was c&p from the listBucket which is used for the S3 Attributes example. I guess it's still ok to test the passing of the attributes but no need for the paradox comments. I'll get rid of them

an-tex · 2019-11-26T15:47:25Z

Something else I was considering was adding a new listBucket overload that only returns ListBucketResultContents for the one hierarchy level since that's going to be a common use case. It could have the same return type as the other listBuckets because it's just flattening out the one sequence and skipping common prefixes. We can do this in another PR though.

I've added it. Have a look if that's what you're after 3ea4954 .

seglo · 2019-11-26T16:13:12Z

I've added it. Have a look if that's what you're after 3ea4954 .

Exactly what I was looking for. Thanks!

seglo · 2019-11-26T20:56:08Z

Some stage 1 check failures. If you look at the right-most field it gives you the command you can run locally.

https://github.com/akka/alpakka/pull/2023/checks?check_run_id=321556931

an-tex · 2019-11-27T11:12:04Z

Some stage 1 check failures. If you look at the right-most field it gives you the command you can run locally.

Thanks for the tip. Was some back and forth due to cross compilation, looking good now!

seglo

LGTM

seglo · 2019-11-27T14:03:04Z

Thanks for the contribution @an-tex ! I appreciated your patience during the discussion :)

an-tex · 2019-11-27T14:11:58Z

Thanks for the contribution @an-tex ! I appreciated your patience during the discussion :)

Glad I could contribute! Thanks for the guidance @seglo 👍

an-tex added 2 commits November 20, 2019 09:37

added listObjects endpoint including common prefixes for a delimiter

7c78fed

ListBucketResultBase should be sealed

e43c5be

probot-autolabeler bot added the p:aws-s3 label Nov 20, 2019

seglo self-assigned this Nov 21, 2019

seglo reviewed Nov 21, 2019

updated s3 api links and version typo fix

397dcef

Return tuple of contents and common prefixes

e7fd31c

an-tex added 4 commits November 26, 2019 09:57

delimiter should be mandatory, otherwise commonPrefixes will always be empty

5719fe8

added S3SourceSpec for listBucketAndCommonPrefixes

75cbd55

refactoring removing duplicated code in listBucket* functions

e72e765

added listBucketAndCommonPrefixes documentation

a93fefe

probot-autolabeler bot added the documentation label Nov 26, 2019

Merge remote-tracking branch 'upstream/master'

1ce94bd

seglo reviewed Nov 26, 2019

an-tex and others added 7 commits November 26, 2019 15:27

Update docs/src/main/paradox/s3.md

e481d1f

Co-Authored-By: Sean Glover <[email protected]>

Update s3/src/main/scala/akka/stream/alpakka/s3/scaladsl/S3.scala

3d5ee4a

Co-Authored-By: Sean Glover <[email protected]>

removed base trait ListBucketResultBase

fda7776

Merge branch 'master' of github.com:an-tex/alpakka

ff4b64d

removed unnecessary paradox comments

f194074

updated scaladoc return types

cde64ff

added listBucket overload taking a delimiter

3ea4954

fixed scala 2.13 compilation as it enforces a stricter collection casting

f668010

cross compilation fix as scala 2.13 changed behaviour of immutable collection conversions

97265ab

seglo self-requested a review November 27, 2019 13:59

seglo approved these changes Nov 27, 2019

seglo merged commit eab5486 into akka:master Nov 27, 2019

This was referenced Nov 27, 2019

Fixes typo in ScalaDoc #2024

Closed

AWS S3 List only one hierarchy level of objects by using the delimiter parameter #2021

Closed
