the design of removeObjects #810

Closed
KevinSmile opened this issue Nov 6, 2019 · 4 comments


@KevinSmile
Contributor

As described in PR #809, the design of MinioClient.removeObjects() is strange and inefficient.

MinioClient.removeObjects() returns an Iterable.
The method's job is to DELETE objects, but the actual deletion does not run unless you iterate over the returned Iterable.
Calling MinioClient.removeObjects() on its own deletes nothing, which is surprising.

@vadmeste
Member

vadmeste commented Nov 6, 2019

@KevinSmile thanks for your contribution. It is always better to discuss a new change before implementing it to save your time.

I don't see how this could be inefficient: iterating should be much faster than calling a web service to delete 1000 elements.

Maybe your own code is doing something slow between iterations.

@KevinSmile
Contributor Author

@vadmeste thanks for your reply!
I think you missed my point.

(There are two kinds of iterator in this issue.
One is the return value of MinioClient.removeObjects();
the other is used to build each batch of object names to delete.
A batch must contain at most 1000 names, which is the limit set by S3.
I believe batch deletion is faster than deleting objects one by one, since it saves HTTP calls.
In fact, the original code also uses 1000 as the batch size, in a while loop:

while (objectNameIter.hasNext() && i < 1000) {

As mentioned in #809, I believe using Guava's Iterables.partition and Java 8 streams to build each batch is better than the original while loop.

For the batch-size limit, see:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
https://stackoverflow.com/questions/54255990/cheapest-way-to-delete-2-billion-objects-from-s3-ia)
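To make the batching idea concrete, here is a minimal sketch using only the standard library. It is not the code from #809 or from minio-java: the class name BatchSketch, the helper partition, and the object names are made up for illustration. Guava's Iterables.partition offers the same kind of splitting if Guava is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSketch {
    // Maximum number of keys per multi-object delete request (the S3 limit discussed above).
    static final int MAX_BATCH = 1000;

    // Hypothetical helper: splits names into consecutive lists of at most batchSize elements.
    static List<List<String>> partition(Iterable<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String name : names) {
            current.add(name);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        // Keep the final, possibly shorter, batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("object-" + i);
        }
        // 2500 names become three batches: 1000, 1000 and 500.
        for (List<String> batch : partition(names, MAX_BATCH)) {
            System.out.println("one delete request for " + batch.size() + " objects");
        }
    }
}
```

Each batch would then map to a single HTTP call, so 2500 names cost three requests instead of 2500.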

However, I believe the unusual use of Iterable is the key problem here.

An iterator is commonly used to traverse the elements of a collection without exposing its underlying representation. I believe an iterator should behave like a GET: it should not perform side effects (e.g. DELETE) during iteration, as the original code does in its hasNext and next methods:

public boolean hasNext() {

public Result<DeleteError> next() {

If you call MinioClient.removeObjects() without traversing the returned Iterable, hasNext and next are never triggered, so populate never runs and nothing is deleted, which is surprising.

private synchronized void populate() {
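The behaviour described above can be reproduced with a small stand-alone sketch. This is not the minio-java implementation: the class LazyDeleteSketch, the in-memory list used as a "bucket", and the String error messages are assumptions for illustration only. The point is that the deletion lives in populate, which only runs once hasNext or next is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LazyDeleteSketch {
    // Hypothetical stand-in for removeObjects: returns error messages lazily.
    // The bucket is a plain in-memory list, not a real object store.
    static Iterable<String> removeObjects(List<String> bucket, List<String> names) {
        return () -> new Iterator<String>() {
            private boolean populated = false;
            private Iterator<String> errors;

            // Performs the deletion on first use and records one error per missing name.
            private void populate() {
                List<String> failed = new ArrayList<>();
                for (String name : names) {
                    if (!bucket.remove(name)) {
                        failed.add("not found: " + name);
                    }
                }
                errors = failed.iterator();
                populated = true;
            }

            @Override
            public boolean hasNext() {
                if (!populated) {
                    populate();
                }
                return errors.hasNext();
            }

            @Override
            public String next() {
                if (!populated) {
                    populate();
                }
                return errors.next();
            }
        };
    }

    public static void main(String[] args) {
        List<String> bucket = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterable<String> results = removeObjects(bucket, Arrays.asList("a", "x"));
        System.out.println(bucket.size()); // prints 3: nothing has been deleted yet
        for (String error : results) {
            System.out.println(error);     // prints "not found: x"
        }
        System.out.println(bucket.size()); // prints 2: "a" was deleted during iteration
    }
}
```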

@KevinSmile KevinSmile changed the title the design removeObjects the design of removeObjects Nov 6, 2019
@balamurugana
Member

@KevinSmile

> As mentioned in #809, I believe using Guava's Iterables.partition and java8's stream to generate a batch-toDelete-subList is better than the original while-loop.

You could send this improvement as a separate PR.

> However, I believe the strange usage of Iterable is the key problem here.
>
> It's a common practice that using iterator to traverse elements of a collection without exposing its underlying representation. I believe iterator should be used as kind of GET method, you should not do something hack (e.g. DELETE) during the iterating, as the original did in the hasNext and next method:

This is a design choice we made. Deleted objects are unrecoverable, so it's not recommended to delete them in the background. That's the reason for this lazy evaluation.
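Given that design choice, a caller who wants the deletion to actually happen has to consume the whole result. A minimal sketch of such a helper follows; DrainSketch and drain are hypothetical names, and the lazy Iterable in main is a stand-in, not a real MinioClient call. With the real client the elements would be the Result<DeleteError> values mentioned earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DrainSketch {
    // Iterates the whole Iterable so that any work tied to iteration runs,
    // and returns everything it yielded.
    static <T> List<T> drain(Iterable<T> lazyResults) {
        List<T> collected = new ArrayList<>();
        for (T item : lazyResults) {
            collected.add(item);
        }
        return collected;
    }

    public static void main(String[] args) {
        int[] runs = {0};
        // Stand-in for a lazy result: the side effect (a counter bump) runs only on iteration.
        Iterable<String> lazy = () -> {
            runs[0]++;
            return Arrays.asList("error for object-7").iterator();
        };
        System.out.println(runs[0]);           // prints 0: nothing has run yet
        List<String> errors = drain(lazy);
        System.out.println(runs[0]);           // prints 1: draining triggered the work
        System.out.println(errors.size());     // prints 1
    }
}
```

Collecting the yielded items also means delete errors are not silently dropped.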

@harshavardhana
Member

removeObjects is lazily evaluated; this is clarified in #811. Closing this issue.
