MinIO cleanup #1
Comments
Comments? Suggestions?
I guess I can add a CREATE verb for the endpoint, but it doesn't seem useful given that other services are talking directly to MinIO.
I am not sure that we need a persistent microservice for this. There is a K8s resource named CronJob which allows you to schedule recurring tasks. We could probably just add one of these to the ServiceX Helm chart. It's not in scope, but I believe @ivukotic has mentioned that our x509 proxy microservice could also be refactored into one of these CronJobs. One thing of note that I see in the CronJob docs:
I don't think this will be an issue for cleanup/archival of old transformation requests and MinIO storage.
This sounds like a thin wrapper around the MinIO client itself. I think that we should put all such logic in the ObjectStoreManager (which is really a MinIO adaptor). Then the API server can make calls to the appropriate methods when needed. Basically, I don't think we should be thinking of this as deleting directly from MinIO. Instead, we should think of it as archiving ServiceX transformation requests, which would entail setting a flag in the database as well as deleting the output files from object storage. We should try to make this an atomic operation.
I see this as a building block for future functionality that will need a persistent microservice. Although a CronJob can do some of this, the bucket deletion will be useful for transform request removal: the removal functionality could just make the REST call and then work on the other things needed to remove a transform. Also, the bucket size calculation could be called when the API endpoint knows that the transform is completed. It can trigger the calculation and then add the result to PostgreSQL while it's updating the transform request information. This amortizes the cost of traversing the buckets. Neither of these is possible with a K8s CronJob.

I don't have any problem with putting the functionality in the object store and then getting the functionality through there. I don't think we can make updating the DB and deleting the object files atomic, though. MinIO requires you to delete the objects in a bucket individually and then delete the empty bucket. Without additional support from MinIO, I don't see how we can make this atomic.

I see this microservice as a compositional tool that the servicex_api server can use to do the archival. The microservice will handle removing the saved data for a transformation, and the servicex_api server can combine this with other pieces to create a transform request archive workflow.
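The deletion order described above (every object first, then the empty bucket) can be sketched in Python. This is a minimal sketch, assuming a client that behaves like the minio-py `Minio` class (`list_objects`, `remove_object`, `remove_bucket`); the helper name `delete_bucket` is hypothetical and not part of ServiceX.

```python
from typing import Any


def delete_bucket(client: Any, bucket: str) -> int:
    """Remove every object in `bucket`, then the (now empty) bucket itself.

    `client` is assumed to behave like a minio-py `Minio` instance.
    Returns the number of objects that were deleted.
    """
    # Materialise the listing first so we are not deleting while iterating.
    names = [obj.object_name for obj in client.list_objects(bucket, recursive=True)]
    for name in names:
        client.remove_object(bucket, name)
    # MinIO refuses to remove a bucket that still contains objects,
    # so the bucket is removed only after all of its objects are gone.
    client.remove_bucket(bucket)
    return len(names)
```

Because these steps are not atomic, a failure partway through leaves a partially emptied bucket; calling the same function again lists and removes whatever is left. minio-py also offers a batched `remove_objects` method that could replace the per-object loop.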
Don't we already know the output size, reported at the end of each transformer? Do we have this summed up per request somewhere in the DB? If not, it would be great to have, and to show it on the web dashboard. Next, we should have a nice endpoint that does the deep cleanup of a request:
Once this is in place, and we have a configuration parameter giving the size of the storage (in GB, not percent), we could simply sum up the output size of all the requests. Before processing any new request, we compute the sum, and if it is above the threshold we call the deep cleanup on the oldest request.
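The check before admitting a new request could look roughly like this. It is a sketch under assumptions: `RequestRecord` is a hypothetical stand-in for a transform request row, its `output_bytes` field is the per-request sum of transformer output sizes that the comment suggests storing in the DB, and "GB" is taken to mean 2^30 bytes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

GB = 1024 ** 3  # assumption: the configured limit is in GiB


@dataclass
class RequestRecord:
    """Hypothetical view of one transform request row in the app DB."""
    request_id: str
    submitted_at: datetime
    output_bytes: int  # summed transformer output sizes for this request


def request_to_clean(requests: Sequence[RequestRecord], storage_limit_gb: float) -> Optional[str]:
    """Return the ID of the oldest request to deep-clean, or None if under the limit."""
    used = sum(r.output_bytes for r in requests)
    if not requests or used <= storage_limit_gb * GB:
        return None
    # Over the limit: pick the request that was submitted first.
    oldest = min(requests, key=lambda r: r.submitted_at)
    return oldest.request_id
```

The caller would then invoke the deep-cleanup endpoint for the returned ID and repeat the check if a single cleanup does not free enough space.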
I agree with @AndrewEckart: this should be a simple CronJob with a new endpoint on the App server. I'm a firm believer in the YAGNI principle: don't build stuff now for what we might or might not eventually need. I don't understand the idea of an endpoint to clean up running transformers. That code should just be in the What we do need is an endpoint to archive a transform. Eventually we would want an endpoint to restore a transform by regenerating the output. One principle of microservice architecture is to never share the database between services. All interactions with our database should go through endpoints on the app. With this in mind, I don't see how this cleanup service could need a separate database. So I propose:
We might want an endpoint to report total space used in the object store. We want this in the app since we have stories to use external object stores and don't want MinIO libraries to extend beyond the app.
Yeah, I would suggest that we put this at either If we are going to keep the records in the database and simply flag them with Suchandra makes a good point in that we may not be able to make it truly atomic. From a quick search, it doesn't seem like MinIO has any transaction features. We also need to distinguish between cleanup that happens when all files are done, versus cleanup that happens when a request is archived/deleted. Ilija lists 5 tasks. These 3 tasks should be done as soon as all files are transformed:
The last 2 tasks should only happen when the request is archived (first one only) or deleted (both):
So one of the first decisions we need to make is whether we want to archive requests or delete them outright. If both options are desired, we could have two endpoints.
It is not an endpoint to clean up running transformers but one that completely cleans up a request. Imagine a request fails halfway. You still want the running transformers killed and that bucket cleaned up. Or a user decides that (s)he made a mistake and does not need that request at all. There should be a way to do a deep cleanup.
This might need. But it is actually this that falls under YAGNI. I would not call it archiving but a "shallow" clean. Again, imagine a request fails halfway; I might want to retry it. So we would need to clean up everything but the ConfigMap and the request source.
Nobody asked for the creation of another database. The current database should have information on how big the request output is.
Ah, I see, @ivukotic: this is just a terminate-running-transform endpoint, which would be called by the dashboard if you click cancel. I agree completely that we need this. Consequently, this action also needs to delete the transform deployment to shut down all running workers.
Going all the way back to the start... for keeping the Here are the AF and user situations I can think of, given a query's data has been deleted from
So, is there a reason to more tightly couple things inside I'm definitely ➕1 on the idea of a cancel button. Sometimes it is annoying when I see myself wasting resources and contributing to global warming via a typo. ;-) Once caching exists, one could imagine a cleanup cron job that would look for different request IDs with the same queries, e.g. duplicate data.
One of the available MinIO policies is a time-based one for buckets. I think that will be suboptimal because it might delete results before we really need to. The MinIO quota policies either limit a bucket to a hard size or delete the oldest objects in the bucket as new objects get created, in order to stay under the quota. The problem with letting MinIO do the cleanup is that ServiceX won't know the status of the results, since there's no communication when MinIO deletes objects. I do agree with the use cases that you've outlined and think that they're valid. What I'd like to do initially is just implement a microservice that keeps ServiceX storage usage under a given threshold. We can build from there, but I think this is probably the minimum we'll need to get this working in production. I think the bigger discussion about terminating workflows is useful to have, but I'm hoping to keep this focused primarily on how to handle storage needs. As @AndrewEckart suggested, I'm open to using the
Does Do we know how large the
MinIO has a policy to delete objects within a bucket based on age; I don't think there's a way to delete the old bucket, though. We've been using non-persistent MinIO storage so far, so everything gets cleared when you redeploy an instance.
Background
ServiceX currently doesn't clean up the results from a transform request. To work effectively in production, ServiceX needs the ability to clean up after itself. Initially this will focus on MinIO persistent storage, but it should be general enough that we can easily extend it to other storage solutions in the future.
MinIO's built-in facilities don't help with this. Bucket lifecycle policies delete the contents of a bucket after a fixed time. FIFO quota policies delete older objects to free space for newer objects when the quota is reached. Both methods delete objects without informing ServiceX.
Proposed Solution
A solution for this would be to create a microservice that handles tracking objects stored in persistent storage and deleting them as needed in order to keep storage utilization under a specified quota.
Tracking Storage Utilization
MinIO doesn't have any easy way to track the space used by a bucket. Consequently, it requires some effort to even get the space used by all the transformation results. The current best practice is to iterate over objects within a bucket and then sum up the space used by each object.
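Summing object sizes over a bucket listing, as described above, might look like this. The client is assumed to follow minio-py's `list_objects(bucket, recursive=True)`, whose items carry a `size` attribute in bytes; `bucket_size` is a hypothetical helper name.

```python
from typing import Any


def bucket_size(client: Any, bucket: str) -> int:
    """Return the total number of bytes stored in `bucket`.

    `client` is assumed to expose a minio-py style list_objects(bucket, recursive=True).
    """
    total = 0
    # recursive=True lists every object instead of grouping keys by "/" prefix.
    for obj in client.list_objects(bucket, recursive=True):
        # Prefix placeholders can carry size=None, so count them as zero.
        total += obj.size or 0
    return total
```

The cost grows with the number of objects in the bucket, which is why the proposal records each bucket's size only once.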
Given the ServiceX workflow, this service can scan MinIO storage once a day and get the sizes of any new buckets. Since the outputs of a transform are immutable, we can store this information in the PostgreSQL database and only need to scan a bucket once to get its size.
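The scan-once idea can be sketched as a daily job that only measures buckets it has not seen before. Assumptions: `known_sizes` is a plain dict standing in for the PostgreSQL table, `client.list_buckets()` returns objects with a `name` attribute as in minio-py, and `size_of` is any function that measures one bucket (for example, a sum over its object listing). The function name is hypothetical.

```python
from typing import Any, Callable, Dict


def record_new_bucket_sizes(
    client: Any,
    known_sizes: Dict[str, int],
    size_of: Callable[[Any, str], int],
) -> Dict[str, int]:
    """Measure only buckets not already in `known_sizes`; return what was measured.

    Because finished transform output is immutable, a recorded size never needs refreshing.
    """
    measured: Dict[str, int] = {}
    for bucket in client.list_buckets():
        if bucket.name not in known_sizes:
            measured[bucket.name] = size_of(client, bucket.name)
    # Persist the new measurements (a dict update here, a DB insert in practice).
    known_sizes.update(measured)
    return measured
```

One caveat: a bucket whose transform is still running is not yet immutable, so a real implementation would skip buckets belonging to requests that have not completed.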
Deletion Policy
The service can initially use a policy of deleting the oldest buckets until the storage used is under a configurable threshold, so that transforms don't run out of space. The service should probably default to a high-water mark of 85% of storage capacity, but this will be configurable.
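The oldest-first policy with an 85% high-water mark can be written as a pure function that only chooses which buckets to delete, leaving the actual deletion to the caller. This is a sketch; the input format (bucket name mapped to creation time and size in bytes) is an assumption about what the service would read back from PostgreSQL, and `buckets_to_delete` is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def buckets_to_delete(
    buckets: Dict[str, Tuple[datetime, int]],
    capacity_bytes: int,
    high_water_mark: float = 0.85,
) -> List[str]:
    """Pick the oldest buckets whose removal brings usage to the high-water mark or below."""
    limit = capacity_bytes * high_water_mark
    used = sum(size for _, size in buckets.values())
    victims: List[str] = []
    # Walk buckets oldest first (sorted by creation time).
    for name, (_, size) in sorted(buckets.items(), key=lambda item: item[1][0]):
        if used <= limit:
            break
        victims.append(name)
        used -= size
    return victims
```

Keeping the selection separate from the deletion makes the policy easy to test and to swap for a different one later.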
API Interface
The service will have a single endpoint that exposes a simple API that should suffice for ServiceX activities:
GET /transform_data/size?id=ID&type=minio
will scan a specified bucket and return the size of that bucket
DELETE /transform_data?id=ID&type=minio
will attempt to delete a specified bucket
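A framework-free sketch of routing the two calls above. The `backends` mapping and its `size`/`delete` adaptor methods are hypothetical; they illustrate how the `type` query parameter could select a storage backend so that other object stores can be added later.

```python
from typing import Any, Dict, Tuple
from urllib.parse import parse_qs, urlsplit


def handle(method: str, url: str, backends: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Route the two proposed calls; returns (HTTP status, JSON-style body)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    bucket_id = query.get("id", [None])[0]
    store_type = query.get("type", [None])[0]
    # Both parameters are required, and the type must name a known backend.
    if not bucket_id or store_type not in backends:
        return 400, {"error": "id and a supported type are required"}
    backend = backends[store_type]
    if method == "GET" and parts.path == "/transform_data/size":
        return 200, {"id": bucket_id, "size_bytes": backend.size(bucket_id)}
    if method == "DELETE" and parts.path == "/transform_data":
        backend.delete(bucket_id)
        return 200, {"id": bucket_id, "deleted": True}
    return 404, {"error": "unknown endpoint"}
```

In practice this logic would sit behind whatever web framework the service uses.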