
Implementing storage cleaning system with options #311

Draft
wants to merge 1 commit into main from 305-worker-post-upload-cleanup-strategy

Conversation

Gregory-Pereira
Collaborator

Addresses: #305
/cc @vishnoianil @nerdalert Please take a look.

I am still struggling with the event-based strategy. I built a channel that watches for disk pressure, but I haven't figured out how to pass along the working directory of the job that pushed the usage past the limit and triggered the channel event.
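One way to carry that context (just a sketch; the struct, channel, and function names below are assumptions, not existing code) is to send a small struct over the channel instead of a bare event value, so the pressure level travels together with the offending job's working directory:

```go
// pressureSignal pairs a disk-pressure level with the working directory of the
// job that pushed usage over the limit. All names here are hypothetical.
type pressureSignal struct {
	level   string // whichever pressure-level value the watcher already computes
	workDir string // working directory of the triggering job
}

// notifyDiskPressure sketches how the watcher could publish both pieces of
// information on one channel for the cleanup goroutine to consume.
func notifyDiskPressure(ch chan<- pressureSignal, level, workDir string) {
	ch <- pressureSignal{level: level, workDir: workDir}
}
```

The cleanup side would then receive the signal and know exactly which directory to inspect, rather than having to rediscover it from disk usage alone.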

The .github/workflows/test-at.yaml workflow file will be dropped once I can verify that the at binary ships with Ubuntu 22.04.

Also looking for thoughts on the enum I tried to set up with CleanupStrategyType; I didn't see or know of any comparable enum examples in our codebase.
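On the enum question: the common Go pattern (a sketch only; the constant names are guesses and may not match what this PR currently defines) is a named type with typed constants and a small validity check:

```go
// CleanupStrategyType enumerates the supported cleanup strategies.
type CleanupStrategyType string

const (
	CleanupStrategyImmediate CleanupStrategyType = "immediate"
	CleanupStrategyLazy      CleanupStrategyType = "lazy"
	CleanupStrategyEvent     CleanupStrategyType = "event"
)

// Valid reports whether s is one of the known strategies.
func (s CleanupStrategyType) Valid() bool {
	switch s {
	case CleanupStrategyImmediate, CleanupStrategyLazy, CleanupStrategyEvent:
		return true
	}
	return false
}
```

A string-backed type keeps config files and CLI flags readable; an int-backed iota enum works just as well if the values never leave the process.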

@Gregory-Pereira force-pushed the 305-worker-post-upload-cleanup-strategy branch from 449407a to b48d701 on April 28, 2024 at 22:54
pressureEvent = mediumDiskPressureEvent
} else {
pressureEvent = highDiskPressureEvent

@Gregory-Pereira
Collaborator Author

Maybe I could set the strategy to immediate if it hits highDiskPressureEvent...?

@Gregory-Pereira
Collaborator Author

Confirmed `at` does not ship with Ubuntu 22.04 out of the box, so I will need to find another way. Considering cron, but I don't want jobs for specific file paths kicking around and running every 3 days or so; that could result in hundreds of cron jobs. Another potential solution is a separate, time-based queue for cleanup jobs ...

@vishnoianil
Member

@Gregory-Pereira

  • Immediate - It will delete the local data once all the files are uploaded to the S3 bucket.
  • Lazy - It should take a duration as input. For example, it will wake up every two weeks (set by the user) and clean up all the data that has been stored for more than two weeks.
  • Event - It should delete the oldest directories until the disk pressure goes down to 70% (see the sketch below).
    And I think it would be better if we can leverage the os/fs packages (or any other stable Go package) for finding the target directories and deleting them; that way we don't have to take care of OS-specific issues unless we hit a roadblock there.
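A rough shape for that Event pass, assuming a caller-supplied usage check so the sketch stays OS-agnostic (the function names and signatures are illustrative, not existing code):

```go
import (
	"os"
	"path/filepath"
	"sort"
	"time"
)

// cleanOldestUntil deletes the oldest top-level directories under root until
// usagePct reports a value at or below target (e.g. 70). usagePct would wrap
// whatever disk-usage check the worker already performs; all names here are
// placeholders.
func cleanOldestUntil(root string, target float64, usagePct func() (float64, error)) error {
	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	// Collect directories with their modification times.
	type dirInfo struct {
		path string
		mod  time.Time
	}
	var dirs []dirInfo
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return err
		}
		dirs = append(dirs, dirInfo{path: filepath.Join(root, e.Name()), mod: info.ModTime()})
	}
	// Oldest first.
	sort.Slice(dirs, func(i, j int) bool { return dirs[i].mod.Before(dirs[j].mod) })
	for _, d := range dirs {
		pct, err := usagePct()
		if err != nil {
			return err
		}
		if pct <= target {
			return nil
		}
		if err := os.RemoveAll(d.path); err != nil {
			return err
		}
	}
	return nil
}
```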

I think at this point we can just implement a simple Lazy policy (sketched below) and see how well it works. We can implement the rest of the policies in the future if needed.
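For what it's worth, a minimal sketch of the Lazy wake-up loop, assuming the user-set interval doubles as the age cutoff described above (all names are placeholders, not the PR's current API):

```go
import (
	"context"
	"os"
	"path/filepath"
	"time"
)

// runLazyCleanup wakes up every interval and removes top-level directories
// under root that have not been modified for longer than interval.
// Placeholder names; wiring into the worker's lifecycle is left open.
func runLazyCleanup(ctx context.Context, root string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			cutoff := time.Now().Add(-interval)
			entries, err := os.ReadDir(root)
			if err != nil {
				continue // log and retry on the next tick
			}
			for _, e := range entries {
				if !e.IsDir() {
					continue
				}
				if info, err := e.Info(); err == nil && info.ModTime().Before(cutoff) {
					_ = os.RemoveAll(filepath.Join(root, e.Name()))
				}
			}
		}
	}
}
```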

@Gregory-Pereira
Collaborator Author

@vishnoianil I am confused about whether this cleanup will take place on a static directory. I was under the assumption that both of these cleaning operations would happen after pre-check / generate, in which case wouldn't the path be specific to the data for that call? It seems odd to use a lazy strategy for data tied to a specific run. Please let me know if I am misunderstanding something (I am almost certain I am).
