Add clean up job working directory as celery task #15816

Merged (14 commits) on Nov 18, 2024

Conversation

sanjaysrikakulam
Contributor

@sanjaysrikakulam sanjaysrikakulam commented Mar 16, 2023

WHAT:
Adds a new Celery task that cleans up the job working directories of failed jobs older than X days.
See the initial PR, #15618, for the discussion.
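
To make the described behaviour concrete, here is a minimal sketch of the idea: pick jobs in the "error" state that were last updated more than `days` ago, then delete their working directories. This is not the code merged in this PR. The function names, the `(job_id, state, update_time)` tuples, and the `jwd_paths` mapping are hypothetical stand-ins for Galaxy's database query and directory lookup.

```python
import datetime
import shutil
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def select_stale_failed_jobs(
    jobs: Iterable[Tuple[int, str, datetime.datetime]],
    now: datetime.datetime,
    days: int = 5,
) -> List[int]:
    """Return ids of jobs in the "error" state last updated more than `days` ago."""
    cutoff = now - datetime.timedelta(days=days)
    return [
        job_id
        for job_id, state, update_time in jobs
        if state == "error" and update_time < cutoff
    ]


def remove_working_directories(
    job_ids: Iterable[int], jwd_paths: Dict[int, Path]
) -> List[Path]:
    """Delete the working directory of each given job; return the paths removed."""
    removed = []
    for job_id in job_ids:
        path = jwd_paths.get(job_id)  # hypothetical lookup, not Galaxy's API
        if path is not None and path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Keeping the selection step separate from the deletion step means the age and state filter can be checked without touching the filesystem.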

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

fixes #15977

@github-actions github-actions bot added this to the 23.1 milestone Mar 16, 2023
Comment on lines 430 to 431
galaxy_log_dir = "/var/log/galaxy"
log_file_name = f"jwds_cleanup_{datetime.datetime.now().strftime('%d_%m_%Y-%I_%M_%S')}.log"
Member

I'm not sure about this. I think this should just be log.info to the system log. I don't want more files I need to manage and ensure are properly rotated.

Contributor Author

Ok, I will modify that and push a new commit
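
A minimal sketch of what the reviewer's suggestion could look like, assuming a module-level logger: the cleanup reports through the standard logging machinery instead of opening its own timestamped file. The helper name `report_removed` and the message wording are illustrative assumptions, not the code from this PR.

```python
import logging

# Module-level logger; handlers and rotation are left to the application's
# existing logging configuration rather than a dedicated cleanup log file.
log = logging.getLogger(__name__)


def report_removed(jwd_path: str, job_id: int) -> str:
    """Log one removed job working directory and return the message."""
    message = f"Removed job working directory {jwd_path} for job {job_id}"
    log.info(message)
    return message
```

With this approach there is no extra file to create, name, or rotate; where the records end up is decided by whatever handlers the deployment already configures.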

sanjaysrikakulam added a commit to sanjaysrikakulam/galaxyproject_galaxy that referenced this pull request Mar 16, 2023
@mvdbeek
Member

mvdbeek commented Mar 16, 2023

You don't need to merge dev into your branch unless there are conflicts.

@@ -1,7 +1,10 @@
import json
from concurrent.futures import TimeoutError
import datetime
Member

make format can fix that for you

Contributor Author

Will do that and push a new commit.

Member

@mvdbeek mvdbeek left a comment

This is looking good!

log.error(f"Error deleting job working directory: {jwd_path} : {e.strerror}")

# days should be converted to a config option
days = 5
Member

Can you move that into the task signature?

Contributor Author

Moved.

lib/galaxy/celery/tasks.py (three outdated review threads, resolved)
@hexylena
Member

Does this supersede #15618? Should we close that one?

def get_failed_jobs():
    failed_jobs = {}
    jobs = sa_session.query(model.Job).filter(
        model.Job.state == "error",
Member

It might make sense to check against all terminal states here (that is, anything not in ok, queued, running, new), but maybe there's a better way to do that. I know that @natefoo and I occasionally fail jobs manually with a state like manually_failed, and it'd be good to make sure we're getting all terminal jobs.

Member

I think we can tweak this after the first version is merged.

Member

works for me
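
The alternative raised in this thread, treating every state outside a small exclusion list as a cleanup candidate, can be sketched as below. The excluded states are the four named in the comment; whether that list matches Galaxy's full set of job states is an assumption, and `is_cleanup_candidate` is a hypothetical helper, not part of the PR.

```python
# States named in the review comment as ones to leave alone. Any other
# state, including ad hoc ones such as "manually_failed", is a candidate.
EXCLUDED_STATES = frozenset({"ok", "queued", "running", "new"})


def is_cleanup_candidate(state: str) -> bool:
    """Return True when a job in this state may have its directory cleaned."""
    return state not in EXCLUDED_STATES
```

Compared with the equality check against "error", an exclusion list also catches states nobody anticipated, at the cost of needing the list of states to skip to be complete.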

@sanjaysrikakulam
Contributor Author

Does this supersede #15618? Should we close that one?

Yes, I will close that PR now.

@bernt-matthias
Contributor

Wondering what the advantage of this is over onsuccess plus a cron job that deletes everything older than X days?

@bgruening
Member

The cron job cannot easily know, in a generic way, whether the job is still running or when it failed.

@jdavcs
Member

jdavcs commented Jun 20, 2023

Bumping this to 23.2

@jdavcs jdavcs modified the milestones: 23.1, 23.2 Jun 20, 2023
@bgruening
Member

@sanjaysrikakulam can you rebase this PR?

What is needed to get this one in?

@sanjaysrikakulam
Contributor Author

@sanjaysrikakulam can you rebase this PR?

I was unable to rebase, so I fixed the conflicts manually.

@mvdbeek mvdbeek modified the milestones: 23.2, 24.0 Dec 19, 2023
@jdavcs jdavcs modified the milestones: 24.0, 24.1 Feb 26, 2024
@mvdbeek mvdbeek removed this from the 24.1 milestone May 14, 2024
@mvdbeek mvdbeek self-requested a review May 14, 2024 14:06
@bgruening
Member

@sanjaysrikakulam can you please rebase this again, or merge manually?

1. datetime.timedelta takes only float
2. sqlalchemy has .isnot for filtering
…tion

Fixes a type error where Optional[int] caused incompatibility with datetime.timedelta(days=...). Removed the Optional annotation from the days parameter since it always defaults to 5, ensuring days is consistently treated as an integer and avoiding MyPy complaints.
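
A minimal sketch of the typing point in this commit message, assuming a helper that computes the age cutoff: with `days` annotated as a plain `int` and defaulting to 5, the value can be passed straight to `datetime.timedelta`. The name `cleanup_cutoff` is hypothetical and is not taken from the PR.

```python
import datetime


def cleanup_cutoff(now: datetime.datetime, days: int = 5) -> datetime.datetime:
    """Return the timestamp before which failed jobs count as stale."""
    # `days` is a plain int. Passing None to datetime.timedelta raises
    # TypeError, which is why an Optional[int] annotation is flagged by
    # a type checker at this call.
    return now - datetime.timedelta(days=days)
```

For example, `cleanup_cutoff(datetime.datetime(2024, 11, 18))` evaluates to `datetime.datetime(2024, 11, 13, 0, 0)`.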
@sanjaysrikakulam
Contributor Author

@sanjaysrikakulam can you please rebase this again, or merge manually?

I have rebased it again and fixed lint issues and errors.

@mvdbeek please review.

@mvdbeek mvdbeek merged commit c069907 into galaxyproject:dev Nov 18, 2024
51 of 53 checks passed
@mvdbeek
Member

mvdbeek commented Nov 18, 2024

Thanks @sanjaysrikakulam!

This PR was merged without a "kind/" label, please correct.

Successfully merging this pull request may close these issues.

[feature request] New cleanup method: ondelay
6 participants