
Add cpu hours limit on job submission #186

Merged
merged 1 commit into main on Jan 21, 2025
Conversation

MrCreosote
Member

The 100-hour default is very conservative, but we can bump that up easily. By way of contrast, 1 NERSC node hour is 256 CPU hours, assuming you're using the entire node.
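The node-hour comparison above can be sketched as a quick calculation. This is purely illustrative; the 256-CPUs-per-node figure comes from the comment, and `node_hours_to_cpu_hours` is a hypothetical helper, not part of the service.

```python
# Illustration of the comparison in the comment above: one NERSC node
# hour equals 256 CPU hours when the entire node is in use.
CPUS_PER_NODE = 256  # figure from the comment, not from the service code


def node_hours_to_cpu_hours(node_hours: float, cpus_used: int = CPUS_PER_NODE) -> float:
    """Convert node hours to CPU hours for a given number of CPUs in use."""
    return node_hours * cpus_used


# One full node hour is 256 CPU hours, so the 100 CPU-hour default
# limit corresponds to well under half an hour on a full node.
print(node_hours_to_cpu_hours(1))      # 256.0
print(100 / CPUS_PER_NODE)             # 0.390625 node hours
```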
@MrCreosote MrCreosote requested a review from Tianhao-Gu January 21, 2025 22:45

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 55.26316% with 17 lines in your changes missing coverage. Please review.

Project coverage is 51.20%. Comparing base (23962d7) to head (55cdab5).
Report is 36 commits behind head on main.

Files with missing lines           Patch %   Missing lines
cdmtaskservice/config.py            27.27%   8
cdmtaskservice/job_state.py         33.33%   4
cdmtaskservice/nersc/manager.py      0.00%   3
cdmtaskservice/app_state.py          0.00%   1
cdmtaskservice/models.py            66.66%   1
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #186      +/-   ##
==========================================
- Coverage   51.38%   51.20%   -0.19%     
==========================================
  Files          37       37              
  Lines        2590     2605      +15     
==========================================
+ Hits         1331     1334       +3     
- Misses       1259     1271      +12     


def get_total_compute_time_sec(self) -> float:
    """
    Return the total compute time as seconds for the job, calculated as
    cpus * containers * runtime.
    """
Collaborator


Is runtime a user-input estimate? In practice, there are numerous factors that can influence the runtime, so get_total_compute_time_sec will never be accurate.

Member Author


Yes, and there's no way to get it otherwise, AFAIK - the service doesn't know anything about the containers it's running. And yes, I totally agree, but I don't know how to do better without a lot more work. This is just a check to make sure the user isn't doing something insane; it doesn't need to be super accurate.

One thing I've thought about doing was adding some sort of metadata to the image registration that could be used to estimate a runtime based on the file count and sizes, but doing anything like that would be far in the future.
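The sanity check discussed in this thread - estimate total compute time from the user-supplied runtime and reject jobs over a configured cap - might look roughly like the sketch below. All names here (`JobInput`, `cpus`, `num_containers`, `runtime_sec`, `max_cpu_hours`) are illustrative assumptions, not the service's actual fields or configuration keys.

```python
from dataclasses import dataclass


@dataclass
class JobInput:
    # Illustrative fields; the real models live in cdmtaskservice/models.py.
    cpus: int            # CPUs per container
    num_containers: int  # containers run by the job
    runtime_sec: float   # user-estimated runtime per container, in seconds

    def get_total_compute_time_sec(self) -> float:
        """
        Return the total compute time as seconds for the job, calculated as
        cpus * containers * runtime.
        """
        return self.cpus * self.num_containers * self.runtime_sec


def check_cpu_hours(job: JobInput, max_cpu_hours: float = 100.0) -> None:
    """Reject jobs whose rough compute-time estimate exceeds the cap.

    This is a coarse sanity check against insane requests, not an
    accurate prediction of actual runtime.
    """
    cpu_hours = job.get_total_compute_time_sec() / 3600
    if cpu_hours > max_cpu_hours:
        raise ValueError(
            f"Job requires ~{cpu_hours:.1f} CPU hours, "
            f"over the limit of {max_cpu_hours}"
        )


# 4 CPUs x 10 containers x 2 hours = 80 CPU hours: passes the 100-hour cap.
check_cpu_hours(JobInput(cpus=4, num_containers=10, runtime_sec=7200))
# 32 CPUs x 10 containers x 2 hours = 640 CPU hours: raises ValueError.
```

Since the estimate is only as good as the user's runtime guess, the cap is best kept generous and easy to raise in config, as the PR description suggests.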

@MrCreosote MrCreosote merged commit f1f917f into main Jan 21, 2025
11 checks passed