Releases: mle-infrastructure/mle-scheduler
Minor fixes 🔧
[v0.0.8] - [08/2024]
- Fix `module load` for Slurm
Minor fixes 🔧
[v0.0.7] - [08/2024]
- Minor fixes in cleanup
Minor fixes 🔧
[v0.0.6] - [03/2023]
- Minor fixes in notebook
`delete_config`, `debug_mode`, `automerge_configs`
- Adds `MLEQueue` option to delete config after job has finished (`delete_config`)
- Adds `debug_mode` option to store stdout & stderr to files
- Adds merging/loading of generated logs in `MLEQueue` w. `automerge_configs` option (all three options are sketched below)
- Use system executable python version
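A minimal sketch of how these new options might be passed to `MLEQueue`, assuming they are plain constructor keyword arguments (the option names follow the notes above; the exact signature and the local setup are assumptions):

```python
from mle_scheduler import MLEQueue

# Hypothetical local queue using the options added in this release
queue = MLEQueue(
    resource_to_run="local",                  # Run jobs on the local machine
    job_filename="train.py",                  # Script executed for every job
    config_filenames=["base_config_1.yaml"],  # Configs to loop over
    experiment_dir="logs_queue",
    random_seeds=[0, 1],
    delete_config=True,       # Delete temporary configs once a job has finished
    debug_mode=True,          # Store stdout & stderr of each job to files
    automerge_configs=True,   # Merge/load the generated logs after completion
)
queue.run()
```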
`mle-logging` merging & multi-partition/queue scheduling
- Track config base strings for auto-merging of mle-logs & add `merge_configs`
- Allow scheduling on multiple partitions via `-p <part1>,<part2>` & queues via `-q <queue1>,<queue2>` (see the sketch below)
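Presumably the comma-separated values are forwarded to the scheduler's `-p`/`-q` submission flags via the job arguments; a hedged sketch (the `partition`/`queue` keys mirror the Slurm example further below, the multi-value format is an assumption):

```python
# Assumed: comma-separated values end up as Slurm's `-p` and the
# grid engine's `-q` submission flags.
slurm_job_args = {"partition": "partition_1,partition_2"}  # -p partition_1,partition_2
sge_job_args = {"queue": "queue_1,queue_2"}                # -q queue_1,queue_2
```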
Welcome to MLE-Scheduler v0.0.3 🤖
- Fix imports
Welcome to MLE-Scheduler v0.0.2 🤖
- Bump version string
Welcome to MLE-Scheduler
- First release 🤖 implementing core API of `MLEJob` and `MLEQueue` (see the usage examples below)
from mle_scheduler import MLEQueue

# Each job requests 5 CPU cores & 1 V100S GPU & loads CUDA 10.0
job_args = {
"partition": "<SLURM_PARTITION>", # Partition to schedule jobs on
"env_name": "mle-toolbox", # Env to activate at job start-up
"use_conda_venv": True, # Whether to use anaconda venv
"num_logical_cores": 5, # Number of requested CPU cores per job
"num_gpus": 1, # Number of requested GPUs per job
"gpu_type": "V100S", # GPU model requested for each job
"modules_to_load": "nvidia/cuda/10.0" # Modules to load at start-up
}
queue = MLEQueue(
resource_to_run="slurm-cluster",
job_filename="train.py",
job_arguments=job_args,
config_filenames=["base_config_1.yaml",
"base_config_2.yaml"],
experiment_dir="logs_slurm",
random_seeds=[0, 1]
)
queue.run()
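For the single-run half of the core API, a minimal `MLEJob` sketch (local execution and placeholder filenames assumed; the keyword names mirror the `MLEQueue` example above):

```python
from mle_scheduler import MLEJob

# Single training run executed on the local machine
job = MLEJob(
    resource_to_run="local",
    job_filename="train.py",               # Script to execute
    config_filename="base_config_1.yaml",  # Config passed to the script
    experiment_dir="logs_single",
    job_arguments={},
)
job.run()
```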