Releases: mle-infrastructure/mle-scheduler

Minor fixes πŸ”§

27 Aug 07:36

[v0.0.8] - [08/2024]

  • Fix module load for Slurm

Minor fixes πŸ”§

26 Aug 13:19

[v0.0.7] - [08/2024]

  • Minor cleanup fixes

Minor fixes πŸ”§

08 Mar 14:17

[v0.0.6] - [03/2023]

  • Minor fixes in the notebook

`delete_config`, `debug_mode`, `automerge_configs`

05 Jan 15:32
  • Adds an MLEQueue option to delete the config file after a job has finished (delete_config)
  • Adds a debug_mode option to store stdout & stderr in files
  • Adds merging/loading of generated logs in MLEQueue via the automerge_configs option (see the sketch after this list)
  • Uses the system executable's Python version
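
A minimal usage sketch of these options, assuming delete_config, debug_mode and automerge_configs are exposed as MLEQueue keyword arguments (the script and config filenames below are placeholders):

from mle_scheduler import MLEQueue

# Sketch only: option names taken from the notes above, treated as keyword arguments
queue = MLEQueue(
    resource_to_run="slurm-cluster",
    job_filename="train.py",                  # placeholder training script
    job_arguments={"partition": "<SLURM_PARTITION>"},
    config_filenames=["base_config_1.yaml"],  # placeholder config
    experiment_dir="logs_queue",
    delete_config=True,       # delete the temporary config once a job has finished
    debug_mode=True,          # store each job's stdout & stderr in files
    automerge_configs=True,   # merge/load the generated logs after the queue completes
)
queue.run()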

`mle-logging` merging & multi-partition/queue scheduling πŸ”Ί

07 Dec 11:36
  • Track config base strings for auto-merging of mle-logging logs & add merge_configs
  • Allow scheduling on multiple partitions via -p <part1>,<part2> & on multiple queues via -q <queue1>,<queue2> (see the sketch below)
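
A hedged sketch of multi-partition scheduling, assuming the comma-separated list from the note above is passed through the partition entry of job_arguments (i.e. forwarded to Slurm's -p flag); all partition names are placeholders:

from mle_scheduler import MLEQueue

# Assumption: a comma-separated string schedules jobs across both partitions
job_args = {
    "partition": "<SLURM_PARTITION_1>,<SLURM_PARTITION_2>",
    "num_logical_cores": 2,
}

queue = MLEQueue(
    resource_to_run="slurm-cluster",
    job_filename="train.py",
    job_arguments=job_args,
    config_filenames=["base_config_1.yaml"],
    experiment_dir="logs_multi_partition",
    random_seeds=[0, 1],
)
queue.run()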

Welcome to MLE-Scheduler v0.0.3 πŸ€—

12 Nov 14:54

Welcome to MLE-Scheduler v0.0.2 πŸ€—

12 Nov 14:38

Welcome to MLE-Scheduler

12 Nov 14:18

First release πŸ€— implementing the core API of MLEJob and MLEQueue

from mle_scheduler import MLEQueue

# Each job requests 5 CPU cores & 1 V100S GPU & loads CUDA 10.0
job_args = {
    "partition": "<SLURM_PARTITION>",  # Partition to schedule jobs on
    "env_name": "mle-toolbox",  # Env to activate at job start-up
    "use_conda_venv": True,  # Whether to use anaconda venv
    "num_logical_cores": 5,  # Number of requested CPU cores per job
    "num_gpus": 1,  # Number of requested GPUs per job
    "gpu_type": "V100S",  # GPU model requested for each job
    "modules_to_load": "nvidia/cuda/10.0"  # Modules to load at start-up
}

queue = MLEQueue(
    resource_to_run="slurm-cluster",
    job_filename="train.py",
    job_arguments=job_args,
    config_filenames=["base_config_1.yaml",
                      "base_config_2.yaml"],
    experiment_dir="logs_slurm",
    random_seeds=[0, 1]
)
queue.run()
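
The release also mentions MLEJob; a single-job sketch under the same assumptions (the keyword names mirror the MLEQueue call above and may differ slightly from the actual signature):

from mle_scheduler import MLEJob

# Launch a single training run instead of a whole queue
job = MLEJob(
    resource_to_run="slurm-cluster",
    job_filename="train.py",
    job_arguments=job_args,           # same resource request as above
    config_filename="base_config_1.yaml",
    experiment_dir="logs_slurm",
    seed_id=0,
)
job.run()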