TurboSched: A Modern and Configurable Job Scheduling System


w568w/TurboSched


TurboSched aims to be a modern alternative to traditional job schedulers such as SLURM and PBS. It is designed to be highly configurable and extensible, with a primary focus on GPU clusters.

TurboSched is under heavy development and is not yet a working prototype. Contributions are welcome!

Build

  1. Install the Protocol Buffers compiler (protoc) and its Go plugins (guide), then generate Go code from the proto files:
    protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative common/proto/*.proto
  2. Copy config.toml.example to config.toml and modify the configuration file as needed.
  3. Start turbod on the nodes in the following order, controller first:
    go run turbod/main.go -c # Controller
    go run turbod/main.go -m # Compute Node
  4. From a client, submit and manage jobs with the following commands:
    go run turbo/main.go python # submit an interactive job running python
    go run turbo/main.go stop <job_id> # stop a job immediately

Roadmap

Note: This roadmap is outdated, so please do not rely on it. The issues page is the most up-to-date source of information.

  • Basic single-node execution
  • Basic single-node scheduling
  • Task cancellation
  • GPU resource management
  • GPU-aware scheduling
  • Basic multi-node scheduling
  • Failure-aware scheduling
  • Multi-node discovery
  • Task accounting
