Update doc and diagram (#1)
jimbobby5 authored and jankaspar committed Jun 27, 2019
1 parent f712e4a commit 1842105
Showing 2 changed files with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion README.md
@@ -1 +1,16 @@
# k8s-batch
# K8S Batch
Experimental application to submit and monitor jobs using one or more Kubernetes clusters, providing Condor-like behaviour.

## Why?
In our Condor clusters we need to handle large spikes of resource requests. Condor queues thousands of jobs per user and slowly works through them, ensuring that all users get a fair share of resources.
Kubernetes itself is not designed around this use case, and multiple components of the system struggle when 10k to 100k pods are created at once.
Some of the issues could be solved by replacing the scheduler or improving other components, but we also need to support large clusters, and the current official Kubernetes limit is 5000 nodes. We have anecdotal evidence from conferences that Kubernetes does not operate optimally past 1000 nodes without significant tuning.
It would be beneficial to have a solution that scales out across multiple Kubernetes clusters. This makes scaling simple and also helps with maintenance.

## Overview
This application stores queues of pod specifications for users/projects and creates these pods once resources become available in Kubernetes.
To achieve fairness between users we have implemented a Condor-like algorithm to divide resources. Each queue has a priority. As pods from a queue use resources over time, the queue's priority is reduced so that other queues get a larger share in the future. When a queue does not use resources, its priority eventually returns to its initial value.
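
The priority behaviour described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `Queue`, `tick`, and `divide`, the constants `INITIAL_PRIORITY` and `HALF_LIFE_S`, and the choice of an exponentially decaying usage penalty are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

INITIAL_PRIORITY = 1.0  # hypothetical starting priority for every queue
HALF_LIFE_S = 3600.0    # hypothetical half-life of the usage history, in seconds


@dataclass
class Queue:
    name: str
    penalty: float = 0.0  # decayed history of resource usage

    @property
    def priority(self) -> float:
        # More past usage means a lower priority; no usage gives INITIAL_PRIORITY.
        return INITIAL_PRIORITY / (1.0 + self.penalty)

    def tick(self, used: float, elapsed_s: float) -> None:
        """Blend the usage seen over the last elapsed_s seconds into the penalty."""
        decay = 0.5 ** (elapsed_s / HALF_LIFE_S)
        # Old history fades by half every HALF_LIFE_S; recent usage fills the gap.
        self.penalty = self.penalty * decay + used * (1.0 - decay)


def divide(capacity: float, queues: list[Queue]) -> dict[str, float]:
    """Split capacity between queues in proportion to their current priority."""
    total = sum(q.priority for q in queues)
    return {q.name: capacity * q.priority / total for q in queues}


if __name__ == "__main__":
    busy, idle = Queue("busy"), Queue("idle")
    busy.tick(used=10.0, elapsed_s=3600.0)  # busy ran 10 units for an hour
    idle.tick(used=0.0, elapsed_s=3600.0)   # idle used nothing
    print(divide(70.0, [busy, idle]))
```

In this sketch a queue that stays idle sees its penalty halve every `HALF_LIFE_S` seconds, so its priority drifts back toward `INITIAL_PRIORITY`, matching the recovery described in the text.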

The current implementation uses Redis to store the queues of jobs. Redis streams are used for job events.
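
As a rough illustration of job events on a Redis stream, the Python sketch below appends events with `XADD` and reads them back with `XREAD`. The stream name `job-events`, the field names, and both helper functions are hypothetical and not taken from this repository; `client` is assumed to be a redis-py client created with `decode_responses=True`.

```python
JOB_EVENTS_STREAM = "job-events"  # hypothetical stream name


def publish_job_event(client, job_id: str, queue: str, state: str) -> str:
    """Append one job event to the stream and return the entry ID Redis assigns."""
    fields = {"job_id": job_id, "queue": queue, "state": state}
    return client.xadd(JOB_EVENTS_STREAM, fields)


def read_job_events(client, last_id: str = "0", count: int = 100):
    """Return (new_last_id, events) for stream entries that come after last_id."""
    reply = client.xread({JOB_EVENTS_STREAM: last_id}, count=count)
    events = []
    # xread returns a list of [stream_name, [(entry_id, fields), ...]] pairs.
    for _stream, entries in reply or []:
        for entry_id, fields in entries:
            events.append(fields)
            last_id = entry_id
    return last_id, events


# Usage, assuming a reachable Redis server and the redis-py package:
#   import redis
#   client = redis.Redis(decode_responses=True)
#   publish_job_event(client, "job-1", "project-a", "submitted")
#   last_id, events = read_job_events(client)
```

Passing the returned `last_id` back into the next `read_job_events` call lets a monitor resume from where it left off instead of re-reading the whole stream.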

![Diagram](./batch-api.png)
Binary file added batch-api.png
