-
Notifications
You must be signed in to change notification settings - Fork 963
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
rework device sharing in volcano (#2643)
* Signed-off-by: limengxuan <[email protected]> Rework device-sharing mechanism to volcano * Signed-off-by: limengxuan <[email protected]> after review #1
- Loading branch information
1 parent
e145aff
commit f4e3e58
Showing
15 changed files
with
774 additions
and
512 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# Sharing devices in volcano | ||
|
||
## Introduction | ||
|
||
We implement a common interface for shareable devices(GPU,NPU,FPGA,...) called Devices, and use it to reimplement current gpu-share mechanism. The goal is to let device-sharing easy to implement, and better organised. If you wish to grant vc-scheduler the ability to share another device, all you need is to implement these methods in Devices, and place your logic under pkg/scheduler/api/devices. | ||
|
||
## Backguards | ||
|
||
We intended to provide volcano the ability to share third-party resources link GPU,NPU,etc in the near future. At fitst, I tried to implement these logics based on predicate.gpushare, but i sooner realised that these logics scattered in device_info.go, node_info.go, pod_info.go, and whole predicate folder. if i follow the implementation of predicate.gpushare, i will have no choice but hack deeply into vc-scheduler api. Sooner or later vc-scheduler api will be crowded with various device-sharing logic, which is probably not what we wished. | ||
|
||
## Implementation | ||
|
||
### Interface Devices design | ||
|
||
The design of Devices is shown below: | ||
|
||
``` | ||
type Devices interface { | ||
//following two functions used in node_info | ||
//AddResource is to add the corresponding device resource of this 'pod' into current scheduler cache | ||
AddResource(pod *v1.Pod) | ||
//SubResoure is to substract the corresponding device resource of this 'pod' from current scheduler cache | ||
SubResource(pod *v1.Pod) | ||
//following four functions used in predicate | ||
//HasDeviceRequest checks if the 'pod' request this device | ||
HasDeviceRequest(pod *v1.Pod) bool | ||
//FiltreNode checks if the 'pod' fit in current node | ||
FilterNode(pod *v1.Pod) (bool, error) | ||
//Allocate action in predicate | ||
Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error | ||
//Release action in predicate | ||
Release(kubeClient kubernetes.Interface, pod *v1.Pod) error | ||
//used for debug and monitor | ||
GetStatus() string | ||
} | ||
``` | ||
|
||
The first two method are used for node_info to update cluster status. The following four methods are used in predicate which allocatation and deallocation actually take place. Finally a monitor mothod for debug. | ||
|
||
### Create a seperate package for gpushare related methods, and use Devices method to reimplement it. | ||
|
||
There are two steps we need to do, first, we need to create a new package in "pkg/scheduler/api/devices/nvidia/gpushare", and implement Devices methods in it, then we need to seperate gpushare-related logic from "scheduler.api" and "predicate plugin", and convert them to package "pkg/scheduler/api/devices/nvidia/gpushare". The package contains the following files: device.go(which implement SharedDevicePool interface methods), share.go(which contains private methods for device.go), type.go(which contains const values and definations). | ||
|
||
Details of methods mapping is shown in the table below: | ||
|
||
| origin file | corresponding file(s) in new package | | ||
| ------------- | ------------- | | ||
| pkg/scheduler/api/node_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go | | ||
| pkg/scheduler/api/device_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go | | ||
| pkg/scheduler/api/pod_info.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go | | ||
| pkg/scheduler/plugins/predicates/predicates.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go | | ||
| pkg/scheduler/plugins/predicates/gpu.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go | | ||
|
||
## How to add a new device-share policy | ||
|
||
### 1. Define your device in /pkg/scheduler/api/shared_device_pool.go | ||
|
||
Name your policy and put it in shared_device_pool.go as follows: | ||
|
||
``` | ||
const ( | ||
GPUSharingDevice = "GpuShare" | ||
Your_new_sharing_policy = "xxxxx" | ||
) | ||
``` | ||
|
||
### 2. Create a new package in /pkg/scheduler/api/devices/"your device name"/"your policy name" | ||
|
||
For example, if you try to implement a NPU share policy, then you are recommended to create a package in /pkg/scheduler/api/device/ascend/npushare | ||
|
||
### 3. Implement methods of interface shared_device_pool, and put them in your new package | ||
|
||
Note that, you can't to refer to any struct of methods in scheduler.api to avoid cycle importing. If there is anything in scheduler.api you *must* need, then you should modify the SharedDevicePool interface to pass it. | ||
The methods defined in SharedDevicePool interface and its information is shown in table below: | ||
|
||
| interface | invoker file | information | | ||
| ------------- | ------------ | ------------- | | ||
| AddResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Add the 'pod' and its resources into scheduler cache | | ||
| SubResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Delete the 'pod' and substract its resources from scheduler cache | | ||
| HasDeviceRequest(pod *v1.Pod) bool | pkg/scheduler/plugins/predicates/predicate.go | Check whether this 'pod' request a portion of this device | | ||
| FilterNode(pod *v1.Pod)| pkg/scheduler/plugins/predicates/predicate.go | Check whether the portion of device this pod requests can fit in current node | | ||
| Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicate.go | Allocate the portion of this device from the current node to this pod | | ||
| Release(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicate.go | Dellocate the portion of this device from this pod | | ||
| GetStatus() string | none | Used for debug and monitor | | ||
|
||
### 4. Add your initialization code in /pkg/scheduler/api/node_info.go | ||
|
||
This is the *only* place you hack into scheduler.api ,which you have to register your policy during initialization of node_struct. | ||
|
||
``` | ||
// setNodeOthersResource initialize sharable devices | ||
func (ni *NodeInfo) setNodeOthersResource(node *v1.Node) { | ||
ni.Others[GPUSharingDevice] = gpushare.NewGPUDevices(ni.Name, node) | ||
//ni.Others["your device sharing policy name"] = your device sharing package initialization method | ||
} | ||
``` | ||
|
||
### 5. Check if your policy is enabled in /pkg/scheduler/plugins/predicate/predicates.go | ||
|
||
This is the *only* plae you hack into predicates.go, when the scheduler checks if your policy is enabled in scheduler configuration. | ||
|
||
predicates.go: | ||
|
||
``` | ||
... | ||
// Checks whether predicate.GPUSharingEnable is provided or not, if given, modifies the value in predicateEnable struct. | ||
args.GetBool(&gpushare.GpuSharingEnable, GPUSharingPredicate) | ||
args.GetBool(&gpushare.GpuNumberEnable, GPUNumberPredicate) | ||
args.GetBool(&gpushare.NodeLockEnable, NodeLockEnable) | ||
args.GetBool("your policy enable variable","your policy enable parameter") | ||
... | ||
``` | ||
|
||
|
||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.