This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Commit

[Hived]: Move Hived from OpenPAI to dedicated repo (#4319)
* [Hived]: Move Hived from OpenPAI to dedicated repo

Moved to https://github.com/microsoft/hivedscheduler
yqwang-ms authored Mar 23, 2020
1 parent 1b9635b commit 3a91564
Showing 2,208 changed files with 3 additions and 883,331 deletions.
1 change: 0 additions & 1 deletion .github/workflows/lint.yml
@@ -41,6 +41,5 @@ jobs:
curl -L https://git.io/misspell | sudo bash -s -- -b /bin
- name: Check spelling
run: |
-rm -rf ./subprojects/GOPATH/src/github.com/microsoft/hivedscheduler/vendor/
rm -rf ./src/watchdog/GOPATH/src/github.com/microsoft/watchdog/vendor/
misspell -error .
2 changes: 1 addition & 1 deletion docs/hivedscheduler/devops.md
@@ -23,7 +23,7 @@ Update Scheduling Config, such as CRUD for virtual clusters and gpu types.
virtualClusters: <your_virtual_clusters_config>
```

-For how to config them, please check [Config HivedScheduler](../../subprojects/GOPATH/src/github.com/microsoft/hivedscheduler/doc/user-manual.md#Config)
+For how to config them, please check [Config HivedScheduler](https://github.com/microsoft/hivedscheduler/blob/master/doc/user-manual.md#config)

3. Push Config and Start Services
```bash
2 changes: 1 addition & 1 deletion docs/system_architecture.md
@@ -22,7 +22,7 @@ The failure rules can be updated on-the-fly by the cluster operaters. Wheneven n

OpenPAI provides comprehensive [monitoring tools](./grafana/README.md) to users and cluster admins for job and cluster monitoring. OpenPAI also monitors the status of key OpenPAI components in the cluster and is able to send [alerts](./alerting/README.md) (e.g., as in email) if pre-configed conditions have been triggered.

-OpenPAI is a modular platform, which is designed to enable various innovations. With the standard k8s scheduling API, OpenPAI introduces [HiveD](../subprojects/hivedscheduler/README.md), an optional but recommended scheduler designed for deep learning workloads in a multi-tenant GPU cluster. HiveD provides various advantages over standard k8s scheduler. For example, it introduces a notion of "virtual cluster", which allows a team of users to run workload in the virtual cluster as if they reserve a private, dedicated (smaller) GPU cluster.
+OpenPAI is a modular platform, which is designed to enable various innovations. With the standard k8s scheduling API, OpenPAI introduces [HiveD](https://github.com/microsoft/hivedscheduler), an optional but recommended scheduler designed for deep learning workloads in a multi-tenant GPU cluster. HiveD provides various advantages over standard k8s scheduler. For example, it introduces a notion of "virtual cluster", which allows a team of users to run workload in the virtual cluster as if they reserve a private, dedicated (smaller) GPU cluster.
HiveD's virtual cluster reserves GPU resource not only in terms of quota (i.e., number of GPU), but also in terms of **topology**. For example, with HiveD a virtual cluster can reserve a GPU node, or a rack of GPU nodes within the same InfiniBand domain, instead of a set of GPUs randomly scatters across the cluster. This is important to preserve the training speed for jobs within the virtual cluster.
With HiveD, OpenPAI also provides better topology-aware gang scheduling with no [resource starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science)). HiveD also supports multi-priority jobs and job preemption.
