Updating requiredResources in Application Management API #280

Merged: 12 commits, Nov 4, 2024
261 changes: 238 additions & 23 deletions code/API_definitions/Edge-Application-Management.yaml
@@ -961,6 +961,231 @@
type: integer
description: Number of GPUs

Flavor:
type: string
description: |
Preset configuration for compute, memory, GPU,
and storage capacity (e.g. A1.2C4M.GPU8G, A1.2C4M.GPU16G, A1.4C8M, ...).
example: A1.2C2M.GPU8G
Collaborator:

We should add a GET API to retrieve the list of flavors so that the user knows which flavor names can be used.

Collaborator:

Agree with @gainsley; we decided in #220 not to use flavours, but if it is a better solution to implement them, GET /edge-cloud-zones should return in the response the information about the available flavours for each edge-cloud-zone of interest.

Collaborator:

I also tend to agree :-) Maybe with GET /edge-cloud-zones we can add query parameters to retrieve the list of flavors, and then in the future we could also extend other resources via query parameters. Just a suggestion though.

Contributor Author:

Agree, I'll add an entry in GET /edge-cloud-zones to report the flavors.
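
A minimal sketch of what that could look like, assuming the zone object returned by GET /edge-cloud-zones is extended with a new array property (the name supportedFlavours and the surrounding schema are assumptions, not part of this PR):

    EdgeCloudZone:
      type: object
      properties:
        # ... existing Edge Cloud Zone properties ...
        supportedFlavours:                # hypothetical property name
          type: array
          description: Flavors available in this Edge Cloud Zone.
          items:
            $ref: '#/components/schemas/Flavor'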


NodePools:
Collaborator:

Should be NodePool, not NodePools

Contributor Author:

Good catch! Changed.

description: |
Collaborator:

In general, an issue I see with this approach is that it offers too many choices to the Application Developer for requesting the compute it needs. Developers can supply too many possible combinations through the API, and the platform then has to work out how it can serve such a diverse mix of resources or clusters. Also, as a developer I may need to run multiple applications on the same cluster, so how can I express that here?
So maybe compute resource creation could be a separate API that returns an identifier or handle for the resource, which this proposal could then use to indicate where the app can be deployed.

Contributor Author:

The approach is to adopt a one-application-to-one-infrastructure-resource approach (VM, Kubernetes cluster, container, Docker Compose). This means we avoid managing infrastructure independently of the application.

For running multiple applications on the same Kubernetes cluster, Helm packages provide a way to bundle them together. A Helm package can contain multiple application charts, such as a database and a web application chart, effectively treating them as a single application for deployment.

This approach aligns well with node pools. Developers can leverage node pools to create clusters with a mix of nodes, such as having one with a GPU and others without, optimizing resource allocation. The application to node pool mapping is done through labels, allowing developers to reference them in Helm chart values for node affinity.
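
As an illustration of the label-based mapping, the application Helm chart values could pin workloads to pools roughly like this (the label key and pool names are hypothetical, not defined by this PR):

    # values.yaml of the application Helm chart (sketch)
    gpuWorkload:
      nodeSelector:
        nodepool: nodepool-gpu        # label expected on nodes of the GPU pool
    webWorkload:
      nodeSelector:
        nodepool: nodepool-general    # label expected on nodes of the CPU-only pool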

Collaborator:

This looks like a very resource-heavy approach, tying one app to one piece of infra such as a k8s cluster, unless we provide some way to deploy multiple applications on one cluster. Also, what happens if cluster creation fails? That would mean application onboarding fails too, since both are now one atomic package. Another issue is that once an app is accepted along with its given infra, I cannot change the infra, e.g. reduce or increase the resources if needed.

So I still think that, especially with cluster-type infra, this will be hard to implement: it could mean creating a cluster dynamically, which can be a very time-consuming process. If we delink infra creation, there are options such as the platform creating clusters offline and providing an API to retrieve the cluster ID, or even providing an infra-creation API to manage infra for applications, with that information then used by the App LCM API to link them together.

So there are ways around it, but in terms of approach this tightly couples the infra and the applications and may reduce reusability. Maybe more inputs will help here.

Contributor Author:

Hi @gunjald,

Sounds good. I think it would be interesting to discuss the creation of an API to manage the infrastructure lifecycle (Create, Update, Delete). Enabling the Kubernetes cluster reference within the Application Management API would be easy.

For now, I think it's safe to keep things this way, allowing developers to use a Kubernetes cluster and define the minimum configuration details required by their application. We can then open a discussion about how to design a more comprehensive API for infrastructure management resources.

Collaborator:

> The approach is to adopt a one-application-to-one-infrastructure-resource approach (VM, Kubernetes cluster, container, Docker Compose). This means we avoid managing infrastructure independently of the application.

I think this should be discussed further as it changes how some of us see the problem we are trying to solve.

While it makes sense for VM and containers, I'm not sure for k8s clusters. It's my understanding that operators want to use the same infra for multiple app providers/app types. In this case, packaging multiple apps in the same Helm Chart, as suggested above, cannot be done.

Set of worker nodes in a Kubernetes cluster.
type: object
required:
- flavor
- numNodes
properties:
name:
type: string
example: nodepool1
description: |
Nodepool Name (Autogenerated if not provided in the request)
flavor:
$ref: '#/components/schemas/Flavor'
numNodes:
Collaborator:

Shouldn't it be something like numFlavors for better correlation?

type: integer
example: 1
description: Number of workers that compose the node pool.

K8sAddons:
description: |
Addons for the Kubernetes cluster.
Additional addons should be defined in the application Helm chart
(Service Mesh, Serverless, AI).
type: object
properties:
monitoring:
type: boolean
example: true
default: false
description: Enable monitoring for Kubernetes cluster.
ingress:
type: boolean
example: true
default: false
description: Enable ingress for Kubernetes cluster.

VmAddons:
description: |
Addons for the Virtual Machine.
type: object
properties:
dockerCompose:
type: boolean
example: true
default: false
description: |
Enable docker-compose in the virtual machine to deploy applications.
Collaborator:

As mentioned, I think it would be better to have docker-compose as a package type rather than a VM addon. A VM addon to a VM-based deployment means the user will have full access to the VM and may make changes that conflict with the state expected by the system trying to manage the docker deployment (e.g. worst case, the user manually uninstalls docker, and then the system fails when trying to install/uninstall/upgrade docker-compose files).

What I would recommend is adding DOCKER_COMPOSE_ZIP as a type to AppManifest.PackageType. So the user uploads a zip file of all their docker compose files, much like a helm chart. The Operator Platform would deploy a specific VM image and manage it, and the user would not have full access directly to the VM (much like users would not have full access directly to a kubernetes cluster).
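
A rough sketch of that suggestion, assuming AppManifest.packageType is a string enum (the existing values shown are placeholders; only DOCKER_COMPOSE_ZIP is the proposed addition):

    packageType:
      type: string
      description: Format of the application package.
      enum:
        - HELM_CHART            # placeholder for the existing value(s)
        - DOCKER_COMPOSE_ZIP    # proposed: zip archive of docker-compose files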

Contributor Author:

Sounds good, I'll remove the VM addon and create a docker-compose type.


K8sNetworking:
description: |
Kubernetes networking definition
type: object
properties:
primaryNetwork:
description: Definition of Kubernetes primary Network
type: object
properties:
provider:
description: CNI provider name
type: string
example: cilium
version:
description: CNI provider version
type: string
example: "1.13"
additionalNetworks:
description: Additional Networks for the Kubernetes cluster.
type: array
items:
type: object
description: Additional network interface definition
properties:
name:
description: Additional Network Name
type: string
example: net1
interfaceType:
description: |
Type of additional Interface:
netdevice: (SR-IOV) A regular kernel network device in the
Network Namespace (netns) of the container
vfio-pci: (SR-IOV) A PCI network interface directly mounted
in the container
interface: Additional interface to be used by cni plugins
such as macvlan, ipvlan
Note: The use of SR-IOV interfaces automatically
configures the required kernel parameters for the nodes.
type: string
example: vfio-pci
enum:
- netdevice

Check failure (GitHub Actions / MegaLinter) on line 1063 in code/API_definitions/Edge-Application-Management.yaml: 1063:17 [indentation] wrong indentation: expected 18 but found 16
- vfio-pci
- interface
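
For reference, an instance of this K8sNetworking object, as it would appear under the networking property of the Kubernetes resource (values taken from the examples above):

    networking:
      primaryNetwork:
        provider: cilium
        version: "1.13"
      additionalNetworks:
        - name: net1
          interfaceType: vfio-pci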

AdditionalStorage:
description: Additional storage for the application.

Check failure (GitHub Actions / MegaLinter) on line 1068 in code/API_definitions/Edge-Application-Management.yaml: 1068:20 [colons] too many spaces after colon
type: array
items:
type: object
required:
- storageSize
- mountPoint
properties:
name:
type: string
description: Name of additional storage resource.
example: logs
storageSize:
type: string
description: Additional persistent volume for the application.
example: 80GB
pattern: ^\d+(GB|MB)$
mountPoint:
type: string
description: Location of additional storage resource.
example: /logs
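
For reference, an additionalStorages entry in a request (for example under the VirtualMachine schema defined later) would look roughly like this, using the example values above:

    additionalStorages:
      - name: logs
        storageSize: 80GB
        mountPoint: /logs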

Vcpu:
type: string
pattern: ^\d+((\.\d{1,3})|(m))?$
description: |
Number of vCPUs in whole (e.g. 1), decimal (e.g. 0.500, up to
millivcpu precision), or millivcpu (e.g. 500m) format.
example: "500m"

Kubernetes:
description: Definition of Kubernetes Cluster Infrastructure.
type: object
required:
- nodePools
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: kubernetes
enum:
- kubernetes
Collaborator:

The infraKind is part of the top-level attribute KubernetesResources and looks redundant with the value "kubernetes", as KubernetesResources itself indicates that it is a Kubernetes resource.

version:
type: string
description: Minimum Kubernetes Version.
example: "1.29"
controlNodes:
Collaborator:

Seems like there should be a controlNodesFlavor to indicate which flavor to use for control nodes? Or is it expected that the platform shall choose an appropriate size?

Contributor Author:

The definition of control nodes is out of scope for the Application Developer.

type: integer
description: Number of nodes for Kubernetes control plane.
enum:
- 1

Check failure (GitHub Actions / MegaLinter) on line 1119 in code/API_definitions/Edge-Application-Management.yaml: 1119:11 [indentation] wrong indentation: expected 12 but found 10
- 3
Collaborator:

Not sure why an enum is defined for the controlNodes integer, shouldn't it allow any integer greater than 0?

Collaborator:

Sorry, I realize now this is for master nodes, so ignore my previous comment. Docs recommend up to 5 nodes for large clusters; I don't think we'll be dealing with large clusters here, but perhaps for completeness add an enum value for 5.

Contributor Author:

Okay, the idea here was to offer the control plane with and without HA, but in the end it is controlled by the operator. I think a better approach is a boolean controlPlaneHa: true/false.
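
A minimal sketch of that proposal, assuming the boolean replaces controlNodes (name and default are assumptions):

    controlPlaneHa:
      type: boolean
      default: false
      description: |
        Request a highly available control plane; the operator decides the
        actual number of control plane nodes.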

Collaborator:

I'm not sure if this should be part of the API. The Application Provider should have an SLA in place with the operators and shouldn't care about how the control plane of the operator infra is implemented.

nodePools:
type: array
description: |
Description of worker node set in a Kubernetes cluster.
items:
$ref: '#/components/schemas/NodePools'
additionalStorage:
type: string
description: |
Amount of persistent storage allocated to the Kubernetes PVC.
example: 80GB
pattern: ^\d+(GB|MB)$
networking:
$ref: '#/components/schemas/K8sNetworking'
addons:
$ref: '#/components/schemas/K8sAddons'


VirtualMachine:
description: Virtual Machine Infrastructure Definition
type: object
required:
- flavor
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: VirtualMachine
enum:
- VirtualMachine
flavor:
$ref: '#/components/schemas/Flavor'
additionalStorages:
$ref: '#/components/schemas/AdditionalStorage'
addons:
$ref: '#/components/schemas/VmAddons'


Container:
description: Container Infrastructure Definition
type: object
required:
- numCPU
- memory
- storage
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: containers
enum:
- containers
numCPU:
$ref: '#/components/schemas/Vcpu'
memory:
type: integer
example: 10
description: Memory in gigabytes
storage:
$ref: '#/components/schemas/AdditionalStorage'
gpu:
type: array
description: Number of GPUs
items:
$ref: '#/components/schemas/GpuInfo'

Ipv4Addr:
type: string
format: ipv4
@@ -1024,33 +1249,23 @@
type: integer
description: Port to establish the connection
minimum: 0

RequiredResources:
description: |
Fundamental hardware requirements to be provisioned by the
Application Provider.
type: object
required:
- numCPU
- memory
- storage
properties:
numCPU:
type: integer
description: Number of virtual CPUs
example: 1
memory:
type: integer
example: 10
description: Memory in giga bytes
storage:
type: integer
example: 60
description: Storage in giga bytes
gpu:
type: array
description: Number of GPUs
items:
$ref: '#/components/schemas/GpuInfo'
type: array
Collaborator:

I don't think you really want an array here, right? That would imply the resources could include multiple Kubernetes clusters plus multiple VirtualMachines plus multiple Containers. I think you really only want one of either a Kubernetes cluster, a VirtualMachine, or a Container resource request.

items:
oneOf:
- $ref: "#/components/schemas/Kubernetes"

Check failure (GitHub Actions / MegaLinter) on line 1260 in code/API_definitions/Edge-Application-Management.yaml: 1260:9 [indentation] wrong indentation: expected 10 but found 8
Collaborator:

As a note, EdgeXR allows the user to pre-create a (kubernetes) cluster and then specify the cluster during AppInstance create, in addition to specifying the cluster-resources to create one on-the-fly (as per this spec). This allows users to, over time, manage multiple AppInstances in the same cluster. That could be supported here by additionally adding a ClusterRef as one of the oneOf objects. I don't know if you have considered whether we should allow users to manage clusters directly. The main advantage of this is you get per-tenant cluster isolation without having to pay the overhead of cluster create for every AppInstance, assuming the user wants to share multiple AppInstances in the same cluster.
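
A hypothetical shape for such a reference, if it were added as another oneOf option (not part of this PR; all names below are assumptions):

    ClusterRef:
      description: Reference to a pre-created Kubernetes cluster.
      type: object
      required:
        - infraKind
        - clusterId
      properties:
        infraKind:
          type: string
          enum:
            - clusterRef
        clusterId:
          type: string
          description: Identifier of an existing cluster managed by the Operator Platform.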

Contributor Author:

The idea is to manage applications as self-contained units, including the resources they need. This way, the application itself collects all the resources required to work properly. If a second application needs to be deployed on the same cluster, it might indicate that the resource requirements for the first application were overestimated. Here are two options for the developer:

a) Modify the Helm chart to add the application there (even modify the resources to fit - this would require an API for application Update) or
b) Deploy a new application.

- $ref: "#/components/schemas/VirtualMachine"
- $ref: "#/components/schemas/Container"
Collaborator:

I find the schema names misleading. For example VirtualMachine typically implies an instance of a specific image, and Container also implies at least a specific image, none of which is the case here. I would recommend changing the names to KubernetesResources, VMResources, and ContainerResources or something similar.

Contributor Author:

Right! I'll modify it.

discriminator:
propertyName: infraKind
mapping:
kubernetes: "#/components/schemas/Kubernetes"
virtualMachine: "#/components/schemas/VirtualMachine"
container: "#/components/schemas/Container"

SubmittedApp:
description: Information about the submitted app