[Proposal] Enhance ORM by NRI #488

Airren · 2024-02-28T07:18:38Z

In this discussion, we explain the rationale for using the Node Resource Interface (NRI) in Katalyst, covering both its significance and the implementation strategy. This is part of ORM #430.

What is NRI

Background

As of now, Kubernetes does not offer a fully comprehensive resource management solution. Many open-source projects in the Kubernetes ecosystem have devised their own ways of modifying how pods are deployed and managed in order to enable fine-grained resource allocation.

There are various approaches to extending Kubernetes, which we have summarized as follows.

NRI

NRI emerged to remove the need for intrusive modifications to Kubernetes and changes to its default process, and to give developers a more unified implementation approach.

NRI allows plugging domain- or vendor-specific custom logic into OCI-compatible runtimes. This logic can make controlled changes to containers or perform extra actions outside the scope of OCI at certain points in a container's lifecycle. This can be used, for instance, for improved allocation and management of devices and other container resources.

Use NRI In Katalyst

Katalyst QRM mode

Katalyst ORM mode(kubelet listener)

NRI Enhanced ORM(Along with kubelet polling)

Benefits

Adopts a synchronous method for prompt responsiveness.
Pluggable deployment rather than intrusive modification of the kubelet (QRM).
Keeps in sync with upstream components.

Features that can be implemented with NRI:

adjust a container's cpuset / CFS quota
adjust memory QoS
control a container's RDT group
add environment variables for NVIDIA GPUs
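
As a rough illustration of the first item, the values a plugin would hand to the runtime for a container's CPU limits could be derived as follows. This is a minimal sketch, not Katalyst code: `milliCPUToQuota` and `formatCPUSet` are hypothetical helpers, and the 100ms period and 1ms minimum quota are assumed defaults.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

const (
	defaultPeriodUs = 100000 // assumed CFS period: 100ms, in microseconds
	minQuotaUs      = 1000   // assumed minimum quota: 1ms, in microseconds
)

// milliCPUToQuota converts a CPU limit in millicores into a CFS quota
// (microseconds of CPU time per period). Zero means "no limit requested".
func milliCPUToQuota(milliCPU, periodUs int64) int64 {
	if milliCPU <= 0 {
		return 0
	}
	quota := milliCPU * periodUs / 1000
	if quota < minQuotaUs {
		quota = minQuotaUs
	}
	return quota
}

// formatCPUSet renders CPU IDs in the cpuset list format, e.g. "0-3,8".
func formatCPUSet(cpus []int) string {
	if len(cpus) == 0 {
		return ""
	}
	sorted := append([]int(nil), cpus...)
	sort.Ints(sorted)

	var parts []string
	start, prev := sorted[0], sorted[0]
	flush := func() {
		if start == prev {
			parts = append(parts, strconv.Itoa(start))
		} else {
			parts = append(parts, strconv.Itoa(start)+"-"+strconv.Itoa(prev))
		}
	}
	for _, c := range sorted[1:] {
		if c == prev {
			continue // skip duplicates
		}
		if c == prev+1 {
			prev = c // extend the current range
			continue
		}
		flush()
		start, prev = c, c
	}
	flush()
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println("quota:", milliCPUToQuota(1500, defaultPeriodUs))
	fmt.Println("cpuset:", formatCPUSet([]int{8, 0, 1, 2, 3}))
}
```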

Design Details: NRI Enhanced ORM(Along with kubelet listener)

In this section, the method based on polling the kubelet API is referred to as Bypass Mode, while the method based on NRI is referred to as NRI Mode.

Addon

The ORM supports two operational modes: Bypass or NRI. Only one mode can be active at any given time. When a new ORM Manager is created, the operational mode is determined by reading the configuration; changing the mode at runtime is not supported.

const (
	AGENT_MODE_NRI    = "nri"
	AGENT_MODE_BYPASS = "bypass"
)

type ManagerImpl struct {
	ctx context.Context
	....
	// ORM run mode: bypass or nri.
	// Bypass mode is triggered by polling the kubelet API to get pod events.
	// NRI mode requires containerd version >= 1.7.0 with NRI enabled.
	mode string
	....
}

func NewManager(... config *config.Configuration) {
	....
	// Use NRI mode only when it is enabled in the configuration
	// and the runtime is detected to support it; otherwise fall back to bypass.
	if config.GenericConfiguration.NRIEnable && ValidateNRIStatus() {
		m.mode = AGENT_MODE_NRI
	} else {
		m.mode = AGENT_MODE_BYPASS
	}
}

func ValidateNRIStatus() bool {
	// TODO: detect NRI status
}
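
The mode selection above leaves NRI detection as a TODO. One possible approach, sketched below under stated assumptions, is to check whether the runtime's NRI socket exists. The names `selectMode` and `nriSocketAvailable` are hypothetical, and `/var/run/nri/nri.sock` is assumed to be the socket path (it depends on the runtime configuration). A present socket only suggests that NRI is enabled; it does not prove that plugin registration will succeed.

```go
package main

import (
	"fmt"
	"os"
)

const (
	modeNRI    = "nri"
	modeBypass = "bypass"

	// Assumed NRI socket location; adjust to the runtime's configuration.
	defaultNRISocket = "/var/run/nri/nri.sock"
)

// nriSocketAvailable reports whether path exists and is a Unix socket.
func nriSocketAvailable(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeSocket != 0
}

// selectMode returns NRI mode only when it is both enabled in the
// configuration and the socket is present; otherwise it falls back to bypass.
func selectMode(nriEnabled bool, socketPath string) string {
	if nriEnabled && nriSocketAvailable(socketPath) {
		return modeNRI
	}
	return modeBypass
}

func main() {
	fmt.Println("ORM mode:", selectMode(true, defaultNRISocket))
}
```

Falling back to bypass rather than failing keeps the agent usable on nodes whose runtime does not have NRI enabled.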

The ORM ManagerImpl functions as an NRI stub, implementing processing logic within the corresponding hook event functions.

import "github.com/containerd/nri/pkg/stub"

type ManagerImpl struct {
	ctx context.Context
	....
	// nriStub is the implementation of the NRI event handlers
	nriStub stub.Stub
	// nriMask stores the specific events that need to be hooked
	nriMask stub.EventMask
	....
}

In enhancing the ORM implementation, three hook functions are required: RunPodSandbox(), CreateContainer(), and RemovePodSandbox().

Step 1: during RunPodSandbox(), the Admit() function is triggered. If Admit() succeeds, resources are allocated for the container and pod creation continues. If Admit() fails, pod creation fails as well.

    func (m *ManagerImpl) RunPodSandbox(podSandbox *api.PodSandbox) error {
        // Get the pod spec for podSandbox (via MetaServer?)
        // Admit the pod; a non-nil error fails pod creation
        return m.topologyManager.Admit(pod)
    }

Step 2: after a successful Admit(), the process proceeds to the CreateContainer() event. At this point, Admit() has already allocated resources for the container. The corresponding resources are written into the container's spec, which is returned as an adjustment.

    func (m *ManagerImpl) CreateContainer(pod *api.PodSandbox, container *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
        // Update the container spec from the pod's allocated resources
        adjust, err := m.updateContainer(pod, container)
        return adjust, nil, err
    }

Step 3: during RemovePodSandbox(), all resource allocations related to the pod are returned.

    func (m *ManagerImpl) RemovePodSandbox(pod *api.PodSandbox) error {
        // Return pod-related resources
        return m.returnPodResource(pod)
    }
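
The three steps can be exercised end to end with a toy allocator. The sketch below is illustrative only: `PodSandbox`, `Container`, and `ContainerAdjustment` are simplified local stand-ins for the NRI `api` types, the `CPUs` field and the CPU-pool logic are hypothetical, and `CreateContainer` omits the container-updates return value for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Simplified stand-ins for the NRI api types (assumptions, not the real structs).
type PodSandbox struct {
	Uid  string
	CPUs int // hypothetical: number of exclusive CPUs the pod asks for
}
type Container struct{ Name string }
type ContainerAdjustment struct{ CpusetCpus string }

// manager hands out CPU IDs from a fixed pool, keyed by pod UID.
type manager struct {
	free      []int
	allocated map[string][]int
}

func newManager(numCPUs int) *manager {
	m := &manager{allocated: map[string][]int{}}
	for i := 0; i < numCPUs; i++ {
		m.free = append(m.free, i)
	}
	return m
}

// RunPodSandbox plays the role of Admit(): it reserves CPUs or rejects the pod.
func (m *manager) RunPodSandbox(pod *PodSandbox) error {
	if _, ok := m.allocated[pod.Uid]; ok {
		return nil // already admitted, e.g. on a retry
	}
	if pod.CPUs > len(m.free) {
		return errors.New("admit failed: not enough free CPUs")
	}
	m.allocated[pod.Uid] = append([]int(nil), m.free[:pod.CPUs]...)
	m.free = m.free[pod.CPUs:]
	return nil
}

// CreateContainer turns the pod's reservation into a container adjustment.
func (m *manager) CreateContainer(pod *PodSandbox, c *Container) (*ContainerAdjustment, error) {
	cpus, ok := m.allocated[pod.Uid]
	if !ok {
		return nil, fmt.Errorf("no allocation for pod %s", pod.Uid)
	}
	parts := make([]string, len(cpus))
	for i, id := range cpus {
		parts[i] = strconv.Itoa(id)
	}
	return &ContainerAdjustment{CpusetCpus: strings.Join(parts, ",")}, nil
}

// RemovePodSandbox returns the pod's CPUs to the free pool.
func (m *manager) RemovePodSandbox(pod *PodSandbox) error {
	m.free = append(m.free, m.allocated[pod.Uid]...)
	delete(m.allocated, pod.Uid)
	return nil
}

func main() {
	m := newManager(4)
	a := &PodSandbox{Uid: "pod-a", CPUs: 2}
	b := &PodSandbox{Uid: "pod-b", CPUs: 3}

	fmt.Println("admit a:", m.RunPodSandbox(a))
	adj, _ := m.CreateContainer(a, &Container{Name: "app"})
	fmt.Println("a cpuset:", adj.CpusetCpus)
	fmt.Println("admit b:", m.RunPodSandbox(b)) // rejected while a holds CPUs
	_ = m.RemovePodSandbox(a)
	fmt.Println("admit b after removing a:", m.RunPodSandbox(b))
}
```

Making the admit step idempotent per pod UID is a design choice in this sketch, so that a retried RunPodSandbox() does not reserve resources twice.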

Modification

In NRI Mode, once resources have been allocated in Admit(), Allocate() does not need to execute syncContainer(); it should simply return after the resources have been allocated.

func (m *ManagerImpl) Allocate(pod *v1.Pod, container *v1.Container) error {
	....
	err := m.addContainer(pod, container)
	// In NRI mode, return once resources are allocated; syncContainer() is not needed
	if err != nil || m.mode == AGENT_MODE_NRI {
		return err
	}
	return m.syncContainer(pod, container)
}

In NRI Mode, the executor in syncContainer() can be implemented through NRI's updateContainer().

// mode is the ORM run mode configured on the manager
func NewExecutor(mode string, cgroupManager cgroupmgr.Manager) Executor {
	if mode == AGENT_MODE_NRI {
		// executor backed by NRI's updateContainer()
		return NewNRIExecutor()
	}
	// executor backed by cgroupManager
	return &Impl{cgroupManager: cgroupManager}
}
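
To make the two modifications concrete, here is a small self-contained sketch of how the run mode could drive both the executor choice and the early return in Allocate(). All types here are hypothetical stand-ins: `recordingExecutor` only records calls instead of writing cgroup files or issuing NRI updates, and `Allocate` takes a container ID and cpuset string rather than the real pod and container objects.

```go
package main

import "fmt"

const (
	modeNRI    = "nri"
	modeBypass = "bypass"
)

// Executor applies an allocation to a running container.
type Executor interface {
	UpdateContainerResources(containerID, cpuset string) error
}

// recordingExecutor is a stand-in for both back ends; it only records calls.
type recordingExecutor struct {
	backend string // "nri" or "cgroup"
	calls   []string
}

func (e *recordingExecutor) UpdateContainerResources(containerID, cpuset string) error {
	e.calls = append(e.calls, containerID+"="+cpuset)
	return nil
}

// newExecutor picks the back end from the run mode.
func newExecutor(mode string) Executor {
	if mode == modeNRI {
		return &recordingExecutor{backend: "nri"}
	}
	return &recordingExecutor{backend: "cgroup"}
}

type manager struct {
	mode        string
	exec        Executor
	allocations map[string]string // container ID -> cpuset
}

func newManager(mode string) *manager {
	return &manager{mode: mode, exec: newExecutor(mode), allocations: map[string]string{}}
}

// Allocate records the allocation. In NRI mode it stops there, because the
// CreateContainer hook applies the result; in bypass mode it syncs right away.
func (m *manager) Allocate(containerID, cpuset string) error {
	m.allocations[containerID] = cpuset
	if m.mode == modeNRI {
		return nil
	}
	return m.exec.UpdateContainerResources(containerID, cpuset)
}

func main() {
	for _, mode := range []string{modeBypass, modeNRI} {
		m := newManager(mode)
		if err := m.Allocate("c1", "0-1"); err != nil {
			fmt.Println("allocate failed:", err)
			continue
		}
		rec := m.exec.(*recordingExecutor)
		fmt.Printf("mode=%s backend=%s syncCalls=%d\n", mode, rec.backend, len(rec.calls))
	}
}
```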

The metaServer is kept as a member variable of the ORM ManagerImpl because it is used in both Bypass and NRI modes.
TBD: in NRI mode, halt the MetaManager's Reconcile.

Opens

Handling failure of Admit():
- The Pod enters a retry loop when RunPodSandbox() returns an error. (How to fix: integrate with scheduling?)
- How should Admit() failure be handled in Bypass mode? (How to evict a Pod that was not admitted?)
Timeout:
- On timeout, invoke stub.Restart in OnClose().
- Run Admit() with a context that carries a configured timeout.
containerd version ≥ 1.7.0 is required, with the NRI feature enabled.
Co-working with other plugins (RDT).
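
For the timeout item, a bounded Admit() could look roughly like the sketch below. It is only an illustration of the idea: `admitFunc`, `admitWithTimeout`, `fastAdmit`, and `slowAdmit` are hypothetical names, and the durations are arbitrary. The admit logic is expected to honor the context so that it stops working once the deadline passes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// admitFunc is a hypothetical stand-in for the ORM admission logic.
type admitFunc func(ctx context.Context, podUID string) error

// admitWithTimeout runs admit and gives up once the timeout elapses.
func admitWithTimeout(parent context.Context, timeout time.Duration, podUID string, admit admitFunc) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	// Buffered so the goroutine can finish even if nobody is receiving anymore.
	done := make(chan error, 1)
	go func() { done <- admit(ctx, podUID) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return fmt.Errorf("admit pod %s: %w", podUID, ctx.Err())
	}
}

// fastAdmit succeeds immediately.
func fastAdmit(ctx context.Context, podUID string) error { return nil }

// slowAdmit simulates a slow resource plugin but still honors cancellation.
func slowAdmit(ctx context.Context, podUID string) error {
	select {
	case <-time.After(200 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// isTimeout reports whether err was caused by the deadline expiring.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// runDemo admits one fast and one slow pod with different timeouts.
func runDemo() (fastErr, slowErr error) {
	fastErr = admitWithTimeout(context.Background(), time.Second, "pod-fast", fastAdmit)
	slowErr = admitWithTimeout(context.Background(), 20*time.Millisecond, "pod-slow", slowAdmit)
	return fastErr, slowErr
}

func main() {
	fastErr, slowErr := runDemo()
	fmt.Println("fast admit error:", fastErr)
	fmt.Println("slow admit timed out:", isTimeout(slowErr))
}
```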

Related Stuff

NRI : https://github.com/containerd/nri
NRI Introduction: https://juejin.cn/post/7221357811288293432

ORM PR: #406 #430

caohe · 2024-03-11T11:21:54Z

Collaborator

@Airren Thanks for your contribution! NRI will bring significant functionality and performance enhancements to Katalyst’s ORM framework.

I have a small question: similar to the Device Plugin interface, there is a PreStartContainer method in the definition of QRM Plugin. Which NRI hook point is suitable to trigger this method?

1 reply

Airren Mar 12, 2024
Author

The NRI hook points CreateContainer and PostCreateContainer precede StartContainer. There is no PreStartContainer hook point in NRI. Therefore, I believe that calling the device plugin and injecting devices into the container is best done at the CreateContainer hook point.
