Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node pod probe daemon #1077

Merged
merged 1 commit into from
Sep 27, 2022
Merged

Conversation

zmberg
Copy link
Member

@zmberg zmberg commented Sep 16, 2022

Signed-off-by: liheng.zms [email protected]

Ⅰ. Describe what this PR does

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

@codecov-commenter
Copy link

codecov-commenter commented Sep 16, 2022

Codecov Report

Merging #1077 (974f35e) into master (16277db) will decrease coverage by 0.51%.
The diff coverage is 31.44%.

@@            Coverage Diff             @@
##           master    #1077      +/-   ##
==========================================
- Coverage   50.29%   49.77%   -0.52%     
==========================================
  Files         126      130       +4     
  Lines       17855    18348     +493     
==========================================
+ Hits         8980     9133     +153     
- Misses       7886     8215     +329     
- Partials      989     1000      +11     
Flag Coverage Δ
unittests 49.77% <31.44%> (-0.52%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/daemon/podprobe/prober.go 0.00% <0.00%> (ø)
pkg/daemon/podprobe/worker.go 29.70% <29.70%> (ø)
pkg/daemon/podprobe/pod_probe_controller.go 38.27% <38.27%> (ø)
pkg/daemon/podprobe/results_manager.go 51.85% <51.85%> (ø)
pkg/controller/cloneset/cloneset_controller.go 54.62% <0.00%> (-0.57%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

)
updateQueue := workqueue.NewNamedRateLimitingQueue(
// Backoff duration from 500ms to 50~55s
workqueue.NewItemExponentialFailureRateLimiter(500*time.Millisecond, 50*time.Second+time.Millisecond*time.Duration(rand.Intn(5000))),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G404: Use of weak random number generator (math/rand instead of crypto/rand)


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

}
queue := workqueue.NewNamedRateLimitingQueue(
// Backoff duration from 500ms to 50~55s
workqueue.NewItemExponentialFailureRateLimiter(500*time.Millisecond, 50*time.Second+time.Millisecond*time.Duration(rand.Intn(5000))),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G404: Use of weak random number generator (math/rand instead of crypto/rand)


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@zmberg zmberg force-pushed the pod-probe-marker-daemon branch 4 times, most recently from 2ec676b to 4b19381 Compare September 20, 2022 06:33
InitialDelaySeconds: 100,
},
},
PodProbeMarkerName: "ppm-1",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typecheck: unknown field PodProbeMarkerName in struct literal


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

InitialDelaySeconds: 100,
},
},
PodProbeMarkerName: "ppm-1",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typecheck: unknown field PodProbeMarkerName in struct literal


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from 4b19381 to 7e07dba Compare September 20, 2022 07:20
pkg/daemon/podprobe/worker.go Outdated Show resolved Hide resolved
// current only support exec
// todo: http, tcp
if p.Exec != nil {
return pb.exec.Probe(pb.newExecInContainer(containerID, p.Exec.Command, timeout))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plz log the command, container and pod information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time the probe is executed, it prints whether the log is too large. Currently, the details are printed when the probe worker run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least plz logging the command if exec fails

func (pb *prober) newExecInContainer(containerID string, cmd []string, timeout time.Duration) exec.Cmd {
return &execInContainer{run: func() ([]byte, error) {
// current ignore stdout
_, stderr, err := pb.runtimeService.ExecSync(containerID, cmd, timeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider mix stdout and stderr together as the built-in probe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If probe success, return stdout, otherwise return stderr?

msg = msg[:maxMessageLength]
}
if err != nil || (result != probe.Success && result != probe.Warning) {
return appsv1alpha1.ProbeFailed, msg, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider record event for probing result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is event necessary here? Currently kruise daemon prints the log.

pkg/daemon/podprobe/results_manager.go Outdated Show resolved Hide resolved
@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from 7e07dba to 0c70034 Compare September 21, 2022 08:49
for key, worker := range c.workers {
if _, ok := validProbe[key]; !ok {
klog.Infof("NodePodProbe stop pod(%s/%s) container(%s) probe(%s) worker", key.podUID, key.containerName, key.probeName)
worker.stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove worker from c.workers ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time worker may be probing, so when the worker exits, it will be removed from c.workers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be race condition that the workers map will be concurrently write and delete by different goroutines.

Copy link
Member Author

@zmberg zmberg Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, currently relies on a locking mechanism to ensure. The current logic also refers to the kubelet implementation.

pkg/daemon/podprobe/pod_probe_controller.go Outdated Show resolved Hide resolved
@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from 0c70034 to f186e69 Compare September 23, 2022 06:55
nodeList, err := c.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
gomega.Expect(err).NotTo(gomega.HaveOccurred())
if len(nodeList.Items) == 0 {
ginkgo.By(fmt.Sprintf("pod probe markers list nodeList is zero"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S1039: unnecessary use of fmt.Sprintf


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from f186e69 to be4ba70 Compare September 23, 2022 07:35
pkg/daemon/podprobe/pod_probe_controller.go Outdated Show resolved Hide resolved
pkg/daemon/podprobe/worker.go Outdated Show resolved Hide resolved
pkg/daemon/podprobe/worker.go Show resolved Hide resolved
pkg/daemon/daemon.go Outdated Show resolved Hide resolved
@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from be4ba70 to fcb8e1f Compare September 26, 2022 07:15
// Let the worker wait for a random portion of tickerPeriod before probing.
// Do it only if the kruise daemon has started recently.
if probeTickerPeriod > time.Since(w.probeController.start) {
time.Sleep(time.Duration(rand.Float64() * float64(probeTickerPeriod)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G404: Use of weak random number generator (math/rand instead of crypto/rand)


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@@ -23,6 +23,8 @@ import (
"net/http"
"sync"

"github.com/openkruise/kruise/pkg/daemon/podprobe"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort import

}

klog.Infof("Starting NodePodProbe controller")
// Launch one workers to process resources, for there is only one nodePodProbe per Node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one workers

a worker

// guards the cache
sync.RWMutex
// map of container ID -> probe Result
cache map[string]Update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about sync.Map{} without RWMutex?

@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from fcb8e1f to 5b65270 Compare September 26, 2022 08:41
Copy link
Member

@furykerry furykerry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

daemonruntime "github.com/openkruise/kruise/pkg/daemon/criruntime"
"github.com/openkruise/kruise/pkg/daemon/imagepuller"
daemonoptions "github.com/openkruise/kruise/pkg/daemon/options"
"github.com/openkruise/kruise/pkg/daemon/podprobe"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typecheck: could not import github.com/openkruise/kruise/pkg/daemon/podprobe (-: could not load export data: no export data for "github.com/openkruise/kruise/pkg/daemon/podprobe")


ℹ️ Learn about @sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command Usage
@sonatype-lift ignore Leave out the above finding from this PR
@sonatype-lift ignoreall Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool> Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

Note: When talking to LiftBot, you need to refresh the page to see its response.
Click here to add LiftBot to another repo.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Signed-off-by: liheng.zms <[email protected]>
@zmberg zmberg force-pushed the pod-probe-marker-daemon branch from 5b65270 to 92b94b8 Compare September 26, 2022 09:34
@kruise-bot kruise-bot removed the lgtm label Sep 26, 2022
@FillZpp
Copy link
Member

FillZpp commented Sep 26, 2022

/approve

@kruise-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FillZpp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@FillZpp
Copy link
Member

FillZpp commented Sep 26, 2022

/lgtm

@FillZpp FillZpp merged commit ce5dd53 into openkruise:master Sep 27, 2022
baxiaoshi pushed a commit to baxiaoshi/kruise that referenced this pull request Nov 21, 2022
Signed-off-by: peng.xin <[email protected]>

optimize workloadspread when suitable subset maxReplicas is niil (openkruise#1066)

Signed-off-by: mingzhou.swx <[email protected]>

Signed-off-by: mingzhou.swx <[email protected]>
Co-authored-by: mingzhou.swx <[email protected]>

Optimize performance of LabelSelector conversion (openkruise#1068)

Signed-off-by: FillZpp <[email protected]>

Signed-off-by: FillZpp <[email protected]>

Support timezone for AdvancedCronJob (openkruise#1070)

Signed-off-by: FillZpp <[email protected]>

Signed-off-by: FillZpp <[email protected]>

ignore if pod condition has been updated (openkruise#1074)

Signed-off-by: mingzhou.swx <[email protected]>

Signed-off-by: mingzhou.swx <[email protected]>
Co-authored-by: mingzhou.swx <[email protected]>

pod probe marker apis (openkruise#1073)

* pod probe marker apis

Signed-off-by: liheng.zms <[email protected]>

* fix statefulset truncateHistory panic

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

pod probe marker controller (openkruise#1075)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

consider whether patch field is matched when assign existing pods to subset (openkruise#1083)

Signed-off-by: mingzhou.swx <[email protected]>

Signed-off-by: mingzhou.swx <[email protected]>
Co-authored-by: mingzhou.swx <[email protected]>

pod probe marker webhook (openkruise#1078)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

sidecarset support pods ns(kube-system, kube-public) (openkruise#1084)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

pod probe marker proposal (openkruise#1053)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

node pod probe daemon (openkruise#1077)

Signed-off-by: liheng.zms <[email protected]>

sidecarset support patch pod metadata proposal (openkruise#993)

Signed-off-by: liheng.zms <[email protected]>

Support predownload image in ads (openkruise#1057)

Signed-off-by: Abner <[email protected]>

* support predownload image for ads

Add PreDownloadImageForDaemonSetUpdate featureGate (openkruise#1093)

Signed-off-by: FillZpp <[email protected]>

Signed-off-by: FillZpp <[email protected]>

Add changelog for v1.3.0 (openkruise#1092)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

fix sidecarset inject annotations abnormal (openkruise#1101)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

fix InPlaceUpdateEnvFromMetadata bug (openkruise#1108)

Co-authored-by: 郭已钦 <[email protected]>

support uniteddeployment persistentVolumeClaimRetentionPolicy inherit from  advancestatefulset template (openkruise#1110)

Signed-off-by: cheyuexian <[email protected]>

Signed-off-by: cheyuexian <[email protected]>
Co-authored-by: cheyuexian <[email protected]>

Add PreNormal Lifecycle Hook for CloneSet (openkruise#1071)

* add pre-normal(pre-available) hook

Signed-off-by: mingzhou.swx <[email protected]>

* add pre-normal(pre-available) hook

Signed-off-by: mingzhou.swx <[email protected]>

Signed-off-by: mingzhou.swx <[email protected]>
Co-authored-by: mingzhou.swx <[email protected]>

partition support float percent (openkruise#1124)

Signed-off-by: shiyan2016 <[email protected]>

Signed-off-by: shiyan2016 <[email protected]>

improve error hanlding of inplace update env from metadata (openkruise#1125)

Signed-off-by: mingzhou.swx <[email protected]>

Signed-off-by: mingzhou.swx <[email protected]>
Co-authored-by: mingzhou.swx <[email protected]>

pub support to configure Evict,Delete,Update Operation (openkruise#1126)

Signed-off-by: liheng.zms <[email protected]>

Signed-off-by: liheng.zms <[email protected]>

feat: add watch whitelist

feat: add watch whitelist

add watch whitelist check

add dynamic watch option

feat: add persistent annotations

format

feat: add pps support custom workload

Signed-off-by: peng.xin <[email protected]>

fix: nodeSelector and nodeAffinity have not been modified

Signed-off-by: peng.xin <[email protected]>

fix: error string

Signed-off-by: peng.xin <[email protected]>

fix: error string

Signed-off-by: peng.xin <[email protected]>

fix: goimports

Signed-off-by: peng.xin <[email protected]>

refactor: refactoring ersistentPodAnnotations type

Signed-off-by: peng.xin <[email protected]>

- add statefulet group check
- modify the pod create processing logic

Signed-off-by: peng.xin <[email protected]>

refactor: pps watch of whitelist workload

Signed-off-by: peng.xin <[email protected]>

fix: add statefulsetlike's pod controller

Signed-off-by: peng.xin <[email protected]>
ppbits pushed a commit to ppbits/kruise that referenced this pull request Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants