charts/karmada: ignore the static-resource Pod in the post-install check #5369

Merged
merged 1 commit into from
Aug 19, 2024

iawia002
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

I encountered the following situation: the static-resource Job failed during execution, so it retried once. The retry was successful, and although the Job eventually completed successfully, the initial failure resulted in a failed Pod. Because of this failed Pod, the post-install check script kept running, even though all Karmada components were already running. This patch makes the post-install script ignore Pods related to static-resource.
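To make the change concrete, here is a minimal sketch of the filtering idea in Python. It is not the chart's actual post-install script: the Pod name prefix, the dictionary fields, and the function name are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical prefix for Pods created by the static-resource Job;
# the real name depends on the chart and the release name.
IGNORED_PREFIXES: Tuple[str, ...] = ("karmada-static-resource",)


def unready_pods(
    pods: Iterable[Dict[str, object]],
    ignored_prefixes: Tuple[str, ...] = IGNORED_PREFIXES,
) -> List[str]:
    """Return the names of Pods that should keep the check waiting."""
    blocking = []
    for pod in pods:
        name = str(pod["name"])
        # Skip Pods owned by the ignored Job, whatever state they ended in.
        if name.startswith(ignored_prefixes):
            continue
        if not pod["ready"]:
            blocking.append(name)
    return blocking


# The situation described above: the first attempt failed, the retry succeeded.
pods = [
    {"name": "karmada-apiserver-0", "phase": "Running", "ready": True},
    {"name": "karmada-static-resource-abcde", "phase": "Failed", "ready": False},
    {"name": "karmada-static-resource-fghij", "phase": "Succeeded", "ready": False},
]

with_filter = unready_pods(pods)
without_filter = unready_pods(pods, ignored_prefixes=())
```

With the filter the list of blocking Pods is empty, so the check can finish; without it, both static-resource Pods are reported as not ready.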

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

none

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Aug 14, 2024
@karmada-bot karmada-bot requested review from chaosi-zju and pidb August 14, 2024 08:32
@karmada-bot karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Aug 14, 2024
@codecov-commenter

codecov-commenter commented Aug 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 29.40%. Comparing base (bcefb22) to head (8f3e32c).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5369      +/-   ##
==========================================
- Coverage   29.41%   29.40%   -0.01%     
==========================================
  Files         632      632              
  Lines       43835    43835              
==========================================
- Hits        12892    12889       -3     
- Misses      30003    30004       +1     
- Partials      940      942       +2     
Flag Coverage Δ
unittests 29.40% <ø> (-0.01%) ⬇️


@iawia002
Member Author

/assign @chaosi-zju

@chaosi-zju
Member

Hi, this problem is related to issue #5233

If you keep trying the Helm installation, you may run into that problem as well.

I have a better solution in PR #5305, but there are not many people who are good at reviewing Helm changes.

I think you are very proficient in Helm. Can you help me review it?

Once that PR is merged, this problem will be gone.

@iawia002
Member Author

Hi, this problem is related to issue #5233

I didn't see how this relates to #5233, even if we merge #5305, the static-resource Job will still exist, right? It may still fail and leave behind a Pod in a Failed state, causing the post-install check to fail. The last time I encountered this issue (I forgot to take a screenshot), I remember it had nothing to do with the secret; it failed once because it couldn't connect to the apiserver.

Can you help me review it?

I can help review #5305, but I'm new to the Karmada chart, I don't know the big picture about it, so I can barely give some review comments.

@chaosi-zju
Member

even if we merge #5305, the static-resource Job will still exist, right?

yes

causing the post-install check to fail.

No, after my #5305 it will no longer check the static-resource Job's Pod.

I didn't see how this relates to #5233
I remember it had nothing to do with the secret

The relation is this: the wait conditions in the static-resource Job and in some init containers were newly introduced, with the aim of installing each component in order. However, these wait conditions are implemented unreasonably.

The unreasonable wait conditions caused both #5233 and your failure, so we hope to rectify them completely; eventually we want to change them to the approach in #5305.

Of course, if your issue is more urgent, we can also merge yours first, and then I'll think about moving #5305 forward.

but I'm new to the Karmada chart, I don't know the big picture about it

actually, I mainly want to ask for your opinion on the issue of sequentially installing deployments. I wonder if you know some best practices.

@iawia002
Member Author

We may not be on the same page. The post-install Job is still checking if all the Pods under the karmada-system namespace are Ready, isn't it? I didn't see any updates to the post-install check script in #5305. It still checks all the Pods under the entire namespace, including the static-resource Pods, of course.

My issue is that if the static-resource Job has ever failed, the post-install Job will not exit. It will keep waiting because there is a failed static-resource Pod in the karmada-system namespace.
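As a rough illustration of why the check never exits, the sketch below simulates a polling loop in Python. The loop, the Pod names, and the helper functions are hypothetical and are not taken from the chart; the only point is that a Pod in the terminal Failed phase looks the same on every poll.

```python
from typing import Callable, Dict, List, Optional

Pod = Dict[str, str]


def wait_for_pods(
    list_pods: Callable[[], List[Pod]],
    should_ignore: Callable[[Pod], bool],
    max_polls: int,
) -> Optional[int]:
    """Poll until every non-ignored Pod is Running.

    Returns the number of polls it took, or None if max_polls ran out.
    """
    for attempt in range(1, max_polls + 1):
        pending = [
            pod["name"]
            for pod in list_pods()
            if not should_ignore(pod) and pod["phase"] != "Running"
        ]
        if not pending:
            return attempt
        # A real script would sleep between polls; omitted to keep the sketch fast.
    return None


def snapshot() -> List[Pod]:
    # Failed is a terminal phase, so this Pod never changes between polls.
    return [
        {"name": "karmada-controller-manager-0", "phase": "Running"},
        {"name": "karmada-static-resource-abcde", "phase": "Failed"},
    ]


def ignore_nothing(pod: Pod) -> bool:
    return False


def ignore_static_resource(pod: Pod) -> bool:
    return "static-resource" in pod["name"]


stuck = wait_for_pods(snapshot, ignore_nothing, max_polls=5)
done = wait_for_pods(snapshot, ignore_static_resource, max_polls=5)
```

Without a poll limit, the first call would wait forever; the second returns on the first poll because the failed Pod is skipped.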

actually, I mainly want to ask for your opinion on the issue of sequentially installing deployments. I wonder if you know some best practices.

Hardly at all. Helm's ability to handle dependency installation is very weak.

@chaosi-zju
Member

chaosi-zju commented Aug 16, 2024

The post-install Job is still checking if all the Pods under the karmada-system namespace are Ready, isn't it?

Oh, sorry! My fault. That PR has been around for nearly 10 days, and my memory is a little confused (((;꒪ꈊ꒪;)))

I originally intended to remove this waiting check from the post-install Job directly (and let the Job clean itself up after finishing), but for some reason I did not do so, and I mistakenly thought I had.

So this PR of yours does not conflict with that PR. I will continue to review it~

@chaosi-zju
Member

/retest

@chaosi-zju
Member

/lgtm

cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2024
@iawia002
Member Author

/assign @RainbowMango
/unassign @chaosi-zju

Member

@RainbowMango RainbowMango left a comment

/approve
Thanks.

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2024
@karmada-bot karmada-bot merged commit 7eb8590 into karmada-io:master Aug 19, 2024
13 checks passed
@iawia002 iawia002 deleted the post-install branch August 19, 2024 06:31
@RainbowMango RainbowMango added this to the v1.11 milestone Aug 31, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
5 participants