Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: vap generation #3266

Merged
merged 1 commit into from
Feb 26, 2024
Merged

Conversation

ritazh
Copy link
Member

@ritazh ritazh commented Feb 13, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #2874

Special notes for your reviewer:

@ritazh ritazh requested a review from a team as a code owner February 13, 2024 05:41
@ritazh ritazh changed the title Feat vapgeneration feat: vap generation Feb 13, 2024
@codecov-commenter
Copy link

codecov-commenter commented Feb 13, 2024

Codecov Report

Attention: Patch coverage is 24.06639% with 183 lines in your changes are missing coverage. Please review.

Project coverage is 54.18%. Comparing base (6d1e4ee) to head (f5ea086).
Report is 3 commits behind head on master.

Files Patch % Lines
pkg/controller/constraint/constraint_controller.go 0.00% 126 Missing ⚠️
...onstrainttemplate/constrainttemplate_controller.go 49.09% 42 Missing and 14 partials ⚠️
pkg/target/target.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3266      +/-   ##
==========================================
- Coverage   54.28%   54.18%   -0.10%     
==========================================
  Files         134      134              
  Lines       12356    10247    -2109     
==========================================
- Hits         6707     5552    -1155     
+ Misses       5152     4190     -962     
- Partials      497      505       +8     
Flag Coverage Δ
unittests 54.18% <24.06%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritazh ritazh force-pushed the feat-vapgeneration branch 2 times, most recently from e5b1b08 to 766e1ff Compare February 14, 2024 01:29
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but very nice!

pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
@ritazh ritazh force-pushed the feat-vapgeneration branch 3 times, most recently from 8b5048a to 90423db Compare February 16, 2024 23:19
.github/workflows/workflow.yaml Show resolved Hide resolved
pkg/controller/config/config_controller_suite_test.go Outdated Show resolved Hide resolved
pkg/controller/config/config_controller_suite_test.go Outdated Show resolved Hide resolved
pkg/controller/constraint/constants.go Outdated Show resolved Hide resolved
pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
@ritazh ritazh force-pushed the feat-vapgeneration branch 2 times, most recently from 1792445 to cf29d69 Compare February 19, 2024 21:15
@ritazh ritazh requested review from sozercan and maxsmythe February 19, 2024 21:43
@@ -125,6 +135,7 @@ func newReconciler(mgr manager.Manager, cfClient *constraintclient.Client, wm *w
}

// via the registrar below.
constraintEvents := make(chan event.GenericEvent, 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Where is the constraint controller reading from constraintEvents? Should we just be writing to regEvents instead?

If we want to use a second channel, we'd need to plumb it through.

pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
pkg/controller/constraint/constraint_controller.go Outdated Show resolved Hide resolved
@ritazh ritazh force-pushed the feat-vapgeneration branch 5 times, most recently from 2f860cc to f5ea086 Compare February 23, 2024 04:43
@ritazh
Copy link
Member Author

ritazh commented Feb 23, 2024

@maxsmythe @sozercan PTAL when you get a chance

return reconcile.Result{}, err
}
}
if !reflect.DeepEqual(currentVap, newVap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd make this an else if statement, otherwise we make two writes to the API server instead of one (on account of currentVAP being nil, guaranteeing reflect.DeepEqual() will be false.

return reconcile.Result{}, err
}
}
if !reflect.DeepEqual(currentVapBinding, newVapBinding) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd make this an else if statement, otherwise we make two writes to the API server instead of one (on account of currentVAP being nil, guaranteeing reflect.DeepEqual() will be false.

return reconcile.Result{}, err
}

// after vap is updated, trigger update event for all constraints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we actually want to do this after create/delete (not update), since we want to re-reconcile constraints if the "use VAP" annotation ever changes (so constraints can re-evaluate whether their default behavior should change).

On update "use VAP" was true and will continue to be true. On create/delete "use VAP" has changed (or the CT is brand new/just deleted)

}
// do not generate vap resources
// remove if exists
if !generateVap && constraint.IsVapAPIEnabled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add the "listObjects/replay constraints" behavior here as well (might make sense to factor that out into its own function to avoid code duplication)

Copy link
Contributor

@maxsmythe maxsmythe Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more deeply about this, I think we have a general race condition. Let's pretend:

  1. constraint controller is fast
  2. CT controller is slow (likely, since it may require compilation of Rego/CEL)
  3. there is some constraint Foo
  4. It is backed by some template FooTemplate
  5. there are 3 events for FooTemplate: no-generate-vap, generate-vap, no-generate-vap
  6. FooTemplate reconciler is slow enough that only two events get reconciled: no-generate-vap, no-generate-vap
  7. constraint controller gets a reconcile while generate-vap is set

If this sequence of events happens, you could get a dangling VAP binding b/c when constraint controller reconciles, it's gonna run client.Get() on the informer cache and see the annotation to create a binding, but the constraint template controller sees no change (from its view no-generate-vap has always been set), so it will not trigger a reconcile.

I think I see a way out of this, but it's not super pretty:

  1. create a shared cache between the constraint template and constraint controllers that stores the CT controller's opinion as to whether a constraint kind should/should not generate VAP objects
  2. create a function on the CT reconciler called shouldGenerateVAP(kind string, shouldGenerate bool), when shouldGenerate changes for a particular kind, that's what updates the cache and triggers a re-reconcile of all constraints for a given template
  3. call shouldGenerateVAP() where we are currently running list calls (might need to think about the timing)
  4. The constraint controller reads from the cache, rather than the object
  5. Because the results of reconciliation are dependent on internal state, we should run reconciliation as a singleton process (i.e. throw it on the status/audit Pod), this will prevent thrashing of the k8s resources whenever the pods' watch events are out of sync

(5) is probably a good thing to address generally, since there is also a chance of thrashing on G8r upgrade. I'd suggest that any controller that creates a downstream object (CRDs, VAP objects), the creation runs as a singleton (we can carve it out as an operation, pehaps, like audit or status updates). The controllers should still run for the purposes of caching state (e.g. loading constraint templates into CF), and they should still write out pod status objects.

This is medium-to-large body of work. I would recommend we do this before graduating to beta (or at least before allowing users to set failure policy to Fail). However, I don't think we need to block the current PR on this because:

  • This is an in-development feature that is off-by-default
  • We currently fail open, so a dangling binding will not cause any issues

Signed-off-by: Rita Zhang <[email protected]>
@ritazh ritazh force-pushed the feat-vapgeneration branch from f5ea086 to 76f216f Compare February 24, 2024 00:47
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but see my caveat about a race condition above.

}
// do not generate vap resources
// remove if exists
if !generateVap && constraint.IsVapAPIEnabled() {
Copy link
Contributor

@maxsmythe maxsmythe Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more deeply about this, I think we have a general race condition. Let's pretend:

  1. constraint controller is fast
  2. CT controller is slow (likely, since it may require compilation of Rego/CEL)
  3. there is some constraint Foo
  4. It is backed by some template FooTemplate
  5. there are 3 events for FooTemplate: no-generate-vap, generate-vap, no-generate-vap
  6. FooTemplate reconciler is slow enough that only two events get reconciled: no-generate-vap, no-generate-vap
  7. constraint controller gets a reconcile while generate-vap is set

If this sequence of events happens, you could get a dangling VAP binding b/c when constraint controller reconciles, it's gonna run client.Get() on the informer cache and see the annotation to create a binding, but the constraint template controller sees no change (from its view no-generate-vap has always been set), so it will not trigger a reconcile.

I think I see a way out of this, but it's not super pretty:

  1. create a shared cache between the constraint template and constraint controllers that stores the CT controller's opinion as to whether a constraint kind should/should not generate VAP objects
  2. create a function on the CT reconciler called shouldGenerateVAP(kind string, shouldGenerate bool), when shouldGenerate changes for a particular kind, that's what updates the cache and triggers a re-reconcile of all constraints for a given template
  3. call shouldGenerateVAP() where we are currently running list calls (might need to think about the timing)
  4. The constraint controller reads from the cache, rather than the object
  5. Because the results of reconciliation are dependent on internal state, we should run reconciliation as a singleton process (i.e. throw it on the status/audit Pod), this will prevent thrashing of the k8s resources whenever the pods' watch events are out of sync

(5) is probably a good thing to address generally, since there is also a chance of thrashing on G8r upgrade. I'd suggest that any controller that creates a downstream object (CRDs, VAP objects), the creation runs as a singleton (we can carve it out as an operation, pehaps, like audit or status updates). The controllers should still run for the purposes of caching state (e.g. loading constraint templates into CF), and they should still write out pod status objects.

This is medium-to-large body of work. I would recommend we do this before graduating to beta (or at least before allowing users to set failure policy to Fail). However, I don't think we need to block the current PR on this because:

  • This is an in-development feature that is off-by-default
  • We currently fail open, so a dangling binding will not cause any issues

Copy link
Member

@sozercan sozercan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ritazh ritazh merged commit 7ed8ef0 into open-policy-agent:master Feb 26, 2024
18 checks passed
@ritazh ritazh deleted the feat-vapgeneration branch February 26, 2024 23:44
leewoobin789 pushed a commit to softlee-io/gatekeeper that referenced this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integration with k8s Validating Admission Policy for admission
4 participants