Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit test for scalable node group controller #157

Merged
merged 10 commits into from
Dec 1, 2020

Conversation

prateekgogia
Copy link
Contributor

Description of changes:
Add unit test for scalable node group controller

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

pkg/cloudprovider/fake/nodegroup.go Show resolved Hide resolved
})

AfterEach(func() {
nodeGroupObject.Stable = defaultNodeGroupObject.Stable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, all this logic should go in a before each

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some duplication I removed that from here. Kept reseting nodeGroupObject, to be specific that we are reseting the nodeGroupObject after every run. Let me know, if you still think it might be better moving to BeforeEach func.

})

Context("ScalableNodeGroup", func() {
It("should be created", func() {
Expect(ns.ParseResources("docs/examples/reserved-capacity-utilization.yaml", sng)).To(Succeed())
sng.Spec.Replicas = ptr.Int32(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

ExpectDeleted(ns.Client, sng)
})
})

Context("Basic ScalableNodeGroup Reconcile tests", func() {
It("Test reconciler to scale up nodes", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider more idiomatic test naming with "It".

e.g. "it should scale up nodes", where the test name is "should scale up nodes".

})
})

Context("Advanced ScalableNodeGroup Reconcile tests", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see the difference between Basic and Advanced tests, they share all Before/After hooks and neither context has variables. Does it make sense to just collapse them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when I started I though I will be adding some complex test but it didn't make sense while writing tests. So yes, I will combine them.

package fake

// fakeError implements controllers.RetryableError & controllers.CodedError
type fakeError struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fake.fakeError stutters. Consider fake.RetriableError

// fakeError implements controllers.RetryableError & controllers.CodedError
type fakeError struct {
error
retryAble bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was gonna suggest retriable, but it looks like this is a fairly deep rabbit hole. I think retryable is actually correct -- go figured. Either way, A(ble) shouldn't be capitalized.

return &NodeGroup{
WantErr: f.WantErr,
Replicas: ptr.Int32(f.NodeReplicas[sng.ID]),
if f.nodegroup == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the benefit here... Looks like you're creating a singleton pattern for nodegroup such that we always return the same object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, trying to reuse the same nodegroup object because when controller.reconcile calls c.CloudProvider.NodeGroupFor it returns a new nodegroup object here which cannot be referenced in tests and can't verify if the replicas set are correct since the object changes.


BeforeEach(func() {
var err error
sng = &v1alpha1.ScalableNodeGroup{}
ng = fakeController.CloudProvider.NodeGroupFor(&sng.Spec)
defaultNodeGroupObject = *ng.(*fake.NodeGroup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's figure out a better pattern than casting.

var sng *v1alpha1.ScalableNodeGroup
var ng cloudprovider.NodeGroup
var defaultNodeGroupObject fake.NodeGroup

BeforeEach(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a little trouble following the flow of the BeforeEach, Test, AfterEach logic. It looks like some logic is duplicated (i.e. sng.Spec.Replicas being set).

When I write these, I typically try to do everything in a BeforeEach loop, unless my test has mutated something that needs to be cleaned up after all of the tests.

I also think it might be cleaner to modify the fields of the CloudProvider factory, rather than using a singleton pattern on nodegroup and modifying that object directly. I think all you'd have to do is wire through the behavior of NodeGroupStabilized and NodeGroupStabilizedMessage to the CloudProvider factory, and then before each test, you can just do something like

factory.NodeGroupStabilized = false 
factory.NodeGroupStabilizedMessage = "still trying"

Then when the factory creates the object, it will inject the behavior into the node group and should clean up most of your singleton/casting logic.

Let me know what you think to this approach.

@@ -105,7 +105,7 @@ var _ = Describe("Examples", func() {
It("should scale to average value target, metric=41, target=4, want=11", func() {
Expect(ns.ParseResources("docs/examples/queue-length-average-value.yaml", ha, sng)).To(Succeed())
sng.Spec.Replicas = ptr.Int32(1)
fakeCloudProvider.NodeReplicas[sng.Spec.ID] = *sng.Spec.Replicas
fakeCloudProvider.NodeReplicas[sng.Spec.ID] = ptr.Int32(*sng.Spec.Replicas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind adding a comment here about the race condition? Just to warn someone who might try to simplify it

fakeCloudProvider.NodeReplicas[sng.Spec.ID] = ptr.Int32(*sng.Spec.Replicas) # Create a new pointer to avoid races with the controller

@@ -53,24 +54,75 @@ var _ = AfterSuite(func() {

var _ = Describe("Examples", func() {
var ns *environment.Namespace
var desiredReplicas = ptr.Int32(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should just configure this explicitly in each test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I am setting this here is, because I want to avoid setting sng.Spec.Replicas to default value 0, we might miss some cases where values are not set and we just test the default. May be, it will be better if I rename it to defaultReplicaCount?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought variable rename isn't necessary. I think it's good as is.

@@ -53,24 +54,75 @@ var _ = AfterSuite(func() {

var _ = Describe("Examples", func() {
var ns *environment.Namespace
var desiredReplicas = ptr.Int32(5)
var dummyMessage = "test message"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd either make this a file level constant, or probably cleaner to just inline.

fakeCloudProvider.NodeGroupMessage = ""
fakeCloudProvider.WantErr = nil
sng.Spec.Replicas = desiredReplicas
sng.Spec.ID = "dummyID"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can probably get away with it.

@prateekgogia
Copy link
Contributor Author

@ellistarn I addressed all the comments, PTAL.

fakeCloudProvider.NodeGroupMessage = ""
fakeCloudProvider.WantErr = nil
sng.Spec.Replicas = defaultReplicaCount
fakeCloudProvider.NodeReplicas[sng.Spec.ID] = ptr.Int32(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly, sng.Spec.ID is nil here, so this line is just muddying the water.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sng.Spec.ID is an empty string.
This is required here, since, in my tests also sng.Spec.ID is always an empty string and NodeReplicas[""] is a Nil value if I don't set it which causes nodegroup.Replicas to be also Nil.

Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Ready to approve after this around.

Copy link
Contributor

@JacobGabrielson JacobGabrielson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just a few comments to add

pkg/cloudprovider/fake/errors.go Outdated Show resolved Hide resolved
pkg/cloudprovider/fake/factory.go Outdated Show resolved Hide resolved
@@ -53,24 +54,76 @@ var _ = AfterSuite(func() {

var _ = Describe("Examples", func() {
var ns *environment.Namespace
var defaultReplicaCount = ptr.Int32(3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never reassigned, so consider making this

defaultReplicas := 3
or
defaultRepliacs := ptr.Int32(3)

I'd avoid relying on ptr if you can just use native go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference in these two declarations?

var defaultReplicas  = ptr.Int32(3)
defaultRepliacs := ptr.Int32(3)

IIUC, latter is just a short method of doing what we are doing with var statement, am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the go docs The := syntax is shorthand for declaring and initializing a variable, e.g. for var f string = "apple" in this case.. Fine either way. I tend to use the shorthand.

// NodeReplicas is used by tests to control observed replicas.
NodeReplicas map[string]*int32
NodeGroupStable bool
NodeGroupMessage string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider embedding a fake error message into the fake.Factory implementation so that tests don't need to configure it.

Maybe even make a public const that tests can reference

const ErrorMessage = "fake factory failed"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting doing something like this-

const (
	ErrMessage = ""
)

func NewFactory(options cloudprovider.Options) *Factory {
	return &Factory{
		NodeReplicas:     make(map[string]*int32),
		NodeGroupStable:  true,
		NodeGroupMessage: ErrMessage,
	}
}

and in tests we will be doing this -

fake.ErrMessage = dummyMessage

I am not sure how does this help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test we wouldn't need to do fake.ErrMessage = dummyMessage. You would only return the ErrMessage if NodeGroupStable is false. Either way works, but the former results in less test code.

@ellistarn ellistarn merged commit f88bdbe into aws:main Dec 1, 2020
@prateekgogia prateekgogia deleted the unit_tests branch February 3, 2021 19:01
ellistarn added a commit that referenced this pull request Feb 10, 2021
* Switches presubmits to track main. (#139)

* Switches presubmits to track main.

* Address https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

* Added WORKING_GROUP (#140)

* Readme edits + organization docs (#142)

* readme and admin docs

* faq edits

* doc edits

* edits and organization

* readme edits

* readme edit

* Remove Queue reference from NodeGroup overprovisioning example (#145)

* correct a typo in README.md (#143)

* readme typo (#146)

* Capitalise TERMS.md link (#147)

* Update OWNERS (#150)

* Update README.md (#149)

* Fix installation to target `main` branch instead of `master` (#153)

* Cleaned up FAQs, Working Group Invite, and other typos (#148)

* Update OWNERS (#155)

add tabern

* Switches to Allocatable from Capacity and exposes more human readable (#154)

values in capacity reservation status.

* Update FAQs.md (#156)

* Update DESIGN.md (#158)

Fixed typos

* Update README.md (#160)

* Update README.md

* Update README.md

* Update README.md

* Initial github banner image (#162)

* Initial github banner image

* Small image fix and add to readme

* Remove repo name from top of readme

* Smaller readme banner. Add repo preview

* HA Bottoms out at 1 DesiredReplica for Utilization and Value metrics (#161)

* Update README.md (#163)

* changed consumer context from customer to user (#168)

* changed consumer context from customer to user (#167)

* Reorganized OWNERS (#169)

* grammar & syntax updates in docs/DESIGN.md (#170)

* Add unit test for scalable node group controller (#157)

* Add unit test for scalable node group controller

* Add some more tests

* Add error checks for SetReplicas call

* Fix Unit tests

* rename fakeError

* update tests

* rename a few things

* rename fake error object

* Fixed pointers comparison

* Simplify tests a little

* Updated release image to use public ecr (#171)

* Improved release automation. Cut pre-release for v0.1.1 (#174)

* Changed desiredReplicas to be a pointer to allow for displaying scale to (#172)

* Changed desiredReplicas to be a pointer to allow for displaying scale to
zero.

* Updated tests

* Created a minimal Helm Chart (#175)

* Implemented a minimal helm chart with release automation using Github
Pages

* Removed stray files

* Updated chart

* Updated install instructions (#176)

* Updated install instructions

* Fixed a typo

* Improved calendar invite for Working Group (#177)

* Add working group meeting notes (#180)

* Add working group meeting notes (#182)

* Create NOTICE (#178)

* Fix errors in hack/quick-install.sh (#185)

Fix issues:

1/ Command helm upgrade --install karpenter charts/karpenter fails.

2/ Uninstall: Current order of operation leaves deployment: karpenter and statefulset: prometheus-karpenter-monitor remaining:

* Fixed a few typos in developer guide (#187)

The developer guide hadn't been updated after some of the make commands changed name. Also corrected an environment variable name.

* Add notes from working group meeting (#192)

* ScheduledCapacity API (#194)

* fixed a small typo in metricsproducer (#195)

* Add notes from working group meeting (#196)

* Implemented Scheduled Capacity Metrics Producer (#200)

* fixing merge problems

* Implemented Scheduled Capacity Metrics Producer

* Design Doc for ScheduledCapacity (#197)

* Delete karpenter.ics (#204)

Deleted in ics invite favor of google calendar.

* enable build on provisioner branch too (#206)

* Validations for MetricsProducers (#211)

* MetricsProducer Schedule Validation

* cleaned up validation efforts for non-schedules

* Added Validation checks in all resource reconcile loops

* Add working group notes (#217)

* add working group notes

* add some more notes to WG notes

* Rebase provisioner-work into main. (#219)

* Provisioner api (#198)

* First commit for provisioner spec

* remove comented code

* Added a simple controller skeleton (#199)

* Adding capacity API client (#202)

* Adding fleet API client

* Add capacity provisioner and refine the interface definitions

* Provisioner skeleton code (#203)

* Wired up controller to main and trivial test (#205)

* More tweaks to test wiring (#207)

* Add end to end functionality for pending pods (#209)

* add end to end functionality

* simplify functions after PR feedback

* simplify create fleet method

* fix comments, remove new line char and remove private create method

* Implemented cloud provider initialization (#208)

* Implemented Cloud Provider initialization and discovery logic.

* Implemented cloud provider initialization

* Cloudprovider improvements: Lazy Initialization, IAM Separation, and Topology Support (#213)

* Create node objects and bind pods to nodes (#215)

* Create node objects and bind pods to nodes

* fix error check

* fix error check

* Reoriented Capacity CloudProvider around BinPacking (#214)

* Setup packing interface and refactored instance provider for launching constrained instances (#216)

* Pausing on packing

* Setup packing interface and refactored instance provider for launching constrained instances

Co-authored-by: Prateek Gogia <[email protected]>

* Defaults for Metrics Producers (#222)

* Refactor cloud provider, add packing package (#221)

* refactor cloud provider, add packing package

* Refactored allocator and improve logging (#218)

* Skeleton code for reallocator (#235)

* Generated v0.1.2 release

Co-authored-by: Nate Taber <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: dherman <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: Tom Kerkhove <[email protected]>
Co-authored-by: alethatarn <[email protected]>
Co-authored-by: Justin Garrison <[email protected]>
Co-authored-by: Cameron Senese <[email protected]>
Co-authored-by: Prateek Gogia <[email protected]>
Co-authored-by: Prateek Gogia <[email protected]>
Co-authored-by: Henri Yandell <[email protected]>
Co-authored-by: Jacob Gabrielson <[email protected]>
Co-authored-by: Nick Tran <[email protected]>
spring1843 added a commit to spring1843-2/karpenter that referenced this pull request Jan 31, 2023
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023
Signed-off-by: alexandr.danilin <[email protected]>

Signed-off-by: alexandr.danilin <[email protected]>
Co-authored-by: Alexandr Danilin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants