-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can't run operators locally, when using the conditions package #50
Comments
…, or from a non-standard SA file. While developing an operator, it is impossible to run from a local environment variable if the code is trying to access the namespace file. For example, calling the `conditions.NewCondition` function will return an error when the operator is running in a development environment. This PR add to options to override the hard-coded path to the namespace file with another location: 1. by setting the new `OPERATOR_NAMESPACE` environment variable to the required namespace. 2. by setting the new `SA_FILE_PATH` environment variable to the local path of the namespace file. If both new environment variables are set, the namespace will be taken from the `OPERATOR_NAMESPACE` environment variable. Fixes operator-framework#50 Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
…, or from a non-standard SA file. While developing an operator, it is impossible to run from a local environment variable if the code is trying to access the namespace file. For example, calling the `conditions.NewCondition` function will return an error when the operator is running in a development environment. This PR add to options to override the hard-coded path to the namespace file with another location: 1. by setting the new `OPERATOR_NAMESPACE` environment variable to the required namespace. 2. by setting the new `SA_FILE_PATH` environment variable to the local path of the namespace file. If both new environment variables are set, the namespace will be taken from the `OPERATOR_NAMESPACE` environment variable. Fixes operator-framework#50 Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
Hi @nunnatsa , |
Relevant PR comment: #42 (comment) |
Thank you @varshaprasad96 for the detailed response. As an operator developer, I need to be able to test my code and to debug it. I have many options to do that. For example, running with a real cluster, or running a unit test. In a cluster I can, for example, to manually deploy the opetator condition CRD and then the CR even without a running OLM. The thing is that with the current implementation, it is harder to write a testable code. |
The enhancement for this lib explicitly says that it assumes, and therefore was developed assuming, that OLM is installed. Additionally the CRD enhancement doesn't make an explicit goal for it to be used without OLM, although this is possible I guess. Regardless the intent of this library is specifically to be used with OLM. Out of curiosity why would you be using the operator condition CRD without OLM? If you really want to, you still can use // Set in test code.
var (
testNS = "foo"
testCondName = "bar"
)
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
...
condKey, err := GetNamespacedName()
if err != nil {
// Errors pertaining to running outside of a cluster.
condKey = &types.NamespacedName{Name: testCondName, Namespace: testNS}
}
...
} |
But the code that uses it still need to be tested and debugging.
There is no point to use this library without OLM, but still need to test and debug the code that uses it.
Yes! When I run a unit test of the function that called NewCondition. I can't do that using the current implementation.
This code is a mix of a test code and a production code. There is no way to add a call to NewCondition and not getting error. So all the tests that get to this point will exit with an expected error, and rhere will no way to test what happends next. |
Yes, and the suggested way to handle running the tests locally is to skip conditions-related code when an error is returned.
What I don’t understand is what you’re trying to test when you’re running your operator locally and trying to write to the conditions CR. In this scenario, OLM either isn’t running or doesn’t know about your operator if it is. Therefore you’re only testing if you can write to the condition CR, which isn’t really useful; in fact this test is effectively already being performed in the library’s unit tests.
Yes it is a gross mix, but it does let you set the in-cluster condition CR and namespace if you really want to go down that path. However I don’t actually recommend using the code I suggested for the reason I gave above. You shouldn’t need to test writing operator conditions in non-cluster scenarios. |
Let's try to explain again the issue. The operator we're working on does many things beside setting the the operator-condition. The code that do that is part of much larer code. In order to run unit tests we must mock the cluster. We're doing it to mock reading, writing and updating many types of objects. But using this package as it implemented now will break any unit test we'll trying to write. We must simulate the running environment for tests, we can't skip it because this operaor need to eventually run in a cluster. Sorry to write it, but this issue is a real issue that can't be rejected by "it's written in the godoc". You don't have to use the sugested solution, but please understand that this is not something we can skip. |
I understand the issue you’re posing, and I think because the solution isn’t clear we need to improve the UX of this library, in particular handling the local/testing case. I think the right solution is to create a test // controllers/kind_controller.go
var newCondition = conditions.NewCondition
func (r *MemcachedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
...
cond, err := newCondition(client, SomeCondType)
if err != nil {
// This error will never happen in tests.
}
...
} // controllers/kind_controller_test.go
func TestReconciler(t *testing.T) {
newCondition = conditions.NewFakeCondition
// Run reconciler tests.
} That would solve your problem correct? |
How would I run this out-of-cluster (eg. when developing on my machine)? |
This is how we can run an operator from the IDE: https://github.com/kubevirt/hyperconverged-cluster-operator/blob/master/docs/run-locally.md |
The issue is not only in testing, but also running the operator from a local machine for debugging. OLM itself is already injecting environment variables for a few things and I don't see a rationale behind reading the namespace from file instead. Kubernetes even has a downward API to make this task trivial:
I also don't think that having a |
@nunnatsa From your original comment, the reason the directory is hard-coded is because this is where it will live in a Kubernetes environment. If you are running outside of the cluster that location will not exist. This is true when I'm open to other solutions but the hardcoded path is where the cluster will put the namespace information. In the next comment I'll outline a few thoughts. How are you trying to test your operator? unit tests? Or is it trying to run it locally like @thetechnick mentioned. |
@estroz @varshaprasad96 I think we could add an Basically if we change func NewCondition(cl client.Client, condType apiv1.ConditionType, opts ...ConditionOption) (Condition, error) {
...
for _, opt := range opts {
if err := opt(&condition); err != nil {
return nil, err
}
}
...
} Or we could make var GetNamespacedName = getNamespacedName()
func getNamespacedName() (*types.NamespacedName, error) {
...
}
func NewCondition(cl client.Client, condType apiv1.ConditionType) (Condition, error) {
objKey, err := GetNamespacedName()
...
} From a test I should be able to override that |
And this is the issue, exactly. This is true when It won't work because
Please look at #51 for a suggestion. I don't know why it was closed. Using environment variable was always the easiest way to deliver data to pods.
Both |
Why not using the existing namespace environment variable? Or inject it if it's missing? reading from a hard coded path seems a very fragile method to do things. I still think the solution in #51 give both of us what we want. |
@jmrodri Why would you rather implement override options than just having the environment tell what the environment looks like? That is also the reason why controller-runtime is first checking |
I don't think I've seen a (satisfying) answer to my original question: why do you need to use the condition CRD if you're running locally or testing? "My unit tests will fail if I don't" is a reason to mock these calls with a local-only
@thetechnick @nunnatsa the rationale is that is file is present in every pod's file system, and is the recommended way to get the pod's namespace to access namespaced resources from within the pod. It also avoids having to configure the downward API for every operator Deployment.
@thetechnick can you elaborate on why a fake @erdii @thetechnick perhaps calling the condition a "fake" is incorrect, since fake implies usage only during testing. You could easily use the fake |
I see I failed to explain why this is needed. It's a bit hard to explain tthe obvious. So let's do the other way around: why is it so problematic to add it? As a user of this package I am telling you that I need a way to override the default namespace source - what is the reason not to allow that? |
Because it is a poor design decision. The whole reason a
Yes I understand that. I'm just trying to work with you on a compromise that has a better interface and doesn't require external setup. Are you saying your tests cannot, for some good reason, use a fake/local
See above. |
Maybe to step a little bit back here and start from the beginning:
I personally prefer to write tests asserting on the desired outcome, so it doesn't matter which custom code or utility library is used in the implementation to archive the goal. This way it is far more likely that the test can be reused when refactoring the implementation itself. It is completely valid to stub this library, use a FakeCondition and just assert that functions are called when expected. As a software maintainer, I would like to have the choice to either mock, stub and fake my way through tests or run the actual code to verify functionality. This also extends to running the operator outside of a cluster and without OLM, sometimes I need to verify that stuff works as intended manually and in environments that are not identical to the full blown production system. I am required to make these trade-offs constantly or stuff will never get done. If I need to quickly validate that my operator is working as intended on a duct-taped raspberry pi powered by a box of potato batteries, because that's expedient at the moment, I don't want to be held back by a hard coded file path in 20 lines of library code. |
In general I agree with you. This particular case is interesting though, because you are effectively writing to a list of objects (a conditions CR's status) that is not written to by any other entity than the operator. The conditions CRD status is that simple. Therefore a mock that has an underlying list of conditions would almost exactly replicate in-cluster behavior. Now you could argue that k8s guarantees don't apply so the mock can introduce bugs, but I counter that with: you should be writing e2e tests too, since unit tests are not intended to extensively test in-cluster scenarios.
Ultimately I agree with this statement, and therefore it is probably ok to expose some way to set a namespace as well as provide a mock |
If namespace injection is going to be added, I'd also like to see injection happen only when running locally, i.e. const OperatorNamespaceLocalEnv = "OPERATOR_NAMESPACE_LOCAL"
const podNamespaceFilePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
if _, err := os.Stat(podNamespaceFilePath); err == nil || errors.Is(err, os.ErrExist) {
return readNSFile()
} else if localNSValue, isSet := os.LookupEnv(OperatorNamespaceLocalEnv); isSet {
return localNSValue, nil
} else {
return "", ErrNamespaceNotSet
} This disallows using this environment variable in production, which prevents incorrect and confusion about usage IMO. |
@estroz That would be all I need :) I already wanted to submit a PR to add this, but falling back to the env lookup on error looks quite fragile to me. Unfortunately Go is not abstracting errors across environments (Linux, OS X, Windows?) very nicely, also there are probably a few cases that need special handling:
And any other odd error that could come up in this context... I completely understand, that you don't want anyone ever to use this in prod, so would you mind making this dead simple? const OperatorNamespaceLocalEnv = "OPERATOR_NAMESPACE_OVERRIDE_NEVER_USE_IN_PROD"
const podNamespaceFilePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
if localNSValue, isSet := os.LookupEnv(OperatorNamespaceLocalEnv); isSet && len(localNSValue) > 0 {
return localNSValue, nil
}
return readNSFile() |
I know I'm really late to this, but I'm the one that had opinions about making this interface less "magic". In pre-1.0 versions of operator-sdk, we supported something similar via environment variables. If I remember correctly, we supported:
Both of these struck me as odd and hard for users to get right.
So with that context in mind, I think our goal for this problem should be:
I think 2) is what we're all discussing. To avoid footguns around the "magic" caused by environment variable side effects we experienced with this in SDK pre-v1, I would propose that we make it possible for library users (not end users) to inject custom namespaces via a code API. The simplest way to do this would be to make this an exported variable. However, since that's a global variable, it would not be thread safe in concurrent testing scenarios, so I would be against that approach. A more complex, but improved solution would be for us to add a If code under test is aware only of the Factory abstraction, then authors can swap implementations in and out as necessary to test various aspects of their operator. These ideas are very similar to the ones proposed by @jmrodri here, and I think something like this is the path we should ultimately take. Here is a PR I posted with this idea implemented: #58 |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
@joelanford - this is still a blocker for the implementation of operator condition. |
/remove-lifecycle rotten |
@nunnatsa thanks for staying on top of this. We'll re-prioritize this issue. |
Bug Report
The
NewCondition
(and other) method, called (indirectlly) to thereadSAFile
method. This method uses a hard code path for the operator namespace file:operator-lib/internal/utils/utils.go
Line 29 in c0ba7dc
This implementation blocks the ability to run the operator locally for develpment and debug,
Please consider one of the following, or both (pefered):
Environment
The text was updated successfully, but these errors were encountered: