Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

configmap based mechanism to override the admin/canary versions #643

Merged
merged 7 commits into from
Feb 2, 2022
Merged

Conversation

shawkins
Copy link
Contributor

On the epic brief it appeared there wasn't a good way to associate this information with the strimzi deployment. So I'm assuming we need to use configmaps. I didn't see any information on in the epic brief or the presentation about how those would be created. My assumption here is that there would be 1 configmap per strimzi version. The name of the configmap would match the name of the strimzi version. The map would contain 3 key/value pairs. One for each of the images:

admin_api=quay.io/mk-ci-cd/kafka-admin-api:0.7.0
canary=...
canary_init=...

It falls back to the defaults in the application.properties - that could of course fall back in other ways such as sorting the versions, or even cause the managedkafka to be invalid.

The resync logic could be made more targeted if needed - just filtering to those managedkafkas that have matching strimzi versions.

But before doing a lot more I wanted to see if this matches up with people's new expectations based upon what was discussed in the standup.

@github-actions github-actions bot added the operator changes related to operator label Jan 21, 2022
@shawkins shawkins requested a review from rareddy January 21, 2022 20:05
@rareddy
Copy link
Contributor

rareddy commented Jan 24, 2022

Yes, this is taking in the direction of our conversation. we already have a config map for log4j properties in the strimzi bundle, maybe we can add additional keys into it?

@shawkins
Copy link
Contributor Author

Yes, this is taking in the direction of our conversation. we already have a config map for log4j properties in the strimzi bundle, maybe we can add additional keys into it?

Sorry I should have modeled this on that from the start.

It's been updated to use that map by checking for the app label and if the name is the same as the deployment that the strimzi operator checks for. It could be combined with either the strimzi manager or the bundle manager if we don't want another manager in the mix.

@rareddy
Copy link
Contributor

rareddy commented Jan 25, 2022

thanks. now that there is config, what is your opinion on possibly introducing ENV variables along with image URLs for completeness?

@shawkins
Copy link
Contributor Author

what is your opinion on possibly introducing ENV variables along with image URLs for completeness?

In looking at the code changes to the buildEnvVar methods, the only simplistic change I can quickly see in the last 6 months that fits this paradigm is the introduction of RECONCILE_INTERVAL_MS. However that would be fine to set across all Canary. You'd need a situation where the value would change per Canary version. The other changes seem to be dependent upon the managedkafka cr, other operator state, or are more complex involving references to the canary-config for example.

Copy link
Contributor

@rareddy rareddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@shawkins
Copy link
Contributor Author

Fantastic... these changes cause an out of memory error on github. That does not occur locally nor is there much of a memory footprint being introduced here. I'll see if I can track down what is happening.

@shawkins
Copy link
Contributor Author

Fantastic... these changes cause an out of memory error on github. That does not occur locally nor is there much of a memory footprint being introduced here. I'll see if I can track down what is happening.

As far as I can tell we're hitting a quarkus issue here that's quite odd. I can strip down the class to have no logic/state, just return the default images, and I'll still either hit build or what looks to be test issues. Just the presence of this additional application bean is too much.

@MikeEdgar
Copy link
Contributor

Did you get it to happen locally? Could it have been fixed in a more recent Quarkus version?

@shawkins
Copy link
Contributor Author

Did you get it to happen locally?

Yes, a couple build iterations will consistently reproduce.

Could it have been fixed in a more recent Quarkus version?

Don't know. I've only confirmed that it's not due to any actual logic, just the introduction of the application bean.

Copy link
Contributor

@k-wall k-wall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @shawkins, approach looks good.

there is a bit of difference, our config uses admin-api, every else uses
adminserver.  So the key for the override was changed to admin-server
@shawkins
Copy link
Contributor Author

shawkins commented Jan 27, 2022

I've tacked a question about this build error onto quarkusio/quarkus#18973 (comment) - as it seems similar to the behavior we are seeing with the test server annotation. I've updated a branch to latest fabric8/quarkus/operator sdk and reproduced the test issue. In the meantime I'll confirm if this seems to still happen with that upgrade in place - and open a separate pr that contains just the upgrade work #648

String canaryImage;

@ConfigProperty(name = "image.canary-init")
String canaryInitImage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to have a kind of same structure for the properties configured via env var or application.properties file instead of the fleetshard_operands.yaml" with overrides? I mean naming the above as admin-server.image, canary.image and canary.init.image ? Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean naming the above as admin-server.image, canary.image and canary.init.image ? Wdyt?

In practical terms once this mechanism is utilized the old / default values should go unused. Turning the override into a properties file precludes easily using more structured parsing down the road (not that I'd like to overly worry about things that may not be needed). In the interest of time, I'd probably leave this as is at this point.

@shawkins
Copy link
Contributor Author

shawkins commented Feb 1, 2022

@rareddy @k-wall As far I can there are 4-5 ways to address the build issue:

  • Update the amount of resources for the build/test runs (now with this pr) - doubling basepom.test.memory seems to work - but we'll of course not be able to do this too much and may in the future do one of these others.
  • change the operator tests to reuseForks=false - since the tests are not currently parallelized, this greatly impacts the time it takes to run the tests.
  • Use a new module - tried with a github commit as well. It's messy but it works...
  • Patching okhttp to remove/replace their timeout implementation
  • Patch fabric8 to allow it to disable timeouts. That would need to include the mock server as well.

@rareddy
Copy link
Contributor

rareddy commented Feb 1, 2022

I am ok with taking first approach as you already coded it as an interim approach. I am not sure what patching okhttp involves, you mean at fabric8 level?

@shawkins
Copy link
Contributor Author

shawkins commented Feb 1, 2022

I am not sure what patching okhttp involves, you mean at fabric8 level?

Someone would have to own a jar that at least overrides the offending class.

I'll be making a separate determination of whether this affects the mock server and how to proceed from there.

@shawkins shawkins closed this Feb 2, 2022
@shawkins shawkins reopened this Feb 2, 2022
@shawkins shawkins merged commit 96706a1 into bf2fc6cc711aee1a0c2a:main Feb 2, 2022
@MikeEdgar MikeEdgar added this to the 0.19.0 milestone Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
operator changes related to operator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants