Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Jaeger CSV and Package for OLM integration and deployment of the … #173

Merged
merged 1 commit into from
Feb 4, 2019

Conversation

awgreene
Copy link
Contributor

…operator through OperatorHub

OLM integration could be improved by future code changes to the Operator:

@jpkrohling
Copy link
Contributor

This change is Reviewable

@codecov
Copy link

codecov bot commented Jan 21, 2019

Codecov Report

Merging #173 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #173   +/-   ##
======================================
  Coverage    96.7%   96.7%           
======================================
  Files          32      32           
  Lines        1639    1639           
======================================
  Hits         1585    1585           
  Misses         41      41           
  Partials       13      13

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b71a4ee...24df602. Read the comment docs.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @awgreene)

a discussion (no related file):
This looks really cool, @awgreene! Thanks for your contribution. I just have a couple of questions, but nothing blocking.



deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml, line 18 at r1 (raw file):

  description: |-
    Provides monitoring and troubleshooting microservices-based distributed systems
  keywords: ['tracing', 'monitoring', 'troubleshooting', 'distributed']

What's the source for these keywords? Perhaps distributed could be removed, or perhaps changed to distributed tracing?


deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml, line 45 at r1 (raw file):

    strategy: deployment
    spec:
      permissions:

Can we reuse this from the role.yaml? Or have a pre-commit hook that checks and blocks the commit if the contents diverge?

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @awgreene)

a discussion (no related file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

This looks really cool, @awgreene! Thanks for your contribution. I just have a couple of questions, but nothing blocking.

I have one more request: could you please add information to the CONTRIBUTING.adoc about how to run the scorecard tests?


@awgreene
Copy link
Contributor Author

Hello @jpkrohling, thanks for reviewing my PR.

What's the source for these keywords? Perhaps distributed could be removed, or perhaps changed to distributed tracing?

I made a best effort attempt at adding the appropriate categories based on the content of the operator - I will follow your suggestion and remove distributed from the list of categories.

Can we reuse this from the role.yaml? Or have a pre-commit hook that checks and blocks the commit if the contents diverge?

Good question - the CSV has to include the service account name and permissions. Creating a pre-commit hook that compares the permissions in the CSV and the openshift role (and vice verse) seems like an appropriate solution. I haven't made a pre-commit hook before, is there an easy way to add a hook that checks that one block of text exists in another?

I have one more request: could you please add information to the CONTRIBUTING.adoc about how to run the scorecard tests?

I will work on updating the document today.

@jpkrohling
Copy link
Contributor

is there an easy way to add a hook that checks that one block of text exists in another?

It would probably be better to make it as a shell script, so that we can reuse it in the Makefile as well, under the check target.

If the same YAML is expected in both files, only with different indentation, then the easiest solution would be to play with tail and head to get the relevant parts from both files, then sed to remove any indentation spaces, and finally a diff to compare both versions. Does that sound reasonable?

The hook itself is the easy part: just add a .git/hooks/pre-commit file (executable) that calls the script mentioned before. AFAIK, hooks aren't part of the repository, so, there could be a new make target that installs the hook.

@awgreene awgreene force-pushed the jaeger-csv branch 5 times, most recently from 1db0375 to 509a481 Compare January 24, 2019 18:42
@awgreene
Copy link
Contributor Author

@jpkrohling I have updated the CONTRIBUTING.adoc to include information about testing the CSV with the operator-sdk. Please let me know if you would like any changes.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @awgreene)


CONTRIBUTING.adoc, line 146 at r2 (raw file):

OLM also enforces some constraints on the components it manages in order to ensure a good user experience.

The Jaeger community can provide and mantain a link:https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/building-your-csv.md/[ClusterServiceVersion (CSV) YAML] to integrate with OLM.

Not quite sure I understood this. Should this be "... provides and maintains a ..." instead?


CONTRIBUTING.adoc, line 151 at r2 (raw file):

[source,bash]
----
operator-sdk scorecard --cr-manifest deploy/cr-example.yaml --csv-path ./deploy/bundle/camel-k.v0.1.0.clusterserviceversion.yaml 

This should probably be:

operator-sdk scorecard --cr-manifest deploy/examples/simplest.yaml --csv-path ./deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml

But the command above fails for some attempts with:

Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Checking that writing into CRs has an effect
FATA[0011] failed to get logs: container "scorecard-proxy" in pod "jaeger-operator-676d5f74b6-84kmd" is waiting to start: ContainerCreating 

Eventually, I get this output:

Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Checking that writing into CRs has an effect
Checking for CRD resources
Checking for existence of example CRs
Checking spec descriptors
Checking status descriptors
Basic Operator:
	Spec Block Exists: 1/1 points
	Status Block Exist: 1/1 points
	Operator actions are reflected in status: 0/1 points
	Writing into CRs has an effect: 1/1 points
OLM Integration:
	Owned CRDs have resources listed: 0/1 points
	CRs have at least 1 example: 1/1 points
	Spec fields with descriptors: 0/12 points
	Status fields with descriptors: N/A (depends on an earlier test that failed)

Total Score: 4/18 points

It would be nice to add the expected output to the doc as well, along with a description of what's necessary to avoid the error above.

Which version of the Operator SDK is required for this? If a version newer than 0.1.1 is required, we need to update a few places as well:

  • CONTRIBUTING.adoc - Installing the Operator SDK command line tool
  • .travis/install.sh
  • Gopkg.toml (and the lock file, after running dep ensure)

deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml, line 23 at r2 (raw file):

            "strategy": "allInOne",
            "allInOne": {
              "image": "jaegertracing/all-in-one:1.8",

This demonstrates that the contents of this file will get outdated really fast if we don't add some sort of automated check... We are at 1.9 already :-)

@awgreene
Copy link
Contributor Author

awgreene commented Jan 28, 2019

@jpkrohling

This demonstrates that the contents of this file will get outdated really fast if we don't add some sort of automated check... We are at 1.9 already :-)

This is a fast moving project! The operator-sdk introduced functionality that will generate a CSV based on the deploy files. If the deploy files change and a CSV already exist the sdk will update the changed values in the CSV. The hook could run the SDK command and check if the files don't match. Note: There is a pending PR to the SDK that need to be merged before this can be done.

Copy link
Contributor Author

@awgreene awgreene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @awgreene and @jpkrohling)


CONTRIBUTING.adoc, line 146 at r2 (raw file):

provides and maintains a
Good point, I will update the contribution doc to reflect this suggestion.


CONTRIBUTING.adoc, line 151 at r2 (raw file):

operator-sdk scorecard --cr-manifest deploy/examples/simplest.yaml --csv-path ./deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml

This is correct, I will update the contribution doc.

But the command above fails for some attempts with:
Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Checking that writing into CRs has an effect
FATA[0011] failed to get logs: container "scorecard-proxy" in pod "jaeger-operator-676d5f74b6-84kmd" is waiting to start: ContainerCreating

This error appears to be the result of the Jaeger operator not coming up in time for the CR test - one can configure the timeout with an addition flag to the sdk:

$ operator-sdk scorecard --help 
...
-init-timeout int             Timeout for status block on CR to be created in seconds (default 10)
...

I will update the example command above to include a 30 second init-timeout to avoid this in the future.

Which version of the Operator SDK is required for this?

The scorecard feature is currently implemented in the master branch, a tagged release with the feature hasn't yet been released. For now, I have update this section of the document to suggest using the master branch and will update it once the version has been tagged.


deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml, line 18 at r1 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

What's the source for these keywords? Perhaps distributed could be removed, or perhaps changed to distributed tracing?

Done.

@awgreene awgreene force-pushed the jaeger-csv branch 5 times, most recently from 3da1e28 to 14952c8 Compare January 28, 2019 21:38
Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r3.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @awgreene)

a discussion (no related file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

I have one more request: could you please add information to the CONTRIBUTING.adoc about how to run the scorecard tests?

I think we are close. I'm just missing changes to the following files (from a previous comment):

  • CONTRIBUTING.adoc - Installing the Operator SDK command line tool
  • .travis/install.sh
  • Gopkg.toml (and the lock file, after running dep ensure)

There are a couple of other non-blocking comments as well and I'm a bit uneasy to use master instead of a tagged version, but considering that we won't fail the build because of this, I think it's OK to have it there as a "preview".



CONTRIBUTING.adoc, line 151 at r2 (raw file):

Previously, awgreene (Alexander Greene) wrote…

operator-sdk scorecard --cr-manifest deploy/examples/simplest.yaml --csv-path ./deploy/bundle/jaeger.v1.8.2.clusterserviceversion.yaml

This is correct, I will update the contribution doc.

But the command above fails for some attempts with:
Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Checking that writing into CRs has an effect
FATA[0011] failed to get logs: container "scorecard-proxy" in pod "jaeger-operator-676d5f74b6-84kmd" is waiting to start: ContainerCreating

This error appears to be the result of the Jaeger operator not coming up in time for the CR test - one can configure the timeout with an addition flag to the sdk:

$ operator-sdk scorecard --help 
...
-init-timeout int             Timeout for status block on CR to be created in seconds (default 10)
...

I will update the example command above to include a 30 second init-timeout to avoid this in the future.

Which version of the Operator SDK is required for this?
The scorecard feature is currently implemented in the master branch, a tagged release with the feature hasn't yet been released. For now, I have update this section of the document to suggest using the master branch and will update it once the version has been tagged.

In a follow-up PR, I'll add this command as is to a Makefile target, probably named scorecard, so that developers just need to run make scorecard.

Also, the timeout would probably need to be even higher, as I did get the same failure as before:

$ operator-sdk scorecard --cr-manifest deploy/examples/simplest.yaml --csv-path deploy/olm-catalog/jaeger-operator.csv.yaml --init-timeout 30
Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Checking that writing into CRs has an effect
FATA[0031] failed to get logs: container "scorecard-proxy" in pod "jaeger-operator-7bdf49d479-rl8l8" is waiting to start: ContainerCreating 

@awgreene: could you please confirm the following message isn't a real problem?

Unknown type for key (enabled) in spec: (<nil>)

Here's the full context (3rd output line):

$ operator-sdk scorecard --cr-manifest deploy/examples/simplest.yaml --csv-path deploy/olm-catalog/jaeger-operator.csv.yaml --init-timeout 30
Checking for existence of spec and status blocks in CR
Checking that operator actions are reflected in status
Unknown type for key (enabled) in spec: (<nil>)
Checking that writing into CRs has an effect
Checking for CRD resources
Checking for existence of example CRs
Checking spec descriptors
Checking status descriptors
Basic Operator:
	Spec Block Exists: 1/1 points
	Status Block Exist: 1/1 points
	Operator actions are reflected in status: 0/1 points
	Writing into CRs has an effect: 1/1 points
OLM Integration:
	Owned CRDs have resources listed: 0/1 points
	CRs have at least 1 example: 1/1 points
	Spec fields with descriptors: 0/12 points
	Status fields with descriptors: N/A (depends on an earlier test that failed)

Total Score: 4/18 points

deploy/olm-catalog/jaeger-operator.csv.yaml, line 16 at r3 (raw file):

            "strategy": "allInOne",
            "allInOne": {
              "image": "jaegertracing/all-in-one:1.8",

This should also be bumped to 1.9, but can be done in a follow-up PR.


deploy/olm-catalog/jaeger.package.yaml, line 4 at r3 (raw file):

channels:
- name: alpha
  currentCSV: jaeger-operator.v1.8.2

Shouldn't this be 1.9.0?

@awgreene awgreene force-pushed the jaeger-csv branch 2 times, most recently from 0220e60 to c3cb162 Compare January 29, 2019 19:39
Copy link
Contributor Author

@awgreene awgreene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 4 files reviewed, 4 unresolved discussions (waiting on @jpkrohling)

a discussion (no related file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

I think we are close. I'm just missing changes to the following files (from a previous comment):

  • CONTRIBUTING.adoc - Installing the Operator SDK command line tool
  • .travis/install.sh
  • Gopkg.toml (and the lock file, after running dep ensure)

There are a couple of other non-blocking comments as well and I'm a bit uneasy to use master instead of a tagged version, but considering that we won't fail the build because of this, I think it's OK to have it there as a "preview".

I assume we can't update the .travis/install.sh and Gopkg.toml until the new tagged version is released right?



CONTRIBUTING.adoc, line 151 at r2 (raw file):

@awgreene: could you please confirm the following message isn't a real problem?

I spoke with Alex Pavel, the designer of the Operator-sdk scorecard feature, and he said that the error is a result of a couple of structs in the spec that have a field named "enabled" of type *bool. If that is unset, that value is <nil>. It might be a bit cleaner to no have pointers as fields for k8s CRs, and that might fix the problem but there may be a reason for using pointers instead of real bools, so it depends how the operator actually works. The "Operator actions are reflected in status" test is still a new feature and future development to improve the test is planned. This should not be seen as a huge issue for now.


deploy/olm-catalog/jaeger-operator.csv.yaml, line 16 at r3 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

This should also be bumped to 1.9, but can be done in a follow-up PR.

I can make sure to update this now since I have to address the error in the package.


deploy/olm-catalog/jaeger.package.yaml, line 4 at r3 (raw file):

Previously, jpkrohling (Juraci Paixão Kröhling) wrote…

Shouldn't this be 1.9.0?

It should be - I failed to update this after upgrading the CSV with the SDK.

…operator through OperatorHub

- This pull requests introduces a CSV and Package which can be bundled together and used by the [Operator Lifecycle Manager](https://github.com/operator-framework/operator-lifecycle-manager) to install, manage, and upgrade the Jaeger Operator in a cluster.

- The Jaeger operator versions available to all Kubernetes clusters using OLM can be updated by submitting a pull request to the [Community Operators GitHub repo](https://github.com/operator-framework/community-operators) that includes the latest CSVs, CRDs, and Packages.

- The CSV was created based of the [documentation provided by OLM](https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/building-your-csv.md) and the [documentation](https://github.com/operator-framework/community-operators/blob/master/docs/marketplace-required-csv-annotations.md) provided by [OperatorHub](https://github.com/operator-framework/operator-marketplace).

- The operator and the CSV were tested using the [scorecard functionality recently introduced](operator-framework/operator-sdk#758) to the [Operator-sdk](https://github.com/operator-framework/operator-sdk). Currently, the Jaeger operator scores a 4/18 on the scorecard tests.

OLM integration could be improved by future code changes to the Operator:
- Addressing the operator-sdk scorecard tests that failed.
- Adding additional information to the Jaeger CSV based on the [CSV documentation provided by OLM](https://github.com/operator-framework/operator-lifecycle-manager/blob/master/Documentation/design/building-your-csv.md)
- Adding additional information to the Jaeger CSV based on the [CSV documentation provided by OperatorHub](https://github.com/operator-framework/community-operators/blob/master/docs/marketplace-required-csv-annotations.md)

Signed-off-by: awgreene <[email protected]>
@awgreene
Copy link
Contributor Author


deploy/olm-catalog/jaeger.package.yaml, line 4 at r3 (raw file):

Previously, awgreene (Alexander Greene) wrote…

It should be - I failed to update this after upgrading the CSV with the SDK.

Done.

@awgreene
Copy link
Contributor Author


deploy/olm-catalog/jaeger-operator.csv.yaml, line 16 at r3 (raw file):

Previously, awgreene (Alexander Greene) wrote…

I can make sure to update this now since I have to address the error in the package.

Done.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @awgreene)

a discussion (no related file):

Previously, awgreene (Alexander Greene) wrote…

I assume we can't update the .travis/install.sh and Gopkg.toml until the new tagged version is released right?

I'd rather not, as we don't want the PR builds to break because of something outside of the PRs (like a bad commit to master).

I think this :lgtm: for a first round. We may just need to perform a follow-up PR based on our day-to-day usage.

@objectiser WDYT?



CONTRIBUTING.adoc, line 151 at r2 (raw file):

Previously, awgreene (Alexander Greene) wrote…

@awgreene: could you please confirm the following message isn't a real problem?

I spoke with Alex Pavel, the designer of the Operator-sdk scorecard feature, and he said that the error is a result of a couple of structs in the spec that have a field named "enabled" of type *bool. If that is unset, that value is <nil>. It might be a bit cleaner to no have pointers as fields for k8s CRs, and that might fix the problem but there may be a reason for using pointers instead of real bools, so it depends how the operator actually works. The "Operator actions are reflected in status" test is still a new feature and future development to improve the test is planned. This should not be seen as a huge issue for now.

*bool is valid, because the absence of value has a meaning (not set), allowing the operator to apply a sensible default (true/false). I'll just ignore this message for now, but let me know if you want me to open an issue against the Operator SDK repository.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@objectiser
Copy link
Contributor

LGTM

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@jpkrohling jpkrohling merged commit 21a9daa into jaegertracing:master Feb 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants