
Ramen catalog fails to report healthy in drenv, potentially due to olm installation differences #745

Open
ShyamsundarR opened this issue Mar 7, 2023 · 8 comments
Labels: good first issue (Good for newcomers), test (Testing related issue)

Comments

@ShyamsundarR
Member

This problem was reported earlier by @nirs: the method for getting the ramen catalog and bundles installed via OLM on a minikube cluster, as described here, does not work.

Subsequent testing with and without drenv led to the following conclusions:

  1. In a vanilla minikube cluster, if the steps are followed as laid out AND olm is installed using operator-sdk, the ramen bundle gets installed and the operator starts running.
  2. In drenv, if the same steps are followed, the pod created for the catalog source in the ramen-system namespace crashes with errors like: Error: open db-118615996: permission denied
    • This leads to the Subscription not resolving to fetch and install the bundle, as the CatalogSource remains unhealthy with a TRANSIENT_FAILURE (see the commands after this list).
  3. In the same drenv-created cluster, if operator-sdk is used to uninstall and then install olm again, the scheme from item 1 starts working.
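
A rough sketch of commands to observe the failure and the operator-sdk based workaround described above (the catalog pod name is a placeholder and will differ per cluster):

    # Inspect the catalog source pod that crashes with the permission denied error.
    $ kubectl get pods -n ramen-system
    $ kubectl logs <catalog-source-pod> -n ramen-system

    # The CatalogSource status.connectionState.lastObservedState shows TRANSIENT_FAILURE while unhealthy.
    $ kubectl get catalogsource -n ramen-system -o yaml

    # Workaround from item 3: reinstall olm using operator-sdk.
    $ operator-sdk olm uninstall
    $ operator-sdk olm install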

The issue seems to be either the version of olm installed by drenv (0.22) or the manner of installing it (although the steps seem to follow the upstream olm install procedure as laid out). This needs further investigation and a fix, in case operator-sdk is not going to be used to install olm.

Another alternative could be to try using the install script provided as part of the olm releases, and ensure our catalog works with it. This also seems to be less work on our end than installing the various manifests one after the other.

@ShyamsundarR
Member Author

> Another alternative could be to try using the install script provided as part of the olm releases, and ensure our catalog works with it. This also seems to be less work on our end than installing the various manifests one after the other.

Tried the above method: with version 0.22.0 it still failed, while with version 0.23.1 it worked as expected. For now we should move to 0.23.1 (or use operator-sdk to install the latest version, which is usually a bad idea anyway) to overcome this issue.
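
For reference, a sketch of pinning the install script to the working version (same pattern as the upstream release instructions, only the version argument changes):

    $ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.23.1/install.sh | bash -s v0.23.1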

A deeper analysis may throw up what the actual problem is/was, but the above should be enough to make forward progress with bundles in the e2e system.

@nirs added the good first issue and test labels on Mar 9, 2023
@nirs
Member

nirs commented Mar 9, 2023

@Shwetha-Acharya do you want to take this issue? This should be a trivial change
and a good learning task.

Testing this means building the ramen bundle and installing it in the clusters
as described in the install guide.
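
A minimal verification sketch, assuming the bundle was built and installed per the install guide (the ramen-system namespace is taken from the report above; adjust if the guide uses a different one):

    # The catalog pod should be Running and the CatalogSource connection state READY.
    $ kubectl get pods -n ramen-system
    $ kubectl get catalogsource -n ramen-system -o yaml

    # The Subscription should resolve and the CSV should reach phase Succeeded.
    $ kubectl get subscription,csv -n ramen-system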

@ShyamsundarR
Member Author

After PR #729 was merged, the bundles now work with the olm version 0.22 that is installed by drenv. I suspected the opm version in use, so potentially updating that is what helped.

So we do not need to shift versions unless something else requires it. Feel free to close this issue if needed.
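
If someone digs into this later, a quick sketch for checking which versions are actually in play (assuming opm is on the PATH and operator-sdk is available):

    # opm version used for building the catalog.
    $ opm version

    # olm installation detected in the cluster by operator-sdk.
    $ operator-sdk olm status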

@nirs
Member

nirs commented Mar 9, 2023

Nice! But do we have any reason to pin version 0.22?

I think it is better to always use the latest release; this way, if a new
release breaks us, the tests will discover this early, hopefully before
users experience the breakage.

@ShyamsundarR
Member Author

> Nice! But do we have any reason to pin version 0.22?

Not necessary.

> I think it is better to always use the latest release; this way, if a new release breaks us, the tests will discover this early, hopefully before users experience the breakage.

We should pin it to a released version during the course of development, so that we do not have to deal with instability from the dependencies.

Closer to a ramen release, we should move to the latest released version to ensure nothing breaks.

@nirs
Member

nirs commented Mar 12, 2023

Updating dependencies right before a release is too risky. I think it will be
safer to update our dependencies when we start a new development cycle, for
example after releasing an upstream version. With this we know that the released
version was tested with certain dependencies during development.

For the next release I think it should be good enough to upgrade olm now
since we don't have any upstream users yet.

@nirs
Member

nirs commented Mar 24, 2023

I think before we upgrade olm we need to understand why we don't use one of the official
ways to install olm:

  • Using operator-sdk https://olm.operatorframework.io/docs/getting-started/ (a command sketch follows below)
  • Using the install script:
    Install Operator Lifecycle Manager (OLM), a tool to help manage the Operators running on your cluster.
    
    $ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.24.0/install.sh | bash -s v0.24.0
    
    This is part of the instructions for installing an operator, shown when clicking the "Install" button in operatorhub.io, for example in https://operatorhub.io/operator/minio-operator.

Then either change our olm installation, or document why we cannot use one of the
official ways.
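
For completeness, a sketch of the operator-sdk route from the first bullet above (the version value here is only an example):

    $ operator-sdk olm install --version 0.24.0
    $ operator-sdk olm status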

@nirs
Member

nirs commented Mar 24, 2023

Before we change the olm install, we need an olm self test (olm/test).

The test should install an example operator that is quick to install
and check that the operator is deployed properly.

It should pass with the current code (based on @ShyamsundarR's report), and
also with the new olm deploy code and olm version.
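
One possible shape for that self test, as a hedged sketch: install an operator from operatorhub.io (for example the minio-operator mentioned above) using the manifest its Install button points at, then wait for its CSV to report Succeeded. The namespace depends on the generated manifest, so the check below just lists all CSVs:

    # Install an example operator via the operatorhub.io generated manifest.
    $ kubectl create -f https://operatorhub.io/install/minio-operator.yaml

    # The test passes once the operator's CSV reaches phase Succeeded.
    $ kubectl get csv -A
    $ kubectl get pods -A | grep minio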
