
Ramen catalog fails to report healthy in drenv, potentially due to olm installation differences #745

Open
ShyamsundarR opened this issue Mar 7, 2023 · 8 comments
Labels: good first issue (Good for newcomers), test (Testing related issue)

Comments

@ShyamsundarR
Member

This problem was reported earlier by @nirs: the method for getting the ramen catalog and bundles installed via OLM on a minikube cluster, as described here, does not work.

Subsequent testing with and without drenv led to the following conclusions:

  1. In a vanilla minikube cluster, if the steps are followed as laid out AND olm is installed using operator-sdk, the ramen bundle gets installed and the operator starts running.
  2. In drenv, if the same steps are followed, the pod created for the catalog source in the ramen-system namespace crashes with errors like: Error: open db-118615996: permission denied
    • This leads to the Subscription not resolving to fetch and install the bundle, as the CatalogSource remains unhealthy with a TRANSIENT_FAILURE (see the commands after this list).
  3. In the same drenv-created cluster, if operator-sdk is used to uninstall and then install olm again, the scheme from item 1 starts working.
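
A rough sketch of commands to observe the failure and the operator-sdk based workaround described above (the catalog pod name is a placeholder and will differ per cluster):

    # Inspect the catalog source pod that crashes with the permission denied error.
    $ kubectl get pods -n ramen-system
    $ kubectl logs <catalog-source-pod> -n ramen-system

    # The CatalogSource status.connectionState.lastObservedState shows TRANSIENT_FAILURE while unhealthy.
    $ kubectl get catalogsource -n ramen-system -o yaml

    # Workaround from item 3: reinstall olm using operator-sdk.
    $ operator-sdk olm uninstall
    $ operator-sdk olm install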

The issue seems to be either the version of olm installed by drenv (0.22) or the manner of installing it (although the steps seem to follow the upstream olm install procedure as laid out). This needs further investigation and a fix, in case operator-sdk is not going to be used to install olm.

Another alternative could be to try using the install script provided as part of the olm releases, and ensure our catalog works with it. This also seems to be less work on our end than installing the various manifests one after the other.

@ShyamsundarR
Member Author

> Another alternative could be to try using the install script provided as part of the olm releases, and ensure our catalog works with it. This also seems to be less work on our end than installing the various manifests one after the other.

Tried the above method: with version 0.22.0 it still failed, while with version 0.23.1 it worked as expected. For now we should move to 0.23.1 (or use operator-sdk to install the latest version, which is usually a bad idea anyway) to overcome this issue.
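
For reference, a sketch of pinning the install script to the working version (same pattern as the upstream release instructions, only the version argument changes):

    $ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.23.1/install.sh | bash -s v0.23.1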

A deeper analysis may throw up what the actual problem is/was, but the above should be enough to make forward progress with bundles in the e2e system.

@nirs added the good first issue and test labels on Mar 9, 2023
@nirs
Member

nirs commented Mar 9, 2023

@Shwetha-Acharya do you want to take this issue? This should be a trivial change
and a good learning task.

Testing this means building the ramen bundle and installing it in the clusters
as described in the install guide.
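
A minimal verification sketch, assuming the bundle was built and installed per the install guide (the ramen-system namespace is taken from the report above; adjust if the guide uses a different one):

    # The catalog pod should be Running and the CatalogSource connection state READY.
    $ kubectl get pods -n ramen-system
    $ kubectl get catalogsource -n ramen-system -o yaml

    # The Subscription should resolve and the CSV should reach phase Succeeded.
    $ kubectl get subscription,csv -n ramen-system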

@ShyamsundarR
Member Author

After PR #729 was merged, the bundles now work with the olm version 0.22 that is installed by drenv. I suspected the opm version in use, so potentially updating that is what helped.

So we do not need to shift versions unless something else requires it. Feel free to close this issue if needed.
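
If someone digs into this later, a quick sketch for checking which versions are actually in play (assuming opm is on the PATH and operator-sdk is available):

    # opm version used for building the catalog.
    $ opm version

    # olm installation detected in the cluster by operator-sdk.
    $ operator-sdk olm status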

@nirs
Member

nirs commented Mar 9, 2023

Nice! But do we have any reason to pin version 0.22?

I think it is better to always use the latest release; this way, if a new
release breaks us, the tests will discover this early, hopefully before
users experience the breakage.

@ShyamsundarR
Member Author

> Nice! But do we have any reason to pin version 0.22?

Not necessary.

> I think it is better to always use the latest release; this way, if a new release breaks us, the tests will discover this early, hopefully before users experience the breakage.

We should pin it to a released version during the course of development, so that we do not have to deal with instability from the dependencies.

Closer to a ramen release, we should move to the latest released version to ensure nothing breaks.

@nirs
Member

nirs commented Mar 12, 2023

Updating dependencies right before a release is too risky. I think it will be
safer to update our dependencies when we start a new development cycle, for
example after releasing an upstream version. With this we know that the released
version was tested with certain dependencies during development.

For the next release I think it should be good enough to upgrade olm now
since we don't have any upstream users yet.

@nirs
Member

nirs commented Mar 24, 2023

I think before we upgrade olm we need to understand why we don't use one of the official
ways to install olm:

  • Using operator-sdk https://olm.operatorframework.io/docs/getting-started/ (a command sketch follows below)
  • Using the install script:
    Install Operator Lifecycle Manager (OLM), a tool to help manage the Operators running on your cluster.
    
    $ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.24.0/install.sh | bash -s v0.24.0
    
    This is part of the instructions for installing an operator, shown when clicking the "Install" button in operatorhub.io, for example in https://operatorhub.io/operator/minio-operator.

Then either change our olm installation, or document why we cannot use one of the
official ways.
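
For completeness, a sketch of the operator-sdk route from the first bullet above (the version value here is only an example):

    $ operator-sdk olm install --version 0.24.0
    $ operator-sdk olm status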

@nirs
Member

nirs commented Mar 24, 2023

Before we change the olm install, we need an olm self test (olm/test).

The test should install an example operator that is quick to install
and check that the operator is deployed properly.

It should pass with the current code (based on @ShyamsundarR's report), and
also with the new olm deploy code and olm version.
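
One possible shape for that self test, as a hedged sketch: install an operator from operatorhub.io (for example the minio-operator mentioned above) using the manifest its Install button points at, then wait for its CSV to report Succeeded. The namespace depends on the generated manifest, so the check below just lists all CSVs:

    # Install an example operator via the operatorhub.io generated manifest.
    $ kubectl create -f https://operatorhub.io/install/minio-operator.yaml

    # The test passes once the operator's CSV reaches phase Succeeded.
    $ kubectl get csv -A
    $ kubectl get pods -A | grep minio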
