[BUG] OpenSearchISMPolicy doesn't work #732

Open
vchirikov opened this issue Feb 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@vchirikov

vchirikov commented Feb 20, 2024

I created an OpenSearchISMPolicy according to the current spec, but the policy is never created in OpenSearch. The operator log shows the error "the object has been modified; please apply your changes to the latest version and try again".
It looks like this kind doesn't work at all (see also #696).

apiVersion: opensearch.opster.io/v1
kind: OpenSearchISMPolicy
metadata:
  name: rollover
  namespace: infra
spec:
  opensearchCluster:
    name: opensearch
  policyId: default
  description: Rollover ISM policy
  ismTemplate:
    priority: 100
    indexPatterns:
      - "apps*"
      - "kube*"
      - "node*"
      - "ingress*"
  errorNotification:
    channel: slack
    destination:
      slack:
        url: "${SLACK_ALERTS_WEBHOOK_URL}"
    messageTemplate:
      source: "The index {{ctx.index}} failed during ISM policy execution"
  defaultState: hot
  states:
    - name: hot
      actions:
        - rollover:
            minIndexAge: 10d
            minPrimaryShardSize: 10gb
        - indexPriority:
            priority: 100
      transitions:
        - stateName: warm
          conditions:
            minIndexAge: 7d
            minRolloverAge: 7d
    - name: warm
      actions:
        - indexPriority:
            priority: 50
      transitions:
        - stateName: delete
          conditions:
            minIndexAge: 20d
    - name: delete
      transitions: []
      actions:
        - delete: {}

logs:

{"level":"info","ts":"2024-02-20T13:18:33.780Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"infra"},"namespace":"infra","name":"opensearch","reconcileID":"7e721d68-7a74-4aa9-ae84-add870efee35","cluster":{"name":"opensearch","namespace":"infra"}}
{"level":"info","ts":"2024-02-20T13:18:33.799Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"infra"},"namespace":"infra","name":"opensearch","reconcileID":"7e721d68-7a74-4aa9-ae84-add870efee35","interface":"transport"}
{"level":"info","ts":"2024-02-20T13:18:33.799Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"opensearch","namespace":"infra"},"namespace":"infra","name":"opensearch","reconcileID":"7e721d68-7a74-4aa9-ae84-add870efee35","interface":"http"}
{"level":"info","ts":"2024-02-20T13:18:57.218Z","msg":"Reconciling OpensearchISMPolicy","controller":"opensearchismpolicy","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchISMPolicy","OpenSearchISMPolicy":{"name":"rollover","namespace":"infra"},"namespace":"infra","name":"rollover","reconcileID":"cd48d62d-3287-4063-811f-54a910d41b1f","tenant":{"name":"rollover","namespace":"infra"}}
{"level":"info","ts":"2024-02-20T13:18:57.227Z","msg":"Reconciling OpensearchIndexTemplate","controller":"opensearchindextemplate","controllerGroup":"opensearch.opster.io","controllerKind":"OpensearchIndexTemplate","OpensearchIndexTemplate":{"name":"all","namespace":"infra"},"namespace":"infra","name":"all","reconcileID":"67058e28-82d7-4951-ac6f-0c8235a3d7b9","indextemplate":{"name":"all","namespace":"infra"}}
{"level":"info","ts":"2024-02-20T13:18:57.350Z","msg":"Reconciling OpensearchISMPolicy","controller":"opensearchismpolicy","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchISMPolicy","OpenSearchISMPolicy":{"name":"rollover","namespace":"infra"},"namespace":"infra","name":"rollover","reconcileID":"a627ee69-6d8e-4aec-b9e0-53e4cc0a09b7","tenant":{"name":"rollover","namespace":"infra"}}
{"level":"error","ts":"2024-02-20T13:18:57.363Z","msg":"Reconciler error","controller":"opensearchismpolicy","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchISMPolicy","OpenSearchISMPolicy":{"name":"rollover","namespace":"infra"},"namespace":"infra","name":"rollover","reconcileID":"a627ee69-6d8e-4aec-b9e0-53e4cc0a09b7","error":"Operation cannot be fulfilled on opensearchismpolicies.opensearch.opster.io \"rollover\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"}
{"level":"info","ts":"2024-02-20T13:18:57.363Z","msg":"Reconciling OpensearchISMPolicy","controller":"opensearchismpolicy","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchISMPolicy","OpenSearchISMPolicy":{"name":"rollover","namespace":"infra"},"namespace":"infra","name":"rollover","reconcileID":"2deb7560-cf4d-4e6c-a7fa-76143ec10945","tenant":{"name":"rollover","namespace":"infra"}}
{"level":"info","ts":"2024-02-20T13:18:57.422Z","msg":"Reconciling OpensearchIndexTemplate","controller":"opensearchindextemplate","controllerGroup":"opensearch.opster.io","controllerKind":"OpensearchIndexTemplate","OpensearchIndexTemplate":{"name":"all","namespace":"infra"},"namespace":"infra","name":"all","reconcileID":"3b6836b9-10f8-41a9-9ec0-63f0b41450bb","indextemplate":{"name":"all","namespace":"infra"}}
{"level":"info","ts":"2024-02-20T13:18:57.440Z","msg":"Reconciling OpensearchISMPolicy","controller":"opensearchismpolicy","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchISMPolicy","OpenSearchISMPolicy":{"name":"rollover","namespace":"infra"},"namespace":"infra","name":"rollover","reconcileID":"cdf2e629-0b1a-4ff8-a8df-fc4613491081","tenant":{"name":"rollover","namespace":"infra"}}

docker image: opensearchproject/opensearch-operator:2.5.1
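
The "object has been modified" message in the log above is the standard Kubernetes response to an update that carries a stale resource version (optimistic concurrency). The sketch below is a minimal, self-contained simulation of that check and of the usual re-read-and-retry remedy. It is not the operator's code: `FakeStore`, `update_with_retry`, and the `PENDING`/`CREATED` state values are hypothetical names chosen for illustration, and whether a retry alone would resolve this particular bug is not established in this thread.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when an update carries a stale resource version."""


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for the API server (one object)."""
    version: int = 1
    status: dict = field(default_factory=dict)

    def get(self):
        # Return a snapshot: the current version plus a copy of the status.
        return self.version, dict(self.status)

    def update(self, seen_version, status):
        # Reject writes based on an outdated read, as the API server does.
        if seen_version != self.version:
            raise ConflictError("the object has been modified")
        self.status = dict(status)
        self.version += 1


def update_with_retry(store, mutate, attempts=3):
    """Re-read the object and re-apply `mutate` whenever a conflict occurs."""
    for _ in range(attempts):
        version, status = store.get()
        mutate(status)
        try:
            store.update(version, status)
            return True
        except ConflictError:
            continue  # another writer got there first; fetch again and retry
    return False


store = FakeStore()

# Two reconcile passes read the same version; only the first write succeeds.
stale_version, stale_status = store.get()
store.update(store.version, {"state": "PENDING"})
try:
    store.update(stale_version, {"state": "CREATED"})
    conflict_seen = False
except ConflictError:
    conflict_seen = True

# Retrying from a fresh read resolves the conflict.
retried_ok = update_with_retry(store, lambda s: s.update(state="CREATED"))
```

A conflict of this kind is normally transient; it becomes a visible problem when the controller keeps writing on every pass, so that each reconcile races with the previous one.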

@vchirikov vchirikov added bug Something isn't working untriaged Issues that have not yet been triaged labels Feb 20, 2024
@PauloJFCabral

Hi, I'm trying to do the same, but I have some questions.
After creating the OpenSearchISMPolicy resource, do I need to reference it in the cluster configuration?
How is it applied to the cluster?

@vchirikov
Author

The ISM policy references the cluster via:

spec:
  opensearchCluster:
    name: opensearch

@prudhvigodithi
Member

[Triage]
It looks like this works for @cthtrifork, based on the recently merged PR #750. @cthtrifork, can you please help here?
Thanks

@prudhvigodithi prudhvigodithi removed the untriaged Issues that have not yet been triaged label Mar 25, 2024
@cthtrifork
Contributor

Hmm, I am unsure whether my fix is related; this sounds like a Kubernetes reconcile error. Can we create a beta release of the reconciler so that I can help with testing and investigation?
If an official beta release is not possible, then a guide for building the Docker image and pushing it to a private registry would help.

@getsaurabh02 getsaurabh02 moved this from 🆕 New to Backlog in Engineering Effectiveness Board Jul 18, 2024
swoehrl-mw pushed a commit that referenced this issue Aug 22, 2024
### Description
The ISM Policy reconciler was constantly trying to update the ISM Policy
and it was not handling reconciliation requeue in some cases. There were
possibly other issues as well. Below I have described what caused the
different issues I encountered:

- The ISM Policy request was different from the response, but they were
both made with the same struct. This caused the reconciler to always see
the existing ISM Policy and the ISM Policy from the CR as different and
try to update it. I have created a separate struct model for each to
separate the logic and in the code I now compare the existing policy
with the policy from the CR by comparing both the Policy IDs and the
policy spec
- There were some very complex cases in the code that were very
difficult to understand so I have attempted to make the code more
concise and easy to read and understand
- I have added reconciliation requeuing to all cases so the operator
doesn't just stop reconciling the ISM Policy in some cases
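
The first bullet above can be illustrated with a small sketch: if the stored policy comes back with fields the server added, a direct comparison with the request never matches, so every pass looks like a change. The code below is only an approximation of the idea, not the operator's Go implementation. The function names are hypothetical, and the server-side field names (`policy_id`, `last_updated_time`, `schema_version`) and the sample values are assumptions that should be checked against the ISM API documentation for the OpenSearch version in use.

```python
# Assumed names of fields the server adds to a stored policy; verify them
# against the ISM API documentation before relying on this list.
SERVER_SIDE_FIELDS = {"policy_id", "last_updated_time", "schema_version"}


def desired_spec(request_body):
    """Extract the comparable spec from a create/update request body."""
    return dict(request_body["policy"])


def existing_spec(response_body):
    """Extract the comparable spec from a GET response body."""
    policy = response_body["policy"]
    return {k: v for k, v in policy.items() if k not in SERVER_SIDE_FIELDS}


def needs_update(policy_id, request_body, response_body):
    """Update only when the ID or the user-controlled spec differs."""
    if response_body["_id"] != policy_id:
        return True
    return desired_spec(request_body) != existing_spec(response_body)


# Illustrative request and response bodies (values are made up).
request = {
    "policy": {
        "description": "Rollover ISM policy",
        "default_state": "hot",
        "states": [{"name": "hot", "actions": [], "transitions": []}],
    }
}
response = {
    "_id": "default",
    "_version": 3,
    "policy": {
        "policy_id": "default",
        "description": "Rollover ISM policy",
        "last_updated_time": 1708435137000,
        "schema_version": 19,
        "default_state": "hot",
        "states": [{"name": "hot", "actions": [], "transitions": []}],
    },
}

# Comparing the raw bodies always reports a difference.
naive_differs = request["policy"] != response["policy"]

# Comparing ID plus normalized spec reports no change for identical policies.
unchanged = not needs_update("default", request, response)

# A real edit to the spec is still detected.
changed_request = {"policy": dict(request["policy"], description="Edited")}
changed = needs_update("default", changed_request, response)
```

Keeping separate shapes for what is sent and what is received makes the comparison explicit about which fields the user controls.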

One thing I am wondering is that I am not sure why we would want to
create a CR without specifying the cluster ID and then the operator
automatically links it to that cluster ID so it breaks if the OpenSearch
CR is deleted. Is this intended and why? I'm talking about the section
with the comment "Check cluster ref has not changed"

Tested cases:
- A new ISM Policy is created through a CR and the operator creates it
in the OpenSearch Cluster
- The CR for an ISM Policy that is created by the operator is removed
and the operator removes it in the OpenSearch Cluster
- An ISM Policy that already exists in the OpenSearch Cluster is created
through a CR and the operator ignores it and marks it as existing
- The CR for an ISM Policy that was pre-existing and therefore was not
created by the operator is removed and the operator does not remove the
ISM Policy from the OpenSearch Cluster
- An ISM Policy that already exists in the OpenSearch Cluster is created
through a CR and the operator ignores it and marks it as existing. The
ISM Policy is then manually removed from the OpenSearch Cluster and the
operator now applies the ISM Policy from the CR
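
The five tested cases above amount to a small decision table. The following sketch models it under stated assumptions: `PolicyStatus`, `reconcile`, and `finalize` are hypothetical names, the cluster is a plain dictionary, and the `updated` branch is an assumption that is not among the listed cases. It is meant to make the expected behaviour easy to check, not to mirror the operator's code.

```python
from dataclasses import dataclass


@dataclass
class PolicyStatus:
    """Hypothetical CR status: did the operator create the policy itself?"""
    managed: bool = False
    existing: bool = False


def reconcile(cluster, policy_id, spec, status):
    """Return the action taken while the CR is present."""
    if policy_id not in cluster:
        cluster[policy_id] = spec
        status.managed, status.existing = True, False
        return "created"
    if not status.managed:
        # The policy was there before the CR: leave it alone, flag it.
        status.existing = True
        return "ignored"
    if cluster[policy_id] != spec:
        cluster[policy_id] = spec  # assumed branch, not in the tested cases
        return "updated"
    return "unchanged"


def finalize(cluster, policy_id, status):
    """Return the action taken when the CR is deleted."""
    if status.managed:
        cluster.pop(policy_id, None)
        return "deleted"
    return "kept"


# Cases 1 and 2: the operator creates the policy, then removes it with the CR.
cluster_a, status_a = {}, PolicyStatus()
case1 = reconcile(cluster_a, "default", {"v": 1}, status_a)
case2 = finalize(cluster_a, "default", status_a)

# Cases 3 and 4: a pre-existing policy is ignored and survives CR deletion.
cluster_b, status_b = {"default": {"v": 0}}, PolicyStatus()
case3 = reconcile(cluster_b, "default", {"v": 1}, status_b)
case4 = finalize(cluster_b, "default", status_b)

# Case 5: the pre-existing policy is removed by hand, then the CR is applied.
cluster_c, status_c = {"default": {"v": 0}}, PolicyStatus()
reconcile(cluster_c, "default", {"v": 1}, status_c)
del cluster_c["default"]
case5 = reconcile(cluster_c, "default", {"v": 1}, status_c)
```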

The test for ISM Policies is currently failing miserably, but I decided
to create the PR to get feedback before I dive into fixing it.

### Issues Resolved
#833
#732
Possibly other issues

### Check List
- [x] Commits are signed per the DCO using --signoff 
- [x] Unittest added for the new/changed functionality and all unit
tests are successful
- [x] Customer-visible features documented
- [x] No linter warnings (`make lint`)

If CRDs are changed:
- [ ] CRD YAMLs updated (`make manifests`) and also copied into the helm
chart
- [ ] Changes to CRDs documented

Please refer to the [PR
guidelines](https://github.com/opensearch-project/opensearch-k8s-operator/blob/main/docs/developing.md#submitting-a-pr)
before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

Signed-off-by: rkthtrifork <[email protected]>
Projects
Status: 📦 Backlog
Development

No branches or pull requests

4 participants