[BUG] index template priority issue causing the server to crash on upgrading from 2.7 to 2.9 or 2.10 #1771

Closed
Bogendra-Betapudi opened this issue Oct 9, 2023 · 21 comments
Assignees
Labels
backwards-compatibility bug Something isn't working

Comments

@Bogendra-Betapudi

Bogendra-Betapudi commented Oct 9, 2023

Describe the bug
When upgrading the OpenSearch server from 2.7 to 2.9 or 2.10, the server crashes in a loop as it tries to restart, and the logs show an error related to index templates and priority.
Upon further investigation, I found this thread where another user ran into the same issue. But in my case, the server doesn't start up at all: it tries to restart in a loop and eventually crashes.

To Reproduce
Steps to reproduce the behavior:

  1. Have a server installed and running with version prior to 2.9 (mine was 2.7)
  2. Now, upgrade the server to 2.9 or 2.10 by downloading the corresponding zip/tar from the official website.
  3. Restart the server with the newer version artifacts
  4. See the error in the logs; the server never becomes available and eventually crashes.
  5. Below is the snippet of the error that is continuously seen in the logs:

[2023-10-09T14:54:01,342][ERROR][o.o.b.Bootstrap ] [localhost] Exception
java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority

Note that this is happening even with plain vanilla installation. The steps I tried are here: https://github.com/opensearch-project/observability/issues/1771

Expected behavior
The upgrade should be successful and the server should be up and running without any errors/issues.

Plugins
Please list all plugins currently enabled.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: Windows

Additional context
I tried to get past this by updating the priority of the existing template, but that resulted in a new template being created instead. This is a blocker: once we try to upgrade, it fails due to this issue, and we can't go back to the older version either, because the index context has already been updated by the newer OpenSearch version.

@Bogendra-Betapudi
Author

Bogendra-Betapudi commented Nov 14, 2023

Hi Team,
I'm still seeing the same error with the latest 2.11 release as well.
Can you please provide any available workaround and also an ETA on the fix?
Please let me know if any additional info is needed from my end.

Thanks!

@dblock
Member

dblock commented Nov 16, 2023

@Bogendra-Betapudi Looks bad. Are these repro steps correct, ie. does this happen on a vanilla installation of OpenSearch?

@Bogendra-Betapudi
Author

@dblock - yes Daniel, this happens even on a plain vanilla installation. Here is what I did:

  1. Downloaded 2.7 from the official OpenSearch downloads page, proceeded with the installation, created a few dummy indices and populated data.
  2. Then tried upgrading this setup to 2.9/2.10 by pointing the data folder to the existing data folder created in the setup above.
  3. This fails to start with the above error. The server restarts several times and eventually dies.
  4. This is a blocker stopping us from upgrading to the latest releases (which have fixes for several high-severity security vulnerabilities).

Please let me know if any further info is needed from our end to expedite this.
Thanks!

@dblock
Member

dblock commented Nov 21, 2023

This is caused by one type of index/template, right? A specific mapping? Edit your repro above with the exact minimal commands for the index/template that exhibits this behavior (ss4o_metrics_template above)? Just trying to narrow it down.

@Bogendra-Betapudi
Author

Bogendra-Betapudi commented Nov 22, 2023

Hi @dblock, I will update the reproduction steps to include minimal commands and just the relevant stack trace. Meanwhile, please find below the research and workarounds I have tried so far to get past this:

  1. Downloaded 2.7.0 and did a plain vanilla installation.
  2. Viewed the existing out-of-the-box templates via "_cat/templates"; here is the output:
    ss4o_trace_template [ss4o_traces-*-*] 1 1 []
    ss4o_metric_template [ss4o_metrics-*-*] 1 1 []
  3. Tried updating the priority of "ss4o_metric_template" (as it's the conflicting one during the upgrade) to 0 via a PUT to "_template/ss4o_metric_template" with the body:
{
  "order": 0,
  "index_patterns": ["ss4o_metrics-*-*"]
}

This resulted in a new template being created instead of the priority of the existing one being updated, as the cat command shows:

**ss4o_metric_template [ss4o_metrics-*-*] 0**
ss4o_trace_template  [ss4o_traces-*-*]  1     1       []
ss4o_metric_template [ss4o_metrics-*-*] 1     1       []
  4. Reverted the changes by deleting the newly created template.
  5. Then tried upgrading this plain vanilla setup to 2.10, and it failed with the error below:
    org.opensearch.bootstrap.StartupException: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority

As you can see, it's failing with a plain vanilla installation too. I would appreciate a way to unblock us while a fix is being worked on.
Please let me know if any further info is needed from my end.
Thanks!

@Bogendra-Betapudi
Author

@dblock - I have updated the steps to reproduce with the minimal steps and a snippet of the error log. I also provided the list of things tried so far to get past this.
Please let me know if anything else is needed from our end to get this moving.
Thanks!

@dblock
Member

dblock commented Nov 26, 2023

I see two problems.

It looks like you tried to edit a template, but that didn't work and another template was created? I can't reproduce that one, here's what I did.

$ curl -s -k -uadmin:admin https://localhost:9200 | jq ".version.number"
"2.9.0"

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template | jq ".index_templates[].name"
"ss4o_metrics_template"
"ss4o_traces_template"

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template/ss4o_metrics_template | jq ".index_templates[].name"
"ss4o_metrics_template"

$ curl -s -k -uadmin:admin -X PUT https://localhost:9200/_index_template/ss4o_metrics_template -H "Content-Type: application/json" -d'{"index_patterns":["ss4o_metrics-*-*"], "priority":0}' | jq
{
  "acknowledged": true
}

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template | jq ".index_templates[].name"
"ss4o_metrics_template"
"ss4o_traces_template"

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template | jq ".index_templates[].index_template.priority"
0
1

Reading the error carefully it looks like you have a template called ss4o_metrics_template and another called ss4o_metric_template (one with s and another without), so this conflict makes sense. You cannot have the same priority [1] for multiple index templates.

tl;dr, @Bogendra-Betapudi is the problem the typo in trying to update the existing template that causes a new template to be created inadvertently?

Assuming it is, one would expect the error to happen when the second template was being created (via PUT), which is what I see happen in 2.9.

> Please let me know if anything else is needed from our end to get this moving.

In general bugs get triaged and assigned at some point soon, but you'll be faster served by digging through the problem and trying to fix it if you have time. We sincerely appreciate your help.

Related, I found https://forum.opensearch.org/t/java-lang-illegalargumentexception-index-template-how-critical/15306/16, opensearch-project/OpenSearch#837 and opensearch-project/OpenSearch#8926.
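
For anyone checking their own cluster for the state described above, here is a minimal sketch that flags templates sharing both a pattern and a priority. It assumes input lines in the `name [pattern] priority ...` layout of the `_cat/templates` output quoted in this thread. `find_priority_conflicts` is a hypothetical helper name, and it compares patterns as exact strings rather than testing real wildcard overlap, so it only approximates the check the server performs.

```shell
#!/bin/sh
# Sketch: flag templates that share both an index pattern and a priority.
# Input on stdin: one template per line, "name [pattern] priority ...".
# Simplifications (assumptions, not server behavior):
#   - patterns are compared as exact strings, not as overlapping wildcards
#   - each template lists a single pattern (no spaces inside the brackets)
#   - templates created via _template and _index_template are not distinguished
find_priority_conflicts() {
  awk '
    NF >= 3 {
      key = $2 " " $3   # pattern plus priority
      if (key in seen) {
        printf "conflict: %s and %s both match %s at priority %s\n", seen[key], $1, $2, $3
      } else {
        seen[key] = $1
      }
    }
  '
}
```

Piping the two lines `ss4o_metric_template [ss4o_metrics-*-*] 1 1 []` and `ss4o_metrics_template [ss4o_metrics-*-*] 1 1 []` into the function reports one conflict; if the priorities differ, it prints nothing.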

@dblock
Member

dblock commented Nov 26, 2023

Note that this in your example looked suspicious.

ss4o_metric_template [ss4o_metrics-*-*] 0
ss4o_trace_template [ss4o_traces-*-*] 1 1 []
ss4o_metric_template [ss4o_metrics-*-*] 1 1 []

It has ss4o_metric_template twice. If I try without the s I get an error trying to create the template, as expected.

$ curl -s -k -uadmin:admin -X PUT https://localhost:9200/_index_template/ss4o_metric_template -H "Content-Type: application/json" -d'{"index_patterns":["ss4o_metrics-*-*"], "priority":0}' | jq
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "index template [ss4o_metric_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metrics_template] with patterns (ss4o_metrics_template => [ss4o_metrics-*-*]) that have the same priority [0], multiple index templates may not match during index creation, please use a different priority"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "index template [ss4o_metric_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metrics_template] with patterns (ss4o_metrics_template => [ss4o_metrics-*-*]) that have the same priority [0], multiple index templates may not match during index creation, please use a different priority"
  },
  "status": 400
}

So how do we get into the state where you have two templates with the same name? @Bogendra-Betapudi curl repro steps?

@dblock
Member

dblock commented Nov 26, 2023

I tried with 2.7.0.

$ curl -s -k -uadmin:admin https://localhost:9200 | jq ".version.number"
"2.7.0"

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template | jq ".index_templates[].name"
"ss4o_trace_template"
"ss4o_metric_template"

$ curl -s -k -uadmin:admin -X PUT https://localhost:9200/_index_template/ss4o_metrics_template -H "Content-Type: application/json" -d'{"index_patterns":["ss4o_metrics-*-*"], "priority":0}' | jq
{
  "acknowledged": true
}

Lo and behold, I got two templates, but they do have a different name.

$ curl -s -k -uadmin:admin https://localhost:9200/_index_template | jq ".index_templates[].name"
"ss4o_trace_template"
"ss4o_metric_template"
"ss4o_metrics_template"

So looks like the template was renamed in 2.8.0 or 2.9.0?

I think upgrade is a red herring, that's expected to fail if you have two templates with the same priority for the same index. So the issue is how we got into that state before the upgrade. @Bogendra-Betapudi help narrow this down, try my repro steps in 2.8.0 (I'm on a plane and docker pull will take forever :)), I'd like to know whether this was renamed in 2.8 or 2.9. Also @bowenlan-amzn maybe you recognize this bug?

@bowenlan-amzn
Member

I don't recognize this. I did a quick search, and it seems to come from this repo:
https://github.com/opensearch-project/opensearch-catalog/blob/4691ec8a01abfabbbe3e728103a3408329179958/docs/schema/system/samples/instances.json#L55

So @YANG-DB probably can help here.

@Bogendra-Betapudi
Author

Bogendra-Betapudi commented Nov 28, 2023

@dblock - Thanks for checking. I tried reproducing the same with the 2.8 as suggested. Please find below the detailed steps:

  1. Installed version 2.8.0 and did a cat on templates; I see the following:
    name                 index_patterns     order version composed_of
    ss4o_trace_template  [ss4o_traces-*-*]  1     1       []
    ss4o_metric_template [ss4o_metrics-*-*] 1     1       []
  2. Tried updating the priority of the existing ss4o_metric_template from 1 to 0 (updating the existing template, not creating a new one) with the command below:
    PUT http://localhost:9200/_template/ss4o_metric_template
    { "order": 0, "index_patterns": ["ss4o_metrics-*-*"] }
    Note the name: it's the same as the existing one (without the 's').
  3. The above "PUT" resulted in a new template being created with the same name and pattern but a different priority, instead of the existing one being updated:
    name                 index_patterns     order version composed_of
    ss4o_metric_template [ss4o_metrics-*-*] 0
    ss4o_trace_template  [ss4o_traces-*-*]  1     1       []
    ss4o_metric_template [ss4o_metrics-*-*] 1     1       []
    Note that there are 2 templates with the same name ss4o_metric_template with different priorities.
  4. Then deleted this newly created template and tried creating a new template with 's' in the name (similar to 2.9.0):
    http://localhost:9200/_template/ss4o_metrics_template
    { "order": 0, "index_patterns": ["ss4o_metrics-*-*"] }
  5. Now I see both templates (with and without 's' in their names):
    name                  index_patterns     order version composed_of
    ss4o_metrics_template [ss4o_metrics-*-*] 0
    ss4o_trace_template   [ss4o_traces-*-*]  1     1       []
    ss4o_metric_template  [ss4o_metrics-*-*] 1     1       []
  6. Then tried to upgrade to 2.10 by pointing to the data folder created by the 2.8.0 installation above. Strangely enough, this time it failed with a similar error but for a different template:
    org.opensearch.bootstrap.StartupException: java.lang.IllegalArgumentException: index template [ss4o_traces_template] has index patterns [ss4o_traces-*-*] matching patterns from existing templates [ss4o_trace_template] with patterns (ss4o_trace_template => [ss4o_traces-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
  7. As you can see, after creating a new ss4o_metrics_template with priority 0 (different from the 2.10 default priority of 1), it didn't fail on the ss4o_metrics_template template. But it failed on ss4o_trace_template and ss4o_traces_template having the same priority.

My understanding is there's a change to the names of the default templates (by adding an 's') and they are causing an issue due to the same priority. The work around would be to be able to create these templates with a different priority or be able to update the priority of the existing ones (which didn't work as I showed above).
This seems to be related to the change mentioned by @bowenlan-amzn in above comment.
BTW, I tried creating a new ss4o_traces_template with priority 0 similar to the ss4o_metrics_template and that helped me to get past the issue with the 2.10 upgrade.
Please let me know if anything else is needed from my end to narrow it down further.
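
The workaround described in this comment (pre-creating the plural-named templates with order 0 before upgrading) could be scripted roughly as follows. This is only a sketch under assumptions: `OPENSEARCH_URL`, `DRY_RUN` and `put_template` are illustrative names, the endpoint and body mirror the `_template` requests quoted above, and the script defaults to a dry run that prints the requests instead of sending them.

```shell
#!/bin/sh
# Sketch of the workaround above: before upgrading, pre-create the
# plural-named templates with order 0 so that they do not collide with
# the singular-named templates at priority 1.
# OPENSEARCH_URL and DRY_RUN are illustrative variable names (assumptions).
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

put_template() {
  # $1 = template name, $2 = index pattern
  body=$(printf '{"order": 0, "index_patterns": ["%s"]}' "$2")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print what would be sent.
    printf 'PUT %s/_template/%s %s\n' "$OPENSEARCH_URL" "$1" "$body"
  else
    curl -s -X PUT "$OPENSEARCH_URL/_template/$1" \
      -H "Content-Type: application/json" -d "$body"
  fi
}

put_template ss4o_metrics_template 'ss4o_metrics-*-*'
put_template ss4o_traces_template 'ss4o_traces-*-*'
```

Running it with `DRY_RUN=0` sends the two requests; add whatever credentials or TLS options your installation needs.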

@YANG-DB
Member

YANG-DB commented Nov 28, 2023

Thanks @bowenlan-amzn @dblock @Bogendra-Betapudi
I'm looking into this and will update as soon as I have a working solution

@Swiddis
Collaborator

Swiddis commented Nov 29, 2023

Likely caused by this PR which renamed the index and got picked up in 2.9, matching the repro in 2.8 -> 2.9.

@Swiddis
Collaborator

Swiddis commented Nov 29, 2023

In general I'd think we should be in the clear if we just delete these templates, because we don't use them and they were superseded by integrations loading them (also released in 2.9). But I didn't write the original classes before refactoring, so I'm not sure what their behavior is if the templates are cleared. @YANG-DB and I are looking at deleting this code for 2.12.

@dblock
Member

dblock commented Nov 29, 2023

Thanks @YANG-DB @Swiddis - should I move this issue to a different repo?

To be clear, let's make sure the upgrade works without errors with the default installation.

@Swiddis
Collaborator

Swiddis commented Dec 2, 2023

@dblock We can move the issue to observability.

Issue seems to affect default config, but it recovers. Steps:

  1. Take the default docker-compose.yml from the docs.
  2. Replace all latest with 2.8.0
  3. docker compose up
  4. Wait for everything to load, curl _cat/templates and verify the two ss4o templates are there
  5. docker compose down
  6. Replace all 2.8.0 with 2.9.0
  7. docker compose up
  8. Both nodes take a little while to start up, but eventually get there.
  9. In the startup logs there are entries matching
opensearch-node1       | org.opensearch.bootstrap.StartupException: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
opensearch-node1       |        at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:184) ~[opensearch-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:171) ~[opensearch-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104) ~[opensearch-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138) ~[opensearch-cli-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.cli.Command.main(Command.java:101) ~[opensearch-cli-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:137) ~[opensearch-2.9.0.jar:2.9.0]
opensearch-node1       |        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:103) ~[opensearch-2.9.0.jar:2.9.0]
opensearch-node1       | Caused by: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
  10. _cat/templates only returns the old template with the singular ss4o_metric_template.
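
The two "replace" steps in the list above (swapping the image tag in the compose file) can be sketched with a small helper. `set_image_tag` is a hypothetical name, and the sketch assumes the compose file references images as `name:tag` on lines containing `image:`.

```shell
#!/bin/sh
# Sketch: swap the image tag in a compose file.
# set_image_tag is a hypothetical helper. On every line containing "image:"
# it rewrites the first ":<old>" to ":<new>". Dots in the old tag are
# treated as regex wildcards, which is good enough for version strings.
set_image_tag() {
  # $1 = old tag, $2 = new tag, $3 = compose file
  tmp="$3.tmp"
  sed "/image:/ s/:$1/:$2/" "$3" > "$tmp" && mv "$tmp" "$3"
}
```

For example, `set_image_tag latest 2.8.0 docker-compose.yml` before the first `docker compose up`, then `set_image_tag 2.8.0 2.9.0 docker-compose.yml` before the second.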

@Bogendra-Betapudi
Author

Thanks @YANG-DB and @Swiddis for looking into this. Are there any plans to provide a fix for those using the binaries downloaded from the official download site? We download the zip/archive for Windows to be used in our application and don't run it under Docker. How do we go about addressing such usages?
I believe the workaround is to create the required templates (with the 's' in the name and a different priority), but in most cases that's not possible or easy.
Please let me know if any further info is needed on this.
Thanks!

dblock transferred this issue from opensearch-project/OpenSearch Dec 4, 2023
Swiddis removed the untriaged label Dec 4, 2023

@Swiddis
Collaborator

Swiddis commented Dec 4, 2023

The best solution I can find without a patch release is to delete both of the ss4o templates before upgrading. This is safe since the templates are otherwise unused, and avoids any StartupExceptions when upgrading. Might need more detailed reproduction steps if the solution doesn't work.

# As step 4.5 for the above process
curl -XDELETE "https://opensearch-node1:9200/_index_template/ss4o_*_template"

I still wonder why it's intermittently starting successfully, though. In a linked issue I see both people who see the exception without startup being hindered, and others where it's blocking startup entirely. I'm not sure what differentiates them.

I also have to wonder what tests we need to write to avoid future occurrences, I'd have expected something like this to be revealed by testing.

@Swiddis
Collaborator

Swiddis commented Dec 4, 2023

Will be resolved in 2.12 onwards following #1770, what's left is verifying a workaround for 2.9-2.11.

@Bogendra-Betapudi
Author

@Swiddis - Thanks for looking into this. For now, will wait for the 2.12 to be available and would try to use the work around if needed before the release.

@Swiddis
Collaborator

Swiddis commented Jan 29, 2024

Marking as completed since there's a workaround available and the bug is resolved in 2.12. Workaround: delete the conflicting templates; they're unused/deprecated system templates from Integrations. Specifically, the singular forms (ss4o_trace_template and ss4o_metric_template) are completely unused and should almost always be safe to delete, while the plural forms (ss4o_traces_template and ss4o_metrics_template) may be in use if the user has active integrations.
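
A minimal sketch of that workaround, limited to the singular forms, might look like this. It reuses the `_index_template` endpoint from the DELETE command earlier in the thread; `OPENSEARCH_URL`, `DRY_RUN` and `delete_singular_templates` are illustrative names rather than anything provided by OpenSearch, and the script defaults to a dry run that only prints the requests.

```shell
#!/bin/sh
# Sketch: delete only the singular-named templates before upgrading.
# OPENSEARCH_URL and DRY_RUN are illustrative variable names (assumptions).
# Add credentials/TLS flags (such as the -k -u options used earlier in this
# thread) as your cluster requires.
OPENSEARCH_URL="${OPENSEARCH_URL:-https://localhost:9200}"

delete_singular_templates() {
  for name in ss4o_trace_template ss4o_metric_template; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only print what would be sent.
      printf 'DELETE %s/_index_template/%s\n' "$OPENSEARCH_URL" "$name"
    else
      curl -s -X DELETE "$OPENSEARCH_URL/_index_template/$name"
    fi
  done
}

delete_singular_templates
```

Run it against the old version, before swapping in the newer binaries, with `DRY_RUN=0` once the printed requests look right.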
