"failed to turn off translog retention" after upgrade #651

Closed
gferrette opened this issue Aug 17, 2020 · 9 comments
Labels
info requested (Further information is requested), question (User requested information)

Comments

@gferrette

Hello,

After upgrading Open Distro from version 1.0.2 to version 1.7.0, the following message appears in the logs on node startup:

[2020-08-14T10:54:21,893][WARN ][o.e.i.s.IndexShard ] [machine] [.tasks][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]

This message appears for several indices. The indices/shards are not corrupted and are in green state, but the messages show up in the logs on every startup.
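The state can be confirmed with the _cat APIs (shown here in Kibana Dev Tools console syntax; the index name is just one of the affected ones):

```
GET _cat/indices/.tasks?v&h=index,health,status
GET _cat/shards/.tasks?v&h=index,shard,prirep,state
```

Both report green/STARTED despite the warning above.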

Is there any way to solve this issue?

Thanks in advance.

Gabriel.

@peterzhuamazon
Member

Hi @gferrette, I have tried to reproduce the issue on a CentOS 7 server, upgrading the RPM from 1.0.2 to 1.7.0. However, I am not able to reproduce it with simple data on my end.

From the looks of it, this seems like an upstream issue.

We would appreciate it if you could share more information about your setup and logs.

Thanks.

@gferrette
Author

Hello @peterzhuamazon!

Thanks for replying.

This issue seems to be the same as in this thread: https://github.com/opendistro-for-elasticsearch/security/issues/354, but in my case it's happening on several indices, not only on the audit index.
My setup is a test environment with a single node; my elasticsearch.yml is below (certificate info omitted):

#action.destructive_requires_name: true

script.painless.regex.enabled: true

#repositorio snapshot
path.repo: ["/tmp/backup_nodes"]

######## Start OpenDistro for Elasticsearch Security Demo Configuration ########

# WARNING: revise all the lines below before you go into production

opendistro_security.ssl.transport.pemcert_filepath: dummy.pem
opendistro_security.ssl.transport.pemkey_filepath: dummy-key.pem
opendistro_security.ssl.transport.pemtrustedcas_filepath: root-ca.pem
opendistro_security.ssl.transport.enforce_hostname_verification: false
opendistro_security.ssl.http.enabled: false
opendistro_security.ssl.http.pemcert_filepath: dummy.pem
opendistro_security.ssl.http.pemkey_filepath: dummy-key.pem
opendistro_security.ssl.http.pemtrustedcas_filepath: root-ca.pem
#opendistro_security.allow_unsafe_democertificates: true
opendistro_security.allow_default_init_securityindex: true
opendistro_security.authcz.admin_dn:
  - 'DUMMY'
opendistro_security.nodes_dn:
  - 'DUMMY'

opendistro_security.enable_snapshot_restore_privilege: true
opendistro_security.check_snapshot_restore_write_privileges: true
opendistro_security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]
cluster.routing.allocation.disk.threshold_enabled: false
node.max_local_storage_nodes: 3
######## End OpenDistro for Elasticsearch Security Demo Configuration ########

More log info:

[2020-08-17T15:57:03,094][INFO ][o.e.g.GatewayService ] [machine] recovered [27] indices into cluster_state
[2020-08-17T15:57:03,121][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [machine] Node started
[2020-08-17T15:57:03,122][INFO ][c.a.o.s.c.ConfigurationRepository] [machine] Check if .opendistro_security index exists ...
[2020-08-17T15:57:03,122][INFO ][c.a.o.s.c.ConfigurationRepository] [machine] .opendistro_security index does already exist, so we try to load the config from it
[2020-08-17T15:57:03,127][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [machine] 4 Open Distro Security modules loaded so far: [Module [type=REST_MANAGEMENT_API, implementing class=com.amazon.opendistroforelasticsearch.security.dlic.rest.api.OpenDistroSecurityRestApiActions], Module [type=DLSFLS, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.OpenDistroSecurityFlsDlsIndexSearcherWrapper], Module [type=AUDITLOG, implementing class=com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl], Module [type=MULTITENANCY, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.PrivilegesInterceptorImpl]]
[2020-08-17T15:57:03,130][INFO ][c.a.o.s.c.ConfigurationRepository] [machine] Background init thread started. Install default config?: false
[2020-08-17T15:57:04,056][WARN ][o.e.i.s.IndexShard ] [machine] [.tasks][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-08-17T15:57:06,429][WARN ][o.e.i.s.IndexShard ] [machine] [.kibana_-532334581_test_1][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]

@peterzhuamazon
Member

Hi @gferrette, after discussing with the team, we think this issue is more related to the security repo, as there are already similar issues there. We will transfer this issue to the security repo. Thanks.

@peterzhuamazon peterzhuamazon transferred this issue from opendistro-for-elasticsearch/opendistro-build Aug 17, 2020
@peterzhuamazon peterzhuamazon added the info requested (Further information is requested) and question (User requested information) labels Aug 17, 2020
@dinusX

dinusX commented Aug 17, 2020

Hi @gferrette ,
would you be able to change the log level to DEBUG and paste the stack trace again?
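For example, the relevant logger can be raised to DEBUG at runtime via the cluster settings API (the `o.e.i.s.IndexShard` in the log lines corresponds to `org.elasticsearch.index.shard`; shown in console syntax):

```
PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.index.shard": "DEBUG"
  }
}
```

A transient setting resets on full cluster restart, so it can be left in place just long enough to capture the trace.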

@gferrette
Author

Hello @dinusX!

Below is the stack trace with the DEBUG log level:

[2020-08-18T10:52:54,720][DEBUG][o.e.c.s.MasterService ] [machine] publishing cluster state version [482]
[2020-08-18T10:52:54,711][DEBUG][o.e.i.t.Translog ] [machine] [.kibana_1298139586_usuarioteste][0] open uncommitted translog checkpoint Checkpoint{offset=55, numOps=0, generation=4, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=0, minTranslogGeneration=3, trimmedAboveSeqNo=-2}
[2020-08-18T10:52:54,722][DEBUG][o.e.i.t.Translog ] [machine] [.kibana_1298139586_usuarioteste][0] recovered local translog from checkpoint Checkpoint{offset=55, numOps=0, generation=4, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=0, minTranslogGeneration=3, trimmedAboveSeqNo=-2}
[2020-08-18T10:52:54,707][DEBUG][o.e.i.t.Translog ] [machine] [.kibana_1276559883_testesubidaversao_1][0] recovered local translog from checkpoint Checkpoint{offset=55, numOps=0, generation=7, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=0, minTranslogGeneration=5, trimmedAboveSeqNo=-2}
[2020-08-18T10:52:54,733][DEBUG][o.e.i.t.Translog ] [machine] [.kibana_1298139586_usuarioteste][0] recovered local translog from checkpoint Checkpoint{offset=55, numOps=0, generation=4, minSeqNo=-1, maxSeqNo=-1, globalCheckpoint=0, minTranslogGeneration=3, trimmedAboveSeqNo=-2}
[2020-08-18T10:52:54,736][DEBUG][o.e.c.c.PublicationTransportHandler] [machine] received diff cluster state version [482] with uuid [kPQDh3sIRgOcMbRvFdobqQ], diff size [252]
[2020-08-18T10:52:54,744][DEBUG][o.e.i.e.Engine ] [machine] [security-auditlog-2020.08.17][0] Safe commit [CommitPoint{segment[segments_4], userData[{history_uuid=6F_7duGVQfWQ09WpZ8xQVw, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=KGFfdj5NSmu7Cmu82PshqQ, translog_generation=3, translog_uuid=sE5JbcYESma2dXRitoTphw}]}], last commit [CommitPoint{segment[segments_4], userData[{history_uuid=6F_7duGVQfWQ09WpZ8xQVw, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=KGFfdj5NSmu7Cmu82PshqQ, translog_generation=3, translog_uuid=sE5JbcYESma2dXRitoTphw}]}]
[2020-08-18T10:52:54,760][DEBUG][o.e.g.PersistedClusterStateService] [machine] writing cluster state took [0ms]; wrote global metadata [false] and metadata for [0] indices and skipped [27] unchanged indices
[2020-08-18T10:52:54,762][DEBUG][o.e.i.e.Engine ] [machine] [.kibana_1276559883_testesubidaversao_1][0] Safe commit [CommitPoint{segment[segments_4], userData[{history_uuid=FAQMiebKTBiYHhXci0HgUA, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=ZBSi3AaQQKScPFa8XIreQw, translog_generation=5, translog_uuid=3Zfcvlm2QjmmJBIeNnzKeQ}]}], last commit [CommitPoint{segment[segments_4], userData[{history_uuid=FAQMiebKTBiYHhXci0HgUA, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=ZBSi3AaQQKScPFa8XIreQw, translog_generation=5, translog_uuid=3Zfcvlm2QjmmJBIeNnzKeQ}]}]
[2020-08-18T10:52:54,763][DEBUG][o.e.c.s.ClusterApplierService] [machine] processing [Publication{term=21, version=482}]: execute
[2020-08-18T10:52:54,764][DEBUG][o.e.c.s.ClusterApplierService] [machine] cluster state updated, version [482], source [Publication{term=21, version=482}]
[2020-08-18T10:52:54,764][DEBUG][o.e.c.NodeConnectionsService] [machine] connected to {machine}{PNjawAAZRj-olrAsLoq8TQ}{0VAaApceT7ylZ8IhxY-Bug}{10.0.2.191}{10.0.2.191:9300}{dim}
[2020-08-18T10:52:54,764][DEBUG][o.e.c.s.ClusterApplierService] [machine] apply cluster state with version 482
[2020-08-18T10:52:54,767][DEBUG][o.e.i.s.IndexShard ] [machine] [.tasks][0] turn off the translog retention for the replication group [.tasks][0] as it starts using retention leases exclusively in peer recoveries
[2020-08-18T10:52:54,768][DEBUG][o.e.c.s.ClusterApplierService] [machine] set locally applied cluster state to version 482
[2020-08-18T10:52:54,769][DEBUG][o.e.c.s.ClusterApplierService] [machine] processing [Publication{term=21, version=482}]: took [0s] done applying updated cluster state (version: 482, uuid: kPQDh3sIRgOcMbRvFdobqQ)
[2020-08-18T10:52:54,769][DEBUG][o.e.c.c.C.CoordinatorPublication] [machine] publication ended successfully: Publication{term=21, version=482}
[2020-08-18T10:52:54,769][DEBUG][o.e.c.s.MasterService ] [machine] took [0s] to notify listeners on successful publication of cluster state (version: 482, uuid: kPQDh3sIRgOcMbRvFdobqQ) for [cluster_reroute(async_shard_fetch)]
[2020-08-18T10:52:54,775][DEBUG][o.e.i.e.Engine ] [machine] [.kibana_1298139586_usuarioteste][0] Safe commit [CommitPoint{segment[segments_4], userData[{history_uuid=YBSLGiwlQeiK3RocGmqtwQ, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=Tmc9nBuJRaOExp3UzLc3Mg, translog_generation=3, translog_uuid=AJle9vsoRZ-HAKiYeCHd8g}]}], last commit [CommitPoint{segment[segments_4], userData[{history_uuid=YBSLGiwlQeiK3RocGmqtwQ, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=Tmc9nBuJRaOExp3UzLc3Mg, translog_generation=3, translog_uuid=AJle9vsoRZ-HAKiYeCHd8g}]}]
[2020-08-18T10:52:54,792][DEBUG][o.e.i.e.Engine ] [machine] [.tasks][0] Safe commit [CommitPoint{segment[segments_a], userData[{history_uuid=FlCD17y_Qeay5P2wxsfYOA, local_checkpoint=3, max_seq_no=3, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=4, sync_id=589jUf1WRcuWeIxbW4Ox7Q, translog_generation=23, translog_uuid=65fd-Z8BT0qPvJa45rf7Tw}]}], last commit [CommitPoint{segment[segments_a], userData[{history_uuid=FlCD17y_Qeay5P2wxsfYOA, local_checkpoint=3, max_seq_no=3, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=4, sync_id=589jUf1WRcuWeIxbW4Ox7Q, translog_generation=23, translog_uuid=65fd-Z8BT0qPvJa45rf7Tw}]}]
[2020-08-18T10:52:54,788][WARN ][o.e.i.s.IndexShard ] [machine] [.tasks][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-08-18T10:52:54,824][DEBUG][o.e.i.s.IndexShard ] [machine] [security-auditlog-2020.08.17][0] state: [RECOVERING]->[POST_RECOVERY], reason [post recovery from shard_store]
[2020-08-18T10:52:54,824][DEBUG][o.e.i.s.IndexShard ] [machine] [security-auditlog-2020.08.17][0] recovery completed from [shard_store], took [287ms]
[2020-08-18T10:52:54,825][DEBUG][o.e.c.a.s.ShardStateAction] [machine] sending [internal:cluster/shard/started] to [PNjawAAZRj-olrAsLoq8TQ] for shard entry [StartedShardEntry{shardId [[security-auditlog-2020.08.17][0]], allocationId [W-rGbGb5TMigtXbkl3t60g], primary term [3], message [after existing store recovery; bootstrap_history_uuid=false]}]
[2020-08-18T10:52:54,826][DEBUG][o.e.c.a.s.ShardStateAction] [machine] [security-auditlog-2020.08.17][0] received shard started for [StartedShardEntry{shardId [[security-auditlog-2020.08.17][0]], allocationId [W-rGbGb5TMigtXbkl3t60g], primary term [3], message [after existing store recovery; bootstrap_history_uuid=false]}]
[2020-08-18T10:52:54,827][DEBUG][o.e.c.s.MasterService ] [machine] executing cluster state update for [shard-started StartedShardEntry{shardId [[security-auditlog-2020.08.17][0]], allocationId [W-rGbGb5TMigtXbkl3t60g], primary term [3], message [after existing store recovery; bootstrap_history_uuid=false]}[StartedShardEntry{shardId [[security-auditlog-2020.08.17][0]], allocationId [W-rGbGb5TMigtXbkl3t60g], primary term [3], message [after existing store recovery; bootstrap_history_uuid=false]}]]

Thanks in advance!

@dinusX

dinusX commented Aug 18, 2020

From the above logs, it seems that you have an index ".tasks" that is failing during ES process boot-up.

@gferrette
Author

Hello @dinusX!

Thanks for replying.

This error occurs on several indices, not only on .tasks. The .tasks index was in green state, but I removed it anyway since ES recreates it when needed. The error continues on other indices, as below:

[2020-08-18T17:53:34,022][DEBUG][o.e.c.s.ClusterApplierService] [machine] processing [Publication{term=24, version=635}]: execute
[2020-08-18T17:53:34,022][DEBUG][o.e.c.s.ClusterApplierService] [machine] cluster state updated, version [635], source [Publication{term=24, version=635}]
[2020-08-18T17:53:34,022][DEBUG][o.e.c.NodeConnectionsService] [machine] connected to {machine}{PNjawAAZRj-olrAsLoq8TQ}{jOVOPUGCT0mHN2QRYj4Y1Q}{10.0.2.191}{10.0.2.191:9300}{dim}
[2020-08-18T17:53:34,022][DEBUG][o.e.c.s.ClusterApplierService] [machine] applying settings from cluster state with version 635
[2020-08-18T17:53:34,022][DEBUG][o.e.c.s.ClusterApplierService] [machine] apply cluster state with version 635
[2020-08-18T17:53:34,023][DEBUG][o.e.i.s.IndexShard ] [machine] [security-auditlog-2020.06.15][0] state: [POST_RECOVERY]->[STARTED], reason [global state is [STARTED]]
[2020-08-18T17:53:34,023][DEBUG][o.e.i.s.IndexShard ] [machine] [.kibana_-532334581_publicoredmine_2][0] turn off the translog retention for the replication group [.kibana_-532334581_publicoredmine_2][0] as it starts using retention leases exclusively in peer recoveries
[2020-08-18T17:53:34,023][DEBUG][o.e.c.a.s.ShardStateAction] [machine] sending [internal:cluster/shard/started] to [PNjawAAZRj-olrAsLoq8TQ] for shard entry [StartedShardEntry{shardId [[.kibana_-532334581_publicoredmine_2][0]], allocationId [2IgravwzSFCNRA-BY4eJBw], primary term [22], message [master {machine}{PNjawAAZRj-olrAsLoq8TQ}{jOVOPUGCT0mHN2QRYj4Y1Q}{10.0.2.191}{10.0.2.191:9300}{dim} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}]
[2020-08-18T17:53:34,024][DEBUG][o.e.c.a.s.ShardStateAction] [machine] [.kibana_-532334581_publicoredmine_2][0] received shard started for [StartedShardEntry{shardId [[.kibana_-532334581_publicoredmine_2][0]], allocationId [2IgravwzSFCNRA-BY4eJBw], primary term [22], message [master {machine}{PNjawAAZRj-olrAsLoq8TQ}{jOVOPUGCT0mHN2QRYj4Y1Q}{10.0.2.191}{10.0.2.191:9300}{dim} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}]
[2020-08-18T17:53:34,024][DEBUG][o.e.i.s.IndexShard ] [machine] [.kibana_-532334581_publicoredmine_1][0] turn off the translog retention for the replication group [.kibana_-532334581_publicoredmine_1][0] as it starts using retention leases exclusively in peer recoveries
[2020-08-18T17:53:34,024][DEBUG][o.e.c.a.s.ShardStateAction] [machine] sending [internal:cluster/shard/started] to [PNjawAAZRj-olrAsLoq8TQ] for shard entry [StartedShardEntry{shardId [[.kibana_-532334581_publicoredmine_1][0]], allocationId [-EIVcT1GQEy3cnSYVpCiGA], primary term [22], message [master {machine}{PNjawAAZRj-olrAsLoq8TQ}{jOVOPUGCT0mHN2QRYj4Y1Q}{10.0.2.191}{10.0.2.191:9300}{dim} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}]
[2020-08-18T17:53:34,024][DEBUG][o.e.c.a.s.ShardStateAction] [machine] [.kibana_-532334581_publicoredmine_1][0] received shard started for [StartedShardEntry{shardId [[.kibana_-532334581_publicoredmine_1][0]], allocationId [-EIVcT1GQEy3cnSYVpCiGA], primary term [22], message [master {machine}{PNjawAAZRj-olrAsLoq8TQ}{jOVOPUGCT0mHN2QRYj4Y1Q}{10.0.2.191}{10.0.2.191:9300}{dim} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}]
[2020-08-18T17:53:34,025][DEBUG][o.e.i.s.IndexShard ] [machine] [.kibana_92668751_admin_1][0] turn off the translog retention for the replication group [.kibana_92668751_admin_1][0] as it starts using retention leases exclusively in peer recoveries
[2020-08-18T17:53:34,025][DEBUG][o.e.c.s.ClusterApplierService] [machine] set locally applied cluster state to version 635
[2020-08-18T17:53:34,029][WARN ][o.e.i.s.IndexShard ] [machine] [.kibana_92668751_admin_1][0] failed to turn off translog retention
org.apache.lucene.store.AlreadyClosedException: engine is closed
at org.elasticsearch.index.shard.IndexShard.getEngine(IndexShard.java:2528) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard.trimTranslog(IndexShard.java:1106) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.index.shard.IndexShard$3.doRun(IndexShard.java:1944) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) [elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
[2020-08-18T17:53:34,033][DEBUG][o.e.i.e.Engine ] [machine] [.kibana_92668751_admin_1][0] Safe commit [CommitPoint{segment[segments_4], userData[{history_uuid=2dgYcvEuRpSjF2naPHfXyA, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=uSgr5MU4QH6yju-GsoF9zA, translog_generation=3, translog_uuid=kyHNsR0nQIGnBo3_O1fAYA}]}], last commit [CommitPoint{segment[segments_4], userData[{history_uuid=2dgYcvEuRpSjF2naPHfXyA, local_checkpoint=0, max_seq_no=0, max_unsafe_auto_id_timestamp=-1, min_retained_seq_no=1, sync_id=uSgr5MU4QH6yju-GsoF9zA, translog_generation=3, translog_uuid=kyHNsR0nQIGnBo3_O1fAYA}]}]

All the indices where this error occurs are in green state.

@dinusX

dinusX commented Aug 18, 2020

If I'm not mistaken, the following commit should fix your warning messages: elastic/elasticsearch#57063

This was fixed in ES 7.7.1+.

From the description it doesn't seem to be a bug, just an unnecessary warning message.
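Until an upgrade to 7.7.1+ is possible, one workaround (an assumption on my part, not something from the linked fix) is to raise that logger's threshold so the warning is not emitted:

```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.index.shard": "ERROR"
  }
}
```

Note this would also hide other WARN-level messages from the same logger, so it is best removed once the cluster is upgraded.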

@gferrette
Author

Hello @dinusX!

It seems it's only a warning message, according to this thread.

Thanks for your help and for clarifying our questions!
