NullPointerException on snapshot #29052

gkozyryatskyy · 2018-03-14T09:26:05Z

Elasticsearch version: 5.6.8 (Docker image docker.elastic.co/elasticsearch/elasticsearch:5.6.8)

Plugins installed:

ingest-geoip
ingest-user-agent
repository-s3
x-pack

JVM version (java -version):

openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux b5f28c65ef45 4.9.60-linuxkit-aufs #1 SMP Mon Nov 6 16:00:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I have an S3 snapshot repository:

curl localhost:9200/_snapshot/*?pretty
{
  "weavo-backup" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "...",
      "region" : "us-east-1",
      "base_path" : "dev/weavo-backup"
    }
  }
}

When I try to create a snapshot, I get a java.lang.NullPointerException:

curl -XPUT 'localhost:9200/_snapshot/weavo-backup/snapshot_1?wait_for_completion=true'
{
  "error":{
    "root_cause":[
      {
        "type":"null_pointer_exception",
        "reason":null
      }
    ],
    "type":"null_pointer_exception",
    "reason":null
  },
  "status":500
}

When I try to delete a snapshot, I also get a java.lang.NullPointerException:

curl -XDELETE localhost:9200/_snapshot/weavo-backup/curator-20180224000000?pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    ],
    "type" : "null_pointer_exception",
    "reason" : null
  },
  "status" : 500
}

Provide logs (if relevant):
Error log for the snapshot request:

[2018-03-14T08:17:29,967][INFO ][o.e.s.SnapshotShardsService] [iatu_s1] snapshot [weavo-backup:snapshot_1/ueYAa0XASNu1RK_2i2rTZQ] is done
[2018-03-14T08:17:30,086][WARN ][o.e.s.SnapshotsService   ] [iatu_s1] [weavo-backup:snapshot_1/ueYAa0XASNu1RK_2i2rTZQ] failed to finalize snapshot
java.lang.NullPointerException: null
	at org.elasticsearch.repositories.RepositoryData.snapshotsToXContent(RepositoryData.java:357) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeIndexGen(BlobStoreRepository.java:838) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.finalizeSnapshot(BlobStoreRepository.java:568) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:978) [elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:575) [elasticsearch-5.6.8.jar:5.6.8]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2018-03-14T08:17:30,088][WARN ][r.suppressed             ] path: /_snapshot/weavo-backup/snapshot_1, params: {repository=weavo-backup, wait_for_completion=true, snapshot=snapshot_1}
java.lang.NullPointerException: null
	at org.elasticsearch.repositories.RepositoryData.snapshotsToXContent(RepositoryData.java:357) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeIndexGen(BlobStoreRepository.java:838) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.finalizeSnapshot(BlobStoreRepository.java:568) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.snapshots.SnapshotsService$5.run(SnapshotsService.java:978) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:575) ~[elasticsearch-5.6.8.jar:5.6.8]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

Error log for the delete request:

[2018-03-14T01:00:02,957][WARN ][r.suppressed             ] path: /_snapshot/weavo-backup/curator-20180224000000, params: {repository=weavo-backup, snapshot=curator-20180224000000}
java.lang.NullPointerException: null
	at org.elasticsearch.repositories.RepositoryData.snapshotsToXContent(RepositoryData.java:357) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.writeIndexGen(BlobStoreRepository.java:838) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.deleteSnapshot(BlobStoreRepository.java:445) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.snapshots.SnapshotsService.lambda$deleteSnapshotFromRepository$6(SnapshotsService.java:1309) ~[elasticsearch-5.6.8.jar:5.6.8]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:575) ~[elasticsearch-5.6.8.jar:5.6.8]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]

elasticmachine · 2018-03-14T09:39:31Z

Pinging @elastic/es-distributed

bleskes · 2018-03-14T13:57:55Z

@tlrx have you seen this before?

gkozyryatskyy · 2018-03-14T14:49:44Z

If it helps, all the snapshots I'm trying to remove were taken with Elasticsearch 5.4.0.

Also, I have 4 different environments with the same Elasticsearch version and configuration, but I have experienced this problem in only one of them. So I think the problem is in a specific snapshot.

imotov · 2018-03-14T16:08:42Z

It reminds me of #26127, but that was fixed in 5.6.0. In #26127 it was failing because the failure reason was null, whereas here it looks like it fails because snapshotId is null. So it is probably a different issue.

tlrx · 2018-03-14T16:17:39Z

@gkozyryatskyy Is it possible that, at some point, two environments accessed the same repository to write a snapshot?

gkozyryatskyy · 2018-03-14T16:33:01Z

@tlrx Theoretically, yes. This is a dev environment =( Someone could have brought up a few environments with the same snapshot configuration.

tlrx · 2018-03-15T08:22:43Z

@gkozyryatskyy I've seen a similar issue when two different clusters access the same S3 repository (more precisely, the same S3 bucket): one environment is creating a snapshot while another environment is deleting a snapshot. This is quite a rare situation, as the creation and deletion must be executed at exactly the same time, but it can still happen, especially when a lot of indices/documents are involved in the snapshot.

gkozyryatskyy · 2018-03-15T16:25:28Z

@tlrx
Thank you a lot for your responses!

Is there any way to tell whether this is my case?
How can I do a "hotfix" for this? Should I delete all the snapshots and create a new one? Or can I delete a specific one? Or can I clean something up in the S3 bucket?
In any case, I think it would be nice to make a fix in the code so that this does not cause a NullPointerException. But that is up to you. =)

tlrx · 2018-03-15T19:57:41Z

Is there any way to tell whether this is my case?

That's not easy. Do you think your case is similar to the situation I explained in #29052 (comment)? If so, then we have an explanation.

How can I do a "hotfix" for this? Should I delete all the snapshots and create a new one? Or can I delete a specific one? Or can I clean something up in the S3 bucket?

I think that the best fix would be to create a new repository, in a different S3 bucket (or a sub path of the same bucket, see base_path option), and have a single cluster that can write snapshots to it and the other clusters have this repository with the read_only option.
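As a rough illustration of that layout, the sketch below builds the JSON bodies that could be sent with `PUT /_snapshot/<repository>` on the writer cluster and on the read-only clusters. The bucket name, the new base path, and the helper `s3_repository_body` are hypothetical. It also assumes the repository setting is spelled `readonly`, as in the Elasticsearch repository settings documentation, rather than `read_only`; please check this against the documentation for your version.

```python
import json


def s3_repository_body(bucket, base_path, region=None, readonly=False):
    """Build the JSON body for registering an S3 snapshot repository.

    Hypothetical helper: it only assembles the settings dictionary and
    sends nothing to a cluster.
    """
    settings = {"bucket": bucket, "base_path": base_path}
    if region is not None:
        settings["region"] = region
    if readonly:
        # Assumed setting name; verify it for your Elasticsearch version.
        settings["readonly"] = True
    return {"type": "s3", "settings": settings}


# Placeholder values: one cluster writes, every other cluster only reads.
writer_body = s3_repository_body("my-bucket", "dev/weavo-backup-v2", region="us-east-1")
reader_body = s3_repository_body("my-bucket", "dev/weavo-backup-v2", region="us-east-1", readonly=True)

print(json.dumps(writer_body, indent=2))
print(json.dumps(reader_body, indent=2))
```

Each printed body could then be passed to curl with `-XPUT` and `-d` against the respective cluster, so that only one cluster can ever modify the repository index file.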

In any case, I think it would be nice to make a fix in the code so that this does not cause a NullPointerException. But that is up to you. =)

I'd love to have a fix for S3 and concurrent access :) But S3 is not a filesystem: it's a replicated, distributed, consistent-after-write blob storage system. We can't really implement locks or atomic writes with it, so we cannot offer any strong guarantees except that an uploaded file will appear (after some undefined time) in the S3 bucket. If you need strong guarantees, then you should consider using a real filesystem.

gkozyryatskyy · 2018-03-16T09:37:17Z

@tlrx
Thank you a lot for your responses!

If this info helps, I'm able to delete snapshots from and snapshot to this repo with Elasticsearch 5.4.0. So it is not just an S3 problem. Theoretically, the snapshot logic could be reverted to the 5.4.0 version and it would work.

So for now, I'm thinking of deleting everything created with the older database version and starting to create new snapshots with the new version. Changing the bucket or base path is not an option right now, because it is bound to the environment name, and it would mean renaming/changing the environment just because of the database snapshots.

gkozyryatskyy · 2018-03-16T10:52:36Z

@tlrx
OK, here is what I did:

I created a temporary FS repository and took a backup there.
I manually deleted everything under the backup path prefix from the backup bucket.
I ran a new, first snapshot in the same repo/bucket/path prefix, and it succeeded.

A NullPointerException is thrown when trying to create or delete a snapshot in a repository that has been written to by an older Elasticsearch version after being written to by a newer Elasticsearch version. This is because the way snapshots are formatted in the repository snapshots index file changed in #24477. This commit changes the parsing of the repository index file so that it now detects a corrupted index file and fails the snapshot operation early. Closes #29052
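To illustrate the "detect corruption and fail early" idea, here is a minimal sketch. It is not the actual Elasticsearch change. The index file layout is an assumption: a top-level `snapshots` list of `{name, uuid}` objects, plus an `indices` map whose entries reference snapshots either by bare UUID string or by a `{name, uuid}` object. The function name and sample data are hypothetical.

```python
def snapshot_uuid(ref):
    """Return the UUID from a snapshot reference in either assumed layout."""
    if isinstance(ref, str):
        return ref
    if isinstance(ref, dict):
        return ref.get("uuid")
    return None


def validate_repository_index(index):
    """Raise ValueError if the parsed index has malformed or dangling snapshot entries."""
    known = set()
    for entry in index.get("snapshots", []):
        if not isinstance(entry, dict) or not entry.get("name") or not entry.get("uuid"):
            raise ValueError(f"malformed snapshot entry: {entry!r}")
        known.add(entry["uuid"])
    for index_name, meta in index.get("indices", {}).items():
        for ref in meta.get("snapshots", []):
            uuid = snapshot_uuid(ref)
            if uuid not in known:
                # Fail with a descriptive error instead of carrying a null
                # snapshot id forward until it is serialized again.
                raise ValueError(f"index {index_name!r} references unknown snapshot {uuid!r}")


good_index = {
    "snapshots": [{"name": "snapshot_1", "uuid": "ueYAa0XASNu1RK_2i2rTZQ"}],
    "indices": {"logs": {"id": "abc", "snapshots": ["ueYAa0XASNu1RK_2i2rTZQ"]}},
}
bad_index = {
    "snapshots": [{"name": "snapshot_1", "uuid": "ueYAa0XASNu1RK_2i2rTZQ"}],
    "indices": {"logs": {"id": "abc", "snapshots": ["missing-uuid"]}},
}

validate_repository_index(good_index)  # passes silently
try:
    validate_repository_index(bad_index)
except ValueError as err:
    print(f"corrupted repository index: {err}")
```

Checking at parse time means the operation is rejected with a message that points at the repository contents, rather than failing later with a NullPointerException whose reason is null.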

gkozyryatskyy changed the title ~~NullPointerException on snaapshot~~ NullPointerException on snapshot Mar 14, 2018

colings86 added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Mar 14, 2018

bleskes assigned tlrx Mar 14, 2018

tlrx mentioned this issue Apr 24, 2018

Creating a snapshot fails at the very last stage with NullPointerException #29649

Closed

tlrx closed this as completed in 63148dd Apr 27, 2018
