NPE in DataTiersUsageTransportAction #87001

Closed
DaveCTurner opened this issue May 20, 2022 · 1 comment · Fixed by #96015
Labels
>bug, Team:Data Management (meta label for data/management team)

Comments

@DaveCTurner
Contributor

Elasticsearch Version

8.2.0 (likely others)

Installed Plugins

None

Java Version

bundled

OS Version

Cloud

Problem Description

I saw an NPE in `DataTiersUsageTransportAction`. I believe it happened because the shards whose stats were collected in the node stats differed from those in the cluster state, which can happen when a shard moves concurrently.
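
To illustrate the suspected failure mode, here is a minimal sketch. The types below (`ShardId`, `ShardRouting`, `RoutingNode`, `ShardState`) are simplified, hypothetical stand-ins and not the real Elasticsearch classes; the only behaviour assumed is the one shown in the log below, namely that `getByShardId` returns null for a shard the node no longer holds.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleShardLookupSketch {
    // Hypothetical stand-ins for illustration only, not the real Elasticsearch types.
    enum ShardState { STARTED, RELOCATING }
    record ShardId(String index, int id) {}
    record ShardRouting(ShardId shardId, ShardState state) {}

    /** Stand-in for a routing node: the lookup returns null for shards it does not hold. */
    static final class RoutingNode {
        private final Map<ShardId, ShardRouting> shards = new HashMap<>();

        void add(ShardRouting routing) {
            shards.put(routing.shardId(), routing);
        }

        ShardRouting getByShardId(ShardId id) {
            return shards.get(id);
        }
    }

    /** Unguarded check: dereferences the lookup result without testing it for null. */
    static boolean isStartedUnguarded(RoutingNode node, ShardId id) {
        return node.getByShardId(id).state() == ShardState.STARTED;
    }

    public static void main(String[] args) {
        RoutingNode node = new RoutingNode();
        ShardId present = new ShardId("logs", 0);
        // Reported by the (older) node stats, but already moved away in the cluster state.
        ShardId moved = new ShardId("logs", 1);
        node.add(new ShardRouting(present, ShardState.STARTED));

        System.out.println(isStartedUnguarded(node, present));
        try {
            isStartedUnguarded(node, moved);
        } catch (NullPointerException e) {
            System.out.println("NPE for shard missing from routing");
        }
    }
}
```

If the two snapshots are taken at slightly different times, any shard present in one but not the other would hit the second path.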

Steps to Reproduce

Not known

Logs (if relevant)

collector [cluster_stats] failed to collect data
java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.routing.ShardRouting.state()" because the return value of "org.elasticsearch.cluster.routing.RoutingNode.getByShardId(org.elasticsearch.index.shard.ShardId)" is null
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.aggregateDataTierShardStats(DataTiersUsageTransportAction.java:210) ~[?:?]
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.classifyIndexAndCollectStats(DataTiersUsageTransportAction.java:194) ~[?:?]
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.lambda$aggregateDataTierIndexStats$4(DataTiersUsageTransportAction.java:175) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
	at java.util.stream.DistinctOps$1$2.accept(DistinctOps.java:174) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.Iterator.forEachRemaining(Iterator.java:133) ~[?:?]
	at java.util.Collections$UnmodifiableCollection$1.forEachRemaining(Collections.java:1061) ~[?:?]
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[?:?]
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.aggregateDataTierIndexStats(DataTiersUsageTransportAction.java:175) ~[?:?]
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.calculateStats(DataTiersUsageTransportAction.java:140) ~[?:?]
	at org.elasticsearch.xpack.core.DataTiersUsageTransportAction.lambda$masterOperation$0(DataTiersUsageTransportAction.java:91) ~[?:?]
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:162) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.client.internal.node.NodeClient$ActionResponseTaskListener.onResponse(NodeClient.java:175) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:176) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:170) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:163) ~[?:?]
	at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:245) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:473) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction.newResponseAsync(TransportNodesAction.java:181) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction.newResponse(TransportNodesAction.java:156) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.lambda$finishHim$0(TransportNodesAction.java:303) ~[elasticsearch-8.2.0.jar:8.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:714) [elasticsearch-8.2.0.jar:8.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue May 11, 2023
With this commit we check whether there is an available shard routing
before we test whether it has been started. This makes
`DataTiersUsageTransportAction` more resilient to potential temporary
inconsistencies between cluster state and node stats due to concurrent
shard movement.

Closes elastic#87001
Closes elastic#96000
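
The guard described in the commit message could look roughly like the sketch below. This is not the code from the linked fix: the types, the `countDocsOnStartedShards` helper, and the doc-count field are hypothetical and exist only to show the order of the two checks (routing present first, then started).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GuardedShardStatsSketch {
    // Hypothetical stand-ins for illustration only, not the real Elasticsearch types.
    enum ShardState { STARTED, INITIALIZING }
    record ShardId(String index, int id) {}
    record ShardRouting(ShardId shardId, ShardState state) {}
    record ShardStats(ShardId shardId, long docCount) {}

    /**
     * Sums doc counts for shards that have a routing entry and are started.
     * Stats for shards without a routing entry are skipped instead of failing.
     */
    static long countDocsOnStartedShards(Map<ShardId, ShardRouting> routingByShard,
                                         List<ShardStats> nodeStats) {
        long total = 0;
        for (ShardStats stats : nodeStats) {
            ShardRouting routing = routingByShard.get(stats.shardId());
            // Check that a routing exists before testing whether it has been started.
            if (routing != null && routing.state() == ShardState.STARTED) {
                total += stats.docCount();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ShardId started = new ShardId("logs", 0);
        ShardId initializing = new ShardId("logs", 1);
        ShardId moved = new ShardId("logs", 2); // in node stats, absent from routing

        Map<ShardId, ShardRouting> routing = new HashMap<>();
        routing.put(started, new ShardRouting(started, ShardState.STARTED));
        routing.put(initializing, new ShardRouting(initializing, ShardState.INITIALIZING));

        List<ShardStats> stats = List.of(
            new ShardStats(started, 10),
            new ShardStats(initializing, 5),
            new ShardStats(moved, 7));

        // Only the started shard with a routing entry contributes to the total.
        System.out.println(countDocsOnStartedShards(routing, stats));
    }
}
```

Skipping the unmatched shard trades a slightly incomplete stats sample for a collection run that does not fail, which seems acceptable if the inconsistency is temporary, as the commit message suggests.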
danielmitterdorfer added a commit that referenced this issue May 11, 2023
elasticsearchmachine pushed a commit that referenced this issue May 11, 2023