Node exits with org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler when using HashSet in scripted metric #54708

Closed
lucabelluccini opened this issue Apr 3, 2020 · 4 comments · Fixed by #54769
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache

Comments

@lucabelluccini
Contributor

Elasticsearch version (bin/elasticsearch --version): 7.6.1

Description of the problem including expected versus actual behavior:

I'm trying to run a scripted metric aggregation that uses a HashSet in Painless.
I expect the request to return either a result or an error.

Instead, the node exits.

This seems to happen because Elasticsearch doesn't know how to serialize a HashSet (java.io.IOException: can not write type [class java.util.HashSet]).
The problem affects not only the "reduce" phase (when the final response is serialized) but also the intermediate steps.

Seems related to #54666, feel free to close this.

Steps to reproduce:

  1. Create the sample data:
POST index2/_doc/1
{
  "host": { "name": "host1" },
  "software": "sw1"
}
POST index2/_doc/11
{
  "host": { "name": "host1" },
  "software": "sw2"
}
POST index2/_doc/111
{
  "host": { "name": "host1" },
  "software": "sw3"
}
POST index2/_doc/2
{
  "host": { "name": "host2" },
  "software": "sw2"
}
POST index2/_doc/3
{
  "host": { "name": "host3" },
  "software": "sw3"
}
  2. Run the query:
POST index2/_search
{
  "size": 0, 
  "aggs": {
    "softwares": {
      "scripted_metric": {
        "init_script": "state.sw = new HashSet();",
        "map_script": "state.sw.add(doc['software.keyword'])",
        "combine_script": "def merged = new HashSet(); for (s in state.sw) { merged.addAll(s) } return merged",
        "reduce_script": "def merged = new HashSet(); for (s in states) { merged.addAll(s) } return merged.stream().collect(Collectors.toList());"
      }
    }
  }
}

Provide logs (if relevant):

org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler
[instance-0000000002] fatal error in thread [elasticsearch[instance-0000000002][search][T#3]], exiting
java.lang.AssertionError: Could not serialize response
	at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$19(IndicesService.java:1340) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$20(IndicesService.java:1392) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:174) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:157) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:123) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1398) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1332) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:336) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:358) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:343) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:146) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: java.io.IOException: can not write type [class java.util.HashSet]
	at org.elasticsearch.common.io.stream.StreamOutput.writeGenericValue(StreamOutput.java:801) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.aggregations.metrics.InternalScriptedMetric.doWriteTo(InternalScriptedMetric.java:67) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.aggregations.InternalAggregation.writeTo(InternalAggregation.java:119) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteable(StreamOutput.java:1025) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.common.io.stream.StreamOutput.writeNamedWriteableList(StreamOutput.java:1134) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.aggregations.InternalAggregations.writeTo(InternalAggregations.java:87) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.search.query.QuerySearchResult.writeToNoId(QuerySearchResult.java:328) ~[elasticsearch-7.6.1.jar:7.6.1]
	at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$19(IndicesService.java:1337) ~[elasticsearch-7.6.1.jar:7.6.1]
	... 21 more
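
The `Caused by` frame suggests a writer that dispatches on the runtime type of the value and rejects anything it has no branch for. A minimal sketch of that failure mode follows. `GenericWriterSketch`, its tag values and its `tryWrite` helper are hypothetical names made up for illustration; this is not Elasticsearch's `StreamOutput`, only a model of the behaviour the trace shows.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical, simplified stand-in for a type-dispatching stream writer.
// It is NOT Elasticsearch's StreamOutput; only the failure mode is modelled.
final class GenericWriterSketch {
    private final DataOutputStream out;

    GenericWriterSketch(DataOutputStream out) {
        this.out = out;
    }

    // Writes a one-byte type tag followed by the value; unknown types are rejected.
    void writeGenericValue(Object value) throws IOException {
        if (value == null) {
            out.writeByte(-1);
        } else if (value instanceof String) {
            out.writeByte(0);
            out.writeUTF((String) value);
        } else if (value instanceof Long) {
            out.writeByte(1);
            out.writeLong((Long) value);
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            out.writeByte(2);
            out.writeInt(list.size());
            for (Object element : list) {
                writeGenericValue(element);
            }
        } else {
            // A HashSet lands here because no branch above knows about sets.
            throw new IOException("can not write type [" + value.getClass() + "]");
        }
    }

    // Returns the error message produced while writing, or null on success.
    static String tryWrite(Object value) {
        GenericWriterSketch writer =
            new GenericWriterSketch(new DataOutputStream(new ByteArrayOutputStream()));
        try {
            writer.writeGenericValue(value);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "null": a list of strings is supported
        System.out.println(tryWrite(Arrays.asList("sw1", "sw2")));
        // prints "can not write type [class java.util.HashSet]"
        System.out.println(tryWrite(new HashSet<>(Arrays.asList("sw1"))));
    }
}
```

In the sketch the exception is caught and reported. In the trace above it surfaces as an AssertionError on a search thread, which is why the node exits instead of returning an error.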

Workaround:

Use a list and remove the duplicates only in the last step (the reduce script).

POST index2/_search
{
  "size": 0, 
  "aggs": {
    "softwares": {
      "scripted_metric": {
        "init_script": "state.sw = [];",
        "map_script": "state.sw.add(doc['software.keyword'])",
        "combine_script": "def merged = []; for (s in state.sw) { merged.addAll(s) } return merged",
        "reduce_script": "def merged = new HashSet(); for (s in states) { merged.addAll(s) } return merged.asList()"
      }
    }
  }
}
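
The same idea expressed in plain Java, as a rough sketch: every intermediate state is a list, and the set only exists inside the final reduce. `ReduceDedupSketch` and its `reduce` method are hypothetical names. The sketch uses a `LinkedHashSet` rather than the `HashSet` in the script above, purely so that the output order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration of the workaround: lists travel between phases,
// and duplicates are removed only in the final reduce step.
final class ReduceDedupSketch {

    // Mirrors reduce_script: `states` holds one list per shard.
    static List<String> reduce(List<List<String>> states) {
        // LinkedHashSet keeps first-seen order (an assumption made for readability).
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (List<String> state : states) {
            merged.addAll(state);
        }
        // Convert back to a list so the result is a type the stream can write.
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<List<String>> states = Arrays.asList(
            Arrays.asList("sw1", "sw2", "sw3"), // shard holding host1's documents
            Arrays.asList("sw2"),               // shard holding host2's document
            Arrays.asList("sw3"));              // shard holding host3's document
        System.out.println(reduce(states)); // prints [sw1, sw2, sw3]
    }
}
```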
@lucabelluccini lucabelluccini added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Apr 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 4, 2020
This commit adds support for reading and writing sets as generic values
in stream input and output.

closes elastic#54708
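
As a rough illustration of what "reading and writing sets as generic values" can look like, the sketch below adds a set branch to a tagged writer and a matching branch to the reader. Everything in it is an assumption for illustration: `SetStreamSketch`, the tag values and the byte layout are invented and do not reflect Elasticsearch's actual wire format or the code in the commit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: tag values and layout are invented for illustration
// and do not match Elasticsearch's wire format.
final class SetStreamSketch {
    private static final byte TYPE_STRING = 0;
    private static final byte TYPE_SET = 1;

    static void writeGenericValue(DataOutputStream out, Object value) throws IOException {
        if (value == null) {
            throw new IOException("null is not supported in this sketch");
        } else if (value instanceof String) {
            out.writeByte(TYPE_STRING);
            out.writeUTF((String) value);
        } else if (value instanceof Set) {
            // The set branch: tag, element count, then each element recursively.
            Set<?> set = (Set<?>) value;
            out.writeByte(TYPE_SET);
            out.writeInt(set.size());
            for (Object element : set) {
                writeGenericValue(out, element);
            }
        } else {
            throw new IOException("can not write type [" + value.getClass() + "]");
        }
    }

    static Object readGenericValue(DataInputStream in) throws IOException {
        byte type = in.readByte();
        switch (type) {
            case TYPE_STRING:
                return in.readUTF();
            case TYPE_SET: {
                // Mirror of the write side: read the count, then each element.
                int size = in.readInt();
                Set<Object> set = new HashSet<>();
                for (int i = 0; i < size; i++) {
                    set.add(readGenericValue(in));
                }
                return set;
            }
            default:
                throw new IOException("can not read type [" + type + "]");
        }
    }

    // Writes the value to an in-memory buffer and reads it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeGenericValue(out, value);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readGenericValue(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> input = new HashSet<>(Arrays.asList("sw1", "sw2", "sw3"));
        System.out.println(input.equals(roundTrip(input))); // prints true
    }
}
```

The reader has to gain a branch at the same time as the writer; otherwise a set written by one node could not be decoded by the node receiving it.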
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 8, 2020
When calling scripts in metric aggregation, the returned metric state is
passed along to the coordinating node to do the final reduce. However,
it is possible the object could contain nested state which is unknown to
StreamOutput/StreamInput. This would then result in the node crashing as
exceptions are not expected in the middle of serialization.

This commit adds a method to StreamOutput that can determine if an
object is writeable by the stream. It uses the same logic as
writeGenericValue, special casing each of the supported collection types
to recursively determine if each contained value is itself writeable.

relates elastic#54708
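
A recursive check of the kind this message describes could look roughly like the sketch below. `WriteableCheckSketch` and `isWriteable` are hypothetical names, and the list of supported types (strings, numbers, booleans, lists, sets, string-keyed maps) is an assumption chosen for the example, not the set of types `StreamOutput` actually accepts.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a "can the stream write this?" check.
// The supported types below are assumed for illustration only.
final class WriteableCheckSketch {

    static boolean isWriteable(Object value) {
        // Leaf values that the imagined stream knows how to write.
        if (value == null
            || value instanceof String
            || value instanceof Number
            || value instanceof Boolean) {
            return true;
        }
        // Collections are writeable only if every element is writeable.
        if (value instanceof List || value instanceof Set) {
            for (Object element : (Collection<?>) value) {
                if (!isWriteable(element)) {
                    return false;
                }
            }
            return true;
        }
        // Maps need string keys and writeable values.
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!(entry.getKey() instanceof String) || !isWriteable(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        // Anything else would fail in the middle of serialization, so reject it up front.
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<>();
        good.put("sw", new HashSet<>(Arrays.asList("sw1", "sw2")));
        Map<String, Object> bad = new HashMap<>();
        bad.put("sw", Arrays.asList("sw1", new Object()));

        System.out.println(isWriteable(good)); // prints true
        System.out.println(isWriteable(bad));  // prints false: nested Object is unknown
    }
}
```

Running such a check before serialization starts lets the caller reject the script result with an ordinary error rather than hitting an exception partway through writing the response.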
@polyfractal
Contributor

polyfractal commented Apr 13, 2020

Side note for posterity: this is related to #54665, and fixed in #54692

Not saying the stream infra shouldn't be tweaked either, just wanted to leave some breadcrumbs to related issues :)

@imotov
Contributor

imotov commented Apr 13, 2020

> and fixed in #54692

It was patched in #54692 in general and #54936 should improve user experience and error messages in aggregations specifically.

@rjernst
Member

rjernst commented Apr 13, 2020

Just for clarity: #54769 makes HashSet serializable. The other issues mentioned stop a node from crashing in an exceptional case (#54692) and give an upfront error to a user who runs a script that returns a non-serializable object (#54936).

rjernst added a commit that referenced this issue Apr 13, 2020
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 13, 2020
rjernst added a commit that referenced this issue Apr 16, 2020
rjernst added a commit that referenced this issue Apr 21, 2020
)
rjernst added a commit to rjernst/elasticsearch that referenced this issue Apr 21, 2020
…stic#54936)
rjernst added a commit that referenced this issue Apr 28, 2020
) (#55561)