Investigate and Fix Serialization Issue with IngestStats #52339
Pinging @elastic/es-core-features (:Core/Features/Ingest)
The bug here is in elasticsearch/server/src/main/java/org/elasticsearch/ingest/Processor.java, lines 49 to 56 (at de6f132).
This explains the over-decrementing, and thus the negative values.
To be clear about what I mean here, line 51 succeeds,
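To make the over-decrementing concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual Processor/IngestMetric code): if a failing execution is decremented both in the exception branch and again in the failure handler, a single failure pushes the in-flight counter below zero.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the double-decrement pattern described above.
public class DoubleDecrementSketch {

    // Stand-in for the "currently executing" counter (IngestMetric#ingestCurrent).
    static final AtomicLong ingestCurrent = new AtomicLong();

    static void executeProcessor(Runnable processor) {
        ingestCurrent.incrementAndGet();          // one increment per execution
        try {
            processor.run();
            ingestCurrent.decrementAndGet();      // normal completion path
        } catch (RuntimeException e) {
            ingestCurrent.decrementAndGet();      // failure path decrements...
            handleFailure(e);
        }
    }

    static void handleFailure(RuntimeException e) {
        // ...but the failure handler also "closes" the metric, so the same
        // execution ends up decremented twice.
        ingestCurrent.decrementAndGet();
    }

    public static void main(String[] args) {
        executeProcessor(() -> { throw new RuntimeException("processor failed"); });
        System.out.println("ingestCurrent = " + ingestCurrent.get()); // prints -1
    }
}
```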
+1 - I am also seeing this after an upgrade from 7.5.2 to 7.6.0. Although all my data nodes do show up on Stack Monitoring, there are zero metrics for any of them, and the node count does not include the data nodes. I was directed to this issue at the AMA booth of Elastic{ON} Anaheim. Please let me know if there is any information I can provide from my cluster to help.
Thank you, @MakoWish. We're working on a fix for it and we'll ping you if additional information would be helpful.
Reopening this issue, as we still see the stats failures from a negative processor count in 7.6.1, where PR #52543 was merged.
See also the additional information that @srikwit provided on another instance of this bug here: #62087 (comment)
Another case happening on 7.10.
There was an obvious race here where the async processor and the final pipeline can run concurrently (or the final pipeline runs multiple times from the while loop). Relates to #52339 (fixes one failure scenario here, but since the failure also occurred in 7.10.x, not all of them).
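One way to picture that race and the "fixes one failure scenario" part, as a hypothetical sketch rather than the actual IngestService change: if the completion step can be reached both by the async processor's callback and by the synchronous driving loop, it needs a run-once guard so the in-flight counter is only decremented once per document.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of guarding a completion step that two code paths can reach.
public class RunOnceSketch {

    static final AtomicLong ingestCurrent = new AtomicLong();

    public static void main(String[] args) {
        ingestCurrent.incrementAndGet();

        // Guard so the finish step runs exactly once, no matter whether the
        // async callback or the synchronous loop reaches it first.
        AtomicBoolean finished = new AtomicBoolean(false);
        Runnable finishOnce = () -> {
            if (finished.compareAndSet(false, true)) {
                ingestCurrent.decrementAndGet();   // decrement exactly once
            }
        };

        // Async processor completing on another thread.
        CompletableFuture<Void> async = CompletableFuture.runAsync(finishOnce);
        // The driving loop also believes it is responsible for finishing.
        finishOnce.run();

        async.join();
        System.out.println("ingestCurrent = " + ingestCurrent.get()); // 0, not -1
    }
}
```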
This should be fixed by #69818 and has not been reported since.
We have a pretty detailed report about ingest stats not serializing properly in https://discuss.elastic.co/t/netty4tcpchannel-negative-longs-unsupported-repeated-in-logs/219235/6
What it comes down to is that the number of currently executing processors somehow ends up as a negative value and doesn't serialise because of it (and it shouldn't be negative, obviously):
This seems to be caused by a pipeline throwing:
I didn't investigate the deeper cause here, but I'm assuming that on error there are too many `dec` calls to org.elasticsearch.ingest.IngestMetric#ingestCurrent by some path.
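The "Negative longs unsupported" error in the linked report is what you would expect from a VLong-style variable-length encoding, which only handles non-negative values. The following is an illustrative sketch, not Elasticsearch's actual StreamOutput code, showing why the stats response fails once the counter goes negative.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: a base-128 varint ("VLong") encoder that, like most
// such encoders, rejects negative input up front. A negative "current"
// counter therefore blows up at serialization time.
public class VLongSketch {

    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalStateException("Negative longs unsupported: " + value);
        }
        // 7 bits per byte, high bit set means "more bytes follow".
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVLong(out, 42);   // fine: encodes in one byte
        writeVLong(out, -1);   // throws: the failure that surfaces in the stats response
    }
}
```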