-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store XContentType of source in translog and index #22863
Comments
Do you mean |
I was not sure either, but as far as I can see, and I hope I am wrong, we simply store the bytes array of the incoming document, whatever content type that is. the conversion from a type to another happens when returning the source. |
Correct
I think that's tricky since the idea of
Maybe we intended it to always be |
The debate about whether |
we discussed this on fix it friday and as before this boils down to knowing how much performance boost (if at all) alternative/binary JSON formats (most notably Smile) gives us. This will help motivate keeping the current accept all methodology or give us the data we need to remove it and settle on, probably, textual JSON. I will talk to @danielmitterdorfer to see when we can make this happen. |
Double checking (as remove was mentioned above) that this issue is not about removing support for any currently supported format. I'd expect the outcome of this issue to be about whether it is necessary to store in the translog and in the lucene index exactly the same bytes that were received, in any supported content type, rather than settling on a format and handling conversions internally. |
@javanna Indeed this is issue is about explicitly storing the content type of a document alongside wherever we store the original documents bytes (translog and lucene). This is needed if we are going to remove the automatic content detection but are going to keep supporting multiple formats. However, if end up removing supporting multiple formats (or converting convenience formats like YAML to JSON) we don't need this explicit information. I hope this clarifies why these are tied together. |
Things are clear on my end, yet I think I should clarify that we should not discuss again if we want to remove support for multiple formats at REST, cause that won't happen anytime soon, see #22811 . The two options here are 1) we add the type to translog and index 2) we settle on one internal format and internally convert all of the others (for bw comp we will still need some auto-detection). |
Closing for now as we are in favor to remove support for CBOR, Smile and Yaml. We will reconsider this in the future. |
To be clear, this was closed in favour of #22811. |
Given that we closed #22811 as a won't fix and this was closed in favour of it, I think we should reconsider what we need to do here. The information about the content type is still not stored in the translog and guessed when reading from the translog, using a method (XContentHelper#xContentType(BytesReference)) that we deprecated years ago. Maybe it is just a matter of embracing that we are not going to get rid of automatic type detection in some places, but it would be good then to un-deprecate the needed methods as they are going to stay around for the time being. |
Pinging @elastic/es-distributed (Team:Distributed) |
As part of #22691 (comment), more automated content type detection places were discovered and one of these is in the Translog. When we store an Index operation in the translog, we only store the bytes and do not store the xcontent type so we rely on content type detection for replaying the translog.
We can modify the translog format to store the xcontent type and use that when reading from the translog. However, we then lose the type information when storing data in the index. Should we also store the xcontent type in the index so that we can make use of it when returning responses and avoiding auto detection there as well?
The text was updated successfully, but these errors were encountered: