Add buffered lookahead for Jackson #489

Merged: 1 commit merged on Jan 18, 2023

@swallez commented on Jan 17, 2023

JSON deserialization needs to look ahead in JSON objects in two situations:

  • for polymorphic types that are distinguished by their `type` property, i.e. a property inside the JSON object itself. An example is the `Property` type, whose variant is defined by its inner `type` property.
  • for some types with variants that can be disambiguated by looking at the structure of the JSON object. The variants do not have the same properties, so inspecting the properties tells which variant has to be deserialized. An example is a multisearch response item, which is an error if the `error` property is present, and a successful result if `took`, `hits` or one of a number of other properties are present.
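To make the two cases concrete, here is a minimal sketch over a hypothetical, simplified event model (`Kind` and `Token` stand in for JSON-P parser events; they are not the JSON-P API). It works on an already tokenized object to show what each case has to find: a top-level discriminator value, ignoring nested properties with the same name, or the first top-level property that identifies a variant. The variant names `"failure"` and `"result"` are illustrative only. With a streaming parser, every token before that point has already been consumed by the time the answer is known, which is why look-ahead needs some form of buffering.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for JSON-P parser events (not the JSON-P API).
enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, KEY, STRING, NUMBER }

record Token(Kind kind, String text) {}

final class VariantDetection {
    private VariantDetection() {}

    // Case 1: the variant is named by an internal property such as "type".
    // Only top-level keys count: a "type" key inside a nested object is skipped.
    static Optional<String> discriminator(List<Token> object, String property) {
        int depth = 0;
        for (int i = 0; i < object.size(); i++) {
            Token t = object.get(i);
            switch (t.kind()) {
                case START_OBJECT, START_ARRAY -> depth++;
                case END_OBJECT, END_ARRAY -> depth--;
                case KEY -> {
                    if (depth == 1 && t.text().equals(property) && i + 1 < object.size()) {
                        return Optional.of(object.get(i + 1).text());
                    }
                }
                default -> { }
            }
        }
        return Optional.empty();
    }

    // Case 2: the variant is inferred from which top-level properties are present.
    // variantByProperty maps a distinguishing property name to a (hypothetical) variant name.
    static Optional<String> variantByStructure(List<Token> object, Map<String, String> variantByProperty) {
        int depth = 0;
        for (Token t : object) {
            switch (t.kind()) {
                case START_OBJECT, START_ARRAY -> depth++;
                case END_OBJECT, END_ARRAY -> depth--;
                case KEY -> {
                    if (depth == 1 && variantByProperty.containsKey(t.text())) {
                        return Optional.of(variantByProperty.get(t.text()));
                    }
                }
                default -> { }
            }
        }
        return Optional.empty();
    }
}
```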

The JSON-P API doesn't offer a way to buffer JSON events and replay them, so the initial approach was to parse the JSON stream into a JSON object (a map of maps), pick the information we need and, since the JSON stream had been consumed, serialize the object back to text to create a new parser. This is suboptimal, but somewhat acceptable for small objects.
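The fallback approach can be sketched with the same kind of hypothetical event model (`Kind`, `Token` and `RoundTripLookAhead` are illustrative names, not the client's code). For brevity it only handles flat objects with string values, and it re-emits events directly from the in-memory map instead of serializing to JSON text and parsing that text again, so the actual roundtrip costs more than shown here. The point it illustrates is that the whole object is consumed and rebuilt, however early the wanted property appears.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified event model (not the JSON-P API).
enum Kind { START_OBJECT, END_OBJECT, KEY, STRING }

record Token(Kind kind, String text) {}

final class RoundTripLookAhead {
    private RoundTripLookAhead() {}

    // Materializes a flat object (string values only, in this sketch) into an ordered map.
    // Consumes every token up to and including the closing END_OBJECT.
    static Map<String, String> readObject(Iterator<Token> source) {
        if (source.next().kind() != Kind.START_OBJECT) {
            throw new IllegalStateException("expected START_OBJECT");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        while (true) {
            Token t = source.next();
            if (t.kind() == Kind.END_OBJECT) {
                return fields;
            }
            // t is a KEY; the next token is its STRING value.
            fields.put(t.text(), source.next().text());
        }
    }

    // Regenerates events from the in-memory copy (stands in for "serialize back, then re-parse").
    static List<Token> writeObject(Map<String, String> fields) {
        List<Token> out = new ArrayList<>();
        out.add(new Token(Kind.START_OBJECT, "{"));
        fields.forEach((key, value) -> {
            out.add(new Token(Kind.KEY, key));
            out.add(new Token(Kind.STRING, value));
        });
        out.add(new Token(Kind.END_OBJECT, "}"));
        return out;
    }

    // Returns the value of `name` (or null) and a fresh event stream over a copy of the object.
    static Map.Entry<String, Iterator<Token>> lookAhead(Iterator<Token> source, String name) {
        Map<String, String> fields = readObject(source); // the whole object is consumed here
        return new AbstractMap.SimpleImmutableEntry<>(fields.get(name), writeObject(fields).iterator());
    }
}
```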

However, multisearch responses can contain arbitrarily large JSON objects holding index documents on which we have to perform a look-ahead, and in that case the performance hit can be significant, as shown in #471.

This PR introduces `LookAheadJsonParser`, an extension of JSON-P's `JsonParser` that exposes the look-ahead functions needed by the deserialization framework, and provides an implementation for Jackson, by far the most widely used JSON library, based on Jackson's `TokenBuffer`. With it, a look-ahead consumes the actual parser only until the needed information is found, and buffers the consumed part of the JSON stream in a data structure that can be replayed efficiently.
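The buffered approach can be sketched as follows, again with a hypothetical event model. `lookAheadFieldValue` and `ReplayingIterator` are illustrative names; this is not the actual `LookAheadJsonParser` signature, and a plain `List` plays the role that Jackson's `TokenBuffer` plays in the PR. Tokens are pulled from the source only until the top-level property and its scalar value are found; the returned iterator first replays the buffered tokens, then continues with the untouched rest of the source.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for JSON-P parser events (not the JSON-P API).
enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, KEY, STRING, NUMBER }

record Token(Kind kind, String text) {}

// Replays buffered tokens first, then continues with the rest of the source.
final class ReplayingIterator implements Iterator<Token> {
    private final Iterator<Token> buffered;
    private final Iterator<Token> rest;

    ReplayingIterator(List<Token> buffer, Iterator<Token> rest) {
        this.buffered = buffer.iterator();
        this.rest = rest;
    }

    @Override
    public boolean hasNext() {
        return buffered.hasNext() || rest.hasNext();
    }

    @Override
    public Token next() {
        // rest.next() throws NoSuchElementException once both are exhausted.
        return buffered.hasNext() ? buffered.next() : rest.next();
    }
}

final class BufferedLookAhead {
    private BufferedLookAhead() {}

    // Hypothetical helper, not the LookAheadJsonParser API.
    // Precondition: the next token of `source` is the START_OBJECT of the object to inspect.
    // Pulls tokens only until the top-level property `name` and its scalar value are read,
    // then returns that value (or defaultValue) and an iterator replaying the whole object.
    static Map.Entry<String, Iterator<Token>> lookAheadFieldValue(
            Iterator<Token> source, String name, String defaultValue) {
        List<Token> buffer = new ArrayList<>();
        int depth = 0;
        boolean atWantedValue = false;
        while (source.hasNext()) {
            Token t = source.next();
            buffer.add(t);
            if (atWantedValue && (t.kind() == Kind.STRING || t.kind() == Kind.NUMBER)) {
                // Found it: stop here, the remaining tokens stay in the source.
                return result(t.text(), buffer, source);
            }
            atWantedValue = false; // a non-scalar value for `name` is ignored in this sketch
            switch (t.kind()) {
                case START_OBJECT, START_ARRAY -> depth++;
                case END_OBJECT, END_ARRAY -> {
                    depth--;
                    if (depth == 0) {
                        // Reached the end of the object: the property is absent.
                        return result(defaultValue, buffer, source);
                    }
                }
                case KEY -> atWantedValue = depth == 1 && t.text().equals(name);
                default -> { }
            }
        }
        // Truncated input: replay whatever was read.
        return result(defaultValue, buffer, source);
    }

    private static Map.Entry<String, Iterator<Token>> result(
            String value, List<Token> buffer, Iterator<Token> rest) {
        return new AbstractMap.SimpleImmutableEntry<>(value, new ReplayingIterator(buffer, rest));
    }
}
```

In this sketch, the cost of a look-ahead depends on how far into the object the property appears rather than on the object's total size, and the tokens after it are read straight from the source when the caller replays the object.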

The previous implementation is kept as a fallback for parsers that don't implement `LookAheadJsonParser`. It will be improved in a subsequent PR.

Fixes #471.

Development

Successfully merging this pull request may close these issues.

JsonpUtils#objectParser toString roundtrip to deserialise object uses ~30% more CPU