Allow building caches based on the resolver's content #207

Closed
pondzix opened this issue Oct 3, 2022 · 0 comments

pondzix commented Oct 3, 2022

Current state

The client essentially performs two steps:

  1. Look up a schema (or a list of schemas), done by the resolver.
  2. Validate data against the resolved schema, done by the validator.

Some applications (e.g. enrich) use the whole package, steps 1 and 2, to validate the entities of events. Other applications (e.g. rdb-loader or bigquery-loader) do not need to validate data; they use only step 1, schema lookup, to build the model they need. Usually that means converting the resolved JSON schemas into the appropriate schema-ddl model.

Potential improvements

It's all about caching. The resolver already has a cache for lookups. In general, downstream components often use the JSON schemas returned by the resolver to perform some costly operation, such as data validation or building schema-ddl models. We want to cache the results of those operations as well, by building additional caches on top of the resolver's.

Timestamps in resolver

At the moment the resolver returns:

  • in schema lookup: for a given schema key -> the schema's content
  • in listing schemas: for a given combination of vendor, name and model -> the matching list of schema keys

Internally, these values are cached before the resolver returns them. A TTL can be configured, which means that all stored items can eventually expire. If another cache is built on top of the resolver's, we need a way to inform the new cache about any changes in the content of the original resolver's cache.

Returning a timestamp that indicates when the value was cached, alongside the actual content, would do the job. If content in the resolver's cache expires:

  • A fresh one is fetched from the Iglu registry and stored with a new timestamp.
  • The new timestamp could trigger recalculation when used as part of a key in a downstream cache.
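
To make the idea concrete, here is a minimal sketch of how a timestamp could tie a downstream cache to the resolver's cache. Everything in it is an assumption made for illustration: `Timestamped`, `ResolverCache` and `DerivedCache` are hypothetical names, not types from iglu-scala-client, and the clock and the registry fetch are passed in as plain functions.

```scala
import scala.collection.mutable

// Hypothetical wrapper: a cached value plus the moment (in seconds) it was cached.
final case class Timestamped[A](value: A, cachedAtSeconds: Long)

// Toy resolver cache with a TTL. `fetch` stands in for a registry lookup and
// `now` for a clock; both are assumptions of this sketch.
final class ResolverCache[K, V](ttlSeconds: Long, fetch: K => V, now: () => Long) {
  private val entries = mutable.Map.empty[K, Timestamped[V]]

  def lookup(key: K): Timestamped[V] = {
    val current = now()
    entries.get(key) match {
      // Still fresh: return the stored value with its original timestamp.
      case Some(hit) if current - hit.cachedAtSeconds < ttlSeconds => hit
      // Missing or expired: fetch again and store with a new timestamp.
      case _ =>
        val fresh = Timestamped(fetch(key), current)
        entries.update(key, fresh)
        fresh
    }
  }
}

// Downstream cache keyed by (key, timestamp). A new timestamp yields a new key,
// so the costly computation is repeated only after the resolver refreshed.
final class DerivedCache[K, V, R](compute: V => R) {
  private val results = mutable.Map.empty[(K, Long), R]
  var computations = 0 // exposed only to make the behaviour observable

  def get(key: K, item: Timestamped[V]): R =
    results.getOrElseUpdate((key, item.cachedAtSeconds), {
      computations += 1
      compute(item.value)
    })
}
```

Note that this sketch never evicts entries stored under old timestamps; a real implementation would probably bound the downstream cache (for example with an LRU policy).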

Cache for validator

We can take advantage of the changes in the resolver and create a correlated cache in the validator. Currently, for every validation in CirceValidator, the Json (circe) schema instance is converted to a JsonSchema instance from the com.networknt library. This operation may be expensive in terms of CPU, so let's cache it!

The structure of a new cache could look like:

  • Key - (schemaKey, timestamp)
  • Value - schema evaluation result Either[ValidatorError.InvalidSchema, JsonSchema]
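
A rough sketch of that key/value structure follows. To keep it self-contained it uses stand-in types, all of them assumptions: `SchemaKey` is a simplified substitute for the Iglu schema key, `CompiledSchema` for the com.networknt `JsonSchema`, `InvalidSchema` for `ValidatorError.InvalidSchema`, and the `compile` function for the actual conversion.

```scala
import scala.collection.mutable

// Stand-in types for this sketch only (hypothetical simplifications).
final case class SchemaKey(vendor: String, name: String, version: String)
final case class CompiledSchema(source: String)
final case class InvalidSchema(message: String)

// Cache of schema evaluation results, keyed by (schemaKey, timestamp).
// Failures (Left) are cached too, so an invalid schema is not re-evaluated
// until the resolver hands out a new timestamp for it.
final class SchemaEvaluationCache(
  compile: String => Either[InvalidSchema, CompiledSchema]
) {
  private val cache =
    mutable.Map.empty[(SchemaKey, Long), Either[InvalidSchema, CompiledSchema]]

  def evaluate(
    key: SchemaKey,
    cachedAtSeconds: Long,
    rawSchema: String
  ): Either[InvalidSchema, CompiledSchema] =
    // `compile` runs only when this (key, timestamp) pair has not been seen.
    cache.getOrElseUpdate((key, cachedAtSeconds), compile(rawSchema))
}
```

Caching the whole `Either`, rather than only successful results, matches the value type proposed above.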

The goal is also to keep this backward compatible; therefore we would add a separate client and new methods in the resolver that mimic the originals but return the new data type (with the timestamp).
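
One possible shape for such a backward-compatible API is sketched below. The names `ResolverResult`, `lookupSchemaResult` and `InMemoryLookup` are hypothetical, and plain strings stand in for schema keys, schema content and errors; the real signatures may well differ.

```scala
// Hypothetical new data type: the content plus the time it was cached.
final case class ResolverResult[A](value: A, cachedAtSeconds: Long)

trait SchemaLookup {
  // New method: same lookup as before, but the result carries the timestamp.
  def lookupSchemaResult(key: String): Either[String, ResolverResult[String]]

  // Original-style method keeps its signature and simply drops the timestamp,
  // so existing callers are unaffected.
  def lookupSchema(key: String): Either[String, String] =
    lookupSchemaResult(key).map(_.value)
}

// Tiny in-memory implementation, only to exercise the trait.
final class InMemoryLookup(schemas: Map[String, ResolverResult[String]]) extends SchemaLookup {
  def lookupSchemaResult(key: String): Either[String, ResolverResult[String]] =
    schemas.get(key).toRight(s"Schema $key not found")
}
```

Expressing the old method in terms of the new one keeps a single lookup path, so both variants read from the same cache.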

See the related caching changes in rdb-loader: snowplow/snowplow-rdb-loader#1086

lmath added a commit that referenced this issue Oct 25, 2022
All I did was take the code from #208, branch off master, and fix the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Oct 25, 2022
I just took the code from #208, then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Oct 26, 2022
I just took the code from #208, then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Oct 31, 2022
I just took the code from #208, then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Nov 15, 2022
I just took the code from #208, then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Nov 18, 2022
I just took the code from #208, then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
lmath added a commit that referenced this issue Nov 21, 2022
I just took the code from

then branched off master, and fixed the compilation errors

Co-authored-by: Ian Streeter <[email protected]>
Co-authored-by:  Piotr Poniedziałek <[email protected]>
Co-authored-by: Leigh-Anne Mathieson <[email protected]>