Docs on caching (acceleration) are inconsistent #3805

Closed
ahirner opened this issue Dec 8, 2024 · 2 comments
@ahirner
Contributor

ahirner commented Dec 8, 2024

There are two places that left me guessing:

  1. Is Refresh Interval valid for append mode?

This example includes the respective option with append:

Image

This explains that append is not a valid mode for refresh cycle:

Image

Also given #3702, it seems supported.

  2. Will a query outside the time window use the cache partially?

Here it says "This configuration will only accelerate data from the federated source that ... is less than 1 day old".

Image

Below it says "By default, accelerated datasets only return locally materialized data." What if on_zero_results: use_source is set and the query spans 2 days: will the federated source be queried for 1 day or for 2?

The limitation that fallback only applies to full makes me wonder how to append while limiting the data by a time window. Is setting refresh_data_window already changing the default behavior?


In general, a more formal specification of the caching behavior would be good. Such a specification could start with the combinations of options that are valid (not ignored, and maybe required) and a minimal behavior that always holds when acceleration is enabled.

@epa095

epa095 commented Dec 12, 2024

I agree that the documentation is confusing; I got confused by the same things. But looking at the OG issue, I think there is an answer to your second question:

"The solution works as a shorthand of refresh sql with temporal column constraints"

The documentation of refresh sql is pretty clear that "Queries for data that have been filtered out will not fallback to querying the federated table", so it will not read 2-week-old data from the federated store and combine it with the fast local data.

Unfortunately.

The exception is if there is no data in the last week, so the result is completely empty, and you have on_zero_results: use_source.
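
To make that concrete, here is a small Python model of the behavior as I understand it from the docs quoted above. It is only a sketch, not Spice's implementation: the `query` helper, the row layout, and the `return_empty` name for the default mode are assumptions made for illustration. The local store holds only rows inside a 7-day window, and the source is consulted only when the local result is completely empty and on_zero_results is use_source.

```python
from datetime import datetime, timedelta


def query(local_rows, source_rows, start, end, on_zero_results="return_empty"):
    """Hypothetical model of the fallback rule; not part of any real API."""
    def in_range(row):
        return start <= row["ts"] < end

    local_hits = [r for r in local_rows if in_range(r)]
    # Any local hit wins: there is no partial merge with the source.
    if local_hits or on_zero_results != "use_source":
        return local_hits
    # Zero local results and use_source: the whole query goes to the source.
    return [r for r in source_rows if in_range(r)]


now = datetime(2024, 12, 12)
# Federated source: rows that are 0, 3, and 14 days old.
source = [{"id": d, "ts": now - timedelta(days=d)} for d in (0, 3, 14)]
# Accelerated (local) copy: only the last 7 days are materialized.
local = [r for r in source if r["ts"] >= now - timedelta(days=7)]

# Query spanning two weeks: the 14-day-old row is NOT fetched from the source.
spanning = query(local, source, now - timedelta(days=15),
                 now + timedelta(days=1), "use_source")

# Query touching only old data: the local result is empty, so the source is used.
old_only = query(local, source, now - timedelta(days=15),
                 now - timedelta(days=10), "use_source")

# Same query with the assumed default mode: the result stays empty.
old_default = query(local, source, now - timedelta(days=15),
                    now - timedelta(days=10))

print([r["id"] for r in spanning])
print([r["id"] for r in old_only])
print([r["id"] for r in old_default])
```

In this model the spanning query silently drops the old row, which is exactly the surprising part: fallback is all-or-nothing per query, not per missing time range.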

peasee self-assigned this on Dec 16, 2024
@peasee
Contributor

peasee commented Dec 18, 2024

Hello!

Thank you for the report on these docs. I’ve updated them to clarify which options are supported in which modes and have revised the information around refresh SQL and data windows. In particular, I focused on the behavior of on_zero_results and how it interacts with refresh SQL.

Additionally, I’ve added some scenario-based examples to demonstrate different ways these parameters can be used together.

I’ll be closing this issue now, but please re-open it if these new docs haven't hit the mark! 😄

peasee closed this as completed on Dec 18, 2024