Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: What determines when a materialized_view is fully recreated #288

Closed
the4thamigo-uk opened this issue Jun 4, 2024 · 7 comments
Closed

Comments

@the4thamigo-uk
Copy link
Contributor

the4thamigo-uk commented Jun 4, 2024

I'm seeing my materialized views being completely recreated from scratch and fully bakfilled, more than necessary. What logic determines when this happens?

@the4thamigo-uk
Copy link
Contributor Author

hmm, it always seems to fully repopulate the entire table every time I run...

the4thamigo-uk pushed a commit to the4thamigo-uk/dbt-clickhouse that referenced this issue Jun 5, 2024
@the4thamigo-uk
Copy link
Contributor Author

I suggested an idea as a PR for discussion.

@tema-popov
Copy link

tema-popov commented Jun 12, 2024

I have a little bit more broad issue with materialised views which includes the topic of the question.

I use material views over very large tables, so when I try to run a new material view model in dbt, it tries to populate it by running an insert into query.

In my case, it is very large so I Clickhouse fails to run it. For some reason, I expected that it would just create an mv and target table without populating it.

The current logic is also potentially prone to handling data, which will come moments in time between the populate query and creating materialized view, as both operations are not transactional and the populate query takes some time to run.

The potential workaround for me right now is to avoid this big populate query by adding additional conditions into model SQL that would filter only "fresh data".

The overall issue looks very similar to already existing challenges in incremental models which are handled by is_incremental logic. Maybe we should use the same approach both in incremental and materialised_view models?

It would include the issue with unsupported --full-refresh logic, as it is a part of expected incremental model behavior

Maybe I would even expect the materialized view to be a special subtype of the incremental model, where "increments" are handled by created mat view instead of dbt runs.

@the4thamigo-uk
Copy link
Contributor Author

@tema-popov I also have similar concerns about missing out some data. With the --full-refresh repopulation switched off, there is still a small time gap between dropping the MV and recreating it, in which time data might be missed.

@the4thamigo-uk
Copy link
Contributor Author

Maybe https://clickhouse.com/docs/en/sql-reference/statements/alter/view should be used to alter the MV query without interrupting the ingeation process?

the4thamigo-uk added a commit to the4thamigo-uk/dbt-clickhouse that referenced this issue Jun 19, 2024
@the4thamigo-uk
Copy link
Contributor Author

@tema-popov The PR for this is now merged. Maybe you want to raise a new issue for the problem you are facing?

@tema-popov
Copy link

@the4thamigo-uk sure, it is a far broader topic, I agree it should be discussed separately.

steffen030 pushed a commit to steffen030/dbt-clickhouse that referenced this issue Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants