Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add force on cluster config option #328

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

benjamin-awd
Copy link

Summary

Closes #327

This PR allows users to create relations on a cluster, regardless of their engine or materialization type.

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@CLAassistant
Copy link

CLAassistant commented Jul 28, 2024

CLA assistant check
All committers have signed the CLA.

Copy link

@emirkmo emirkmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Imo this is a good solution that makes this macro usable in many different contexts, including in pre/post-hooks.

Big question: What should happen if force is set but there is no cluster in config? Error or should can_on_cluster be set to None as it would be currently.

Nit: calling the parameter force instead of force_on_cluster is just as clear and a bit shorter.

Nice cleanup up the logic in the if X then X else '' statements. I cannot see any side effects.

Not sure about the extent of required tests. Great to have them. Wonder if you need to test the behavior of the various if scenarios within get_on_cluster with and without the force argument, to catch any future breakages. (Setting force=False shouldn't affect can_on_cluster if it can on cluster according to the base logic in get_on_cluster)

Note I am just a community member.

table and incremental materializations with non-replicated engine will not be affected by `cluster` setting (model would be created on the connected node only).
By default, tables and incremental materializations with non-replicated engines will not be affected by the `cluster` setting (model would be created on the connected node only).

To force relations to be created on a cluster regardless of their engine or materialization, use the `force_on_cluster` argument:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
To force relations to be created on a cluster regardless of their engine or materialization, use the `force_on_cluster` argument:
To force relations to be created on a cluster regardless of their engine or materialization, use the `force` argument:

The macro is called on_cluster, so a standard force argument makes sense IMO. no need to call it force_on_cluster. Follows typical pattern of adding a force or -f flag.

README.md Show resolved Hide resolved
dbt/adapters/clickhouse/impl.py Show resolved Hide resolved
dbt/adapters/clickhouse/relation.py Outdated Show resolved Hide resolved
@@ -118,14 +118,13 @@ def create_from(
if schema == relation_config.source_name and relation_config.database:
schema = relation_config.database

if relation_config.config.get("force_on_cluster") in ("True", "true"):
can_on_cluster = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if cluster is not set in config but force is enabled? What should happen? Should we error or set can_on_cluster to False?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's unnecessary to error out if the cluster isn't set, we can just keep the original behavior by letting the can_on_cluster variable default to None, which should have no impact downstream.

I've added an additional gate to prevent strange behaviour where there is no cluster var, but force_on_cluster is somehow enabled

@benjamin-awd
Copy link
Author

@emirkmo thanks for the review 😄

What should happen if force is set but there is no cluster in config? Error or should can_on_cluster be set to None as it would be currently.

Addressed here #328 (comment)

Not sure about the extent of required tests. Great to have them. Wonder if you need to test the behavior of the various if scenarios within get_on_cluster with and without the force argument, to catch any future breakages. (Setting force=False shouldn't affect can_on_cluster if it can on cluster according to the base logic in get_on_cluster)

If the force flag isn't set, the else condition should trigger and the default behavior should occur -- IMO it's not necessary to have a test for this

Nit: calling the parameter force instead of force_on_cluster is just as clear and a bit shorter.

In this case, my preference is to be more explicit with the naming. As a user that knows nothing about the adapter, if I see

{{ config(
        engine='Null',
        materialized='materialized_view',
        force='true'

in the config, it will not be apparent that this in fact refers to the ON CLUSTER clause. Otherwise, if cluster was a dictionary like {'name': 'cluster1', 'force': true} I would be open to the idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Table with Null engine not created on cluster
3 participants