Describe the bug
We are running Databricks on AWS and using dbt with dbt-databricks.
The process is quite standard: we don't have any custom macros, we just run an incremental model that writes to a Delta table.
Sometimes the show table extended in {{ relation }} like '*' step fails with an infrastructure error.
The Databricks logs say: Command failed because warehouse {warehouse_id} was stopped.
The problem is that DatabricksAdapter catches this exception and returns an empty list of tables. The downstream logic then concludes that the table does not exist and runs create or replace table ..., which overwrites the table and loses all previous partitions.
Workaround: revert the table to its previous version and run the model again. However, this requires constant monitoring to make sure none of our models have been accidentally overwritten.
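The control flow behind this failure can be sketched in a few lines of Python. This is not dbt-databricks code: WarehouseStoppedError, show_tables_failing, list_relations_swallowing and plan_materialization are hypothetical names assumed for illustration, and the real materialization logic has more branches.

```python
from typing import Callable, List


class WarehouseStoppedError(Exception):
    """Hypothetical stand-in for the "warehouse was stopped" infrastructure error."""


def show_tables_failing(schema: str) -> List[str]:
    # Simulates the `show table extended` query failing mid-run.
    raise WarehouseStoppedError(f"warehouse stopped while listing {schema}")


def list_relations_swallowing(
    schema: str, show_tables: Callable[[str], List[str]]
) -> List[str]:
    # Behaviour described in the issue: any error is turned into "no tables".
    try:
        return show_tables(schema)
    except Exception:
        return []


def plan_materialization(model: str, existing: List[str]) -> str:
    # Simplified: an incremental model only updates in place if its target exists.
    if model in existing:
        return "incremental update"
    return "create or replace table"


existing = list_relations_swallowing("analytics", show_tables_failing)
print(plan_materialization("events", existing))  # prints: create or replace table
```

Because a failed lookup and a genuinely missing table both produce [], the planner cannot tell them apart, and that ambiguity is what turns a transient outage into data loss.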
Steps To Reproduce
Run a dbt model on Databricks (AWS).
Find a way to make the show table extended in {{ relation }} like '*' step fail. Alternatively, override the macro and raise the exception manually:
{% macro spark__list_relations_without_caching(relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    show table extended in {{ relation }} like '*'
    {# Simulate the infrastructure failure by raising a database error from inside the statement block. #}
    {% do exceptions.raise_database_error("Failed to retrieve tables from database.") %}
  {% endcall %}
  {% do return(load_result('list_relations_without_caching').table) %}
{% endmacro %}
Expected behavior
The exception should be raised, allowing the process to fail.
Another option is to retry retrieving the table list a few times and raise the exception after a couple of failures.
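A minimal sketch of the retry option, assuming a generic callable stands in for the adapter's real query method (retry_then_raise and flaky_show_tables are hypothetical names). After the configured number of attempts, the last exception propagates instead of being converted into an empty list.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_then_raise(fn: Callable[[], T], attempts: int = 3, delay_s: float = 1.0) -> T:
    """Call fn up to `attempts` times; never fall back to an empty result."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts - 1):
        try:
            return fn()
        except Exception:
            time.sleep(delay_s)  # possibly transient failure: wait, then try again
    # Final attempt: any exception propagates, so the run fails instead of seeing [].
    return fn()


calls = {"n": 0}


def flaky_show_tables() -> List[str]:
    # Hypothetical query that fails twice (e.g. warehouse stopped), then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("warehouse stopped")
    return ["events"]


print(retry_then_raise(flaky_show_tables, attempts=3, delay_s=0))  # prints: ['events']
```

In practice the except clause would be narrowed to the adapter's database error type, so that programming errors are not retried.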
System information
The output of dbt --version:
Core:
- installed: 1.2.2
- latest: 1.4.1 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- databricks: 1.2.2 - Update available!
- spark: 1.2.0 - Update available!
The operating system you're using:
macOS Ventura
The output of python --version:
Python 3.8.12
Additional context
The problem is at line 135 of DatabricksAdapter. Replacing return [] with raise e would work in my case, but I am not sure whether anyone relies on the current behaviour.
Alternatively, a flag could be added to the adapter config so that the exception is raised when the flag is True.
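A sketch of the flag idea, assuming a simple config object. The flag name raise_on_list_relations_error and the AdapterConfig class are hypothetical and do not exist in dbt-databricks; the function only illustrates where raise would replace return [].

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdapterConfig:
    # Hypothetical flag proposed in this issue; not an existing dbt-databricks setting.
    raise_on_list_relations_error: bool = True


def list_relations(
    schema: str, show_tables: Callable[[str], List[str]], config: AdapterConfig
) -> List[str]:
    try:
        return show_tables(schema)
    except Exception:
        if config.raise_on_list_relations_error:
            raise  # equivalent of replacing `return []` with `raise e`
        return []  # previous behaviour, kept for setups that depend on it


def failing(schema: str) -> List[str]:
    raise RuntimeError("Command failed because warehouse was stopped")


print(list_relations("analytics", failing, AdapterConfig(raise_on_list_relations_error=False)))  # prints: []
```

Defaulting the flag to True makes a failed lookup stop the run; setting it to False keeps the empty-list fallback.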
Let me know what you think.
Because of an AWS Glue issue, list relations was set to catch any exception and return an empty list of tables. The problem is that the downstream logic then decides to create or replace the table, which overwrites it and loses all previous partitions.
I believe the problem the previous fix solves is nowhere near as important for Databricks, while the problem it causes is severe.
resolves #266
Signed-off-by: Andre Furlan <[email protected]>