[Bug] Databricks SQL syntax error in marketo__lead_history #31

cdey-a16z · 2024-02-23T16:30:24Z

Is there an existing issue for this?

I have searched the existing issues

Describe the issue

I am working on implementing the Marketo package in a dbt instance using Databricks. The marketo__lead_history model is failing, and so far I have been able to pinpoint two issues.

fivetran_utils.dummy_coalesce_value macro is not working properly when applied to 'lead_status' which is a string field.
- This macro is called in the backfill cte , and is used to coalesce a dummy value, which in this case would be "DUMMY STRING", but is instead returning nothing, causing the coalesce to fail.
When correcting for the failing macro, the SQL still fails because it appears that the IGNORE NULLS statement in the window function is not recognized in Databricks.

Relevant error log or model output

dummy_coalesce_value macro (abbreviated for relevance):

{% macro dummy_coalesce_value(column) %}

{% set coalesce_value = {
 'STRING': "'DUMMY_STRING'",
 'BOOLEAN': 'null',
 'INT': 999999999,
 'FLOAT': 999999999.99,
 'TIMESTAMP': 'cast("2099-12-31" as timestamp)',
 'DATE': 'cast("2099-12-31" as date)',
} %}

{% if column.is_float() %}
{{ return(coalesce_value['FLOAT']) }}

{% elif column.is_numeric() %}
{{ return(coalesce_value['INT']) }}

{% elif column.is_string() %} -- this is the problematic statement
{{ return(coalesce_value['STRING']) }}

{% endif %}

{% endmacro %}

model output:

, backfill as (


   select
       date_day,
       lead_id   
       -- For each lead on each day, find the state of each column from the next record where a change occurred,
       -- identified by the presence of a record from the SCD table on that day
       
       , nullif(
           first_value(case when new_values_present then coalesce(lead_status, ) end ignore nulls) over (
               partition by lead_id
               order by date_day asc
               rows between current row and unbounded following),  ) -- nothing in the coalesce, causing failure
       as lead_status

syntax error:

16:12:08 Databricks adapter: <class 'databricks.sql.exc.ServerOperationError'>: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'ignore'.(line 60, pos 100)

, nullif(
            first_value(case when new_values_present then coalesce(lead_status, 'DUMMY_STRING') end ignore nulls) over (
----------------------------------------------------------------------------------------------------^^^
                partition by lead_id 
                order by date_day asc 
                rows between current row and unbounded following),  
            'DUMMY_STRING')
        as lead_status

Expected behavior

dummy_coalesce_value macro recognizes the 'lead_status' string, and inserts "DUMMY STRING" into the coalesce.

Expected Output:

nullif(
first_value(case when new_values_present then coalesce(lead_status, 'DUMMY_STRING') end ignore nulls) over (
partition by lead_id
order by date_day asc
rows between current row and unbounded following),
'DUMMY_STRING')
as lead_status

When I edit the macro from this:
{% elif column.is_string() %} -- this is the problematic statement
{{ return(coalesce_value['STRING']) }}

to this:

{% elif column.data_type|lower == 'string' %}
{{ return(coalesce_value['STRING']) }}

The dummy string is inserted properly into the coalesce. This indicates there is an issue with the column.is_string()

note: This fix only works when I edit the dummy_coalesce_value macro found in the fivetran_utils package, even though the macro also exists in the marketo package. Also, this isn't a permanent fix, because the file exists within a package, so any changes will be wiped out whenever dbt deps is run.

Once the above 'fix' is implemented, we run into a syntax error:

looks like ignore nulls is not allowed within a window function in Databricks, as far as I can tell. The following SQL may be a solution:

, COALESCE(
first_value(CASE WHEN new_values_present THEN lead_status ELSE NULL END) OVER (
partition by lead_id
order by date_day asc
rows between current row and unbounded following
),
'default_value' -- or however you wish to handle cases that would have relied on ignoring nulls
) AS lead_status

I think that it's possible that there could be a config that I may be missing here, but as you can see below, I have followed the dispatch config instructions provided with the package.

dbt Project configurations

vars:
fivetran_log:
fivetran_log_using_transformations: false
fivetran_log_using_triggers: false

fivetran_platform_database: raw # default is your target.database
fivetran_platform_schema: fivetran_log # default is fivetran_log

fivetran_platform_using_destination_membership: false # this will disable only the destination membership logic
fivetran_platform_using_user: false # this will disable only the user logic

marketo__enable_campaigns: true
marketo__enable_programs: true

marketo_database: raw
marketo_schema: marketo

dispatch:

macro_namespace: fivetran_utils
search_order: ['spark_utils', 'fivetran_utils']
macro_namespace: dbt_utils
search_order: ['spark_utils', 'fivetran_utils', 'dbt_utils']

Package versions

packages:

package: fivetran/salesforce_formula_utils
version: 0.9.3
package: dbt-labs/dbt_utils
version: 1.1.1
package: calogica/dbt_expectations
version: 0.10.3
package: dbt-labs/codegen
version: 0.12.1
package: data-mie/dbt_profiler
version: 0.8.1
package: dbt-labs/audit_helper
version: 0.10.0
package: fivetran/marketo
version: 0.10.0
package: dbt-labs/spark_utils
version: 0.3.0
package: fivetran/fivetran_log
version: 1.5.0

What database are you using dbt with?

databricks

dbt Version

dbt cloud version 1.7

Additional Context

No response

Are you willing to open a PR to help address this issue?

Yes.
Yes, but I will need assistance and will schedule time during our office hours for guidance
No.

fivetran-catfritz · 2024-02-23T23:11:58Z

Hi @cdey-a16z thanks for opening this issue! What Databricks runtime are you using? Our packages are tested on Databricks all-purpose clusters, but we are aware that there are some incompatibilities with the SQL Warehouses. If you're on a SQL Warehouse, we'll need to do a little more investigation into testing any updates.

cdey-a16z · 2024-02-27T17:20:53Z

Hi @fivetran-catfritz, thanks for getting back to me. Looks like our runtime is 13.3 LTS (see the screenshot below).

We experimented with pointing dbt to an all-purpose cluster yesterday, and the issue persisted.

Let me know if there is more helpful information I can provide.

fivetran-catfritz · 2024-02-29T18:44:47Z

Hi @cdey-a16z thank you for the info and trying that out on the all-purpose cluster. I was able to recreate the issue on our all-purpose cluster as well. While we do not have plans to address general Databricks SQL Warehouse compatibility, any issues that persist on a Databricks all-purpose cluster will definitely be addressed. We'll prioritize addressing the coalesce/ignore nulls issue in our upcoming sprint cycle, which would be approximately mid/end-March.

Thank you again for calling this to our attention!

cdey-a16z · 2024-02-29T20:29:24Z

@fivetran-catfritz sounds great I appreciate your help with this. Will I be updated here when this is fixed?

fivetran-catfritz · 2024-02-29T23:51:10Z

Yes we'll post here when the update is released!

fivetran-catfritz · 2024-03-20T21:53:30Z

Hi @cdey-a16z to update you, I have found a work-around for the issue you mention. I added this cte that removes the need for an ignore nulls statement and also makes the dummy string issue you found to be moot. If you would like to try out my updates, you can install my test branch using the below code in place of the marketo lines in your packages.yml file.

- git: https://github.com/fivetran/dbt_marketo.git
  revision: bug/databricks-syntax
  warn-unpinned: false

If you try it out, I'd love to hear how it works for you! If not, no worries, we'll proceed with release in the coming week or so and update you when it's released.

cdey-a16z · 2024-03-25T20:54:49Z

Hi @fivetran-catfritz, thanks for getting back to me. I just tested your test version, and it worked! The only thing I noticed was that I had to update field from marketo__leads called unsubscribed to is_unsubscribed. I'm not sure if that's related at all, but just wanted to call that out in case it is.

Looking forward to implementing the completed package!

fivetran-catfritz · 2024-04-01T15:03:40Z

@cdey-a16z Thank you for trying that out and confirming! Glad that it worked. We'll proceed with release in the coming days and post back here once it's live.

Regarding the is_subscribed column, I confirm we updated the source package to rename boolean columns to the format is_*. If you're interested, there are more details in the source changelog we plan to release alongside this update.

fivetran-catfritz · 2024-04-04T15:23:52Z

@cdey-a16z To update you, the new version has been released! You can install it with the below range:

packages:
  - package: fivetran/marketo
    version: [">=0.11.0", "<0.12.0"]

I am closing out this issue, but feel free to ping us here if you have any additional questions or feedback.

cdey-a16z added the bug Something isn't working label Feb 23, 2024

fivetran-catfritz added type:bug Something is broken or incorrect priority:p3 Affects many users; can wait status:accepted Scoped and accepted into queue update_type:models Primary focus requires model updates error:unforced and removed bug Something isn't working labels Feb 29, 2024

fivetran-catfritz added status:in_progress Currently being worked on and removed status:accepted Scoped and accepted into queue labels Mar 15, 2024

fivetran-catfritz self-assigned this Mar 20, 2024

fivetran-catfritz mentioned this issue Mar 20, 2024

Bug/databricks syntax #33

Merged

7 tasks

fivetran-catfritz closed this as completed Apr 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Databricks SQL syntax error in marketo__lead_history #31

[Bug] Databricks SQL syntax error in marketo__lead_history #31

cdey-a16z commented Feb 23, 2024

fivetran-catfritz commented Feb 23, 2024

cdey-a16z commented Feb 27, 2024

fivetran-catfritz commented Feb 29, 2024 •

edited

Loading

cdey-a16z commented Feb 29, 2024

fivetran-catfritz commented Feb 29, 2024

fivetran-catfritz commented Mar 20, 2024

cdey-a16z commented Mar 25, 2024

fivetran-catfritz commented Apr 1, 2024

fivetran-catfritz commented Apr 4, 2024

[Bug] Databricks SQL syntax error in marketo__lead_history #31

[Bug] Databricks SQL syntax error in marketo__lead_history #31

Comments

cdey-a16z commented Feb 23, 2024

Is there an existing issue for this?

Describe the issue

Relevant error log or model output

Expected behavior

dbt Project configurations

Package versions

What database are you using dbt with?

dbt Version

Additional Context

Are you willing to open a PR to help address this issue?

fivetran-catfritz commented Feb 23, 2024

cdey-a16z commented Feb 27, 2024

fivetran-catfritz commented Feb 29, 2024 • edited Loading

cdey-a16z commented Feb 29, 2024

fivetran-catfritz commented Feb 29, 2024

fivetran-catfritz commented Mar 20, 2024

cdey-a16z commented Mar 25, 2024

fivetran-catfritz commented Apr 1, 2024

fivetran-catfritz commented Apr 4, 2024

fivetran-catfritz commented Feb 29, 2024 •

edited

Loading