Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zero value metric #96

Merged
merged 36 commits into from
Sep 26, 2022
Merged

Fix zero value metric #96

merged 36 commits into from
Sep 26, 2022

Conversation

dave-connors-3
Copy link
Contributor

@dave-connors-3 dave-connors-3 commented Aug 31, 2022

What is this PR?

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Closes #61

This initial approach parses expression metrics, checks for a / and wraps any divisors in the expression in the appropriate nullif( {{ dvisor }} , 0) expression. This approach solves the current issue:

Sample Metrics
version: 2

metrics:
  
  - name: total_value
    label: Total Value
    model: ref("incremental")
    type: sum
    sql: val

    dimensions:
      - state
      - customer_id
    
    filters:
      - field: state
        operator: '!='
        value: "'connecticut'"

    timestamp: report_date
    time_grains: [day, week, month]


  - name: non_division_exp
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "{{metric('total_value')}} + 1"
    dimensions:
      - state
      - customer_id

  - name: division_exp_1
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "1 /{{metric('total_value')}}"
    dimensions:
      - state
      - customer_id

  - name: division_exp_2
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "1 / {{metric('total_value') }} / 2"
    dimensions:
      - state
      - customer_id

  - name: division_exp_3
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "{{metric('division_exp_1') }} / 2"
    dimensions:
      - state
      - customer_id

Compiled SQL
...
, first_join_metrics as (

    select
        date_week
        , coalesce(
            total_value__final.state
            , NULL
        ) as state
        , coalesce(
            total_value__final.customer_id
            , NULL
        ) as customer_id
    
        , total_value as total_value  

    from
        total_value__final 
    )
    
, join_metrics__998 as (

    select 
    
        first_join_metrics.*
        , (1  / nullif(total_value, 0)) as division_exp_1
    
    from first_join_metrics


)
    
, join_metrics__999 as (

    select 
    
        join_metrics__998.*
        , (total_value + 1) as non_division_exp
        , (1  / nullif( total_value , 0) / nullif( 2, 0)) as division_exp_2
        , (division_exp_1  / nullif( 2, 0)) as division_exp_3
    
    from join_metrics__998


)

, joined_metrics as (

    select 
        first_join_metrics.date_week
        , first_join_metrics.state
        , first_join_metrics.customer_id
        , coalesce(first_join_metrics.total_value,0) as total_value
        , coalesce(non_division_exp, 0) as non_division_exp
        , coalesce(division_exp_1, 0) as division_exp_1
        , coalesce(division_exp_2, 0) as division_exp_2
        , coalesce(division_exp_3, 0) as division_exp_3




...

Results:

image

My open question here is that this treats observed 0 value results (true zeros from my data) the same way as the filled-in 0s from our spining behavior. Is it appropriate to assume that any zero in the source metric should be evaluated by the corresponding expression metric?

What have I become

in order to enable the control over whether or not zeros or nulls show up in the results of the output, this PR has evolved to do the following

  • create a mechanism for the package to accept and validate config blocks from the metric definition
  • adds tests for the treat_null_values_as_zero behavior
    • if false, no coalescing will happen in output or secondary output.
    • if true, default behavior fills missing values with 0
  • adds tests for the config validation behavior

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

Tenets to keep in mind

  • A metric value should be consistent everywhere that it is referenced
  • We prefer generalized metrics with many dimensions over specific metrics with few dimensions
  • It should be easier to use dbt’s metrics than it is to avoid them
  • Organization and discoverability are as important as precision
  • One-off models built to power metrics are an anti-pattern

@cla-bot cla-bot bot added the cla:yes The CLA has been signed label Aug 31, 2022
@dave-connors-3 dave-connors-3 requested review from callum-mcdata and joellabes and removed request for callum-mcdata August 31, 2022 13:42
@callum-mcdata callum-mcdata changed the base branch from main to fixing_bad_code August 31, 2022 13:52
@callum-mcdata
Copy link
Contributor

My open question here is that this treats observed 0 value results (true zeros from my data) the same way as the filled-in 0s from our spining behavior. Is it appropriate to assume that any zero in the source metric should be evaluated by the corresponding expression metric?

I think this is the most important thing for us to clarify on this issue.

Areas of Concern

  • Expression Metrics: if the expression uses division at all then having 0 values can produce failing metrics
  • Secondary Calculations: without 0 values in previous periods, we are unable to calculate many secondary calculations

What would happen if we completely abandoned the 0 coalesce and used null instead?

Relevant slack thread: https://getdbt.slack.com/archives/C02CCBBBR1D/p1660673393965409

Base automatically changed from fixing_bad_code to main August 31, 2022 21:54
@callum-mcdata
Copy link
Contributor

@dave-connors-3 crazy idea - what if we push this out into the metric definition itself? Have some sort of parameter in a config block that determines whether nulls are treated as 0's or as NULLS.

This might be the way to keep metric definition flexible while keeping results consistent.

@dave-connors-3
Copy link
Contributor Author

@callum-mcdata @joellabes I believe I have addressed the comments here w.r.t. naming, passing the args the appropriate way, and validating configs (that last piece open to feedback!)

last things we need here are:

  • consensus on the name
  • consensus on the logic (callum just added a great callout -- depends on a lot of assumptions on how division will be written)

once we get there, this should be close!

@joellabes
Copy link
Contributor

Have added my vote for name on that comment (treat_null_values_as_zero).
Consensus on the logic: I've made dbt-labs/dbt-core#5918 to see what the appetite is for turning this into structured data. Otherwise we just wait for it to become a problem.

callum-mcdata
callum-mcdata previously approved these changes Sep 23, 2022
Copy link
Contributor

@callum-mcdata callum-mcdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has the stamp of approval! Nice job pushing this over the line, this one was more of a 🐻 than expected

@callum-mcdata callum-mcdata dismissed joellabes’s stale review September 23, 2022 20:54

Each component is addressed and the open questions will be punted on

@callum-mcdata
Copy link
Contributor

@dave-connors-3 you've done the herculean work here -all yours to hit the big green button.

Copy link
Contributor

@callum-mcdata callum-mcdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Lets merge it

@dave-connors-3 dave-connors-3 merged commit 0d31155 into main Sep 26, 2022
@dave-connors-3 dave-connors-3 deleted the fix-zero-value-metric branch September 26, 2022 16:43
@joellabes
Copy link
Contributor

Love your work, you two!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes The CLA has been signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expression Calculations On 0's Return Nulls Incorrectly
3 participants