Fix zero value metric #96

dave-connors-3 · 2022-08-31T13:41:53Z

What is this PR?

This is a:

documentation update
bug fix with no breaking changes
new functionality
a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Closes #61

This initial approach parses expression metrics, checks for a /, and wraps every divisor in the expression in the appropriate nullif( {{ divisor }} , 0) expression. This approach solves the current issue:

Sample Metrics

version: 2

metrics:
  
  - name: total_value
    label: Total Value
    model: ref("incremental")
    type: sum
    sql: val

    dimensions:
      - state
      - customer_id
    
    filters:
      - field: state
        operator: '!='
        value: "'connecticut'"

    timestamp: report_date
    time_grains: [day, week, month]


  - name: non_division_exp
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "{{metric('total_value')}} + 1"
    dimensions:
      - state
      - customer_id

  - name: division_exp_1
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "1 /{{metric('total_value')}}"
    dimensions:
      - state
      - customer_id

  - name: division_exp_2
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "1 / {{metric('total_value') }} / 2"
    dimensions:
      - state
      - customer_id

  - name: division_exp_3
    label: Expression ($)
    timestamp: order_date
    time_grains: [day, week, month]
    type: expression
    sql: "{{metric('division_exp_1') }} / 2"
    dimensions:
      - state
      - customer_id

Compiled SQL

...
, first_join_metrics as (

    select
        date_week
        , coalesce(
            total_value__final.state
            , NULL
        ) as state
        , coalesce(
            total_value__final.customer_id
            , NULL
        ) as customer_id
    
        , total_value as total_value  

    from
        total_value__final 
    )
    
, join_metrics__998 as (

    select 
    
        first_join_metrics.*
        , (1  / nullif(total_value, 0)) as division_exp_1
    
    from first_join_metrics


)
    
, join_metrics__999 as (

    select 
    
        join_metrics__998.*
        , (total_value + 1) as non_division_exp
        , (1  / nullif( total_value , 0) / nullif( 2, 0)) as division_exp_2
        , (division_exp_1  / nullif( 2, 0)) as division_exp_3
    
    from join_metrics__998


)

, joined_metrics as (

    select 
        first_join_metrics.date_week
        , first_join_metrics.state
        , first_join_metrics.customer_id
        , coalesce(first_join_metrics.total_value,0) as total_value
        , coalesce(non_division_exp, 0) as non_division_exp
        , coalesce(division_exp_1, 0) as division_exp_1
        , coalesce(division_exp_2, 0) as division_exp_2
        , coalesce(division_exp_3, 0) as division_exp_3




...
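As a rough illustration of how expressions like the sample metrics could end up compiled with nullif around each divisor, here is a minimal Python sketch. It is not the package's macro: the function name `wrap_divisors` is hypothetical, and it assumes the metric references have already been resolved to plain column names.

```python
def wrap_divisors(expression: str) -> str:
    """Wrap every divisor in nullif(..., 0) so dividing by zero yields NULL.

    Hypothetical helper for illustration only; it naively splits on "/".
    """
    if "/" not in expression:
        # No division, nothing to protect.
        return expression
    numerator, *divisors = expression.split("/")
    wrapped = [f"nullif({divisor.strip()}, 0)" for divisor in divisors]
    return " / ".join([numerator.strip(), *wrapped])


print(wrap_divisors("total_value + 1"))      # total_value + 1
print(wrap_divisors("1 /total_value"))       # 1 / nullif(total_value, 0)
print(wrap_divisors("1 / total_value / 2"))  # 1 / nullif(total_value, 0) / nullif(2, 0)
```

A split this naive treats everything up to the next slash as the divisor, so an expression such as a / b + c would be wrapped as a / nullif(b + c, 0). Any string-based approach depends on assumptions about how the division is written.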

Results:

My open question here is that this treats observed 0 value results (true zeros from my data) the same way as the filled-in 0s from our spining behavior. Is it appropriate to assume that any zero in the source metric should be evaluated by the corresponding expression metric?
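To make the question concrete, the sketch below mimics the compiled SQL for division_exp_1 in Python, with None standing in for NULL. The helpers `nullif` and `coalesce` are small stand-ins for the SQL functions, and the sample values are made up for illustration.

```python
def nullif(value, sentinel):
    """SQL-style nullif: NULL (None) when value equals the sentinel."""
    return None if value == sentinel else value


def coalesce(*values):
    """SQL-style coalesce: first non-NULL argument, else NULL."""
    return next((v for v in values if v is not None), None)


def division_exp_1(total_value):
    # Mirrors: coalesce(1 / nullif(total_value, 0), 0)
    divisor = nullif(total_value, 0)
    ratio = None if divisor is None else 1 / divisor
    return coalesce(ratio, 0)


observed_zero = 0                   # a real row whose sum was 0
filled_zero = coalesce(None, 0)     # a missing period filled in with 0

# Once filled, the two cases cannot be told apart downstream.
print(division_exp_1(observed_zero))  # 0
print(division_exp_1(filled_zero))    # 0
print(division_exp_1(4))              # 0.25
```

Both the observed zero and the filled-in zero flow through the same nullif and coalesce path, which is exactly the ambiguity raised above.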

What have I become

To give users control over whether zeros or nulls show up in the output, this PR has evolved to do the following:

create a mechanism for the package to accept and validate config blocks from the metric definition
add tests for the treat_null_values_as_zero behavior
- if false, no coalescing happens in the output or the secondary calculation output
- if true (the default behavior), missing values are filled with 0
add tests for the config validation behavior
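The config handling listed above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the package's Jinja implementation: the names `ALLOWED_CONFIGS`, `DEFAULTS`, `validate_metric_config`, and `final_value` are hypothetical, and the default of true is taken from the description of treat_null_values_as_zero in this PR.

```python
# Hypothetical registry of accepted config keys and their expected types.
ALLOWED_CONFIGS = {"treat_null_values_as_zero": bool}
DEFAULTS = {"treat_null_values_as_zero": True}


def validate_metric_config(config: dict) -> dict:
    """Reject unknown keys and wrongly typed values, then apply defaults."""
    for key, value in config.items():
        if key not in ALLOWED_CONFIGS:
            raise ValueError(f"Unknown metric config: {key}")
        if not isinstance(value, ALLOWED_CONFIGS[key]):
            raise TypeError(f"Config {key} must be {ALLOWED_CONFIGS[key].__name__}")
    return {**DEFAULTS, **config}


def final_value(value, config: dict):
    """Fill a missing (None) metric value with 0 only when the config asks for it."""
    resolved = validate_metric_config(config)
    if value is None and resolved["treat_null_values_as_zero"]:
        return 0
    return value


print(final_value(None, {}))                                    # 0
print(final_value(None, {"treat_null_values_as_zero": False}))  # None
```

Validating before applying defaults means a misspelled key fails loudly instead of silently falling back to the default behavior.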

Checklist

I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
- BigQuery
- Postgres
- Redshift
- Snowflake
I have updated the README.md (if applicable)
I have added tests & descriptions to my models (and macros if applicable)
I have added an entry to CHANGELOG.md

Tenets to keep in mind

A metric value should be consistent everywhere that it is referenced
We prefer generalized metrics with many dimensions over specific metrics with few dimensions
It should be easier to use dbt’s metrics than it is to avoid them
Organization and discoverability are as important as precision
One-off models built to power metrics are an anti-pattern

callum-mcdata · 2022-08-31T20:28:20Z

My open question here is that this treats observed 0 value results (true zeros from my data) the same way as the filled-in 0s from our spining behavior. Is it appropriate to assume that any zero in the source metric should be evaluated by the corresponding expression metric?

I think this is the most important thing for us to clarify on this issue.

Areas of Concern

Expression Metrics: if the expression uses division at all then having 0 values can produce failing metrics
Secondary Calculations: without 0 values in previous periods, we are unable to calculate many secondary calculations

What would happen if we completely abandoned the 0 coalesce and used null instead?
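One way to reason about that question is to look at a simple secondary calculation with and without zero filling. The sketch below is a Python illustration with made-up values; `period_over_period_difference` is a hypothetical helper, not the package's macro, and None stands in for NULL.

```python
def period_over_period_difference(values, fill_zero: bool):
    """Difference between each period and the previous one.

    Missing periods are None; with fill_zero they are treated as 0 first.
    """
    series = [0 if (v is None and fill_zero) else v for v in values]
    differences = [None]  # the first period has no previous period
    for previous, current in zip(series, series[1:]):
        if previous is None or current is None:
            differences.append(None)  # NULL propagates, as it would in SQL
        else:
            differences.append(current - previous)
    return differences


weekly_totals = [10, None, 4]  # the middle week has no data

print(period_over_period_difference(weekly_totals, fill_zero=True))   # [None, -10, 4]
print(period_over_period_difference(weekly_totals, fill_zero=False))  # [None, None, None]
```

With nulls left in place, a single missing period blanks out the calculation for that period and the one after it, which matches the secondary calculation concern listed above.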

Relevant slack thread: https://getdbt.slack.com/archives/C02CCBBBR1D/p1660673393965409

callum-mcdata · 2022-09-09T19:09:09Z

@dave-connors-3 crazy idea - what if we push this out into the metric definition itself? Have some sort of parameter in a config block that determines whether nulls are treated as 0s or as NULLs.

This might be the way to keep metric definition flexible while keeping results consistent.

dave-connors-3 · 2022-09-22T22:43:18Z

@callum-mcdata @joellabes I believe I have addressed the comments here w.r.t. naming, passing the args the appropriate way, and validating configs (that last piece open to feedback!)

last things we need here are:

consensus on the name
consensus on the logic (callum just added a great callout -- depends on a lot of assumptions on how division will be written)

once we get there, this should be close!

joellabes · 2022-09-23T02:02:41Z

Have added my vote for name on that comment (treat_null_values_as_zero).
Consensus on the logic: I've made dbt-labs/dbt-core#5918 to see what the appetite is for turning this into structured data. Otherwise we just wait for it to become a problem.

callum-mcdata

this has the stamp of approval! Nice job pushing this over the line, this one was more of a 🐻 than expected

Each component is addressed and the open questions will be punted on

callum-mcdata · 2022-09-26T14:30:14Z

@dave-connors-3 you've done the herculean work here, all yours to hit the big green button.

callum-mcdata

Looks good to me! Let's merge it

joellabes · 2022-09-26T22:56:37Z

Love your work, you two!

dave-connors-3 added 3 commits August 30, 2022 17:44

support exprssions built on division

0290ab9

smol whitespace update

8900c82

initial feature support

2b34ba6

cla-bot bot added the cla:yes The CLA has been signed label Aug 31, 2022

dave-connors-3 requested review from callum-mcdata and joellabes and removed request for callum-mcdata August 31, 2022 13:42

callum-mcdata changed the base branch from main to fixing_bad_code August 31, 2022 13:52

Base automatically changed from fixing_bad_code to main August 31, 2022 21:54

dave-connors-3 and others added 6 commits September 12, 2022 15:53

merge main

4e55289

remove coalesce behavior from secondary calcs

d23bd3b

fix one test

acf1443

Auto update table of contents

bc1bf38

update zero value integration tests

b7ea717

Merge branch 'fix-zero-value-metric' of github.com:dbt-labs/dbt_metri…

a7ba863

…cs into fix-zero-value-metric

jtcohen6 mentioned this pull request Sep 15, 2022

[CT-1183] [Feature] metrics should be configurable to show nulls for missing data after spining dbt-labs/dbt-core#5842

Closed

3 tasks

dave-connors-3 and others added 6 commits September 19, 2022 16:46

merge main

4dd1cb8

revert accidental main override

8c18251

revert tests to main

1d8aa78

Auto update table of contents

7fecf5c

fill zero behavior for metrics and secondary calcs

f80a8a7

Merge branch 'fix-zero-value-metric' of github.com:dbt-labs/dbt_metri…

d32180c

…cs into fix-zero-value-metric

callum-mcdata mentioned this pull request Sep 21, 2022

Allow specifying a timezone #56

Closed

dave-connors-3 added 4 commits September 21, 2022 17:02

reverse booleans (lol) and update tests for new behavior

00e5715

test divide by zero and default null behavior

b19e52f

pass configs to both macros the same way

b34e20c

resolve merge conflicts

2432f0d

dave-connors-3 added 4 commits September 22, 2022 17:35

reorg argument passing, validate configs, naming

6de7e54

more naming

bacc702

remove comment

c359127

Merge branch 'fix-zero-value-metric' of github.com:dbt-labs/dbt_metri…

564f379

…cs into fix-zero-value-metric

joellabes mentioned this pull request Sep 23, 2022

[CT-1230] [Feature] Derived metrics' expressions should be able to be a string or a numerator + denominator dict for division dbt-labs/dbt-core#5918

Closed

3 tasks

dave-connors-3 and others added 6 commits September 23, 2022 11:58

resolve conflicts

f6d2ace

Auto update table of contents

12d7764

rename config to treat_null_values_as_zero

156cd67

missed rename

2f5935f

metric validation tests and dummy config for develop metric dictionary

341f941

comment on divide 0 functionality

42025d6

dave-connors-3 requested review from joellabes and callum-mcdata September 23, 2022 19:08

dave-connors-3 and others added 3 commits September 23, 2022 14:34

back to jerco's original code

629931a

Merge branch 'main' into fix-zero-value-metric

e2574e3

Auto update table of contents

e7952d6

callum-mcdata previously approved these changes Sep 23, 2022

dave-connors-3 added 2 commits September 26, 2022 11:31

readme updates

4fc5152

Merge branch 'fix-zero-value-metric' of github.com:dbt-labs/dbt_metri…

9288869

…cs into fix-zero-value-metric

dave-connors-3 dismissed callum-mcdata’s stale review via 9288869 September 26, 2022 16:31

Auto update table of contents

e3754ef

callum-mcdata approved these changes Sep 26, 2022

dave-connors-3 merged commit 0d31155 into main Sep 26, 2022

dave-connors-3 deleted the fix-zero-value-metric branch September 26, 2022 16:43

Mylleranton mentioned this pull request Oct 10, 2022

Ordering the dataset returned by calculate/develop #130

Closed

2 tasks
