
[CT-1183] [Feature] metrics should be configurable to show nulls for missing data after spining #5842

Closed
dave-connors-3 opened this issue Sep 14, 2022 · 6 comments
Labels
enhancement New feature or request semantic Issues related to the semantic layer

Comments

@dave-connors-3
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When using metrics in dbt, the metrics package currently fills in any missing value with 0 after a metric has been date spined. This has implications for derived metrics that may divide by a zero value -- in discussing a fix for that particular issue, and hearing feedback from the community, we realized that we need a configurable, consistent way for users to decide how a metric is reported: with null values or with zeros.

A new optional metric attribute fill_missing_values would allow users to declare whether a metric returns 0 or null when there is missing data after date spining a metric.
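To make the tradeoff concrete, here is a minimal Python sketch of the two fill behaviors (this is illustrative only -- the function names and the simplified join are hypothetical, not dbt_metrics code):

```python
from datetime import date, timedelta

def date_spine(start, end):
    """Generate every date in [start, end] -- the spine the metric is joined onto."""
    return [start + timedelta(n) for n in range((end - start).days + 1)]

def spine_metric(values, start, end, fill_missing_values=False):
    """Join raw metric values onto a date spine.

    Dates absent from `values` become 0 when fill_missing_values is True,
    and None (the analogue of SQL null) otherwise.
    """
    fill = 0 if fill_missing_values else None
    return {d: values.get(d, fill) for d in date_spine(start, end)}

raw = {date(2022, 9, 14): 5}  # only one day has data
spine_metric(raw, date(2022, 9, 13), date(2022, 9, 15))
# -> {date(2022, 9, 13): None, date(2022, 9, 14): 5, date(2022, 9, 15): None}

# With zero-fill, a derived metric that divides by this value hits
# division-by-zero on the empty days; with null-fill, the result
# simply stays null for those days.
```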

Describe alternatives you've considered

Letting this be an option passed to the dbt_metrics.calculate() macro. This could be considered a display option rather than an attribute of a particular metric. However, @joellabes said it well:

I lean towards "it should be part of the metric spec". Allowing two different dashboards to use different processes to calculate a trailing average on the same metric feels counter to "same result, no matter what".

Who will this benefit?

Metrics users, new and old.

Are you interested in contributing this feature?

yeah!

Anything else?

No response

@dave-connors-3 dave-connors-3 added enhancement New feature or request triage labels Sep 14, 2022
@github-actions github-actions bot changed the title [Feature] metrics should be configurable to show nulls for missing data after spining [CT-1183] [Feature] metrics should be configurable to show nulls for missing data after spining Sep 14, 2022
@dbeatty10 dbeatty10 removed the triage label Sep 14, 2022
@jtcohen6
Contributor

jtcohen6 commented Sep 15, 2022

Thanks @dave-connors-3!

Agree that this should always be consistent for all calculations of a given metric.

That said - this feels to me more like a standard configuration that could apply to several metrics, more so than an integral part of a metric's unique definition. Could we imagine adding other properties like this one, which don't define the metric per se, but configure how it should be executed (= calculated)?

Here's a thought / alternative approach:

  • We just added support for config on metrics in Add config to disable metrics/exposures #5815
  • config is already a dict that accepts arbitrary user-input values. If we want to add more on/off switches like this one in the future, we actually don't need a dbt-core PR to add a new property — if set, it just shows up in the metric node! We'd only need the PR in dbt-metrics, which can access the config as metric.get("config").get("fill_missing_values").
  • Of course, we'd still have the ability to PR dbt-core if we want to set an explicit default for that config (e.g. to plumb through the Metadata API), or to pass user input through custom validation logic at parse time.
  • If users want to override macros from the metrics package with custom implementations, which take custom configurations (like this one) as inputs — this would be a good pattern for doing it!

So in your dbt project code:

version: 2
metrics:
  - name: new_customers
    ...
    config:
      enabled: true
      fill_missing_values: true

Bonus: if this is a config, you could even set a default for all metrics in your project / a package:

# dbt_project.yml

metrics:
  my_project:
    +fill_missing_values: true

And somewhere in the dbt-metrics package (dbt-labs/dbt_metrics#96):

-- what should the default behavior be?
{% set fill_missing_values = dict_metric.get("config").get("fill_missing_values", False) %}
...
{% if fill_missing_values %}
    coalesce(...)
{% else %}
    ...
{% endif %}
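The same lookup pattern translates directly to Python. One wrinkle worth noting: if a metric has no config block at all, a chained `.get("config").get(...)` fails, so the sketch below falls back through an empty dict first. (The helper name and metric-node shape here are hypothetical, chosen only to illustrate the pattern.)

```python
def get_metric_config(metric_node, key, default):
    """Read an arbitrary key from a metric node's config dict.

    `config` may be absent or None entirely, so fall back through an
    empty dict before applying the default -- a bare
    metric_node.get("config").get(key) would raise AttributeError.
    """
    return (metric_node.get("config") or {}).get(key, default)

metric = {"name": "new_customers", "config": {"enabled": True}}
get_metric_config(metric, "fill_missing_values", False)  # -> False (default)

metric["config"]["fill_missing_values"] = True
get_metric_config(metric, "fill_missing_values", False)  # -> True
```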

@callum-mcdata @dbeatty10 Curious to get your thoughts here too. I see this as a potential pattern that could also enable fewer "migrations" to the metrics spec in the future.

Addendum: What's the difference between meta and config? They're a lot alike; both accept arbitrary key-value pairs. IMO, the distinction is twofold:

  • meta should be only be set by end users, and completely unused by dbt-core
  • meta keys should be metadata, totally optional decorators on a metric that provide additional context. They should not be tied to any functional changes in how a metric is actually calculated, how a model is materialized, etc — that's what config is for!

@joellabes
Contributor

config is already a dict that accepts arbitrary user-input values. If we want to add more on/off switches like this one in the future, we actually don't need a dbt-core PR to add a new property — if set, it just shows up in the metric node!

it WHAT.

Today I learned!

@leahwicz
Contributor

Note: fill_missing_values might not be the most descriptive name to use. What is the missing value and fill with what?

@leahwicz leahwicz added the Refinement Maintainer input needed label Sep 15, 2022
@dave-connors-3
Contributor Author

Totally in alignment with your thinking here @jtcohen6! This absolutely fits as a config to adjust the ergonomics of using a particular metric, more so than "an integral part of a metric's unique definition."

@leahwicz -- totally agree with your feedback here too! Do you have suggestions for better naming? I was trying to balance brevity and clarity.

@dave-connors-3
Contributor Author

also I think we are clear to close this issue and #5843!

@jtcohen6
Contributor

Closing, as this can be handled solely within dbt-metrics for now.

Agree with the naming suggestion! Verbose-r is better for something this subtle, and config: is nice because it can be set once (in dbt_project.yml) for all metrics

@jtcohen6 jtcohen6 removed the Refinement Maintainer input needed label Sep 16, 2022
@jtcohen6 jtcohen6 added the semantic Issues related to the semantic layer label Jun 20, 2023